Statistics

Experimental Design

Last updated: March 2026 · Intermediate

The design of a study determines what conclusions you can draw from it. A study can have perfect data collection, flawless calculations, and a huge sample — and still fail to answer the question it set out to answer if the design is wrong. The most important design distinction is between observational studies and experiments, because only one of these can establish cause and effect.

Observational Studies vs Experiments

In an observational study, the researcher watches, records, and measures — but does not intervene. Subjects are observed in their natural conditions, and the researcher has no control over who is exposed to what.

In an experiment, the researcher actively imposes a treatment on subjects and compares the outcomes against a control group. The researcher controls the conditions.

This distinction is not just academic — it determines the strongest conclusion you are allowed to draw.

| Feature | Observational Study | Experiment |
| --- | --- | --- |
| Researcher’s role | Observes and records | Assigns treatments |
| Treatment assignment | Naturally occurring | Controlled by researcher |
| Strongest conclusion | Association (correlation) | Causation |
| Confounding | Difficult to eliminate | Controlled via randomization |
| Example | Studying whether coffee drinkers live longer | Randomly assigning people to drink coffee or not |

Example 1: Coffee and Longevity

Researchers track 50,000 adults for 20 years and find that those who drink 3 or more cups of coffee per day live an average of 2 years longer than non-drinkers.

Can we conclude that coffee causes longer life? No. This is an observational study. Coffee drinkers may differ from non-drinkers in many ways: income level, exercise habits, diet, access to healthcare, stress levels. Any of these could explain the longevity difference.

To establish causation, you would need an experiment: randomly assign thousands of people to drink coffee or abstain for decades and compare outcomes. This is ethically and practically difficult, which is why many important health questions rely on observational evidence — and why the conclusions are always stated as associations, not causes.
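The contrast between the two kinds of study can be illustrated with a small simulation. This is a sketch under invented assumptions, not a model of real coffee data: the `wealthy` confounder, the probabilities, and the lifespan figures are all hypothetical, and coffee is deliberately given no effect at all.

```python
import random
from statistics import mean

def simulate(randomized, n=20000, seed=0):
    """Return mean lifespan of coffee drinkers minus that of non-drinkers."""
    rng = random.Random(seed)
    drinkers, abstainers = [], []
    for _ in range(n):
        wealthy = rng.random() < 0.5                 # hypothetical confounder
        if randomized:
            drinks = rng.random() < 0.5              # coin flip ignores wealth
        else:
            # Self-selection: wealthy people are assumed more likely to drink coffee.
            drinks = rng.random() < (0.8 if wealthy else 0.2)
        # Coffee has NO effect in this model; only wealth shifts lifespan.
        lifespan = 78 + (4 if wealthy else 0) + rng.gauss(0, 5)
        (drinkers if drinks else abstainers).append(lifespan)
    return mean(drinkers) - mean(abstainers)

observational_gap = simulate(randomized=False)
experimental_gap = simulate(randomized=True)
print(f"Observational gap: {observational_gap:.2f} years")
print(f"Randomized gap:    {experimental_gap:.2f} years")
```

Under these assumptions the observational comparison shows a gap of roughly two years even though coffee does nothing, while the randomized comparison shows a gap near zero.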

Key Elements of a Well-Designed Experiment

A well-designed experiment has five essential components: a control group, a treatment group, random assignment, replication, and blinding.

Control Group

The control group receives no treatment, a standard treatment, or a placebo. Its purpose is to provide a baseline — something to compare the treatment group against.

Without a control group, you cannot know whether the outcomes in the treatment group are actually caused by the treatment. Patients often improve on their own (natural recovery), and the passage of time changes many measurements. The control group accounts for all of these factors.

Treatment Group

The treatment group receives the intervention being tested — a new drug, a new teaching method, a new manufacturing process, or whatever the experiment is designed to evaluate.

The key is that the treatment and control groups should be as similar as possible in every respect except the treatment itself. Any difference in outcomes can then be attributed to the treatment rather than to pre-existing differences between the groups.

Random Assignment

Random assignment means using a chance process — like a random number generator — to decide which subjects go into the treatment group and which go into the control group.

This is the single most important element of experimental design. Random assignment does not guarantee that the groups are identical, but it ensures that any differences between them are due to chance, not to a systematic pattern. With enough subjects, random assignment distributes both known and unknown confounding variables approximately evenly across the groups.

Note the distinction between random sampling (how you select subjects from the population) and random assignment (how you place selected subjects into groups). A study can use one, both, or neither. The strongest designs use both.
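The two steps can be kept separate in code. The following is a minimal sketch using Python's standard `random` module; the function name `sample_then_assign` and the placeholder population are illustrative assumptions.

```python
import random

def sample_then_assign(population, n_subjects, seed=None):
    """Random sampling followed by random assignment into two equal groups."""
    rng = random.Random(seed)
    # Step 1, random sampling: choose who takes part in the study.
    subjects = rng.sample(population, n_subjects)
    # Step 2, random assignment: shuffle the chosen subjects and split them.
    # (The explicit shuffle keeps the assignment step visible as its own step.)
    rng.shuffle(subjects)
    half = n_subjects // 2
    return subjects[:half], subjects[half:]

population = [f"person_{i}" for i in range(1000)]   # hypothetical population
treatment, control = sample_then_assign(population, 40, seed=1)
print(len(treatment), len(control))
```

A study that recruits volunteers and then randomizes them uses only the second step; a survey that draws a random sample but assigns no treatment uses only the first.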

Replication

Replication means having enough subjects in each group to detect real effects and to reduce the influence of individual variation. A study with 5 subjects per group is unlikely to produce reliable results because a single unusual outcome can dominate the averages.

As a rule of thumb, larger groups produce more precise estimates. The required number of subjects depends on the expected size of the treatment effect and the variability in the population. This calculation is called a power analysis, and it is performed before the experiment begins.
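As a rough illustration of a power analysis, the sketch below uses the standard normal-approximation formula for comparing two group means, $n = 2\,\big((z_{1-\alpha/2} + z_{\text{power}})\,\sigma / \delta\big)^2$ subjects per group. The function name and the example effect size and standard deviation are assumptions chosen for illustration; a real study would typically use dedicated software and a t-based calculation.

```python
from math import ceil
from statistics import NormalDist

def subjects_per_group(effect, sd, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided, two-sample comparison of means."""
    z = NormalDist()                          # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)        # critical value for the significance level
    z_power = z.inv_cdf(power)                # quantile for the desired power
    return ceil(2 * ((z_alpha + z_power) * sd / effect) ** 2)

# Hypothetical inputs: detect a 5-unit difference when the outcome SD is 10.
n_medium = subjects_per_group(effect=5, sd=10)
# Halving the expected effect roughly quadruples the requirement.
n_small = subjects_per_group(effect=2.5, sd=10)
print(n_medium, n_small)
```

The pattern matters more than the exact numbers: smaller expected effects and noisier populations both push the required group size up.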

Blinding

Blinding prevents expectations from influencing results.

  • Single-blind: The subjects do not know whether they are in the treatment or control group.
  • Double-blind: Neither the subjects nor the researchers who interact with them know who is in which group.

Blinding addresses both the placebo effect (subjects improving because they believe they are being treated) and researcher bias (researchers unconsciously recording better results for the treatment group).

Confounding Variables

A confounding variable (or confounder) is a variable that is related to both the explanatory variable and the response variable, creating a false or misleading association between them.

If you observe a correlation between $X$ and $Y$, a confounder $Z$ could be the real reason:

$$X \leftarrow Z \rightarrow Y$$

In this diagram, $Z$ causes both $X$ and $Y$, creating the illusion that $X$ causes $Y$.

Example 2: Ice Cream and Drowning

Data shows that ice cream sales and drowning deaths are positively correlated — when ice cream sales rise, so do drowning deaths.

Does ice cream cause drowning? Obviously not. The confounding variable is hot weather. When temperatures rise, people buy more ice cream and go swimming more often. Swimming more leads to more drowning incidents.

$$\text{Ice cream sales} \leftarrow \text{Hot weather} \rightarrow \text{Drowning deaths}$$

Control for the confounder, for example by comparing only days with similar temperatures, and the ice cream–drowning correlation largely disappears.
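One way to see what controlling for temperature does is to simulate data in which temperature drives both variables and then compare the raw correlation with the correlation left over after temperature's linear effect is removed from each variable (a partial correlation). All numbers below are invented for illustration and are not real sales or drowning figures.

```python
import random
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def residuals(ys, xs):
    """What is left of ys after subtracting a least-squares line fitted on xs."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]

rng = random.Random(42)
temps = [rng.uniform(10, 35) for _ in range(2000)]          # hypothetical daily highs
sales = [20 * t + rng.gauss(0, 60) for t in temps]          # driven by temperature only
drownings = [0.3 * t + rng.gauss(0, 1.5) for t in temps]    # driven by temperature only

raw_r = pearson(sales, drownings)
partial_r = pearson(residuals(sales, temps), residuals(drownings, temps))
print(f"raw r = {raw_r:.2f}, r after controlling for temperature = {partial_r:.2f}")
```

In this simulated setting the raw correlation is strong, while the correlation after adjusting for temperature is close to zero.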

Example 3: Shoe Size and Reading Level

Among children ages 5 to 15, shoe size correlates strongly with reading ability. Children with bigger feet read better.

The confounder is age. Older children have larger feet and more years of reading instruction. Shoe size does not cause better reading — age drives both variables.

$$\text{Shoe size} \leftarrow \text{Age} \rightarrow \text{Reading level}$$

How Randomization Eliminates Confounding

In an experiment, random assignment distributes confounders approximately evenly between the treatment and control groups — even confounders you do not know about.

If you randomly assign 200 patients to receive a new drug or a placebo, the treatment and control groups will be approximately similar in age, weight, severity of illness, genetics, lifestyle, and every other variable. Any remaining differences are due to chance, and statistical methods account for chance variation.

This is why experiments can establish causation while observational studies cannot: randomization breaks the link between confounders and treatment assignment.
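The phrase "approximately evenly" can be made concrete with a quick simulation. The sketch below repeatedly randomizes a hypothetical set of patients into two halves and records how far apart the group mean ages end up; the age distribution (mean 55, SD 12) is an assumption for illustration only.

```python
import random
from statistics import mean

def age_gap(n_patients, rng):
    """Randomly split patients in half; return the absolute difference in mean age."""
    ages = [rng.gauss(55, 12) for _ in range(n_patients)]   # hypothetical ages
    rng.shuffle(ages)                                       # random assignment
    half = n_patients // 2
    return abs(mean(ages[:half]) - mean(ages[half:]))

def typical_gap(n_patients, trials=200, seed=0):
    """Average imbalance in mean age across many independent randomizations."""
    rng = random.Random(seed)
    return mean(age_gap(n_patients, rng) for _ in range(trials))

gap_200 = typical_gap(200)
gap_2000 = typical_gap(2000)
print(f"Typical age gap with 200 patients:  {gap_200:.2f} years")
print(f"Typical age gap with 2000 patients: {gap_2000:.2f} years")
```

The groups are never exactly identical, but the imbalance is small relative to the spread of ages and shrinks as the number of subjects grows. The same logic applies to variables nobody measured.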

Correlation Does Not Imply Causation

This is the single most important principle in statistical reasoning. When two variables are correlated, there are always at least three possible explanations:

  1. $X$ causes $Y$: The explanatory variable directly produces the response.
  2. $Y$ causes $X$: The direction of causation is reversed, so the response actually influences the explanatory variable.
  3. A confounding variable causes both: A third variable drives both $X$ and $Y$, creating the illusion of a direct relationship.

Real-world examples of spurious correlations:

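  • Per-capita cheese consumption correlates with the number of people who die by becoming tangled in their bedsheets. Neither causes the other, and no confounder is needed to explain it; both simply happen to trend upward over the same time period. Pure coincidence is a fourth possibility alongside the three above.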
  • Countries that consume more chocolate per capita win more Nobel Prizes. The confounder is likely national wealth, which supports both chocolate consumption and research funding.
  • Students who eat breakfast tend to have higher GPAs. But students who eat breakfast may also come from stable households with involved parents — the family environment is the confounder.

The rule: Correlation between $X$ and $Y$ is never sufficient on its own to establish causation. To establish causation, you need a well-designed experiment with random assignment, or a very strong body of observational evidence with all plausible confounders accounted for.

Types of Experimental Designs

Beyond the basic treatment-vs-control setup, there are several common designs that handle different research situations.

Completely Randomized Design

Subjects are randomly assigned to treatment or control with no additional grouping. This is the simplest experimental design and works well when subjects are relatively homogeneous.

Example: 100 patients are randomly divided into two groups of 50. One group receives the experimental drug; the other receives a placebo.

Matched Pairs Design

Subjects are grouped into pairs based on a variable that might influence the outcome (age, severity of condition, etc.). Within each pair, one subject is randomly assigned to treatment and the other to control.

Example: A study on a new pain medication pairs patients by pain severity — one patient with severe pain is paired with another patient with similar severity. Within each pair, one receives the drug and one receives the placebo. This ensures that treatment and control groups have the same distribution of pain severity.
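A matched pairs assignment can be sketched as follows: sort subjects by the matching variable, pair neighbours, and flip a coin within each pair. The function name, the patient labels, and the pain scores are hypothetical; real studies often match on several variables at once.

```python
import random

def matched_pairs(subjects, key, seed=None):
    """Pair subjects with similar key values, then randomize within each pair."""
    rng = random.Random(seed)
    ordered = sorted(subjects, key=key)          # neighbours have similar key values
    treatment, control = [], []
    # Step through the sorted list two at a time; an odd subject out is left unassigned.
    for i in range(0, len(ordered) - 1, 2):
        pair = [ordered[i], ordered[i + 1]]
        rng.shuffle(pair)                        # coin flip within the pair
        treatment.append(pair[0])
        control.append(pair[1])
    return treatment, control

# Hypothetical patients as (label, pain score on a 0-10 scale).
patients = [("A", 9), ("B", 3), ("C", 8), ("D", 4), ("E", 6), ("F", 7)]
treatment, control = matched_pairs(patients, key=lambda p: p[1], seed=7)
for t, c in zip(treatment, control):
    print(f"treatment {t} paired with control {c}")
```

Because the randomization happens inside each pair, the two groups end up with nearly the same distribution of pain severity by construction rather than by luck.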

A special case of matched pairs is the before-and-after design, where each subject serves as their own control. You measure the subject before treatment, apply the treatment, and measure again. Each subject’s “pair” is themselves at two different times.

Randomized Block Design

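Subjects are divided into blocks based on a variable thought to affect the outcome, and then randomization occurs within each block. This generalizes the matched pairs idea: a matched pairs design is a block design in which every block contains exactly two subjects, while blocks in general can be larger and can accommodate more than two treatment groups.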

Example: A study tests three different physical therapy protocols on knee surgery patients. Patients are blocked by age group (under 40, 40-60, over 60). Within each age block, patients are randomly assigned to one of the three protocols. This ensures each protocol gets tested across all age groups, preventing age from confounding the results.
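The physical therapy example can be sketched in code: group patients by age block, shuffle within each block, and deal the protocols out in turn. The helper names, protocol labels, and patient ages are hypothetical.

```python
import random
from collections import defaultdict

def block_randomize(subjects, block_of, treatments, seed=None):
    """Randomly assign treatments separately within each block."""
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for subject in subjects:
        blocks[block_of(subject)].append(subject)
    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)                     # randomize order inside the block
        for i, subject in enumerate(members):
            # Deal treatments out in rotation so each block is split evenly.
            assignment[subject] = treatments[i % len(treatments)]
    return assignment

def age_block(patient):
    """Map a (label, age) patient to the age blocks used in the example."""
    age = patient[1]
    if age < 40:
        return "under 40"
    return "40-60" if age <= 60 else "over 60"

ages = [25, 31, 38, 29, 35, 33, 45, 52, 58, 41, 49, 60, 67, 72, 81, 64, 70, 75]
patients = [(f"pt{i}", age) for i, age in enumerate(ages)]   # hypothetical patients
protocols = ["Protocol A", "Protocol B", "Protocol C"]
assignment = block_randomize(patients, age_block, protocols, seed=3)
for patient in patients[:3]:
    print(patient, "->", assignment[patient])
```

Each age block contributes the same number of patients to every protocol, so age cannot end up concentrated in one treatment arm.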

Placebo Effect and Blinding

The placebo effect is a measurable improvement in a patient’s condition that results from the belief that they are receiving treatment, rather than from the treatment itself. It is not “imaginary”: it can produce real, documented physiological changes. In clinical trials, placebo groups often show substantial improvement; rates of 20% to 40% are commonly cited, though the figure varies with the condition and the outcome being measured.

This is precisely why control groups need placebos: if 60% of the treatment group improves and 35% of the placebo group improves, the treatment’s estimated effect is the difference, roughly 25 percentage points. Without the placebo comparison, you would mistakenly attribute the entire 60% to the drug.

Single-blind studies keep subjects unaware of their group assignment. This controls for the placebo effect on the patient side.

Double-blind studies also keep the researchers who interact with subjects, collect data, and assess outcomes unaware of group assignments. This prevents unconscious researcher bias — a physician who knows a patient is receiving the real drug might evaluate symptoms more favorably.

Example 4: Drug Trial Design

A pharmaceutical company tests a new blood pressure medication.

Design:

  • Subjects: 200 patients with stage 1 hypertension (systolic BP between 130 and 139 mmHg)
  • Randomization: A computer randomly assigns 100 patients to the treatment group and 100 to the control group
  • Treatment group: Receives the new medication daily for 12 weeks
  • Control group: Receives an identical-looking sugar pill (placebo) daily for 12 weeks
  • Blinding: Double-blind — neither patients nor the nurses measuring blood pressure know who is in which group. The assignment key is held by an independent data safety board.
  • Outcome measure: Change in systolic blood pressure from baseline to week 12

After 12 weeks:

  • Treatment group mean BP reduction: 14 mmHg
  • Control group mean BP reduction: 5 mmHg
  • Difference: $14 - 5 = 9$ mmHg attributable to the drug

The 5 mmHg reduction in the control group reflects the placebo effect plus natural variation. The 9 mmHg difference is the drug’s actual estimated effect. A statistical test determines whether this difference is large enough to rule out chance.
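The example does not report how much the blood pressure changes varied from patient to patient, and a test cannot be run without that. Purely for illustration, the sketch below assumes a standard deviation of 12 mmHg in each group (an invented figure, not part of the trial description) and applies a large-sample z-test to the 9 mmHg difference.

```python
from math import sqrt
from statistics import NormalDist

def z_test_mean_difference(diff, sd, n_per_group):
    """Two-sided z-test for a difference in means, equal SD and equal group sizes."""
    se = sd * sqrt(2 / n_per_group)              # standard error of the difference
    z = diff / se
    p_value = 2 * NormalDist().cdf(-abs(z))      # two-sided tail probability
    return z, p_value

# diff and n_per_group come from the example; sd=12 is an ASSUMPTION for illustration.
z, p = z_test_mean_difference(diff=14 - 5, sd=12, n_per_group=100)
print(f"z = {z:.2f}, p = {p:.2g}")
```

With that assumed spread, a 9 mmHg difference would be very unlikely to arise by chance alone. A larger standard deviation or smaller groups would weaken the evidence, and a real analysis would use the observed variability and usually a t-test.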

Real-World Application: Nursing — Evaluating a New Treatment Protocol

A hospital wants to test whether a new wound care protocol reduces surgical site infection rates compared to the current standard protocol.

Study design:

  1. Population: All patients undergoing scheduled abdominal surgery at the hospital over a 6-month period (approximately 400 patients).

  2. Random assignment: Each eligible patient is randomly assigned to either the new protocol (treatment group) or the current standard protocol (control group) using a computer-generated randomization sequence.

  3. Control group (200 patients): Receives the current wound care protocol — standard dressing changes at established intervals with standard antiseptic solutions.

  4. Treatment group (200 patients): Receives the new protocol — a different dressing material, more frequent changes in the first 48 hours, and a different antiseptic solution.

  5. Blinding: Single-blind, with blinded outcome assessment. Patients are not told which protocol they receive (the dressings look similar). The nurses performing wound care know which protocol they are following (full blinding is not possible since the procedures differ), but the physicians who assess whether an infection has developed are blinded to group assignment.

  6. Outcome measure: Infection rate at 30 days post-surgery, assessed by blinded physicians using standard diagnostic criteria.

  7. Potential confounders controlled by randomization: Patient age, BMI, diabetes status, smoking history, type of surgery, surgeon skill level. Random assignment distributes these approximately evenly between groups.

Interpreting results: If the treatment group has a 4% infection rate and the control group has a 9% infection rate, the hospital can conclude that the new protocol caused a reduction in infections — because the randomized, controlled design rules out confounders. The difference (5 percentage points) is both clinically meaningful and, with 200 patients per group, likely statistically significant.
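The significance claim can be checked with a two-proportion z-test. With 200 patients per group, rates of 4% and 9% correspond to 8 and 18 infections. The sketch below uses the pooled-proportion form of the test; the function name is illustrative, and a real analysis might prefer an exact test or a confidence interval.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Two-sided z-test for a difference between two proportions (pooled SE)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)                        # overall infection rate
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))  # SE under "no difference"
    z = (p1 - p2) / se
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value

# 4% of 200 = 8 infections (new protocol); 9% of 200 = 18 infections (standard).
z, p = two_proportion_z(8, 200, 18, 200)
print(f"z = {z:.2f}, p = {p:.3f}")
```

The result is significant at the conventional 5% level, but not by a wide margin: $z \approx -2.03$ with a two-sided p-value a little above 0.04. A slightly smaller difference or fewer patients would not clear that threshold.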

Without the control group: If the hospital simply switched all patients to the new protocol and observed a 4% infection rate, it could not conclude the protocol worked. The rate might have been 4% anyway — perhaps a new ventilation system was installed, or a particularly skilled surgeon joined the staff. The control group isolates the effect of the protocol from everything else that might have changed.

Practice Problems

Test your understanding with these problems.

Problem 1: Researchers find that people who own pets have lower rates of depression than people who do not own pets. Is this an observational study or an experiment? Can they conclude that pet ownership reduces depression?

The researchers did not assign people to own or not own pets — they observed the naturally occurring association.

Answer: This is an observational study. They cannot conclude that pet ownership causes lower depression. There are plausible confounders: people who own pets may be more likely to have stable housing, higher income, active lifestyles, or social support networks — all of which independently reduce depression risk.

Problem 2: A teacher randomly assigns half of her class to study with flashcards and the other half to study with practice tests, then compares their exam scores. Is this an experiment or an observational study? What is the treatment? What is the response variable?

The teacher actively imposed a condition (study method) on students using random assignment.

Answer: This is an experiment. The treatment is the study method (flashcards vs practice tests). The response variable is the exam score. Because students were randomly assigned, the teacher can make a causal conclusion about which method produced better results — assuming the groups were large enough for the random assignment to balance out other factors.

Problem 3: A study finds that children who watch more television have lower grades. A news headline reads: “TV Causes Poor Grades in Children.” Identify the flaw in this headline and name at least one potential confounding variable.

The study is observational — it measured television viewing and grades without manipulating either variable. The headline claims causation from correlational data.

Answer: The flaw is claiming causation from an observational study (correlation does not imply causation). Potential confounders include: parental involvement (less-involved parents may allow more TV and provide less academic support), socioeconomic status (lower-income families may have fewer enrichment activities, leading to both more TV and fewer academic resources), or the child’s pre-existing academic difficulties (children struggling in school may turn to TV as an escape — meaning the direction could be reversed).

Problem 4: A company tests a new ergonomic chair by letting employees volunteer for the new chair or keep their old one. After 6 months, the new-chair group reports less back pain. What design flaw makes it impossible to conclude the chair caused the reduction in back pain?

Employees chose which group to be in rather than being randomly assigned.

Answer: The flaw is lack of random assignment (this is an observational comparison, not a true experiment). Employees who volunteered for the new chair may differ systematically from those who did not — they may be more health-conscious, more likely to use proper posture, younger, or already experiencing back pain and therefore more motivated to try something new. Any of these confounders could explain the difference in back pain. A proper experiment would randomly assign employees to chair type.

Problem 5: A clinical trial tests a new sleep medication. The patients know whether they received the real drug or the placebo because the drug has a distinctive taste. The trial finds that the drug group sleeps 45 minutes longer per night. Why might this result be unreliable, and what type of blinding has failed?

If patients can tell whether they received the drug, those in the treatment group may sleep better partly because they expect to — the placebo effect. Those in the control group may sleep worse because they know they received a placebo and feel disappointed.

Answer: Single-blinding has failed — subjects are effectively unblinded because they can identify their group assignment by taste. The 45-minute improvement may be partly or entirely due to the placebo effect rather than the drug’s pharmacological action. The fix would be to make the placebo taste identical to the drug, restoring the blind.

Key Takeaways

  • Observational studies observe without intervening and can only show associations. Experiments impose treatments and can establish causation.
  • A well-designed experiment requires a control group, random assignment, sufficient replication, and blinding.
  • Confounding variables are related to both the explanatory and response variables, creating misleading associations. Randomization distributes confounders approximately evenly across groups.
  • Correlation does not imply causation. A correlation between $X$ and $Y$ could mean $X$ causes $Y$, $Y$ causes $X$, or a third variable causes both.
  • Randomized block and matched pairs designs improve on the completely randomized design by accounting for known sources of variation.
  • The placebo effect produces real improvements from the belief of being treated. Double-blind designs control for both patient expectations and researcher bias.
  • When evaluating any study, ask: Was there random assignment? Was there a control group? Were subjects and researchers blinded? Were there enough subjects? What confounders were not controlled?

Last updated: March 29, 2026