One-Way ANOVA
The two-sample t-test lets you compare the means of two groups — but what if you have three or more? A pharmaceutical company testing three dosage levels, a school district comparing four teaching methods, a hospital evaluating five pain management protocols — these situations all require comparing multiple group means at once. You might think the solution is to run all possible pairwise t-tests, but that approach creates a serious statistical problem. ANOVA (Analysis of Variance) solves it by testing all groups simultaneously with a single test.
Why Not Just Use Multiple T-Tests?
With 3 groups, you would need 3 pairwise comparisons (A vs B, A vs C, B vs C). With 4 groups, you need 6 comparisons. With 5 groups, you need 10. The general formula is $\binom{k}{2} = \frac{k(k-1)}{2}$ comparisons for $k$ groups.
The problem is inflated Type I error. Each individual t-test has a 5% chance of a false positive (at $\alpha = 0.05$). When you run many tests, the probability that at least one produces a false positive grows rapidly. With 3 independent tests at $\alpha = 0.05$, the overall error rate is approximately:

$$1 - (1 - 0.05)^3 = 1 - 0.857 \approx 0.143$$

That is a 14.3% chance of at least one false positive — nearly triple the intended 5%. With 10 pairwise comparisons (5 groups), the rate climbs to about 40%. ANOVA avoids this entirely by performing a single test at one $\alpha$ level.
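The inflation can be checked with a few lines of Python. This is a quick sketch; `familywise_error_rate` is an illustrative helper written for this example, and it assumes the pairwise tests are independent (real pairwise t-tests share data, so the true rate differs slightly, but the qualitative picture is the same):

```python
from math import comb

def familywise_error_rate(k, alpha=0.05):
    """Probability of at least one false positive across all pairwise
    t-tests among k groups, assuming the tests are independent."""
    m = comb(k, 2)                     # k(k-1)/2 pairwise comparisons
    return 1 - (1 - alpha) ** m

for k in (3, 4, 5):
    print(k, comb(k, 2), round(familywise_error_rate(k), 3))
# 3 groups -> 3 tests -> 0.143; 5 groups -> 10 tests -> 0.401
```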
The Logic of ANOVA
Despite its name, Analysis of Variance is a method for comparing means. The key insight is that you can learn about means by studying variances. ANOVA decomposes the total variation in the data into two components:
- Between-group variation ($SS_B$): how much the group means differ from the overall (grand) mean. If the population means truly differ, this will be large.
- Within-group variation ($SS_W$): how much individual observations vary within their own group. This represents natural random variation that exists regardless of whether the means differ.
If between-group variation is large relative to within-group variation, the group means are probably different. If between-group variation is small relative to within-group variation, the differences among group means could easily be due to random chance.
Hypotheses:

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_k \qquad H_a: \text{not all } \mu_i \text{ are equal}$$
Note that the alternative hypothesis does not specify which mean differs or by how much — only that they are not all equal.
The F-Statistic
The test statistic for ANOVA is the F-ratio:

$$F = \frac{MS_B}{MS_W} = \frac{SS_B/(k-1)}{SS_W/(N-k)}$$

where $k$ is the number of groups, $N$ is the total number of observations across all groups, $MS_B$ is the mean square between groups, and $MS_W$ is the mean square within groups.
- A large F value means between-group variation dominates — evidence against $H_0$
- An F value near 1 means between-group and within-group variation are similar — consistent with $H_0$
- F is always non-negative, and the F-distribution is right-skewed, so the test is always right-tailed
The F-distribution has two degrees of freedom: $df_1 = k - 1$ (numerator, between groups) and $df_2 = N - k$ (denominator, within groups).
The ANOVA Table
Results are organized in a standard ANOVA table:
| Source | SS | df | MS | F |
|---|---|---|---|---|
| Between | $SS_B$ | $k - 1$ | $MS_B = SS_B/(k-1)$ | $F = MS_B/MS_W$ |
| Within | $SS_W$ | $N - k$ | $MS_W = SS_W/(N-k)$ | |
| Total | $SS_T$ | $N - 1$ | | |
The key relationships: $SS_T = SS_B + SS_W$ and $df_T = df_B + df_W$ (that is, $N - 1 = (k - 1) + (N - k)$). These always hold and serve as useful checks on your arithmetic.
Computing Sums of Squares
Here are the formulas for the three sums of squares:
Total sum of squares — measures total variation across all observations:

$$SS_T = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2$$

Between-group sum of squares — measures variation among group means:

$$SS_B = \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2$$

where $\bar{x}_i$ is the mean of group $i$ and $\bar{x}$ is the grand mean of all observations.

Within-group sum of squares — measures variation within each group:

$$SS_W = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$$

In practice, you can compute any two and find the third using $SS_T = SS_B + SS_W$.
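These three formulas translate directly into code. The sketch below is an illustrative helper (`sums_of_squares` is not a standard library function) that computes all three sums and lets you confirm the decomposition identity on any data:

```python
def sums_of_squares(groups):
    """Return (SS_T, SS_B, SS_W) for a list of samples, one list per group."""
    all_obs = [x for g in groups for x in g]
    grand_mean = sum(all_obs) / len(all_obs)
    group_means = [sum(g) / len(g) for g in groups]
    # Between: group size times squared deviation of group mean from grand mean
    ss_b = sum(len(g) * (m - grand_mean) ** 2
               for g, m in zip(groups, group_means))
    # Within: squared deviation of each observation from its own group mean
    ss_w = sum((x - m) ** 2
               for g, m in zip(groups, group_means) for x in g)
    # Total: squared deviation of each observation from the grand mean
    ss_t = sum((x - grand_mean) ** 2 for x in all_obs)
    return ss_t, ss_b, ss_w

ss_t, ss_b, ss_w = sums_of_squares([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(ss_t, ss_b, ss_w)   # SS_T equals SS_B + SS_W
```

Whatever the data, the returned values satisfy $SS_T = SS_B + SS_W$ up to floating-point rounding, which makes the function a handy arithmetic check for hand calculations.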
Worked Example: Three Teaching Methods
A researcher randomly assigns 15 students to three teaching methods and records their test scores:
- Method A ($n_1 = 5$): 78, 82, 85, 80, 75
- Method B ($n_2 = 5$): 88, 92, 85, 90, 95
- Method C ($n_3 = 5$): 72, 68, 75, 70, 65
Step 1: Compute group means and the grand mean.

$$\bar{x}_A = \frac{78+82+85+80+75}{5} = 80 \qquad \bar{x}_B = \frac{88+92+85+90+95}{5} = 90 \qquad \bar{x}_C = \frac{72+68+75+70+65}{5} = 70$$

With equal group sizes, the grand mean is $\bar{x} = (80 + 90 + 70)/3 = 80$.

Step 2: Compute $SS_B$.

$$SS_B = 5(80-80)^2 + 5(90-80)^2 + 5(70-80)^2 = 0 + 500 + 500 = 1000$$

Step 3: Compute $SS_W$ by summing squared deviations within each group.

Method A (deviations from 80): $(-2)^2 + 2^2 + 5^2 + 0^2 + (-5)^2 = 58$

Method B (deviations from 90): $(-2)^2 + 2^2 + (-5)^2 + 0^2 + 5^2 = 58$

Method C (deviations from 70): $2^2 + (-2)^2 + 5^2 + 0^2 + (-5)^2 = 58$

$$SS_W = 58 + 58 + 58 = 174$$

Step 4: Verify with $SS_T$.

Check: $SS_T = 1174$ and $SS_B + SS_W = 1000 + 174 = 1174$ ✓.

Step 5: Compute mean squares and the F-statistic.

$$MS_B = \frac{1000}{2} = 500 \qquad MS_W = \frac{174}{12} = 14.5 \qquad F = \frac{500}{14.5} \approx 34.48$$
Step 6: Build the completed ANOVA table.
| Source | SS | df | MS | F |
|---|---|---|---|---|
| Between | 1000 | 2 | 500 | 34.48 |
| Within | 174 | 12 | 14.5 | |
| Total | 1174 | 14 | | |
Step 7: Compare to the critical value. With $df_1 = 2$ and $df_2 = 12$ at $\alpha = 0.05$, the critical value is $F_{0.05}(2, 12) = 3.885$.
Since $F = 34.48$ far exceeds $3.885$, we reject $H_0$.
Step 8: Conclusion in context. There is overwhelming evidence that at least one teaching method produces different results. Method B has the highest mean (90), Method A is in the middle (80), and Method C has the lowest mean (70). The differences are both statistically significant and educationally meaningful — a 20-point spread between the best and worst methods warrants serious attention from school administrators.
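A hand calculation like this is worth double-checking in software. Assuming SciPy is available, `scipy.stats.f_oneway` runs the entire one-way ANOVA in a single call:

```python
from scipy import stats

method_a = [78, 82, 85, 80, 75]
method_b = [88, 92, 85, 90, 95]
method_c = [72, 68, 75, 70, 65]

f_stat, p_value = stats.f_oneway(method_a, method_b, method_c)
print(round(f_stat, 2))   # 34.48, matching the hand computation
print(p_value < 0.05)     # True: reject the null hypothesis
```

The function returns both the F-statistic and the exact right-tail p-value, so no critical-value table lookup is needed.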
Conditions for ANOVA
The validity of the F-test depends on four conditions:
- Random samples or random assignment — each group must be a random sample from its population, or subjects must be randomly assigned to groups
- Independent groups — the groups must be independent of each other (no subject appears in more than one group)
- Normality — the population distribution within each group should be approximately normal. With small samples (under 30 per group), check for strong skewness or outliers. With larger samples, ANOVA is robust to moderate departures from normality.
- Equal variances — the population standard deviations should be roughly equal across groups. A common rule of thumb: the largest group standard deviation should be no more than twice the smallest. In this example, every group has the same within-group sum of squares (58), so each group standard deviation is $s = \sqrt{58/4} \approx 3.81$ and the condition is perfectly satisfied.
Of these conditions, ANOVA is most sensitive to violations of independence. It is moderately robust to non-normality (especially with balanced group sizes) and moderately robust to unequal variances (especially when group sizes are equal).
What ANOVA Does NOT Tell You
A significant F-test tells you that at least one mean differs from the others. It does not tell you:
- Which specific means differ from each other
- How many means are different
- The direction or magnitude of specific differences
To identify which specific pairs of means differ, you need post-hoc tests (also called multiple comparison procedures). The most common is Tukey’s Honest Significant Difference (HSD), which compares every pair of group means while controlling the overall Type I error rate. Other options include the Bonferroni correction, Scheffé’s method, and Dunnett’s test (when comparing all groups to a single control). Post-hoc tests are a topic for a more advanced course, but the key point is: always run ANOVA first, and only perform post-hoc comparisons if the F-test is significant.
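SciPy implements Tukey's HSD directly as `scipy.stats.tukey_hsd` (available in SciPy 1.8 and later). A quick sketch applying it to the teaching-methods data from the worked example:

```python
from scipy import stats

method_a = [78, 82, 85, 80, 75]
method_b = [88, 92, 85, 90, 95]
method_c = [72, 68, 75, 70, 65]

# All pairwise comparisons, with the family-wise error rate controlled
res = stats.tukey_hsd(method_a, method_b, method_c)
print(res)   # table of pairwise differences, p-values, and intervals
```

For this data every pairwise difference (A vs B, A vs C, B vs C) comes out significant at the 5% level, which is consistent with the 10- and 20-point gaps between group means relative to $MS_W = 14.5$.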
Common F Critical Values Reference Table
| $df_1$, $df_2$ | $F_{0.05}$ critical value |
|---|---|
| 2, 10 | 4.103 |
| 2, 12 | 3.885 |
| 2, 15 | 3.682 |
| 2, 20 | 3.493 |
| 3, 12 | 3.490 |
| 3, 20 | 3.098 |
| 4, 20 | 2.866 |
To use the table: $df_1 = k - 1$ (number of groups minus 1) is the numerator and $df_2 = N - k$ (total observations minus number of groups) is the denominator. Reject $H_0$ if your calculated F exceeds the table value.
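When the exact $df$ pair is not in the table, the critical value can be computed from the F-distribution's inverse CDF. Assuming SciPy is available, `scipy.stats.f.ppf` does this (the `f_critical` wrapper is just an illustrative convenience):

```python
from scipy.stats import f

def f_critical(df1, df2, alpha=0.05):
    """Right-tail critical value of the F-distribution."""
    return f.ppf(1 - alpha, df1, df2)

print(round(f_critical(2, 12), 3))   # 3.885, matching the table row
print(round(f_critical(3, 12), 3))   # about 3.49, matching 3.490
```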
Real-World Application: Nursing — Comparing Pain Levels Across Treatments
A hospital pain management team wants to compare three approaches for post-surgical pain relief: medication only, medication combined with physical therapy, and physical therapy only. They randomly assign 24 patients (8 per group) and measure pain levels on a 0-to-10 scale after 48 hours.
- Medication only ($n_1 = 8$): mean pain score $\bar{x}_1$, standard deviation $s_1$
- Medication + therapy ($n_2 = 8$): mean pain score $\bar{x}_2 = 3.4$, standard deviation $s_2$
- Therapy only ($n_3 = 8$): mean pain score $\bar{x}_3 = 6.1$, standard deviation $s_3$

Grand mean: with equal group sizes, $\bar{x} = (\bar{x}_1 + \bar{x}_2 + \bar{x}_3)/3$.

Summing $n_i(\bar{x}_i - \bar{x})^2$ across the three groups gives $SS_B = 30.24$. For $SS_W$, use the fact that each group contributes $(n_i - 1)s_i^2$, so $SS_W = 7(s_1^2 + s_2^2 + s_3^2) = 66.43$:
| Source | SS | df | MS | F |
|---|---|---|---|---|
| Between | 30.24 | 2 | 15.12 | 4.780 |
| Within | 66.43 | 21 | 3.163 | |
| Total | 96.67 | 23 | | |
$F = 15.12/3.163 \approx 4.78$. The critical value for $df_1 = 2$, $df_2 = 21$ at $\alpha = 0.05$ is approximately 3.47. Since $4.78 > 3.47$, we reject $H_0$.
There is significant evidence that pain levels differ across the three treatment approaches. The combined medication-and-therapy group reported the lowest pain (mean 3.4), while therapy alone reported the highest (mean 6.1). This suggests the combined approach is most effective, and the nursing team should consider making it the standard protocol. Post-hoc tests would confirm which specific pairs of treatments differ significantly.
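Summary-statistic calculations like this one can be packaged into a reusable helper. The sketch below (`anova_from_summary` is an illustrative name, not a library function) computes the F-statistic from group sizes, means, and standard deviations alone, and is checked here against the earlier teaching-methods example, where each group had within-group sum of squares 58 and hence $s^2 = 58/4 = 14.5$:

```python
def anova_from_summary(ns, means, sds):
    """One-way ANOVA F-statistic from per-group sample sizes, means,
    and standard deviations -- no raw data required."""
    k = len(ns)
    n_total = sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / n_total
    ss_b = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    ss_w = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))
    return (ss_b / (k - 1)) / (ss_w / (n_total - k))

# Teaching-methods check: n = 5 per group, means 80/90/70, s = sqrt(14.5)
f = anova_from_summary([5, 5, 5], [80, 90, 70], [14.5 ** 0.5] * 3)
print(round(f, 2))   # 34.48
```

This is handy when, as in published studies, only summary statistics are reported.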
Practice Problems
Test your understanding with these problems. A worked solution follows each one.
Problem 1: A farmer tests three fertilizers on 12 plots (4 plots each). Crop yields (in bushels): Fertilizer A: 45, 50, 48, 47. Fertilizer B: 55, 58, 52, 55. Fertilizer C: 49, 51, 47, 53. Test at $\alpha = 0.05$.
Group means: $\bar{x}_A = 190/4 = 47.5$, $\bar{x}_B = 220/4 = 55.0$, $\bar{x}_C = 200/4 = 50.0$.
Grand mean: $\bar{x} = 610/12 \approx 50.83$.
Within-group sums of squares:
- A: $(-2.5)^2 + 2.5^2 + 0.5^2 + (-0.5)^2 = 13$
- B: $0^2 + 3^2 + (-3)^2 + 0^2 = 18$
- C: $(-1)^2 + 1^2 + (-3)^2 + 3^2 = 20$

$SS_W = 13 + 18 + 20 = 51$, and $SS_B = 4(47.5 - 50.83)^2 + 4(55.0 - 50.83)^2 + 4(50.0 - 50.83)^2 \approx 116.67$.
$F = \frac{116.67/2}{51/9} = \frac{58.33}{5.67} \approx 10.29$.
Critical value: $F_{0.05}(2, 9) = 4.256$ at $\alpha = 0.05$.
Since $10.29 > 4.256$, reject $H_0$.
Answer: There is significant evidence that the fertilizers produce different yields. Fertilizer B has the highest mean yield (55.0 bushels), followed by C (50.0) and A (47.5).
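The hand solution above can be verified with SciPy's one-way ANOVA function:

```python
from scipy import stats

fert_a = [45, 50, 48, 47]
fert_b = [55, 58, 52, 55]
fert_c = [49, 51, 47, 53]

f_stat, p_value = stats.f_oneway(fert_a, fert_b, fert_c)
print(round(f_stat, 2))   # 10.29, matching the hand computation
print(p_value < 0.05)     # True: reject the null hypothesis
```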
Problem 2: A company measures employee productivity (units per hour) across three departments of equal size. Department 1: mean 24.5, standard deviation $s_1$. Department 2: mean 22.0, standard deviation $s_2$. Department 3: mean 23.5, standard deviation $s_3$. Test at $\alpha = 0.05$.
Grand mean (equal group sizes): $\bar{x} = (24.5 + 22.0 + 23.5)/3 \approx 23.33$.
$SS_B = \sum n_i(\bar{x}_i - \bar{x})^2$, so $MS_B = SS_B/2$; $SS_W = \sum (n_i - 1)s_i^2$ from the given standard deviations, so $MS_W = SS_W/(N - 3)$.
$F = MS_B / MS_W$.
Compare to the critical value $F_{0.05}(2, N - 3)$.
Since the calculated F is less than the critical value, fail to reject $H_0$.
Answer: There is not sufficient evidence that productivity differs across the three departments. The observed differences in means (24.5, 22.0, 23.5) are small relative to the within-group variability.
Problem 3: Four different medications are tested on groups of 10 patients each ($k = 4$, $N = 40$). The ANOVA table shows $SS_B = 240$ and $SS_W = 720$. Complete the table and test at $\alpha = 0.05$.
| Source | SS | df | MS | F |
|---|---|---|---|---|
| Between | 240 | 3 | 80 | 4.00 |
| Within | 720 | 36 | 20 | |
| Total | 960 | 39 | | |

Check: $240 + 720 = 960$ ✓ and $3 + 36 = 39$ ✓.
Critical value: $F_{0.05}(3, 36) \approx 2.87$ at $\alpha = 0.05$.
Since $4.00 > 2.87$, reject $H_0$.
Answer: There is significant evidence that at least one medication produces a different mean outcome. Post-hoc tests would be needed to determine which specific medications differ.
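Completing an ANOVA table from $SS_B$, $SS_W$, $k$, and $N$ is pure arithmetic, and can be sketched as a small helper (`complete_anova_table` is an illustrative name written for this example):

```python
def complete_anova_table(ss_b, ss_w, k, n_total):
    """Fill in the remaining ANOVA table entries from SS_B, SS_W, k, and N."""
    df_b, df_w = k - 1, n_total - k
    ms_b, ms_w = ss_b / df_b, ss_w / df_w
    return {"SS_T": ss_b + ss_w, "df_B": df_b, "df_W": df_w,
            "MS_B": ms_b, "MS_W": ms_w, "F": ms_b / ms_w}

print(complete_anova_table(240, 720, 4, 40))
# F = (240/3) / (720/36) = 80 / 20 = 4.0
```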
Problem 4: A researcher runs a one-way ANOVA comparing four groups and obtains an F-statistic that is smaller than the critical value $F_{0.05}(df_1, df_2)$. What is the conclusion?
Since the calculated F is less than the critical value, we fail to reject $H_0$.
Answer: There is not sufficient evidence to conclude that any of the four group means differ. The observed variation between groups is not large enough relative to the variation within groups to rule out random chance.
Problem 5: Three exercise programs of equal size are compared. Group 1: lost an average of 12.0 lbs with standard deviation $s_1$ lbs. Group 2: lost 8.5 lbs with standard deviation $s_2$ lbs. Group 3: lost 11.0 lbs with standard deviation $s_3$ lbs. Test whether the programs produce different weight loss at $\alpha = 0.05$.
Grand mean (equal group sizes): $\bar{x} = (12.0 + 8.5 + 11.0)/3 = 10.5$.
$SS_B = n\left[(12.0 - 10.5)^2 + (8.5 - 10.5)^2 + (11.0 - 10.5)^2\right] = 6.5n$ for common group size $n$; $SS_W = \sum (n_i - 1)s_i^2$ from the given standard deviations.
$F = \frac{SS_B/2}{SS_W/(N - 3)}$.
Compare to the critical value $F_{0.05}(2, N - 3)$.
Since the calculated F is less than the critical value, fail to reject $H_0$.
Answer: There is not sufficient evidence that the three exercise programs produce different average weight loss. Although Group 1 lost the most on average (12.0 lbs vs 8.5 and 11.0), the within-group variability is too large relative to the between-group differences to rule out chance.
Key Takeaways
- ANOVA tests whether the means of three or more groups are all equal, using a single F-test instead of multiple pairwise t-tests
- Running multiple t-tests inflates the Type I error rate — ANOVA controls it at a single $\alpha$ level
- ANOVA compares between-group variation ($MS_B$) to within-group variation ($MS_W$) using $F = MS_B / MS_W$
- A large F value indicates the group means differ more than random variation would explain
- The ANOVA table organizes the calculation: $SS_T = SS_B + SS_W$ and $df_T = df_B + df_W$ always hold
- Conditions: random samples, independent groups, approximate normality within groups, and roughly equal variances
- A significant F-test tells you at least one mean differs, but not which ones — use post-hoc tests (like Tukey’s HSD) to identify specific differences
- In clinical and nursing settings, ANOVA is essential for comparing multiple treatment protocols, dosage levels, or care approaches to determine which produces the best patient outcomes
Last updated: March 29, 2026