Statistics

One-Way ANOVA

Last updated: March 2026 · Advanced

The two-sample t-test lets you compare the means of two groups — but what if you have three or more? A pharmaceutical company testing three dosage levels, a school district comparing four teaching methods, a hospital evaluating five pain management protocols — these situations all require comparing multiple group means at once. You might think the solution is to run all possible pairwise t-tests, but that approach creates a serious statistical problem. ANOVA (Analysis of Variance) solves it by testing all groups simultaneously with a single test.

Why Not Just Use Multiple T-Tests?

With 3 groups, you would need 3 pairwise comparisons (A vs B, A vs C, B vs C). With 4 groups, you need 6 comparisons. With 5 groups, you need 10. The general formula is $\binom{k}{2} = \frac{k(k-1)}{2}$ comparisons for $k$ groups.

The problem is inflated Type I error. Each individual t-test has a 5% chance of a false positive (at $\alpha = 0.05$). When you run many tests, the probability that at least one produces a false positive grows rapidly. With 3 independent tests at $\alpha = 0.05$, the overall error rate is approximately:

$$1 - (1 - 0.05)^3 = 1 - 0.8574 = 0.1426$$

That is a 14.3% chance of at least one false positive — nearly triple the intended 5%. With 10 pairwise comparisons (5 groups), the rate climbs to about 40%. ANOVA avoids this entirely by performing a single test at one $\alpha$ level.
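The error-rate arithmetic above can be sketched in a few lines of Python (function names are mine):

```python
# Familywise Type I error for all-pairs t-tests among k groups,
# treating the tests as independent (an approximation).

def num_pairs(k: int) -> int:
    """Number of pairwise comparisons among k groups: C(k, 2) = k(k-1)/2."""
    return k * (k - 1) // 2

def familywise_error(k: int, alpha: float = 0.05) -> float:
    """P(at least one false positive) = 1 - (1 - alpha)^m for m pairs."""
    return 1 - (1 - alpha) ** num_pairs(k)

for k in (3, 4, 5):
    print(k, num_pairs(k), round(familywise_error(k), 4))
# → 3 3 0.1426
#   4 6 0.2649
#   5 10 0.4013
```

The 5-group rate of 0.4013 is the "about 40%" quoted above.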

The Logic of ANOVA

Despite its name, Analysis of Variance is a method for comparing means. The key insight is that you can learn about means by studying variances. ANOVA decomposes the total variation in the data into two components:

  • Between-group variation ($SS_{between}$): how much the group means differ from the overall (grand) mean. If the population means truly differ, this will be large.
  • Within-group variation ($SS_{within}$): how much individual observations vary within their own group. This represents natural random variation that exists regardless of whether the means differ.

If between-group variation is large relative to within-group variation, the group means are probably different. If between-group variation is small relative to within-group variation, the differences among group means could easily be due to random chance.

Hypotheses:

$$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k \quad \text{(all population means are equal)}$$

$$H_a: \text{at least one population mean is different}$$

Note that the alternative hypothesis does not specify which mean differs or by how much — only that they are not all equal.

The F-Statistic

The test statistic for ANOVA is the F-ratio:

$$F = \frac{MS_{between}}{MS_{within}} = \frac{SS_{between} / (k - 1)}{SS_{within} / (N - k)}$$

where $k$ is the number of groups, $N$ is the total number of observations across all groups, $MS_{between}$ is the mean square between groups, and $MS_{within}$ is the mean square within groups.

  • A large F value means between-group variation dominates — evidence against $H_0$
  • An F value near 1 means between-group and within-group variation are similar — consistent with $H_0$
  • F is always non-negative, and the F-distribution is right-skewed, so the test is always right-tailed

The F-distribution has two degrees of freedom: $df_1 = k - 1$ (numerator, between groups) and $df_2 = N - k$ (denominator, within groups).

The ANOVA Table

Results are organized in a standard ANOVA table:

| Source  | SS     | df      | MS             | F             |
|---------|--------|---------|----------------|---------------|
| Between | $SS_B$ | $k - 1$ | $SS_B / (k-1)$ | $MS_B / MS_W$ |
| Within  | $SS_W$ | $N - k$ | $SS_W / (N-k)$ |               |
| Total   | $SS_T$ | $N - 1$ |                |               |

The key relationships: $SS_T = SS_B + SS_W$ and $df_T = df_B + df_W$. These always hold and serve as useful checks on your arithmetic.

Computing Sums of Squares

Here are the formulas for the three sums of squares:

Total sum of squares — measures total variation across all observations:

$$SS_T = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_{..})^2$$

Between-group sum of squares — measures variation among group means:

$$SS_B = \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x}_{..})^2$$

where $\bar{x}_i$ is the mean of group $i$ and $\bar{x}_{..}$ is the grand mean of all observations.

Within-group sum of squares — measures variation within each group:

$$SS_W = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$$

In practice, you can compute any two and find the third using $SS_T = SS_B + SS_W$.
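These formulas translate directly into code. A minimal Python sketch (function names are mine), using the identity $SS_T = SS_B + SS_W$ as a built-in check:

```python
# Sum-of-squares decomposition for one-way ANOVA.
# `groups` is a list of lists of observations, one inner list per group.

def grand_mean(groups):
    all_obs = [x for g in groups for x in g]
    return sum(all_obs) / len(all_obs)

def ss_between(groups):
    gm = grand_mean(groups)
    return sum(len(g) * (sum(g) / len(g) - gm) ** 2 for g in groups)

def ss_within(groups):
    return sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

def ss_total(groups):
    gm = grand_mean(groups)
    return sum((x - gm) ** 2 for g in groups for x in g)

# The identity SS_T = SS_B + SS_W is a handy numerical check:
demo = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [5.0, 5.0, 8.0]]
assert abs(ss_total(demo) - (ss_between(demo) + ss_within(demo))) < 1e-9
```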

Worked Example: Three Teaching Methods

A researcher randomly assigns 15 students to three teaching methods and records their test scores:

  • Method A ($n_1 = 5$): 78, 82, 85, 80, 75
  • Method B ($n_2 = 5$): 88, 92, 85, 90, 95
  • Method C ($n_3 = 5$): 72, 68, 75, 70, 65

Step 1: Compute group means and the grand mean.

$$\bar{x}_A = \frac{78 + 82 + 85 + 80 + 75}{5} = \frac{400}{5} = 80.0$$

$$\bar{x}_B = \frac{88 + 92 + 85 + 90 + 95}{5} = \frac{450}{5} = 90.0$$

$$\bar{x}_C = \frac{72 + 68 + 75 + 70 + 65}{5} = \frac{350}{5} = 70.0$$

$$\bar{x}_{..} = \frac{400 + 450 + 350}{15} = \frac{1200}{15} = 80.0$$

Step 2: Compute $SS_{between}$.

$$SS_B = 5(80 - 80)^2 + 5(90 - 80)^2 + 5(70 - 80)^2 = 5(0) + 5(100) + 5(100) = 0 + 500 + 500 = 1000$$

Step 3: Compute $SS_{within}$ by summing squared deviations within each group.

Method A (deviations from 80):

$$(78-80)^2 + (82-80)^2 + (85-80)^2 + (80-80)^2 + (75-80)^2 = 4 + 4 + 25 + 0 + 25 = 58$$

Method B (deviations from 90):

$$(88-90)^2 + (92-90)^2 + (85-90)^2 + (90-90)^2 + (95-90)^2 = 4 + 4 + 25 + 0 + 25 = 58$$

Method C (deviations from 70):

$$(72-70)^2 + (68-70)^2 + (75-70)^2 + (70-70)^2 + (65-70)^2 = 4 + 4 + 25 + 0 + 25 = 58$$

$$SS_W = 58 + 58 + 58 = 174$$

Step 4: Verify with $SS_T$.

$$SS_T = SS_B + SS_W = 1000 + 174 = 1174$$

Check: $df_T = N - 1 = 15 - 1 = 14$ and $df_B + df_W = 2 + 12 = 14$ ✓.

Step 5: Compute mean squares and the F-statistic.

$$MS_B = \frac{SS_B}{df_B} = \frac{1000}{2} = 500$$

$$MS_W = \frac{SS_W}{df_W} = \frac{174}{12} = 14.5$$

$$F = \frac{MS_B}{MS_W} = \frac{500}{14.5} = 34.48$$

Step 6: Build the completed ANOVA table.

| Source  | SS   | df | MS   | F     |
|---------|------|----|------|-------|
| Between | 1000 | 2  | 500  | 34.48 |
| Within  | 174  | 12 | 14.5 |       |
| Total   | 1174 | 14 |      |       |

Step 7: Compare to the critical value. With $df_1 = 2$ and $df_2 = 12$ at $\alpha = 0.05$, the critical value is $F_{0.05} = 3.885$.

Since $34.48$ far exceeds $3.885$, we reject $H_0$.

Step 8: Conclusion in context. There is overwhelming evidence that at least one teaching method produces different results. Method B has the highest mean (90), Method A is in the middle (80), and Method C has the lowest mean (70). The differences are both statistically significant and educationally meaningful — a 20-point spread between the best and worst methods warrants serious attention from school administrators.
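The whole calculation above can be replayed in a short script (standard library only; variable names are mine):

```python
# One-way ANOVA for the three teaching methods, following Steps 1-5.
a = [78, 82, 85, 80, 75]   # Method A
b = [88, 92, 85, 90, 95]   # Method B
c = [72, 68, 75, 70, 65]   # Method C
groups = [a, b, c]

def mean(g):
    return sum(g) / len(g)

k = len(groups)                             # 3 groups
N = sum(len(g) for g in groups)             # 15 observations
gm = mean([x for g in groups for x in g])   # grand mean: 80.0

ss_b = sum(len(g) * (mean(g) - gm) ** 2 for g in groups)        # 1000.0
ss_w = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)  # 174.0
F = (ss_b / (k - 1)) / (ss_w / (N - k))     # MS_B / MS_W

print(round(F, 2))   # → 34.48
```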

Conditions for ANOVA

The validity of the F-test depends on four conditions:

  • Random samples or random assignment — each group must be a random sample from its population, or subjects must be randomly assigned to groups
  • Independent groups — the groups must be independent of each other (no subject appears in more than one group)
  • Normality — the population distribution within each group should be approximately normal. With small samples (under 30 per group), check for strong skewness or outliers. With larger samples, ANOVA is robust to moderate departures from normality.
  • Equal variances — the population standard deviations should be roughly equal across groups. A common rule of thumb: the largest group standard deviation should be no more than twice the smallest. In this example, each group has $s = \sqrt{58/4} = \sqrt{14.5} \approx 3.81$, so the condition is perfectly satisfied.

Of these conditions, ANOVA is most sensitive to violations of independence. It is moderately robust to non-normality (especially with balanced group sizes) and moderately robust to unequal variances (especially when group sizes are equal).

What ANOVA Does NOT Tell You

A significant F-test tells you that at least one mean differs from the others. It does not tell you:

  • Which specific means differ from each other
  • How many means are different
  • The direction or magnitude of specific differences

To identify which specific pairs of means differ, you need post-hoc tests (also called multiple comparison procedures). The most common is Tukey’s Honest Significant Difference (HSD), which compares every pair of group means while controlling the overall Type I error rate. Other options include the Bonferroni correction, Scheffé’s method, and Dunnett’s test (when comparing all groups to a single control). Post-hoc tests are a topic for a more advanced course, but the key point is: always run ANOVA first, and only perform post-hoc comparisons if the F-test is significant.
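As a rough illustration of the post-hoc idea (a Bonferroni correction rather than Tukey's HSD), here are Bonferroni-adjusted pairwise t-tests on the teaching-methods data; this sketch assumes SciPy is available:

```python
from itertools import combinations
from scipy.stats import ttest_ind

# Teaching-methods scores from the worked example above.
methods = {
    "A": [78, 82, 85, 80, 75],
    "B": [88, 92, 85, 90, 95],
    "C": [72, 68, 75, 70, 65],
}

m = len(methods) * (len(methods) - 1) // 2   # 3 pairwise comparisons
adjusted = {}
for (n1, g1), (n2, g2) in combinations(methods.items(), 2):
    t, p = ttest_ind(g1, g2)                 # pooled two-sample t-test
    adjusted[(n1, n2)] = min(1.0, p * m)     # Bonferroni: inflate p by m

for pair, p_adj in adjusted.items():
    verdict = "significant" if p_adj < 0.05 else "not significant"
    print(pair, round(p_adj, 4), verdict)
```

With this data all three pairs come out significant even after correction, consistent with the very large F-statistic.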

Common F Critical Values Reference Table

| $df_1$, $df_2$ | Critical value, $\alpha = 0.05$ |
|----------------|---------------------------------|
| 2, 10          | 4.103                           |
| 2, 12          | 3.885                           |
| 2, 15          | 3.682                           |
| 2, 20          | 3.493                           |
| 3, 12          | 3.490                           |
| 3, 20          | 3.098                           |
| 4, 20          | 2.866                           |

To use the table: $df_1 = k - 1$ (number of groups minus 1) is the numerator and $df_2 = N - k$ (total observations minus number of groups) is the denominator. Reject $H_0$ if your calculated F exceeds the table value.
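Outside a printed table, the same critical values (and exact p-values) can be pulled from a statistics library; this sketch assumes SciPy is available:

```python
from scipy.stats import f

def f_critical(df1: int, df2: int, alpha: float = 0.05) -> float:
    """Right-tail critical value: the (1 - alpha) quantile of F(df1, df2)."""
    return f.ppf(1 - alpha, df1, df2)

print(round(f_critical(2, 12), 3))   # → 3.885, matching the (2, 12) table row

# The survival function gives the p-value for an observed F-statistic;
# for the teaching-methods example it is far below 0.05:
p_value = f.sf(34.48, 2, 12)
```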

Real-World Application: Nursing — Comparing Pain Levels Across Treatments

A hospital pain management team wants to compare three approaches for post-surgical pain relief: medication only, medication combined with physical therapy, and physical therapy only. They randomly assign 24 patients (8 per group) and measure pain levels on a 0-to-10 scale after 48 hours.

  • Medication only ($n = 8$): mean pain score $\bar{x}_1 = 5.2$, standard deviation $s_1 = 1.8$
  • Medication + therapy ($n = 8$): mean pain score $\bar{x}_2 = 3.4$, standard deviation $s_2 = 1.5$
  • Therapy only ($n = 8$): mean pain score $\bar{x}_3 = 6.1$, standard deviation $s_3 = 2.0$

Grand mean: $\bar{x}_{..} = \frac{8(5.2) + 8(3.4) + 8(6.1)}{24} = \frac{41.6 + 27.2 + 48.8}{24} = \frac{117.6}{24} = 4.9$

$$SS_B = 8(5.2 - 4.9)^2 + 8(3.4 - 4.9)^2 + 8(6.1 - 4.9)^2$$

$$= 8(0.09) + 8(2.25) + 8(1.44) = 0.72 + 18.0 + 11.52 = 30.24$$

For $SS_W$, use the fact that $s_i^2 = SS_i / (n_i - 1)$, so $SS_i = s_i^2 \times (n_i - 1)$:

$$SS_W = 1.8^2(7) + 1.5^2(7) + 2.0^2(7) = 3.24(7) + 2.25(7) + 4.0(7) = 22.68 + 15.75 + 28.0 = 66.43$$

| Source  | SS    | df | MS    | F     |
|---------|-------|----|-------|-------|
| Between | 30.24 | 2  | 15.12 | 4.780 |
| Within  | 66.43 | 21 | 3.163 |       |
| Total   | 96.67 | 23 |       |       |

$F = 15.12 / 3.163 = 4.780$. The critical value for $F(2, 21)$ at $\alpha = 0.05$ is approximately 3.47. Since $4.780 > 3.47$, we reject $H_0$.

There is significant evidence that pain levels differ across the three treatment approaches. The combined medication-and-therapy group reported the lowest pain (mean 3.4), while therapy alone reported the highest (mean 6.1). This suggests the combined approach is most effective, and the nursing team should consider making it the standard protocol. Post-hoc tests would confirm which specific pairs of treatments differ significantly.
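The from-summary-statistics route used here (via $SS_i = s_i^2 (n_i - 1)$) generalizes to any groups reported as $(n, \bar{x}, s)$; a stdlib-only sketch with function and variable names of my own choosing:

```python
# One-way ANOVA from per-group summary statistics (n, mean, sd).
def anova_from_summary(summary):
    """summary: list of (n, mean, sd) tuples, one per group.
    Returns (ss_between, ss_within, f_statistic)."""
    N = sum(n for n, _, _ in summary)
    k = len(summary)
    grand = sum(n * m for n, m, _ in summary) / N
    ss_b = sum(n * (m - grand) ** 2 for n, m, _ in summary)
    ss_w = sum((n - 1) * sd ** 2 for n, _, sd in summary)  # SS_i = s_i^2 (n_i - 1)
    f_stat = (ss_b / (k - 1)) / (ss_w / (N - k))
    return ss_b, ss_w, f_stat

# Pain-management groups from the example: (n, mean, sd).
pain = [(8, 5.2, 1.8), (8, 3.4, 1.5), (8, 6.1, 2.0)]
ss_b, ss_w, f_stat = anova_from_summary(pain)
print(round(ss_b, 2), round(ss_w, 2), round(f_stat, 2))   # → 30.24 66.43 4.78
```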

Practice Problems

Test your understanding with these problems. A worked solution follows each one.

Problem 1: A farmer tests three fertilizers on 12 plots (4 plots each). Crop yields (in bushels): Fertilizer A: 45, 50, 48, 47. Fertilizer B: 55, 58, 52, 55. Fertilizer C: 49, 51, 47, 53. Test at $\alpha = 0.05$.

Group means: $\bar{x}_A = 190/4 = 47.5$, $\bar{x}_B = 220/4 = 55.0$, $\bar{x}_C = 200/4 = 50.0$.

Grand mean: $\bar{x}_{..} = (190 + 220 + 200)/12 = 610/12 = 50.833$.

$$SS_B = 4(47.5 - 50.833)^2 + 4(55 - 50.833)^2 + 4(50 - 50.833)^2$$

$$= 4(11.109) + 4(17.364) + 4(0.694) = 44.436 + 69.456 + 2.776 = 116.668$$

Within-group sums of squares:

  • A: $(45-47.5)^2 + (50-47.5)^2 + (48-47.5)^2 + (47-47.5)^2 = 6.25 + 6.25 + 0.25 + 0.25 = 13.0$
  • B: $(55-55)^2 + (58-55)^2 + (52-55)^2 + (55-55)^2 = 0 + 9 + 9 + 0 = 18.0$
  • C: $(49-50)^2 + (51-50)^2 + (47-50)^2 + (53-50)^2 = 1 + 1 + 9 + 9 = 20.0$

$$SS_W = 13.0 + 18.0 + 20.0 = 51.0$$

$MS_B = 116.668/2 = 58.334$, $MS_W = 51.0/9 = 5.667$.

$F = 58.334/5.667 = 10.29$.

Critical value: $F(2, 9)$ at $\alpha = 0.05$ is approximately $4.256$.

Since $10.29 > 4.256$, reject $H_0$.

Answer: There is significant evidence that the fertilizers produce different yields. Fertilizer B has the highest mean yield (55.0 bushels), followed by C (50.0) and A (47.5).

Problem 2: A company measures employee productivity (units per hour) across three departments. Department 1 ($n = 6$): mean 24.5, $s = 3.2$. Department 2 ($n = 6$): mean 22.0, $s = 2.8$. Department 3 ($n = 6$): mean 23.5, $s = 3.0$. Test at $\alpha = 0.05$.

Grand mean: $\bar{x}_{..} = (6 \times 24.5 + 6 \times 22.0 + 6 \times 23.5)/18 = (147 + 132 + 141)/18 = 420/18 = 23.333$

$$SS_B = 6(24.5-23.333)^2 + 6(22.0-23.333)^2 + 6(23.5-23.333)^2$$

$$= 6(1.361) + 6(1.777) + 6(0.028) = 8.166 + 10.662 + 0.168 = 18.996$$

$$SS_W = 3.2^2(5) + 2.8^2(5) + 3.0^2(5) = 51.2 + 39.2 + 45.0 = 135.4$$

$MS_B = 18.996/2 = 9.498$, $MS_W = 135.4/15 = 9.027$.

$F = 9.498/9.027 = 1.052$.

Critical value: $F(2, 15)$ at $\alpha = 0.05$ is $3.682$.

Since $1.052$ is less than $3.682$, fail to reject $H_0$.

Answer: There is not sufficient evidence that productivity differs across the three departments. The observed differences in means (24.5, 22.0, 23.5) are small relative to the within-group variability.

Problem 3: Four different medications are tested on groups of 10 patients each ($N = 40$). The ANOVA table shows $SS_B = 240$, $SS_W = 720$. Complete the table and test at $\alpha = 0.05$.

| Source  | SS  | df            | MS            | F             |
|---------|-----|---------------|---------------|---------------|
| Between | 240 | $4 - 1 = 3$   | $240/3 = 80$  | $80/20 = 4.0$ |
| Within  | 720 | $40 - 4 = 36$ | $720/36 = 20$ |               |
| Total   | 960 | 39            |               |               |

Check: $240 + 720 = 960$ ✓ and $3 + 36 = 39$ ✓.

Critical value: $F(3, 36)$ at $\alpha = 0.05$ is approximately $2.87$.

Since $4.0 > 2.87$, reject $H_0$.

Answer: There is significant evidence that at least one medication produces a different mean outcome. Post-hoc tests would be needed to determine which specific medications differ.

Problem 4: A researcher obtains $F = 2.15$ with $df_1 = 3$ and $df_2 = 20$ at $\alpha = 0.05$. The critical value is $F_{0.05} = 3.098$. What is the conclusion?

Since $F = 2.15$ is less than the critical value of $3.098$, we fail to reject $H_0$.

Answer: There is not sufficient evidence to conclude that any of the four group means differ. The observed variation between groups is not large enough relative to the variation within groups to rule out random chance.

Problem 5: Three exercise programs are compared. Group 1 ($n = 8$): lost an average of 12.0 lbs with $s = 3.5$ lbs. Group 2 ($n = 8$): lost 8.5 lbs with $s = 4.0$ lbs. Group 3 ($n = 8$): lost 11.0 lbs with $s = 3.0$ lbs. Test whether the programs produce different weight loss at $\alpha = 0.05$.

Grand mean: $\bar{x}_{..} = (8 \times 12.0 + 8 \times 8.5 + 8 \times 11.0)/24 = (96 + 68 + 88)/24 = 252/24 = 10.5$

$$SS_B = 8(12.0 - 10.5)^2 + 8(8.5 - 10.5)^2 + 8(11.0 - 10.5)^2$$

$$= 8(2.25) + 8(4.0) + 8(0.25) = 18.0 + 32.0 + 2.0 = 52.0$$

$$SS_W = 3.5^2(7) + 4.0^2(7) + 3.0^2(7) = 85.75 + 112.0 + 63.0 = 260.75$$

$MS_B = 52.0/2 = 26.0$, $MS_W = 260.75/21 = 12.417$.

$F = 26.0/12.417 = 2.094$.

Critical value: $F(2, 21)$ at $\alpha = 0.05$ is approximately $3.47$.

Since $2.094$ is less than $3.47$, fail to reject $H_0$.

Answer: There is not sufficient evidence that the three exercise programs produce different average weight loss. Although Group 1 lost the most on average (12.0 lbs vs 8.5 and 11.0), the within-group variability is too large relative to the between-group differences to rule out chance.

Key Takeaways

  • ANOVA tests whether the means of three or more groups are all equal, using a single F-test instead of multiple pairwise t-tests
  • Running multiple t-tests inflates the Type I error rate — ANOVA controls it at one $\alpha$ level
  • ANOVA compares between-group variation ($MS_B$) to within-group variation ($MS_W$) using $F = MS_B / MS_W$
  • A large F value indicates the group means differ more than random variation would explain
  • The ANOVA table organizes the calculation: $SS_T = SS_B + SS_W$ and $df_T = df_B + df_W$ always hold
  • Conditions: random samples, independent groups, approximate normality within groups, and roughly equal variances
  • A significant F-test tells you at least one mean differs, but not which ones — use post-hoc tests (like Tukey’s HSD) to identify specific differences
  • In clinical and nursing settings, ANOVA is essential for comparing multiple treatment protocols, dosage levels, or care approaches to determine which produces the best patient outcomes
