One-Way ANOVA
The two-sample t-test lets you compare the means of two groups — but what if you have three or more? A pharmaceutical company testing three dosage levels, a school district comparing four teaching methods, a hospital evaluating five pain management protocols — these situations all require comparing multiple group means at once. You might think the solution is to run all possible pairwise t-tests, but that approach creates a serious statistical problem. ANOVA (Analysis of Variance) solves it by testing all groups simultaneously with a single test.
Why Not Just Use Multiple T-Tests?
With 3 groups, you would need 3 pairwise comparisons (A vs B, A vs C, B vs C). With 4 groups, you need 6 comparisons. With 5 groups, you need 10. The general formula is $\binom{k}{2} = \frac{k(k-1)}{2}$ comparisons for $k$ groups.
The problem is inflated Type I error. Each individual t-test has a 5% chance of a false positive (at $\alpha = 0.05$). When you run many tests, the probability that at least one produces a false positive grows rapidly. With 3 independent tests at $\alpha = 0.05$, the overall error rate is approximately:

$$1 - (1 - 0.05)^3 = 1 - 0.857 \approx 0.143$$

That is a 14.3% chance of at least one false positive — nearly triple the intended 5%. With 10 pairwise comparisons (5 groups), the rate climbs to about 40%. ANOVA avoids this entirely by performing a single test at one $\alpha$ level.
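The inflation can be checked with a few lines of Python. This is a quick sketch; `familywise_error_rate` is an illustrative helper written for this example, and it assumes the pairwise tests are independent (real pairwise t-tests share data, so the true rate differs slightly, but the qualitative picture is the same):

```python
from math import comb

def familywise_error_rate(k, alpha=0.05):
    """Probability of at least one false positive across all pairwise
    t-tests among k groups, assuming the tests are independent."""
    m = comb(k, 2)                     # k(k-1)/2 pairwise comparisons
    return 1 - (1 - alpha) ** m

for k in (3, 4, 5):
    print(k, comb(k, 2), round(familywise_error_rate(k), 3))
# 3 groups -> 3 tests -> 0.143; 5 groups -> 10 tests -> 0.401
```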
The Logic of ANOVA
Despite its name, Analysis of Variance is a method for comparing means. The key insight is that you can learn about means by studying variances. ANOVA decomposes the total variation in the data into two components:
- Between-group variation ($SS_B$): how much the group means differ from the overall (grand) mean. If the population means truly differ, this will be large.
- Within-group variation ($SS_W$): how much individual observations vary within their own group. This represents natural random variation that exists regardless of whether the means differ.
If between-group variation is large relative to within-group variation, the group means are probably different. If between-group variation is small relative to within-group variation, the differences among group means could easily be due to random chance.
Hypotheses:

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_k \qquad H_a: \text{not all } \mu_i \text{ are equal}$$
Note that the alternative hypothesis does not specify which mean differs or by how much — only that they are not all equal.
The F-Statistic
The test statistic for ANOVA is the F-ratio:

$$F = \frac{MS_B}{MS_W} = \frac{SS_B/(k-1)}{SS_W/(N-k)}$$

where $k$ is the number of groups, $N$ is the total number of observations across all groups, $MS_B$ is the mean square between groups, and $MS_W$ is the mean square within groups.
- A large F value means between-group variation dominates — evidence against $H_0$
- An F value near 1 means between-group and within-group variation are similar — consistent with $H_0$
- F is always non-negative, and the F-distribution is right-skewed, so the test is always right-tailed
The F-distribution has two degrees of freedom: $df_1 = k - 1$ (numerator, between groups) and $df_2 = N - k$ (denominator, within groups).
The ANOVA Table
Results are organized in a standard ANOVA table:
| Source | SS | df | MS | F |
|---|---|---|---|---|
| Between | $SS_B$ | $k - 1$ | $MS_B = SS_B/(k-1)$ | $F = MS_B/MS_W$ |
| Within | $SS_W$ | $N - k$ | $MS_W = SS_W/(N-k)$ | |
| Total | $SS_T$ | $N - 1$ | | |
The key relationships: $SS_T = SS_B + SS_W$ and $df_T = df_B + df_W$ (that is, $N - 1 = (k - 1) + (N - k)$). These always hold and serve as useful checks on your arithmetic.
Computing Sums of Squares
Here are the formulas for the three sums of squares:
Total sum of squares — measures total variation across all observations:

$$SS_T = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2$$

Between-group sum of squares — measures variation among group means:

$$SS_B = \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2$$

where $\bar{x}_i$ is the mean of group $i$ and $\bar{x}$ is the grand mean of all observations.

Within-group sum of squares — measures variation within each group:

$$SS_W = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$$

In practice, you can compute any two and find the third using $SS_T = SS_B + SS_W$.
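These three formulas translate directly into code. The sketch below is an illustrative helper (`sums_of_squares` is not a standard library function) that computes all three sums and lets you confirm the decomposition identity on any data:

```python
def sums_of_squares(groups):
    """Return (SS_T, SS_B, SS_W) for a list of samples, one list per group."""
    all_obs = [x for g in groups for x in g]
    grand_mean = sum(all_obs) / len(all_obs)
    group_means = [sum(g) / len(g) for g in groups]
    # Between: group size times squared deviation of group mean from grand mean
    ss_b = sum(len(g) * (m - grand_mean) ** 2
               for g, m in zip(groups, group_means))
    # Within: squared deviation of each observation from its own group mean
    ss_w = sum((x - m) ** 2
               for g, m in zip(groups, group_means) for x in g)
    # Total: squared deviation of each observation from the grand mean
    ss_t = sum((x - grand_mean) ** 2 for x in all_obs)
    return ss_t, ss_b, ss_w

ss_t, ss_b, ss_w = sums_of_squares([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(ss_t, ss_b, ss_w)   # SS_T equals SS_B + SS_W
```

Whatever the data, the returned values satisfy $SS_T = SS_B + SS_W$ up to floating-point rounding, which makes the function a handy arithmetic check for hand calculations.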
Worked Example: Three Teaching Methods
A researcher randomly assigns 15 students to three teaching methods and records their test scores:
- Method A ($n_1 = 5$): 78, 82, 85, 80, 75
- Method B ($n_2 = 5$): 88, 92, 85, 90, 95
- Method C ($n_3 = 5$): 72, 68, 75, 70, 65
Step 1: Compute group means and the grand mean.

$$\bar{x}_A = \frac{78+82+85+80+75}{5} = 80 \qquad \bar{x}_B = \frac{88+92+85+90+95}{5} = 90 \qquad \bar{x}_C = \frac{72+68+75+70+65}{5} = 70$$

With equal group sizes, the grand mean is $\bar{x} = (80 + 90 + 70)/3 = 80$.

Step 2: Compute $SS_B$.

$$SS_B = 5(80-80)^2 + 5(90-80)^2 + 5(70-80)^2 = 0 + 500 + 500 = 1000$$

Step 3: Compute $SS_W$ by summing squared deviations within each group.

Method A (deviations from 80): $(-2)^2 + 2^2 + 5^2 + 0^2 + (-5)^2 = 58$

Method B (deviations from 90): $(-2)^2 + 2^2 + (-5)^2 + 0^2 + 5^2 = 58$

Method C (deviations from 70): $2^2 + (-2)^2 + 5^2 + 0^2 + (-5)^2 = 58$

$$SS_W = 58 + 58 + 58 = 174$$

Step 4: Verify with $SS_T$.

Check: $SS_T = 1174$ and $SS_B + SS_W = 1000 + 174 = 1174$ ✓.

Step 5: Compute mean squares and the F-statistic.

$$MS_B = \frac{1000}{2} = 500 \qquad MS_W = \frac{174}{12} = 14.5 \qquad F = \frac{500}{14.5} \approx 34.48$$
Step 6: Build the completed ANOVA table.
| Source | SS | df | MS | F |
|---|---|---|---|---|
| Between | 1000 | 2 | 500 | 34.48 |
| Within | 174 | 12 | 14.5 | |
| Total | 1174 | 14 | | |
Step 7: Compare to the critical value. With $df_1 = 2$ and $df_2 = 12$ at $\alpha = 0.05$, the critical value is $F_{0.05}(2, 12) = 3.885$.
Since $F = 34.48$ far exceeds $3.885$, we reject $H_0$.
Step 8: Conclusion in context. There is overwhelming evidence that at least one teaching method produces different results. Method B has the highest mean (90), Method A is in the middle (80), and Method C has the lowest mean (70). The differences are both statistically significant and educationally meaningful — a 20-point spread between the best and worst methods warrants serious attention from school administrators.
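A hand calculation like this is worth double-checking in software. Assuming SciPy is available, `scipy.stats.f_oneway` runs the entire one-way ANOVA in a single call:

```python
from scipy import stats

method_a = [78, 82, 85, 80, 75]
method_b = [88, 92, 85, 90, 95]
method_c = [72, 68, 75, 70, 65]

f_stat, p_value = stats.f_oneway(method_a, method_b, method_c)
print(round(f_stat, 2))   # 34.48, matching the hand computation
print(p_value < 0.05)     # True: reject the null hypothesis
```

The function returns both the F-statistic and the exact right-tail p-value, so no critical-value table lookup is needed.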
Conditions for ANOVA
The validity of the F-test depends on four conditions:
- Random samples or random assignment — each group must be a random sample from its population, or subjects must be randomly assigned to groups
- Independent groups — the groups must be independent of each other (no subject appears in more than one group)
- Normality — the population distribution within each group should be approximately normal. With small samples (under 30 per group), check for strong skewness or outliers. With larger samples, ANOVA is robust to moderate departures from normality.
- Equal variances — the population standard deviations should be roughly equal across groups. A common rule of thumb: the largest group standard deviation should be no more than twice the smallest. In this example, every group has the same within-group sum of squares (58), so each group standard deviation is $s = \sqrt{58/4} \approx 3.81$ and the condition is perfectly satisfied.
Of these conditions, ANOVA is most sensitive to violations of independence. It is moderately robust to non-normality (especially with balanced group sizes) and moderately robust to unequal variances (especially when group sizes are equal).
What ANOVA Does NOT Tell You
A significant F-test tells you that at least one mean differs from the others. It does not tell you:
- Which specific means differ from each other
- How many means are different
- The direction or magnitude of specific differences
To identify which specific pairs of means differ, you need post-hoc tests (also called multiple comparison procedures). The most common is Tukey’s Honest Significant Difference (HSD), which compares every pair of group means while controlling the overall Type I error rate. Other options include the Bonferroni correction, Scheffé’s method, and Dunnett’s test (when comparing all groups to a single control). Post-hoc tests are a topic for a more advanced course, but the key point is: always run ANOVA first, and only perform post-hoc comparisons if the F-test is significant.
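SciPy implements Tukey's HSD directly as `scipy.stats.tukey_hsd` (available in SciPy 1.8 and later). A quick sketch applying it to the teaching-methods data from the worked example:

```python
from scipy import stats

method_a = [78, 82, 85, 80, 75]
method_b = [88, 92, 85, 90, 95]
method_c = [72, 68, 75, 70, 65]

# All pairwise comparisons, with the family-wise error rate controlled
res = stats.tukey_hsd(method_a, method_b, method_c)
print(res)   # table of pairwise differences, p-values, and intervals
```

For this data every pairwise difference (A vs B, A vs C, B vs C) comes out significant at the 5% level, which is consistent with the 10- and 20-point gaps between group means relative to $MS_W = 14.5$.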
Common F Critical Values Reference Table
| $df_1$, $df_2$ | $F_{0.05}$ critical value |
|---|---|
| 2, 10 | 4.103 |
| 2, 12 | 3.885 |
| 2, 15 | 3.682 |
| 2, 20 | 3.493 |
| 3, 12 | 3.490 |
| 3, 20 | 3.098 |
| 4, 20 | 2.866 |
To use the table: $df_1 = k - 1$ (number of groups minus 1) is the numerator and $df_2 = N - k$ (total observations minus number of groups) is the denominator. Reject $H_0$ if your calculated F exceeds the table value.
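When the exact $df$ pair is not in the table, the critical value can be computed from the F-distribution's inverse CDF. Assuming SciPy is available, `scipy.stats.f.ppf` does this (the `f_critical` wrapper is just an illustrative convenience):

```python
from scipy.stats import f

def f_critical(df1, df2, alpha=0.05):
    """Right-tail critical value of the F-distribution."""
    return f.ppf(1 - alpha, df1, df2)

print(round(f_critical(2, 12), 3))   # 3.885, matching the table row
print(round(f_critical(3, 12), 3))   # about 3.49, matching 3.490
```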
Real-World Application: Nursing — Comparing Pain Levels Across Treatments
A hospital pain management team wants to compare three approaches for post-surgical pain relief: medication only, medication combined with physical therapy, and physical therapy only. They randomly assign 24 patients (8 per group) and measure pain levels on a 0-to-10 scale after 48 hours.
- Medication only ($n_1 = 8$): mean pain score $\bar{x}_1$, standard deviation $s_1$
- Medication + therapy ($n_2 = 8$): mean pain score $\bar{x}_2 = 3.4$, standard deviation $s_2$
- Therapy only ($n_3 = 8$): mean pain score $\bar{x}_3 = 6.1$, standard deviation $s_3$

Grand mean: with equal group sizes, $\bar{x} = (\bar{x}_1 + \bar{x}_2 + \bar{x}_3)/3$.

Summing $n_i(\bar{x}_i - \bar{x})^2$ across the three groups gives $SS_B = 30.24$. For $SS_W$, use the fact that each group contributes $(n_i - 1)s_i^2$, so $SS_W = 7(s_1^2 + s_2^2 + s_3^2) = 66.43$:
| Source | SS | df | MS | F |
|---|---|---|---|---|
| Between | 30.24 | 2 | 15.12 | 4.780 |
| Within | 66.43 | 21 | 3.163 | |
| Total | 96.67 | 23 | | |
$F = 15.12/3.163 \approx 4.78$. The critical value for $df_1 = 2$, $df_2 = 21$ at $\alpha = 0.05$ is approximately 3.47. Since $4.78 > 3.47$, we reject $H_0$.
There is significant evidence that pain levels differ across the three treatment approaches. The combined medication-and-therapy group reported the lowest pain (mean 3.4), while therapy alone reported the highest (mean 6.1). This suggests the combined approach is most effective, and the nursing team should consider making it the standard protocol. Post-hoc tests would confirm which specific pairs of treatments differ significantly.
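Summary-statistic calculations like this one can be packaged into a reusable helper. The sketch below (`anova_from_summary` is an illustrative name, not a library function) computes the F-statistic from group sizes, means, and standard deviations alone, and is checked here against the earlier teaching-methods example, where each group had within-group sum of squares 58 and hence $s^2 = 58/4 = 14.5$:

```python
def anova_from_summary(ns, means, sds):
    """One-way ANOVA F-statistic from per-group sample sizes, means,
    and standard deviations -- no raw data required."""
    k = len(ns)
    n_total = sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / n_total
    ss_b = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    ss_w = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))
    return (ss_b / (k - 1)) / (ss_w / (n_total - k))

# Teaching-methods check: n = 5 per group, means 80/90/70, s = sqrt(14.5)
f = anova_from_summary([5, 5, 5], [80, 90, 70], [14.5 ** 0.5] * 3)
print(round(f, 2))   # 34.48
```

This is handy when, as in published studies, only summary statistics are reported.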
Practice Problems
Test your understanding with these problems. A worked solution follows each one.
Problem 1: A farmer tests three fertilizers on 12 plots (4 plots each). Crop yields (in bushels): Fertilizer A: 45, 50, 48, 47. Fertilizer B: 55, 58, 52, 55. Fertilizer C: 49, 51, 47, 53. Test at $\alpha = 0.05$.
Group means: $\bar{x}_A = 190/4 = 47.5$, $\bar{x}_B = 220/4 = 55.0$, $\bar{x}_C = 200/4 = 50.0$.
Grand mean: $\bar{x} = 610/12 \approx 50.83$.
Within-group sums of squares:
- A: $(-2.5)^2 + 2.5^2 + 0.5^2 + (-0.5)^2 = 13$
- B: $0^2 + 3^2 + (-3)^2 + 0^2 = 18$
- C: $(-1)^2 + 1^2 + (-3)^2 + 3^2 = 20$

$SS_W = 13 + 18 + 20 = 51$, and $SS_B = 4(47.5 - 50.83)^2 + 4(55.0 - 50.83)^2 + 4(50.0 - 50.83)^2 \approx 116.67$.
$F = \frac{116.67/2}{51/9} = \frac{58.33}{5.67} \approx 10.29$.
Critical value: $F_{0.05}(2, 9) = 4.256$ at $\alpha = 0.05$.
Since $10.29 > 4.256$, reject $H_0$.
Answer: There is significant evidence that the fertilizers produce different yields. Fertilizer B has the highest mean yield (55.0 bushels), followed by C (50.0) and A (47.5).
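The hand solution above can be verified with SciPy's one-way ANOVA function:

```python
from scipy import stats

fert_a = [45, 50, 48, 47]
fert_b = [55, 58, 52, 55]
fert_c = [49, 51, 47, 53]

f_stat, p_value = stats.f_oneway(fert_a, fert_b, fert_c)
print(round(f_stat, 2))   # 10.29, matching the hand computation
print(p_value < 0.05)     # True: reject the null hypothesis
```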
Problem 2: A company measures employee productivity (units per hour) across three departments of equal size. Department 1: mean 24.5, standard deviation $s_1$. Department 2: mean 22.0, standard deviation $s_2$. Department 3: mean 23.5, standard deviation $s_3$. Test at $\alpha = 0.05$.
Grand mean (equal group sizes): $\bar{x} = (24.5 + 22.0 + 23.5)/3 \approx 23.33$.
$SS_B = \sum n_i(\bar{x}_i - \bar{x})^2$, so $MS_B = SS_B/2$; $SS_W = \sum (n_i - 1)s_i^2$ from the given standard deviations, so $MS_W = SS_W/(N - 3)$.
$F = MS_B / MS_W$.
Compare to the critical value $F_{0.05}(2, N - 3)$.
Since the calculated F is less than the critical value, fail to reject $H_0$.
Answer: There is not sufficient evidence that productivity differs across the three departments. The observed differences in means (24.5, 22.0, 23.5) are small relative to the within-group variability.
Problem 3: Four different medications are tested on groups of 10 patients each ($k = 4$, $N = 40$). The ANOVA table shows $SS_B = 240$ and $SS_W = 720$. Complete the table and test at $\alpha = 0.05$.
| Source | SS | df | MS | F |
|---|---|---|---|---|
| Between | 240 | 3 | 80 | 4.00 |
| Within | 720 | 36 | 20 | |
| Total | 960 | 39 | | |

Check: $240 + 720 = 960$ ✓ and $3 + 36 = 39$ ✓.
Critical value: $F_{0.05}(3, 36) \approx 2.87$ at $\alpha = 0.05$.
Since $4.00 > 2.87$, reject $H_0$.
Answer: There is significant evidence that at least one medication produces a different mean outcome. Post-hoc tests would be needed to determine which specific medications differ.
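Completing an ANOVA table from $SS_B$, $SS_W$, $k$, and $N$ is pure arithmetic, and can be sketched as a small helper (`complete_anova_table` is an illustrative name written for this example):

```python
def complete_anova_table(ss_b, ss_w, k, n_total):
    """Fill in the remaining ANOVA table entries from SS_B, SS_W, k, and N."""
    df_b, df_w = k - 1, n_total - k
    ms_b, ms_w = ss_b / df_b, ss_w / df_w
    return {"SS_T": ss_b + ss_w, "df_B": df_b, "df_W": df_w,
            "MS_B": ms_b, "MS_W": ms_w, "F": ms_b / ms_w}

print(complete_anova_table(240, 720, 4, 40))
# F = (240/3) / (720/36) = 80 / 20 = 4.0
```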
Problem 4: A researcher runs a one-way ANOVA comparing four groups and obtains an F-statistic that is smaller than the critical value $F_{0.05}(df_1, df_2)$. What is the conclusion?
Since the calculated F is less than the critical value, we fail to reject $H_0$.
Answer: There is not sufficient evidence to conclude that any of the four group means differ. The observed variation between groups is not large enough relative to the variation within groups to rule out random chance.
Problem 5: Three exercise programs of equal size are compared. Group 1: lost an average of 12.0 lbs with standard deviation $s_1$ lbs. Group 2: lost 8.5 lbs with standard deviation $s_2$ lbs. Group 3: lost 11.0 lbs with standard deviation $s_3$ lbs. Test whether the programs produce different weight loss at $\alpha = 0.05$.
Grand mean (equal group sizes): $\bar{x} = (12.0 + 8.5 + 11.0)/3 = 10.5$.
$SS_B = n\left[(12.0 - 10.5)^2 + (8.5 - 10.5)^2 + (11.0 - 10.5)^2\right] = 6.5n$ for common group size $n$; $SS_W = \sum (n_i - 1)s_i^2$ from the given standard deviations.
$F = \frac{SS_B/2}{SS_W/(N - 3)}$.
Compare to the critical value $F_{0.05}(2, N - 3)$.
Since the calculated F is less than the critical value, fail to reject $H_0$.
Answer: There is not sufficient evidence that the three exercise programs produce different average weight loss. Although Group 1 lost the most on average (12.0 lbs vs 8.5 and 11.0), the within-group variability is too large relative to the between-group differences to rule out chance.
Key Takeaways
- ANOVA tests whether the means of three or more groups are all equal, using a single F-test instead of multiple pairwise t-tests
- Running multiple t-tests inflates the Type I error rate — ANOVA controls it at a single $\alpha$ level
- ANOVA compares between-group variation ($MS_B$) to within-group variation ($MS_W$) using $F = MS_B / MS_W$
- A large F value indicates the group means differ more than random variation would explain
- The ANOVA table organizes the calculation: $SS_T = SS_B + SS_W$ and $df_T = df_B + df_W$ always hold
- Conditions: random samples, independent groups, approximate normality within groups, and roughly equal variances
- A significant F-test tells you at least one mean differs, but not which ones — use post-hoc tests (like Tukey’s HSD) to identify specific differences
- In clinical and nursing settings, ANOVA is essential for comparing multiple treatment protocols, dosage levels, or care approaches to determine which produces the best patient outcomes
Last updated: March 29, 2026