Chi-Square Tests
The z-tests and t-tests you have learned so far work well for comparing means and proportions, but what about categorical data? When your variable is not a number but a category (eye color, political party, treatment outcome, product preference), you cannot compute a mean or a standard deviation. Instead, you work with counts: how many observations fall into each category. The chi-square test (pronounced “ky-square” and written $\chi^2$) is the standard tool for testing hypotheses about categorical data. In this lesson you will learn two versions: the goodness-of-fit test for one categorical variable, and the test of independence for two categorical variables.
The Chi-Square Statistic
Both chi-square tests use the same core statistic:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where $O$ is the observed count (what you actually counted in the data) and $E$ is the expected count (what you would expect if the null hypothesis were true). The sum is taken over all categories or cells.
The logic is straightforward:
- For each category, compute how far the observed count is from the expected count
- Square the difference (so negative and positive deviations both contribute positively)
- Divide by the expected count (a difference of 10 matters more when the expected count is 20 than when it is 2000)
- Add up all the terms
A large $\chi^2$ value means the observed data is far from what the null hypothesis predicts, which is evidence against $H_0$. A small $\chi^2$ value means the data fits the expected pattern well, so there is no reason to reject $H_0$. The chi-square statistic is always non-negative (zero or positive), and the test is always right-tailed: you reject $H_0$ only when $\chi^2$ is large enough.
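The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not a library routine: the function name `chi_square_statistic` and the three-category counts are hypothetical and chosen only to show the arithmetic.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories or cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for three categories (not taken from the lesson)
observed = [18, 22, 20]
expected = [20, 20, 20]

stat = chi_square_statistic(observed, expected)
print(round(stat, 2))  # 4/20 + 4/20 + 0/20 = 0.4
```

When the observed counts equal the expected counts exactly, every term is zero and so is the statistic.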
The degrees of freedom depend on which test you are performing, as described below.
Chi-Square Goodness-of-Fit Test
The goodness-of-fit test asks: does a single categorical variable follow a specified distribution? You have observed counts for each category, and you compare them to what you would expect under a hypothesized distribution.
- $H_0$: the distribution of the variable matches the expected proportions
- $H_a$: the distribution does not match the expected proportions
- Degrees of freedom: $df = k - 1$, where $k$ is the number of categories
Example 1: Testing a Die for Fairness
You roll a six-sided die 120 times and record the results:
| Face | 1 | 2 | 3 | 4 | 5 | 6 | Total |
|---|---|---|---|---|---|---|---|
| Observed | 15 | 22 | 25 | 18 | 17 | 23 | 120 |
| Expected | 20 | 20 | 20 | 20 | 20 | 20 | 120 |
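If the die is fair, each face should appear $120 / 6 = 20$ times. Is there evidence the die is unfair? Test at $\alpha = 0.05$.
Step 1: State the hypotheses.
- $H_0$: the die is fair (each face has probability $1/6$)
- $H_a$: the die is not fair (at least one face has a probability other than $1/6$)
Step 2: Calculate the chi-square statistic.
$$\chi^2 = \frac{(15-20)^2}{20} + \frac{(22-20)^2}{20} + \frac{(25-20)^2}{20} + \frac{(18-20)^2}{20} + \frac{(17-20)^2}{20} + \frac{(23-20)^2}{20} = \frac{25 + 4 + 25 + 4 + 9 + 9}{20} = \frac{76}{20} = 3.80$$
Step 3: Find degrees of freedom and the critical value.
$$df = k - 1 = 6 - 1 = 5$$
At $\alpha = 0.05$ with $df = 5$, the critical value is $11.070$.
Step 4: Make a decision. Since $\chi^2 = 3.80$ is less than $11.070$, we fail to reject $H_0$.
Step 5: Conclusion in context. There is no statistically significant evidence that the die is unfair. The observed variation (some faces appearing slightly more or less than 20 times) is well within what random chance would produce with a fair die.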
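The die example can be reproduced with a short Python sketch. It assumes a significance level of 0.05 and hard-codes the corresponding critical values for 1 to 5 degrees of freedom; the helper name `goodness_of_fit` is hypothetical.

```python
def goodness_of_fit(observed, proportions):
    """Return (chi-square statistic, degrees of freedom, expected counts)."""
    n = sum(observed)
    expected = [n * p for p in proportions]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return stat, df, expected


# Observed counts for faces 1 through 6 from the die table
die_counts = [15, 22, 25, 18, 17, 23]
stat, df, expected = goodness_of_fit(die_counts, [1 / 6] * 6)

# Critical values for alpha = 0.05 (assumed level), keyed by df
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}
reject = stat > CRITICAL_05[df]

print(f"chi-square = {stat:.2f}, df = {df}, reject H0: {reject}")
```

Passing the proportions rather than the expected counts keeps the function usable for unequal hypothesized distributions as well.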
Example 2: Customer Preference
A store manager believes customers prefer four product flavors equally. She surveys 200 customers:
| Flavor | Vanilla | Chocolate | Strawberry | Mint | Total |
|---|---|---|---|---|---|
| Observed | 62 | 48 | 55 | 35 | 200 |
| Expected | 50 | 50 | 50 | 50 | 200 |
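$$\chi^2 = \frac{(62-50)^2}{50} + \frac{(48-50)^2}{50} + \frac{(55-50)^2}{50} + \frac{(35-50)^2}{50} = \frac{144 + 4 + 25 + 225}{50} = \frac{398}{50} = 7.96$$
With $\alpha = 0.05$ and $df = 4 - 1 = 3$, the critical value is $7.815$. Since $7.96 > 7.815$, we reject $H_0$. There is evidence that customer preferences are not equally distributed. Mint appears to be the least popular flavor, while vanilla is the most popular.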
Chi-Square Test of Independence
The test of independence asks: are two categorical variables related (associated), or are they independent? The data is organized in a contingency table (also called a two-way table) with rows for one variable and columns for the other.
- : the two variables are independent (no association)
- : the two variables are not independent (there is an association)
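- Expected count for each cell: $E = \dfrac{\text{row total} \times \text{column total}}{\text{grand total}}$
- Degrees of freedom: $df = (r - 1)(c - 1)$, where $r$ is the number of rows and $c$ is the number of columns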
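As a rough sketch, the expected-count rule (row total times column total, divided by the grand total) and the degrees of freedom can be computed from a table stored as a list of rows. The function names and the 2-by-3 table below are hypothetical.

```python
def expected_counts(table):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def degrees_of_freedom(table):
    """(rows - 1) * (columns - 1) for a contingency table."""
    return (len(table) - 1) * (len(table[0]) - 1)


# Hypothetical 2-by-3 table of observed counts
table = [[10, 20, 30],
         [20, 20, 20]]

expected = expected_counts(table)
df = degrees_of_freedom(table)
print(expected)  # both rows total 60, so both rows get 15, 20, 25
print(df)        # (2 - 1) * (3 - 1) = 2
```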
Example 3: Smoking and Lung Disease
A health researcher collects data on 1000 adults:
| | Lung Disease | No Lung Disease | Total |
|---|---|---|---|
| Smoker | 90 | 210 | 300 |
| Non-smoker | 60 | 640 | 700 |
| Total | 150 | 850 | 1000 |
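Is there an association between smoking status and lung disease? Test at $\alpha = 0.05$.
Step 1: State the hypotheses.
- $H_0$: smoking status and lung disease are independent
- $H_a$: smoking status and lung disease are associated
Step 2: Calculate expected counts under independence.
- Smoker, lung disease: $E = \frac{300 \times 150}{1000} = 45$
- Smoker, no lung disease: $E = \frac{300 \times 850}{1000} = 255$
- Non-smoker, lung disease: $E = \frac{700 \times 150}{1000} = 105$
- Non-smoker, no lung disease: $E = \frac{700 \times 850}{1000} = 595$
Verify: $45 + 255 = 300$ ✓, $105 + 595 = 700$ ✓, $45 + 105 = 150$ ✓, $255 + 595 = 850$ ✓.
Step 3: Calculate the chi-square statistic.
$$\chi^2 = \frac{(90-45)^2}{45} + \frac{(210-255)^2}{255} + \frac{(60-105)^2}{105} + \frac{(640-595)^2}{595} = \frac{2025}{45} + \frac{2025}{255} + \frac{2025}{105} + \frac{2025}{595} \approx 45.00 + 7.94 + 19.29 + 3.40 = 75.63$$
Notice that every cell has the same squared difference ($45^2 = 2025$). This always happens in a 2-by-2 table: the expected counts share the observed row and column totals, so each cell deviates from its expected count by the same amount in absolute value.
Step 4: Find degrees of freedom and the critical value.
$$df = (2 - 1)(2 - 1) = 1$$
At $\alpha = 0.05$ with $df = 1$, the critical value is $3.841$.
Step 5: Make a decision. Since $\chi^2 = 75.63$ far exceeds $3.841$, we reject $H_0$.
Step 6: Conclusion in context. There is overwhelming evidence of an association between smoking status and lung disease. Smokers had a lung disease rate of $90/300 = 30\%$, compared to $60/700 \approx 8.6\%$ for non-smokers. While this test does not prove causation, the strong association is consistent with the well-established medical understanding that smoking increases lung disease risk.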
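The smoking example can be checked with a small Python sketch. The function name `independence_test` is hypothetical; the counts come from the table above. The sketch also prints the cell deviations to show that they share the same absolute value in a 2-by-2 table.

```python
def independence_test(table):
    """Return (chi-square statistic, degrees of freedom, expected counts)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, expected


# Rows: smoker, non-smoker. Columns: lung disease, no lung disease.
smoking = [[90, 210],
           [60, 640]]

stat, df, expected = independence_test(smoking)
deviations = [
    [o - e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(smoking, expected)
]

print(expected)         # expected counts under independence
print(deviations)       # each entry is +45 or -45
print(round(stat, 2), df)
```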
Conditions for Chi-Square Tests
Both chi-square tests require the following conditions:
- Random sample or random assignment — the data must come from a properly designed study
- Independence — each observation must be independent of the others; one person’s response cannot influence another’s
- Expected counts are at least 5 — every cell in the table must have an expected count of 5 or more. This ensures the chi-square distribution is a good approximation for the test statistic. If any expected count is below 5, consider combining categories or using Fisher’s exact test (for 2-by-2 tables)
Note that the “at least 5” rule applies to expected counts, not observed counts. An observed count of 0 or 2 is fine as long as the expected count for that cell is at least 5.
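The expected-count condition is easy to check before running a test of independence. The sketch below is one possible check, with a hypothetical function name and a made-up 2-by-2 table; it flags cells by expected count, never by observed count.

```python
def low_expected_cells(table, minimum=5):
    """List (row index, column index, expected count) for cells below the minimum."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    flagged = []
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            e = r * c / n
            if e < minimum:
                flagged.append((i, j, e))
    return flagged


# Hypothetical table: row totals 20 and 80, column totals 10 and 90
small_table = [[3, 17],
               [7, 73]]

small_cells = low_expected_cells(small_table)
print(small_cells)  # the top-left cell has expected count 20 * 10 / 100 = 2
```

An empty list means the condition is met for every cell; a non-empty list suggests combining categories or, for a 2-by-2 table, using Fisher's exact test.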
Chi-Square Critical Values Reference Table
| df | $\alpha = 0.10$ | $\alpha = 0.05$ | $\alpha = 0.01$ |
|---|---|---|---|
| 1 | 2.706 | 3.841 | 6.635 |
| 2 | 4.605 | 5.991 | 9.210 |
| 3 | 6.251 | 7.815 | 11.345 |
| 4 | 7.779 | 9.488 | 13.277 |
| 5 | 9.236 | 11.070 | 15.086 |
| 10 | 15.987 | 18.307 | 23.209 |
To use the table: find your degrees of freedom in the left column and your chosen $\alpha$ across the top. Reject $H_0$ if your calculated $\chi^2$ exceeds the table value.
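The lookup can be mirrored in Python with a nested dictionary. This sketch assumes the three value columns correspond to significance levels 0.10, 0.05, and 0.01 (the standard chi-square critical values for these rows); the names `CRITICAL_VALUES` and `reject_null` are hypothetical.

```python
# Critical values keyed by degrees of freedom, then by significance level
CRITICAL_VALUES = {
    1: {0.10: 2.706, 0.05: 3.841, 0.01: 6.635},
    2: {0.10: 4.605, 0.05: 5.991, 0.01: 9.210},
    3: {0.10: 6.251, 0.05: 7.815, 0.01: 11.345},
    4: {0.10: 7.779, 0.05: 9.488, 0.01: 13.277},
    5: {0.10: 9.236, 0.05: 11.070, 0.01: 15.086},
    10: {0.10: 15.987, 0.05: 18.307, 0.01: 23.209},
}


def reject_null(stat, df, alpha):
    """Right-tailed decision: reject H0 when the statistic exceeds the table value."""
    try:
        critical = CRITICAL_VALUES[df][alpha]
    except KeyError:
        raise ValueError(f"no table entry for df={df}, alpha={alpha}")
    return stat > critical


print(reject_null(7.96, 3, 0.05))  # 7.96 > 7.815
print(reject_null(7.96, 3, 0.01))  # 7.96 < 11.345
```

The same statistic can lead to different decisions at different significance levels, which is why the level should be chosen before looking at the data.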
Real-World Application: Nursing — Treatment Effectiveness Across Treatment Types
A nurse researcher wants to know whether healing rates differ among three wound-care treatments. She classifies 360 patients by treatment outcome (healed vs. not healed) and treatment type:
| | Healed | Not Healed | Total |
|---|---|---|---|
| Treatment A | 85 | 35 | 120 |
| Treatment B | 70 | 50 | 120 |
| Treatment C | 90 | 30 | 120 |
| Total | 245 | 115 | 360 |
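Expected counts (each row total is 120, and the column totals are 245 and 115):
$$E_{\text{healed}} = \frac{120 \times 245}{360} \approx 81.67, \qquad E_{\text{not healed}} = \frac{120 \times 115}{360} \approx 38.33$$
Because all row totals are 120, every row has the same expected counts: 81.67 healed and 38.33 not healed.
$$\chi^2 = \frac{(85-81.67)^2}{81.67} + \frac{(70-81.67)^2}{81.67} + \frac{(90-81.67)^2}{81.67} + \frac{(35-38.33)^2}{38.33} + \frac{(50-38.33)^2}{38.33} + \frac{(30-38.33)^2}{38.33} \approx 8.31$$
With $\alpha = 0.05$ and $df = (3 - 1)(2 - 1) = 2$, the critical value is $5.991$. Since $8.31 > 5.991$, we reject $H_0$. There is significant evidence that healing rates differ among the three treatments. Treatment C has the highest healing rate (75%), Treatment A is close behind (70.8%), and Treatment B is notably lower (58.3%). This information helps the nursing team prioritize Treatment C for patients who need the best chance of healing and investigate why Treatment B underperforms.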
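To see which treatment drives the result, it can help to look at each cell's contribution to the statistic alongside the healing rates. The following Python sketch does this for the wound-care table; the variable names are illustrative only.

```python
# Observed counts from the wound-care table: [healed, not healed]
observed = {
    "Treatment A": [85, 35],
    "Treatment B": [70, 50],
    "Treatment C": [90, 30],
}

rows = list(observed.values())
col_totals = [sum(col) for col in zip(*rows)]
grand_total = sum(col_totals)

# Per-cell contribution (O - E)^2 / E, grouped by treatment
contributions = {}
for name, row in observed.items():
    cells = []
    for o, c_total in zip(row, col_totals):
        e = sum(row) * c_total / grand_total
        cells.append((o - e) ** 2 / e)
    contributions[name] = cells

chi_square = sum(sum(cells) for cells in contributions.values())
healing_rates = {name: row[0] / sum(row) for name, row in observed.items()}

for name, cells in contributions.items():
    rounded = [round(c, 2) for c in cells]
    print(name, rounded, f"healed: {healing_rates[name]:.1%}")
print("chi-square:", round(chi_square, 2))
```

In this table the two Treatment B cells contribute the most to the total, which matches the observation that its healing rate is the one that stands apart.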
Practice Problems
Test your understanding with these problems.
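Problem 1: A candy company claims its bags contain equal proportions of 5 colors. A student counts 200 candies: Red 52, Blue 38, Green 45, Yellow 30, Orange 35. Test whether the distribution matches the claim at $\alpha = 0.05$.
$H_0$: all five colors occur equally (probability $1/5$ each, so $E = 200/5 = 40$ per color). $H_a$: not all equal.
$\chi^2 = \frac{144 + 4 + 25 + 100 + 25}{40} = \frac{298}{40} = 7.45$, with $df = 5 - 1 = 4$. Critical value at $\alpha = 0.05$: $9.488$.
Since $7.45$ is less than $9.488$, fail to reject $H_0$.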
Answer: There is not sufficient evidence to conclude the color distribution differs from equal proportions. The observed variation is consistent with random sampling from equal proportions.
Problem 2: A survey of 500 adults cross-classifies political affiliation (Democrat, Republican, Independent) with opinion on a policy (Favor, Oppose). Results: Dem favor 120, Dem oppose 80; Rep favor 70, Rep oppose 130; Ind favor 55, Ind oppose 45. Test for independence at $\alpha = 0.05$.
| | Favor | Oppose | Total |
|---|---|---|---|
| Democrat | 120 | 80 | 200 |
| Republican | 70 | 130 | 200 |
| Independent | 55 | 45 | 100 |
| Total | 245 | 255 | 500 |
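Expected counts: Democrat $\frac{200 \times 245}{500} = 98$ favor and $102$ oppose; Republican $98$ favor and $102$ oppose; Independent $\frac{100 \times 245}{500} = 49$ favor and $51$ oppose.
$\chi^2 = \frac{484}{98} + \frac{484}{102} + \frac{784}{98} + \frac{784}{102} + \frac{36}{49} + \frac{36}{51} \approx 26.81$, with $df = (3 - 1)(2 - 1) = 2$. Critical value at $\alpha = 0.05$: $5.991$.
Since $26.81 > 5.991$, reject $H_0$.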
Answer: There is strong evidence of an association between political affiliation and policy opinion. Democrats favor the policy at a much higher rate (60%) than Republicans (35%), with Independents in between (55%).
Problem 3: A quality inspector checks 300 items from three production shifts. Shift A: 8 defective out of 100. Shift B: 15 defective out of 100. Shift C: 7 defective out of 100. Is there a significant difference in defect rates across shifts? ($\alpha = 0.05$)
| | Defective | Not Defective | Total |
|---|---|---|---|
| Shift A | 8 | 92 | 100 |
| Shift B | 15 | 85 | 100 |
| Shift C | 7 | 93 | 100 |
| Total | 30 | 270 | 300 |
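Expected counts (all row totals are 100): $\frac{100 \times 30}{300} = 10$ defective and $\frac{100 \times 270}{300} = 90$ not defective for each shift.
$\chi^2 = \frac{4 + 25 + 9}{10} + \frac{4 + 25 + 9}{90} \approx 3.80 + 0.42 = 4.22$, with $df = (3 - 1)(2 - 1) = 2$. Critical value at $\alpha = 0.05$: $5.991$.
Since $4.22$ is less than $5.991$, fail to reject $H_0$.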
Answer: There is not sufficient evidence of a significant difference in defect rates across shifts. Although Shift B has a higher observed defect rate (15%) compared to Shifts A (8%) and C (7%), this variation could plausibly be due to chance.
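Problem 4: A genetics experiment predicts offspring in a 9:3:3:1 phenotype ratio. Out of 160 offspring observed: 84, 35, 26, 15. Test the genetic model at $\alpha = 0.05$.
Expected counts based on the 9:3:3:1 ratio (out of 160):
- Category 1: $160 \times \frac{9}{16} = 90$
- Category 2: $160 \times \frac{3}{16} = 30$
- Category 3: $160 \times \frac{3}{16} = 30$
- Category 4: $160 \times \frac{1}{16} = 10$
Verify: $90 + 30 + 30 + 10 = 160$ ✓.
$\chi^2 = \frac{(84-90)^2}{90} + \frac{(35-30)^2}{30} + \frac{(26-30)^2}{30} + \frac{(15-10)^2}{10} \approx 0.400 + 0.833 + 0.533 + 2.500 = 4.27$, with $df = 4 - 1 = 3$. Critical value at $\alpha = 0.05$: $7.815$.
Since $4.27$ is less than $7.815$, fail to reject $H_0$.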
Answer: The observed data is consistent with the predicted 9:3:3:1 genetic ratio. There is no significant deviation from the expected phenotype distribution.
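Turning a ratio such as 9:3:3:1 into expected counts is a small calculation that is easy to script. The Python sketch below uses a hypothetical helper, `expected_from_ratio`, and the counts from Problem 4.

```python
def expected_from_ratio(ratio, n):
    """Scale ratio parts so that the expected counts sum to n."""
    total_parts = sum(ratio)
    return [n * part / total_parts for part in ratio]


observed = [84, 35, 26, 15]
expected = expected_from_ratio([9, 3, 3, 1], sum(observed))

stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

print(expected)        # 160 split into 16 parts of 10 each
print(round(stat, 2))  # compare with the table value for df = 3
```

Using `sum(observed)` for the sample size guarantees that the expected counts add up to the same total as the observed counts.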
Problem 5: A hospital records patient satisfaction (Satisfied, Neutral, Dissatisfied) for two departments. Department X: 80 satisfied, 30 neutral, 10 dissatisfied. Department Y: 60 satisfied, 40 neutral, 20 dissatisfied. Is there an association between department and satisfaction? ($\alpha = 0.05$)
| | Satisfied | Neutral | Dissatisfied | Total |
|---|---|---|---|---|
| Dept X | 80 | 30 | 10 | 120 |
| Dept Y | 60 | 40 | 20 | 120 |
| Total | 140 | 70 | 30 | 240 |
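Expected counts (both row totals are 120): $\frac{120 \times 140}{240} = 70$ satisfied, $\frac{120 \times 70}{240} = 35$ neutral, and $\frac{120 \times 30}{240} = 15$ dissatisfied for each department.
$\chi^2 = 2\left(\frac{100}{70} + \frac{25}{35} + \frac{25}{15}\right) \approx 7.62$, with $df = (2 - 1)(3 - 1) = 2$. Critical value at $\alpha = 0.05$: $5.991$.
Since $7.62 > 5.991$, reject $H_0$.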
Answer: There is significant evidence of an association between department and patient satisfaction. Department X has a higher proportion of satisfied patients (66.7% vs 50%) and a lower proportion of dissatisfied patients (8.3% vs 16.7%). Hospital administrators should investigate what Department X does differently.
Key Takeaways
- The chi-square statistic measures how far observed counts are from expected counts — it works exclusively with counts, not proportions or means
- The goodness-of-fit test checks whether a single categorical variable follows a specified distribution, with $df = k - 1$ where $k$ is the number of categories
- The test of independence checks whether two categorical variables are associated, with $df = (r - 1)(c - 1)$ and expected counts computed as $E = \dfrac{\text{row total} \times \text{column total}}{\text{grand total}}$
- Both tests are always right-tailed: you only reject $H_0$ when $\chi^2$ is large
- The key condition is that all expected counts must be at least 5
- A significant chi-square test tells you that the variables are associated, but it does not tell you the direction or strength — examine the observed percentages to interpret the nature of the relationship
- Chi-square tests are essential in medical research (treatment outcomes), quality control (defect patterns), survey analysis (opinion by demographic), and genetics (phenotype ratios)
Last updated: March 29, 2026