Two-Way Tables and Probability

Last updated: March 2026 · Intermediate

Before you start

You should be comfortable with:

Conditional Probability

Real-world applications

Nursing

Medication dosages, IV drip rates, vital-sign monitoring

Retail & Finance

Discounts, tax, tips, profit margins

A two-way table (also called a contingency table) organizes data by two categorical variables. It is one of the most practical tools for calculating real-world probabilities because it displays all the information you need — joint counts, row totals, column totals, and the grand total — in a single grid.

Anatomy of a Two-Way Table

Every two-way table has the same structure:

Rows represent the categories of one variable
Columns represent the categories of the other variable
Cells (the interior values) show how many observations fall into each combination
Row totals (rightmost column) sum across each row
Column totals (bottom row) sum down each column
Grand total (bottom-right corner) is the total number of observations

	Category X₁	Category X₂	Row Total
Category Y₁	cell count	cell count	row sum
Category Y₂	cell count	cell count	row sum
Column Total	col sum	col sum	Grand Total

Dividing one of these numbers by another produces a probability, and the choice of numerator and denominator determines which type of probability you get, as described next.
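As a minimal sketch of this anatomy, the Python below stores only the interior cells, as a dictionary of row dictionaries (a layout assumed here for illustration, not a standard format), and derives the row totals, column totals, and grand total from them. The counts are the completed store table from Example 3.

```python
# Interior cells only: {row category: {column category: count}}.
# This dict-of-dicts layout is an illustrative assumption.
store = {
    "Cash":   {"Return": 36, "No Return": 204},
    "Credit": {"Return": 48, "No Return": 112},
}


def margins(cells: dict[str, dict[str, int]]) -> tuple[dict[str, int], dict[str, int], int]:
    """Return (row_totals, column_totals, grand_total) for a two-way table."""
    # Row totals: sum across each row.
    row_totals = {row: sum(counts.values()) for row, counts in cells.items()}
    # Column totals: sum down each column.
    column_totals: dict[str, int] = {}
    for counts in cells.values():
        for column, n in counts.items():
            column_totals[column] = column_totals.get(column, 0) + n
    # Grand total: every observation counted exactly once.
    grand_total = sum(row_totals.values())
    return row_totals, column_totals, grand_total


row_totals, column_totals, grand_total = margins(store)
print(row_totals)     # {'Cash': 240, 'Credit': 160}
print(column_totals)  # {'Return': 84, 'No Return': 316}
print(grand_total)    # 400
```

Computing the margins from the cells, rather than typing them in, guarantees that the row totals and column totals both add up to the same grand total.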

Joint, Marginal, and Conditional Probabilities

Three types of probability can be read from a two-way table:

Joint probability — the probability that two specific categories occur together:

$P(A \text{ and } B) = \frac{\text{cell count}}{\text{grand total}}$

Marginal probability — the probability of a single category, ignoring the other variable:

$P(A) = \frac{\text{row or column total}}{\text{grand total}}$

The name “marginal” comes from the fact that these totals appear in the margins (edges) of the table.

Conditional probability — the probability of one category given that another is known:

$P(A \mid B) = \frac{\text{cell count}}{\text{row or column total for the given condition}}$
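The three formulas translate directly into code. This is a minimal sketch: the function names are hypothetical, and `fractions.Fraction` from the Python standard library is used so the ratios stay exact instead of being rounded.

```python
from fractions import Fraction


def joint(cell: int, grand_total: int) -> Fraction:
    """P(A and B): cell count divided by the grand total."""
    return Fraction(cell, grand_total)


def marginal(margin_total: int, grand_total: int) -> Fraction:
    """P(A): row or column total divided by the grand total."""
    return Fraction(margin_total, grand_total)


def conditional(cell: int, condition_total: int) -> Fraction:
    """P(A | B): cell count divided by the total of the row or column for B."""
    return Fraction(cell, condition_total)


# Freshman/Online figures from Example 1: 80 students in the cell,
# 200 freshmen, 650 students overall.
print(joint(80, 650))        # 8/65, about 0.123
print(marginal(200, 650))    # 4/13, about 0.308
print(conditional(80, 200))  # 2/5, i.e. 0.40
```

Only the denominator separates the joint and conditional cases: the grand total versus the total for the given condition.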

Worked Example: Student Survey

Example 1

A university surveyed 650 students about their preferred class format. The results are organized by class year:

	Prefers Online	Prefers In-Person	Total
Freshman	80	120	200
Sophomore	110	90	200
Junior	95	55	150
Senior	65	35	100
Total	350	300	650

Verification: Row totals: $200 + 200 + 150 + 100 = 650$. Column totals: $350 + 300 = 650$. Both match the grand total.
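The same verification can be automated. A sketch, assuming the table is entered as a dictionary of row dictionaries (a hypothetical layout) alongside the totals printed in the table:

```python
# Hypothetical layout: {row category: {column category: count}}.
survey = {
    "Freshman":  {"Online": 80,  "In-Person": 120},
    "Sophomore": {"Online": 110, "In-Person": 90},
    "Junior":    {"Online": 95,  "In-Person": 55},
    "Senior":    {"Online": 65,  "In-Person": 35},
}
# Totals as printed in the table.
reported_row_totals = {"Freshman": 200, "Sophomore": 200, "Junior": 150, "Senior": 100}
reported_column_totals = {"Online": 350, "In-Person": 300}
reported_grand_total = 650


def table_is_consistent(cells, row_totals, column_totals, grand_total) -> bool:
    """Check each reported margin against the cells, and both margins against the grand total."""
    rows_ok = all(sum(cells[r].values()) == t for r, t in row_totals.items())
    cols_ok = all(sum(cells[r][c] for r in cells) == t for c, t in column_totals.items())
    grand_ok = sum(row_totals.values()) == grand_total == sum(column_totals.values())
    return rows_ok and cols_ok and grand_ok


print(table_is_consistent(survey, reported_row_totals,
                          reported_column_totals, reported_grand_total))  # True
```

A single mistyped count or total makes the function return `False`, which is a cheap guard before computing any probabilities.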

Let’s calculate each type of probability.

Joint probability: What is the probability that a randomly selected student is a freshman who prefers online classes?

$P(\text{Freshman and Online}) = \frac{80}{650} \approx 0.123 = 12.3\%$

Marginal probabilities: What is the overall probability of being a freshman? Of preferring online?

$P(\text{Freshman}) = \frac{200}{650} \approx 0.308 = 30.8\%$

$P(\text{Online}) = \frac{350}{650} \approx 0.538 = 53.8\%$

Conditional probability: Given that a student is a freshman, what is the probability they prefer online classes?

Restrict to the Freshman row (200 students) and look at the Online cell (80):

$P(\text{Online} \mid \text{Freshman}) = \frac{80}{200} = 0.40 = 40\%$

Reversed conditional: Given that a student prefers online classes, what is the probability they are a freshman?

Restrict to the Online column (350 students) and look at the Freshman cell (80):

$P(\text{Freshman} \mid \text{Online}) = \frac{80}{350} \approx 0.229 = 22.9\%$

Notice that 40% of freshmen prefer online, but only 22.9% of online-preferring students are freshmen. These are different questions, so always check which condition goes after the “|” symbol.
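All five Example 1 results can be reproduced in a few lines. This sketch assumes the same hypothetical dictionary layout for the survey and uses `Fraction` for exact ratios; the variable names are illustrative.

```python
from fractions import Fraction

survey = {
    "Freshman":  {"Online": 80,  "In-Person": 120},
    "Sophomore": {"Online": 110, "In-Person": 90},
    "Junior":    {"Online": 95,  "In-Person": 55},
    "Senior":    {"Online": 65,  "In-Person": 35},
}

# Margins and grand total, computed from the cells.
row_totals = {year: sum(row.values()) for year, row in survey.items()}
col_totals = {fmt: sum(row[fmt] for row in survey.values()) for fmt in ("Online", "In-Person")}
grand_total = sum(row_totals.values())

cell = survey["Freshman"]["Online"]                           # 80
p_joint = Fraction(cell, grand_total)                         # P(Freshman and Online)
p_freshman = Fraction(row_totals["Freshman"], grand_total)    # P(Freshman)
p_online = Fraction(col_totals["Online"], grand_total)        # P(Online)
p_online_given_fr = Fraction(cell, row_totals["Freshman"])    # P(Online | Freshman): restrict to the row
p_fr_given_online = Fraction(cell, col_totals["Online"])      # P(Freshman | Online): restrict to the column

for label, p in [
    ("P(Freshman and Online)", p_joint),
    ("P(Freshman)", p_freshman),
    ("P(Online)", p_online),
    ("P(Online | Freshman)", p_online_given_fr),
    ("P(Freshman | Online)", p_fr_given_online),
]:
    print(f"{label:<24} {float(p):.3f}")
```

Running it prints 0.123, 0.308, 0.538, 0.400, and 0.229. The last two share the numerator 80 but divide by different totals, which is exactly why they answer different questions.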

Testing for Independence Using Two-Way Tables

Two variables are independent if knowing one does not change the probability of the other. The test is straightforward:

$\text{If } P(A \mid B) = P(A), \text{ the variables are independent.}$

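Equivalently, you can check whether $P(A \text{ and } B) = P(A) \times P(B)$. For the variables to be independent, the condition must hold for every combination of categories in the table; a single combination where it fails is enough to show that they are not independent.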
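A sketch of the product-rule check applied to every cell of a table. It assumes the same illustrative dictionary-of-rows layout and exact `Fraction` arithmetic, so "equal" means exactly equal, as in the definition; the function name is hypothetical.

```python
from fractions import Fraction


def is_independent(cells: dict[str, dict[str, int]]) -> bool:
    """True if P(A and B) == P(A) * P(B) for every cell of the table."""
    grand_total = sum(sum(row.values()) for row in cells.values())
    columns = list(next(iter(cells.values())))
    col_totals = {c: sum(row[c] for row in cells.values()) for c in columns}
    for row in cells.values():
        p_row = Fraction(sum(row.values()), grand_total)     # P(A)
        for c in columns:
            p_joint = Fraction(row[c], grand_total)          # P(A and B)
            p_col = Fraction(col_totals[c], grand_total)     # P(B)
            if p_joint != p_row * p_col:
                return False  # one failing cell is enough to show dependence
    return True


# Store data from Example 3: return rates differ by payment method.
store = {
    "Cash":   {"Return": 36, "No Return": 204},
    "Credit": {"Return": 48, "No Return": 112},
}
print(is_independent(store))  # False
```

For the Cash/Return cell, $P(\text{Cash and Return}) = 0.09$ while $P(\text{Cash}) \times P(\text{Return}) = 0.6 \times 0.21 = 0.126$, so the check fails at the first cell.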

Example 2: Are Class Year and Format Preference Independent?

Using the student survey data, check whether class year and format preference are independent.

Overall probability of preferring online:

$P(\text{Online}) = \frac{350}{650} \approx 0.538$

Probability of preferring online for each class year:

$P(\text{Online} \mid \text{Freshman}) = \frac{80}{200} = 0.400$

$P(\text{Online} \mid \text{Sophomore}) = \frac{110}{200} = 0.550$

$P(\text{Online} \mid \text{Junior}) = \frac{95}{150} \approx 0.633$

$P(\text{Online} \mid \text{Senior}) = \frac{65}{100} = 0.650$

If class year and format preference were independent, all of these conditional probabilities would equal the marginal probability of 0.538. Instead, they range from 0.400 (freshmen) to 0.650 (seniors).

Conclusion: The variables are not independent. Online preference increases with class year — freshmen are the least likely to prefer online (40%), while seniors are the most likely (65%).
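The comparison in Example 2 can be sketched in Python as follows, again assuming the illustrative dictionary layout for the survey. Each class year's conditional probability is computed by restricting to that row and then compared with the marginal $P(\text{Online})$.

```python
from fractions import Fraction

survey = {
    "Freshman":  {"Online": 80,  "In-Person": 120},
    "Sophomore": {"Online": 110, "In-Person": 90},
    "Junior":    {"Online": 95,  "In-Person": 55},
    "Senior":    {"Online": 65,  "In-Person": 35},
}

grand_total = sum(sum(row.values()) for row in survey.values())
p_online = Fraction(sum(row["Online"] for row in survey.values()), grand_total)

# P(Online | year) for each class year: restrict to that year's row.
p_online_given = {year: Fraction(row["Online"], sum(row.values())) for year, row in survey.items()}

for year, p in p_online_given.items():
    print(f"{year:<10} {float(p):.3f}  (marginal {float(p_online):.3f})")

independent = all(p == p_online for p in p_online_given.values())
print("independent" if independent else "not independent")  # not independent
```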

Relative Frequency Tables

A relative frequency table converts all counts to proportions by dividing every cell by the grand total. This makes it easy to read joint probabilities directly from the table.

Starting with the student survey:

	Prefers Online	Prefers In-Person	Total
Freshman	80/650 ≈ 0.123	120/650 ≈ 0.185	0.308
Sophomore	110/650 ≈ 0.169	90/650 ≈ 0.138	0.308
Junior	95/650 ≈ 0.146	55/650 ≈ 0.085	0.231
Senior	65/650 ≈ 0.100	35/650 ≈ 0.054	0.154
Total	0.538	0.462	1.000

Now every cell is a joint probability, every margin is a marginal probability, and the grand total is 1.000. You can also create row-relative tables (each row sums to 1) to compare conditional probabilities across groups, or column-relative tables (each column sums to 1) to compare the composition within each column.
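A minimal sketch of the overall relative frequency table, dividing every cell by the grand total. It uses plain floats because the table is for display, and the dictionary layout is an illustrative assumption. (If you work with raw records in pandas, `pandas.crosstab` with `normalize="all"` produces the same kind of table; this sketch sticks to the standard library.)

```python
survey = {
    "Freshman":  {"Online": 80,  "In-Person": 120},
    "Sophomore": {"Online": 110, "In-Person": 90},
    "Junior":    {"Online": 95,  "In-Person": 55},
    "Senior":    {"Online": 65,  "In-Person": 35},
}

grand_total = sum(sum(row.values()) for row in survey.values())

# Divide every cell by the grand total: each entry becomes a joint probability.
relative = {
    year: {fmt: count / grand_total for fmt, count in row.items()}
    for year, row in survey.items()
}

for year, row in relative.items():
    row_margin = sum(row.values())  # marginal P(year)
    cells = "  ".join(f"{fmt}={p:.3f}" for fmt, p in row.items())
    print(f"{year:<10} {cells}  total={row_margin:.3f}")

# All joint probabilities together sum to 1 (up to floating-point rounding).
total = sum(p for row in relative.values() for p in row.values())
print(f"{total:.3f}")  # 1.000
```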

Row-relative table (each row divided by its row total):

	Prefers Online	Prefers In-Person	Total
Freshman	80/200 = 0.400	120/200 = 0.600	1.000
Sophomore	110/200 = 0.550	90/200 = 0.450	1.000
Junior	95/150 ≈ 0.633	55/150 ≈ 0.367	1.000
Senior	65/100 = 0.650	35/100 = 0.350	1.000

This table makes the trend immediately visible: online preference grows steadily from 40% among freshmen to 65% among seniors.
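The row-relative version divides each cell by its own row total instead of the grand total, so every entry is a conditional probability $P(\text{format} \mid \text{year})$. A sketch under the same assumed layout (in pandas, `pandas.crosstab` with `normalize="index"` normalizes each row the same way):

```python
survey = {
    "Freshman":  {"Online": 80,  "In-Person": 120},
    "Sophomore": {"Online": 110, "In-Person": 90},
    "Junior":    {"Online": 95,  "In-Person": 55},
    "Senior":    {"Online": 65,  "In-Person": 35},
}

# Divide each cell by its own row total: entries become P(format | year).
row_relative = {
    year: {fmt: count / sum(row.values()) for fmt, count in row.items()}
    for year, row in survey.items()
}

for year, row in row_relative.items():
    print(f"{year:<10} online={row['Online']:.3f}  in-person={row['In-Person']:.3f}")
```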

Building a Two-Way Table from Raw Data

Sometimes you need to construct the table yourself from a description. Here is the process:

Example 3

A store tracks 400 customer transactions. Of the 240 cash transactions, 36 involved a return. Of the 160 credit card transactions, 48 involved a return. Build the two-way table and find the probability of a return given credit card payment.

Step 1: Set up the rows and columns.

	Return	No Return	Total
Cash	36	?	240
Credit	48	?	160
Total	?	?	400

Step 2: Fill in the missing cells by subtraction.

Cash, No Return: $240 - 36 = 204$
Credit, No Return: $160 - 48 = 112$
Total Returns: $36 + 48 = 84$
Total No Returns: $204 + 112 = 316$

	Return	No Return	Total
Cash	36	204	240
Credit	48	112	160
Total	84	316	400

Verification: $84 + 316 = 400$ and $240 + 160 = 400$. Both match.

Step 3: Answer the question.

$P(\text{Return} \mid \text{Credit}) = \frac{48}{160} = 0.30 = 30\%$

For comparison: $P(\text{Return} \mid \text{Cash}) = \frac{36}{240} = 0.15 = 15\%$. Credit card purchases have double the return rate of cash purchases.
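Steps 2 and 3 can be sketched in Python: start from only the facts given in the description, fill the missing cells by subtraction, check the totals, then compute the conditional return rate for each payment method. The variable names and dictionary layout are illustrative assumptions.

```python
from fractions import Fraction

# Facts given in the description.
total = 400
payment_totals = {"Cash": 240, "Credit": 160}
returns = {"Cash": 36, "Credit": 48}

# Step 2: fill the missing "No Return" cells by subtraction.
table = {
    method: {"Return": returns[method], "No Return": payment_totals[method] - returns[method]}
    for method in payment_totals
}
col_totals = {col: sum(row[col] for row in table.values()) for col in ("Return", "No Return")}
# Verification: both sets of margins must add up to the grand total.
assert sum(col_totals.values()) == total == sum(payment_totals.values())

# Step 3: conditional return rate for each payment method (divide by the row total).
p_return_given = {m: Fraction(row["Return"], payment_totals[m]) for m, row in table.items()}

print(col_totals)  # {'Return': 84, 'No Return': 316}
print({m: float(p) for m, p in p_return_given.items()})  # {'Cash': 0.15, 'Credit': 0.3}
```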

Real-World Application: Nursing — Treatment Outcomes

A hospital compares outcomes for two physical therapy approaches used on 200 patients recovering from knee surgery:

	Improved	No Change	Worsened	Total
Traditional PT	55	35	10	100
Aquatic PT	70	22	8	100
Total	125	57	18	200

Key conditional probabilities for the nursing team:

$P(\text{Improved} \mid \text{Traditional}) = \frac{55}{100} = 0.55 = 55\%$

$P(\text{Improved} \mid \text{Aquatic}) = \frac{70}{100} = 0.70 = 70\%$

$P(\text{Worsened} \mid \text{Traditional}) = \frac{10}{100} = 0.10 = 10\%$

$P(\text{Worsened} \mid \text{Aquatic}) = \frac{8}{100} = 0.08 = 8\%$

Aquatic PT shows a higher improvement rate (70% vs 55%) and a slightly lower worsening rate (8% vs 10%). A nurse reviewing this data could use these conditional probabilities to inform patient discussions — while noting that this observational data alone cannot prove causation (other factors like patient age or injury severity could explain the difference).
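A sketch of the same conditional probabilities for all three outcomes at once, assuming the table is entered as a dictionary of rows (a hypothetical layout). Each rate is conditioned on the therapy, so each cell is divided by its row total; the rates describe only these 200 observed patients and say nothing about causation.

```python
from fractions import Fraction

outcomes = {
    "Traditional PT": {"Improved": 55, "No Change": 35, "Worsened": 10},
    "Aquatic PT":     {"Improved": 70, "No Change": 22, "Worsened": 8},
}

# Condition on the therapy: divide each cell by its row total (100 patients each).
rates = {
    therapy: {outcome: Fraction(n, sum(row.values())) for outcome, n in row.items()}
    for therapy, row in outcomes.items()
}

for therapy, row in rates.items():
    summary = ", ".join(f"{outcome} {float(p):.0%}" for outcome, p in row.items())
    print(f"{therapy}: {summary}")
```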

Practice Problems

Test your understanding with these problems.

Problem 1: A survey of 300 adults asked about exercise habits and sleep quality. Of 180 who exercise regularly, 126 reported good sleep. Of 120 who do not exercise regularly, 48 reported good sleep. What is $P(\text{Good Sleep} \mid \text{Exercise})$?

$P(\text{Good Sleep} \mid \text{Exercise}) = \frac{126}{180} = 0.70 = 70\%$

Answer: Among those who exercise regularly, 70% report good sleep quality.

Problem 2: Using the same data, what is $P(\text{Exercise} \mid \text{Good Sleep})$?

Total with good sleep: $126 + 48 = 174$.

$P(\text{Exercise} \mid \text{Good Sleep}) = \frac{126}{174} \approx 0.724 = 72.4\%$

Answer: Among those with good sleep, about 72.4% exercise regularly. This is a different question from Problem 1.
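Problems 1 and 2 use the same cell (126) with different denominators. A sketch that builds the table from the problem statement, assuming a "Not Good" label for the remaining sleep category (a hypothetical name) and filling those cells by subtraction:

```python
from fractions import Fraction

# Rows: exercise habit; columns: sleep quality. The "Not Good" cells are
# filled in by subtraction (180 - 126 and 120 - 48).
sleep = {
    "Exercise":    {"Good": 126, "Not Good": 180 - 126},
    "No Exercise": {"Good": 48,  "Not Good": 120 - 48},
}

exercise_total = sum(sleep["Exercise"].values())          # 180 (row total)
good_total = sum(row["Good"] for row in sleep.values())   # 174 (column total)
cell = sleep["Exercise"]["Good"]                          # 126

p_good_given_exercise = Fraction(cell, exercise_total)    # Problem 1: divide by the row
p_exercise_given_good = Fraction(cell, good_total)        # Problem 2: divide by the column

print(f"{float(p_good_given_exercise):.3f}")  # 0.700
print(f"{float(p_exercise_given_good):.3f}")  # 0.724
```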

Problem 3: Using the student survey data from Example 1, what is the joint probability that a randomly selected student is a junior who prefers in-person classes?

$P(\text{Junior and In-Person}) = \frac{55}{650} \approx 0.085 = 8.5\%$

Answer: About 8.5% of all students surveyed are juniors who prefer in-person classes.

Problem 4: A hospital tested 500 patients. Of 300 who received Drug A, 240 improved. Of 200 who received Drug B, 140 improved. Are drug type and outcome independent?

$P(\text{Improved}) = \frac{240 + 140}{500} = \frac{380}{500} = 0.76$

$P(\text{Improved} \mid \text{Drug A}) = \frac{240}{300} = 0.80$

$P(\text{Improved} \mid \text{Drug B}) = \frac{140}{200} = 0.70$

Since $0.80 \neq 0.76$ and $0.70 \neq 0.76$, the variables are not independent. Drug A has a higher improvement rate.
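The Problem 4 check, sketched in Python with exact fractions. The dictionary names are illustrative; the numbers are the totals and improvement counts from the problem.

```python
from fractions import Fraction

# Given: patients per drug and how many of them improved.
received = {"Drug A": 300, "Drug B": 200}
improved = {"Drug A": 240, "Drug B": 140}

grand_total = sum(received.values())                          # 500
p_improved = Fraction(sum(improved.values()), grand_total)    # 380/500
p_improved_given = {d: Fraction(improved[d], received[d]) for d in received}

for drug, p in p_improved_given.items():
    print(f"P(Improved | {drug}) = {float(p):.2f}  vs  P(Improved) = {float(p_improved):.2f}")

independent = all(p == p_improved for p in p_improved_given.values())
print("independent" if independent else "not independent")  # not independent
```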

Problem 5: A store tracked 500 orders. Online: 300 total (45 returned, 255 not). In-Store: 200 total (15 returned, 185 not). Find the joint, marginal, and conditional probabilities for “Online and Returned.”

Joint: $P(\text{Online and Returned}) = \frac{45}{500} = 0.09 = 9\%$

Marginal: $P(\text{Online}) = \frac{300}{500} = 0.60$, and, with $45 + 15 = 60$ total returns, $P(\text{Returned}) = \frac{60}{500} = 0.12$

Conditional: $P(\text{Returned} \mid \text{Online}) = \frac{45}{300} = 0.15 = 15\%$

Also: $P(\text{Online} \mid \text{Returned}) = \frac{45}{60} = 0.75 = 75\%$

Answer: The joint probability is 9%, the marginal probabilities are 60% (Online) and 12% (Returned), and the conditional probability of a return given online purchase is 15%.
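All five Problem 5 values come from one small table. A sketch, assuming the illustrative dictionary-of-rows layout and `Fraction` for exact ratios:

```python
from fractions import Fraction

orders = {
    "Online":   {"Returned": 45, "Not Returned": 255},
    "In-Store": {"Returned": 15, "Not Returned": 185},
}

grand_total = sum(sum(row.values()) for row in orders.values())    # 500
online_total = sum(orders["Online"].values())                      # 300 (row total)
returned_total = sum(row["Returned"] for row in orders.values())   # 60 (column total)
cell = orders["Online"]["Returned"]                                # 45

results = {
    "P(Online and Returned)": Fraction(cell, grand_total),
    "P(Online)": Fraction(online_total, grand_total),
    "P(Returned)": Fraction(returned_total, grand_total),
    "P(Returned | Online)": Fraction(cell, online_total),
    "P(Online | Returned)": Fraction(cell, returned_total),
}
for label, p in results.items():
    print(f"{label:<24} {float(p):.2f}")
```

The printed values are 0.09, 0.60, 0.12, 0.15, and 0.75, matching the hand calculations.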

Key Takeaways

A two-way table organizes data by two categorical variables, displaying cell counts, row totals, column totals, and the grand total.
Joint probability: divide the cell count by the grand total, $P(A \text{ and } B) = \frac{\text{cell}}{\text{grand total}}$.
Marginal probability: divide the row or column total by the grand total, $P(A) = \frac{\text{margin}}{\text{grand total}}$.
Conditional probability: divide the cell count by the relevant row or column total, $P(A \mid B) = \frac{\text{cell}}{\text{condition total}}$.
Relative frequency tables convert counts to proportions, making probabilities directly readable.
To test for independence, check whether conditional probabilities equal the corresponding marginal probabilities.
Two-way tables are widely used in healthcare, business, and social science to compare outcomes across groups.

Last updated: March 29, 2026