
Sampling Distributions

Last updated: March 29, 2026 · Advanced

If you take two different random samples from the same population, you will get different sample means. Take a third sample, and you will get yet another mean. This natural variability is not a flaw in the sampling process — it is an unavoidable feature of working with samples instead of entire populations. Sampling distributions describe exactly how large this variability is, what shape it takes, and how it shrinks as your sample size grows. Understanding sampling distributions is the key to understanding confidence intervals, hypothesis tests, and virtually all of statistical inference.

What Is a Sampling Distribution?

A sampling distribution is not the distribution of data in a single sample. That is a common and important misconception to clear up right away. A sampling distribution is the distribution of a statistic — such as the sample mean $\bar{x}$ or the sample proportion $\hat{p}$ — computed across many possible samples of the same size drawn from the same population.

Here is the thought experiment that makes this concrete:

  1. You have a population with some mean $\mu$ and standard deviation $\sigma$.
  2. You draw a random sample of size $n$ and compute the sample mean $\bar{x}$.
  3. You put that sample back (or draw from a very large population) and draw another sample of size $n$. You compute $\bar{x}$ again.
  4. You repeat this process 1,000 times — or 10,000 times, or a million times.
  5. Now you have 1,000 (or 10,000, or a million) values of $\bar{x}$. Plot them in a histogram.

That histogram is the sampling distribution of $\bar{x}$. It shows you all the possible values the sample mean could take and how likely each value is. In practice, you never actually draw thousands of samples — but the sampling distribution tells you what would happen if you did, and that knowledge is the foundation of statistical inference.
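The thought experiment above is easy to run on a computer. The sketch below is purely illustrative: it assumes a hypothetical normal population with mean 170 and standard deviation 8 (values chosen for illustration), draws 10,000 samples of size 64, and records each sample mean.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical population: mean 170, SD 8 (values chosen for illustration)
mu, sigma, n, reps = 170.0, 8.0, 64, 10_000

# Steps 2-4: draw `reps` samples of size n; step 5: record each sample mean
sample_means = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

# A histogram of `sample_means` approximates the sampling distribution of x-bar:
# its mean should sit near mu, and its SD near sigma / sqrt(n)
print(sample_means.mean())  # close to 170
print(sample_means.std())   # close to 8 / sqrt(64) = 1
```

Plotting a histogram of `sample_means` would reproduce the picture described in step 5.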

Sampling Distribution of the Sample Mean

The sampling distribution of $\bar{x}$ has three key properties:

Center: The mean of the sampling distribution equals the population mean:

$$\mu_{\bar{x}} = \mu$$

This says that the sample mean is an unbiased estimator of the population mean. If you averaged all possible sample means, you would get exactly $\mu$. Individual samples may overshoot or undershoot, but on average they hit the target.

Spread: The standard deviation of the sampling distribution is:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

This quantity is called the standard error of the mean. Notice the critical role of $n$: as the sample size increases, the standard error decreases. Larger samples produce more consistent (less variable) estimates.

Shape: When $n$ is large enough, the sampling distribution of $\bar{x}$ is approximately normal — regardless of the shape of the population. This remarkable fact is the Central Limit Theorem, covered on the next page. If the population itself is normal, then the sampling distribution of $\bar{x}$ is exactly normal for any sample size.
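All three properties can be checked empirically. The sketch below is illustrative: it assumes a strongly skewed exponential population (mean 2, which for an exponential means the SD is also 2) purely for demonstration, and shows that the sample means still center on $\mu$ with spread close to $\sigma/\sqrt{n}$.

```python
import numpy as np

rng = np.random.default_rng(0)

# A strongly right-skewed population: exponential with mean 2 (its SD is also 2)
mu = sigma = 2.0
n, reps = 100, 20_000

# Draw many samples and record each sample mean
means = rng.exponential(mu, size=(reps, n)).mean(axis=1)

# Center lands on mu and spread on sigma / sqrt(n) = 0.2,
# even though the population itself is far from normal
print(means.mean())  # close to 2.0
print(means.std())   # close to 0.2
```

A histogram of `means` would also look roughly bell-shaped, previewing the Central Limit Theorem.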

Example 1: Adult Heights

Suppose adult heights in a population have $\mu = 170$ cm and $\sigma = 8$ cm. You plan to take random samples of $n = 64$ adults and compute $\bar{x}$ for each sample.

The sampling distribution of $\bar{x}$ has:

  • Mean: $\mu_{\bar{x}} = 170$ cm
  • Standard error: $\sigma_{\bar{x}} = \frac{8}{\sqrt{64}} = \frac{8}{8} = 1$ cm

Individual heights vary with a standard deviation of 8 cm. But sample means (with $n = 64$) vary with a standard deviation of only 1 cm. The averaging process dramatically reduces variability — your sample mean will almost certainly be within a few centimeters of the true population mean.

[Figure: Individual Heights vs. Sampling Distribution of the Mean (n = 64). Two curves centered at 170 cm on a height axis running roughly 154–186 cm: individual heights (σ = 8, dashed) spread widely, while sample means (SE = 1, solid) cluster tightly.]

The diagram above illustrates the key idea: while individual heights (dashed blue curve) spread widely around 170 cm, sample means with $n = 64$ (solid green curve) cluster tightly around 170 cm. The sampling distribution is far narrower than the population distribution.

Standard Error

The standard error (SE) is the standard deviation of a sampling distribution. It measures how much a statistic typically varies from sample to sample.

Standard error of the mean:

$$SE = \frac{\sigma}{\sqrt{n}}$$

When the population standard deviation $\sigma$ is unknown (which is typical in practice), we estimate it from the sample:

$$SE \approx \frac{s}{\sqrt{n}}$$

where $s$ is the sample standard deviation.

The key insight is the $\sqrt{n}$ in the denominator: to cut the standard error in half, you must quadruple the sample size. This is the law of diminishing returns for sampling — initial increases in sample size produce large gains in precision, but each further increase yields less additional benefit.

Example 2: Effect of Sample Size

A population has $\sigma = 20$. How does the standard error change as $n$ increases?

| Sample size ($n$) | $\sqrt{n}$ | $SE = \frac{20}{\sqrt{n}}$ |
|---|---|---|
| 25 | 5 | 4.0 |
| 100 | 10 | 2.0 |
| 400 | 20 | 1.0 |
| 1,600 | 40 | 0.5 |

Going from $n = 25$ to $n = 100$ (a fourfold increase) cuts the SE from 4.0 to 2.0 — halving it. Going from $n = 100$ to $n = 400$ (another fourfold increase) cuts the SE from 2.0 to 1.0 — halving it again. Each time you want to halve the SE, you need four times as many observations. This relationship governs how researchers choose sample sizes for studies.
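These calculations are easy to script. The sketch below is a minimal illustration (the function names are ours, not from any library): it reproduces the table above and inverts the formula to find the sample size needed for a target standard error.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def required_n(sigma: float, target_se: float) -> int:
    """Smallest sample size with sigma / sqrt(n) <= target_se."""
    return math.ceil((sigma / target_se) ** 2)

# Reproduce the table for sigma = 20
for n in (25, 100, 400, 1600):
    print(n, standard_error(20, n))  # 4.0, 2.0, 1.0, 0.5

# Halving an SE of 2.0 (achieved at n = 100) requires n = 400
print(required_n(20, 1.0))  # 400
```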

Sampling Distribution of the Sample Proportion

When working with categorical data (yes/no, pass/fail, support/oppose), the relevant statistic is the sample proportion $\hat{p}$. Its sampling distribution has analogous properties:

Center: The mean of $\hat{p}$ equals the population proportion:

$$\mu_{\hat{p}} = p$$

The sample proportion is an unbiased estimator of $p$.

Spread: The standard error of $\hat{p}$ is:

$$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$$

Shape: The sampling distribution of $\hat{p}$ is approximately normal when both of these conditions are met:

$$np \geq 10 \quad \text{and} \quad n(1-p) \geq 10$$

These conditions ensure that there are enough “successes” and “failures” in the expected sample for the normal approximation to work well.

Example 3: Voter Support

A candidate has true support of $p = 0.60$ in a population. A pollster takes a random sample of $n = 200$ voters.

Standard error:

$$\sigma_{\hat{p}} = \sqrt{\frac{0.60 \times 0.40}{200}} = \sqrt{\frac{0.24}{200}} = \sqrt{0.0012} \approx 0.0346$$

Normality check:

  • $np = 200 \times 0.60 = 120 \geq 10$ (condition met)
  • $n(1-p) = 200 \times 0.40 = 80 \geq 10$ (condition met)

So $\hat{p}$ is approximately normal with mean $0.60$ and standard error $0.0346$.

Interpretation: In repeated polls of 200 voters, the sample proportion would typically be within about $2 \times 0.0346 = 0.0692$ (about 7 percentage points) of the true proportion. This is why polls report a “margin of error” — it comes directly from the standard error of the sampling distribution.
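The two steps in this example (compute the standard error, then check normality) can be sketched as small helper functions. This is an illustration only; the function names are ours.

```python
import math

def proportion_se(p: float, n: int) -> float:
    """Standard error of the sample proportion: sqrt(p(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)

def normal_approx_ok(p: float, n: int) -> bool:
    """Normality conditions: np >= 10 and n(1-p) >= 10."""
    return n * p >= 10 and n * (1 - p) >= 10

# Example 3: p = 0.60, n = 200
se = proportion_se(0.60, 200)
print(round(se, 4))                  # 0.0346
print(normal_approx_ok(0.60, 200))   # True: np = 120 and n(1-p) = 80
```

For a rarer trait the check can fail: with $p = 0.02$ and $n = 200$, $np = 4 < 10$, so the normal approximation would not be trusted.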

Why Sampling Distributions Matter

Sampling distributions are not just an abstract concept — they are the engine that powers statistical inference. Here is why:

  • Confidence intervals use the standard error to build a range of plausible values around a sample statistic. A 95% confidence interval for a mean is approximately $\bar{x} \pm 2 \times SE$, which comes directly from the sampling distribution.
  • Hypothesis tests ask: “If the true parameter were some specific value, how likely is the sample result we observed?” The sampling distribution provides the answer by telling us what values are typical and what values are surprising.
  • Margin of error in polls and surveys is a direct application of the standard error of $\hat{p}$.

Without understanding sampling distributions, confidence intervals and p-values are just mysterious formulas. With this understanding, they become logical consequences of how statistics vary from sample to sample.
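As a small illustration of the confidence-interval point above, a rough 95% interval can be computed directly from the standard error. This sketch assumes the $\pm 2 \times SE$ approximation and a reasonably large sample; the function name is ours.

```python
import math

def approx_95_ci(xbar: float, s: float, n: int) -> tuple[float, float]:
    """Rough 95% CI for a mean: xbar +/- 2 * s/sqrt(n) (large-n approximation)."""
    se = s / math.sqrt(n)
    return (xbar - 2 * se, xbar + 2 * se)

# e.g. a sample with mean 50, SD 12, n = 36  ->  SE = 2, so 50 +/- 4
print(approx_95_ci(50, 12, 36))  # (46.0, 54.0)
```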

Real-World Application: Nursing — Precision of Blood Pressure Estimates

A hospital clinic wants to estimate the average systolic blood pressure of its patient population. From prior studies, $\sigma = 18$ mmHg is a reasonable estimate of the population standard deviation.

How precise is the estimate for different sample sizes?

| Patients measured ($n$) | $SE = \frac{18}{\sqrt{n}}$ | 95% margin of error ($\approx 2 \times SE$) |
|---|---|---|
| 9 | $\frac{18}{3} = 6.0$ mmHg | $\pm$12.0 mmHg |
| 36 | $\frac{18}{6} = 3.0$ mmHg | $\pm$6.0 mmHg |
| 100 | $\frac{18}{10} = 1.8$ mmHg | $\pm$3.6 mmHg |
| 225 | $\frac{18}{15} = 1.2$ mmHg | $\pm$2.4 mmHg |

With only 9 patients, the estimate could easily be off by 12 mmHg — clinically unusable, because an error that large spans the difference between normal and hypertensive readings. With 100 patients, the margin shrinks to 3.6 mmHg, which is precise enough for most clinical decisions. With 225 patients, the margin is 2.4 mmHg — very precise, but the improvement from 100 to 225 patients is modest compared to the improvement from 9 to 36.

This illustrates the practical tradeoff that nurses, physicians, and public health researchers face: more measurements improve precision, but with diminishing returns. The sampling distribution quantifies exactly how much precision you gain for each additional measurement.
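The table above can be reproduced with a few lines of code (illustrative only; $\sigma = 18$ mmHg is the assumed value from the example):

```python
import math

sigma = 18.0  # mmHg, assumed population SD from prior studies
for n in (9, 36, 100, 225):
    se = sigma / math.sqrt(n)
    print(f"n = {n:3d}: SE = {se:.1f} mmHg, 95% margin of error = +/-{2 * se:.1f} mmHg")
```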

Practice Problems

Test your understanding with these problems. Click to reveal each answer.

Problem 1: A population has $\mu = 50$ and $\sigma = 12$. For samples of size $n = 36$, find the mean and standard error of the sampling distribution of $\bar{x}$.

Mean of sampling distribution:

$$\mu_{\bar{x}} = \mu = 50$$

Standard error:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{12}{\sqrt{36}} = \frac{12}{6} = 2$$

Answer: The sampling distribution of $\bar{x}$ has mean 50 and standard error 2.

Problem 2: A researcher wants to cut the standard error of her estimate in half. Her current sample size is $n = 50$. What sample size does she need?

To halve the standard error, you must quadruple the sample size:

$$n_{\text{new}} = 4 \times 50 = 200$$

Verification: $SE_{\text{old}} = \frac{\sigma}{\sqrt{50}}$ and $SE_{\text{new}} = \frac{\sigma}{\sqrt{200}} = \frac{\sigma}{\sqrt{4 \times 50}} = \frac{\sigma}{2\sqrt{50}} = \frac{1}{2} \, SE_{\text{old}}$, as required.

Answer: She needs a sample size of 200.

Problem 3: A population proportion is $p = 0.35$ and the sample size is $n = 150$. Find the standard error of $\hat{p}$ and check whether the normal approximation is valid.

Standard error:

$$\sigma_{\hat{p}} = \sqrt{\frac{0.35 \times 0.65}{150}} = \sqrt{\frac{0.2275}{150}} = \sqrt{0.001517} \approx 0.0389$$

Normality check:

  • $np = 150 \times 0.35 = 52.5 \geq 10$ (condition met)
  • $n(1-p) = 150 \times 0.65 = 97.5 \geq 10$ (condition met)

Answer: The standard error is approximately 0.039, and the normal approximation is valid because both conditions are satisfied.

Problem 4: Two researchers study the same population ($\sigma = 30$). Researcher A uses $n = 100$ and Researcher B uses $n = 900$. How do their standard errors compare?

Researcher A:

$$SE_A = \frac{30}{\sqrt{100}} = \frac{30}{10} = 3$$

Researcher B:

$$SE_B = \frac{30}{\sqrt{900}} = \frac{30}{30} = 1$$

Researcher B’s standard error is one-third of Researcher A’s. Since $900 = 9 \times 100$, the sample size increased by a factor of 9, so the standard error decreased by a factor of $\sqrt{9} = 3$.

Answer: Researcher A has $SE = 3$ and Researcher B has $SE = 1$. The ninefold increase in sample size reduced the standard error to one-third its original value.

Problem 5: In a large city, 20% of residents speak a language other than English at home ($p = 0.20$). A survey samples $n = 400$ residents. What is the standard error of $\hat{p}$, and within what range would you expect 95% of sample proportions to fall?

Standard error:

$$\sigma_{\hat{p}} = \sqrt{\frac{0.20 \times 0.80}{400}} = \sqrt{\frac{0.16}{400}} = \sqrt{0.0004} = 0.02$$

95% range (approximately $p \pm 2 \times SE$, since sample proportions center on the true $p$):

$$0.20 \pm 2(0.02) = 0.20 \pm 0.04$$

So 95% of sample proportions would fall between 0.16 and 0.24 (i.e., between 16% and 24%).

Normality check: $np = 80 \geq 10$ and $n(1-p) = 320 \geq 10$ (valid).

Answer: The standard error is 0.02, and approximately 95% of sample proportions would fall between 0.16 and 0.24.

Key Takeaways

  • A sampling distribution is the distribution of a statistic (like $\bar{x}$ or $\hat{p}$) across all possible samples of a given size — it describes how the statistic varies from sample to sample
  • The sampling distribution of $\bar{x}$ is centered at $\mu$ (unbiased) with standard error $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
  • The sampling distribution of $\hat{p}$ is centered at $p$ (unbiased) with standard error $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$
  • Standard error measures the typical distance between a sample statistic and the population parameter — it is the standard deviation of the sampling distribution
  • Larger samples produce smaller standard errors — but you must quadruple $n$ to halve the SE
  • The normal approximation for $\hat{p}$ requires $np \geq 10$ and $n(1-p) \geq 10$
  • Sampling distributions are the foundation of confidence intervals and hypothesis tests — they tell us what to expect when sampling from a population

