Statistics Formulas Cheat Sheet

Central Tendency

Measure	Formula	Use
Mean (population)	$\mu = \dfrac{\sum x_i}{N}$	Average of all values in a population
Mean (sample)	$\bar{x} = \dfrac{\sum x_i}{n}$	Average of all values in a sample
Median	Middle value when data is sorted	Central value; resistant to outliers
Mode	Most frequently occurring value	Most common category or value
Weighted mean	$\bar{x}_w = \dfrac{\sum w_i x_i}{\sum w_i}$	Average where values have different weights
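As a quick illustration of the central tendency formulas, the sketch below uses Python's standard `statistics` module. The data values and weights are made up for the example.

```python
import statistics

data = [2, 4, 4, 5, 7, 9]        # illustrative sample values (assumed)
weights = [1, 1, 2, 2, 1, 1]     # hypothetical weights, one per value

mean = statistics.mean(data)      # sum(x_i) / n
median = statistics.median(data)  # middle of the sorted data (average of the two middle values here)
mode = statistics.mode(data)      # most frequently occurring value

# Weighted mean: sum(w_i * x_i) / sum(w_i)
weighted_mean = sum(w * x for w, x in zip(weights, data)) / sum(weights)

print(mean, median, mode, weighted_mean)
```

With an even number of values, `statistics.median` averages the two middle values, which is the usual convention.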

Spread and Variability

Measure	Formula	Use
Range	$\text{Range} = x_{\max} - x_{\min}$	Distance from smallest to largest
Interquartile range	$\text{IQR} = Q_3 - Q_1$	Spread of the middle 50% of data
Variance (population)	$\sigma^2 = \dfrac{\sum (x_i - \mu)^2}{N}$	Average squared deviation from the mean
Variance (sample)	$s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1}$	Estimated variance using Bessel’s correction
Std deviation (population)	$\sigma = \sqrt{\dfrac{\sum (x_i - \mu)^2}{N}}$	Typical distance from the mean
Std deviation (sample)	$s = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}}$	Estimated typical distance from the mean
Coefficient of variation	$\text{CV} = \dfrac{s}{\bar{x}} \times 100\%$	Relative variability (unitless)
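The difference between the population and sample formulas is only the divisor ($N$ versus $n - 1$). A minimal sketch with the standard `statistics` module, using made-up data:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative values (assumed)

data_range = max(data) - min(data)

pop_var = statistics.pvariance(data)   # divides by N
samp_var = statistics.variance(data)   # divides by n - 1 (Bessel's correction)
pop_sd = statistics.pstdev(data)       # square root of the population variance
samp_sd = statistics.stdev(data)       # square root of the sample variance

# Coefficient of variation as a percentage of the mean
cv = samp_sd / statistics.mean(data) * 100

print(data_range, pop_var, samp_var, pop_sd, samp_sd, cv)
```

For the same data the sample variance is always the larger of the two, because it divides the same sum of squares by a smaller number.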

Measures of Position

Measure	Formula	Use
Percentile rank	$P = \dfrac{\text{number of values below } x}{n} \times 100$	Percentage of data below a value
Z-score	$z = \dfrac{x - \mu}{\sigma}$ or $z = \dfrac{x - \bar{x}}{s}$	How many standard deviations from the mean
Quartiles	$Q_1$ = 25th percentile, $Q_2$ = median (50th percentile), $Q_3$ = 75th percentile	Divide sorted data into four equal parts
Outlier boundaries	Below $Q_1 - 1.5 \times \text{IQR}$ or above $Q_3 + 1.5 \times \text{IQR}$	Identify unusually extreme values
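The position measures can be sketched as follows. Quartile conventions differ between textbooks and software; this example assumes the `"inclusive"` method of `statistics.quantiles`, and the data are invented for illustration.

```python
import statistics

data = [1, 3, 4, 5, 6, 7, 8, 9, 30]   # illustrative values with one extreme point

# Quartiles (one of several common conventions)
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Outlier boundaries from the 1.5 * IQR rule
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in data if v < lower_fence or v > upper_fence]

# Z-score and percentile rank of one value, using the sample mean and sample std deviation
x = 7
z = (x - statistics.mean(data)) / statistics.stdev(data)
percentile_rank = sum(v < x for v in data) / len(data) * 100

print(q1, q2, q3, iqr, outliers, z, percentile_rank)
```

Other quartile methods can give slightly different values of $Q_1$ and $Q_3$ for the same data, which in turn shifts the outlier boundaries.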

Probability Rules

Rule	Formula
Probability of an event (equally likely outcomes)	$P(A) = \dfrac{\text{favorable outcomes}}{\text{total outcomes}}$
Complement rule	$P(A') = 1 - P(A)$
Addition (mutually exclusive)	$P(A \text{ or } B) = P(A) + P(B)$
Addition (general)	$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$
Multiplication (independent)	$P(A \text{ and } B) = P(A) \cdot P(B)$
Multiplication (general)	$P(A \text{ and } B) = P(A) \cdot P(B \mid A)$
Conditional probability	$P(A \mid B) = \dfrac{P(A \text{ and } B)}{P(B)}$
Bayes’ theorem	$P(A \mid B) = \dfrac{P(B \mid A) \cdot P(A)}{P(B)}$
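The rules above can be checked on a small sample space. This sketch assumes one roll of a fair six-sided die, with events $A$ and $B$ chosen only for illustration, and uses exact fractions to avoid rounding.

```python
from fractions import Fraction

outcomes = set(range(1, 7))   # one roll of a fair six-sided die (assumed example)
A = {2, 4, 6}                 # event A: the roll is even
B = {4, 5, 6}                 # event B: the roll is greater than 3

def prob(event):
    """P(event) = favorable outcomes / total outcomes (equally likely outcomes)."""
    return Fraction(len(event), len(outcomes))

p_not_a = 1 - prob(A)                        # complement rule
p_a_or_b = prob(A) + prob(B) - prob(A & B)   # general addition rule
p_a_given_b = prob(A & B) / prob(B)          # conditional probability
p_b_given_a = prob(A & B) / prob(A)
p_a_and_b = prob(A) * p_b_given_a            # general multiplication rule
bayes = p_b_given_a * prob(A) / prob(B)      # Bayes' theorem, should equal P(A | B)

print(p_not_a, p_a_or_b, p_a_given_b, p_a_and_b, bayes)
```

Here $A$ and $B$ are neither mutually exclusive nor independent, so the general forms of the addition and multiplication rules are the ones that apply.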

Counting Techniques

Method	Formula	Use
Fundamental counting principle	$n_1 \times n_2 \times \cdots \times n_k$	Total outcomes from sequential choices
Factorial	$n! = n \times (n-1) \times \cdots \times 1$	Number of ways to arrange $n$ items
Permutations	$P(n, r) = \dfrac{n!}{(n - r)!}$	Ordered arrangements of $r$ items from $n$
Combinations	$C(n, r) = \dbinom{n}{r} = \dfrac{n!}{r!(n - r)!}$	Unordered selections of $r$ items from $n$
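Python's `math` module (3.8 or later) provides these counts directly. The numbers below are arbitrary example inputs.

```python
import math

# Fundamental counting principle: 3 shirts x 4 trousers x 2 pairs of shoes (assumed example)
outfits = math.prod([3, 4, 2])

arrangements = math.factorial(5)   # ways to arrange 5 items: 5!
ordered = math.perm(5, 2)          # P(5, 2) = 5! / (5 - 2)!
unordered = math.comb(5, 2)        # C(5, 2) = 5! / (2! * (5 - 2)!)

print(outfits, arrangements, ordered, unordered)
```

Each unordered selection of $r$ items corresponds to $r!$ ordered arrangements, so $P(n, r) = r! \cdot C(n, r)$.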

Discrete Distributions

Distribution	Formula	Parameters
Expected value	$E(X) = \sum x_i \cdot P(x_i)$	$x_i$ = values, $P(x_i)$ = probabilities
Variance of $X$	$\text{Var}(X) = \sum (x_i - \mu)^2 \cdot P(x_i)$	$\mu = E(X)$; equivalently $E(X^2) - [E(X)]^2$
Binomial probability	$P(X = k) = \dbinom{n}{k} p^k (1-p)^{n-k}$	$n$ = trials, $p$ = success probability
Binomial mean	$\mu = np$	Expected number of successes in $n$ trials
Binomial std deviation	$\sigma = \sqrt{np(1-p)}$	Typical spread of the number of successes
Geometric probability	$P(X = k) = (1-p)^{k-1} p$	First success on trial $k$
Poisson probability	$P(X = k) = \dfrac{e^{-\lambda} \lambda^k}{k!}$	$\lambda$ = average rate
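The probability formulas translate directly into code. The sketch below uses hypothetical function names and example parameters ($n = 10$, $p = 0.3$), and checks the binomial mean and variance against the general expected value and variance definitions.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for n independent trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def geometric_pmf(k, p):
    """P(first success occurs on trial k)."""
    return (1 - p) ** (k - 1) * p

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with average rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

n, p = 10, 0.3   # example parameters (assumed)
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# E(X) = sum of x_i * P(x_i); Var(X) = sum of (x_i - mu)^2 * P(x_i)
expected = sum(k * pk for k, pk in enumerate(pmf))
variance = sum((k - expected) ** 2 * pk for k, pk in enumerate(pmf))

print(expected, n * p)                 # both close to 3.0
print(variance, n * p * (1 - p))       # both close to 2.1
```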

Normal Distribution

Formula	Use
$z = \dfrac{x - \mu}{\sigma}$	Convert a value to a z-score
$x = \mu + z\sigma$	Convert a z-score back to a value
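Empirical rule: about 68% within $\mu \pm \sigma$	Share of values within one standard deviation of the mean
Empirical rule: about 95% within $\mu \pm 2\sigma$	Share of values within two standard deviations of the mean
Empirical rule: about 99.7% within $\mu \pm 3\sigma$	Share of values within three standard deviations (nearly all data)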
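The conversions and the empirical rule can be verified with `statistics.NormalDist` from the standard library. The mean of 100 and standard deviation of 15 are assumed example parameters.

```python
from statistics import NormalDist

dist = NormalDist(mu=100, sigma=15)   # hypothetical normal distribution

x = 130
z = (x - dist.mean) / dist.stdev      # value -> z-score
x_back = dist.mean + z * dist.stdev   # z-score -> value

def share_within(k):
    """Proportion of the distribution within k standard deviations of the mean."""
    return dist.cdf(dist.mean + k * dist.stdev) - dist.cdf(dist.mean - k * dist.stdev)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```

The exact proportions are slightly different from the rounded 68%, 95%, and 99.7% figures, which is why the rule is described as approximate.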

Confidence Intervals

Parameter	Formula	Conditions
Mean ($\sigma$ known)	$\bar{x} \pm z^* \cdot \dfrac{\sigma}{\sqrt{n}}$	Normal population or $n \geq 30$
Mean ($\sigma$ unknown)	$\bar{x} \pm t^* \cdot \dfrac{s}{\sqrt{n}}$	Normal population or $n \geq 30$; use $t$ with $df = n - 1$
Proportion	$\hat{p} \pm z^* \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$	$n\hat{p} \geq 10$ and $n(1-\hat{p}) \geq 10$
Margin of error	$E = z^* \cdot \dfrac{\sigma}{\sqrt{n}}$ or $E = t^* \cdot \dfrac{s}{\sqrt{n}}$	Half-width of the confidence interval
Sample size (mean)	$n = \left(\dfrac{z^* \cdot \sigma}{E}\right)^2$	Minimum $n$ for a desired margin of error $E$; round up to the next whole number
Sample size (proportion)	$n = \hat{p}(1-\hat{p})\left(\dfrac{z^*}{E}\right)^2$	Use $\hat{p} = 0.5$ if no prior estimate is available; round up
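A minimal sketch of the $z$-based intervals and sample-size formulas follows. The function names are hypothetical, the inputs are invented, and the default critical value of 1.960 assumes a 95% confidence level. The $t$-based interval is omitted because the standard library has no $t$ distribution.

```python
import math

def mean_ci_known_sigma(xbar, sigma, n, z=1.960):
    """Confidence interval for a mean when sigma is known."""
    margin = z * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

def proportion_ci(p_hat, n, z=1.960):
    """Confidence interval for a proportion (needs n*p_hat >= 10 and n*(1 - p_hat) >= 10)."""
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def sample_size_mean(sigma, margin, z=1.960):
    """Smallest whole n that achieves the desired margin of error for a mean."""
    return math.ceil((z * sigma / margin) ** 2)

def sample_size_proportion(margin, p_hat=0.5, z=1.960):
    """Smallest whole n for a proportion; p_hat = 0.5 is the conservative default."""
    return math.ceil(p_hat * (1 - p_hat) * (z / margin) ** 2)

print(mean_ci_known_sigma(xbar=50, sigma=10, n=100))
print(proportion_ci(p_hat=0.4, n=150))
print(sample_size_mean(sigma=15, margin=2))
print(sample_size_proportion(margin=0.03))
```

Rounding up with `math.ceil` matters: rounding down would give a margin of error slightly larger than requested.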

Common Critical Values

Confidence Level	$z^*$
90%	1.645
95%	1.960
99%	2.576
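These critical values come from the standard normal distribution: a confidence level $C$ leaves $(1 - C)/2$ in each tail. As a sketch, they can be reproduced with `statistics.NormalDist`; the helper name `z_star` is an assumption for this example.

```python
from statistics import NormalDist

def z_star(confidence):
    """Two-sided critical value: leaves (1 - confidence) / 2 in each tail."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

table = {level: round(z_star(level), 3) for level in (0.90, 0.95, 0.99)}
print(table)
```

The same helper gives critical values for levels not listed in the table, such as 98%.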

Hypothesis Tests

Test	Test Statistic	Use
Z-test for proportion	$z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}}$	Test a claim about a population proportion
Z-test for mean ($\sigma$ known)	$z = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$	Test a claim about a population mean
T-test for mean ($\sigma$ unknown)	$t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$	Test a claim about a population mean; $df = n - 1$
Two-sample t-test	$t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$	Compare means of two independent groups (unpooled standard error)
Paired t-test	$t = \dfrac{\bar{d}}{s_d / \sqrt{n}}$	Compare means of matched pairs; $\bar{d}$ = mean difference, $s_d$ = standard deviation of the differences, $df = n - 1$
Chi-square goodness of fit	$\chi^2 = \sum \dfrac{(O - E)^2}{E}$	Test if observed frequencies match expected; $df = k - 1$ for $k$ categories
Chi-square independence	$\chi^2 = \sum \dfrac{(O - E)^2}{E}$	Test if two categorical variables are related; $df = (r - 1)(c - 1)$
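The test statistics above can be computed with the standard library alone. This sketch uses hypothetical function names and invented inputs. It returns a p-value only for the $z$-test, since the standard library has no $t$ or chi-square distribution; the other statistics would be compared against a table or a statistics package.

```python
import math
import statistics
from statistics import NormalDist

def z_test_proportion(p_hat, p0, n):
    """z statistic and two-sided p-value for a claim about a proportion."""
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

def t_statistic(sample, mu0):
    """One-sample t statistic; df = n - 1."""
    n = len(sample)
    return (statistics.mean(sample) - mu0) / (statistics.stdev(sample) / math.sqrt(n))

def two_sample_t(a, b):
    """Two-sample t statistic with an unpooled standard error."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se

def paired_t(before, after):
    """Paired t statistic: a one-sample t-test on the differences against 0."""
    diffs = [y - x for x, y in zip(before, after)]
    return t_statistic(diffs, 0)

def chi_square(observed, expected):
    """Chi-square statistic: sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(z_test_proportion(p_hat=0.56, p0=0.5, n=100))
print(t_statistic([2, 4, 4, 4, 5, 5, 7, 9], mu0=4))
print(two_sample_t([2, 4, 4, 4, 5, 5, 7, 9], [1, 3, 3, 3, 4, 4, 6, 8]))
print(paired_t([10, 12, 9, 11], [12, 13, 11, 14]))
print(chi_square([18, 22, 20], [20, 20, 20]))
```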

Linear Regression

Formula	Use
$\hat{y} = a + bx$	Predicted value from the regression line
$b = \dfrac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$	Slope of the least-squares line
$a = \bar{y} - b\bar{x}$	Y-intercept of the least-squares line
$r = \dfrac{n\sum xy - \sum x \sum y}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$	Correlation coefficient
$R^2 = r^2$	Coefficient of determination: proportion of variance in $y$ explained by the model
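The regression formulas are built from five sums. The sketch below computes them for a small invented data set; the variable names are chosen only for this example.

```python
import math

x = [1, 2, 3, 4, 5]   # illustrative predictor values (assumed)
y = [2, 4, 5, 4, 5]   # illustrative response values (assumed)
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi**2 for xi in x)
sum_y2 = sum(yi**2 for yi in y)

# Slope and intercept of the least-squares line
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x**2)
a = sum_y / n - b * sum_x / n

# Correlation coefficient and proportion of variance explained
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x**2) * (n * sum_y2 - sum_y**2)
)
r_squared = r**2

def predict(x_new):
    """Predicted value y-hat = a + b * x."""
    return a + b * x_new

print(a, b, r, r_squared, predict(6))
```

The slope $b$ and the correlation $r$ share the same numerator, so they always have the same sign.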
