Statistics Formulas Cheat Sheet

Central Tendency

| Measure | Formula | Use |
| --- | --- | --- |
| Mean (population) | $\mu = \dfrac{\sum x_i}{N}$ | Average of all values in a population |
| Mean (sample) | $\bar{x} = \dfrac{\sum x_i}{n}$ | Average of all values in a sample |
| Median | Middle value when data is sorted | Central value; resistant to outliers |
| Mode | Most frequently occurring value | Most common category or value |
| Weighted mean | $\bar{x}_w = \dfrac{\sum w_i x_i}{\sum w_i}$ | Average where values have different weights |
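
As a quick illustration of the measures above, here is a minimal Python sketch using only the standard library. The data set, scores, and weights are made up for the example.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # made-up sample

mean = statistics.mean(data)      # sum of the values divided by their count
median = statistics.median(data)  # n is even here, so the two middle values are averaged
mode = statistics.mode(data)      # most frequently occurring value

# Weighted mean: sum(w_i * x_i) / sum(w_i), with hypothetical scores and weights
scores = [80, 90, 70]
weights = [5, 3, 2]
weighted_mean = sum(w * x for w, x in zip(weights, scores)) / sum(weights)

print(mean, median, mode, weighted_mean)
```

The weights do not need to sum to 1, because the formula divides by their total.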

Spread and Variability

| Measure | Formula | Use |
| --- | --- | --- |
| Range | $\text{Range} = x_{\max} - x_{\min}$ | Distance from smallest to largest |
| Interquartile range | $\text{IQR} = Q_3 - Q_1$ | Spread of the middle 50% of data |
| Variance (population) | $\sigma^2 = \dfrac{\sum (x_i - \mu)^2}{N}$ | Average squared deviation from the mean |
| Variance (sample) | $s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1}$ | Estimated variance using Bessel’s correction |
| Std deviation (population) | $\sigma = \sqrt{\dfrac{\sum (x_i - \mu)^2}{N}}$ | Typical distance from the mean |
| Std deviation (sample) | $s = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}}$ | Estimated typical distance from the mean |
| Coefficient of variation | $\text{CV} = \dfrac{s}{\bar{x}} \times 100\%$ | Relative variability (unitless) |
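
The difference between dividing by $N$ and by $n - 1$ is easy to see in code. The sketch below uses Python's standard `statistics` module on a made-up data set; `pvariance`/`pstdev` are the population versions and `variance`/`stdev` the sample versions.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values with mean 5

data_range = max(data) - min(data)

pop_var = statistics.pvariance(data)  # divides the squared deviations by N
samp_var = statistics.variance(data)  # divides by n - 1 (Bessel's correction)
pop_sd = statistics.pstdev(data)      # square root of the population variance
samp_sd = statistics.stdev(data)      # square root of the sample variance

# Coefficient of variation, as a percentage of the mean
cv = samp_sd / statistics.mean(data) * 100

print(data_range, pop_var, samp_var, pop_sd, samp_sd, cv)
```

For the same data the sample variance is always at least as large as the population variance, since the same sum is divided by a smaller number.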

Measures of Position

| Measure | Formula | Use |
| --- | --- | --- |
| Percentile rank | $P = \dfrac{\text{Values below } x}{n} \times 100$ | Percentage of data below a value |
| Z-score | $z = \dfrac{x - \mu}{\sigma}$ or $z = \dfrac{x - \bar{x}}{s}$ | How many standard deviations from the mean |
| Quartiles | $Q_1$ = 25th percentile, $Q_2$ = median, $Q_3$ = 75th percentile | Divide data into four equal parts |
| Outlier boundaries | Below $Q_1 - 1.5 \times \text{IQR}$ or above $Q_3 + 1.5 \times \text{IQR}$ | Identify unusually extreme values |
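
The position measures above can be sketched in a few lines of Python. The data set is made up, and the quartile method is an assumption: textbooks and software use several slightly different conventions, so `statistics.quantiles(..., method="inclusive")` may not match hand calculations exactly.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # made-up sample with one extreme value

x = 7
# Percentile rank: share of values strictly below x, as a percentage
percentile_rank = sum(v < x for v in data) / len(data) * 100

# Sample z-score: distance from the mean in units of s
z = (x - statistics.mean(data)) / statistics.stdev(data)

# Quartiles (one of several conventions), then the 1.5 * IQR fences
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in data if v < lower_fence or v > upper_fence]

print(percentile_rank, z, (q1, q2, q3), outliers)
```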

Probability Rules

| Rule | Formula |
| --- | --- |
| Probability of an event | $P(A) = \dfrac{\text{favorable outcomes}}{\text{total outcomes}}$ |
| Complement rule | $P(A') = 1 - P(A)$ |
| Addition (mutually exclusive) | $P(A \text{ or } B) = P(A) + P(B)$ |
| Addition (general) | $P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$ |
| Multiplication (independent) | $P(A \text{ and } B) = P(A) \cdot P(B)$ |
| Multiplication (general) | $P(A \text{ and } B) = P(A) \cdot P(B \mid A)$ |
| Conditional probability | $P(A \mid B) = \dfrac{P(A \text{ and } B)}{P(B)}$ |
| Bayes’ theorem | $P(A \mid B) = \dfrac{P(B \mid A) \cdot P(A)}{P(B)}$ |
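
Bayes' theorem is easiest to absorb with numbers. The sketch below works through a hypothetical screening test; the prevalence and error rates are invented for illustration. The denominator $P(B)$ is expanded with the law of total probability, which is not listed in the table above.

```python
from fractions import Fraction

# Hypothetical inputs: A = "has the condition", B = "tests positive"
p_a = Fraction(1, 100)              # P(A): prevalence
p_b_given_a = Fraction(95, 100)     # P(B | A): true positive rate
p_b_given_not_a = Fraction(5, 100)  # P(B | A'): false positive rate

p_not_a = 1 - p_a  # complement rule

# Law of total probability: P(B) = P(B|A)P(A) + P(B|A')P(A')
p_b = p_b_given_a * p_a + p_b_given_not_a * p_not_a

# Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b

print(p_a_given_b, float(p_a_given_b))
```

With these made-up rates, a positive result still leaves the probability of having the condition at about 16%, because the condition is rare to begin with.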

Counting Techniques

| Method | Formula | Use |
| --- | --- | --- |
| Fundamental counting principle | $n_1 \times n_2 \times \cdots \times n_k$ | Total outcomes from sequential choices |
| Factorial | $n! = n \times (n-1) \times \cdots \times 1$ | Number of ways to arrange $n$ items |
| Permutations | $P(n, r) = \dfrac{n!}{(n - r)!}$ | Ordered arrangements of $r$ items from $n$ |
| Combinations | $C(n, r) = \dbinom{n}{r} = \dfrac{n!}{r!(n - r)!}$ | Unordered selections of $r$ items from $n$ |
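
Python's `math` module (3.8 or later) has these counting functions built in. A short sketch with arbitrary values $n = 5$, $r = 2$:

```python
import math

n, r = 5, 2

# Fundamental counting principle: e.g. 3 shirts, 4 trousers, 2 pairs of shoes
outfits = math.prod([3, 4, 2])

arrangements = math.factorial(n)  # n! ways to order n items
ordered = math.perm(n, r)         # n! / (n - r)!
unordered = math.comb(n, r)       # n! / (r! * (n - r)!)

print(outfits, arrangements, ordered, unordered)  # 24 120 20 10
```

Each unordered selection of $r$ items corresponds to $r!$ ordered arrangements, so `math.perm(n, r) == math.comb(n, r) * math.factorial(r)`.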

Discrete Distributions

| Distribution | Formula | Parameters / Notes |
| --- | --- | --- |
| Expected value | $E(X) = \sum x_i \cdot P(x_i)$ | $x_i$ = values, $P(x_i)$ = probabilities |
| Variance of $X$ | $\text{Var}(X) = \sum (x_i - \mu)^2 \cdot P(x_i)$ | or $E(X^2) - [E(X)]^2$ |
| Binomial probability | $P(X = k) = \dbinom{n}{k} p^k (1-p)^{n-k}$ | $n$ = trials, $p$ = success probability |
| Binomial mean | $\mu = np$ | |
| Binomial std deviation | $\sigma = \sqrt{np(1-p)}$ | |
| Geometric probability | $P(X = k) = (1-p)^{k-1} p$ | First success on trial $k$ |
| Poisson probability | $P(X = k) = \dfrac{e^{-\lambda} \lambda^k}{k!}$ | $\lambda$ = average rate |
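
The probability formulas above translate almost directly into code. The following is a minimal sketch with hypothetical helper names (`binomial_pmf`, `geometric_pmf`, `poisson_pmf`); it also checks numerically that the general expected-value and variance sums reproduce $np$ and $np(1-p)$ for a binomial variable.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for n independent trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def geometric_pmf(k, p):
    """P(first success occurs on trial k)."""
    return (1 - p) ** (k - 1) * p

def poisson_pmf(k, lam):
    """P(X = k) when events occur at average rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

n, p = 10, 0.5  # arbitrary example parameters
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# E(X) = sum of x * P(x); Var(X) = sum of (x - mu)^2 * P(x)
expected = sum(k * pk for k, pk in enumerate(pmf))
variance = sum((k - expected) ** 2 * pk for k, pk in enumerate(pmf))

print(expected, n * p)            # both are the binomial mean
print(variance, n * p * (1 - p))  # both are the binomial variance
```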

Normal Distribution

| Formula | Use |
| --- | --- |
| $z = \dfrac{x - \mu}{\sigma}$ | Convert a value to a z-score |
| $x = \mu + z\sigma$ | Convert a z-score back to a value |
| Empirical rule: 68% within $\mu \pm \sigma$ | Approximate middle proportion |
| Empirical rule: 95% within $\mu \pm 2\sigma$ | Approximate large majority |
| Empirical rule: 99.7% within $\mu \pm 3\sigma$ | Nearly all data |
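
The empirical-rule percentages are rounded values of normal probabilities, which can be checked with `statistics.NormalDist` from the Python standard library. The mean of 100 and standard deviation of 15 below are arbitrary example values.

```python
from statistics import NormalDist

dist = NormalDist(mu=100, sigma=15)  # arbitrary example distribution

x = 130
z = (x - dist.mean) / dist.stdev     # value -> z-score
x_back = dist.mean + z * dist.stdev  # z-score -> value

# Probability of falling within k standard deviations of the mean
within = {
    k: dist.cdf(dist.mean + k * dist.stdev) - dist.cdf(dist.mean - k * dist.stdev)
    for k in (1, 2, 3)
}

print(z, x_back)
for k, prob in within.items():
    print(f"within {k} sd: {prob:.4f}")
```

The printed probabilities are slightly more precise than the 68%, 95%, and 99.7% figures, which are conventional roundings.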

Confidence Intervals

| Parameter | Formula | Conditions |
| --- | --- | --- |
| Mean ($\sigma$ known) | $\bar{x} \pm z^* \cdot \dfrac{\sigma}{\sqrt{n}}$ | Normal population or $n \geq 30$ |
| Mean ($\sigma$ unknown) | $\bar{x} \pm t^* \cdot \dfrac{s}{\sqrt{n}}$ | Normal population or $n \geq 30$; use $t$ with $df = n - 1$ |
| Proportion | $\hat{p} \pm z^* \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$ | $n\hat{p} \geq 10$ and $n(1-\hat{p}) \geq 10$ |
| Margin of error | $E = z^* \cdot \dfrac{\sigma}{\sqrt{n}}$ or $E = t^* \cdot \dfrac{s}{\sqrt{n}}$ | Half-width of the confidence interval |
| Sample size (mean) | $n = \left(\dfrac{z^* \cdot \sigma}{E}\right)^2$ | Minimum $n$ for desired margin of error |
| Sample size (proportion) | $n = \hat{p}(1-\hat{p})\left(\dfrac{z^*}{E}\right)^2$ | Use $\hat{p} = 0.5$ if unknown |
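
A worked example may help tie these formulas together. The sketch below builds a 95% interval for a mean with known $\sigma$ and then solves for sample sizes. All numbers are invented, the helper names are hypothetical, and sample sizes are rounded up because $n$ must be a whole number at least as large as the formula's result.

```python
import math

z_star = 1.960  # critical value for 95% confidence

# Confidence interval for a mean, sigma known (made-up summary statistics)
x_bar, sigma, n = 50.0, 10.0, 100
margin = z_star * sigma / math.sqrt(n)  # margin of error E
ci = (x_bar - margin, x_bar + margin)

def sample_size_mean(sigma, margin, z_star):
    """Smallest n giving at most the desired margin of error for a mean."""
    return math.ceil((z_star * sigma / margin) ** 2)

def sample_size_proportion(margin, z_star, p_hat=0.5):
    """Smallest n for a proportion; p_hat = 0.5 is the conservative default."""
    return math.ceil(p_hat * (1 - p_hat) * (z_star / margin) ** 2)

n_mean = sample_size_mean(sigma=15, margin=2, z_star=z_star)
n_prop = sample_size_proportion(margin=0.03, z_star=z_star)

print(ci, n_mean, n_prop)
```

Using $\hat{p} = 0.5$ maximizes $\hat{p}(1-\hat{p})$, so it gives the largest (safest) required sample size.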

Common Critical Values

| Confidence Level | $z^*$ |
| --- | --- |
| 90% | 1.645 |
| 95% | 1.960 |
| 99% | 2.576 |
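
These critical values come from the standard normal distribution: for confidence level $c$, $z^*$ is the point with area $(1 - c)/2$ in the upper tail. A small sketch using `statistics.NormalDist` reproduces the table; the function name `z_critical` is made up for this example.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value z* for a confidence level given as a fraction."""
    upper_tail = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - upper_tail)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z* = {z_critical(level):.3f}")
```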

Hypothesis Tests

| Test | Test Statistic | Use |
| --- | --- | --- |
| Z-test for proportion | $z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}}$ | Test a claim about a population proportion |
| Z-test for mean ($\sigma$ known) | $z = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$ | Test a claim about a population mean |
| T-test for mean ($\sigma$ unknown) | $t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$ | Test a claim about a mean; $df = n - 1$ |
| Two-sample t-test | $t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$ | Compare means of two independent groups |
| Paired t-test | $t = \dfrac{\bar{d}}{s_d / \sqrt{n}}$ | Compare means of matched pairs; $\bar{d}$ = mean difference |
| Chi-square goodness of fit | $\chi^2 = \sum \dfrac{(O - E)^2}{E}$ | Test if observed frequencies match expected |
| Chi-square independence | $\chi^2 = \sum \dfrac{(O - E)^2}{E}$ | Test if two categorical variables are related |
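
To show how two of these statistics are computed in practice, here is a minimal sketch with invented data: a z-test for a proportion (with a two-sided p-value from the standard normal distribution) and a chi-square goodness-of-fit statistic. Only the standard library is used, so t and chi-square p-values, which need other distributions, are left out.

```python
import math
from statistics import NormalDist

# Z-test for a proportion: made-up sample with 56 successes in 100 trials, H0: p = 0.5
p_hat, p0, n = 0.56, 0.50, 100
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided

# Chi-square goodness of fit: made-up counts against equal expected counts
observed = [18, 22, 20, 25, 15]
expected = [20, 20, 20, 20, 20]
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(z, p_value, chi_sq)
```

Note that the standard error in the z-test uses the hypothesized $p_0$, not $\hat{p}$, which is what distinguishes it from the confidence-interval formula for a proportion.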

Linear Regression

| Formula | Use |
| --- | --- |
| $\hat{y} = a + bx$ | Predicted value from the regression line |
| $b = \dfrac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$ | Slope of the least-squares line |
| $a = \bar{y} - b\bar{x}$ | Y-intercept of the least-squares line |
| $r = \dfrac{n\sum xy - \sum x \sum y}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$ | Correlation coefficient |
| $R^2 = r^2$ | Proportion of variance explained by the model |
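
The sum-based formulas above can be applied directly. This sketch computes the slope, intercept, and correlation for a small made-up data set, following the formulas term by term instead of calling a regression library.

```python
import math

x = [1, 2, 3, 4, 5]  # made-up predictor values
y = [2, 4, 5, 4, 5]  # made-up responses
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi**2 for xi in x)
sum_y2 = sum(yi**2 for yi in y)

# Slope and intercept of the least-squares line
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x**2)
a = sum_y / n - b * sum_x / n

# Correlation coefficient and proportion of variance explained
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x**2) * (n * sum_y2 - sum_y**2)
)
r_squared = r**2

def predict(x_new):
    """Predicted value y-hat = a + b * x."""
    return a + b * x_new

print(a, b, r, r_squared, predict(6))
```

The slope and the correlation share the same numerator, so they always have the same sign.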
