Describing Distributions
When someone hands you a dataset, the first step is to describe its distribution. Before you calculate anything, you need a clear picture of how the data is spread out. A good description tells the reader what the data looks like, where the center is, how spread out the values are, and whether anything unusual is going on. Statisticians have a simple framework for doing this consistently every time.
The SOCS Framework
SOCS is a checklist for describing any distribution:
| Letter | Stands For | What to Address |
|---|---|---|
| S | Shape | Is the distribution symmetric, skewed, or bimodal? |
| O | Outliers | Are there any unusually high or low values? |
| C | Center | What is a typical value? (mean or median) |
| S | Spread | How much do the values vary? (range, IQR, standard deviation) |
When you describe a distribution, address all four elements in order. This ensures you never miss an important feature of the data.
Shape of a Distribution
The shape of a distribution describes the overall pattern of the data when you look at a histogram or dot plot. There are five common shapes you should know.
Symmetric — The left side is roughly a mirror image of the right side. The mean and median are approximately equal. Example: test scores on a well-designed exam typically form a symmetric bell shape.
Skewed Right (Positively Skewed) — The bulk of the data is on the left, with a long tail stretching to the right. The mean is greater than the median because the tail pulls the mean toward higher values. Example: household income in the United States is skewed right — most households earn moderate amounts, but a few very high earners stretch the tail far to the right.
Skewed Left (Negatively Skewed) — The bulk of the data is on the right, with a long tail stretching to the left. The mean is less than the median because the tail pulls the mean toward lower values. Example: age at retirement is skewed left — most people retire around 62 to 67, but a few retire much earlier, creating a left tail.
Bimodal — The distribution has two distinct peaks. This often indicates that the data comes from two different groups mixed together. Example: the heights of a mixed male and female group can show two peaks, one near the average female height and one near the average male height.
Uniform — All values occur with roughly equal frequency. There is no peak and no tail. Example: the outcomes of rolling a fair die — each face (1 through 6) comes up about equally often.
[Figure: Three common distribution shapes]
How Shape Affects Center
The shape of a distribution determines the relationship between the mean and the median. This is one of the most important ideas in descriptive statistics because it tells you which measure of center to trust.
| Shape | Relationship | Which Measure to Report |
|---|---|---|
| Symmetric | Mean \(\approx\) Median | Either (mean is standard) |
| Skewed Right | Mean \(>\) Median | Median (resistant to tail) |
| Skewed Left | Mean \(<\) Median | Median (resistant to tail) |
Why does this happen? The mean is calculated using every value in the dataset, so extreme values in the tail pull it toward them. The median only depends on the position of the middle value, so it is not affected by how far away the extremes are. When data is skewed, the median is a better representation of a “typical” value.
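The table's rule of thumb can be written as a short Python sketch. The function name `likely_shape` and the 2% tolerance `rel_tol` for calling the mean and median "approximately equal" are assumptions made for illustration. The comparison only suggests a shape, so confirm it with a histogram or dot plot.

```python
def likely_shape(mean, median, rel_tol=0.02):
    """Guess a distribution's shape from how its mean compares with its median.

    rel_tol is an assumed cutoff: a gap within 2% of the median counts as
    "approximately equal", which points to a roughly symmetric shape.
    """
    gap = mean - median
    if abs(gap) <= rel_tol * abs(median):
        return "approximately symmetric"
    # A long tail pulls the mean toward it, away from the median.
    return "skewed right" if gap > 0 else "skewed left"


print(likely_shape(74, 74))            # Practice Problem 1: approximately symmetric
print(likely_shape(385_000, 320_000))  # Practice Problem 2: skewed right
```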
Effects of Outliers
An outlier is a value that is unusually far from the rest of the data. Outliers have a dramatic effect on some statistics and almost no effect on others.
Example 1: Salary Data
Consider the salaries (in dollars) of seven employees at a small company:
The last value ($200,000) is an outlier — likely the owner’s salary.
The table below compares the mean and median with the outlier included (all 7 values) and without it (first 6 values only):
| Statistic | With Outlier | Without Outlier | Change |
|---|---|---|---|
| Mean | $64,000 | $41,333 | $22,667 shift |
| Median | $42,000 | $41,000 | $1,000 shift |
The mean jumped by over $22,000 because of a single outlier. The median barely moved. This is why the median is called a resistant measure — it resists the influence of extreme values.
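Because the mean uses every value, you can recover its shift from the table alone. Multiplying a mean by the count gives the total, so removing one value \(x\) from \(n\) values with mean \(m\) leaves a mean of \((nm - x)/(n - 1)\). Here is a minimal Python sketch (the name `mean_without` is hypothetical):

```python
def mean_without(mean_all, n, removed):
    """Mean of the remaining n - 1 values after one value is removed."""
    total = mean_all * n                 # mean times count recovers the sum
    return (total - removed) / (n - 1)


new_mean = mean_without(64_000, 7, 200_000)
print(round(new_mean, 2))            # 41333.33
print(round(64_000 - new_mean, 2))   # 22666.67, the shift in the table
```

No shortcut like this exists for the median. It depends on which values sit in the middle, not on the total.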
| Statistic | Resistant? | Why |
|---|---|---|
| Mean | No | Uses every value in the calculation — outliers pull it |
| Median | Yes | Depends only on position, not on the magnitude of extremes |
| Range | No | Directly determined by the most extreme values |
| IQR | Yes | Based on quartiles, which ignore extremes |
| Standard Deviation | No | Squaring deviations magnifies the effect of outliers |
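To see the table in action, the Python sketch below computes all five statistics for a small dataset, with and without one extreme value. The data is hypothetical and invented only for illustration. The quartiles use the median-of-halves method from Example 2; other software, including Python's `statistics.quantiles`, may interpolate quartiles differently.

```python
import statistics


def median_of_halves_quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves (middle value excluded when n is odd)."""
    values = sorted(data)
    half = len(values) // 2
    lower = values[:half]
    upper = values[half + len(values) % 2:]
    return statistics.median(lower), statistics.median(upper)


def summarize(data):
    """The five statistics from the resistance table."""
    q1, q3 = median_of_halves_quartiles(data)
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "range": max(data) - min(data),
        "IQR": q3 - q1,
        "std dev": statistics.stdev(data),  # sample standard deviation
    }


# Hypothetical data, invented for illustration only.
base = [3, 5, 6, 7, 9]
with_outlier = base + [50]

before, after = summarize(base), summarize(with_outlier)
for name in before:
    print(f"{name:>8}: {before[name]:6.2f} -> {after[name]:6.2f}")
```

Adding the 50 has a large effect on three statistics. The mean rises from 6 to about 13.3, the range from 6 to 47, and the standard deviation from about 2.2 to about 18.1. The median only moves from 6 to 6.5, and the IQR stays at 4.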
Putting It All Together: Describing a Distribution
Now let’s use the full SOCS framework to describe a real dataset.
Example 2: Patient Wait Times
A clinic recorded the wait times (in minutes) for 15 patients on a Monday morning:
Shape: The distribution is skewed right. Most patients wait between 5 and 25 minutes, but there is a long tail of longer wait times stretching up to 60 minutes.
Outliers: To check for outliers, we need the quartiles and the \(1.5 \times \text{IQR}\) rule.
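There are 15 values, so the median is the 8th value: 20 minutes.
Lower half (values 1 through 7): the median of these 7 values is the 4th, so \(Q_1 = 12\).
Upper half (values 9 through 15): the median of these 7 values is the 4th, so \(Q_3 = 35\).
That gives \(\text{IQR} = Q_3 - Q_1 = 35 - 12 = 23\), a lower fence of \(Q_1 - 1.5 \times \text{IQR} = 12 - 34.5 = -22.5\), and an upper fence of \(Q_3 + 1.5 \times \text{IQR} = 35 + 34.5 = 69.5\). Since all values fall between \(-22.5\) and \(69.5\), there are no outliers by the \(1.5 \times \text{IQR}\) rule. The values 50 and 60 are on the high end but do not exceed the upper fence.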
Center: The median wait time is 20 minutes. Because the distribution is skewed right, the median is a better measure of center than the mean. (The mean would be pulled higher by the long right tail.)
Spread: The range is the longest wait (60 minutes) minus the shortest wait, and the IQR is \(35 - 12 = 23\) minutes. The middle 50% of patients waited between 12 and 35 minutes.
Full description: “The distribution of patient wait times is skewed right with no outliers. The median wait time is 20 minutes, with an IQR of 23 minutes (from 12 to 35 minutes). While most patients are seen within 25 minutes, a few patients waited considerably longer, up to 60 minutes.”
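The fence arithmetic from the Outliers step can be sketched in Python. The helper names `iqr_fences` and `flag_outliers` are hypothetical. The sketch starts from the quartiles found above (\(Q_1 = 12\), \(Q_3 = 35\)), so it only checks the two high values the text mentions.

```python
def iqr_fences(q1, q3, k=1.5):
    """Lower and upper fences for the k x IQR outlier rule (k = 1.5 by convention)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, q1, q3, k=1.5):
    """Return the values that fall strictly outside the fences."""
    low, high = iqr_fences(q1, q3, k)
    return [v for v in values if v < low or v > high]


# Quartiles from Example 2: Q1 = 12 minutes, Q3 = 35 minutes.
print(iqr_fences(12, 35))               # (-22.5, 69.5)
print(flag_outliers([50, 60], 12, 35))  # [] -> neither value is an outlier
```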
Real-World Application: Nursing — Describing Blood Glucose Levels
A nurse collects fasting blood glucose readings (mg/dL) from 12 patients in a diabetes screening clinic:
Using the SOCS framework:
Shape: Skewed right — most readings cluster between 82 and 130, but there is a long tail to the right caused by the 210 reading.
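Outliers: With 12 values, the median is the average of the 6th and 7th values, 102.5 mg/dL. Lower half (values 1 through 6): the average of its 3rd and 4th values gives \(Q_1 = 93\). Upper half (values 7 through 12): the average of its 3rd and 4th values gives \(Q_3 = 124\). So \(\text{IQR} = 124 - 93 = 31\), the lower fence is \(93 - 1.5 \times 31 = 46.5\), and the upper fence is \(124 + 1.5 \times 31 = 170.5\).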
The value 210 exceeds 170.5 — it is an outlier. This patient likely has uncontrolled diabetes and should be flagged for follow-up.
Center: Median = 102.5 mg/dL. The median is appropriate here because of the right skew and the outlier.
Spread: IQR = 31 mg/dL. The range (210 mg/dL minus the lowest reading) is inflated by the outlier, so the IQR is the better measure of spread here.
Clinical note: Normal fasting glucose is 70 to 99 mg/dL. Pre-diabetes is 100 to 125. Diabetes is 126 or higher. This distribution shows that one-third of the patients in the screening are in the pre-diabetic range, with at least one requiring urgent intervention.
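The clinical cutoffs translate directly into a lookup. In this Python sketch, the function name `fasting_glucose_category` is hypothetical. The note lists whole-number ranges, so it is an assumption that a fractional reading between them (for example 99.5) falls into the lower category.

```python
def fasting_glucose_category(mg_dl):
    """Classify a fasting glucose reading (mg/dL) using the cutoffs in the clinical note."""
    if mg_dl < 70:
        return "below the 70-99 normal range"
    if mg_dl < 100:      # 70 to 99
        return "normal"
    if mg_dl < 126:      # 100 to 125
        return "pre-diabetes"
    return "diabetes"    # 126 or higher


# Readings mentioned in the example: the low end, the high end of the cluster, and the outlier.
for reading in (82, 130, 210):
    print(reading, fasting_glucose_category(reading))
```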
Practice Problems
Test your understanding with these problems.
Problem 1: A dataset of exam scores has a mean of 74 and a median of 74. What can you say about the shape of the distribution?
When the mean and median are approximately equal, the distribution is likely symmetric. The mean is not being pulled in either direction by a tail, which indicates the data is roughly evenly distributed on both sides of center.
Answer: The distribution is approximately symmetric.
Problem 2: Home sale prices in a neighborhood have a mean of $385,000 and a median of $320,000. Describe the likely shape and explain why.
The mean ($385,000) is considerably larger than the median ($320,000). This happens when a few very expensive homes pull the mean upward while the median remains anchored at the middle position.
Answer: The distribution is skewed right (positively skewed). A few high-priced homes create a long right tail, pulling the mean above the median.
Problem 3: Dataset: 10, 12, 13, 14, 14, 15, 15, 16, 80. Calculate the mean and median. Which better represents a “typical” value?
Mean = \(189 / 9 = 21\). Median = the 5th of the 9 sorted values = 14. The mean of 21 is higher than 8 of the 9 values, so it does not represent a typical value at all. The outlier (80) inflated it.
Answer: The median of 14 is the better measure of center. The outlier at 80 pulls the mean to 21, which is misleadingly high.
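Python's standard `statistics` module can check this arithmetic. The variable names are illustrative. Dropping the 80 shows the median holding steady while the mean falls.

```python
import statistics

data = [10, 12, 13, 14, 14, 15, 15, 16, 80]

mean = statistics.mean(data)      # 189 / 9
median = statistics.median(data)  # middle (5th) of the 9 sorted values
below_mean = sum(1 for x in data if x < mean)
print(f"mean={mean}, median={median}, values below the mean: {below_mean} of {len(data)}")

# Drop the outlier: the mean falls sharply, the median does not change.
trimmed = data[:-1]
print(f"without 80: mean={statistics.mean(trimmed)}, median={statistics.median(trimmed)}")
```

The first line reports a mean of 21 and a median of 14. With 80 removed, the mean drops to 13.625 while the median stays at 14.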
Problem 4: Describe this dataset using the SOCS framework: 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8.
Shape: Roughly symmetric, slightly mounded in the center. Values peak around 5 and taper off on both sides.
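Outliers: \(Q_1 = 4\) (median of the first 8 values: \((4 + 4)/2 = 4\)), \(Q_3 = 6\) (median of the last 8 values: \((6 + 6)/2 = 6\)). \(\text{IQR} = 6 - 4 = 2\). Lower fence = \(4 - 1.5 \times 2 = 1\). Upper fence = \(6 + 1.5 \times 2 = 9\). All values are between 1 and 9, so there are no outliers.
Center: Median = \((5 + 5)/2 = 5\) (average of the 8th and 9th values). Mean = \(80 / 16 = 5\). Mean and median agree, consistent with a symmetric shape.
Spread: Range = \(8 - 2 = 6\). IQR = \(6 - 4 = 2\).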
Answer: The distribution is approximately symmetric with no outliers, centered at 5, with a range of 6 and an IQR of 2.
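The numeric parts of a SOCS description can be bundled into one function. This Python sketch is an illustration, not a standard routine. The name `socs_numbers` is hypothetical, and the quartiles use the same median-of-halves method as the worked examples. Shape is left out because it is best judged from a plot.

```python
import statistics


def socs_numbers(data, k=1.5):
    """Numeric pieces of a SOCS description (shape still needs a plot)."""
    values = sorted(data)
    n = len(values)
    lower = values[: n // 2]          # first half
    upper = values[(n + 1) // 2:]     # second half; skips the middle value when n is odd
    q1, q3 = statistics.median(lower), statistics.median(upper)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    return {
        "outliers": [v for v in values if v < low_fence or v > high_fence],
        "fences": (low_fence, high_fence),
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        "range": values[-1] - values[0],
        "Q1": q1,
        "Q3": q3,
        "IQR": iqr,
    }


problem4 = [2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8]
print(socs_numbers(problem4))
```

The output matches the worked answer: no outliers, fences at 1 and 9, median and mean of 5, range 6, and IQR 2. With an odd count, the middle value is left out of both halves, just as in Example 2.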
Problem 5: Why would a retail manager prefer the median over the mean when reporting “typical” daily sales, given that Black Friday and holiday sales are included in the dataset?
Black Friday and holiday sales are extreme high values (outliers) that would pull the mean upward, making the “typical” day seem much more profitable than it actually is. The median is resistant to these outliers and gives a more accurate picture of what a normal day’s sales look like.
Answer: The median is preferred because holiday sales are outliers that inflate the mean. The median reflects a typical day without being distorted by a few extreme values.
Key Takeaways
- Use the SOCS framework (Shape, Outliers, Center, Spread) to describe any distribution systematically
- Symmetric distributions have mean \(\approx\) median; skewed right distributions have mean \(>\) median; skewed left distributions have mean \(<\) median
- Outliers strongly affect the mean, range, and standard deviation, but have little impact on the median and IQR
- When data is skewed or has outliers, the median and IQR are better measures of center and spread than the mean and standard deviation
- Always describe the context — a distribution is not just numbers, it’s measurements of something real (wait times, salaries, blood glucose levels)
Last updated: March 29, 2026