Measures of Central Tendency
"'Average' is the most dangerous word in statistics — because it means at least THREE different things."
1. Chapter Overview
A MEASURE OF CENTRAL TENDENCY is a SINGLE VALUE that represents the CENTRE of a dataset. There are THREE main measures: Arithmetic Mean (the common 'average'), Median (the middle value), and Mode (the most frequent value). Each has different properties, calculation methods, and situations where it's the BEST choice.
2. Arithmetic Mean (AM, X̄)
What Is It?
- Sum of all values ÷ Number of values
- The most COMMONLY USED measure of central tendency
Calculation
- Individual series: X̄ = ΣX / N (Sum all values, divide by count)
- Discrete series: X̄ = ΣfX / Σf (Multiply each value by its frequency, sum, divide by total frequency)
- Continuous series: X̄ = ΣfM / Σf (M = mid-point of each class)
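The three formulas above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the sample data are made up for this example, and the continuous-series version assumes each class is given as a (lower, upper) pair.

```python
def mean_individual(values):
    """X̄ = ΣX / N for raw observations."""
    return sum(values) / len(values)


def mean_discrete(values, freqs):
    """X̄ = ΣfX / Σf for values paired with their frequencies."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)


def mean_continuous(class_limits, freqs):
    """X̄ = ΣfM / Σf, where M is the mid-point of each (lower, upper) class."""
    midpoints = [(lo + hi) / 2 for lo, hi in class_limits]
    return mean_discrete(midpoints, freqs)


# Hypothetical data, for illustration only.
print(mean_individual([4, 8, 6, 2]))                               # 20 / 4 = 5.0
print(mean_discrete([1, 2, 3], [2, 3, 5]))                         # 23 / 10 = 2.3
print(mean_continuous([(0, 10), (10, 20), (20, 30)], [2, 5, 3]))   # 160 / 10 = 16.0
```

Note that the continuous-series result is an approximation: it treats every observation in a class as if it sat at the class mid-point.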
Weighted Arithmetic Mean
When different values have different importance (weights):
- Weighted Mean (X̄_w) = ΣwX / Σw (multiply each value by its weight, sum, divide by total weight)
- Example: If subject marks are weighted by credit hours, use weighted mean — not simple mean
- Used in calculating index numbers, GPA, stock market indices
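As a small sketch of the credit-hours example, the snippet below compares the weighted mean with the simple mean. The marks and credit values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def weighted_mean(values, weights):
    """X̄_w = ΣwX / Σw."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Hypothetical marks in three subjects and their credit hours.
marks = [80, 60, 90]
credits = [4, 3, 1]

print(weighted_mean(marks, credits))   # (320 + 180 + 90) / 8 = 73.75
print(sum(marks) / len(marks))         # simple mean, about 76.67
```

The simple mean is higher here because it gives the 1-credit subject (90 marks) the same importance as the 4-credit subject.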
Properties
- Every value is used in the calculation → affected by EXTREME VALUES (outliers)
- The sum of deviations around the mean = ZERO: Σ(X − X̄) = 0
- The sum of SQUARED deviations around the mean is MINIMUM (compared to any other value)
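The two deviation properties can be checked numerically. The sketch below uses a small made-up dataset and compares the sum of squared deviations around the mean with the same sum around two other candidate values; it illustrates the property for this data and is not a proof.

```python
# Hypothetical dataset, for illustration only.
data = [2, 4, 4, 5, 10]
mean = sum(data) / len(data)          # 25 / 5 = 5.0


def sum_sq_dev(centre):
    """Sum of squared deviations of the data around `centre`."""
    return sum((x - centre) ** 2 for x in data)


print(sum(x - mean for x in data))    # 0.0  (deviations cancel out)
print(sum_sq_dev(mean))               # 36.0 (smallest of the three)
print(sum_sq_dev(4))                  # 41   (around the median)
print(sum_sq_dev(6))                  # 41   (around another value)
```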
Pros and Cons
- ✓ Uses ALL data. Easy to understand and compute. Mathematically tractable.
- ✗ Sensitive to EXTREME VALUES (a single billionaire raises the 'average income' of a village). Cannot be used for QUALITATIVE data.
3. Median
What Is It?
- The MIDDLE value when data is arranged in ascending/descending ORDER
- Splits the ordered dataset into TWO EQUAL PARTS: about 50% of observations lie at or below the median and about 50% at or above it.
Calculation
- Individual series: Sort the data. If N is ODD → median = the ((N+1)/2)th value. If N is EVEN → median = average of the (N/2)th and (N/2 + 1)th values.
- Continuous series: Median = L + [(N/2 − CF) / f] × h, where N = total frequency (Σf), L = lower limit of median class, CF = cumulative frequency before median class, f = frequency of median class, h = class width.
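Both median rules can be sketched as follows. The function names and the frequency table are hypothetical, and the continuous-series version assumes the classes are listed in ascending order as (lower, upper) pairs with a non-zero total frequency.

```python
def median_individual(values):
    """Middle value of the sorted data (average of the two middle values if N is even)."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                  # the ((N+1)/2)th value, counting from 1
    return (s[mid - 1] + s[mid]) / 2   # average of the (N/2)th and (N/2 + 1)th values


def median_continuous(class_limits, freqs):
    """Median = L + [(N/2 - CF) / f] × h for a grouped frequency table."""
    n = sum(freqs)
    cf = 0                             # cumulative frequency before the current class
    for (lo, hi), f in zip(class_limits, freqs):
        if cf + f >= n / 2:            # first class whose cumulative frequency reaches N/2
            return lo + (n / 2 - cf) * (hi - lo) / f
        cf += f


# Hypothetical data, for illustration only.
print(median_individual([7, 3, 9]))       # 7
print(median_individual([7, 3, 9, 1]))    # (3 + 7) / 2 = 5.0

classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
freqs = [5, 10, 20, 5]                    # N = 40, N/2 = 20, median class is 20-30
print(median_continuous(classes, freqs))  # 20 + (20 - 15) * 10 / 20 = 22.5
```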
When to Use Median
- When there are EXTREME OUTLIERS (income distribution — a few billionaires). Median income is more representative than mean income.
- For ORDINAL data (rankings, ratings).
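The effect of an outlier on the two measures can be seen with Python's standard `statistics` module. The income figures below are invented for illustration (say, in thousands per month); only the comparison matters.

```python
import statistics

# Hypothetical incomes; the last entry added below is a single extreme value.
incomes = [20, 22, 25, 27, 30]
with_outlier = incomes + [1000]

print(statistics.mean(incomes), statistics.median(incomes))
# 24.8 25

print(round(statistics.mean(with_outlier), 2), statistics.median(with_outlier))
# 187.33 26.0
```

One extreme value moves the mean from 24.8 to about 187, while the median only shifts from 25 to 26.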
4. Mode
What Is It?
- The value that occurs MOST FREQUENTLY in a dataset
- The 'most popular' value
Calculation
- Individual/Discrete: Count frequencies. Highest frequency = mode. (Can have multiple modes: bimodal, multimodal.)
- Continuous series: Mode = L + [(f₁ − f₀) / (2f₁ − f₀ − f₂)] × h, where L = lower limit of modal class, f₁ = frequency of modal class, f₀ = frequency of class before modal class, f₂ = frequency of class after modal class, h = class width.
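A minimal sketch of both mode calculations is given below. The names and data are hypothetical. Two assumptions are built in and are not part of the formula above: if several classes tie for the highest frequency, the first one is taken as the modal class, and a missing neighbouring class (when the modal class is first or last) is treated as having frequency 0.

```python
from collections import Counter


def modes_discrete(values):
    """All values that share the highest frequency (one or more modes)."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


def mode_continuous(class_limits, freqs):
    """Mode = L + [(f1 - f0) / (2*f1 - f0 - f2)] × h for a grouped frequency table."""
    i = max(range(len(freqs)), key=freqs.__getitem__)   # modal class (first one on ties)
    lo, hi = class_limits[i]
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0                   # assumption: no class before -> 0
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0      # assumption: no class after -> 0
    return lo + (f1 - f0) * (hi - lo) / (2 * f1 - f0 - f2)


# Hypothetical data, for illustration only.
print(modes_discrete([7, 8, 8, 9, 9, 10]))   # [8, 9]  (bimodal)

classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
freqs = [5, 10, 20, 5]                       # modal class is 20-30
print(mode_continuous(classes, freqs))       # 20 + (20 - 10) * 10 / 25 = 24.0
```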
When to Use Mode
- When you want to know what's MOST COMMON (shopkeeper wants to stock the most popular shoe size)
- For NOMINAL data (categories, attributes — 'which religion is the modal category?')
5. Which Measure to Use — When?
| Situation | Best Measure |
|---|---|
| Data is symmetrical, no outliers | MEAN |
| Data is skewed (income, wealth) | MEDIAN |
| Want to know the most common value | MODE |
| Qualitative / nominal data | MODE |
| Further mathematical computation needed | MEAN |
Empirical Relationship (for moderately asymmetrical distributions)
Mode ≈ 3 Median − 2 Mean
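As a quick worked example of the empirical relationship, the sketch below estimates the mode from a given mean and median. The summary values are hypothetical, and the result is only an approximation that is reasonable for moderately asymmetrical distributions.

```python
def estimate_mode(mean, median):
    """Empirical estimate: Mode ≈ 3 × Median − 2 × Mean."""
    return 3 * median - 2 * mean


# Hypothetical summary values for a moderately right-skewed distribution.
mean, median = 50, 48
print(estimate_mode(mean, median))   # 3*48 - 2*50 = 44
```

The same relationship can be rearranged as Mean − Mode ≈ 3 × (Mean − Median): here 50 − 44 = 6 = 3 × (50 − 48).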
Relationship Between AM, GM, and HM
For any set of positive numbers: AM ≥ GM ≥ HM (equality holds only when all values are equal)
- AM (Arithmetic Mean) = ΣX/N
- GM (Geometric Mean) = (X₁ × X₂ × ... × X_N)^(1/N)
- HM (Harmonic Mean) = N / Σ(1/X)
- Example: For 2 and 8: AM = 5, GM = 4, HM = 3.2 → 5 ≥ 4 ≥ 3.2 ✓
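The example with 2 and 8 can be reproduced with a short sketch. The function names are illustrative, the inputs are assumed to be positive numbers, and `math.prod` requires Python 3.8 or later.

```python
import math


def arithmetic_mean(xs):
    """AM = ΣX / N."""
    return sum(xs) / len(xs)


def geometric_mean(xs):
    """GM = (X1 × X2 × ... × XN)^(1/N), for positive values."""
    return math.prod(xs) ** (1 / len(xs))


def harmonic_mean(xs):
    """HM = N / Σ(1/X), for positive values."""
    return len(xs) / sum(1 / x for x in xs)


values = [2, 8]
am, gm, hm = arithmetic_mean(values), geometric_mean(values), harmonic_mean(values)
print(am, gm, hm)        # 5.0 4.0 3.2
print(am >= gm >= hm)    # True
```

For long lists or very large values, the running product can overflow or lose precision; averaging logarithms is a common alternative for the geometric mean.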
6. Exam Focus
- Mean — formulas for individual/discrete/continuous series, properties
- Median — middle value, formula for continuous series, when to use
- Mode — most frequent value, when to use
- Comparison — mean vs median vs mode: which to use when (symmetrical vs skewed vs qualitative data)
- Empirical relationship: Mode ≈ 3 Median − 2 Mean
7. Common Mistakes
- Using the mean for income data without checking for outliers: a few very high incomes SKEW the mean upward. The MEDIAN is MORE REPRESENTATIVE for skewed distributions like income and wealth.
- Assuming the mode is always valid: some datasets have NO mode (all values equally frequent) or MULTIPLE modes. Mode is only meaningful when there's a CLEAR concentration.
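One way to guard against the second mistake is to report a mode only when a single value clearly has the highest frequency. The helper below is a hypothetical sketch of that check. Returning `None` when there is no single winner is a design choice made for this example, not a standard convention, and the input is assumed to be non-empty.

```python
from collections import Counter


def single_mode_or_none(values):
    """Return the mode if exactly one value has the highest frequency, else None."""
    counts = Counter(values)
    top = max(counts.values())
    winners = [v for v, c in counts.items() if c == top]
    return winners[0] if len(winners) == 1 else None


# Hypothetical data, for illustration only.
print(single_mode_or_none([1, 2, 2, 3]))      # 2
print(single_mode_or_none([1, 2, 3]))         # None (all values equally frequent)
print(single_mode_or_none([1, 1, 2, 2, 3]))   # None (two modes)
```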
8. Conclusion
'Average' is not ONE number — it's a CHOICE:
- MEAN: Mathematical centre. Use when data is roughly symmetrical.
- MEDIAN: The middle. Use when data is SKEWED (income, house prices, wealth).
- MODE: The most frequent. Use when you want to know what's MOST COMMON, or when the data is qualitative.
Knowing WHICH average to use — and why — is the difference between statistical literacy and statistical deception.
