Organisation of Data
"Raw data is like raw ore. You have to refine it before it yields value."
1. Chapter Overview
Once data is COLLECTED, it must be ORGANISED. A MESS of individual responses must become a STRUCTURED dataset. This chapter covers: CLASSIFICATION (grouping data into categories), FREQUENCY DISTRIBUTIONS (how many observations fall in each category), and TYPES OF STATISTICAL SERIES (individual, discrete, continuous).
2. Classification of Data
What Is Classification?
- Grouping data into CATEGORIES or CLASSES based on shared characteristics
Types of Classification
| Type | Basis | Example |
|---|---|---|
| Geographical | Location | State-wise GDP, country-wise population |
| Chronological | Time | Year-wise inflation rate, month-wise rainfall |
| Qualitative | Attribute (non-numerical) | Gender (male/female), religion, occupation, caste |
| Quantitative | Numerical measurement | Income groups, age groups, height ranges |
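The four bases of classification differ only in which characteristic is used to form the groups. A minimal Python sketch makes this concrete. The survey records are made up, and the helper `classify` and the income-group width of 20000 are assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical survey records: (state, year, gender, monthly income)
records = [
    ("Kerala", 2021, "female", 18000),
    ("Punjab", 2021, "male", 32000),
    ("Kerala", 2022, "male", 41000),
    ("Punjab", 2022, "female", 12000),
]

def classify(rows, key):
    """Group rows into classes using the characteristic returned by key."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    return dict(groups)

# Geographical: basis is location
by_state = classify(records, key=lambda r: r[0])
# Chronological: basis is time
by_year = classify(records, key=lambda r: r[1])
# Qualitative: basis is a non-numerical attribute
by_gender = classify(records, key=lambda r: r[2])

# Quantitative: basis is a numerical measurement, grouped into
# income classes of an assumed width of 20000
def income_group(r):
    lower = (r[3] // 20000) * 20000
    return f"{lower}-{lower + 20000}"

by_income = classify(records, key=income_group)
```

Only the `key` function changes between the four cases. The grouping step itself is identical.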
3. Frequency Distribution
What Is It?
- A table that shows HOW MANY times each value (or range of values) occurs in a dataset
Key Terms
| Term | Meaning |
|---|---|
| Class | A group/category. Example: age group 10-19, 20-29, etc. |
| Class Interval | The RANGE of values in a class. Example: 10-19. |
| Class Limit | The two end values stated for a class: lower limit (10) and upper limit (19). |
| Class Frequency | The NUMBER of observations in that class |
| Class Width (Size) | Upper class boundary − Lower class boundary (equivalently, the gap between successive lower limits). For 10-19: 19.5 − 9.5 = 10, or 20 − 10 = 10 |
| Class Mid-Point | (Lower + Upper) ÷ 2. Mid-point of 10-19 = 14.5 |
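The width of 10 for the class 10-19 comes from the class boundaries (9.5 and 19.5). Subtracting the stated limits directly gives only 19 − 10 = 9. Below is a small sketch of both calculations, assuming whole-number classes where the next class starts one unit after the upper limit. The function name and its `gap` parameter are hypothetical.

```python
def inclusive_class_stats(lower, upper, gap=1):
    """Width and mid-point of an inclusive class such as 10-19.

    gap is the difference between the upper limit of one class and the
    lower limit of the next (1 for classes like 10-19, 20-29).
    """
    # Class boundaries sit half a gap outside the stated limits: 9.5 and 19.5
    lower_boundary = lower - gap / 2
    upper_boundary = upper + gap / 2
    width = upper_boundary - lower_boundary
    # The mid-point is the same whether limits or boundaries are used
    midpoint = (lower + upper) / 2
    return width, midpoint

width, midpoint = inclusive_class_stats(10, 19)
print(width, midpoint)  # 10.0 14.5
```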
Types of Frequency Distributions
| Type | Features | Example |
|---|---|---|
| Discrete (Ungrouped) | Variable takes only SPECIFIC, separate values (usually whole numbers). Each value is its own class. | Number of children per family (0, 1, 2, 3...) |
| Continuous (Grouped) | Variable can take ANY value in a range. Classes cover INTERVALS. | Height (150-160 cm, 160-170 cm, etc.) |
Inclusive vs Exclusive Classes
- Exclusive: Upper limit EXCLUDED from that class (belongs to next class). Example: 10-20, 20-30. An observation of exactly 20 goes in the SECOND class (20-30).
- Inclusive: Upper limit INCLUDED in that class. Example: 10-19, 20-29. An observation of 19 goes in the first class.
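- Continuous variables generally use the EXCLUSIVE method. With inclusive classes such as 10-19 and 20-29, a value like 19.5 would fall in the gap between the classes.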
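The two rules differ only in whether the upper limit is tested with `<` or `<=`. The following sketch uses the class sets from the examples above. The function names are hypothetical.

```python
def exclusive_class(x, classes):
    """Return the exclusive class (lower <= x < upper) containing x, else None."""
    for lower, upper in classes:
        if lower <= x < upper:
            return (lower, upper)
    return None

def inclusive_class(x, classes):
    """Return the inclusive class (lower <= x <= upper) containing x, else None."""
    for lower, upper in classes:
        if lower <= x <= upper:
            return (lower, upper)
    return None

exclusive = [(10, 20), (20, 30)]
inclusive = [(10, 19), (20, 29)]

print(exclusive_class(20, exclusive))    # (20, 30): 20 goes to the second class
print(inclusive_class(19, inclusive))    # (10, 19): 19 stays in the first class
print(inclusive_class(19.5, inclusive))  # None: 19.5 falls in the gap
print(exclusive_class(19.5, exclusive))  # (10, 20): no gap with exclusive classes
```

The last two lines show why continuous data is usually grouped with exclusive classes.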
4. Types of Statistical Series
| Type | How Data Is Presented |
|---|---|
| Individual Series | Raw data: each observation listed individually. (5, 8, 12, 7, 3...) |
| Discrete Series | Frequency table for discrete variable. X values + their frequencies. |
| Continuous Series | Frequency table for continuous variable. Class intervals + their frequencies. |
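Turning an individual series into a discrete series amounts to counting how often each distinct value occurs. Here is a minimal sketch using Python's standard `collections.Counter`. The family data is made up for illustration.

```python
from collections import Counter

# Individual series: hypothetical number of children in 10 families
individual = [2, 1, 0, 3, 2, 2, 1, 4, 0, 2]

# Discrete series: each distinct value X with its frequency f, sorted by X
discrete = sorted(Counter(individual).items())
for x, f in discrete:
    print(x, f)
```

For this data, the discrete series is X = 0, 1, 2, 3, 4 with frequencies 2, 2, 4, 1, 1. The frequencies sum to the 10 original observations.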
5. How to Construct a Frequency Distribution
- Determine the RANGE: Highest value − Lowest value
- Decide the NUMBER of classes (typically 5-15, depending on data)
- Determine class WIDTH: Range ÷ Number of classes (round up)
- Set class LIMITS. Start at or below the lowest value.
- SORT each observation into its class
- COUNT the frequency for each class
- Optional: add columns for relative frequency (%), cumulative frequency (running total)
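The construction steps can be sketched in Python as a continuous series with exclusive classes. Two details are assumptions: the class limits start at the lowest value rounded down to a multiple of the width (one way of starting "at or below the lowest value"), and the function name `frequency_distribution` and the marks data are hypothetical.

```python
import math

def frequency_distribution(data, num_classes):
    """Exclusive-class table: rows of (lower, upper, f, relative %, cumulative f)."""
    low, high = min(data), max(data)
    data_range = high - low                              # step 1: range
    width = max(1, math.ceil(data_range / num_classes))  # step 3: width, rounded up
    # Step 4: start at or below the lowest value (rounded down to a multiple
    # of the width), then add classes until the highest value is covered.
    # Because of this rounding, the final count may differ from num_classes.
    classes = []
    lower = (low // width) * width
    while lower <= high:
        classes.append((lower, lower + width))
        lower += width
    rows, cumulative = [], 0
    for lo, up in classes:
        # Steps 5-6: sort observations into [lo, up) and count them
        f = sum(1 for x in data if lo <= x < up)
        cumulative += f
        # Optional columns: relative frequency (%) and running total
        rows.append((lo, up, f, 100 * f / len(data), cumulative))
    return rows

# Hypothetical marks of 12 students
marks = [12, 45, 33, 27, 8, 19, 41, 36, 22, 15, 30, 47]
for lo, up, f, rel, cum in frequency_distribution(marks, num_classes=4):
    print(f"{lo}-{up}\t{f}\t{rel:.1f}%\t{cum}")
```

With these marks, the range is 39 and the width is 10. The classes come out as 0-10, 10-20, 20-30, 30-40 and 40-50, with frequencies 1, 3, 2, 3, 3 and cumulative frequencies 1, 4, 6, 9, 12. That gives five classes rather than the four requested, because the starting point was rounded down to 0.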
6. Exam Focus
- Types of classification (geographical, chronological, qualitative, quantitative)
- Frequency distribution: key terms (class, interval, limit, frequency, width, mid-point)
- Discrete vs Continuous series
- Inclusive vs Exclusive classes
- Constructing a frequency distribution from raw data
7. Conclusion
Organisation is the bridge between RAW DATA and MEANINGFUL ANALYSIS:
- CLASSIFICATION groups data into meaningful categories
- FREQUENCY DISTRIBUTION shows the PATTERN — where observations cluster, how spread out they are
- Before you can calculate an average or draw a graph, you must first ORGANISE
'Data that is not organised is not data — it is noise.'
