Collection of Data
"Garbage in, garbage out. The quality of your analysis depends on the quality of your data."
1. Chapter Overview
Before you can ANALYSE data, you must COLLECT it. This chapter covers the TWO SOURCES of data (primary and secondary), METHODS of collecting primary data, the difference between CENSUS and SAMPLE surveys, TYPES of sampling (random and non-random), key SOURCES of secondary data in India, and how pilot surveys and questionnaires are designed.
2. Primary vs Secondary Data
| | Primary Data | Secondary Data |
|---|---|---|
| Definition | Data collected by the investigator FIRST-HAND for their SPECIFIC purpose | Data ALREADY collected by someone else for SOME OTHER purpose |
| Examples | A survey you conduct to study student spending habits | Census of India data. NSSO consumption survey. RBI bulletins. |
| Advantages | Tailored to YOUR question. You control how it is collected, so you know its quality. | Cheap, fast. Covers large populations and long time periods. |
| Disadvantages | Expensive, time-consuming. Requires fieldwork. | May not perfectly fit YOUR question. Quality may be uncertain. |
3. Methods of Collecting Primary Data
A. Census vs Sample
| | Census (Complete Enumeration) | Sample Survey |
|---|---|---|
| What | Survey EVERY unit in the population | Survey a SUBSET (sample) and INFER about the whole population |
| Cost | VERY EXPENSIVE | Cheaper |
| Time | Very TIME-CONSUMING | Faster |
| Accuracy | No sampling error. In practice, non-sampling errors (mistakes in counting and recording) creep into huge operations. | Sampling error exists, but CAN be estimated. A well-designed sample is reliable. |
| When Used | Population is small, or every unit must be covered with high precision (e.g., the national Census). | Population is large, or a census is impractical. |
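The Accuracy row says sampling error CAN be measured. Below is a minimal Python sketch of how, assuming a simple random sample drawn without replacement. The standard error of the sample mean is estimated as s / sqrt(n) and scaled by the finite population correction sqrt(1 - n/N). The household spending figures and the function name `mean_with_standard_error` are hypothetical, invented for illustration.

```python
import math
import random
import statistics


def mean_with_standard_error(sample, population_size):
    """Estimate a population mean and its standard error from a simple random
    sample drawn WITHOUT replacement from `population_size` units (hypothetical helper)."""
    n = len(sample)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)              # sample standard deviation (divisor n - 1)
    fpc = math.sqrt(1 - n / population_size)  # finite population correction
    return mean, fpc * s / math.sqrt(n)


# Hypothetical population: monthly spending (rupees) of 10,000 households.
rng = random.Random(42)
population = [rng.uniform(5_000, 25_000) for _ in range(10_000)]

sample = rng.sample(population, 400)
mean, se = mean_with_standard_error(sample, len(population))
print(f"Sample estimate: {mean:,.0f} +/- {1.96 * se:,.0f} (approx. 95% interval)")
print(f"True mean:       {statistics.mean(population):,.0f}")
```

When n = N, the correction factor becomes 0, so the estimated sampling error is 0. That case is a census. The formula says nothing about non-sampling errors such as mistakes in counting or recording.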
B. Methods of Sampling
Random (Probability) Sampling
- Each unit has a KNOWN, NON-ZERO probability of being selected
- Simple Random Sampling: Each unit has an EQUAL chance of selection, like drawing names from a hat.
- Stratified Sampling: The population is first divided into GROUPS (strata), e.g. by age, gender or income. A random sample is then drawn from EACH stratum. This ensures all groups are represented.
- Systematic Sampling: Pick a random starting unit among the first K, then select every Kth unit after it (every 10th house on a street, every 100th name on a list).
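The three random methods above can be sketched in Python with the standard `random` module. This is a minimal illustration, not a prescribed procedure. The household list, its income groups and the function names are all hypothetical. The stratified version draws the same number of units from each stratum (equal allocation), which is only one of several ways to size the samples within strata.

```python
import random


def simple_random_sample(population, n, rng):
    # Each unit has the same chance of selection; drawn without replacement.
    return rng.sample(population, n)


def stratified_sample(population, stratum_of, n_per_stratum, rng):
    # Step 1: split the population into strata using the stratum_of(unit) key.
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    # Step 2: draw a simple random sample of the same size from EACH stratum.
    sample = []
    for units in strata.values():
        sample.extend(rng.sample(units, n_per_stratum))
    return sample


def systematic_sample(population, k, rng):
    # Random start among the first k units, then every k-th unit after it.
    start = rng.randrange(k)
    return population[start::k]


# Hypothetical list of 100 households tagged by income group:
# 60 low, 30 middle, 10 high.
households = [(i, "low" if i < 60 else "middle" if i < 90 else "high") for i in range(100)]
rng = random.Random(7)

srs = simple_random_sample(households, 10, rng)
strat = stratified_sample(households, lambda h: h[1], 3, rng)  # 3 from each group
syst = systematic_sample(households, 10, rng)                  # every 10th household
print(srs, strat, syst, sep="\n")
```

In this made-up population, only 10 of the 100 households are in the "high" group. A simple random sample of 10 therefore misses that group entirely about one time in three, while the stratified sample always includes it. The random start is what makes the systematic sample a probability method.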
Non-Random (Non-Probability) Sampling
- Selection is based on INVESTIGATOR'S JUDGMENT or CONVENIENCE
- Judgment / Purposive Sampling: Investigator CHOOSES units they think are representative
- Convenience Sampling: Choose whoever is EASIEST to reach
- PROBLEM: Selection probabilities are unknown, so sampling error cannot be measured. The sample may NOT be representative, so results are at risk of bias.
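The bias risk can be made concrete with a small Python simulation. It rests on an assumption stated up front: in a hypothetical village, households closer to the main road earn more, and the investigator interviews whoever is nearest (a convenience sample). The income formula and all the numbers are invented for illustration.

```python
import random
import statistics

rng = random.Random(1)

# ASSUMPTION (hypothetical data): 1,000 households ranked by distance from the
# main road (0 = nearest), with monthly income falling as distance increases.
incomes = [30_000 - 20 * rank + rng.uniform(-3_000, 3_000) for rank in range(1_000)]
true_mean = statistics.mean(incomes)


def convenience_sample(n):
    # The n households easiest to reach: the ones nearest the road.
    return incomes[:n]


def random_sample(n):
    # Simple random sample of the same size, without replacement.
    return rng.sample(incomes, n)


results = {}
for n in (50, 200):
    conv_mean = statistics.mean(convenience_sample(n))
    rand_mean = statistics.mean(random_sample(n))
    results[n] = (conv_mean, rand_mean)
    print(f"n={n:>3}: convenience {conv_mean:,.0f} | random {rand_mean:,.0f} | true {true_mean:,.0f}")
```

In this setup, the convenience estimate sits thousands of rupees above the true mean at both sample sizes, while the random sample lands close to it. Nobody knows each household's probability of selection in the convenience sample, so there is also no valid way to attach a standard error to its estimate.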
4. Sources of Secondary Data in India
| Source | What It Provides |
|---|---|
| Census of India (every 10 years; last held in 2011, the 2021 round was postponed) | Population, literacy, occupation, housing, amenities, down to every village and town |
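| NSSO (National Sample Survey Office; merged with the Central Statistics Office in 2019 to form the National Statistical Office (NSO) under the Ministry of Statistics and Programme Implementation (MoSPI)) | Consumption expenditure, employment, health, education, through regular survey rounds |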
| RBI Bulletin | Banking, money supply, forex reserves, interest rates, inflation |
| Economic Survey (Ministry of Finance, annually) | Comprehensive review of the Indian economy |
| Registrar General of India | Birth rates, death rates, infant mortality rate (IMR), life expectancy |
| Periodic Labour Force Survey (PLFS) | Employment and unemployment |
5. Pilot Survey and Questionnaire Design
- Pilot survey: a SMALL-SCALE TRIAL before the full survey. Tests: are the questions CLEAR? Do they generate USEFUL answers? Are there ambiguities?
- Questionnaire: the form with questions. Must be: CLEAR, SPECIFIC, UNAMBIGUOUS, LOGICALLY ORDERED. Avoid leading questions. Pre-test with a pilot.
6. Exam Focus
- Primary vs Secondary data — distinction, pros/cons
- Census vs Sample survey — when each is used
- Random sampling — simple random, stratified, systematic
- Non-random sampling — judgment, convenience. Problem: cannot measure error.
- Key Indian data sources — Census, NSSO, RBI, Economic Survey
7. Conclusion
Good data doesn't grow on trees. It is COLLECTED — with care, method, and awareness of sources of error:
- PRIMARY data: You collect it. Tailored but expensive.
- SECONDARY data: Someone else collected it. Cheap but may not fit perfectly.
- SAMPLING: When you can't survey everyone, pick a REPRESENTATIVE sample using RANDOM methods.
- INDIA's DATA INFRASTRUCTURE: Census, NSSO, RBI, Economic Survey — these are the building blocks of economic knowledge in India.
'To consult the statistician after an experiment is finished is often merely to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of.' (R.A. Fisher, 1938). Statistical thinking begins BEFORE data is collected, with the design of the survey.
