Probability and Data Analysis
MYP Unit Framework
Key Concept: LOGIC
Related Concepts: Representation, Validity, Models
Global Context: Identities and Relationships (How do data and probability help us understand ourselves and our communities?)
Statement of Inquiry: Statistical methods allow us to draw valid conclusions from data while recognising uncertainty and bias.
Inquiry Questions
| Type | Question |
|---|---|
| Factual | What is the difference between mean, median, and mode? How do you calculate the probability of simple events? |
| Conceptual | How do different representations of data affect interpretation? Why is randomness important in probability and statistics? |
| Debatable | Can statistics be trusted — or are they easily manipulated to support any argument? Should probability influence life decisions, or is intuition a better guide? |
ATL Skills
- Thinking: Critically evaluate statistical claims; distinguish between correlation and causation
- Research: Collect, organise, and analyse data; design surveys and experiments
- Communication: Present data using appropriate graphs and charts; write data-driven conclusions
- Social: Collaborate on data collection and analysis projects
- Self-Management: Manage time for extended data investigation
1. Descriptive Statistics
Measures of Central Tendency
Mean: The arithmetic average. Sum of all values divided by number of values.
Median: The middle value when the data is arranged in order (with an even number of values, the mean of the two middle values). Less affected by outliers than the mean.
Mode: The most frequently occurring value. Useful for categorical data.
Choosing the Right Measure
- Mean: Use when data is symmetrically distributed without outliers
- Median: Use when data is skewed or has outliers
- Mode: Use for categorical data or when identifying the most common value
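As a rough illustration of why the median is preferred when outliers are present, the sketch below uses Python's standard `statistics` module on a made-up data set (the `amounts` list is hypothetical, not data from this unit). One extreme value pulls the mean well above most of the data, while the median and mode stay put.

```python
from statistics import mean, median, mode

# Hypothetical weekly pocket-money amounts; the last value is an outlier.
amounts = [10, 12, 12, 13, 15, 16, 90]

avg = mean(amounts)       # 168 / 7 = 24, higher than six of the seven values
middle = median(amounts)  # 4th of 7 ordered values = 13
common = mode(amounts)    # 12 appears twice, more than any other value

print("mean:", avg, "median:", middle, "mode:", common)
```

Here the median (13) describes a typical value better than the mean (24), which is the situation described in the "Median" bullet above.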
Measures of Dispersion (Spread)
Range: Maximum value - Minimum value. Simple but affected by outliers.
Interquartile Range (IQR): Q3 - Q1. Represents the spread of the middle 50% of data. Resistant to outliers, because the lowest and highest quarters of the data are ignored.
Quartiles:
- Q1: Median of the lower half of data
- Q2: Median of all data
- Q3: Median of the upper half of data
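The quartile definitions above can be sketched in a few lines of Python. This is a minimal sketch, assuming the common school convention that the middle value is left out of both halves when the number of values is odd; other conventions exist and can give slightly different quartiles. The `quartiles` helper and the `scores` data are illustrative names, not part of the unit.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]                                # lower half
    upper = data[half + 1:] if n % 2 else data[half:]  # skip the middle value if n is odd
    return median(lower), median(data), median(upper)

scores = [4, 6, 7, 9, 10, 12, 15, 18]  # hypothetical test scores
q1, q2, q3 = quartiles(scores)
iqr = q3 - q1

print("Q1:", q1, "Q2:", q2, "Q3:", q3, "IQR:", iqr)
```

For this data the lower half is 4, 6, 7, 9 (median 6.5) and the upper half is 10, 12, 15, 18 (median 13.5), so the IQR is 7.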
Box-and-Whisker Plots
A visual representation of the five-number summary: minimum, Q1, median, Q3, and maximum. Useful for comparing distributions.
2. Data Representation
Types of Graphs and Their Uses
| Graph Type | Best Used For |
|---|---|
| Bar chart | Comparing categories |
| Histogram | Distribution of continuous data |
| Pie chart | Showing proportions of a whole |
| Line graph | Trends over time |
| Scatter plot | Relationship between two variables |
| Box plot | Comparing distributions; showing spread |
Misleading Graphs
Graphs can mislead through:
- Truncated y-axis (not starting at zero)
- Inconsistent scales
- Cherry-picked time frames
- 3D effects distorting proportions
- Cherry-picking data to support a narrative
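To see how much a truncated y-axis can exaggerate a difference, the following sketch compares the true ratio of two values with the ratio of the bar heights a reader would see. The sales figures, the axis start of 48, and the `apparent_ratio` helper are all hypothetical values chosen for illustration.

```python
def apparent_ratio(a, b, axis_start):
    """Ratio of the drawn bar heights when the y-axis starts at axis_start."""
    return (b - axis_start) / (a - axis_start)

sales_last_year, sales_this_year = 50, 52  # hypothetical values

true_ratio = sales_this_year / sales_last_year                       # 52 / 50 = 1.04
drawn_ratio = apparent_ratio(sales_last_year, sales_this_year, 48)   # 4 / 2 = 2.0

print("true ratio:", true_ratio, "drawn ratio:", drawn_ratio)
```

A 4% increase is drawn as a bar twice as tall, which is why checking where the scale starts matters.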
Analysing Graphical Data
When interpreting a graph, ask:
- What does the x-axis represent? The y-axis?
- What is the scale? Does it start at zero?
- What trends or patterns are visible?
- What conclusions can be drawn?
- What information is missing?
3. Probability Basics
What Is Probability?
Probability measures the likelihood of an event occurring. It ranges from 0 (impossible) to 1 (certain).
Formula (when all outcomes are equally likely): P(event) = Number of favourable outcomes / Total number of possible outcomes
Key Terminology
- Experiment: A process with uncertain outcomes (e.g., rolling a die)
- Outcome: A single result of an experiment
- Event: A set of outcomes (e.g., rolling an even number)
- Sample Space: All possible outcomes
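The terms above map directly onto sets. As a small sketch, assuming a fair six-sided die so that every outcome is equally likely, the sample space and an event can be written as Python sets and the probability formula applied by counting. The variable names are illustrative.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}           # all outcomes of rolling one fair die
event = {x for x in sample_space if x > 4}  # event: rolling a number greater than 4

# favourable outcomes / total possible outcomes
p_event = Fraction(len(event), len(sample_space))

print(p_event)  # 1/3
```

Using `Fraction` keeps the answer exact, which matches how probabilities are usually written by hand.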
Types of Probability
Theoretical Probability: Based on reasoning (e.g., probability of rolling a 6 on a fair die is 1/6)
Experimental Probability: Based on actual trials (e.g., 12 sixes in 60 rolls gives 12/60 = 0.2). As the number of trials increases, experimental probability tends to get closer to the theoretical probability (Law of Large Numbers).
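The Law of Large Numbers can be explored with a simulation. The sketch below is one possible approach, assuming Python's pseudo-random generator is an acceptable stand-in for a fair die; the `experimental_probability` function and the fixed `seed` (used only so results are repeatable) are illustrative choices.

```python
import random

def experimental_probability(trials, seed=0):
    """Estimate P(rolling a 6) from a given number of simulated rolls."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    sixes = sum(1 for _ in range(trials) if rng.randint(1, 6) == 6)
    return sixes / trials

theoretical = 1 / 6
for n in (60, 600, 6000, 60000):
    estimate = experimental_probability(n)
    print(n, round(estimate, 4), "difference:", round(abs(estimate - theoretical), 4))
```

With few rolls the estimate can be noticeably off; with many rolls the difference from 1/6 is usually much smaller, although it does not shrink steadily at every step.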
The Complement Rule
P(not A) = 1 - P(A)
Addition Rule (OR)
For mutually exclusive events: P(A or B) = P(A) + P(B)
For non-mutually exclusive events: P(A or B) = P(A) + P(B) - P(A and B)
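The complement and addition rules can be checked by counting outcomes. This sketch assumes one roll of a fair die, with two overlapping events chosen for illustration: A (an even number) and B (a number greater than 3).

```python
from fractions import Fraction

space = set(range(1, 7))               # outcomes of one fair die
A = {x for x in space if x % 2 == 0}   # even: {2, 4, 6}
B = {x for x in space if x > 3}        # greater than 3: {4, 5, 6}

def p(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(space))

p_a_or_b = p(A | B)                    # counted directly from the union
by_rule = p(A) + p(B) - p(A & B)       # addition rule for overlapping events
p_not_a = p(space - A)                 # complement counted directly

print(p_a_or_b, by_rule, p_not_a)
```

Subtracting P(A and B) is needed because the outcomes 4 and 6 belong to both events and would otherwise be counted twice.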
Multiplication Rule (AND)
For independent events: P(A and B) = P(A) x P(B)
For dependent events: P(A and B) = P(A) x P(B|A), where P(B|A) is the probability of B given that A has occurred
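For dependent events, the second probability changes once the first event has happened. As a sketch, assume a hypothetical bag of 3 red and 2 blue marbles from which two are drawn without replacement. The rule gives P(both red), and listing every ordered pair of draws confirms it.

```python
from fractions import Fraction
from itertools import permutations

bag = ["R", "R", "R", "B", "B"]  # hypothetical bag: 3 red, 2 blue

# Multiplication rule: P(first red) x P(second red given first red)
p_formula = Fraction(3, 5) * Fraction(2, 4)

# Check by listing all ordered pairs of two different marbles
draws = list(permutations(range(len(bag)), 2))
both_red = [d for d in draws if bag[d[0]] == "R" and bag[d[1]] == "R"]
p_enumerated = Fraction(len(both_red), len(draws))

print(p_formula, p_enumerated)  # 3/10 3/10
```

There are 20 ordered pairs and 6 of them are red then red, so both methods give 3/10.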
Expected Value
Expected value = sum of (each outcome x its probability)
Real-world application: Insurance companies use expected value to set premiums; casinos use it to ensure profitability.
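The expected value formula can be applied to any list of outcomes and probabilities. The sketch below shows two illustrative cases: a fair die, and a hypothetical raffle (ticket price 2, a single prize of 50, 100 tickets) invented for this example. The `expected_value` helper is not a standard function.

```python
from fractions import Fraction

def expected_value(distribution):
    """Sum of (outcome x probability) over (outcome, probability) pairs."""
    return sum(outcome * prob for outcome, prob in distribution)

# Fair six-sided die: each face has probability 1/6
die = [(face, Fraction(1, 6)) for face in range(1, 7)]
ev_die = expected_value(die)        # 21/6 = 7/2

# Hypothetical raffle: win 50 - 2 with probability 1/100, otherwise lose 2
raffle = [(50 - 2, Fraction(1, 100)), (-2, Fraction(99, 100))]
ev_raffle = expected_value(raffle)  # 48/100 - 198/100 = -3/2

print(ev_die, ev_raffle)
```

A negative expected value for the player means that, on average over many tickets, the organiser gains 1.5 per ticket sold.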
4. Probability in Practice
Tree Diagrams
Tree diagrams help visualise multi-stage events with probabilities at each branch.
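A tree diagram can be mirrored in code by multiplying the probabilities along each path. This sketch assumes a hypothetical spinner that lands on red (R) with probability 1/4 and blue (B) with probability 3/4, spun twice with independent spins.

```python
from fractions import Fraction
from itertools import product

branch = {"R": Fraction(1, 4), "B": Fraction(3, 4)}  # hypothetical spinner

# Each path through the tree is a pair of branches; multiply along the path
paths = {a + b: branch[a] * branch[b] for a, b in product(branch, repeat=2)}

for path, prob in paths.items():
    print(path, prob)

# Add the paths that make up the event "exactly one red"
p_exactly_one_red = paths["RB"] + paths["BR"]
print("P(exactly one red) =", p_exactly_one_red)
```

The four path probabilities add up to 1, which is a useful check when drawing a tree diagram by hand.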
Venn Diagrams
Venn diagrams show relationships between sets and can be used to calculate probabilities involving unions and intersections.
Two-Way Tables
Two-way tables organise data by two categories and allow calculation of conditional probabilities.
Conditional Probability
The probability of event A given that event B has occurred: P(A|B) = P(A and B) / P(B)
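A two-way table makes the conditional probability formula concrete. The counts below are invented for illustration (a hypothetical survey of 50 students recording whether they play a sport and how they travel to school).

```python
from fractions import Fraction

# Hypothetical two-way table: (plays sport?, travel mode) -> number of students
table = {
    ("sport", "walk"): 20,
    ("sport", "bus"): 10,
    ("no sport", "walk"): 5,
    ("no sport", "bus"): 15,
}
total = sum(table.values())

p_walk = Fraction(sum(c for (_, travel), c in table.items() if travel == "walk"), total)
p_sport_and_walk = Fraction(table[("sport", "walk")], total)

# P(sport | walk) = P(sport and walk) / P(walk)
p_sport_given_walk = p_sport_and_walk / p_walk

# For comparison: P(sport) over all students
p_sport = Fraction(sum(c for (s, _), c in table.items() if s == "sport"), total)

print(p_sport_given_walk, p_sport)  # 4/5 3/5
```

Because P(sport | walk) differs from P(sport), knowing that a student walks changes the probability that they play a sport, so the two events are not independent in this made-up data.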
5. Correlation and Causation
Correlation
Correlation measures the strength and direction of the linear relationship between two variables.
- Positive correlation: As one variable increases, the other increases
- Negative correlation: As one variable increases, the other decreases
- No correlation: No linear relationship between the variables
Correlation Coefficient (r)
Ranges from -1 to +1:
- r = +1: Perfect positive correlation
- r = -1: Perfect negative correlation
- r = 0: No linear correlation (the variables may still be related in a non-linear way)
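The correlation coefficient can be computed from its standard definition (Pearson's r). The following is a minimal sketch with made-up data sets; `pearson_r` is an illustrative helper written out by hand so each step is visible.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

r_up = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])     # points on a rising line
r_down = pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2])   # points on a falling line
r_curve = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]) # y = x squared

print(r_up, r_down, r_curve)
```

The third data set follows y = x² exactly, yet r comes out as 0, because r only measures how well the points fit a straight line.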
Correlation Does NOT Imply Causation
A common statistical fallacy: assuming that because two variables are correlated, one causes the other.
Example: Ice cream sales and drowning incidents are correlated. But ice cream does not cause drowning. Both are caused by a third variable: hot weather (more people swim AND more people eat ice cream).
Spurious Correlations
Sometimes correlations are purely coincidental. The website 'Spurious Correlations' shows examples like the correlation between margarine consumption and the divorce rate in Maine.
6. Data Investigation Project
The Statistical Investigation Cycle (PPDAC)
- Problem: Define the question you want to answer
- Plan: Design how to collect data
- Data: Collect the data
- Analysis: Organise, represent, and analyse the data
- Conclusion: Draw conclusions and communicate findings
Designing a Survey
- Define a clear research question
- Choose an appropriate sample size
- Avoid biased or leading questions
- Ensure anonymity and ethical data collection
- Consider sampling method (random, stratified, convenience)
Ethical Considerations
- Informed consent
- Privacy and confidentiality
- Honest representation of data
- Avoiding manipulation of statistics for persuasion
Summative Assessment
Task: Statistical investigation (800-1000 words equivalent) involving data collection, analysis, and interpretation.
Criteria:
- A: Knowing and Understanding — Apply statistical and probability concepts correctly
- B: Investigating Patterns — Collect data, identify patterns, and draw valid conclusions
- C: Communicating — Present data using appropriate representations; communicate reasoning clearly
- D: Applying Mathematics in Real-World Contexts — Apply statistics to a real-world question; evaluate limitations
Option 1: Design and conduct a survey on a topic of interest. Analyse the data using measures of central tendency and dispersion. Present findings with appropriate graphs.
Option 2: Investigate a claim made in the media using statistical analysis. Is the claim supported by evidence? How might data be misleading?
Option 3: Conduct a probability experiment (e.g., rolling dice, spinning spinners). Compare experimental results with theoretical probability. Discuss the Law of Large Numbers.
Formative Assessment
- Calculating mean, median, mode, range, and IQR from data sets
- Creating and interpreting different types of graphs
- Probability problem sets (single and multi-stage events)
- Identifying misleading graphs (media analysis)
- Correlation vs. causation exercises
- Tree diagram and Venn diagram construction
Interdisciplinary Connections
- Science: Analysing experimental data; understanding statistical significance
- Economics: Risk assessment; market analysis; probability in financial decisions
- Psychology: Understanding statistical claims in psychological research
- Media Studies: Critical analysis of statistics in news and advertising
- Sports: Analysing player and team statistics; probability in game strategy
Service as Action
- Data for Good: Collect and analyse data on an issue in your school (waste, energy use, wellbeing). Present findings to the school leadership with recommendations.
- Statistical Literacy: Create posters or a workshop for younger students about how to spot misleading statistics in the media.
IB Learner Profile
- Inquirers: Ask questions that can be answered through data
- Thinkers: Critically evaluate statistical claims and distinguish valid conclusions from misleading ones
- Knowledgeable: Understand concepts of probability and data analysis
- Principled: Use data ethically; represent findings honestly
- Reflective: Consider the limitations of statistical methods and the role of uncertainty
Self-Test
- Find the mean, median, and mode of: 3, 7, 2, 8, 3, 9, 5.
- For the same data set, what is the range? What is the interquartile range?
- When should you use the median instead of the mean?
- List THREE ways graphs can be misleading.
- What is the probability of rolling an even number on a fair six-sided die?
- Two dice are rolled. What is the probability of rolling a sum of 7?
- What is the complement rule in probability?
- Explain the difference between mutually exclusive and independent events.
- What is the difference between correlation and causation? Give an example.
- What are the five stages of the PPDAC statistical investigation cycle?
This unit aligns with the IB MYP Mathematics guide and was developed for Year 4 (Class 9) students.
