AP Precalculus: Single-Variable Statistics

Master variance, standard deviation, outliers, and confidence intervals

Variance | Std Dev | Outliers | Confidence Intervals

Understanding Statistics

Single-variable statistics summarizes and analyzes data from one variable. Key concepts include measures of spread (variance, standard deviation), identifying unusual values (outliers), and making inferences about populations using confidence intervals. These tools help us understand data variability and make informed decisions.

1 Variance & Standard Deviation

Variance measures how spread out data values are around the mean: it is the average squared deviation from the mean. Standard deviation is the square root of variance, so it is in the same units as the original data.

Population Variance
\(\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2\)
Divide by N (population size)
Sample Variance
\(s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2\)
Divide by n-1 (Bessel's correction)
Population Std Dev
\(\sigma = \sqrt{\sigma^2}\)
Sample Std Dev
\(s = \sqrt{s^2}\)
Example

Data: 4, 8, 6, 5, 7 (sample)

Mean: \(\bar{x} = \frac{4+8+6+5+7}{5} = 6\)

Deviations squared: \((4-6)^2 + (8-6)^2 + (6-6)^2 + (5-6)^2 + (7-6)^2 = 4+4+0+1+1 = 10\)

Sample variance: \(s^2 = \frac{10}{4} = 2.5\)

Sample std dev: \(s = \sqrt{2.5} \approx 1.58\)
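
The arithmetic in this example can be checked with a short Python sketch. It applies the sample formulas directly and then compares the results with the standard library's `statistics` module; the variable names are illustrative only.

```python
import math
import statistics

data = [4, 8, 6, 5, 7]  # the sample from the example above

n = len(data)
mean = sum(data) / n
squared_deviations = [(x - mean) ** 2 for x in data]

sample_variance = sum(squared_deviations) / (n - 1)  # divide by n - 1
population_variance = sum(squared_deviations) / n    # divide by N, shown for comparison
sample_std_dev = math.sqrt(sample_variance)

# The standard library functions agree with the hand calculation.
assert math.isclose(sample_variance, statistics.variance(data))
assert math.isclose(population_variance, statistics.pvariance(data))

print(mean, sample_variance, round(sample_std_dev, 2))  # prints: 6.0 2.5 1.58
```

Treating the same five values as a whole population would give a variance of 10 / 5 = 2.0, which is smaller than the sample variance of 2.5.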

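Why n-1 for Samples?

Deviations are measured from the sample mean \(\bar{x}\), which sits closer to the sample's own values than the true mean \(\mu\) does, so dividing the sum of squared deviations by n tends to underestimate the population variance. Dividing by n-1 instead corrects for this bias. This is called Bessel's correction.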
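
One way to see the bias is a small simulation. The sketch below assumes a hypothetical normal population with variance 4, draws many samples of size 5, and averages the two competing estimates; the population, sample size, and trial count are all arbitrary choices for illustration.

```python
import random

random.seed(0)  # fixed seed only so the run is repeatable

TRUE_SIGMA = 2.0   # hypothetical population: normal, mean 0, variance 4
SAMPLE_SIZE = 5
TRIALS = 20_000

total_divide_by_n = 0.0
total_divide_by_n_minus_1 = 0.0
for _ in range(TRIALS):
    sample = [random.gauss(0, TRUE_SIGMA) for _ in range(SAMPLE_SIZE)]
    sample_mean = sum(sample) / SAMPLE_SIZE
    sum_sq = sum((x - sample_mean) ** 2 for x in sample)
    total_divide_by_n += sum_sq / SAMPLE_SIZE
    total_divide_by_n_minus_1 += sum_sq / (SAMPLE_SIZE - 1)

avg_biased = total_divide_by_n / TRIALS
avg_unbiased = total_divide_by_n_minus_1 / TRIALS

print(f"true variance:            {TRUE_SIGMA ** 2:.2f}")
print(f"average when dividing by n:   {avg_biased:.2f}")
print(f"average when dividing by n-1: {avg_unbiased:.2f}")
```

Under these assumptions the divide-by-n average should land near 4 × (4/5) = 3.2, while the divide-by-(n-1) average should land near the true variance of 4.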

2 Outlier Detection & Effects

An outlier is a data point that is unusually far from other values. The IQR method provides a standard way to identify outliers.

Interquartile Range \(IQR = Q_3 - Q_1\)
Lower Fence
\(Q_1 - 1.5 \times IQR\)
Values below are outliers
Upper Fence
\(Q_3 + 1.5 \times IQR\)
Values above are outliers

Effects of Outliers

On Mean

Outliers pull the mean toward extreme values. Mean is NOT resistant to outliers.

On Median

Median is resistant: it is not affected much by outliers.

On Standard Deviation

Outliers increase standard deviation. Removing outliers usually decreases it.

On IQR

IQR is resistant: it is based on quartiles, not extreme values.
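
These four effects can be compared side by side on a small illustrative data set in which 25 is the extreme value. The sketch below is a rough check, not a general rule; it uses `statistics.quantiles` with its default method, and other quartile conventions can give slightly different IQR values.

```python
import statistics

def summarize(values):
    """Return the four summary statistics discussed above."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default "exclusive" method
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std dev": statistics.stdev(values),
        "IQR": q3 - q1,
    }

with_outlier = [2, 4, 5, 6, 7, 8, 25]  # illustrative data; 25 is the extreme value
without_outlier = [x for x in with_outlier if x != 25]

before = summarize(with_outlier)
after = summarize(without_outlier)

for name in before:
    print(f"{name:8s} with: {before[name]:6.2f}   without: {after[name]:6.2f}")
```

For this data the mean and standard deviation change substantially when 25 is removed, while the median and IQR move only slightly.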

Example

Data: 2, 4, 5, 6, 7, 8, 25

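Median = 6. Q1 is the median of the lower half (2, 4, 5) and Q3 is the median of the upper half (7, 8, 25), so Q1 = 4, Q3 = 8, and IQR = 8 - 4 = 4

Fences: 1.5 × IQR = 6, so Lower = 4 - 6 = -2, Upper = 8 + 6 = 14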

Conclusion: 25 > 14 → 25 is an outlier
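
The same steps can be sketched in Python. This version assumes the median-of-halves convention for quartiles (the overall median is left out of both halves when the count is odd); the helper names `quartiles` and `iqr_outliers` are made up for this illustration.

```python
import statistics

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    With an odd number of values, the overall median is excluded from both
    halves. Other quartile conventions can give slightly different results.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower_half = ordered[:half]
    upper_half = ordered[-half:]
    return statistics.median(lower_half), statistics.median(upper_half)

def iqr_outliers(values, k=1.5):
    """Return the lower fence, upper fence, and values outside the fences."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    outliers = [x for x in values if x < lower_fence or x > upper_fence]
    return lower_fence, upper_fence, outliers

data = [2, 4, 5, 6, 7, 8, 25]
lower_fence, upper_fence, outliers = iqr_outliers(data)
print(lower_fence, upper_fence, outliers)  # prints: -2.0 14.0 [25]
```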

3 Sampling Bias & Types

Sampling bias occurs when the sample is not representative of the population, leading to inaccurate conclusions.

Type of Bias | Description | Example
Selection Bias | Some groups are more likely to be selected | Surveying only mall shoppers
Non-response Bias | People who respond differ from non-responders | Only motivated people return surveys
Response Bias | Respondents give inaccurate answers | Embarrassing questions, leading wording
Undercoverage | Some population members have no chance of selection | Phone survey excludes those without phones
Voluntary Response | Participants self-select into the study | Online polls attract strong opinions
Avoiding Bias

Use random sampling to give every member of the population an equal chance of being selected. This is the foundation of valid statistical inference.
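
As a minimal sketch of simple random sampling, the snippet below draws 25 students from a hypothetical list of 500 ID numbers. The population and sample size are invented for illustration; `random.sample` selects without replacement, so no one is chosen twice.

```python
import random

random.seed(42)  # fixed seed only so the example is repeatable

# Hypothetical sampling frame: ID numbers for a population of 500 students.
population = list(range(1, 501))

# Simple random sample: every student has the same chance of selection,
# and no student can be chosen more than once.
sample = random.sample(population, k=25)
print(sorted(sample))
```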

4 Confidence Intervals

A confidence interval provides a range of plausible values for a population parameter, based on sample data.

CI for Mean (σ known)
\(\bar{x} \pm z^* \cdot \frac{\sigma}{\sqrt{n}}\)
CI for Mean (σ unknown)
\(\bar{x} \pm t^* \cdot \frac{s}{\sqrt{n}}\)
CI for Proportion
\(\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\)
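
The proportion formula can be tried on a made-up survey. The sketch below assumes 60 "yes" answers out of 100 respondents and uses 1.96 as the 95% critical value; both the data and the function name `proportion_interval` are hypothetical.

```python
import math

def proportion_interval(successes, n, z_star=1.96):
    """Return (low, high) for p-hat ± z* · sqrt(p-hat(1 - p-hat) / n)."""
    p_hat = successes / n
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z_star * standard_error
    return p_hat - margin, p_hat + margin

# Hypothetical survey: 60 of 100 respondents answer "yes".
low, high = proportion_interval(60, 100)
print(f"({low:.3f}, {high:.3f})")  # prints: (0.504, 0.696)
```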

Common z* Values

90% CI
z* = 1.645
95% CI
z* = 1.96
99% CI
z* = 2.576
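
These critical values come from the standard normal distribution: z* leaves half of the leftover probability in each tail. A short sketch using `statistics.NormalDist` from the Python standard library reproduces them; the function name `z_star` is just a label for this example.

```python
from statistics import NormalDist

def z_star(confidence):
    """Critical value leaving (1 - confidence) / 2 in each tail of N(0, 1)."""
    tail = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - tail)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} CI: z* = {z_star(level):.3f}")
# prints:
# 90% CI: z* = 1.645
# 95% CI: z* = 1.960
# 99% CI: z* = 2.576
```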
Example

Given: \(\bar{x} = 75\), \(s = 10\), \(n = 36\), find 95% CI for mean (σ unknown)

Standard error: \(\frac{10}{\sqrt{36}} = \frac{10}{6} \approx 1.67\)

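Margin of error: because σ is unknown, use \(t^*\) with \(n - 1 = 35\) degrees of freedom, \(t^* \approx 2.030\): \(2.030 \times \frac{10}{6} \approx 3.38\)

95% CI: \(75 \pm 3.38 = (71.62, 78.38)\)

With a sample this large, the normal value \(z^* = 1.96\) gives nearly the same interval: \(75 \pm 3.27 = (71.73, 78.27)\).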
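
The interval arithmetic for this example can be checked with a short sketch. It computes the interval for two critical values: z* = 1.96 and t* ≈ 2.030 for 35 degrees of freedom. The t* value is copied from a standard t-table and hardcoded as an assumption, since the Python standard library has no t-distribution function.

```python
import math

def mean_interval(x_bar, s, n, critical_value):
    """Return (low, high) for x-bar ± critical_value · s / sqrt(n)."""
    standard_error = s / math.sqrt(n)
    margin = critical_value * standard_error
    return x_bar - margin, x_bar + margin

x_bar, s, n = 75, 10, 36

# z* = 1.96 is the standard normal critical value for 95% confidence.
z_low, z_high = mean_interval(x_bar, s, n, 1.96)

# t* = 2.030 (35 degrees of freedom) is an assumed table value, not computed here.
t_low, t_high = mean_interval(x_bar, s, n, 2.030)

print(f"z-based: ({z_low:.2f}, {z_high:.2f})")  # prints: z-based: (71.73, 78.27)
print(f"t-based: ({t_low:.2f}, {t_high:.2f})")  # prints: t-based: (71.62, 78.38)
```

The t-based interval is slightly wider, which reflects the extra uncertainty from estimating σ with s.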

5 Interpreting Confidence Intervals

A confidence interval interpretation must include: confidence level, parameter of interest, and the interval values in context.

Correct Interpretation Template: "We are ___% confident that the true [parameter] lies between [lower bound] and [upper bound]."

What Affects Interval Width?

Increasing Confidence

Higher confidence → Wider interval (more sure = less precise)

Increasing Sample Size

Larger n → Narrower interval (more data = more precise)

More Variability

Larger σ or s → Wider interval
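
Each of these three effects follows from the margin of error, critical value × s / √n. The sketch below varies one input at a time from an arbitrary baseline (95% confidence, s = 10, n = 36), using z* critical values for simplicity.

```python
import math

def margin_of_error(critical_value, s, n):
    """Half-width of the interval: critical value times the standard error."""
    return critical_value * s / math.sqrt(n)

baseline = margin_of_error(1.96, s=10, n=36)            # 95% confidence
higher_confidence = margin_of_error(2.576, s=10, n=36)  # 99% confidence
larger_sample = margin_of_error(1.96, s=10, n=144)      # four times the data
more_variability = margin_of_error(1.96, s=20, n=36)    # twice the spread

print(f"baseline:          {baseline:.2f}")
print(f"higher confidence: {higher_confidence:.2f}")
print(f"larger sample:     {larger_sample:.2f}")
print(f"more variability:  {more_variability:.2f}")
```

Because n sits under a square root, quadrupling the sample size only halves the margin of error, while doubling s doubles it.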

Common Mistake

DON'T say "There's a 95% probability the parameter is in this interval." The parameter is fixed: either it is in the interval or it is not. The 95% refers to the method's long-run success rate: about 95% of intervals built this way from repeated random samples would capture the parameter.
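
The long-run success rate can be illustrated with a simulation. The sketch below assumes a hypothetical normal population with known mean and standard deviation, builds a 95% interval from each of many random samples, and counts how often the interval captures the true mean. All numbers here are arbitrary choices for illustration.

```python
import math
import random

random.seed(1)  # fixed seed only so the run is repeatable

MU, SIGMA = 50.0, 10.0  # hypothetical population mean and standard deviation
N = 25                  # sample size
Z_STAR = 1.96           # 95% critical value (sigma treated as known)
TRIALS = 5_000

margin = Z_STAR * SIGMA / math.sqrt(N)
hits = 0
for _ in range(TRIALS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    x_bar = sum(sample) / N
    # Each sample produces a different interval; MU itself never changes.
    if x_bar - margin <= MU <= x_bar + margin:
        hits += 1

coverage = hits / TRIALS
print(f"{coverage:.1%} of the intervals captured the true mean")
```

The coverage should come out close to 95%. Any single interval either contains the true mean or does not; the percentage describes the procedure across many samples.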

6 Experiment Design Principles

A well-designed experiment allows us to establish cause and effect. Key principles ensure valid, reliable results.

Control
Keep other variables constant; use control group
Randomization
Randomly assign subjects to treatment groups
Replication
Use enough subjects to reduce chance variation
Blocking
Group similar subjects before randomizing
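
Randomization and blocking can be combined, as in the sketch below: subjects are first grouped by a blocking variable and then randomly assigned to groups within each block. The subject IDs, the age-group blocks, and the function name `blocked_assignment` are all hypothetical.

```python
import random

random.seed(7)  # fixed seed only so the example is repeatable

# Hypothetical subjects, each tagged with a blocking variable (age group).
subjects = [
    ("S1", "under 40"), ("S2", "under 40"), ("S3", "under 40"), ("S4", "under 40"),
    ("S5", "40 and over"), ("S6", "40 and over"), ("S7", "40 and over"), ("S8", "40 and over"),
]

def blocked_assignment(subjects, groups=("treatment", "control")):
    """Randomly assign subjects to groups separately within each block."""
    blocks = {}
    for subject_id, block in subjects:
        blocks.setdefault(block, []).append(subject_id)

    assignment = {}
    for members in blocks.values():
        random.shuffle(members)  # randomization happens inside each block
        for position, subject_id in enumerate(members):
            assignment[subject_id] = groups[position % len(groups)]
    return assignment

assignment = blocked_assignment(subjects)
for subject_id, block in subjects:
    print(subject_id, block, assignment[subject_id])
```

Shuffling within each block and then alternating groups keeps the treatment and control groups balanced on the blocking variable.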

Observational Study

Observe without intervention. Cannot establish causation, only association.

Experiment

Researcher imposes treatments. CAN establish cause and effect when well-designed.

Simulations

Use simulations to model random processes and analyze variability. Repeat many times to understand what outcomes are likely or unusual.
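
As one illustration, the sketch below asks how unusual it would be to see 60 or more heads in 100 flips of a fair coin. The scenario, threshold, and trial count are invented for this example.

```python
import random

random.seed(3)  # fixed seed only so the run is repeatable

TRIALS = 10_000  # number of simulated experiments
FLIPS = 100      # flips of a fair coin per experiment
THRESHOLD = 60   # outcome being checked: at least this many heads

extreme = 0
for _ in range(TRIALS):
    heads = sum(random.random() < 0.5 for _ in range(FLIPS))
    if heads >= THRESHOLD:
        extreme += 1

estimate = extreme / TRIALS
print(f"Estimated chance of {THRESHOLD}+ heads in {FLIPS} flips: {estimate:.3f}")
```

A small estimated probability suggests that such an outcome would be unusual for a fair coin; more trials give a more stable estimate.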

Quick Reference

Sample Variance

\(s^2 = \frac{\sum(x_i - \bar{x})^2}{n-1}\)

Standard Deviation

\(s = \sqrt{s^2}\)

IQR

\(Q_3 - Q_1\)

Outlier Test

\(x < Q_1 - 1.5 \cdot IQR\) or \(x> Q_3 + 1.5 \cdot IQR\)

CI for Mean

\(\bar{x} \pm t^* \cdot \frac{s}{\sqrt{n}}\)

CI for Proportion

\(\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\)
