AP Statistics Formula Sheet & Booklet
Complete Unit-by-Unit Reference with All Formulas, Theorems & Key Concepts
Exploring One-Variable Data (15-23% of Exam)
• 68% within 1σ of mean
• 95% within 2σ of mean
• 99.7% within 3σ of mean
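The three percentages above are rounded areas under a normal curve. As a minimal sketch (the helper name `prob_within` is illustrative, not part of any exam material), the standard-library `statistics.NormalDist` class can recover them:

```python
from statistics import NormalDist

def prob_within(k: float) -> float:
    """Area under any normal curve within k standard deviations of the mean."""
    std_normal = NormalDist()  # mean 0, SD 1; the area does not depend on mu or sigma
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} sigma: {prob_within(k):.4f}")
```

The printed areas round to 68%, 95%, and 99.7%, which is why the rule is only approximate.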
Exploring Two-Variable Data (5-7% of Exam)
Collecting Data (12-15% of Exam)
• Sampling Bias
• Nonresponse Bias
• Response Bias
• Voluntary Response Bias
• Simple Random Sample (SRS)
• Stratified Random Sample
• Cluster Sample
• Systematic Sample
• Control (Control Group)
• Randomization
• Replication
• Blocking
Probability, Random Variables & Probability Distributions (10-20% of Exam)
P(A|B) = P(A), or
P(A ∩ B) = P(A) × P(B)
Binomial: \( \sigma_X = \sqrt{np(1-p)} \)
Geometric: \( \sigma_X = \frac{\sqrt{1-p}}{p} \)
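The two formulas above give the standard deviation of a binomial count (n trials, success probability p) and of a geometric waiting time (trials until the first success). A short sketch, with illustrative function names and made-up inputs:

```python
import math

def binomial_sd(n: int, p: float) -> float:
    """SD of the number of successes in n independent trials."""
    return math.sqrt(n * p * (1 - p))

def geometric_sd(p: float) -> float:
    """SD of the number of trials needed to get the first success."""
    return math.sqrt(1 - p) / p

# Hypothetical inputs chosen only to show the arithmetic
print(binomial_sd(100, 0.5))
print(geometric_sd(0.5))
```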
• Symmetric, bell-shaped curve
• Mean = Median = Mode
• 68-95-99.7 Rule applies
• Continuous probability distribution
Sampling Distributions (7-12% of Exam)
Distribution of sample means is approximately normal with
\( \mu_{\bar{x}} = \mu \)
\( \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \)
\( \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} \)
\( \sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}} \)
\( \sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \)
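The standard deviation formulas above translate directly into code. This is a minimal sketch with illustrative function names; the two-sample versions assume the samples are independent.

```python
import math

def sd_sample_mean(sigma: float, n: int) -> float:
    """SD of the sampling distribution of x-bar."""
    return sigma / math.sqrt(n)

def sd_sample_proportion(p: float, n: int) -> float:
    """SD of the sampling distribution of p-hat."""
    return math.sqrt(p * (1 - p) / n)

def sd_diff_proportions(p1: float, n1: int, p2: float, n2: int) -> float:
    """SD of p-hat1 - p-hat2 for two independent samples."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

def sd_diff_means(sigma1: float, n1: int, sigma2: float, n2: int) -> float:
    """SD of x-bar1 - x-bar2 for two independent samples."""
    return math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)

# Hypothetical population values, for illustration only
print(sd_sample_mean(12, 36))
print(sd_sample_proportion(0.5, 100))
```

Note that variances add for a difference of independent statistics, which is why each two-sample formula sums the terms under one square root.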
• \( \bar{x} \) estimates μ
• \( \hat{p} \) estimates p
• \( s \) estimates σ (approximately)
Inference for Categorical Data: Proportions (12-15% of Exam)
Type II (β): Fail to reject H₀ when false (false negative)
Inference for Quantitative Data: Means (10-18% of Exam)
• Random sample
• Independence (10% rule)
• Normality (n ≥ 30 or check for severe skewness/outliers)
Inference for Categorical Data: Chi-Square (2-5% of Exam)
H₀: Distribution follows specific pattern
Hₐ: Distribution doesn't follow pattern
Hₐ: Variables are associated (or proportions differ)
| | Column 1 | Column 2 | Total |
|---|---|---|---|
| Row 1 | Count | Count | Row Total |
| Row 2 | Count | Count | Row Total |
| Total | Col Total | Col Total | Grand Total |
• Random sample
• Independence
• All expected frequencies ≥ 5
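For a two-way table, the expected-frequency condition can be checked while computing the test statistic. The sketch below assumes the standard definitions (expected count = row total × column total ÷ grand total, and χ² = Σ (observed − expected)² / expected); the function name and the counts are made up for illustration.

```python
def chi_square_statistic(observed):
    """Return (chi-square statistic, degrees of freedom, smallest expected count)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    min_expected = float("inf")
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            min_expected = min(min_expected, expected)
            chi2 += (obs - expected) ** 2 / expected

    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df, min_expected

# Hypothetical 2x2 table of counts
table = [[30, 20], [20, 30]]
chi2, df, min_expected = chi_square_statistic(table)
print(chi2, df, min_expected >= 5)
```

Turning the statistic into a p-value requires a chi-square table or calculator, which this sketch leaves out.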
Inference for Quantitative Data: Slopes (2-5% of Exam)
Confidence interval for mean response: \( \hat{y} \pm t^* \cdot SE \)
PI for individual y: \( \hat{y} \pm t^* \, s_e \sqrt{1 + \frac{1}{n} + \frac{(x^*-\bar{x})^2}{\sum(x_i-\bar{x})^2}} \)
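A minimal sketch of a prediction interval for an individual response, using the standard textbook form \( \hat{y} \pm t^* s_e \sqrt{1 + 1/n + (x^*-\bar{x})^2 / \sum(x_i-\bar{x})^2} \). The data are made up, the function name is illustrative, and the critical value `t_star` is assumed to come from a t-table with n − 2 degrees of freedom.

```python
import math

def prediction_interval(xs, ys, x_star, t_star):
    """Return (y_hat, half_width) for predicting one new y at x = x_star."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Residual standard deviation s_e uses n - 2 degrees of freedom
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(sse / (n - 2))

    y_hat = intercept + slope * x_star
    half_width = t_star * s_e * math.sqrt(1 + 1 / n + (x_star - x_bar) ** 2 / sxx)
    return y_hat, half_width

# Hypothetical data; 3.182 is the 95% t critical value for df = 3
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
y_hat, half_width = prediction_interval(xs, ys, x_star=3, t_star=3.182)
print(f"{y_hat:.2f} ± {half_width:.2f}")
```

Dropping the leading `1` inside the square root gives the narrower interval for the mean response at `x_star` instead of an individual value.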
• Linearity: Relationship is linear
• Independence: Residuals independent
• Normality: Residuals approximately normal
• Equal Variance: Constant residual spread
• Show random scatter around y = 0
• Be approximately normally distributed
• Have constant spread across x values
Influential Point: Significantly affects regression line
Frequently Asked Questions About AP Statistics Formulas
Q: Do I need to memorize all these formulas for the AP Statistics exam?
A: No! The College Board provides a formula sheet during the exam that includes most formulas you'll need. However, understanding when and how to use each formula is critical. Focus on mastering the concepts and applications rather than pure memorization. A solid understanding will help you work faster during the exam and make fewer mistakes.
Q: How do I decide which statistical test to use?
A: Ask yourself three questions: (1) Am I analyzing categorical or quantitative data? (2) Am I testing one population or comparing two? (3) Are the samples independent or paired? Then match the procedure to the data: z-procedures for proportions, t-procedures for means, and chi-square tests for counts across categories or in two-way tables. Creating a decision tree helps organize your thinking.
Q: What does a p-value really mean?
A: A p-value is the probability of observing a test statistic at least as extreme as yours, in the direction of Hₐ, assuming H₀ is true. Smaller p-values provide stronger evidence against the null hypothesis. If p < α (typically 0.05), reject H₀. Remember: it's NOT the probability that H₀ is true!
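To make the definition concrete, here is a sketch of a two-sided p-value for a one-proportion z-test. The counts and the null value are hypothetical, and the function name is illustrative.

```python
import math
from statistics import NormalDist

def one_prop_z_test(successes: int, n: int, p0: float):
    """Return (z, two-sided p-value) for H0: p = p0."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)  # standard deviation of p-hat under H0
    z = (p_hat - p0) / se
    # Probability of a z at least this far from 0 in either direction
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical sample: 60 successes in 100 trials, testing H0: p = 0.5
z, p_value = one_prop_z_test(60, 100, 0.5)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

For a one-sided alternative, use a single tail area rather than doubling it.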
Q: What's the difference between z and t distributions?
A: Use z-distributions when the population standard deviation (σ) is known or when working with proportions and large samples. Use t-distributions when the population standard deviation is unknown and you're estimating it with sample SD (s). The t-distribution has heavier tails and accounts for additional uncertainty; it approaches the z-distribution as sample size increases.
Q: What conditions must I check before using a particular test?
A: Always check: (1) Random/representative sample, (2) Independence (when sampling without replacement, n < 10% of the population), (3) Appropriate sample size or distribution shape (normality). Write these in your work! Violating conditions can invalidate your results. For specific procedures: proportion tests need np₀ ≥ 10 and n(1 − p₀) ≥ 10 (for confidence intervals, use p̂ in place of p₀), mean procedures need n ≥ 30 or approximately normal data with no severe skewness or outliers, and chi-square tests need all expected frequencies ≥ 5.
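The arithmetic conditions for a one-proportion test can be checked mechanically. This sketch uses a hypothetical helper and assumes the large counts condition requires at least 10 expected successes and 10 expected failures. Whether the sample was random cannot be verified by a calculation and must be judged from the study design.

```python
def check_one_proportion_conditions(n: int, p0: float, population_size: int) -> dict:
    """Check the 10% condition and the large counts condition for H0: p = p0."""
    return {
        "ten_percent": n <= 0.10 * population_size,
        "large_counts": n * p0 >= 10 and n * (1 - p0) >= 10,
    }

# Hypothetical scenarios
print(check_one_proportion_conditions(100, 0.5, 5000))
print(check_one_proportion_conditions(50, 0.1, 5000))  # only 5 expected successes
```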
Q: How are confidence level and sample size related?
A: Increasing sample size decreases margin of error (narrower CI), making your estimate more precise. Increasing confidence level (95% to 99%) increases margin of error (wider CI), because a wider interval captures the parameter in a larger share of repeated samples. It's a trade-off: at a fixed n, higher confidence costs precision. A larger n lets you keep the same confidence level with a narrower interval, but cost increases with sample size.
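The effect of sample size and confidence level on the margin of error can be seen numerically. This is a sketch for a one-proportion z-interval; the function name and inputs are illustrative, and z* comes from the standard library's `NormalDist.inv_cdf`.

```python
import math
from statistics import NormalDist

def margin_of_error(p_hat: float, n: int, confidence: float) -> float:
    """Margin of error z* * sqrt(p_hat(1 - p_hat)/n) for a one-proportion interval."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z_star * math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical p-hat of 0.5 under three settings
print(f"n=100, 95%: {margin_of_error(0.5, 100, 0.95):.3f}")
print(f"n=400, 95%: {margin_of_error(0.5, 400, 0.95):.3f}")
print(f"n=100, 99%: {margin_of_error(0.5, 100, 0.99):.3f}")
```

Because n sits under a square root, quadrupling the sample size only halves the margin of error.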
Q: What are Type I and Type II errors, and which is worse?
A: Type I error (α): Rejecting H₀ when it's true (false positive). Type II error (β): Failing to reject H₀ when it's false (false negative). Which is worse depends on context. If testing a new drug (H₀ = not effective), Type I error (approving ineffective drug) is dangerous. If testing environmental contamination, Type II error (missing actual contamination) might be worse.
Q: When should I use paired t-test vs. two-sample t-test?
A: Use paired t-test when: (1) Same subjects measured twice (before/after), (2) Matched pairs (twins, siblings), or (3) Dependent samples. Use two-sample t-test when: samples are independent with different subjects. Always identify which type first—using the wrong test is a major error!
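The choice matters because the two procedures compute different standard errors from the same numbers. The sketch below uses made-up before/after measurements and illustrative function names. It computes only the t statistics, since p-values need a t-table or calculator.

```python
import math
from statistics import mean, stdev

def paired_t(before, after):
    """t statistic from the differences (after - before) within each pair."""
    diffs = [a - b for a, b in zip(after, before)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

def two_sample_t(group1, group2):
    """Unpooled t statistic for mean(group2) - mean(group1), independent samples."""
    se = math.sqrt(stdev(group1) ** 2 / len(group1) + stdev(group2) ** 2 / len(group2))
    return (mean(group2) - mean(group1)) / se

# Hypothetical data: the same four subjects measured twice
before = [10, 12, 14, 16]
after = [11, 14, 15, 18]
print(f"paired t     = {paired_t(before, after):.3f}")
print(f"two-sample t = {two_sample_t(before, after):.3f}")
```

With these numbers the paired statistic is much larger, because pairing removes the subject-to-subject variation that inflates the two-sample standard error.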
Q: What does R² tell me about my regression model?
A: R² (coefficient of determination) tells you the proportion of variation in y explained by x. If R² = 0.82, then 82% of variation in y is explained by the linear model; 18% is due to other factors. Higher R² means better fit, but a strong R² doesn't guarantee causation or that the model is appropriate.
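As a sketch of where R² comes from, the code below fits a least-squares line to made-up data and computes R² as 1 − SSE/SST, the share of total variation in y that the line accounts for. The function name is illustrative.

```python
def r_squared(xs, ys):
    """R^2 = 1 - SSE/SST for the least-squares line of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))  # unexplained
    sst = sum((y - y_bar) ** 2 for y in ys)                                # total
    return 1 - sse / sst

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r2 = r_squared(xs, ys)
print(f"R^2 = {r2:.2f}")
```

For simple linear regression this value equals the square of the correlation coefficient r.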
Q: How do I interpret regression slope in context?
A: If slope = 2.5 and units are (y in dollars, x in years), say: "For each additional year, the predicted value increases by $2.50" or "A one-year increase in x is associated with a $2.50 increase in predicted y." Always include units and be careful about causation language in observational studies.