Beginner-Friendly AP Statistics Guide

AP Statistics 2026 FRQ Solutions: Complete Step-by-Step Answer Guide

This complete solution guide explains every question from the 2026 AP Statistics Free-Response Questions in a simple, beginner-friendly way. Each answer includes the method, formula, substitution, final result, and interpretation in context.

What you will learn: Five-number summaries, boxplots, experiment design, normal probability, binomial and geometric distributions, two-sample \(t\)-tests, conditional probability, independence, mosaic plots, and regression intervals.

Best for: AP Statistics students, teachers, tutors, and anyone reviewing FRQ-style statistical reasoning.

Difficulty: Beginner to intermediate. Every calculation is broken down step by step.

How to use this guide: First read the question idea, then study the formula, then check how the numbers are substituted. In AP Statistics, your explanation matters as much as your final numerical answer.

Question 1 Overview

A goat farmer compares the weights of two goat breeds: Breed H and Breed J. We are given the actual data for Breed H and a boxplot for Breed J. The main skills tested are finding a five-number summary and comparing distributions.

Part A: Find the five-number summary for Breed H

The Breed H goat weights are already listed from smallest to largest:

\[ 48,\ 48,\ 55,\ 56,\ 56,\ 57,\ 62,\ 66,\ 72,\ 72,\ 72,\ 73,\ 80,\ 80 \]

A five-number summary has five values:

\[ \text{Minimum},\ Q_1,\ \text{Median},\ Q_3,\ \text{Maximum} \]

Step 1: Minimum

The minimum is the smallest value.

\[ \text{Minimum}=48 \]

Step 2: Maximum

The maximum is the largest value.

\[ \text{Maximum}=80 \]

Step 3: Median

There are \(14\) values. Since \(14\) is even, the median is the average of the 7th and 8th values.

\[ \text{Median}=\frac{62+66}{2}=64 \]

Step 4: First quartile, \(Q_1\)

The lower half of the data is:

\[ 48,\ 48,\ 55,\ 56,\ 56,\ 57,\ 62 \]

The middle value is \(56\), so:

\[ Q_1=56 \]

Step 5: Third quartile, \(Q_3\)

The upper half of the data is:

\[ 66,\ 72,\ 72,\ 72,\ 73,\ 80,\ 80 \]

The middle value is \(72\), so:

\[ Q_3=72 \]

Final answer for Part A: \[ \boxed{\text{Minimum}=48,\ Q_1=56,\ \text{Median}=64,\ Q_3=72,\ \text{Maximum}=80} \]
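As a quick check, the median-of-halves method above can be sketched in Python. This is a minimal sketch, assuming the quartile rule used in this guide (when the number of values is odd, the median is left out of both halves). Other software may use different quartile rules and give slightly different values of \(Q_1\) and \(Q_3\). The function name `five_number_summary` is an illustrative choice.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum).

    Quartiles use the median-of-halves rule: split the sorted data into a
    lower and an upper half, leaving out the middle value when the number
    of values is odd, then take the median of each half.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # values below the median position
    upper = data[half + n % 2:]    # skip the middle value when n is odd
    return (data[0], median(lower), median(data), median(upper), data[-1])

breed_h = [48, 48, 55, 56, 56, 57, 62, 66, 72, 72, 72, 73, 80, 80]
print(five_number_summary(breed_h))  # (48, 56, 64.0, 72, 80)
```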

Part B: Compare the center and variability of Breed H and Breed J

To compare two distributions, focus on two major ideas:

Center
Variability

Compare center

Breed H has median:

\[ \text{Median}=64 \]

From the Breed J boxplot, the median is also about \(64\).

The centers are about the same because both breeds have a median weight of about \(64\) pounds.

Compare variability

For Breed H:

\[ IQR=Q_3-Q_1=72-56=16 \]

For Breed J, the boxplot shows approximately:

\[ Q_1\approx 56,\qquad Q_3\approx 80 \] \[ IQR\approx 80-56=24 \]

Final answer for Part B: Breed H and Breed J have about the same center because both medians are about \(64\) pounds. However, Breed J has more variability because its IQR is about \(24\) pounds, while Breed H has an IQR of \(16\) pounds. This means Breed J goat weights are more spread out than Breed H goat weights.

Part C(i): What does the stem-and-leaf plot show that the boxplot does not?

The stem-and-leaf plot shows the individual data values. Because individual values are visible, we can see the shape of the distribution more clearly.

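Breed H has one cluster of weights from \(48\) to \(57\) pounds and another from \(72\) to \(80\) pounds, with only two values, \(62\) and \(66\), in between. This gap between two clusters suggests a bimodal shape.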

Final answer for Part C(i): The stem-and-leaf plot shows that the Breed H goat weights have a bimodal shape, but this would not be clear from the boxplot.
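To see the two clusters directly, here is a small sketch that builds a stem-and-leaf display from the Breed H data, using the tens digit as the stem and the ones digit as the leaf. The function name is hypothetical, and the plot in the actual question may be laid out differently (for example, with split stems).

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return stem-and-leaf lines (tens digit = stem, ones digit = leaf)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    lines = []
    for stem in range(min(stems), max(stems) + 1):  # keep empty stems visible
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}")
    return lines

breed_h = [48, 48, 55, 56, 56, 57, 62, 66, 72, 72, 72, 73, 80, 80]
for line in stem_and_leaf(breed_h):
    print(line)
# 4 | 88
# 5 | 5667
# 6 | 26
# 7 | 2223
# 8 | 00
```

The two rows with four leaves (stems 5 and 7) are the clusters, and stem 6 has only two leaves. A boxplot of the same data shows none of this.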

Part C(ii): Why does a boxplot not show this?

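A boxplot displays only the five-number summary (plus any outliers, if it is a modified boxplot). It does not show every individual value, so clusters and gaps between the quartiles are hidden.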

\[ \text{Boxplot information}=\text{Minimum},\ Q_1,\ \text{Median},\ Q_3,\ \text{Maximum} \]

Final answer for Part C(ii): A boxplot would not show the bimodal shape because it only displays the five-number summary. It does not show individual goat weights, so the two clusters are hidden.

Question 2 Overview

Holly wants to know whether adding coffee grounds to soil helps rosebushes produce more roses. She grows \(30\) rosebushes in a greenhouse, randomly assigns \(15\) to receive coffee grounds, and leaves \(15\) without coffee grounds.

Part A(i): Identify the treatments

A treatment is the condition applied to the experimental units.

Final answer: The treatments are:

One-half cup of coffee grounds added weekly.
No coffee grounds added.

Part A(ii): Identify the experimental units

Experimental units are the individuals or objects that receive the treatments.

Final answer: The experimental units are the \(30\) rosebushes.

Part A(iii): Identify the response variable

The response variable is what is measured after the treatments are applied.

Final answer: The response variable is the number of roses on each rosebush after three months.

Part B: Describe random assignment

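Random assignment helps make the two groups comparable by balancing out other variables that affect rose production, such as small differences in light or soil among the bushes. Because the groups should then differ only in the treatment, a large difference in the number of roses can be attributed to the coffee grounds rather than to how the groups were formed.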

Step-by-step random assignment method

Label the rosebushes from \(1\) to \(30\).
Use a random number generator to choose \(15\) unique numbers from \(1\) to \(30\).
Assign the selected \(15\) rosebushes to receive coffee grounds.
Assign the remaining \(15\) rosebushes to receive no coffee grounds.

Final answer: Number the \(30\) rosebushes from \(1\) to \(30\). Randomly select \(15\) unique numbers. Assign those selected rosebushes to receive one-half cup of coffee grounds weekly. Assign the remaining \(15\) rosebushes to receive no coffee grounds.
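The four steps above map directly onto Python's standard `random` module. This is a sketch: the helper name `assign_treatments` is hypothetical, and the seed is included only to make the example repeatable. In a real experiment, any fresh random selection works.

```python
import random

def assign_treatments(n_units=30, n_treated=15, seed=None):
    """Randomly split bushes labeled 1..n_units into two treatment groups."""
    rng = random.Random(seed)
    labels = list(range(1, n_units + 1))                # step 1: label the bushes
    coffee = sorted(rng.sample(labels, n_treated))      # step 2: 15 unique labels
    chosen = set(coffee)
    no_coffee = [k for k in labels if k not in chosen]  # step 4: the remaining 15
    return coffee, no_coffee                            # step 3: coffee group first

coffee, no_coffee = assign_treatments(seed=2026)
print("Coffee grounds:   ", coffee)
print("No coffee grounds:", no_coffee)
```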

Part C: Explain “statistically significant” in context

The result was statistically significant at:

\[ \alpha=0.05 \]

This means the observed difference would be unlikely if coffee grounds truly had no effect.

Final answer: If coffee grounds truly have no effect on the number of roses, then the probability of getting a difference as large as or larger than the one Holly observed is less than \(0.05\). Therefore, the results are unlikely to be due to chance alone, and there is convincing evidence that coffee grounds affect the number of roses.

Question 3 Overview

The time it takes for a team song to be performed follows a normal distribution with mean \(109\) seconds and standard deviation \(16\) seconds.

\[ X\sim N(109,\ 16) \]

Part A: Probability one performance lasts longer than 120 seconds

We need:

\[ P(X>120) \]

Step 1: Convert 120 seconds to a z-score

\[ z=\frac{x-\mu}{\sigma} \] \[ z=\frac{120-109}{16} \] \[ z=\frac{11}{16}=0.6875 \]

Step 2: Find the area to the right

\[ P(X>120)=P(Z>0.6875) \]

Using a normal calculator or table:

\[ P(Z>0.6875)\approx 0.246 \]

Final answer for Part A: \[ \boxed{0.246} \] There is about a \(24.6\%\) chance that a randomly selected performance lasts longer than \(120\) seconds.
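Technology can replace the table lookup. Here is a minimal sketch using Python's built-in `statistics.NormalDist`, treating \(16\) as the standard deviation, as this guide does:

```python
from statistics import NormalDist

# Performance time X ~ N(109, 16), where 16 is the standard deviation in seconds
times = NormalDist(mu=109, sigma=16)

z = (120 - times.mean) / times.stdev   # standardize 120 seconds
p_longer = 1 - times.cdf(120)          # area to the right of 120

print(z, round(p_longer, 3))  # 0.6875 0.246
```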

Part B: Probability at least 3 of 10 performances last longer than 120 seconds

From Part A, the probability of success is:

\[ p=0.246 \]

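Now let \(X\) be the number of the \(10\) performances that last longer than \(120\) seconds. Assuming the performance times are independent, \(X\) has a binomial distribution:

\[ X\sim \text{Binomial}(n=10,\ p=0.246) \]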

We need:

\[ P(X\geq 3) \]

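This means \(3\) or more performances last longer than \(120\) seconds. The easiest route is the complement rule, where each term uses \(P(X=k)=\binom{10}{k}(0.246)^k(0.754)^{10-k}\):

\[ P(X\geq 3)=1-\left[P(X=0)+P(X=1)+P(X=2)\right] \] \[ P(X\geq 3)\approx 1-(0.0594+0.1938+0.2845)\approx 1-0.538=0.462 \]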

Final answer for Part B: \[ \boxed{P(X\geq 3)\approx 0.462} \] There is about a \(46.2\%\) chance that at least \(3\) of the \(10\) performances last longer than \(120\) seconds.
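The complement calculation can be checked with a few lines of Python. This sketch uses only `math.comb`. The helper name `binomial_pmf` is illustrative. It also uses the rounded \(p=0.246\) from Part A, so a result computed from the unrounded probability may differ slightly in the last digit.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.246
# Complement rule: P(X >= 3) = 1 - [P(X = 0) + P(X = 1) + P(X = 2)]
p_at_least_3 = 1 - sum(binomial_pmf(k, n, p) for k in range(3))
print(round(p_at_least_3, 3))  # 0.462
```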

Part C: Geometric distribution

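Ben attends games until he sees the first performance longer than \(120\) seconds. Let \(Y\) be the number of games Ben attends, up to and including the first game with a performance longer than \(120\) seconds. This is geometric because we are waiting for the first success, with: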

\[ p=0.246 \]

Part C(i): Mean of \(Y\)

For a geometric random variable:

\[ \mu_Y=\frac{1}{p} \] \[ \mu_Y=\frac{1}{0.246}\approx 4.07 \]

Final answer for Part C(i): \[ \boxed{4.07\text{ games}} \] On average, Ben will attend about \(4.07\) games until he sees a performance longer than \(120\) seconds.

Part C(ii): Standard deviation of \(Y\)

For a geometric random variable:

\[ \sigma_Y=\frac{\sqrt{1-p}}{p} \] \[ \sigma_Y=\frac{\sqrt{1-0.246}}{0.246} \] \[ \sigma_Y=\frac{\sqrt{0.754}}{0.246}\approx 3.53 \]

Final answer for Part C(ii): \[ \boxed{3.53\text{ games}} \]
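The two geometric formulas, plus a quick simulation, can be sketched as follows. The simulation assumes one performance per game and independent performances. The seed and the number of repetitions are arbitrary choices for illustration.

```python
import random
from math import sqrt

p = 0.246
mean_y = 1 / p              # geometric mean: 1/p
sd_y = sqrt(1 - p) / p      # geometric standard deviation: sqrt(1 - p)/p
print(round(mean_y, 2), round(sd_y, 2))  # 4.07 3.53

def games_until_first_long(rng, p):
    """Count games, including the first game whose performance exceeds 120 s."""
    games = 1
    while rng.random() >= p:  # this game's performance was not longer than 120 s
        games += 1
    return games

rng = random.Random(0)  # arbitrary seed so the run is repeatable
sims = [games_until_first_long(rng, p) for _ in range(100_000)]
sim_mean = sum(sims) / len(sims)
print(round(sim_mean, 2))  # should land close to 4.07
```

Counting the successful game itself is what makes the mean \(1/p\). If \(Y\) counted only the games before the first success, the mean would be \((1-p)/p\) instead.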

Part D: Interpret the standard deviation

The standard deviation tells us how much the number of games usually varies from the average.

Final answer for Part D: The number of games Ben attends until he sees a performance longer than \(120\) seconds typically varies by about \(3.53\) games from the mean of about \(4.07\) games.

Question 4 Overview

A farmer wants to know whether there is a difference in the mean number of oranges produced by trees fertilized with Brand C and Brand N.

Fertilizer	Sample Size	Mean	Standard Deviation
Brand C	\(58\)	\(141\)	\(15\)
Brand N	\(58\)	\(148\)	\(19\)

Step 1: State the hypotheses

Let:

\[ \mu_C=\text{true mean number of oranges for Brand C} \] \[ \mu_N=\text{true mean number of oranges for Brand N} \]

The null hypothesis says there is no difference:

\[ H_0:\mu_C-\mu_N=0 \]

The alternative hypothesis says there is a difference:

\[ H_a:\mu_C-\mu_N\neq 0 \]

Step 2: Choose the test

We are comparing two means, so we use a two-sample \(t\)-test.

\[ \boxed{\text{Two-sample }t\text{-test}} \]

Step 3: Check conditions

Random condition

The farmer randomly assigned trees to the fertilizer treatments, so the random condition is satisfied.

Independent groups

Each tree received only one fertilizer, so the two treatment groups are independent.

Large sample condition

Both sample sizes are \(58\), and \(58\geq 30\), so the sample sizes are large enough.

The conditions for a two-sample \(t\)-test are satisfied.

Step 4: Calculate the test statistic

\[ t=\frac{(\bar{x}_C-\bar{x}_N)-0}{\sqrt{\frac{s_C^2}{n_C}+\frac{s_N^2}{n_N}}} \]

Substitute the values:

\[ t=\frac{141-148}{\sqrt{\frac{15^2}{58}+\frac{19^2}{58}}} \] \[ t=\frac{-7}{\sqrt{\frac{225}{58}+\frac{361}{58}}} \] \[ t=\frac{-7}{\sqrt{10.103}} \] \[ t=\frac{-7}{3.179}\approx -2.20 \]
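The standard error and test statistic can be reproduced in Python. This sketch also computes the Welch (unpooled) degrees of freedom that graphing calculators and statistical software typically report for a two-sample \(t\)-test. The function name `two_sample_t` is hypothetical.

```python
from math import sqrt

def two_sample_t(mean1, sd1, n1, mean2, sd2, n2):
    """Unpooled two-sample t statistic, its standard error, and Welch df."""
    v1 = sd1**2 / n1                 # s_C^2 / n_C
    v2 = sd2**2 / n2                 # s_N^2 / n_N
    se = sqrt(v1 + v2)
    t = (mean1 - mean2 - 0) / se     # hypothesized difference is 0
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, se, df

# Brand C first, then Brand N, matching mu_C - mu_N in the hypotheses
t, se, df = two_sample_t(141, 15, 58, 148, 19, 58)
print(round(se, 3), round(t, 2), round(df, 1))  # 3.179 -2.2 108.2
```

Python's standard library has no \(t\) distribution, so the p-value itself needs a table or another tool. For example, if SciPy is installed, `2 * scipy.stats.t.sf(abs(t), df)` gives the two-sided p-value.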

Step 5: Find the p-value

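Because \(H_a\) is two-sided, the p-value is \(2P(t\leq -2.20)\). Using technology with \(df\approx 108\) (or the conservative \(df=57\)):

\[ p\text{-value}\approx 0.03 \]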

Step 6: Make the decision

The significance level is:

\[ \alpha=0.05 \]

Since:

\[ 0.03<0.05 \]

we reject \(H_0\).

Final conclusion: There is convincing statistical evidence at the \(0.05\) significance level that the mean number of oranges for trees fertilized with Brand C is different from the mean number of oranges for trees fertilized with Brand N. Because Brand N had the larger sample mean, \(148>141\), the data suggest that Brand N may produce more oranges on average than Brand C.

Question 5 Overview

The table classifies \(4,193\) professional athletes by sport and age group. The main skills are probability, conditional probability, mutual exclusivity, independence, and deciding whether a chi-square test is appropriate.

Age Group	Basketball	Football	Baseball	Total
Age \( \lt 25\)	\(232\)	\(807\)	\(259\)	\(1298\)
\(25\leq Age\lt 30\)	\(175\)	\(1326\)	\(620\)	\(2121\)
\(30\leq Age\lt 35\)	\(90\)	\(287\)	\(276\)	\(653\)
\(35\leq Age\)	\(19\)	\(41\)	\(61\)	\(121\)
Total	\(516\)	\(2461\)	\(1216\)	\(4193\)

Part A(i): Probability that a randomly selected athlete is a football player

Number of football players:

\[ 2461 \]

Total number of athletes:

\[ 4193 \]

So:

\[ P(\text{Football})=\frac{2461}{4193}\approx 0.587 \]

Final answer for Part A(i): \[ \boxed{0.587} \] About \(58.7\%\) of the athletes are football players.

Part A(ii): Probability athlete is age 25 to under 30, given they are a football player

This is a conditional probability. We only look at football players.

\[ P(25\leq Age\lt 30\mid \text{Football})=\frac{1326}{2461} \] \[ P(25\leq Age\lt 30\mid \text{Football})\approx 0.539 \]

Final answer for Part A(ii): \[ \boxed{0.539} \] About \(53.9\%\) of football players are between \(25\) and \(30\) years old.

Part B(i): Which probability does \(b\) represent?

In a mosaic plot, the width of each column is proportional to the marginal probability of that category. The width \(b\) is the width of the football column.

\[ b=P(\text{Football}) \]

Final answer for Part B(i): \(b\) represents the probability from Part A(i), \(P(\text{Football})\).

Part B(ii): What probability does \(x\) represent?

The mosaic plot shows:

\[ x=b\cdot h \]

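Here \(h\) is the height of the \(25\leq Age\lt 30\) segment inside the football column, which equals the conditional probability \(P(25\leq Age\lt 30\mid \text{Football})\) from Part A(ii). The area of a rectangle in a mosaic plot (width times height) represents a joint probability. So \(x\) represents the probability that a randomly selected athlete is both a football player and in the age group \(25\leq Age\lt 30\).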

\[ x=P(\text{Football and }25\leq Age\lt 30) \] \[ x=\frac{1326}{4193}\approx 0.316 \]

Final answer for Part B(ii): \[ \boxed{x=P(\text{Football and }25\leq Age\lt 30)\approx 0.316} \]
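The table probabilities and the mosaic-plot identity \(x=b\cdot h\) can be checked in Python. This is a sketch with the counts typed in from the table. The variable names are illustrative.

```python
# Column counts from the table, listed by age group:
# Age < 25, 25 <= Age < 30, 30 <= Age < 35, 35 <= Age
table = {
    "Basketball": [232, 175, 90, 19],
    "Football": [807, 1326, 287, 41],
    "Baseball": [259, 620, 276, 61],
}
grand_total = sum(sum(col) for col in table.values())  # 4193

football = table["Football"]
b = sum(football) / grand_total    # column width: P(Football)
h = football[1] / sum(football)    # segment height: P(25 <= Age < 30 | Football)
x = football[1] / grand_total      # rectangle area: P(Football and 25 <= Age < 30)

print(round(b, 3), round(h, 3), round(x, 3))  # 0.587 0.539 0.316
print(abs(b * h - x) < 1e-12)                 # True: width times height = area
```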

Part C(i): Are “Baseball” and “\(35\leq Age\)” mutually exclusive?

Two events are mutually exclusive if they cannot happen at the same time.

The table shows \(61\) athletes are both baseball players and age \(35\) or older.

Final answer for Part C(i): No, they are not mutually exclusive. A professional athlete can be both a baseball player and \(35\) years old or older.

Part C(ii): Are “Baseball” and “\(35\leq Age\)” independent?

Two events are independent if knowing one event happened does not change the probability of the other event.

Step 1: Find \(P(35\leq Age)\)

\[ P(35\leq Age)=\frac{121}{4193}\approx 0.029 \]

Step 2: Find \(P(35\leq Age\mid \text{Baseball})\)

\[ P(35\leq Age\mid \text{Baseball})=\frac{61}{1216}\approx 0.050 \]

Step 3: Compare

\[ 0.029\neq 0.050 \]

Final answer for Part C(ii): No, Baseball and \(35\leq Age\) are not independent. Knowing an athlete is a baseball player changes the probability that the athlete is \(35\) or older.
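The mutual-exclusivity and independence checks can be sketched the same way. Because these counts describe the whole population, comparing the two probabilities directly is enough. No test is involved.

```python
sports = ["Basketball", "Football", "Baseball"]
# Rows: Age < 25, 25 <= Age < 30, 30 <= Age < 35, 35 <= Age
counts = [
    [232, 807, 259],
    [175, 1326, 620],
    [90, 287, 276],
    [19, 41, 61],
]
grand_total = sum(sum(row) for row in counts)            # 4193
baseball = sports.index("Baseball")
baseball_total = sum(row[baseball] for row in counts)    # 1216
age_35_plus = counts[3]

# Mutually exclusive only if no athlete is in both categories
print(age_35_plus[baseball])  # 61

p_35_plus = sum(age_35_plus) / grand_total                        # P(35 <= Age)
p_35_plus_given_baseball = age_35_plus[baseball] / baseball_total  # P(35 <= Age | Baseball)
print(round(p_35_plus, 3), round(p_35_plus_given_baseball, 3))    # 0.029 0.05
```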

Part D: Is a chi-square test for independence appropriate?

A chi-square test for independence is used when we have sample data and want to make an inference about a larger population.

But this table includes all \(4,193\) professional athletes in these sports for a recent year. Since we already have the entire population of interest, inference is not needed.

Final answer for Part D: No, a chi-square test is not appropriate. The data include the entire population of \(4,193\) athletes, so we do not need to use a chi-square test to make an inference about a larger group.

Question 6 Overview

This question studies the relationship between number of hits and number of runs for professional baseball teams.

Part A(i): Describe the scatterplot

From the scatterplot, as the number of hits increases, the number of runs tends to increase. This means the association is positive.

The points follow a roughly straight-line pattern, so the relationship is approximately linear. The relationship is also moderately strong because the points generally follow the upward pattern.

Final answer for Part A(i): There is a moderately strong, positive, approximately linear relationship between number of hits and number of runs. Teams with more hits tend to score more runs.

Part A(ii): Predict runs for \(1,250\) hits

The regression equation is:

\[ \widehat{\text{runs}}=-372.2+0.823(\text{hits}) \]

Substitute \(1250\) for hits:

\[ \widehat{\text{runs}}=-372.2+0.823(1250) \] \[ 0.823(1250)=1028.75 \] \[ \widehat{\text{runs}}=-372.2+1028.75=656.55 \]

Final answer for Part A(ii): \[ \boxed{656.55} \] A team with \(1,250\) hits is predicted to score about \(657\) runs.
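The prediction is a single substitution into the regression line. Here is a minimal sketch, using the hypothetical helper name `predict_runs` and the coefficients from the equation above:

```python
def predict_runs(hits, intercept=-372.2, slope=0.823):
    """Predicted runs from the least-squares line runs-hat = -372.2 + 0.823(hits)."""
    return intercept + slope * hits

print(round(predict_runs(1250), 2))  # 656.55
```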

Part B(i): Compare Team A with other teams in the same salary group

Team A is shown as a square. The problem says squares represent teams with salaries less than the median.

Therefore, Team A is a lower-salary team. Compared with other lower-salary teams, Team A has one of the greatest numbers of hits and one of the greatest numbers of runs.

Final answer for Part B(i): Team A is a team with salary less than the median. Compared with other teams in the same salary group, Team A has one of the highest numbers of hits and one of the highest numbers of runs.

Part B(ii): Compare strength of linear relationships

Dots represent teams with salaries greater than the median. Squares represent teams with salaries less than the median.

The dots appear closer to a straight-line pattern, while the squares are more spread out. So the relationship is stronger for teams with salaries greater than the median.

Final answer for Part B(ii): The relationship between hits and runs appears stronger for teams with salaries greater than the median because the dots follow a clearer linear pattern with less scatter than the squares.

Part C(i): Find the critical value

There are \(30\) teams, so:

\[ n=30 \]

For regression inference:

\[ df=n-2=30-2=28 \]

For a \(95\%\) confidence level with \(28\) degrees of freedom:

\[ \boxed{t^*\approx 2.05} \]

Part C(ii): 95% confidence interval for the mean number of runs

This interval estimates the mean number of runs for all teams with \(1,250\) hits.

Point estimate:

\[ 656.55 \]

Standard error of the estimated mean response:

\[ 17.48 \]

Critical value:

\[ 2.05 \]

Use:

\[ \text{Point estimate}\pm t^*(SE) \] \[ 656.55\pm 2.05(17.48) \] \[ 2.05(17.48)=35.834 \] \[ 656.55-35.834=620.716 \] \[ 656.55+35.834=692.384 \]

Final answer for Part C(ii): \[ \boxed{(620.7,\ 692.4)} \] We are \(95\%\) confident that the mean number of runs for all teams with \(1,250\) hits is between about \(621\) and \(692\) runs.

Part C(iii): 95% prediction interval for one team

This interval predicts the number of runs for one individual team with \(1,250\) hits.

Point estimate:

\[ 656.55 \]

Standard error for predicting one team's runs:

\[ 56.78 \]

Critical value:

\[ 2.05 \]

\[ 656.55\pm 2.05(56.78) \] \[ 2.05(56.78)=116.399 \] \[ 656.55-116.399=540.151 \] \[ 656.55+116.399=772.949 \]

Final answer for Part C(iii): \[ \boxed{(540.2,\ 772.9)} \] We are \(95\%\) confident that a single team with \(1,250\) hits will score between about \(540\) and \(773\) runs.
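Both intervals use the same point estimate and \(t^*\). Only the standard error changes. Here is a minimal sketch, assuming the rounded \(t^*=2.05\) used in this guide. Software that uses \(t^*\approx 2.048\) would give very slightly narrower intervals. The helper name `t_interval` is hypothetical.

```python
def t_interval(estimate, t_star, se):
    """Return (lower, upper) for estimate +/- t* (SE)."""
    margin = t_star * se
    return estimate - margin, estimate + margin

y_hat, t_star = 656.55, 2.05
ci = t_interval(y_hat, t_star, 17.48)  # mean runs for all teams with 1,250 hits
pi = t_interval(y_hat, t_star, 56.78)  # runs for one team with 1,250 hits

print([round(v, 1) for v in ci])  # [620.7, 692.4]
print([round(v, 1) for v in pi])  # [540.2, 772.9]
```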

Part D(i): Which has more variability: sample means or individual observations?

Individual observations usually vary more. Sample means vary less because averaging smooths out extreme values.

Final answer for Part D(i): A distribution of sample means would have less variability than a distribution of individual observations because averages are more stable than individual values.

Part D(ii): Why is the prediction interval wider than the confidence interval?

The confidence interval estimates the average number of runs for all teams with \(1,250\) hits. The prediction interval predicts the number of runs for one single team with \(1,250\) hits.

Predicting one team is harder because individual teams vary more than averages. That is why the prediction interval has more uncertainty.

The confidence interval standard error is:

\[ s\sqrt{\frac{1}{n}+\frac{(x-\bar{x})^2}{\sum (x_i-\bar{x})^2}} \]

The prediction interval standard error is:

\[ s\sqrt{1+\frac{1}{n}+\frac{(x-\bar{x})^2}{\sum (x_i-\bar{x})^2}} \]

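The prediction interval formula has an extra \(1\) inside the square root. That extra \(1\) makes the prediction standard error larger. In both formulas, \(s\) is the standard deviation of the residuals, \(n\) is the number of teams, and \(x\) is the hits value used for the estimate, here \(1,250\).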

Final answer for Part D(ii): The prediction interval is wider because it predicts one individual team’s number of runs, which has more variability than the mean number of runs for all teams. The formula for the prediction interval also includes an extra \(1\) inside the square root, making its standard error larger. Since both intervals use the same \(t^*\), the interval with the larger standard error is wider.
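The extra \(1\) can also be read as an extra \(s^2\) under the square root. Write \(SE_{\text{mean}}\) for the confidence interval standard error and \(SE_{\text{pred}}\) for the prediction standard error. Assuming both come from the same fitted regression, a short algebra step gives:

\[ SE_{\text{pred}}=s\sqrt{1+\frac{1}{n}+\frac{(x-\bar{x})^2}{\sum (x_i-\bar{x})^2}}=\sqrt{s^2+s^2\left(\frac{1}{n}+\frac{(x-\bar{x})^2}{\sum (x_i-\bar{x})^2}\right)}=\sqrt{s^2+SE_{\text{mean}}^2} \]

With the standard errors used in Part C, this implies a residual standard deviation of about

\[ s=\sqrt{56.78^2-17.48^2}=\sqrt{2918.42}\approx 54.0 \]

So most of the prediction standard error comes from team-to-team variation around the regression line, not from uncertainty in the estimated mean.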

FAQ: AP Statistics 2026 FRQ Solutions

What topics appeared in the 2026 AP Statistics FRQs?

The questions covered descriptive statistics, boxplots, experimental design, statistical significance, normal probability, binomial and geometric distributions, two-sample \(t\)-tests, conditional probability, independence, mosaic plots, chi-square reasoning, and linear regression intervals.

Why is showing work important in AP Statistics?

AP Statistics free-response questions are scored for method, calculations, and communication. A correct final number without explanation may not receive full credit.

What is the difference between a confidence interval and a prediction interval?

A confidence interval estimates a mean response for all individuals with a certain \(x\)-value. A prediction interval predicts the response for one individual with that \(x\)-value. Prediction intervals are wider because individual outcomes vary more than averages.

Why was the chi-square test not appropriate in Question 5?

The table included the entire population of professional athletes in those sports for that year. Because the data are not a random sample from some larger population, there is nothing to infer, so a chi-square test is not needed.

What is the easiest way to improve AP Statistics FRQ answers?

Use a clear structure: identify the method, write the formula, substitute values, calculate carefully, and interpret the result in the context of the problem.