AP Precalculus: Bivariate Statistics

Master correlation, regression, and two-variable data analysis

📈 Scatter Plots 🔗 Correlation 📏 Regression 📊 r²

📚 Understanding Bivariate Data

Bivariate statistics analyzes the relationship between two variables. We use scatter plots to visualize data, correlation to measure the strength of linear relationships, and regression to model and predict. Understanding these concepts helps you describe patterns, make predictions, and interpret real-world data.

1 Scatter Plots & Outliers

A scatter plot displays pairs of data (x, y) as points. An outlier is a point that deviates significantly from the general pattern.

Describing Scatter Plots

  • Direction: Positive (up-right), Negative (down-right), or No trend
  • Form: Linear, Curved, or No pattern
  • Strength: Strong, Moderate, or Weak
  • Unusual features: Outliers, clusters, gaps

Outlier Effects

Outliers can dramatically change correlation and regression. Always check for outliers before interpreting results.

Influential Points

Influential points are points, usually at extreme x-values, that strongly affect the regression line; removing one changes the slope or intercept noticeably.
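As a rough illustration of how much one influential point can move the line, the sketch below fits a least-squares line (using the slope and intercept formulas from the Linear Regression section) with and without a single point at an extreme x-value. The data and the helper name fit_line are made up for this example.

```python
def fit_line(xs, ys):
    """Return (intercept a, slope b) of the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

# Hypothetical data: five points that lie exactly on y = 2x
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
a_clean, b_clean = fit_line(xs, ys)

# Same data plus one made-up point at an extreme x-value, far off the pattern
a_infl, b_infl = fit_line(xs + [10], ys + [0])

print(f"without the point: y-hat = {a_clean:.2f} + {b_clean:.2f}x")
print(f"with the point:    y-hat = {a_infl:.2f} + {b_infl:.2f}x")
```

With this made-up data the slope goes from 2 to about -0.30, so a single point is enough to flip the direction of the fitted line.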

2 Correlation Coefficient (r)

The correlation coefficient r measures the strength and direction of the linear relationship between two variables.

Correlation Coefficient Formula \(r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \cdot \sum (y_i - \bar{y})^2}}\)
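The formula can be checked by hand on a small data set. The sketch below is one way to compute r directly from the definition; the five data points and the function name pearson_r are assumptions chosen for illustration.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient r, computed term by term from the formula."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    # Numerator: sum of products of deviations from the means
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator: square root of (sum of squared x-deviations) * (sum of squared y-deviations)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: x_bar = 3, y_bar = 4
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
print(f"r = {r:.4f}")  # r = 0.7746
```

Here the numerator is 6 and the denominator is √(10 · 6) = √60, so r ≈ 0.77.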

Interpreting r

  • r = -1: Perfect negative
  • r = -0.5: Moderate negative
  • r = 0: No linear correlation
  • r = +0.5: Moderate positive
  • r = +1: Perfect positive

  • |r| > 0.8: Strong correlation
  • 0.5 < |r| < 0.8: Moderate correlation
  • 0.3 < |r| < 0.5: Weak correlation
  • |r| < 0.3: Very weak/No correlation

⚠️ Correlation ≠ Causation

Correlation only measures association, not cause and effect. Two variables can be strongly correlated due to a third variable (lurking variable) or coincidence.

3 Linear Regression (Least Squares)

The least squares regression line (LSRL) is the line that minimizes the sum of squared residuals — it's the "best fit" line for the data.

Regression Line Equation \(\hat{y} = a + bx\) or \(\hat{y} = b_0 + b_1 x\)
Slope (b)
\(b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}\)
Or: \(b = r \cdot \frac{s_y}{s_x}\)
Y-Intercept (a)
\(a = \bar{y} - b\bar{x}\)
Line passes through \((\bar{x}, \bar{y})\)
📌 Example

Given: \(\bar{x} = 10\), \(\bar{y} = 25\), \(r = 0.85\), \(s_x = 4\), \(s_y = 8\)

Slope: \(b = 0.85 \times \frac{8}{4} = 0.85 \times 2 = 1.7\)

Intercept: \(a = 25 - 1.7(10) = 25 - 17 = 8\)

Equation: \(\hat{y} = 8 + 1.7x\)
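The same arithmetic can be written as a short function. This is a minimal sketch that works from summary statistics only; the function name lsrl_from_summary is hypothetical, and the inputs are the values given in the example.

```python
def lsrl_from_summary(x_bar, y_bar, r, s_x, s_y):
    """Return (intercept a, slope b) of the LSRL from summary statistics."""
    b = r * s_y / s_x       # slope: b = r * (s_y / s_x)
    a = y_bar - b * x_bar   # intercept: the line passes through (x_bar, y_bar)
    return a, b

a, b = lsrl_from_summary(x_bar=10, y_bar=25, r=0.85, s_x=4, s_y=8)
print(f"y-hat = {a:.1f} + {b:.1f}x")  # y-hat = 8.0 + 1.7x
```

Plugging x = 10 back into the result gives 8 + 1.7(10) = 25, which confirms that the line passes through the point of means.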

4 Interpreting Regression

Understanding what the slope and intercept mean in context is essential for interpreting regression results.

Slope Interpretation

"For each 1-unit increase in x, the predicted y increases/decreases by [slope] units."

Intercept Interpretation

"When x = 0, the predicted y is [intercept]."

📌 Example in Context

Equation: \(\hat{y} = 8 + 1.7x\) where x = hours studied, y = test score

Slope: "For each additional hour of study, the predicted test score increases by 1.7 points."

Intercept: "A student who studies 0 hours is predicted to score 8 points." (May not be meaningful if 0 is outside data range)

⚠️ Extrapolation Warning

Don't use the regression line to predict y for x-values outside the range of your data. The relationship may not hold beyond the observed values.
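One way to make this warning concrete is to have a prediction function refuse x-values outside the observed range. The sketch below uses the study-hours equation from the example; the observed range of 2 to 12 hours is an assumption, since the guide does not give the underlying data, and the function names are hypothetical.

```python
# Fitted line from the example: y-hat = 8 + 1.7x
INTERCEPT = 8.0
SLOPE = 1.7

# Assumed range of observed x-values (hours studied); not given in the guide
X_MIN = 2.0
X_MAX = 12.0

def in_observed_range(x):
    """True when x lies inside the assumed range of the data."""
    return X_MIN <= x <= X_MAX

def predict_score(x):
    """Predict y only for x inside the observed range; refuse to extrapolate."""
    if not in_observed_range(x):
        raise ValueError(f"x = {x} is outside the observed range [{X_MIN}, {X_MAX}]")
    return INTERCEPT + SLOPE * x

print(predict_score(5))  # inside the assumed range, so a prediction is returned
# predict_score(20) would raise ValueError, because 20 hours is extrapolation
```

Under this assumed range, x = 0 is also outside the data, which is why the intercept of 8 points may have no practical meaning.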

5 Residuals & Coefficient of Determination

A residual measures prediction error. The coefficient of determination, r², measures how well the regression line fits the data.

Residual
\(\text{Residual} = y - \hat{y}\)
Actual value minus predicted value
Coefficient of Determination
\(r^2 = (r)^2\)
Proportion of variance explained

Interpreting r²

r²     Meaning
0.90   90% of variability in y is explained by the linear relationship with x
0.64   64% of variability in y is explained by the linear relationship with x
0.25   Only 25% of variability in y is explained by the linear relationship with x (poor fit)
💡 Residual Plot Analysis

Plot residuals vs. x-values. Random scatter = good linear fit. Patterns = linear model is inappropriate (try nonlinear).
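The sketch below computes residuals and r² for a small hypothetical data set. It finds r² as 1 − (sum of squared residuals) / (total sum of squares), which for a least-squares line equals the square of the correlation coefficient r. The data values are made up for illustration.

```python
# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares slope and intercept
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
a = y_bar - b * x_bar

# Residual = actual y minus predicted y
predicted = [a + b * x for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]

# r^2 = 1 - (unexplained variation) / (total variation)
sse = sum(e ** 2 for e in residuals)
sst = sum((y - y_bar) ** 2 for y in ys)
r_squared = 1 - sse / sst

for x, e in zip(xs, residuals):
    print(f"x = {x}: residual = {e:+.2f}")
print(f"r^2 = {r_squared:.2f}")
```

For this data the line is ŷ = 2.2 + 0.6x, the residuals sum to zero, and r² = 0.60. Plotting the printed residuals against x gives the residual plot described above.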

6 Exponential Regression

Exponential regression fits data that grows or decays by a constant percentage — when values multiply rather than add.

Exponential Model \(y = ab^x\) or \(y = ae^{kx}\)

When to Use Exponential

Data shows multiplicative change (percent increase/decrease). Scatter plot curves upward or downward.

Linearizing Method

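Take the log of y: if the plot of log(y) vs. x is roughly linear, an exponential model is appropriate, because \(y = ab^x\) gives \(\log y = \log a + x \log b\), a line in x.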

  • b > 1: Exponential growth
  • 0 < b < 1: Exponential decay
  • a: Initial value (when x = 0)
📌 Example

Model: \(y = 100(1.05)^x\) for population growth

Interpretation: Starting population is 100, growing at 5% per year.

After 10 years: \(y = 100(1.05)^{10} \approx 163\)
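The linearizing method can be sketched in code: take the natural log of each y-value, fit a least-squares line to (x, ln y), then convert the intercept and slope back to a and b. The data here are generated from the example model y = 100(1.05)^x with no noise, purely to show that the method recovers a and b; the helper name fit_line is hypothetical.

```python
import math

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through (xs, ys)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Illustrative data generated from y = 100 * 1.05^x
xs = [0, 1, 2, 3, 4, 5]
ys = [100 * 1.05 ** x for x in xs]

# Linearize: ln(y) = ln(a) + x * ln(b)
log_ys = [math.log(y) for y in ys]
ln_a, ln_b = fit_line(xs, log_ys)

# Undo the log to recover the exponential parameters
a = math.exp(ln_a)
b = math.exp(ln_b)
y_10 = a * b ** 10

print(f"y = {a:.2f} * ({b:.4f})^x")
print(f"prediction at x = 10: {y_10:.0f}")
```

With real, noisy data the recovered a and b would only approximate the underlying growth pattern, and a residual plot of ln(y) vs. x is a reasonable check on whether the exponential model fits.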

📋 Quick Reference

Correlation r

Strength & direction of linear relationship, -1 ≤ r ≤ 1

Slope b

\(b = r \cdot \frac{s_y}{s_x}\)

Intercept a

\(a = \bar{y} - b\bar{x}\)

Residual

\(y - \hat{y}\)

r²

% of variance explained

Exponential

\(y = ab^x\)
