14 - Scatterplots

Explain the following quote by George Box: “All models are wrong, but some are useful.”

Practice

Chapter 7 # 5, 11, 15, 17, 31; Chapter 8 # 1, 9, 11, 28; Chapter 9 # 21, 31

Vocabulary

model, scatterplot, association, response variable, explanatory variable, correlation coefficient, lurking variable, residuals, regression line, s_e, R², r, e, extrapolation, leverage, influential point

Study Questions

  1. What is the distinction between ‘association’ and ‘correlation’?
  2. What do you discuss or point out when describing a scatterplot and any association present?
  3. How do you know when it’s appropriate to calculate correlation?
  4. What impact (if any) does changing units have on correlation? Explain.
  5. Does correlation imply causation? Explain.
  6. What is the role of residuals in determining the appropriateness of the linear model?
  7. What does R² tell us?
  8. Does the y-intercept have meaning in every context? Explain with an example.
  9. What is the meaning of a positive residual? A negative residual?
  10. What points would be considered unusual on a scatterplot? How would you identify influential points, points with large residuals, and high-leverage points?
  11. What would you do (with regard to running the regression and reporting results) if significant outliers were present in your scatterplot?
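
Questions 4, 6, 7, and 9 can also be explored numerically. The following is a minimal sketch in Python with NumPy, not part of the course materials. The height and weight values are invented for illustration, and the variable names are assumptions. It computes r in two unit systems, fits a least-squares line, and compares R² with r².

```python
import numpy as np

# Hypothetical data, invented for illustration only (not from the textbook).
height_in = np.array([60.0, 62.0, 65.0, 68.0, 70.0, 72.0])        # explanatory variable
weight_lb = np.array([115.0, 120.0, 140.0, 155.0, 165.0, 180.0])  # response variable


def correlation(x, y):
    """Return the correlation coefficient r between x and y."""
    return float(np.corrcoef(x, y)[0, 1])


# Question 4: compute r in inches/pounds, then again in centimeters/kilograms.
r = correlation(height_in, weight_lb)
r_metric = correlation(height_in * 2.54, weight_lb * 0.4536)

# Least-squares regression line: predicted = intercept + slope * x.
# np.polyfit with degree 1 returns the slope first, then the intercept.
slope, intercept = np.polyfit(height_in, weight_lb, 1)
predicted = intercept + slope * height_in

# Questions 6 and 9: residual e = observed - predicted.
# A positive residual means the observed value lies above the line.
residuals = weight_lb - predicted

# Question 7: R² is the fraction of the variation in the response
# that the linear model accounts for.
ss_residual = float(np.sum(residuals ** 2))
ss_total = float(np.sum((weight_lb - weight_lb.mean()) ** 2))
r_squared = 1.0 - ss_residual / ss_total

print("r (in, lb):", r)
print("r (cm, kg):", r_metric)
print("R squared: ", r_squared, " r squared:", r ** 2)
print("residuals: ", residuals)
```

Running it shows the same r in both unit systems, and R² matching r² for a one-predictor linear model. Plotting `residuals` against `height_in` and looking for a pattern is one way to approach question 6.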

Resources

  1. Link: Rossman/Chance - Correlation Guessing Game
  2. Link: Correlation and Regression
  3. Slide Deck: 14-Scatterplots.pdf