1. Chapter 7: Scatterplots, Associations, and Correlation
    1. Scatterplots: display patterns, trends, and relationships
      1. Between two quantitative variables
        1. Association between them?
      2. Looking at Scatterplots
        1. direction
          1. positive
          2. negative
        2. form
          1. straight line relationship
          2. appears as cloud or swarm of points
          3. linearity
        3. scatter
          1. lots of scatter: points spread widely
          2. little scatter: points cluster tightly
      3. Mechanics
        1. y-axis
        2. x-axis
        3. computerized scatterplots often do not show the origin
    2. Two Variable Roles
      1. explanatory
        1. x-axis
      2. response variable
        1. y-axis
    3. Correlation
      1. find z-scores of x- and y-variables
        1. multiply each point's z_x and z_y scores together, find the sum
          1. divide sum by n-1 to get r
          2. summarizes direction + strength of association
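The recipe above can be sketched in a few lines of Python: standardize x and y, multiply the paired z-scores, sum, and divide by n − 1. The data values here are made up for illustration.

```python
# Correlation via z-scores, as in the outline above.
# Data are made up for illustration.
from statistics import mean, stdev

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]
n = len(x)

zx = [(v - mean(x)) / stdev(x) for v in x]  # z-scores of x
zy = [(v - mean(y)) / stdev(y) for v in y]  # z-scores of y

# r = sum of paired z-score products, divided by n - 1
r = sum(a * b for a, b in zip(zx, zy)) / (n - 1)
print(round(r, 3))
```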
      2. Correlation Conditions
        1. Correlation measures strength of linear association only
          1. Quantitative Variables Condition: both variables must be quantitative
          2. Straight Enough Condition: relationship must be linear
          3. Outlier Condition: outliers distort correlation
        2. Between +1 and -1
          1. closer to -1 or +1, more linear the association
          2. no units
      3. Correlation isn't Association
        1. Association: vague term describing relationship between two variables
        2. Correlation: very precise term describing LINEAR relationship between quantitative variables
      4. Expressed as r
    4. Lurking Variables
      1. hidden variable that stands behind a relationship + affects both variables
  2. Chapter 8: Linear Regression
    1. Models for Data
      1. model relationship w/ line
        1. requires numbers
          1. parameters
      2. linear model: equation of a straight line through the data
      3. specify Normal model w/ mean and SD
      4. model: idealized, simplified representation of the data
    2. Residuals
      1. Linear models not perfect
      2. predicted value
        1. estimate from the model
        2. written y-hat
      3. residual: diff. between observed value + associated predicted value
        1. residual = observed value - predicted value
        2. how far off the model's prediction is
      4. Data = model + Residual or Residual = Data - Model
    3. "Best Fit" Means Least Squares
      1. line of best fit: the line for which the sum of squared residuals is smallest
      2. square the residuals
        1. makes them all positive
        2. add them up
          1. total tells how far off the line is
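A small check of the least-squares idea above: the best-fit line makes the sum of squared residuals as small as possible, so any rival line does worse. Data and the rival line are made up for illustration.

```python
# Least squares: the fitted line minimizes the sum of squared residuals.
# Data are made up for illustration.
from statistics import mean

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]

def ssr(slope, intercept):
    """Sum of squared residuals for the line y-hat = intercept + slope * x."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

# Least-squares slope and intercept: b1 = sum(dx*dy)/sum(dx^2), b0 = ybar - b1*xbar
xbar, ybar = mean(x), mean(y)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar

print(ssr(b1, b0))         # smallest possible sum of squared residuals
print(ssr(b1 + 0.1, b0))   # nudging the slope makes the fit worse
```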
    4. Correlation and the Line
      1. slope: value m
        1. larger m, steeper slope
        2. negative
          1. negative association
        3. zero
          1. horizontal line
      2. correlation coefficient, r, as the slope in standardized units
        1. z-hat_y = r * z_x
          1. moving one SD away from the mean in x moves the prediction r SDs away from the mean in y
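The standardized-slope fact above can be verified numerically: fitting the least-squares line to the z-scores of x and y gives a slope equal to r. Data are made up for illustration.

```python
# In z-score units, the least-squares slope equals the correlation r.
# Data are made up for illustration.
from statistics import mean, stdev

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]
n = len(x)

zx = [(v - mean(x)) / stdev(x) for v in x]
zy = [(v - mean(y)) / stdev(y) for v in y]

r = sum(a * b for a, b in zip(zx, zy)) / (n - 1)

# Slope of zy on zx (both have mean 0, so no intercept is needed)
slope_z = sum(a * b for a, b in zip(zx, zy)) / sum(a * a for a in zx)
print(round(r, 4), round(slope_z, 4))
```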
    5. Size of Predicted Values
      1. regression to the mean
        1. predicted y tends to be closer to its mean (in SDs) than the corresponding x was
      2. regression line
        1. the linear equation that satisfies the least squares criterion
    6. Units
      1. y-intercept
        1. b sub 0
        2. value of y where the line crosses the y-axis (x = 0)
      2. slope
        1. b sub 1
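The slope and intercept above can be computed from r and the summary statistics: b1 = r * (s_y / s_x) and b0 = ybar − b1 * xbar. A minimal sketch, with made-up data:

```python
# Slope b1 = r * (s_y / s_x); intercept b0 = ybar - b1 * xbar.
# Data are made up for illustration.
from statistics import mean, stdev

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]
n = len(x)

sx, sy = stdev(x), stdev(y)
r = sum(((xi - mean(x)) / sx) * ((yi - mean(y)) / sy) for xi, yi in zip(x, y)) / (n - 1)

b1 = r * sy / sx             # slope: r SDs of y per SD of x
b0 = mean(y) - b1 * mean(x)  # intercept: predicted y when x = 0
print(round(b1, 3), round(b0, 3))
```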
    7. R-Squared
      1. gives fraction of data's variance accounted for by model
      2. 1 minus R-squared is fraction of original variance left in residuals
      3. given as percentage, typically
      4. R-squared of 100% is a perfect fit w/ no scatter around line
      5. measures success of regression line
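The R-squared facts above can be checked directly: compute 1 minus the fraction of the total variation in y left in the residuals. For simple regression this equals r squared. Data are made up for illustration.

```python
# R^2 = 1 - (sum of squared residuals) / (total sum of squares of y).
# Data are made up for illustration.
from statistics import mean

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]

xbar, ybar = mean(x), mean(y)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
sst = sum((yi - ybar) ** 2 for yi in y)  # total variation in y
ssr = sum(e ** 2 for e in residuals)     # variation left in the residuals
r_squared = 1 - ssr / sst
print(round(r_squared, 3))  # fraction of variance accounted for by the line
```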
    8. Examining Residuals
      1. check whether linear model appropriate
        1. plot residuals
      2. histogram
        1. displays multiple modes + y-outliers
      3. scatterplot
        1. residuals versus predicted values
          1. reveals bends, groups, model outliers
  3. Chapter 9: Regression Wisdom
    1. Subset
      1. data consist of two or more groups that have been thrown together
        1. best to fit diff. linear models to each group
      2. found by residual plots
    2. Sifting Residuals for Groups
      1. May need to analyze groups of data in a scatterplot separately (if diff. behavior than most of data)
    3. Extrapolation
      1. plugging x-values far from the mean, outside the data used to build the model, into the equation
      2. dangerous
      3. time as x-variable
        1. extrapolation becomes an attempt to peer into the future
    4. Outliers
      1. can strongly influence regression
        1. even single point
      2. outlier: any point that stands away from others
        1. model outliers
          1. removing generally increases R-squared
        2. x-outliers
        3. y-outliers
      3. leverage: points whose x-values are far from the mean of x
        1. pull line close to them
        2. sometimes determine slope and intercept
        3. removing can decrease R-squared
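The leverage idea above is easy to demonstrate: a single point with an extreme x-value can pull the fitted slope toward itself, here even flipping its sign. Data are made up for illustration.

```python
# One high-leverage point can dominate the fitted slope.
# Data are made up for illustration.
from statistics import mean

def slope(x, y):
    """Least-squares slope of y on x."""
    xbar, ybar = mean(x), mean(y)
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sum((xi - xbar) ** 2 for xi in x)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]

b1_clean = slope(x, y)               # slope without the outlier (positive)
b1_lever = slope(x + [20], y + [0])  # add one far-out x with a low y
print(round(b1_clean, 2), round(b1_lever, 2))
```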
    5. Influential Points
      1. can hide in plots of residuals
      2. seen easier in scatterplots of original data
    6. Lurking Variables and Causation
      1. correlation isn't causation
      2. lurking variable: not explicitly part of the model but affects the variables in the model
    7. Things can go wrong
      1. make sure the relationship is straight
      2. Do not extrapolate
      3. Do not use extrapolation with time
      4. Subsets in regression: separate them/analyze separately
      5. Outliers
      6. Leverage points
      7. Lurking Variables
      8. Summary statistics
        1. less variable than raw data
          1. inflates the impression of the strength of the association
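The warning above about summary statistics can be illustrated numerically: correlations computed on group means are usually stronger than on the raw data, because averaging removes within-group scatter. Data are made up for illustration.

```python
# Correlation of group means vs. correlation of the raw data.
# Data are made up for illustration.
from statistics import mean, stdev

def corr(x, y):
    """Correlation via the z-score product formula."""
    n = len(x)
    mx, my, sx, sy = mean(x), mean(y), stdev(x), stdev(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / ((n - 1) * sx * sy)

# Raw data: three groups of two points each, with within-group scatter in y
x_raw = [1, 1, 2, 2, 3, 3]
y_raw = [0, 2, 1, 3, 2, 4]

# Summary statistics: one mean per group (scatter averaged away)
x_means = [1, 2, 3]
y_means = [1, 2, 3]

print(round(corr(x_raw, y_raw), 3), round(corr(x_means, y_means), 3))
```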