Basic Statistics - (03) Correlation & Regression

 

1. Correlation

  • Contingency table

    A contingency table enables you to display the relationship between two ordinal or nominal variables.

    • Conditional proportion: a cell count divided by the total of its row (or column), i.e. the proportion within one category of the other variable.
    • Marginal proportion: a row (or column) total divided by the grand total of the table.
  • Scatterplot

    A scatterplot is a graphical display for two quantitative variables using the horizontal (x) axis for the explanatory variable x and the vertical (y) axis for the response variable y.
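The marginal and conditional proportions of a contingency table can be sketched in a few lines of Python. The table below (gender by preferred sport) and all of its counts are made up purely for illustration, and the variable names are hypothetical.

```python
# Hypothetical counts: rows = gender, columns = preferred sport (made-up data).
table = {
    "female": {"tennis": 20, "soccer": 30},
    "male": {"tennis": 10, "soccer": 40},
}

grand_total = sum(sum(row.values()) for row in table.values())

# Marginal proportions: each row (or column) total divided by the grand total.
row_marginals = {r: sum(cols.values()) / grand_total for r, cols in table.items()}
column_names = list(next(iter(table.values())))
col_marginals = {
    c: sum(table[r][c] for r in table) / grand_total for c in column_names
}

# Conditional proportions: each cell divided by its own row total,
# i.e. the distribution of sport given gender.
conditional_on_row = {
    r: {c: n / sum(cols.values()) for c, n in cols.items()}
    for r, cols in table.items()
}

print("row marginals:", row_marginals)
print("column marginals:", col_marginals)
print("sport given gender:", conditional_on_row)
```

Comparing the conditional rows with each other is what reveals a relationship: if sport preference did not depend on gender, both rows would show roughly the same proportions.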

2. Pearson’s r

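Pearson’s r measures the direction and strength of the linear relationship between two quantitative variables.

  • Shape of the scatterplot: the more tightly the points cluster around a straight line, the stronger the correlation.

  • r value: lies in [-1, 1]. The sign gives the direction; values near -1 or 1 indicate a strong linear relationship, and values near 0 a weak one.

  • Important note: check the scatterplot before you calculate Pearson’s r, because r only describes linear relationships.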
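As a minimal sketch, Pearson’s r can be computed from its textbook definition: the summed products of deviations from the means, divided by the square root of the product of the summed squared deviations. The function name `pearson_r` and both data sets are made up for illustration. The second data set shows why the scatterplot check matters: y depends perfectly on x, yet r is 0 because the relationship is not linear.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from the definition; assumes neither sample is constant."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up data with a roughly linear, positive trend.
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]
r = pearson_r(hours, score)
print(f"linear-ish data: r = {r:.3f}")

# Made-up data with a perfect but curved (quadratic) relationship.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]
r_curved = pearson_r(xs, ys)
print(f"curved data:     r = {r_curved:.3f}")
```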

3. Regression

  • Regression line and r

    • Ordinary least squares (OLS) regression: choose the line that minimizes the sum of squared residuals.

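    • Compute the regression line ŷ = a + bx, where the slope is b = r · (s_y / s_x) and the intercept is a = ȳ − b · x̄.

    • r^2

    A prediction based on one variable alone is much less accurate than one that also uses information from a second, “related” variable.

    Everything is connected, and many relationships in the world may be beyond what you would imagine, for example chocolate and the nationality of Nobel Prize winners.

    • r^2 tells you how much better the regression line predicts the value of the dependent variable than the mean of that variable does.

      • It is the proportion of the variance in the dependent variable (y) that is explained by the independent variable (x).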
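The least-squares line and r^2 can be sketched as follows, under the assumption of a single predictor. The helper names `fit_line` and `r_squared` and the data are hypothetical. r^2 is computed exactly as described above: it compares the squared prediction errors of the regression line with the squared errors made when always predicting the mean of y.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx  # assumes x is not constant
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(xs, ys):
    """1 - (squared error of the line) / (squared error of the mean)."""
    intercept, slope = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    sse_line = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sse_mean = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse_line / sse_mean


# Made-up illustrative data.
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]

a, b = fit_line(hours, score)
r2 = r_squared(hours, score)
print(f"regression line: y = {a:.2f} + {b:.2f} * x")
print(f"r^2 = {r2:.2f}")
```

For this toy data the line leaves 40% of the squared error that the mean alone would leave, so r^2 is 0.6.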

4. Confounding or lurking variables

  • ❗ Correlation/regression ≠ causation

    • Social science mainly studies correlation rather than causation.
    • Always ask yourself whether other variables might be involved.
    • Confounding variable
      • A variable that is included in the study.
    • Lurking variable
      • A variable that is not included in the study but has the potential to influence the result.
  • ❗ Be aware of outliers

    • A single outlier can turn a positive coefficient into a negative one.
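The effect of an outlier can be illustrated with a small sketch. The data are made up: five points that lie exactly on a rising line, plus one extreme point chosen to show the sign flip. The helper `slope` is hypothetical; it returns the least-squares regression coefficient, which always has the same sign as Pearson’s r.

```python
def slope(xs, ys):
    """Least-squares slope of y on x (same sign as Pearson's r)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Five made-up points on a perfectly increasing line.
xs = [1, 2, 3, 4, 5]
ys = [1, 2, 3, 4, 5]
slope_clean = slope(xs, ys)

# The same points plus one extreme, made-up outlier at (20, -10).
slope_with_outlier = slope(xs + [20], ys + [-10])

print(f"slope without outlier: {slope_clean:.2f}")
print(f"slope with outlier:    {slope_with_outlier:.2f}")
```

One point far from the rest dominates the sums of deviations, which is why plotting the data first is worth the effort.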

5. Example: Pearson’s r and regression