Basic Statistics - (03) Correlation & Regression
1. Correlation
-
Contingency table
A contingency table enables you to display the relationship between two ordinal or nominal variables.
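- Conditional proportion: a cell count divided by the total of its own row (or column), i.e. a proportion computed within one particular row or column of the contingency table.
- Marginal proportion: a row (or column) total divided by the grand total of the contingency table.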
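As a rough illustration of the two kinds of proportion, here is a minimal Python sketch. The table, its category names, and its counts are invented for the example and do not come from any real data set.

```python
# Hypothetical 2x2 contingency table: rows = gender, columns = favourite sport.
table = {
    "female": {"soccer": 20, "tennis": 30},
    "male": {"soccer": 40, "tennis": 10},
}
columns = ["soccer", "tennis"]

grand_total = sum(sum(row.values()) for row in table.values())

# Marginal proportions: row (or column) totals divided by the grand total.
row_marginals = {g: sum(row.values()) / grand_total for g, row in table.items()}
col_marginals = {c: sum(table[g][c] for g in table) / grand_total for c in columns}

# Conditional proportions: each cell divided by the total of its own row,
# i.e. the distribution of sport *given* gender.
row_conditionals = {
    g: {c: n / sum(row.values()) for c, n in row.items()}
    for g, row in table.items()
}

print(row_marginals)
print(col_marginals)
print(row_conditionals)
```

Comparing the conditional proportions across rows (here, soccer among females versus soccer among males) is what reveals whether the two variables are related.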
-
Scatterplot
A scatterplot is a graphical display for two quantitative variables using the horizontal (x) axis for the explanatory variable x and the vertical (y) axis for the response variable y.
2. Pearson’s r
Pearson’s r measures the direction and strength of the linear relationship between two quantitative variables.
-
The shape of the scatterplot
-
r value: lies in [-1, 1]; the sign gives the direction of the relationship and the magnitude gives its strength.
-
Important note: Check scatterplot before you calculate Pearson’s r.
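The note above can be illustrated with a small sketch. The helper `pearson_r` and the two toy data sets are made up for this example; the formula is the usual sum of cross-deviations divided by the square root of the product of the sums of squared deviations.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r computed from deviations around the two means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
linear = [2, 4, 6, 8, 10]            # perfectly linear relationship
curved = [(x - 3) ** 2 for x in xs]  # perfect, but U-shaped, relationship

r_linear = pearson_r(xs, linear)
r_curved = pearson_r(xs, curved)
print(r_linear, r_curved)
```

In the U-shaped data y is completely determined by x, yet r comes out as zero because r only captures linear association. A scatterplot would show the curve immediately, which is why it is worth looking at the plot first.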
3. Regression
-
Regression line–r
-
Ordinary least squares regression
-
compute regression line
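A minimal sketch of how the least squares line ŷ = a + b·x could be computed by hand, assuming the standard closed-form solution (slope = sum of cross-deviations divided by the sum of squared x-deviations, intercept = mean of y minus slope times mean of x). The function name `fit_line` and the data are illustrative only.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the ordinary least squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the line passes through (mean_x, mean_y)
    return intercept, slope

# Made-up data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

intercept, slope = fit_line(xs, ys)
predicted_at_6 = intercept + slope * 6
print(intercept, slope, predicted_at_6)
```

For these numbers the slope is about 0.6 and the intercept about 2.2, so each one-unit increase in x goes with a predicted increase of roughly 0.6 in y.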
-
r^2
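A prediction based on one variable alone is much less accurate than a prediction that also uses information about a second, “related” variable.
Everything is connected, and many relationships in the world may be beyond what you would imagine, for example chocolate and the nationality of Nobel laureates.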
-
r^2 tells you how much better a regression line predicts the value of the dependent variable than the mean of that variable does.
-
r^2 is the proportion of the variance in your dependent variable (y) that is explained by your independent variable (x).
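Both readings of r^2 can be checked numerically. The sketch below uses made-up data, fits the least squares line, and compares the squared errors of the line (SSE) with the squared errors of always predicting the mean of y (SST); all variable names are illustrative.

```python
# Made-up data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least squares slope and intercept.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Squared error when predicting with the mean of y only.
sst = sum((y - mean_y) ** 2 for y in ys)
# Squared error when predicting with the regression line.
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

r_squared = 1 - sse / sst
print(sst, sse, r_squared)
```

Here the line removes about 60% of the squared prediction error that remains when only the mean is used, which is the same as saying that x explains about 60% of the variance in y.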
4. Confounding or lurking variables
-
correlation/regression ≠ causation
- The social sciences mainly study correlation rather than causation.
- Always remind yourself to check whether other variables might be involved.
- confounding variable
- included in the study
- lurking variable
- not included in the study, but has the potential to influence the result.
-
be aware of outliers
- A single outlier can turn a positive coefficient into a negative one.
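To make the effect of an outlier concrete, here is a small sketch with invented data: five points lying exactly on an upward line, plus one extreme point added afterwards. The helper `slope_of` is a hypothetical name for the usual least squares slope.

```python
def slope_of(xs, ys):
    """Least squares slope: sum of cross-deviations over sum of squared x-deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Five points on a perfectly increasing line (made-up data).
xs = [1, 2, 3, 4, 5]
ys = [1, 2, 3, 4, 5]
slope_clean = slope_of(xs, ys)

# The same data with one extreme observation added.
xs_outlier = xs + [20]
ys_outlier = ys + [-20]
slope_with_outlier = slope_of(xs_outlier, ys_outlier)

print(slope_clean, slope_with_outlier)
```

One point far from the rest is enough to reverse the sign of the slope here, so it is worth inspecting the scatterplot for outliers before interpreting a coefficient.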