Basic Statistics - (03) Correlation & Regression
1. Correlation

- Contingency table
A contingency table enables you to display the relationship between two ordinal or nominal variables.

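- Conditional proportion: a proportion computed within a particular row or column of the contingency table (a cell count divided by its row or column total).
- Marginal proportion: a row or column total of the contingency table divided by the grand total.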
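
To make the two proportions concrete, here is a minimal sketch in plain Python. The table, its labels, and its counts are made up for illustration, and the helper names `marginal_row_proportion` and `conditional_proportion` are hypothetical, not from any library.

```python
# Hypothetical counts: rows are groups, columns are outcomes.
table = {
    "group_a": {"yes": 30, "no": 70},
    "group_b": {"yes": 10, "no": 90},
}


def marginal_row_proportion(table, row):
    """Row total divided by the grand total of the table."""
    grand_total = sum(sum(cells.values()) for cells in table.values())
    return sum(table[row].values()) / grand_total


def conditional_proportion(table, row, col):
    """Cell count divided by its row total, i.e. the share of `col` within `row`."""
    return table[row][col] / sum(table[row].values())


# Share of all observations that fall in group_a: 100 / 200.
print(marginal_row_proportion(table, "group_a"))
# Share of "yes" among group_a only: 30 / 100.
print(conditional_proportion(table, "group_a", "yes"))
```

Comparing the conditional proportions across rows (0.3 for group_a versus 0.1 for group_b here) is what suggests a relationship between the two variables.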
- Scatterplot
A scatterplot is a graphical display for two quantitative variables using the horizontal (x) axis for the explanatory variable x and the vertical (y) axis for the response variable y.

2. Pearson’s r

Pearson's r measures how strong a linear correlation between two quantitative variables is, and in which direction it runs.


- The shape of the scatterplot

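- r value: always within [-1, 1]. Values near -1 or 1 indicate a strong negative or positive linear relationship; values near 0 indicate a weak one.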

- Important note: check the scatterplot before you calculate Pearson's r, because r only describes linear relationships.
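
The following sketch shows one common way to compute Pearson's r by hand, and why the scatterplot check matters. The function name `pearson_r` and the sample data are assumptions made for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: sum of co-deviations divided by the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [-2, -1, 0, 1, 2]
linear = [2 * x + 1 for x in xs]  # perfectly linear relationship
curved = [x ** 2 for x in xs]     # perfectly related, but not linearly

r_linear = pearson_r(xs, linear)
r_curved = pearson_r(xs, curved)
print(r_linear, r_curved)
```

For the curved data, r comes out at zero even though y is completely determined by x, which is exactly the situation a scatterplot would reveal and r alone would hide.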
3. Regression
- Regression line and r
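- Ordinary least squares (OLS) regression: finds the line that minimizes the sum of squared vertical distances (residuals) between the data points and the line.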

- Compute the regression line
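
As a sketch of the computation, the standard least squares formulas for a line ŷ = a + b·x are b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and a = ȳ − b·x̄. The slope can equivalently be written as b = r · (s_y / s_x), which is how the regression line relates to r. The function name `fit_line` and the data below are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least squares line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up sample data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

intercept, slope = fit_line(xs, ys)
prediction_at_6 = intercept + slope * 6
print(intercept, slope, prediction_at_6)
```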



- r^2

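A prediction based on one variable alone is much less accurate than one that also uses information about a second, "related" variable.
Everything is connected, and many relationships in the world may be beyond what you would imagine: for example, chocolate and the nationality of Nobel Prize winners.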
- r^2 tells you how much better a regression line predicts the value of the dependent variable than the mean of that variable does.

- r^2 is the proportion of the variance in your dependent variable (y) that is explained by your independent variable (x).
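
Both readings of r^2 can be checked numerically. The sketch below compares the squared error of the regression line with the squared error of always predicting the mean, using r^2 = 1 − SSE / SST. The function name `r_squared` and the data are assumptions for illustration.

```python
def r_squared(xs, ys):
    """Share of the variation in y explained by the least squares line on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # SSE: squared error left over when predicting with the regression line.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # SST: squared error when predicting with the mean of y only.
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse / sst


# Made-up sample data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r2 = r_squared(xs, ys)
print(r2)
```

For this data the line removes 60% of the squared error that the mean alone leaves behind, so r^2 is 0.6.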


4. Confounding or lurking variables
- Correlation/regression ≠ causation
- Social science mainly studies correlation rather than causation.
- Always remind yourself to check whether other variables are at play.
- Confounding variable: a variable that is included in the study.
- Lurking variable: a variable that is not included in the study but has the potential to influence the result.
- Be aware of outliers: a single outlier can turn a positive coefficient into a negative one.
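
The effect of an outlier on the sign of the coefficient is easy to reproduce. In this sketch the data and the extreme point are made up, and `pearson_r` is a hand-written helper rather than a library function.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r computed from deviations around the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Five points on a perfect upward line.
xs = [1, 2, 3, 4, 5]
ys = [1, 2, 3, 4, 5]
r_clean = pearson_r(xs, ys)

# The same points plus one hypothetical extreme observation.
r_with_outlier = pearson_r(xs + [20], ys + [-20])

print(r_clean, r_with_outlier)
```

With the extra point, r switches from positive to negative, even though five of the six observations still lie on an upward line.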