Basic Statistics - (10) Confidence Interval
1. Inferential Statistics
- Estimation
- A point estimate is a single number that serves as our best guess for the population value.
- An interval estimate is a range of numbers that, most likely, contains the actual population value.
2. Confidence Interval(CI)
- Formula: \[\text{point estimate} \pm \text{margin of error}\]
3. CI for mean with unknown population standard deviation
- The z-based interval below can rarely be computed in practice, because we usually don't know the value of the population standard deviation \(\sigma\).
- \[\overline{x} \pm 1.96*\sigma_{\overline{x}}\]
- \[\sigma_{\overline{x}} = \frac{\sigma}{\sqrt{n}}\]
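- The population standard deviation \(\sigma\) is unknown, so it is estimated with the sample standard deviation \(s\), and the z-score is replaced by a t-score.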
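As a reference point, the z-based formula above can be sketched in Python for the idealized case where \(\sigma\) is known. The function name `z_interval` and the numbers in the example are made up for illustration; only the standard library is used.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(sample_mean, sigma, n, confidence=0.95):
    """Confidence interval for a mean when the population sigma is known."""
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Standard error of the sample mean: sigma / sqrt(n).
    standard_error = sigma / sqrt(n)
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin

# Hypothetical numbers: sample mean 50 from n = 100 observations, sigma assumed to be 10.
low, high = z_interval(50.0, 10.0, 100)
print(f"95% CI: ({low:.2f}, {high:.2f})")  # 95% CI: (48.04, 51.96)
```

With a standard error of 1, the margin of error is simply the critical value 1.96.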
- T Distribution
- The t distribution (also called Student's t distribution) is a family of distributions, indexed by degrees of freedom, that look much like the normal curve but are slightly lower at the peak and have heavier tails. It is used instead of the normal distribution when the population standard deviation has to be estimated from the sample, which matters most for small samples. The larger the sample size (i.e. the more degrees of freedom), the more the t distribution looks like the normal distribution. For sample sizes larger than about 20, it is already close to the normal distribution (a common rule of thumb uses 30).
- 2 assumptions
- the data come from a random sample
- approximately normal population distribution
- be wary of extreme outliers
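A minimal sketch of the t-based interval, assuming the two conditions above hold. The sample values are invented, and `t_interval` is a hypothetical helper name. Because the Python standard library has no t quantile function, the critical value is passed in by hand from a t table.

```python
from math import sqrt
from statistics import mean, stdev

def t_interval(sample, t_critical):
    """CI for a mean using the sample standard deviation and a t critical value."""
    n = len(sample)
    x_bar = mean(sample)
    # stdev() divides by n - 1, i.e. it returns the sample standard deviation s.
    standard_error = stdev(sample) / sqrt(n)
    margin = t_critical * standard_error
    return x_bar - margin, x_bar + margin

# Hypothetical sample of n = 10 measurements (so 9 degrees of freedom).
sample = [4, 5, 6, 5, 4, 6, 5, 5, 4, 6]
# 2.262 is the two-sided 95% critical value for 9 degrees of freedom
# from a standard t table; it is larger than the z value of 1.96.
low, high = t_interval(sample, t_critical=2.262)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

Swapping in 1.96 for the critical value would give a narrower interval, which understates the uncertainty for a sample this small.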
4. CI for proportions
- The exact standard error of a sample proportion cannot be computed, because it depends on the population proportion, which we usually don't know.
- The population proportion \(\pi\) is unknown, so the sample proportion \(p\) is substituted for it in the standard error.
- Formula: \[p \pm z\sqrt{\frac{p(1-p)}{n}}\]
- Sampling distribution of a binomial proportion: stick to the standard normal distribution (z-scores rather than the t distribution).
- 1 assumption
- at least 15 successes and 15 failures: \[\begin{aligned} np &\geq 15 \\ n(1-p) &\geq 15 \end{aligned}\]
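The interval for a proportion, together with the 15 successes / 15 failures check, might be sketched as follows. `proportion_interval` is a hypothetical helper, and the counts (60 successes out of 100) are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, confidence=0.95):
    """z-based CI for a proportion, using the sample proportion in the standard error."""
    failures = n - successes
    # Assumption from the notes: at least 15 successes and 15 failures.
    if successes < 15 or failures < 15:
        raise ValueError("need at least 15 successes and 15 failures")
    p = successes / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    standard_error = sqrt(p * (1 - p) / n)
    margin = z * standard_error
    return p - margin, p + margin

# Hypothetical data: 60 successes in a sample of 100.
low, high = proportion_interval(60, 100)
print(f"95% CI: ({low:.3f}, {high:.3f})")  # 95% CI: (0.504, 0.696)
```

Raising an error when the assumption fails is a design choice of this sketch; the normal approximation is unreliable in that case.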
5. Confidence Level
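- We have to compromise between confidence and precision.
For a fixed sample size, a higher confidence level gives a wider (less precise) interval, and a narrower interval comes at the price of lower confidence.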
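To make the trade-off concrete, the sketch below computes the margin of error at three confidence levels for one hypothetical setting (\(\sigma = 10\), \(n = 100\), both invented). `margin_of_error` is an illustrative helper name.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(sigma, n, confidence):
    """Half-width of a z-based interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sigma / sqrt(n)

# Hypothetical setting: sigma = 10 and n = 100, so the standard error is 1
# and each margin equals the critical value z.
margins = {level: margin_of_error(10.0, 100, level) for level in (0.90, 0.95, 0.99)}
for level, margin in margins.items():
    print(f"{level:.0%} confidence -> margin of error {margin:.3f}")
```

The margins come out at roughly 1.645, 1.960 and 2.576, so each gain in confidence costs some precision.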
6. Sample Size
- Mean
- \[n = \frac{\sigma^2 * z^2}{m^2}\]
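- \(m\): the margin of error we are willing to tolerate.
- \(\sigma\): an educated guess at the population standard deviation, e.g. from a pilot study or earlier research.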
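The sample size formula for a mean can be sketched as below. The guess \(\sigma = 10\) and the target margin \(m = 2\) are invented values, and `sample_size_for_mean` is a hypothetical helper. The result is rounded up, since a fractional observation is not possible.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma, margin, confidence=0.95):
    """Smallest n with z * sigma / sqrt(n) <= margin, i.e. n = sigma^2 * z^2 / m^2."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return ceil((sigma * z / margin) ** 2)

# Hypothetical inputs: educated guess sigma = 10, desired margin of error m = 2.
n = sample_size_for_mean(sigma=10.0, margin=2.0)
print(n)  # 97
```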
- Proportion
- \[n= \frac{p(1-p) z^2}{m^2}\]
- Safe approach: use \(p = 0.5\), which maximizes \(p(1-p)\) and therefore gives the largest required sample size.
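A short sketch of the proportion formula, defaulting to the conservative choice \(p = 0.5\). The target margin of 3 percentage points is an invented example, and `sample_size_for_proportion` is a hypothetical helper.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(margin, p=0.5, confidence=0.95):
    """n = p(1-p) z^2 / m^2, rounded up; p = 0.5 is the worst case."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return ceil(p * (1 - p) * z ** 2 / margin ** 2)

# Hypothetical target: margin of error of 0.03 at 95% confidence.
n_safe = sample_size_for_proportion(margin=0.03)            # p = 0.5
n_guess = sample_size_for_proportion(margin=0.03, p=0.2)    # prior guess p = 0.2
print(n_safe, n_guess)  # 1068 683
```

Any other value of \(p\) gives a smaller \(p(1-p)\), so a sample sized for \(p = 0.5\) is large enough whatever the true proportion turns out to be.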