[Reading notes > Statistics] 12-01 Constructing confidence intervals: introduction to the concept of confidence intervals
2022-07-23 08:06:00 【Xiaoming 2766】
Constructing confidence intervals
Introduction
The previous chapter showed how to use a sample to estimate the exact value of a population mean, variance, or proportion. But can you be sure that the sample is accurate (or unbiased)? This chapter introduces another way to estimate population statistics, the confidence interval, and shows where it is useful.
Mandy Candy Company used a sample of 100 gumballs to get a point estimate of 62.7 minutes for the mean flavor duration, and a point estimate of 25 (minutes squared) for the population variance. These are the most reliable estimates possible from the evidence at hand, but what if they are slightly off?
We did use the most representative sample we could to estimate the key population statistics (mean, variance, proportion), so the point estimate of the mean flavor duration of the super-long-lasting gumballs is the best estimate we can give.
But this raises two questions:
- We relied on the results of a single sample to produce a very precise estimate. We did our best to make the sample unbiased and representative, but we cannot be absolutely sure that it represents the population 100%, for the simple reason that it is only a sample.
- If the sample is unbiased, the estimate is likely to be close to the true population value. The question is: how close is "close enough"?
So instead of giving a single exact value as the estimate of the population mean, we can take a different approach and specify an interval rather than one precise duration. For example, we could say that we expect the gumball flavor to last between 55 and 65 minutes. This still tells listeners that the flavor lasts close to an hour, but it leaves more margin for error. How wide the interval should be depends on how confident you want to be in the result.
Confidence intervals
Getting to know confidence intervals
Earlier, we applied a point estimator to the sample data to estimate the mean flavor duration of the gumballs. A point estimate gives a single, very precise value for the mean. The following figure shows the distribution of flavor duration in the gumball sample.

So what happens if we specify an interval for the population mean instead? Rather than one exact value, we give two values and say that we expect the mean flavor duration to lie between them. We put the point estimate of the mean at the center of the interval. We then set its lower and upper limits to the point estimate minus and plus some margin of error.

The limits are chosen so that the statement "the population mean lies between a and b" has a specific probability. For example, you might choose a and b so that the probability that the interval contains the population mean is 95%. In other words, a and b are chosen so that:
$$P(a < \mu < b) = 0.95$$
We write this interval as (a, b). The exact values of a and b depend on how confident you want to be that "this interval contains the population mean", so (a, b) is called a confidence interval.
So how do we find a confidence interval for the population mean?
Finding a confidence interval takes four steps:
- Choose the population statistic (the population statistic you want to build a confidence interval for).
- Find its sampling distribution (sampling distributions were covered in the previous chapter).
- Choose the confidence level (the probability that the interval you choose contains the statistic).
- Find the upper and lower confidence limits (to find them, you need both the confidence level and the sampling distribution).
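Here is a minimal Python sketch of the four steps for the case in this chapter, where a population mean is estimated from a large sample. It is an illustration under stated assumptions, not the book's own method. The function name `mean_confidence_interval` is hypothetical. The sketch assumes $n$ is large enough for $\overline{X}$ to be approximately normal, so the sample variance can stand in for $\sigma^2$. It uses the standard library's `statistics.NormalDist.inv_cdf` in place of a printed normal table.

```python
from __future__ import annotations

import math
from statistics import NormalDist


def mean_confidence_interval(xbar: float, s2: float, n: int,
                             level: float = 0.95) -> tuple[float, float]:
    """Hypothetical helper: confidence interval for a population mean mu.

    Assumes n is large, so the sample mean is approximately normal and the
    sample variance s2 can stand in for the unknown population variance.
    """
    # Step 1: the population statistic is the mean mu.
    # Step 2: sampling distribution of the mean, Xbar ~ N(mu, s2 / n).
    se = math.sqrt(s2 / n)
    # Step 3: the confidence level leaves (1 - level) / 2 in each tail.
    upper_tail = (1 - level) / 2
    # Step 4: confidence limits xbar -/+ z * se, where P(Z > z) = upper_tail.
    z = NormalDist().inv_cdf(1 - upper_tail)
    return xbar - z * se, xbar + z * se


low, high = mean_confidence_interval(62.7, 25, 100)
print(f"({low:.2f}, {high:.2f})")
```

With Mandy Candy Company's numbers ($\overline{x} = 62.7$, $s^2 = 25$, $n = 100$), this prints `(61.72, 63.68)`.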
Step 1: Choose the population statistic
The first step is to choose the statistic for which to build the confidence interval. This depends on the problem you are trying to solve.
In our example, we need a confidence interval for the mean flavor duration of the gumballs, so we build a confidence interval for the population mean $\mu$.
Step 2: Find the sampling distribution of the chosen statistic
To find a confidence interval for the population mean, we need the sampling distribution of the mean, that is, the distribution of $\overline{X}$. This reverses the "probability of the sample mean" problem from the previous chapter. There we knew the population and found probabilities for the sample mean. Here we start from the sample and want to work back to the population mean.
First, let's find the expectation and variance. Recall from the previous chapter that the sampling distribution of the mean is the distribution formed by the means of all possible samples. Its expectation and variance are:
$$E(\overline{X}) = \mu, \qquad Var(\overline{X}) = \frac{\sigma^2}{n}$$
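The sketch below shows these two formulas in action. It draws many samples and looks at the mean and variance of their sample means. The population is purely hypothetical, chosen only so that $\mu = 62.7$ and $\sigma^2 = 25$. It is a deliberately skewed shifted exponential, $57.7 + \text{Exp}(\text{mean } 5)$, because the two formulas do not require a normal population.

```python
import random
import statistics


def simulate_sample_means(n: int, reps: int, seed: int = 0) -> list:
    """Return the means of `reps` random samples of size n."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    means = []
    for _ in range(reps):
        # Hypothetical skewed population: 57.7 + Exponential(mean 5),
        # which has mean mu = 62.7 and variance sigma^2 = 5**2 = 25.
        sample = [57.7 + rng.expovariate(1 / 5) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means


means = simulate_sample_means(n=100, reps=5000)
print(statistics.fmean(means))     # should be close to E(Xbar) = mu = 62.7
print(statistics.variance(means))  # should be close to Var(Xbar) = 25 / 100 = 0.25
```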
To use these results to find a confidence interval for $\mu$, we substitute the population variance $\sigma^2$ and the sample size $n$. We do not substitute a value for $\mu$, because $\mu$ (the population mean) is the very quantity we are building the confidence interval for.
The reason, which may become clearer later, is this. We are using the sampling distribution to find a confidence interval for $\mu$, so we substitute every value except $\mu$. Once $\sigma^2$ and $n$ are substituted, we can use $\overline{X}$ to find the confidence interval, as explained shortly.
But there is a problem: we don't know the true value of $\sigma^2$, so it has to be estimated from the sample. What do we do?
-> Use a point estimator
We don't know the true value of the population variance $\sigma^2$, but we can estimate it with its point estimator. So instead of $\sigma^2$ we substitute $\hat{\sigma}^2$, the point estimator of the population variance introduced in the previous chapter, also written $s^2$. In other words, $\hat{\sigma}^2$ stands in for $\sigma^2$.
So the mean and variance of the sampling distribution of the mean become:
$$E(\overline{X}) = \mu, \qquad Var(\overline{X}) = \frac{s^2}{n}$$
(Again: $s^2$ is the point estimator of the variance. We don't know the true population variance, so we estimate it from the sample.)
Mandy Candy Company calculated its estimates from a sample of 100 gumballs and found $s^2 = 25$, so:
$$Var(\overline{X}) = \frac{s^2}{n} = \frac{25}{100} = 0.25$$
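The same arithmetic for Mandy's sample is a quick check on the formula. The sketch also computes the standard deviation of $\overline{X}$ (the standard error), $\sqrt{0.25} = 0.5$, which is the divisor used when standardizing later.

```python
import math

s2 = 25  # point estimate of the population variance from Mandy's sample
n = 100  # sample size

var_xbar = s2 / n              # Var(Xbar) = s^2 / n
se_xbar = math.sqrt(var_xbar)  # standard deviation of Xbar (standard error)
print(var_xbar, se_xbar)       # 0.25 0.5
```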
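We also need to know the distribution of $\overline{X}$. Because the sample size $n = 100$ is large, the central limit theorem tells us that $\overline{X}$ is approximately normal, so $\overline{X} \sim N(\mu, 0.25)$.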

Step 3: Choose the confidence level
The confidence level says how sure you want to be that "the confidence interval contains the population statistic". For example, suppose we want a 95% confidence level for the population mean. Then the probability that the confidence interval contains the population mean is 0.95.

Note: the higher the confidence level, the wider the interval, and the greater the probability that the confidence interval contains the population statistic.
Choose a reasonable confidence level, one that gives a high probability while keeping the interval narrow enough to be useful. Otherwise you end up with statements like "the mean flavor duration is between 0 and 3 days", which tells you nothing about how long the flavor actually lasts.
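The sketch below shows the width versus confidence trade-off. It assumes the standard error of 0.5 from the gumball example and takes normal critical values from `statistics.NormalDist`. The helper name `interval_width` is hypothetical.

```python
from statistics import NormalDist


def interval_width(level: float, se: float = 0.5) -> float:
    """Full width 2 * z * se of a normal-based interval for the mean."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # two-sided critical value
    return 2 * z * se


widths = {level: interval_width(level) for level in (0.80, 0.90, 0.95, 0.99)}
for level, width in widths.items():
    print(f"{level:.0%} confidence: width = {width:.2f} minutes")
```

With a standard error of 0.5 the width equals the critical value $z$. It grows from about 1.28 minutes at 80% to 1.64, 1.96 and 2.58 minutes at 90%, 95% and 99%.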
Step 4: Find the upper and lower confidence limits
The last step is to find a and b, the lower and upper limits of the confidence interval. They mark the left and right boundaries of a range that contains the mean with 95% probability. The exact values of a and b depend on the sampling distribution being used and the confidence level you want.
In our example, we want a 95% confidence level for the mean flavor duration, so the probability that $\mu$ lies between a and b must be 0.95. We also know that $\overline{X}$ follows a normal distribution, $\overline{X} \sim N(\mu, 0.25)$.

Using the distribution of $\overline{X}$, we can find a and b. That is, we use $\overline{X} \sim N(\mu, 0.25)$ to find a and b such that $P(\overline{X} < a) = 0.025$ and $P(\overline{X} > b) = 0.025$.
Because $\overline{X}$ is normally distributed, we can find the confidence interval with the normal distribution, using the same method as before. We calculate a standard score, look it up in the standard normal probability table, and read off the result.
1. Find Z
Standardize $\overline{X}$:
$$Z = \frac{\overline{X}-\mu}{\sqrt{0.25}}, \quad \text{where } Z \sim N(0,1)$$
The standardized confidence interval is shown below:

Using $P(Z < z_a) = 0.025$ and $P(Z > z_b) = 0.025$, we can find $z_a$ and $z_b$, the lower and upper limits of the standardized confidence interval. The standard normal table gives $z_a = -1.96$ and $z_b = 1.96$.
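Instead of a printed table, $z_a$ and $z_b$ can be read from the inverse CDF of the standard normal distribution. This sketch uses Python's `statistics.NormalDist` (Python 3.8+). It also checks that the area between the two limits is 0.95.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)
z_a = std_normal.inv_cdf(0.025)  # P(Z < z_a) = 0.025
z_b = std_normal.inv_cdf(0.975)  # P(Z > z_b) = 0.025, i.e. P(Z < z_b) = 0.975
print(round(z_a, 2), round(z_b, 2))  # -1.96 1.96

# Probability that Z lies between the two limits
print(std_normal.cdf(z_b) - std_normal.cdf(z_a))
```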

2. Rewrite the inequality in terms of $\mu$
So far we have $P(-1.96 < Z < 1.96) = 0.95$. Since $\sqrt{0.25} = 0.5$, this means:
$$P\left(-1.96 < \frac{\overline{X}-\mu}{0.5} < 1.96\right) = 0.95$$
Rearranging the inequality to isolate $\mu$ gives the confidence interval for $\mu$:
$$
\begin{aligned}
-1.96 &< \frac{\overline{X}-\mu}{0.5} < 1.96 \\
-0.98 &< \overline{X}-\mu < 0.98 \\
\overline{X} - 0.98 &< \mu < \overline{X} + 0.98
\end{aligned}
$$
3. Finally, find the value of $\overline{X}$
With the inequality written out, we are very close to the confidence interval for $\mu$, the value that describes the typical flavor duration of the gumballs. We now have:
$$P(\overline{X} - 0.98 < \mu < \overline{X} + 0.98) = 0.95$$
Here is a sketch:

Now we just need a value for $\overline{X}$ to get the upper and lower confidence limits.
$\overline{X}$ denotes the distribution of the sample mean, so we can use the value $\overline{x}$ from Mandy Candy Company's sample (the sample mean), $\overline{x} = 62.7$.
Tip: the second question in the Q&A below explains why $\overline{x}$ can be used as the value of $\overline{X}$. Also note that the $\sigma^2$ in $Var(\overline{X}) = \frac{\sigma^2}{n}$ is the population variance, which is why we replaced it with its point estimator $s^2$.

Substituting $\overline{x} = 62.7$ gives the confidence interval $(62.7 - 0.98,\ 62.7 + 0.98) = (61.72, 63.68)$. The probability that the interval (61.72, 63.68) contains the population mean flavor duration is 95%.
A confidence interval gives a reliable estimate of the gumballs' flavor duration without committing to a single exact figure, as a point estimate would. This leaves room for error if the sample is off.
Summary of the steps
Let's review the steps we used to construct the confidence interval.
First, choose the population statistic for which to construct the confidence interval. We needed a confidence interval for the mean flavor duration of the gumballs, so we built a confidence interval for $\mu$.
Once the statistic is chosen, find its sampling distribution. We found the expectation and variance of the sampling distribution of the mean and substituted the values of every statistic except $\mu$. That showed we could use the normal distribution.
Then we chose the confidence level for the interval: 95%.
Finally, find the upper and lower confidence limits. We used the confidence level and the sampling distribution to obtain the confidence interval.
The same steps are used again and again to build confidence intervals, so they can be condensed into shortcuts. These depend on the required confidence level and the distribution of the test statistic. As shown below, identify the population statistic to estimate, the population distribution, and the conditions that apply. Then substitute the population statistics or their estimators. The number c depends on the confidence level.

The example above is case 3.
Example (just substitute the values)

Q: When we found the expectation and variance of $\overline{X}$, why did we substitute the point estimator of $\sigma^2$ but not the point estimator of $\mu$?
A: We want a confidence interval for $\mu$, so there is no need to replace $\mu$ with $\overline{x}$. We need an expression that contains $\mu$ in order to find its confidence interval.
Q: Why do we use $\overline{x}$ as the value of $\overline{X}$?
A: The distribution of $\overline{X}$ is the sampling distribution of the mean. It is formed by taking every possible sample of size n from the population and using the means of all those samples to form a distribution.
$\overline{x}$ is the mean of one particular sample, that is, one value from this distribution, so we use it to find the confidence interval.
Q: What is the difference between a confidence interval and a confidence level?
A: The confidence level is the probability that the statistic lies in the confidence interval, usually given as a percentage such as 95%. The confidence interval is the interval itself: the upper and lower limits of an actual range of numbers.
Q: We found that the 95% confidence interval for $\mu$ is (61.72, 63.68). What exactly does this mean?
A: Suppose you took many samples of the same size and built a confidence interval from each one. Then 95% of those intervals would contain the true population mean. In other words, confidence intervals constructed this way contain the population mean in 95% of cases.
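This repeated-sampling interpretation can be checked by simulation. The sketch assumes a hypothetical normal population with $\mu = 62.7$ and $\sigma = 5$ (so $\sigma^2 = 25$) and draws many samples of size 100. It builds a 95% interval from each one, using $s$ in place of $\sigma$, and counts how often the interval contains $\mu$. The function name `coverage` and the population values are illustrative assumptions.

```python
import math
import random
import statistics
from statistics import NormalDist


def coverage(mu: float, sigma: float, n: int, reps: int,
             level: float = 0.95, seed: int = 1) -> float:
    """Fraction of simulated confidence intervals that contain the true mu.

    The population is assumed normal with the given mu and sigma; each
    interval uses the sample standard deviation s in place of sigma.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
        half_width = z * s / math.sqrt(n)
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits / reps


rate = coverage(mu=62.7, sigma=5, n=100, reps=2000)
print(rate)
```

The printed fraction should be close to 0.95. Because $s$ stands in for $\sigma$, it may come out slightly below 0.95.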
Q: Are all confidence intervals based on the normal distribution?
A: No. We will cover intervals based on other distributions later.
Q: If it's just a matter of substituting values into a simple formula, why go through so many steps?
A: The steps show you what is really going on, so that you understand how a confidence interval is constructed. Most of the time, you just need to plug in the values.
Q: Do I need to apply a continuity correction when using confidence intervals?
A: In theory, yes, but in practice it is usually ignored. You just substitute the values into the simple formula to calculate the confidence interval.