
Understanding Expectation, Variance, Covariance and the Correlation Coefficient

2022-07-08 01:23:00 You roll, I don't roll

Contents

1. Mathematical expectation (mean)

2. Variance D(X) or Var(X)

3. Covariance Cov(X,Y)

4. Correlation coefficient ρ

5. Covariance matrix


In one sentence: the expectation reflects the average level; the variance reflects how much the data fluctuate; the covariance reflects the (linear) correlation between two random variables and carries units; and the correlation coefficient reflects the dimensionless (linear) correlation between two random variables.

1. Mathematical expectation (mean)

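The mathematical expectation is the probability-weighted mean of the values of a random variable. For a discrete random variable with $P\{X=x_i\}=p_i$, and for a continuous random variable with density $f(x)$, it is respectively:

$$E(X)=\sum_{i} x_i p_i, \qquad E(X)=\int_{-\infty}^{+\infty} x f(x)\,dx$$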

Here the expectation is the mean. In statistics a sample is usually used in place of the whole population, so the sample mean is computed as:

$$\bar{x}=\frac{1}{n}\sum_{i=1}^{n} x_i$$
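The following short Python sketch computes $E(X)=\sum_i x_i p_i$ for a discrete distribution and the sample mean $\bar{x}$ of a list of observations. The function names and the data (a fair die and a four-value sample) are hypothetical and serve only as illustration. Exact fractions are used so that the expectation of a fair die comes out as exactly 7/2.

```python
from fractions import Fraction

def expectation(values, probs):
    """E(X) = sum_i x_i * p_i for a discrete random variable."""
    if abs(sum(probs) - 1) > 1e-12:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in zip(values, probs))

def sample_mean(xs):
    """Sample mean: (1/n) * sum of the observations."""
    return sum(xs) / len(xs)

# A fair six-sided die: each face has probability 1/6.
faces = [1, 2, 3, 4, 5, 6]
print(expectation(faces, [Fraction(1, 6)] * 6))  # 7/2
print(sample_mean([2, 4, 4, 5]))                 # 3.75
```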

2. Variance D(X) or Var(X)

Variance measures how far the values of a random variable deviate from their mean. In other words, it reflects how dispersed the data values are:

$$D(X)=E\{[X-E(X)]^2\}=E(X^2)-[E(X)]^2$$

If the values of X are concentrated, its variance is small. Conversely, the more dispersed the values of X, the larger the variance.

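D(X) satisfies the following properties (C is a constant):

  •  $D(C)=0$
  •  $D(CX)=C^2D(X)$
  •  $D(X+C)=D(X)$
  •  $D(X\pm Y)=D(X)+D(Y)\pm 2E\{[X-E(X)][Y-E(Y)]\}$

When X and Y are independent, $E\{[X-E(X)][Y-E(Y)]\}=0$. This holds in particular when they are independent and identically distributed (iid), but independence alone is enough. In that case:

$$D(X\pm Y)=D(X)+D(Y)$$

The term $E\{[X-E(X)][Y-E(Y)]\}$ is the covariance discussed below.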
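The properties above can be checked numerically. This sketch treats a short list of made-up observations as a discrete random variable that gives each value equal weight 1/n. That is an assumption of convenience. It uses exact fractions so that every identity holds exactly. It is a sanity check on one data set, not a proof. The hypothetical helper `pop_cov` computes the cross term $E\{[X-E(X)][Y-E(Y)]\}$.

```python
from fractions import Fraction

def mean(xs):
    """E(X) for the empirical distribution (each value weighted 1/n)."""
    return Fraction(sum(xs), len(xs))

def pop_var(xs):
    """D(X) = E{[X - E(X)]^2} with weight 1/n."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def pop_cov(xs, ys):
    """E{[X - E(X)][Y - E(Y)]} with weight 1/n."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

xs = [1, 2, 2, 3, 7]
ys = [4, 1, 0, 5, 2]
c, k = 3, 10

assert pop_var([c] * 5) == 0                               # D(C) = 0
assert pop_var([c * x for x in xs]) == c**2 * pop_var(xs)  # D(CX) = C^2 D(X)
assert pop_var([x + k for x in xs]) == pop_var(xs)         # D(X + C) = D(X)
sum_xy = [x + y for x, y in zip(xs, ys)]
assert pop_var(sum_xy) == pop_var(xs) + pop_var(ys) + 2 * pop_cov(xs, ys)
```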

In addition, the standard deviation (also called the root-mean-square deviation) is computed as $\sigma(X)=\sqrt{D(X)}$. It has the same dimension (units) as X.

In sample analysis, the variance is computed as:

$$s^2=\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2$$

Note: here the sum of squared deviations is divided by n-1, not by n.
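Here is a minimal sketch of the sample variance and the sample standard deviation, assuming the observations come as a Python list. The helper names are hypothetical. The standard library's `statistics.variance` (which divides by n-1) and `statistics.pvariance` (which divides by n) appear only for comparison.

```python
import math
import statistics

def sample_variance(xs):
    """Unbiased sample variance: sum of squared deviations divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def sample_std(xs):
    """Sample standard deviation: square root of the sample variance (same units as X)."""
    return math.sqrt(sample_variance(xs))

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_variance(data))      # 32/7, about 4.571
print(statistics.variance(data))  # same value from the standard library
print(statistics.pvariance(data)) # divides by n instead: 4
```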

Why are there two versions of the variance formula, one dividing by n and one dividing by n-1?

Dividing by n gives the population variance $\sigma^2=\frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2$. Dividing by n-1 gives the sample variance $s^2$, which is an unbiased estimator of the population variance. In practice it is usually infeasible to compute the population variance. One of the main tasks of statistics is to infer properties of a population from a sample, so the sample variance is commonly used in its place.

Why does the sample variance divide by n-1? The sample mean $\bar{x}$ has to be computed before the sample variance, which means the samples have already been summed. As a result, the n deviations satisfy the constraint $\sum_{i=1}^{n}(x_i-\bar{x})=0$. Once n-1 of them are known, the n-th is determined, so there are only n-1 degrees of freedom, and we divide by n-1. Doing so also makes the estimator unbiased: $E(s^2)=\sigma^2$. In linear-algebra terms, the n deviations are not independent. The deviation vector $(x_1-\bar{x},\dots,x_n-\bar{x})$ always lies in the (n-1)-dimensional subspace orthogonal to $(1,\dots,1)$, so it can be described by n-1 independent components.
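The constraint behind the n-1 degrees of freedom is easy to see directly. Deviations from the sample mean always sum to zero, so the last one is fixed by the others. The data below are arbitrary.

```python
def deviations(xs):
    """Deviations x_i - xbar from the sample mean."""
    xbar = sum(xs) / len(xs)
    return [x - xbar for x in xs]

data = [3.0, 8.0, 1.0, 6.0]
d = deviations(data)
print(sum(d))                     # the deviations sum to zero (up to rounding)
last_from_others = -sum(d[:-1])   # so the first n - 1 deviations determine the last
print(d[-1], last_from_others)
```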

Dividing by n assumes that the population mean μ is known in advance. Here μ is given, not computed from the sample, and computing it for a whole population is usually infeasible in practice. Because the n deviations $x_i-\mu$ are then not tied together by a constraint, the variance $\frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2$ divides by n. This is an idealized case that rarely occurs in practice. Most of the time the population is estimated from a sample, which is why the common variance formula divides by n-1.
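To make the n versus n-1 argument concrete, the sketch below takes a small hypothetical population {0, 1, 2} with equally likely values. It enumerates every size-2 sample drawn with replacement and averages three estimators exactly using fractions. Under these assumptions the population variance is 2/3. Two estimators average to 2/3: dividing by n-1, and dividing by n with the true μ. Dividing by n after estimating the mean from the sample averages to only 1/3.

```python
from fractions import Fraction
from itertools import product

population = [0, 1, 2]  # hypothetical population, each value equally likely
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N  # population variance

def estimators(sample):
    """Three variance estimates for one sample."""
    n = len(sample)
    xbar = Fraction(sum(sample), n)
    ss = sum((x - xbar) ** 2 for x in sample)  # squared deviations from the sample mean
    return {
        "divide_n_sample_mean": ss / n,
        "divide_n_minus_1": ss / (n - 1),
        "divide_n_known_mu": sum((x - mu) ** 2 for x in sample) / n,
    }

n = 2
samples = list(product(population, repeat=n))  # all 9 equally likely samples
averages = {
    k: sum(estimators(s)[k] for s in samples) / len(samples)
    for k in estimators(samples[0])
}
print(sigma2)
print(averages)
```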

3. Covariance Cov(X,Y)

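Covariance describes the (linear) correlation between two variables. It is defined as:

$$Cov(X,Y)=E\{[X-E(X)][Y-E(Y)]\}=E(XY)-E(X)E(Y)$$

Covariance is a dimensional quantity: its unit is the product of the units of X and Y.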

If X and Y are independent, then $Cov(X,Y)=0$. The converse does not hold in general.
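This sketch computes $Cov(X,Y)=E(XY)-E(X)E(Y)$ from a joint probability table. It assumes the joint pmf is stored as a hypothetical dictionary that maps (x, y) pairs to probabilities. For independent X and Y, the joint pmf is the product of the marginals, and the covariance is exactly 0. With Y = X, the covariance reduces to D(X).

```python
from fractions import Fraction
from itertools import product

def cov_from_joint(joint):
    """Cov(X, Y) = E(XY) - E(X)E(Y) for a joint pmf given as {(x, y): p}."""
    ex = sum(x * p for (x, _), p in joint.items())
    ey = sum(y * p for (_, y), p in joint.items())
    exy = sum(x * y * p for (x, y), p in joint.items())
    return exy - ex * ey

# Independent X and Y: the joint pmf is the product of the marginals.
px = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}
py = {0: Fraction(1, 2), 5: Fraction(1, 2)}
independent = {(x, y): px[x] * py[y] for x, y in product(px, py)}
print(cov_from_joint(independent))  # 0

# Fully dependent case: Y = X, so Cov(X, X) = D(X).
same = {(x, x): p for x, p in px.items()}
print(cov_from_joint(same))  # 2/3
```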

4. Correlation coefficient ρ

The correlation coefficient also describes the correlation between two variables. Unlike covariance, however, it is a dimensionless quantity. It is defined as:

$$\rho_{XY}=\frac{Cov(X,Y)}{\sqrt{D(X)}\sqrt{D(Y)}}$$
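The following sketch computes $\rho_{XY}$ from paired observations and shows that it is dimensionless. Converting a hypothetical height column from metres to centimetres multiplies the covariance by 100 but leaves ρ unchanged. The 1/n weighting in `cov` is an assumption; any common factor cancels in the ratio.

```python
import math

def cov(xs, ys):
    """Covariance of paired observations, weight 1/n (empirical distribution)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

def corr(xs, ys):
    """rho = Cov(X, Y) / (sqrt(D(X)) * sqrt(D(Y))), using Cov(X, X) = D(X)."""
    return cov(xs, ys) / math.sqrt(cov(xs, xs) * cov(ys, ys))

height_m = [1.60, 1.70, 1.75, 1.80, 1.90]   # hypothetical data
weight_kg = [55.0, 62.0, 70.0, 72.0, 85.0]
height_cm = [100 * h for h in height_m]     # same quantity, different unit

print(cov(height_m, weight_kg), cov(height_cm, weight_kg))    # covariance scales by 100
print(corr(height_m, weight_kg), corr(height_cm, weight_kg))  # rho is unchanged (up to rounding)
```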

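In addition, the variables

$$X^*=\frac{X-E(X)}{\sqrt{D(X)}},\qquad Y^*=\frac{Y-E(Y)}{\sqrt{D(Y)}}$$

are called the standardized versions of X and Y. Then:

$$\rho_{XY}=Cov(X^*,Y^*)=E(X^*Y^*)$$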
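Here is a sketch of standardization on made-up data, assuming empirical means and 1/n variances. The standardized values have mean 0 and variance 1. The average of their products, $E(X^*Y^*)$, is the correlation coefficient. A linear change of units applied to X before standardizing does not change the result.

```python
import math

def standardize(xs):
    """X* = (X - E(X)) / sqrt(D(X)), using the empirical mean and 1/n variance."""
    n = len(xs)
    m = sum(xs) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / n)
    return [(x - m) / sd for x in xs]

xs = [2.0, 4.0, 5.0, 7.0, 9.0]
ys = [1.0, 3.0, 2.0, 6.0, 8.0]
zx, zy = standardize(xs), standardize(ys)

n = len(xs)
mean_zx = sum(zx) / n                         # approximately 0
var_zx = sum(z * z for z in zx) / n           # approximately 1
rho = sum(a * b for a, b in zip(zx, zy)) / n  # E(X* Y*) = Cov(X*, Y*) = rho_XY
print(mean_zx, var_zx, rho)
```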

Properties of the correlation coefficient:

  •  $|\rho_{XY}|\le 1$. The larger $|\rho_{XY}|$, the stronger the linear correlation. When $|\rho_{XY}|$ is close to 1, X and Y are said to be strongly linearly correlated. $\rho_{XY}=0$ means there is no linear relationship between X and Y, although some other relationship may exist. For example, let X be uniformly distributed on $(-\pi,\pi)$, and let $X_1=\sin X$ and $X_2=\cos X$. Although $\rho_{X_1X_2}=0$, they satisfy $X_1^2+X_2^2=1$.
  •  $|\rho_{XY}|=1$ if and only if there exist constants a and b such that $P\{Y=a+bX\}=1$.
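Both properties can be illustrated numerically. In this sketch, evenly spaced points over $(-\pi,\pi)$ stand in for X ~ U(-π, π). This is an approximation, not a true random sample. The sample correlation of sin X and cos X is essentially 0, even though $X_1^2+X_2^2=1$ holds at every point up to rounding. An exact linear relation Y = a + bX with b < 0 gives ρ = -1.

```python
import math

def corr(xs, ys):
    """Sample correlation coefficient (the 1/n factors cancel)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Midpoints of N equal subintervals of (-pi, pi) approximate X ~ U(-pi, pi).
N = 1000
theta = [-math.pi + (k + 0.5) * 2 * math.pi / N for k in range(N)]
x1 = [math.sin(t) for t in theta]
x2 = [math.cos(t) for t in theta]
print(corr(x1, x2))                                         # about 0: no linear relation
print(max(abs(a * a + b * b - 1) for a, b in zip(x1, x2)))  # about 0: X1^2 + X2^2 = 1

# An exact linear relation Y = a + bX gives |rho| = 1; here b = -2, so rho = -1.
xs = [1.0, 2.0, 4.0, 7.0]
print(corr(xs, [3 - 2 * x for x in xs]))
```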

5. Covariance matrix

The covariance matrix describes the covariances between the different dimensions (components) of a multidimensional random variable.

Suppose that all second-order mixed central moments of the n-dimensional random variable $(X_1,X_2,\dots,X_n)$ exist:

$$c_{ij}=Cov(X_i,X_j)=E\{[X_i-E(X_i)][X_j-E(X_j)]\},\qquad i,j=1,2,\dots,n$$

Then the matrix

$$C=\begin{pmatrix} c_{11} & c_{12} & \cdots & c_{1n}\\ c_{21} & c_{22} & \cdots & c_{2n}\\ \vdots & \vdots & & \vdots\\ c_{n1} & c_{n2} & \cdots & c_{nn} \end{pmatrix}$$

is called the covariance matrix of the n-dimensional random variable $(X_1,X_2,\dots,X_n)$. Because $c_{ij}=c_{ji}$, the covariance matrix is symmetric. Since $c_{ii}=D(X_i)$, the variances form its diagonal elements and the covariances form its off-diagonal elements. In general, the distribution of an n-dimensional random variable is unknown or too complicated to handle mathematically, which makes the covariance matrix very important in practical applications. It is widely used in statistics, machine learning and other fields.
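Here is a minimal sketch of the sample covariance matrix, dividing by n-1. It assumes each observation is a tuple of d components, and the data are hypothetical. The result is symmetric and has the variance of each component on its diagonal.

```python
def covariance_matrix(samples):
    """Sample covariance matrix (divide by n - 1).

    `samples` is a list of observations, each a tuple (x_1, ..., x_d).
    Entry [i][j] estimates Cov(X_i, X_j); entry [i][i] estimates D(X_i).
    """
    n, d = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(d)]
    return [
        [sum((row[i] - means[i]) * (row[j] - means[j]) for row in samples) / (n - 1)
         for j in range(d)]
        for i in range(d)
    ]

obs = [(2.0, 1.0, 0.5), (3.0, 2.5, 0.0), (4.0, 3.5, 1.5), (5.0, 5.0, 1.0)]
C = covariance_matrix(obs)
for row in C:
    print(row)
```

With NumPy, `numpy.cov(obs, rowvar=False)` should produce the same matrix, since by default it also divides by n-1 and `rowvar=False` treats columns as variables.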


Copyright notice
This article was written by [You roll, I don't roll]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/02/202202130542480136.html