Understanding of expectation, variance, covariance and correlation coefficient
2022-07-08 01:23:00 【You roll, I don't roll】
Catalog
1、 Mathematical expectation (mean)
2、 Variance D(X) or Var(X)
3、 Covariance Cov(X,Y)
4、 The correlation coefficient ρ
5、 Covariance matrix
Summary in one sentence: the expectation reflects the average level; the variance reflects how much the data fluctuate around that average; the covariance measures the linear association between two random variables and carries units (it is a dimensional quantity); the correlation coefficient measures the same association as a dimensionless quantity.
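1、 Mathematical expectation (mean)
The expectation of a discrete random variable X is the mean of its values weighted by their probabilities:
$$E(X) = \sum_{i} x_i p_i, \qquad p_i = P(X = x_i)$$
The expectation is the mean of the population. In statistics the population is usually unavailable and a sample stands in for it, so the sample mean is computed instead:
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$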
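The probability-weighted mean and the sample mean can be sketched in a few lines of Python. This is a minimal illustration, not code from the original post: the function names `expectation` and `sample_mean` and the die data are assumptions chosen for the example.

```python
from fractions import Fraction


def expectation(values, probs):
    """Probability-weighted mean E(X) = sum(x_i * p_i) of a discrete variable."""
    if sum(probs) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in zip(values, probs))


def sample_mean(xs):
    """Arithmetic mean of observed samples, used as an estimate of E(X)."""
    return sum(xs) / len(xs)


# Hypothetical example: a fair six-sided die, using exact fractions.
faces = [1, 2, 3, 4, 5, 6]
die_probs = [Fraction(1, 6)] * 6
die_expectation = expectation(faces, die_probs)  # exactly 7/2

# Six hypothetical rolls; their mean only approximates the expectation.
rolls = [3, 6, 1, 4, 4, 2]
rolls_mean = sample_mean(rolls)

print(die_expectation, rolls_mean)
```

The expectation is a property of the distribution, while the sample mean changes from sample to sample and only approaches the expectation as the sample grows.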
2、 Variance D(X) or Var(X)
The variance measures how far the values of X deviate from the mean, that is, the dispersion of the data:
$$D(X) = E\{[X - E(X)]^2\}$$
If the values of X are concentrated around the mean, the variance is small; conversely, the more dispersed the values of X, the larger the variance.
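D(X) satisfies the following properties, where C is a constant:
$$D(C) = 0, \qquad D(CX) = C^2 D(X), \qquad D(X + C) = D(X)$$
$$D(X) = E(X^2) - [E(X)]^2$$
$$D(X \pm Y) = D(X) + D(Y) \pm 2\,\mathrm{Cov}(X, Y)$$
When X and Y are independent (which includes the independent and identically distributed, iid, case), $\mathrm{Cov}(X, Y) = 0$, and then:
$$D(X \pm Y) = D(X) + D(Y)$$
Here $\mathrm{Cov}(X, Y)$ is the covariance, discussed in section 3.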
In addition, the standard deviation (in Chinese texts also called the mean square deviation) is $\sigma(X) = \sqrt{D(X)}$. It has the same dimension (units) as X.
In sample analysis, the variance is computed as:
$$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$
Note: the sum is divided by n-1, that is, multiplied by 1/(n-1), not by 1/n.
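As a rough sketch of the sample formula, the following Python computes the sample variance and standard deviation for two small data sets with the same mean. The helper name `sample_variance` and both data sets are assumptions made for illustration.

```python
import math


def sample_variance(xs):
    """Sample variance: squared deviations from the sample mean, divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


# Two hypothetical data sets, both with mean 10.
tight = [9.0, 10.0, 10.0, 11.0]    # values concentrated near the mean
spread = [0.0, 5.0, 15.0, 20.0]    # values far from the mean

var_tight = sample_variance(tight)
var_spread = sample_variance(spread)

# The standard deviation is the square root of the variance,
# so it is expressed in the same units as the data.
std_tight = math.sqrt(var_tight)
std_spread = math.sqrt(var_spread)

print(var_tight, var_spread, std_tight, std_spread)
```

The concentrated data set yields a much smaller variance than the dispersed one even though the two means coincide.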
Why does the variance formula appear both with division by n and with division by n-1?
Dividing by n gives the population variance; dividing by n-1 gives the sample variance, which is an unbiased estimate of the population variance. Computing the population variance directly is usually unrealistic, and one task of statistics is to infer properties of the population from samples, so the sample variance is commonly used in its place.
Why is the sample variance divided by n-1? Before computing it we must compute the sample mean from the same data, so the n deviations $x_i - \bar{x}$ always sum to zero: once n-1 of them are fixed, the n-th is determined. Only n-1 deviations are free to vary, that is, the degrees of freedom are n-1. The sample mean also sits closer to the data than the true mean does, so the sum of squared deviations about it has expectation $(n-1)\sigma^2$ rather than $n\sigma^2$, and dividing by n-1 removes this bias. In terms of linear algebra, the n deviations are not linearly independent: the vector of deviations lies in an (n-1)-dimensional subspace.
Dividing by n is appropriate when the population mean μ is known in advance (given, not computed from the sample). The deviations $x_i - \mu$ are then free of any constraint, and $\frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2$ is already unbiased. This is an idealised situation: in practice μ is almost never known and the population is estimated from samples, which is why the common variance formula divides by n-1.
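The effect of the divisor can be checked with a small simulation. The sketch below is an illustration under stated assumptions, not part of the original post: it draws many samples of size 5 from a normal distribution with mean 0 and variance 4 (both values chosen arbitrarily), and averages three estimators. The function name `simulate` and all parameter values are hypothetical.

```python
import random


def simulate(n=5, trials=20000, mu=0.0, sigma=2.0, seed=0):
    """Average three variance estimators over many samples of size n."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total_div_n = total_div_n_minus_1 = total_known_mu = 0.0
    for _ in range(trials):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = sum(xs) / n
        ss = sum((x - mean) ** 2 for x in xs)  # deviations from the sample mean
        total_div_n += ss / n
        total_div_n_minus_1 += ss / (n - 1)
        # Deviations from the true mean, which is assumed known here.
        total_known_mu += sum((x - mu) ** 2 for x in xs) / n
    return (total_div_n / trials,
            total_div_n_minus_1 / trials,
            total_known_mu / trials)


avg_div_n, avg_div_n_minus_1, avg_known_mu = simulate()
print(avg_div_n, avg_div_n_minus_1, avg_known_mu)
```

With a true variance of 4, the average of the divide-by-n estimator should land near (n-1)/n × 4 = 3.2, while the divide-by-(n-1) estimator and the known-mean estimator should both land near 4.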
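3、 Covariance Cov(X,Y)
Covariance describes the linear association between two random variables:
$$\mathrm{Cov}(X, Y) = E\{[X - E(X)][Y - E(Y)]\} = E(XY) - E(X)E(Y)$$
Covariance is a dimensional quantity: its units are the product of the units of X and Y.
If X and Y are independent of each other, then $\mathrm{Cov}(X, Y) = 0$.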
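Both statements about covariance, that it carries units and that it vanishes for independent variables, can be illustrated with a short Python sketch. The function name `sample_covariance`, the height and weight figures, and the two-coin example are all assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product


def sample_covariance(xs, ys):
    """Sample covariance with the n - 1 divisor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical heights and weights of four people.
heights_m = [1.5, 1.6, 1.7, 1.8]
weights_kg = [50.0, 58.0, 64.0, 72.0]
heights_cm = [100 * h for h in heights_m]

cov_m = sample_covariance(heights_m, weights_kg)    # units: m * kg
cov_cm = sample_covariance(heights_cm, weights_kg)  # units: cm * kg

# Two independent fair coins (0 or 1): every pair has probability 1/4.
quarter = Fraction(1, 4)
pairs = list(product([0, 1], repeat=2))
e_xy = sum(x * y * quarter for x, y in pairs)
e_x = sum(x * quarter for x, _ in pairs)
e_y = sum(y * quarter for _, y in pairs)
cov_coins = e_xy - e_x * e_y  # Cov(X, Y) = E(XY) - E(X)E(Y)

print(cov_m, cov_cm, cov_coins)
```

Changing the height unit from metres to centimetres multiplies the covariance by 100 although the relationship in the data is unchanged, which is why covariance values cannot be compared across differently scaled variables.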
4、 The correlation coefficient ρ
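The correlation coefficient also describes the linear association between two variables but, unlike covariance, it is a dimensionless quantity. The formula is:
$$\rho_{XY} = \frac{\mathrm{Cov}(X, Y)}{\sqrt{D(X)}\,\sqrt{D(Y)}}$$
In addition,
$$X^* = \frac{X - E(X)}{\sqrt{D(X)}}, \qquad Y^* = \frac{Y - E(Y)}{\sqrt{D(Y)}}$$
are called the standardized variables of X and Y. Then:
$$\rho_{XY} = \mathrm{Cov}(X^*, Y^*)$$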
Properties of the correlation coefficient:
- $|\rho_{XY}| \le 1$. The larger $|\rho_{XY}|$ is, the stronger the linear correlation; when $|\rho_{XY}|$ is large, X and Y are said to be well linearly correlated. $\rho_{XY} = 0$ means X and Y have no linear relationship, but they may be related in other ways. For example, for a random variable X uniformly distributed over a full period such as $[0, 2\pi]$, if $X_1 = \sin X$ and $X_2 = \cos X$, then $\rho_{X_1 X_2} = 0$, yet $X_1^2 + X_2^2 = 1$.
- The necessary and sufficient condition for $|\rho_{XY}| = 1$: there exist constants a and b (with b ≠ 0) such that $P\{Y = a + bX\} = 1$.
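These properties can be checked numerically. The sketch below is illustrative only: the function name `pearson` and the data are assumptions, and an evenly spaced grid of angles over one full period stands in for a uniformly distributed X.

```python
import math


def pearson(xs, ys):
    """Sample correlation coefficient: covariance over the product of standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Exact linear relationships give a coefficient of +1 or -1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
rho_up = pearson(xs, [3 * x + 2 for x in xs])
rho_down = pearson(xs, [-2 * x + 1 for x in xs])

# Rescaling a variable (a change of units) leaves the coefficient unchanged.
ys = [2.0, 3.0, 5.0, 4.0, 6.0]
rho_raw = pearson(xs, ys)
rho_scaled = pearson([100 * x for x in xs], ys)

# Assumed stand-in for X uniform over one period: 360 evenly spaced angles.
angles = [2 * math.pi * k / 360 for k in range(360)]
x1 = [math.sin(a) for a in angles]
x2 = [math.cos(a) for a in angles]
rho_sin_cos = pearson(x1, x2)

print(rho_up, rho_down, rho_raw, rho_scaled, rho_sin_cos)
```

The sine and cosine values are fully determined by each other through the identity sin² + cos² = 1, yet their correlation coefficient is zero up to rounding error, because the coefficient only detects linear dependence.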
5、 Covariance matrix
The covariance matrix describes the covariances between the components of a multidimensional random variable.
Let the covariances of the components of the n-dimensional random variable $(X_1, X_2, \dots, X_n)$ be
$$c_{ij} = \mathrm{Cov}(X_i, X_j) = E\{[X_i - E(X_i)][X_j - E(X_j)]\}, \qquad i, j = 1, 2, \dots, n$$
Then the matrix
$$C = \begin{pmatrix} c_{11} & c_{12} & \cdots & c_{1n} \\ c_{21} & c_{22} & \cdots & c_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ c_{n1} & c_{n2} & \cdots & c_{nn} \end{pmatrix}$$
is called the covariance matrix of the n-dimensional random variable $(X_1, X_2, \dots, X_n)$. Because $c_{ij} = c_{ji}$, the covariance matrix is symmetric; the variances form its diagonal elements and the covariances form the off-diagonal elements. In general the distribution of an n-dimensional random variable is unknown or too complicated to handle mathematically, so the covariance matrix is very important in practice. It is widely used in statistics, machine learning and other fields.
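A sample covariance matrix can be assembled directly from the pairwise definition. The following is a minimal pure-Python sketch; the function name `covariance_matrix` and the three small variables are assumptions for illustration, and the n-1 divisor is used to match the sample variance formula.

```python
def covariance_matrix(columns):
    """Sample covariance matrix.

    columns: list of variables, each a list with the same number of
    observations. Entry [i][j] is the sample covariance of variable i
    and variable j, using the n - 1 divisor.
    """
    m = len(columns[0])  # number of observations per variable
    if m < 2 or any(len(c) != m for c in columns):
        raise ValueError("need equal-length columns with at least two observations")
    means = [sum(c) / m for c in columns]
    dims = range(len(columns))
    return [
        [
            sum((columns[i][k] - means[i]) * (columns[j][k] - means[j])
                for k in range(m)) / (m - 1)
            for j in dims
        ]
        for i in dims
    ]


# Three hypothetical variables observed four times each.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0, 4.0, 6.0, 9.0]
x3 = [4.0, 3.0, 2.0, 1.0]  # decreases as x1 increases

C = covariance_matrix([x1, x2, x3])
for row in C:
    print(row)
```

The diagonal entries are the sample variances of the three variables, the matrix equals its own transpose, and the entry for x1 and x3 is negative because one decreases as the other increases.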