Correlation analysis related knowledge
2022-06-26 14:12:00 【Orange tea must be ^ -^】
Correlation analysis examines the relationships among the features in a data set: whether they are positively correlated, negatively correlated, perfectly correlated, or only partially correlated. The relationship can then be described with a mathematical model and used for prediction.
The running example here is the correlation between advertising cost and advertising exposure.
1. Covariance and covariance matrix
The covariance Cov(X,Y) is a characteristic number that describes the degree of correlation between the two components of a two-dimensional random variable. Let (X,Y) be a two-dimensional random variable. If E{ [ X-E(X) ] [ Y-E(Y) ] } exists, this expectation is called the covariance of X and Y, written Cov(X,Y)=E{ [ X-E(X) ] [ Y-E(Y) ] }.
The covariance tells us how two feature vectors are related. If Cov(X,Y) is positive, the two are positively correlated; if it is negative, they are negatively correlated; if it is 0, they are uncorrelated. When there are more than two feature vectors, the covariance matrix gives all the pairwise covariances at once. For a quick answer, Excel's COVAR function can be used directly; note that COVAR returns the population covariance (it divides by n).
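The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the author's calculation: the function name `covariance`, the `sample` flag, and the cost and exposure figures are all assumptions made up for the example.

```python
def covariance(xs, ys, sample=True):
    """Covariance of two equal-length sequences.

    sample=True divides by n - 1 (sample covariance);
    sample=False divides by n (population covariance).
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the two means.
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - 1 if sample else n)


# Hypothetical data, invented for illustration only.
cost = [10.0, 20.0, 30.0, 40.0]
exposure = [120.0, 190.0, 310.0, 380.0]

print(covariance(cost, exposure))                # sample covariance
print(covariance(cost, exposure, sample=False))  # population covariance
```

A positive result means the two series tend to rise together; a negative result means one tends to fall as the other rises.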

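A single covariance only relates two groups of data. When there are more than two groups, we use the covariance matrix, which collects the covariance of every pair. For three sets of data x, y, z the covariance matrix is:

$$
C = \begin{pmatrix}
\mathrm{Cov}(x,x) & \mathrm{Cov}(x,y) & \mathrm{Cov}(x,z) \\
\mathrm{Cov}(y,x) & \mathrm{Cov}(y,y) & \mathrm{Cov}(y,z) \\
\mathrm{Cov}(z,x) & \mathrm{Cov}(z,y) & \mathrm{Cov}(z,z)
\end{pmatrix}
$$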
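As a rough sketch of the covariance matrix for three data sets, the snippet below builds the 3 x 3 matrix of pairwise sample covariances. The helper name `covariance_matrix` and the values of `x`, `y`, `z` are hypothetical; the n - 1 denominator is an assumption chosen to match the sample statistics used later in the article.

```python
def covariance_matrix(columns):
    """Sample covariance matrix for a list of equal-length data columns."""
    n = len(columns[0])
    if n < 2 or any(len(col) != n for col in columns):
        raise ValueError("columns must share a length of at least 2")
    means = [sum(col) / n for col in columns]

    def cov(i, j):
        # Sample covariance between column i and column j.
        pairs = zip(columns[i], columns[j])
        return sum((a - means[i]) * (b - means[j]) for a, b in pairs) / (n - 1)

    k = len(columns)
    return [[cov(i, j) for j in range(k)] for i in range(k)]


# Hypothetical data: y rises with x, z falls as x rises.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
z = [4.0, 3.0, 2.0, 1.0]

matrix = covariance_matrix([x, y, z])
for row in matrix:
    print(row)
```

The diagonal holds each variable's variance, and the matrix is symmetric because Cov(x,y) = Cov(y,x).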

When there are many kinds of feature data, the covariance and the covariance matrix cannot tell us which pairs are more strongly correlated than others, because the size of a covariance depends on the units and scale of the data. They only indicate roughly whether a correlation is positive or negative. To compare the strength of correlations, use the correlation coefficient.
2. The correlation coefficient
The correlation coefficient is a statistical index that reflects how closely two variables are related. A value of 1 means the two variables are perfectly positively linearly correlated, -1 means they are perfectly negatively correlated, and 0 means they are not linearly correlated. The closer the value is to 0, the weaker the correlation. It is calculated as follows:

$$
r_{xy} = \frac{S_{xy}}{S_x S_y}
$$

Here rxy is the sample correlation coefficient, Sxy is the sample covariance, Sx is the sample standard deviation of x, and Sy is the sample standard deviation of y. The formulas for the covariance Sxy and the standard deviations Sx and Sy follow. Because these are the sample covariance and sample standard deviations, the denominator is n-1.
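Sample covariance Sxy:

$$
S_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}
$$

Sample standard deviation Sx:

$$
S_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}
$$

Sample standard deviation Sy:

$$
S_y = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}}
$$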
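The sample correlation coefficient described in this section can be sketched as below, built from the sample covariance Sxy and the sample standard deviations Sx and Sy with an n - 1 denominator. The function name `pearson_r` and the input data are hypothetical and serve only as illustration.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient r_xy = S_xy / (S_x * S_y)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample covariance and sample standard deviations (denominator n - 1).
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return s_xy / (s_x * s_y)


# Hypothetical data, invented for illustration only.
cost = [10.0, 20.0, 30.0, 40.0]
exposure = [120.0, 190.0, 310.0, 380.0]
print(pearson_r(cost, exposure))
```

Unlike the covariance, the result does not change when either series is rescaled by a positive factor (for example, converting cost to a different currency), which is why it can be used to compare the strength of different correlations.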

3. Univariate regression and multivariate regression
Two preparations are needed before regression analysis: first, determine the number of variables; second, determine which is the independent variable and which is the dependent variable. In the univariate regression equation below, y is the advertising exposure and x is the cost. b0 is the intercept of the equation and b1 is the slope, which also expresses the relationship between the two variables. Our goal is to find the values of b0 and b1; once they are known, so is the relationship between the variables, and we can use it to predict the advertising exposure for a given cost.

$$
y = b_0 + b_1 x
$$

b1 is calculated from the known costs x and advertising exposures y:

$$
b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
$$

Once b1 and the means of the independent and dependent variables are known, b0 is easy to calculate:

$$
b_0 = \bar{y} - b_1 \bar{x}
$$
In Excel, use the Regression function under Data Analysis. After entering the ranges of the independent and dependent variables, it returns b0 (Intercept) = 362.15 and b1 = 5.84. This b0 differs slightly from the value obtained by manual calculation, because the manual calculation kept only two decimal places of b1.
The R Square value of 0.87 deserves a separate note. It is called the coefficient of determination and measures the goodness of fit of the regression equation. The larger it is, the better the equation fits and the more of the variation in the dependent variable is explained by the independent variable.
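The least-squares slope, intercept, and R Square can also be computed by hand, as in the following sketch. The names `fit_line` and `r_squared` and the data are hypothetical; the article's own data set is not reproduced here, so the output will not match the Excel values quoted above.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    # Slope: sum of cross-deviations over sum of squared x-deviations.
    b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    # Intercept: the fitted line passes through the point of means.
    b0 = mean_y - b1 * mean_x
    return b0, b1


def r_squared(xs, ys, b0, b1):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if ss_tot == 0:
        raise ValueError("R squared is undefined when y is constant")
    return 1.0 - ss_res / ss_tot


# Hypothetical data, invented for illustration only.
cost = [10.0, 20.0, 30.0, 40.0]
exposure = [120.0, 190.0, 310.0, 380.0]

b0, b1 = fit_line(cost, exposure)
r2 = r_squared(cost, exposure, b0, b1)
predicted = b0 + b1 * 50.0  # predicted exposure for a cost of 50
print(b0, b1, r2, predicted)
```

Carrying full precision through the calculation avoids the small discrepancy in b0 that comes from rounding b1 first.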
4. Information entropy
Computing the information entropy of the data tells us how uncertain the data are. The greater the entropy, the greater the uncertainty and the smaller the probability of any particular value occurring. The lower the entropy, the greater the certainty: a value appears more often, its probability is higher, and the data are more closely related.
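As an illustration of the entropy idea, the sketch below computes the Shannon entropy (in bits) of a list of discrete values from their observed frequencies. The function name `shannon_entropy`, the base-2 logarithm, and the sample lists are assumptions; the article does not say which entropy formula or data it has in mind.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy in bits of the empirical distribution of `values`."""
    if not values:
        raise ValueError("values must not be empty")
    total = len(values)
    counts = Counter(values)
    # H = sum over distinct values of p * log2(1 / p).
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical samples, invented for illustration only.
certain = ["a", "a", "a", "a"]        # one value always occurs
balanced = ["a", "a", "b", "b"]       # two equally likely values
spread = ["a", "b", "c", "d"]         # four equally likely values

print(shannon_entropy(certain))
print(shannon_entropy(balanced))
print(shannon_entropy(spread))
```

The entropy grows as the values become more evenly spread, matching the description above: more uncertainty means higher entropy.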