Correlation analysis related knowledge
2022-06-26 14:12:00 【Orange tea must be ^ -^】
Correlation analysis examines the relationships between features in a data set: whether they are positively correlated, negatively correlated, completely correlated, or only partially correlated. The relationship can then be described with a mathematical model and used for prediction.
As a running example, we analyze the correlation between advertising cost and advertising exposure.
1. Covariance and covariance matrix
The covariance Cov(X, Y) is a characteristic number that describes the degree of correlation between the two components of a two-dimensional random variable. Let (X, Y) be a two-dimensional random variable. If E{ [ X-E(X) ] [ Y-E(Y) ] } exists, this expectation is called the covariance of X and Y, written Cov(X, Y) = E{ [ X-E(X) ] [ Y-E(Y) ] }.
The covariance tells us how two features are related. If Cov(X, Y) is positive, the features are positively correlated; if it is negative, they are negatively correlated; if it is 0, they are uncorrelated. When there are more than two features, the covariance matrix is a convenient way to compute all pairwise relationships. For a quick calculation, the COVAR function in Excel returns the covariance directly.

A single covariance only relates two sets of data. With more than two sets, we need the covariance matrix. For three sets of data x, y, and z, the covariance matrix is:

When there are many features, covariance and the covariance matrix cannot tell us which pairs are more strongly correlated, because the size of a covariance depends on the units and scale of the variables. They only indicate roughly whether a relationship is positive or negative. To compare the strength of correlations, use the correlation coefficient.
2. The correlation coefficient
The correlation coefficient is a statistical index that reflects how closely variables are related. A value of 1 means the two variables have a perfect positive linear correlation, -1 means a perfect negative linear correlation, and 0 means no linear correlation. The closer the value is to 0, the weaker the correlation. The formula is:

Here r_xy is the sample correlation coefficient, S_xy is the sample covariance, S_x is the sample standard deviation of x, and S_y is the sample standard deviation of y. The formulas for S_xy, S_x, and S_y are given next. Because these are the sample covariance and sample standard deviations, the denominator is n-1.
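Sample covariance S_xy:

$$
S_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}
$$

Sample standard deviation S_x:

$$
S_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}
$$

Sample standard deviation S_y:

$$
S_y = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}}
$$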
3. Univariate regression and multivariate regression
Two things must be settled before regression analysis: first, the number of variables; second, which variable is independent and which is dependent. In our example, y is the advertising exposure and x is the cost. In the univariate regression equation, b0 is the intercept and b1 is the slope, which also expresses the relationship between the two variables. Our goal is to find b0 and b1. Once these two values are known, we know the relationship between the variables and can use it to predict the advertising exposure for a given cost. The univariate regression equation is:

The formula for b1 is given below. We compute b1 from the known costs x and advertising exposures y.

The formula for b0 is given below. Once b1 and the means of the independent and dependent variables are known, b0 is easy to calculate.
$$
b_0 = \bar{y} - b_1 \bar{x}
$$
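The slope and intercept calculation can be sketched in Python as follows. This is a minimal illustration using the standard least-squares formulas for a single predictor; the function name `fit_line` and the data are hypothetical, so the resulting coefficients are not the article's values.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for y = b0 + b1 * x using the least-squares formulas."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx              # slope
    b0 = mean_y - b1 * mean_x   # intercept
    return b0, b1


# Hypothetical values for illustration only (not the article's data).
cost = [10.0, 20.0, 30.0, 40.0, 50.0]
exposure = [120.0, 150.0, 210.0, 260.0, 310.0]

b0, b1 = fit_line(cost, exposure)
predicted = b0 + b1 * 60.0  # predicted exposure for a cost of 60
print(b0, b1, predicted)
```

Because b0 is computed from b1, rounding b1 before computing b0 changes the intercept slightly, which is worth keeping in mind when comparing manual results with software output.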
In Excel, use the Regression tool under Data Analysis. After you enter the ranges of the independent and dependent variables, it automatically returns a b0 (Intercept) value of 362.15 and a b1 value of 5.84. This b0 differs slightly from the value obtained by manual calculation, because the manual calculation kept only two decimal places for b1.
The R Square value of 0.87 deserves a separate explanation. This value is called the coefficient of determination and measures the goodness of fit of the regression equation. The larger it is, the better the regression equation fits and the more of the variation in the dependent variable is explained by the independent variable.
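One common way to compute the coefficient of determination is R² = 1 - SSE / SST, where SSE is the sum of squared residuals and SST is the total sum of squares around the mean of y. The sketch below assumes that definition; the function name `r_squared`, the data, and the coefficients passed in are hypothetical and unrelated to the 0.87 reported above.

```python
def r_squared(xs, ys, b0, b1):
    """Coefficient of determination for the line y = b0 + b1 * x."""
    mean_y = sum(ys) / len(ys)
    # Sum of squared residuals between observed and predicted values.
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    # Total sum of squares around the mean of y.
    sst = sum((y - mean_y) ** 2 for y in ys)
    if sst == 0:
        raise ValueError("R squared is undefined when y is constant")
    return 1.0 - sse / sst


# Hypothetical data and a least-squares line fitted to it (b0 = 63, b1 = 4.9).
cost = [10.0, 20.0, 30.0, 40.0, 50.0]
exposure = [120.0, 150.0, 210.0, 260.0, 310.0]

r2 = r_squared(cost, exposure, 63.0, 4.9)
print(round(r2, 4))
```

For a univariate least-squares fit, this value equals the square of the correlation coefficient between x and y.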
4. Information entropy
Computing the information entropy of the data gives a measure of its uncertainty. The higher the entropy, the greater the uncertainty and the lower the probability of any particular outcome. The lower the entropy, the greater the certainty: the same values occur more often, with higher probability, and the data are considered more strongly related.
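The article does not give a formula, so the sketch below assumes the usual Shannon entropy, H = -Σ p_i log2(p_i), with each p_i estimated from the relative frequency of a value. The function name `shannon_entropy` and the sample lists are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy in bits, with probabilities estimated from value counts."""
    if not values:
        raise ValueError("need at least one value")
    total = len(values)
    counts = Counter(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical samples: one fully predictable, one evenly spread.
certain = ["a", "a", "a", "a"]
mixed = ["a", "a", "b", "b"]
uniform = ["a", "b", "c", "d"]

h_certain = shannon_entropy(certain)
h_mixed = shannon_entropy(mixed)
h_uniform = shannon_entropy(uniform)
print(h_mixed, h_uniform)  # 1.0 2.0
```

A list with a single repeated value has zero entropy, while four equally likely values give the maximum of 2 bits for four outcomes, which matches the description of entropy growing with uncertainty.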