Collinearity problem
2022-06-30 03:54:00 【stone_ tigerLI】
Introduction
In traditional modeling and analysis, a collinearity analysis is needed whenever many independent variables are fed into the model.
What is collinearity analysis? In statistics the problem is called multicollinearity. Loosely speaking, it asks whether several independent variables are highly correlated with one another; if they are, only some of them should be kept. For example, suppose we analyze how independent variables such as precipitation, sunshine duration, temperature, slope, aspect, and elevation relate to vegetation growth (a dependent variable such as NDVI). The independent variables need to be checked for collinearity first. Common sense says slope and elevation are highly correlated, so using both of them at the same time in a correlation analysis with NDVI is unreasonable.
Why is it unreasonable?
As mentioned in the introduction, if slope and elevation are strongly collinear, the result may be unreasonable. The details are as follows:
Let's assume a simplified relational model
Suppose the following relationship holds, where $x_i$, $i \in \{1, 2, 3\}$, are the independent variables mentioned above (slope, elevation, and so on) and $y$ is the dependent variable, such as NDVI:
$$y = a \cdot x_1 + b \cdot x_2 + c \cdot x_3 \tag{1}$$
The coefficients $a$, $b$, $c$ can be read as the contribution of each independent variable to NDVI.
If there is no multicollinearity , Then the above relationship is clear .
If $x_1$ and $x_2$ are collinear, with a linear fit between them reaching $R^2 = 0.9$, their relationship can be written simply as
$$x_1 = A \cdot x_2 + B, \quad R^2 = 0.9$$
With a small loss of accuracy, equation (1) can then be rewritten as
$$y = a \cdot (A \cdot x_2 + B) + b \cdot x_2 + c \cdot x_3 = (a \cdot A + b) \cdot x_2 + c \cdot x_3 + D = d \cdot x_2 + c \cdot x_3 + D$$
where $D = a \cdot B$ is a constant and $d = a \cdot A + b$.
This shows that the redundant variable can be dropped from the model with little loss of accuracy. It also shows that the coefficient $d$ equals $a \cdot A + b$, so $a$ and $b$ can take any values as long as $a \cdot A + b$ stays equal to the constant $d$. For example, when $A = 1$, both $a = -10$, $b = 20$ and $a = 20$, $b = -10$ give $d = 10$. In that case the fitted coefficients can no longer be interpreted.
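The argument above can be checked numerically. The following is a minimal sketch, assuming the extreme case of perfect collinearity ($A = 1$, $B = 0$) and synthetic random data; the value `c = 3.0` and the sample size are arbitrary choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic predictors (assumed data, not from any real survey).
x2 = rng.normal(size=200)
x3 = rng.normal(size=200)
x1 = 1.0 * x2 + 0.0  # perfect collinearity: A = 1, B = 0

c = 3.0  # arbitrary coefficient for x3


def predict(a, b):
    """Evaluate y = a*x1 + b*x2 + c*x3 for given coefficients a and b."""
    return a * x1 + b * x2 + c * x3


# Two very different coefficient pairs with the same sum a*A + b = 10.
y_first = predict(-10.0, 20.0)
y_second = predict(20.0, -10.0)

# Both pairs produce the same predictions, so the data cannot tell them apart.
same_predictions = np.allclose(y_first, y_second)

# The design matrix has three columns but only two independent directions.
design_rank = np.linalg.matrix_rank(np.column_stack([x1, x2, x3]))

print(same_predictions, design_rank)
```

Because the two coefficient pairs are indistinguishable from the data, a least-squares fit has no basis for preferring one over the other, which is why the individual values of `a` and `b` carry no meaning here.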
How to eliminate the influence of collinearity ?
The answer is simple: remove the redundant independent variables. Before doing so, we can test how well each candidate variable performs in the model and keep the best one. For example, if slope, aspect, and elevation are collinear, feed slope, aspect, and elevation into the model one at a time and keep the one that gives the best result.
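The selection procedure described above can be sketched as follows. This is only an illustration under stated assumptions: the data are synthetic, the variance inflation factor (VIF) is a common diagnostic that the original post does not mention, and the helper names `r_squared` and `vif` are hypothetical.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X (intercept added)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residual = y - design @ coef
    return 1.0 - residual.var() / y.var()


def vif(X):
    """VIF of each column: 1 / (1 - R^2) from regressing it on the others."""
    factors = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        factors.append(1.0 / (1.0 - r_squared(others, X[:, j])))
    return np.array(factors)


# Synthetic data (assumed): slope is built to track elevation closely.
rng = np.random.default_rng(0)
n = 500
elevation = rng.normal(size=n)
slope = 0.95 * elevation + 0.3 * rng.normal(size=n)
precipitation = rng.normal(size=n)
ndvi = 0.8 * elevation + 0.5 * precipitation + 0.2 * rng.normal(size=n)

# Step 1: flag collinear predictors. Large VIF values signal redundancy.
X = np.column_stack([slope, elevation, precipitation])
vifs = vif(X)
print(dict(zip(["slope", "elevation", "precipitation"], vifs.round(2))))

# Step 2: try each collinear candidate on its own and keep the best fit.
candidates = {"slope": slope, "elevation": elevation}
scores = {
    name: r_squared(np.column_stack([column, precipitation]), ndvi)
    for name, column in candidates.items()
}
best = max(scores, key=scores.get)
print(best, scores)
```

In this toy setup `slope` and `elevation` receive large VIF values while `precipitation` stays near 1, and the candidate that was actually used to generate `ndvi` wins the comparison. With real data the choice may also depend on domain knowledge, not only on goodness of fit.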
Some bloggers also say that increasing the sample size can eliminate the effect of collinearity. A larger sample is certainly a good thing, but I doubt that it can eliminate collinearity. References supporting either view are welcome.
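One way to probe the sample-size question is a small simulation. The sketch below assumes synthetic data in which $x_1$ and $x_2$ are correlated by construction, and uses the standard ordinary least-squares formula for coefficient standard errors; the function names and all numeric settings are hypothetical.

```python
import numpy as np


def coef_standard_errors(X, y):
    """Standard errors of the OLS slope coefficients (intercept excluded)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residual = y - design @ coef
    dof = len(y) - design.shape[1]
    sigma2 = residual @ residual / dof
    covariance = sigma2 * np.linalg.inv(design.T @ design)
    return np.sqrt(np.diag(covariance))[1:]


def simulate(n, rng):
    """Draw n samples with collinear x1, x2 and return (correlation, SEs)."""
    x2 = rng.normal(size=n)
    x1 = x2 + 0.33 * rng.normal(size=n)  # squared correlation near 0.9
    y = 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)
    corr = np.corrcoef(x1, x2)[0, 1]
    return corr, coef_standard_errors(np.column_stack([x1, x2]), y)


rng = np.random.default_rng(0)
corr_small, se_small = simulate(50, rng)
corr_large, se_large = simulate(5000, rng)

print("n=50:  ", corr_small, se_small)
print("n=5000:", corr_large, se_large)
```

In this toy setting the correlation between `x1` and `x2` stays high no matter how many samples are drawn, because it is a property of the variables themselves. The standard errors of the coefficients do shrink with more data, so a larger sample can soften the consequences of collinearity without removing the collinearity itself.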
This article is written according to my own understanding , Mistakes and omissions are inevitable , Welcome criticism .
Further reading on collinearity: https://zhuanlan.zhihu.com/p/88025370
Academician Li Xiaowen's discussion of ill-conditioned inversion, whose underlying idea seems to me essentially the same as collinearity: https://blog.sciencenet.cn/blog-2984-20778.html