[Deep Learning Basics - 50] PCA Dimensionality Reduction: Principal Component Analysis
2022-07-27 19:41:00 【Yanyu up】
The principle of PCA dimensionality reduction
- PCA (principal component analysis) combines high-dimensional variables that may be correlated into lower-dimensional variables that are linearly uncorrelated. These new variables are called principal components. The new low-dimensional dataset preserves as much of the variance (information) of the original data as possible.
- Dimensionality reduction works by analyzing the principal components of the data. It then projects the high-dimensional data onto a lower-dimensional space without losing too much information.
Calculation of principal components
- The principal components of a data matrix are the eigenvectors of its covariance matrix, sorted by their corresponding eigenvalues in descending order. The eigenvector with the largest eigenvalue is the first principal component. The eigenvector with the second largest eigenvalue is the second principal component, and so on.
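The procedure above can be sketched end to end in a few lines. The sketch centers the data and builds the sample covariance matrix. It eigen-decomposes that matrix, sorts the eigenvectors by eigenvalue from largest to smallest, and projects the centered data onto the top `k` of them.

This is a minimal pure-Python sketch under stated assumptions, not the author's implementation. The eigen-decomposition uses the classical Jacobi rotation method, which I chose only so the example needs no third-party libraries. In practice a library routine such as NumPy's `numpy.linalg.eigh` would normally be used instead. The function names `jacobi_eigen` and `pca` are hypothetical. The input is assumed to have one row per sample and one column per variable, and to be non-constant.

```python
import math
from typing import List, Sequence, Tuple


def jacobi_eigen(a: Sequence[Sequence[float]], tol: float = 1e-24,
                 max_sweeps: int = 100) -> Tuple[List[float], List[List[float]]]:
    """Eigenvalues/eigenvectors of a SYMMETRIC matrix via Jacobi rotations (illustrative choice).

    Returns (eigenvalues, eigenvectors); eigenvectors[i] pairs with eigenvalues[i]. Not sorted.
    """
    n = len(a)
    m = [[float(x) for x in row] for row in a]  # working copy, rotated toward diagonal form
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # columns accumulate eigenvectors
    frob = sum(x * x for row in m for x in row)  # squared Frobenius norm, invariant under rotations
    for _ in range(max_sweeps):
        off = sum(m[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= tol * frob:  # off-diagonal part is negligible: m is (numerically) diagonal
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if m[p][q] == 0.0:
                    continue
                # Rotation chosen so that entry (p, q) becomes zero.
                theta = (m[q][q] - m[p][p]) / (2.0 * m[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # M <- M J (columns p and q)
                    mkp, mkq = m[k][p], m[k][q]
                    m[k][p], m[k][q] = c * mkp - s * mkq, s * mkp + c * mkq
                for k in range(n):  # M <- J^T M (rows p and q)
                    mpk, mqk = m[p][k], m[q][k]
                    m[p][k], m[q][k] = c * mpk - s * mqk, s * mpk + c * mqk
                m[p][q] = m[q][p] = 0.0  # store an exact zero instead of rounding noise
                for k in range(n):  # V <- V J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    eigenvalues = [m[i][i] for i in range(n)]
    eigenvectors = [[v[k][i] for k in range(n)] for i in range(n)]  # column i of V
    return eigenvalues, eigenvectors


def pca(data: Sequence[Sequence[float]], k: int):
    """Project `data` (rows = samples) onto its top-k principal components.

    Returns (components, projected, explained_ratio).
    """
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    # Sample covariance matrix, dividing by n - 1.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)] for i in range(d)]
    eigenvalues, eigenvectors = jacobi_eigen(cov)
    # Largest eigenvalue first: first principal component, then second, and so on.
    order = sorted(range(d), key=lambda i: eigenvalues[i], reverse=True)[:k]
    components = [eigenvectors[i] for i in order]
    # Coordinates of each centered sample along the chosen components.
    projected = [[sum(x * w for x, w in zip(r, comp)) for comp in components] for r in centered]
    total = sum(eigenvalues)
    explained_ratio = [eigenvalues[i] / total for i in order]  # share of total variance kept
    return components, projected, explained_ratio


if __name__ == "__main__":
    points = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0], [5.0, 10.0]]  # all on y = 2x
    comps, proj, ratio = pca(points, k=1)
    print(comps, ratio)
```

For points lying exactly on the line y = 2x, the first component comes out parallel to (1, 2)/√5, up to sign, since eigenvectors are only defined up to sign. That component carries essentially all of the variance, so projecting to one dimension loses almost nothing.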
Variance and covariance
- Variance: measures how spread out a set of data is. It is the average of the squared differences between each sample and the sample mean. The sample variance below divides by n−1 instead of n (Bessel's correction). The formula is as follows:
$$s^2=\frac{\sum_{i=1}^{n}(X_i-\overline{X})^2}{n-1}$$
- Covariance: measures the degree of linear correlation between two variables. A covariance of zero means the two variables are linearly uncorrelated. They are not necessarily independent; there is simply no linear relationship between them. A positive covariance means that when one variable increases the other tends to increase as well, i.e. positive correlation. A negative covariance means that when one variable increases the other tends to decrease, i.e. negative correlation.
$$\mathrm{cov}(X,Y)=\frac{\sum_{i=1}^{n}(X_i-\overline{X})(Y_i-\overline{Y})}{n-1}$$
- Covariance matrix: made up of the covariances between every pair of variables in the dataset. Element $(i,j)$ of the matrix is the covariance of the $i$-th and $j$-th variables, so the diagonal holds the variance of each variable.
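The three definitions above translate almost line for line into code. The following is a minimal pure-Python sketch with no third-party libraries assumed. The function names `mean`, `variance`, `covariance`, and `covariance_matrix` are my own illustrative choices. The data layout, with one row per sample and one column per variable, is an assumption. Both estimators divide by n − 1, matching the formulas.

```python
from typing import List, Sequence


def mean(xs: Sequence[float]) -> float:
    """Arithmetic mean of a sequence of numbers."""
    return sum(xs) / len(xs)


def variance(xs: Sequence[float]) -> float:
    """Sample variance: squared deviations from the mean, summed and divided by n - 1."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def covariance(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Sample covariance of two equally long sequences, also divided by n - 1."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def covariance_matrix(data: Sequence[Sequence[float]]) -> List[List[float]]:
    """Covariance matrix of `data` (rows = samples, columns = variables).

    Entry (i, j) is the covariance of variable i and variable j, so the
    diagonal holds each variable's variance.
    """
    columns = list(zip(*data))  # transpose: one tuple per variable
    d = len(columns)
    return [[covariance(columns[i], columns[j]) for j in range(d)] for i in range(d)]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    up = [2.0, 4.0, 6.0, 8.0, 10.0]   # rises with x
    down = [5.0, 4.0, 3.0, 2.0, 1.0]  # falls as x rises
    print(variance(x))            # 2.5
    print(covariance(x, up))      # 5.0 (positive correlation)
    print(covariance(x, down))    # -2.5 (negative correlation)
    print(covariance_matrix(list(zip(x, up))))  # [[2.5, 5.0], [5.0, 10.0]]
```

Note that the diagonal of the resulting matrix reproduces `variance` for each column, since the covariance of a variable with itself is its variance.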
Eigenvectors and eigenvalues
- Intuitively, in PCA the eigenvectors of the covariance matrix act as the new coordinate axes, and each eigenvalue indicates how much variance the data has along its axis.
- An eigenvector of a matrix $A$ is a non-zero vector $\vec{v}$ that satisfies:
$$A\vec{v}=\lambda\vec{v}$$
- Here $A$ is a square matrix, $\vec{v}$ is the eigenvector, and $\lambda$ is the corresponding eigenvalue.
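One simple way to see the definition $A\vec{v}=\lambda\vec{v}$ in action is power iteration. This method is not mentioned in the original post and is swapped in here purely for illustration. Repeatedly multiplying a vector by $A$ and renormalizing pulls the vector toward the eigenvector with the largest eigenvalue, which for a covariance matrix is the first principal component.

The sketch below is a minimal version under stated assumptions. It assumes a symmetric matrix with a single largest, positive eigenvalue, and a start vector that is not orthogonal to that eigenvector. It also assumes $A\vec{v}$ never becomes the zero vector. The helper names `mat_vec`, `normalize`, and `power_iteration` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def mat_vec(a: Sequence[Sequence[float]], v: Sequence[float]) -> List[float]:
    """Matrix-vector product A v."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def normalize(v: Sequence[float]) -> List[float]:
    """Scale v to unit Euclidean length (v must be non-zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def power_iteration(a: Sequence[Sequence[float]], iters: int = 1000,
                    tol: float = 1e-12) -> Tuple[float, List[float]]:
    """Approximate the dominant eigenpair (lambda, v) of `a`.

    Assumes `a` is symmetric with a single largest, positive eigenvalue
    (as a typical covariance matrix has) and that the start vector is not
    orthogonal to the corresponding eigenvector.
    """
    v = normalize([1.0] * len(a))  # arbitrary non-zero start vector
    for _ in range(iters):
        v_next = normalize(mat_vec(a, v))  # applying A amplifies the dominant direction
        converged = max(abs(p - q) for p, q in zip(v, v_next)) < tol
        v = v_next
        if converged:
            break
    lam = sum(x * y for x, y in zip(v, mat_vec(a, v)))  # Rayleigh quotient v^T A v (|v| = 1)
    return lam, v


if __name__ == "__main__":
    A = [[3.0, 1.0], [1.0, 2.0]]
    lam, v = power_iteration(A)
    print(lam)  # should be close to (5 + sqrt(5)) / 2
    print(mat_vec(A, v), [lam * x for x in v])  # A v and lambda v should agree
```

For the symmetric matrix in the demo, the characteristic polynomial is $\lambda^2 - 5\lambda + 5 = 0$. Its eigenvalues are therefore $(5 \pm \sqrt{5})/2$, and power iteration should recover the larger one.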
The role of PCA dimensionality reduction
Dimensionality reduction addresses three kinds of problems:
- Dimensionality reduction can alleviate the curse of dimensionality;
- Dimensionality reduction can compress data while minimizing information loss ;
- Structure in hundreds of dimensions is hard to understand, while data in two or three dimensions is much easier to understand through visualization.
I will keep posting basic deep learning knowledge, along with problems and insights from my work. If you find this useful, please follow, like, and bookmark.