Summary of Andrew Ng's Machine Learning Course (14): Dimensionality Reduction
2022-06-28 00:21:00 【51CTO】
Q1 Motivation One: Data Compression
Dimensionality reduction of features, for example reducing two correlated dimensions to one:

Or three dimensions to two:

Similarly, 1000-dimensional data can be reduced to 100 dimensions, which reduces the memory footprint.
Q2 Motivation Two: Data Visualization
For example, 50-dimensional data cannot be visualized directly; dimensionality reduction can bring it down to 2 dimensions, which can then be plotted.
The dimensionality reduction algorithm is only responsible for reducing the number of dimensions; the meaning of the new features has to be figured out by ourselves.
Q3 The Principal Component Analysis Problem
(1) Problem statement of principal component analysis:
To reduce n-dimensional data to k dimensions, find k direction vectors onto which to project the data so that the total projection error is minimized.
(2) Comparison between principal component analysis and linear regression:

They are different algorithms: the former minimizes the projection error, while the latter minimizes the prediction error; the former makes no prediction at all, while the latter aims to predict an outcome. (A numerical sketch of the two error measures follows this list.)
Linear regression measures the error vertically (perpendicular to the x-axis), while principal component analysis measures the error perpendicular to the fitted line (the red line), as shown in the figure below:

(3) PCA ranks the new "principal component" vectors by importance; keep as many of the leading components as needed and drop the remaining dimensions.
(4) One advantage of PCA is that it depends entirely on the data: no parameters need to be set by hand, and it is independent of the user. This can also be seen as a disadvantage, because any prior knowledge the user has about the data cannot be used, so the result may not be what is desired.
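To make the contrast between the two error measures concrete, here is a minimal numpy sketch (not from the course) that fits both on the same synthetic 2-D data; the data, the through-the-origin regression, and the variable names are all illustrative assumptions.

```python
import numpy as np

# Synthetic, roughly linear 2-D data (illustrative only).
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100)
y = 2 * x + rng.normal(scale=0.3, size=100)
x, y = x - x.mean(), y - y.mean()            # mean-normalize both features
X = np.column_stack([x, y])

# Linear regression (through the origin): minimize the *vertical* prediction error y - w*x.
w = (x @ y) / (x @ x)
vertical_error = np.mean((y - w * x) ** 2)

# PCA: minimize the *orthogonal* projection error onto the first principal direction.
Sigma = (X.T @ X) / len(X)                   # covariance matrix
U, S, _ = np.linalg.svd(Sigma)
u1 = U[:, 0]                                 # first principal direction
X_proj = X @ np.outer(u1, u1)                # orthogonal projection of each point onto u1
orthogonal_error = np.mean(np.sum((X - X_proj) ** 2, axis=1))

print(f"vertical (regression) error:  {vertical_error:.4f}")
print(f"orthogonal (PCA) error:       {orthogonal_error:.4f}")
```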
Q4 The Principal Component Analysis Algorithm
To reduce data from n dimensions to k dimensions with PCA:
(1) Mean normalization: subtract the mean of each feature (and, if needed, divide by its standard deviation);
(2) Compute the covariance matrix Sigma = (1/m) * X^T * X;
(3) Compute the eigenvectors of the covariance matrix, e.g. with a singular value decomposition [U, S, V] = svd(Sigma).

For the n x n covariance matrix, the matrix U obtained above has columns that are the direction vectors minimizing the projection error of the data. Taking its first k columns gives an n x k matrix, denoted Ureduce, and the new feature vector is computed as z(i) = Ureduce^T * x(i).
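A compact numpy sketch of steps (1)-(3) and of the projection z(i) = Ureduce^T * x(i); the function name pca_reduce and the toy data are my own, assuming plain numpy rather than the course's Octave code.

```python
import numpy as np

def pca_reduce(X, k):
    """Reduce the rows of X (m x n) from n to k dimensions with PCA."""
    # (1) Mean normalization (optionally also divide each feature by its std. deviation).
    mu = X.mean(axis=0)
    X_norm = X - mu

    # (2) Covariance matrix: Sigma = (1/m) * X^T X  (n x n).
    m = X_norm.shape[0]
    Sigma = (X_norm.T @ X_norm) / m

    # (3) Eigenvectors of Sigma via SVD; the columns of U are the principal directions.
    U, S, _ = np.linalg.svd(Sigma)
    U_reduce = U[:, :k]                       # n x k, the first k directions

    # New features: z(i) = Ureduce^T * x(i), i.e. Z = X_norm @ U_reduce  (m x k).
    Z = X_norm @ U_reduce
    return Z, U_reduce, mu, S

# Example: reduce random 5-dimensional data to 2 dimensions.
X = np.random.default_rng(1).normal(size=(200, 5))
Z, U_reduce, mu, S = pca_reduce(X, k=2)
print(Z.shape)                                # (200, 2)
```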
Q5 Choosing the Number of Principal Components
Principal component analysis minimizes the average squared projection error, (1/m) * Σ ||x(i) − x_approx(i)||²; the total variation of the training set is (1/m) * Σ ||x(i)||².

We want the ratio of the average squared projection error to the total variation to be as small as possible, for example less than 1% (i.e. 99% of the variance is retained), and we choose the smallest k that satisfies this condition. In practice the ratio can be computed from the singular values S returned by svd(Sigma): pick the smallest k such that (S11 + … + Skk) / (S11 + … + Snn) ≥ 0.99.
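A sketch of this selection rule using the singular values S returned by the SVD in the previous snippet; the helper name choose_k and the 99% threshold shown are illustrative.

```python
import numpy as np

def choose_k(S, retain=0.99):
    """Smallest k such that the top-k singular values of Sigma retain >= `retain` of the variance."""
    ratio = np.cumsum(S) / np.sum(S)          # fraction of variance retained for k = 1, 2, ...
    return int(np.argmax(ratio >= retain)) + 1

# With S from the previous pca_reduce sketch:
# k = choose_k(S, retain=0.99)   # keep 99% of the variance, i.e. at most 1% projection error
```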

Q6 Reconstruction from the Compressed Representation
Dimensionality reduction formula: z(i) = Ureduce^T * x(i).

Reconstruction (i.e. mapping from the low-dimensional representation back to the high-dimensional space): x_approx(i) = Ureduce * z(i), which approximates the original x(i).
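A minimal sketch of this reconstruction step, reusing Z, Ureduce, and the mean mu from the earlier pca_reduce sketch (the helper name is my own).

```python
import numpy as np

def pca_reconstruct(Z, U_reduce, mu):
    """Map the compressed representation back to the original space."""
    # x_approx(i) = Ureduce * z(i); add back the mean subtracted during normalization.
    return Z @ U_reduce.T + mu

# With Z, U_reduce, mu from the earlier pca_reduce sketch:
# X_approx = pca_reconstruct(Z, U_reduce, mu)
# X_approx has the original number of columns but lies in a k-dimensional subspace.
```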

The schematic is shown below: the figure on the left shows dimensionality reduction, and the figure on the right shows reconstruction.

Q7 Advice on Applying Principal Component Analysis
Correct usage:
For 100 x 100 pixel images, i.e. 10,000 features, use PCA to compress them to 1,000 dimensions, then run the learning algorithm on the compressed training set. At prediction time, apply the Ureduce learned from the training set to convert each test example x into z, and then make the prediction (see the sketch after this list).
Incorrect usage:
(1) Using PCA to address overfitting. PCA cannot fix overfitting; regularization should be used instead.
(2) Treating PCA as a default part of the learning pipeline. The original features should be used whenever possible; only consider PCA when the algorithm runs too slowly or uses too much memory.
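A sketch of the correct workflow described above: fit the PCA mapping (mu and Ureduce) on the training set only and reuse it on the test set. The data here is random placeholder data, the model is left abstract, and pca_reduce is the helper from the earlier sketch, not a library function.

```python
import numpy as np

# Placeholder data standing in for image features (the course example compresses
# 10,000 pixel features to 1,000 dimensions; smaller sizes here to keep the sketch cheap).
rng = np.random.default_rng(2)
X_train = rng.normal(size=(500, 100))
X_test = rng.normal(size=(100, 100))

# Fit the PCA mapping on the *training set only*.
Z_train, U_reduce, mu, S = pca_reduce(X_train, k=10)   # pca_reduce from the earlier sketch

# ... train any learning algorithm on (Z_train, y_train) here ...

# At prediction time, reuse the same mu and U_reduce learned from the training data
# to convert each test example x into z, then predict.
Z_test = (X_test - mu) @ U_reduce
# predictions = model.predict(Z_test)
```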
Author: Your Rego
The copyright of this article belongs to the author. Reprinting is welcome, but without the author's consent the original link must be given on the article page; otherwise, the author reserves the right to pursue legal liability.