当前位置:网站首页>T-sne dimensionality reduction
T-sne dimensionality reduction
2022-07-29 01:34:00 【51CTO】
1. SNE principle
The basic principle : Is it through radiative transformation Map the data points to the probability distribution , There are two steps :
- Construct the probability distribution between high-dimensional objects , Make similar objects have a higher probability of being selected , And dissimilar objects have a lower probability .
- SNE Construct these two distributions in low dimensional space , Make the two probability distributions as similar as possible .
t-SNE Is unsupervised dimensionality reduction , Follow kmeans And so on , He can't get something through training and then use it for other data (kmeans You can get... Through training k A little bit , For other data sets , and t-SNE You can only operate on multiple data alone .
The principle is derived : SNE Firstly, Euclidean distance is transformed into conditional probability to express the similarity between points , say concretely , Given N Tall Dimension data ,(N Not a dimension ). The first is to calculate the probability pij, Proportional to xi and xj The similarity between ,

Parameters here

For different xi The values of are different , The following discussion is about how to set , In addition to setting px|x =0, Because we are concerned about the similarity between two , For low dimensional yi, The variance of Gaussian distribution can be specified as

, Therefore, the similarity is

Again qi|i=0.
If the effect of dimension reduction is better , Local features remain intact , that

, So we optimize the KL The divergence . The objective function is as follows :

, there ·Pi Indicates a given point xi Next , Conditional probability distribution of all other data points .KL Divergence is asymmetric , In low dimensional mapping, the penalty weights corresponding to different distances are different . Specifically : The two points that are far away express that the two points that are close will produce greater cost, The two points that are close to each other express the two points that are far away cost Relatively small . for example

Modeling cost=

, Use the same larger

Modeling


, therefore ,SNE Tendency and retention of local features in data .
2 t-SNE
SNE It's hard to optimize , There is Crowing problem ( crowded ) Difference : Use symmetrical SNE, Simplified gradient formula , In low dimensional space , Use heavier long tailed t Distribution instead of Gaussian distribution represents the similarity between two points . To avoid congestion .
2.1 Symmetric SNE
Optimize pi|j and qi|j Of KL The replacement idea of divergence is to use the joint probability distribution to replace the conditional probability distribution , namely P Is the joint probability distribution of each point in high-dimensional space ,Q It's in low dimensional space , The objective function is

here 1 Of pii and qii All for 0, such SNE be called symmetric SNE, Because he assumed that for any i,pij =pji,qij=qji, Therefore, the probability distribution can be rewritten as :

This method will introduce the problem of outliers , such as xi Is the outlier , that ||xi-xj||2 Will be a big , All the corresponding j,pij Will be very small , Lead to low dimensional mapping yi Yes cost The impact is very small . To solve this problem , The joint probability distribution will be modified .
2.2 Crowing problem
The clusters get together , Indistinguishable , For example, the dimension of high-dimensional data is reduced to 10 Weixia , There will be good expression , But the dimension is reduced to 2 Weihou , Unable to get trusted mapping .
How to solve : use sight repulsion Methods
2.3 t-SNE
symmetry SNE In terms of time, in a high dimension , Another way to reduce congestion : In high-dimensional space, Gaussian distribution is used to convert distance into probability distribution , In low dimensional space , Use t Distribution converts distance into probability distribution , Make the middle and low distances in the high dimension have a larger distance after mapping .

t The distribution is less affected by outliers , Fitting is more reasonable , Better capture the overall characteristics of the data .
t-SNE The gradient update of has two advantages :
For dissimilar points , Using a smaller distance will produce a larger gradient to repel these points .
This rejection will not be infinite ( Denominator in gradient ) , Avoid dissimilar points too far away .
2.4 The algorithm process
Data: X=x1,....xn
Calculation cost function Parameters of
Optimization parameters : Set the number of iterations T, Learning rate n, momentum

The target result is a low dimensional data representation ,YT=y1,...,yn
Start optimizing :
Calculate the given Perp The conditional probability of ,pj|i
Make pij=(pj|i +pi|j)/2n
use N(0,10-4I) Random initialization Y
iteration , from t=1 To T, Do the following :
Calculate... In low dimensions qij , Calculate the gradient , to update Yt
end
Let's compare the Gaussian distribution with t Distribution ( Pictured above ,code see probability/distribution.md), t The distribution is less affected by outliers , The fitting result is more reasonable , Better capture the overall characteristics of the data .
Used t After distribution q change , as follows :

Besides ,t Distribution is the superposition of infinite Gaussian distributions , It is not exponential in calculation , It will be much more convenient . The optimized gradient is as follows :


t-sne The effectiveness of the , You can also see from the above figure : The horizontal axis represents the distance , The vertical axis represents the similarity , You can see , For points with large similarity ,t The distance distributed in the low dimensional space needs to be a little smaller ; For points with low similarity ,t The distance distributed in the low dimensional space needs to be longer . This just meets our needs , That is, points in the same cluster ( The distance is close ) Aggregate more tightly , Points between different clusters ( Far away ) More alienated .
To sum up ,t-SNE The gradient update of has two advantages :
- For dissimilar points , Using a smaller distance will produce a larger gradient to repel these points .
- This rejection will not be infinite ( Denominator in gradient ), Avoid dissimilar points too far away .
2.5 Insufficient
1 Mainly used for visualization ,
2 Tendency and preservation of local features
3 There is no unique optimal solution
4 Training is slow
Yes kl The introduction of divergence is as follows
author : Your Rego
The copyright of this article belongs to the author , Welcome to reprint , But without the author's consent, the original link must be given on the article page , Otherwise, the right to pursue legal responsibility is reserved .
边栏推荐
- [MySQL] string to int
- [ManageEngine] what is the LAN monitoring software and what is its function
- Self-attention neural architecture search for semantic image segmentation
- Rewriting method set
- SQL question brushing: find the employee number EMP with more than 15 salary records_ No and its corresponding recording times t
- Redis installation, cluster deployment and common tuning
- 嵌入式分享合集23
- Django uses pymysql module to connect mysql database
- PLATO上线LAAS协议Elephant Swap,用户可借此获得溢价收益
- 一篇万字博文带你入坑爬虫这条不归路 【万字图文】
猜你喜欢

什么是原码、反码和补码

SQL question brushing: find the last of all employees who have been assigned departments_ Name and first_ Name and Dept_ no

T-sne降维

Canal real-time parsing MySQL binlog data practice

BOM系列之定时器

Cloud native application comprehensive exercise

Plato launched the LAAS protocol elephant swap, which allows users to earn premium income

Moonbeam上的多链用例解析——Derek在Polkadot Decoded 2022的演讲文字回顾

Learning summary of time complexity and space complexity

Google Play APK 上传其他国际应用商店
随机推荐
SQL question brushing: find the current salary details and department number Dept_ no
Google Play APK 上传其他国际应用商店
Openpyxl merge cells
Closures and decorators
File “manage.py“, line 14 ) from exc
Canal实时解析mysql binlog数据实战
Intel introduces you to visual recognition -- openvino
Behind the second round of okaleido tiger sales is the strategic support of ecological institutions
RHCE command practice (II)
瑞吉外卖项目实战Day01
Common functions and usage of numpy
matplotlib中文问题
围绕新市民金融聚焦差异化产品设计、智能技术提效及素养教育
Main causes of IT hardware failures and best practices for prevention
跨模态对齐 20220728
如何选择专业、安全、高性能的远程控制软件
Google play APK uploads other international app stores
梅克尔工作室——HarmonyOS实现列表待办
【ManageEngine】助力哈尔滨工程大学实现网络流量一体化监控管理
Flask project construction 2