Paper Information
Title: 《Improved Deep Embedded Clustering with Local Structure Preservation》
Authors: Xifeng Guo, Long Gao, Xinwang Liu, Jianping Yin
Source: IJCAI 2017
Other: 69 Citations, 71 References
Paper: Download
Code: Download
Abstract
Problem addressed in this paper: previous work designs clustering loss functions for specific situations, which may distort the feature space and produce meaningless representations, degrading clustering performance.
Solution proposed in this paper:
- Use a clustering loss to guide the distribution of the embedded points;
- Use an under-complete autoencoder to preserve the local structure of the data;
- Train with the clustering loss and the autoencoder (reconstruction) loss jointly.
1 Introduction
Landmark works in clustering:
- 1967, MacQueen: k-means;
- 2006, Bishop: Gaussian mixture models;
- 2007, Von Luxburg: spectral clustering.
Because the input data is often high-dimensional and contains many unreliable features, clustering performance suffers. A widely accepted idea is therefore to first map the high-dimensional input into a low-dimensional space, and then cluster there.
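The "reduce dimension first, then cluster" pipeline can be illustrated with a minimal scikit-learn sketch (my own, not from the paper); the random matrix here only stands in for real high-dimensional data such as flattened images.

```python
# Minimal sketch: linear dimensionality reduction (PCA) followed by k-means.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

X = np.random.rand(1000, 784)                      # stand-in for high-dimensional input
z = PCA(n_components=10).fit_transform(X)          # map to a 10-dim space
labels = KMeans(n_clusters=10, n_init=10).fit_predict(z)   # cluster in the low-dim space
```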
Q1: Why is the representational power of traditional dimensionality-reduction methods such as PCA and LDA limited?
A1: If you know, feel free to share your opinion in the comments.
Thanks to the development of deep learning, DNN models can perform feature transformation well and learn useful representations.
The paper then introduces DEC; see 《Interpretation of the thesis (DEC) Unsupervised Deep Embedding for Clustering Analysis》.
Contributions of this paper:
- A deep clustering algorithm that jointly performs clustering and learns representations with local structure preservation.
- Empirical evidence of the importance of local structure preservation in deep clustering.
- IDEC outperforms its state-of-the-art competitors by a large margin.
2 Related Work
2.1 Deep Clustering
Current deep clustering algorithms fall into two categories:
- Two-stage methods that apply clustering after a representation has been learned; [clustering relies on a good representation learned beforehand]
- Methods that jointly optimize feature learning and clustering. [clustering is performed at the same time as feature learning]
Examples of the first category:
- Tian et al., 2014: first use an autoencoder to learn useful low-dimensional representations, then cluster them with k-means;
- Chen, 2015: trains a deep belief network (DBN) layer-wise, then applies non-parametric maximum-margin clustering to the learned intermediate representation;
- Peng et al., 2016: uses a sparse autoencoder that adaptively learns representations capturing both local and global structure, then applies a traditional clustering algorithm.
Examples of the second category:
- Yang et al., 2016: proposes a recurrent framework over deep representations and image clusters, integrating the two processes into a single model with a unified weighted triplet loss optimized end-to-end;
- Xie et al., 2016: DEC learns a mapping from the observation space to a low-dimensional latent space with a deep neural network, obtaining feature representations and cluster assignments simultaneously.
2.2 Autoencoder
An autoencoder (AE) has two parts:
- Encoder: the encoder function is $z=f_{W}(x)$, which outputs the representation $z$.
- Decoder: the decoder function is $x^{\prime}=g_{W^{\prime}}(z)$, which reconstructs the original input $x$ from the representation $z$.
Two common autoencoder variants:
- Under-complete autoencoder: the dimension of $z$ is smaller than the dimension of the original input.
- Denoising autoencoder: trained with $L=\left\|x-g_{W^{\prime}}\left(f_{W}(\tilde{x})\right)\right\|_{2}^{2} \quad \quad \quad (1)$, where $\tilde{x}$ is a corrupted copy of $x$.
Reference:
- Under-complete autoencoder: one way to obtain useful features from an autoencoder is to constrain the dimension of $h$ to be smaller than that of $x$; an autoencoder whose code dimension is smaller than the input dimension is called undercomplete. Learning an undercomplete representation forces the autoencoder to capture the most salient features of the training data.
- Denoising autoencoder (DAE): an autoencoder that takes corrupted data as input and is trained to predict the original, uncorrupted data as its output (a small code sketch of both variants follows).
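Below is a minimal PyTorch sketch (my own, not the authors' code) of the two variants: an undercomplete AE with a bottleneck $z$ smaller than $x$ trained on clean input, and a denoising AE trained to reconstruct the clean $x$ from a corrupted $\tilde{x}$. Layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, d_in=784, d_z=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(d_in, 500), nn.ReLU(),
                                     nn.Linear(500, d_z))          # z = f_W(x)
        self.decoder = nn.Sequential(nn.Linear(d_z, 500), nn.ReLU(),
                                     nn.Linear(500, d_in))         # x' = g_W'(z)

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

ae = AutoEncoder()
x = torch.rand(32, 784)                      # a toy mini-batch

# Undercomplete AE: reconstruct the clean input through the bottleneck.
x_rec, z = ae(x)
loss_undercomplete = ((x - x_rec) ** 2).sum(dim=1).mean()

# Denoising AE (Eq. 1): corrupt the input, still reconstruct the clean x.
x_noisy = x + 0.2 * torch.randn_like(x)
x_rec_dn, _ = ae(x_noisy)
loss_denoising = ((x - x_rec_dn) ** 2).sum(dim=1).mean()
```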
2.3 Deep Embedded Clustering
Deep Embedded Clustering (DEC) [Xie et al., 2016] first pretrains an autoencoder and then discards the decoder. The remaining encoder is fine-tuned by optimizing the following objective:
$L=K L(P \| Q)=\sum\limits_{i} \sum\limits_{j} p_{i j} \log \frac{p_{i j}}{q_{i j}}\quad \quad \quad (2)$
where:
- $q_{ij}$ is the similarity between $z_{i}$ and cluster center $\mu_{j}$, defined as:
${\large q_{i j}=\frac{\left(1+\left\|z_{i}-\mu_{j}\right\|^{2}\right)^{-1}}{\sum_{j^{\prime}}\left(1+\left\|z_{i}-\mu_{j^{\prime}}\right\|^{2}\right)^{-1}}}\quad \quad \quad (3)$
- $p_{ij}$ in Eq. 2 is the target distribution, defined as:
${\large p_{i j}=\frac{q_{i j}^{2} / \sum_{i} q_{i j}}{\sum_{j^{\prime}}\left(q_{i j^{\prime}}^{2} / \sum_{i} q_{i j^{\prime}}\right)}}\quad \quad\quad(4)$
DEC algorithm:
- First, run the original dataset $X$ through the AE and obtain the encoder representations $z_{i}=f_{W}\left(x_{i}\right)$;
- Second, run traditional $k$-means on $\{z_i\}$ to obtain the cluster centers $\{\mu _j\}$;
- Then, compute $q_{ij}$ and $p_{ij}$ according to Eq. 3 and Eq. 4 and evaluate the loss $L$ in Eq. 2 (see the code sketch after this list);
- Finally, assign labels according to $q_{ij}$.
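The following sketch (my own, written directly from the formulas above) shows how the soft assignment $q_{ij}$ (Eq. 3), the target distribution $p_{ij}$ (Eq. 4) and the KL clustering loss (Eq. 2) can be computed from embedded points and cluster centers; the random tensors are placeholders.

```python
import torch

def soft_assignment(z, mu):                    # z: (n, d_z), mu: (K, d_z)
    dist_sq = torch.cdist(z, mu) ** 2          # ||z_i - mu_j||^2
    q = 1.0 / (1.0 + dist_sq)                  # Student's t kernel (Eq. 3 numerator)
    return q / q.sum(dim=1, keepdim=True)      # normalize over clusters j

def target_distribution(q):
    weight = q ** 2 / q.sum(dim=0)             # q_ij^2 / sum_i q_ij
    return weight / weight.sum(dim=1, keepdim=True)   # Eq. 4

z = torch.randn(256, 10)                       # placeholder embedded points
mu = torch.randn(10, 10)                       # placeholder cluster centers
q = soft_assignment(z, mu)
p = target_distribution(q).detach()            # P is held fixed while optimizing
L_c = (p * (p.log() - q.log())).sum(dim=1).mean()   # KL(P || Q), averaged over the batch
```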
3 Improved Deep Embedded Clustering
- Consider a dataset $X$ with $n$ samples and each sample $x_{i} \in \mathbb{R}^{d}$ where $d$ is the dimension.
- The number of clusters $K$ is a priori knowledge and the $j$ th cluster center is represented by $\mu_{j} \in \mathbb{R}^{d}$ . Let the value of $s_{i} \in\{1,2, \ldots, K\}$ represent the cluster index assigned to sample $x_{i}$ .
- Define nonlinear mapping $f_{W}: x_{i} \rightarrow z_{i}$ and $g_{W^{\prime}}: z_{i} \rightarrow x_{i}^{\prime}$ where $z_{i}$ is the embedded point of $x_{i}$ in the low dimensional feature space and $x_{i}^{\prime}$ is the reconstructed sample for $x_{i}$ .
The goal is to find the best $f_{W}$ so that the embedded points $\left\{z_{i}\right\}_{i=1}^{n}$ are better suited to the clustering task.
The model in this paper has two essential parts:
- an autoencoder;
- a clustering loss.
The model architecture is shown in Fig. 1.
The objective function is defined as:
$L=L_{r}+\gamma L_{c}\quad\quad \quad (6)$
where:
- $L_{r}$ is the reconstruction loss;
- $L_{c}$ is the clustering loss;
- $\gamma>0$ is a coefficient that controls the degree to which the embedded space is distorted; when $\gamma=1$ and $L_{r} \equiv 0$, Eq. 6 reduces to the objective of DEC (a code sketch of this objective follows the Q&A below).
Q2: Why is the coefficient $\gamma$ added? I have seen many papers written this way, but I do not know why it must be included.
A2: If you know, feel free to share your opinion in the comments.
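A minimal sketch of the combined objective in Eq. 6, reusing the autoencoder `ae` and batch `x` from the autoencoder sketch and the `soft_assignment` / `target_distribution` helpers and centers `mu` from the DEC sketch above (the value of `gamma` is only an example):

```python
gamma = 0.1                                     # the paper recommends keeping gamma < 1

x_rec, z = ae(x)                                # the decoder is kept, unlike DEC
q = soft_assignment(z, mu)
p = target_distribution(q).detach()             # target distribution held fixed
L_r = ((x - x_rec) ** 2).sum(dim=1).mean()      # reconstruction loss (Eq. 8)
L_c = (p * (p.log() - q.log())).sum(dim=1).mean()   # clustering loss (Eq. 7)
L = L_r + gamma * L_c                           # joint objective (Eq. 6)
```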
3.1 Clustering loss and Initialization
Review DEC's clustering loss (see Eq. 2, Eq. 3 and Eq. 4 above):
$L_{c}=K L(P \| Q)=\sum\limits_{i} \sum\limits _{j} p_{i j} \log \frac{p_{i j}}{q_{i j}}\quad\quad \quad (7)$
Following the DEC model:
- Pretraining: a stacked denoising autoencoder is used.
- Then, based on the representations $\left\{z_{i}=f_{W}\left(x_{i}\right)\right\}_{i=1}^{n}$ produced by the pretrained encoder, $k$-means is run to obtain the initial cluster centers $\left\{\mu_{j}\right\}_{j=1}^{K}$ (a sketch of this initialization follows this list).
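A sketch (assumptions mine) of this initialization: pretrain the autoencoder on reconstruction only, embed all samples, then run k-means to get the $K$ centers. `loader`, `X_all`, `K` and the number of epochs are assumed names/values, and for brevity the pretraining below omits the input corruption a denoising autoencoder would add.

```python
import torch
from sklearn.cluster import KMeans

# Pretraining loop (reconstruction only), reusing AutoEncoder / ae from the earlier sketch.
optimizer = torch.optim.Adam(ae.parameters(), lr=1e-3)
for epoch in range(50):                        # epoch count is a free choice
    for x_batch in loader:                     # `loader` yields mini-batches of x
        x_rec, _ = ae(x_batch)
        loss = ((x_batch - x_rec) ** 2).sum(dim=1).mean()
        optimizer.zero_grad(); loss.backward(); optimizer.step()

# Initialize cluster centers from k-means on the embedded points.
with torch.no_grad():
    Z = ae.encoder(X_all)                      # X_all: full dataset as a float tensor
km = KMeans(n_clusters=K, n_init=20).fit(Z.numpy())
mu = torch.tensor(km.cluster_centers_, dtype=torch.float32)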
3.2 Local structure preservation
Because DEC discards the decoder and fine-tunes the encoder with the clustering loss $L_{c}$ alone, the embedded space may be distorted. [In short, this studies the influence of the decoder.]
Therefore, this paper proposes to keep the decoder and attach the clustering loss directly to the embedded space.
After pretraining, the stacked denoising autoencoder is replaced by an undercomplete autoencoder [the stated reason is that clustering needs clean, uncorrupted data; personally I suspect it was also simply the choice that worked best experimentally]. The reconstruction loss is the mean squared error:
$L_{r}=\sum\limits _{i=1}^{n}\left\|x_{i}-g_{W^{\prime}}\left(z_{i}\right)\right\|_{2}^{2}\quad \quad \quad (8)$
The paper suggests that $\gamma$ is best kept smaller than $1$, which is verified experimentally in Section 4.3.
3.3 Optimization
Eq. 6 is optimized with mini-batch stochastic gradient descent. Three kinds of parameters need to be optimized:
- the autoencoder weights;
- the cluster centers $\mu_j$;
- the target distribution $P$.
First, update the autoencoder weights and cluster centers.
With the target distribution $P$ fixed, the gradients of $L_c$ with respect to $z_i$ and $\mu_j$ are:
$\frac{\partial L_{c}}{\partial z_{i}}=2 \sum\limits _{j=1}^{K}\left(1+\left\|z_{i}-\mu_{j}\right\|^{2}\right)^{-1}\left(p_{i j}-q_{i j}\right)\left(z_{i}-\mu_{j}\right)\quad\quad\quad (9)$
$\frac{\partial L_{c}}{\partial \mu_{j}}=2 \sum\limits _{i=1}^{n}\left(1+\left\|z_{i}-\mu_{j}\right\|^{2}\right)^{-1}\left(q_{i j}-p_{i j}\right)\left(z_{i}-\mu_{j}\right)\quad\quad\quad (10)$
- Cluster center update formula:
${\large \mu_{j}=\mu_{j}-\frac{\lambda}{m} \sum\limits_{i=1}^{m} \frac{\partial L_{c}}{\partial \mu_{j}}}\quad \quad \quad (11)$
- Decoder weight update formula:
${\large W^{\prime}=W^{\prime}-\frac{\lambda}{m} \sum\limits_{i=1}^{m} \frac{\partial L_{r}}{\partial W^{\prime}}}\quad \quad \quad (12)$
- Encoder weight update formula:
${\large W=W-\frac{\lambda}{m} \sum\limits _{i=1}^{m}\left(\frac{\partial L_{r}}{\partial W}+\gamma \frac{\partial L_{c}}{\partial W}\right)}\quad \quad \quad (13)$
Here $\lambda$ is the learning rate and $m$ is the mini-batch size (a direct numpy transcription of Eq. 9 and Eq. 10 follows below).
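The closed-form gradients in Eq. 9 and Eq. 10 can be written out directly; the sketch below (my own) is useful, e.g., for checking an autograd implementation against the formulas.

```python
import numpy as np

def clustering_grads(z, mu, p, q):
    # z: (n, d), mu: (K, d), p and q: (n, K)
    diff = z[:, None, :] - mu[None, :, :]               # (n, K, d): z_i - mu_j
    inv = 1.0 / (1.0 + (diff ** 2).sum(-1))             # (n, K): (1 + ||z_i - mu_j||^2)^-1
    grad_z = 2.0 * ((inv * (p - q))[:, :, None] * diff).sum(axis=1)    # Eq. 9, shape (n, d)
    grad_mu = 2.0 * ((inv * (q - p))[:, :, None] * diff).sum(axis=0)   # Eq. 10, shape (K, d)
    return grad_z, grad_mu
```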
Because the target distribution $P$ serves as a soft label [$p_{ij}$ depends on $q_{ij}$], updating it too frequently from a single mini-batch would make training unstable. Therefore $P$ is not refreshed at every iteration; in this paper it is updated every $T$ iterations using all embedded points. Labels are then assigned by $s_{i}=\arg \max\limits_{j} q_{i j}$.
Training stops when the fraction of points that change their cluster assignment between two consecutive updates of $P$ falls below $\delta$.
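A high-level training-loop sketch of this schedule (assumptions mine; the values of `gamma`, `T`, `delta`, the learning rate and the batch size are placeholders, not the paper's settings), reusing `ae`, `X_all`, `mu`, `soft_assignment` and `target_distribution` from the sketches above:

```python
import torch

gamma, T, delta, max_iters, batch_size = 0.1, 140, 0.001, 20000, 256

mu = torch.nn.Parameter(mu.clone())                 # cluster centers are trainable too
optimizer = torch.optim.SGD(list(ae.parameters()) + [mu], lr=0.1, momentum=0.99)

y_pred_last = None
for it in range(max_iters):
    if it % T == 0:                                 # refresh P over the whole dataset
        with torch.no_grad():
            q_all = soft_assignment(ae.encoder(X_all), mu)
        p_all = target_distribution(q_all)
        y_pred = q_all.argmax(dim=1)                # label assignment: s_i = argmax_j q_ij
        if y_pred_last is not None:
            changed = (y_pred != y_pred_last).float().mean().item()
            if changed < delta:                     # stop once assignments have stabilized
                break
        y_pred_last = y_pred

    idx = torch.randint(0, X_all.size(0), (batch_size,))   # random mini-batch
    x_batch = X_all[idx]
    x_rec, z = ae(x_batch)
    q = soft_assignment(z, mu)
    L_r = ((x_batch - x_rec) ** 2).sum(dim=1).mean()
    L_c = (p_all[idx] * (p_all[idx].log() - q.log())).sum(dim=1).mean()
    loss = L_r + gamma * L_c                        # Eq. 6
    optimizer.zero_grad(); loss.backward(); optimizer.step()
```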
4 Experiments
4.1 DataSets
- MNIST [image dataset]: 70,000 handwritten digit images;
- USPS [image dataset]: 9,298 grayscale handwritten digit images;
- REUTERS-10K [text dataset]: 10,000 reports sampled from about 810,000 labelled news stories.
4.2 Results
Experiment 1: the results are shown in Table 2.
Conclusions:
- The deep clustering methods AE+k-means, DEC and IDEC significantly outperform the traditional methods, and there are still clear gaps among these three.
- Comparing AE+k-means with DEC shows the guiding value of the clustering loss; comparing DEC with IDEC shows that keeping the autoencoder improves clustering performance.
Experiment 2: comparison between DEC and IDEC during training:
Conclusions:
- IDEC reaches higher clustering accuracy than DEC;
- IDEC converges more slowly than DEC;
- IDEC's clustering loss stays higher than DEC's;
- IDEC's reconstruction loss in the final iterations differs little from its value at the initial iteration.
Experiment 3: visualization comparison between DEC and IDEC:
The top and bottom rows show the t-SNE visualizations for IDEC and DEC, respectively.
Experiment 4: comparison of DEC and IDEC over the learning rate $\lambda$ and the coefficient $\gamma$:
Conclusions:
- For $\gamma \in [0.05, 1.0]$, IDEC at its best learning rate $\lambda=0.1$ outperforms DEC at its best learning rate $\lambda=0.01$;
- A larger learning rate $\lambda$ needs to be paired with a smaller $\gamma$.
5 Conclusion
This paper proposes the Improved Deep Embedded Clustering (IDEC) algorithm, which jointly performs clustering, learns embedded features suitable for clustering, and preserves the local structure of the data-generating distribution. IDEC manipulates the feature space to scatter data points using a KL-divergence-based clustering loss, and preserves local structure by incorporating an autoencoder. Experiments show that structure preservation is important for deep clustering algorithms and benefits clustering performance. Future work includes adding more prior knowledge (such as sparsity) to the IDEC framework and incorporating convolutional layers for image data.