
Unsupervised Learning of Visual Features by Contrasting Cluster Assignments

2022-07-07 11:17:00 InfoQ


Paper:
Unsupervised Learning of Visual Features by Contrasting Cluster Assignments http://arxiv.org/abs/2006.09882

Code:
https://github.com/facebookresearch/swav

This paper is best read after the first two articles in this series; it may be hard to follow directly unless you already know what contrastive learning is. If you would rather not read those two articles, that is fine too: I will do my best to keep things easy to understand. After all, I am a beginner myself, so my explanations should be accessible to other beginners; if there are any mistakes, you are welcome to point them out. We will refer to this paper as SwAV.

  • Contrastive learning in the "twilight of the gods" era
  • Contrastive learning in the "arms race" era

To put it simply, contrastive learning needs positive and negative samples to compare against each other. Previous work proposed various ways of selecting positive and negative samples, for example (see the InfoNCE sketch after this list):

  • Turn the whole of ImageNet into a dictionary, take a sample from the mini-batch as the positive, and then randomly draw 4,096 entries from the dictionary as negatives.
  • Take a mini-batch from the dataset and augment it. Using a siamese network, feed the original image into one branch and the augmented image into the other, train both simultaneously, and apply an NCE or InfoNCE loss to both. An image and its augmentation form a positive pair; all the other images and their augmentations serve as negatives.
  • Take a mini-batch from the dataset and augment it twice. Using a siamese network, feed one set of augmentations into one branch and the other set into the other branch, train both simultaneously, and apply an NCE or InfoNCE loss to both.
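To make these setups concrete, here is a minimal InfoNCE sketch in PyTorch. It is illustrative rather than the code of any particular paper; it assumes two feature batches z1, z2 of shape (B, d) in which (z1[i], z2[i]) are positive pairs and every other entry in the batch acts as a negative:

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z1, z2, temperature=0.1):
    """InfoNCE: the diagonal of the similarity matrix holds the positive
    pairs; all off-diagonal entries act as in-batch negatives."""
    z1 = F.normalize(z1, dim=1)            # (B, d)
    z2 = F.normalize(z2, dim=1)            # (B, d)
    logits = z1 @ z2.t() / temperature     # (B, B) cosine similarities
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)
```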

At first glance these methods seem fine, and their final results are indeed good. But then a group of researchers working on clustering stepped in. They pointed out that however you choose your negatives in contrastive learning, the whole mini-batch is drawn at random from ImageNet, so the negative samples are random. That causes several problems:

  • The same data may be drawn repeatedly. Your dataset has many images, but you can still draw the same image more than once. In the extreme case, the very images you used as positives could reappear among your negatives, which would hurt training.
  • The sample may not represent the whole dataset. For example, the data may contain many kinds of animals, but if all you happen to draw are dogs, the negatives are not representative.
  • Of course, the more comprehensively you sample, the better the results, but choosing too many negative samples wastes computing resources.

So they propose using clustering instead.

[Figure: standard contrastive instance learning (left) vs. SwAV (right), from the paper]

Look at the figure above.

On the left is conventional contrastive learning: the input is augmented in two different ways, the two augmented versions are fed into the two branches of the model to obtain their representations, and the representations are then compared directly.

On the right is the network proposed by the authors. It also augments the input in different ways, and the augmented data likewise passes through a network, but the resulting representations are not compared directly. In the middle there is a module C: a clustering module. The features you obtain are compared against the cluster centers instead.

The cluster centers are the c shown on the right, called prototypes. In practice this is a matrix of dimension d × k, where d is the feature dimension (the same d as the features, for example the 128 dimensions mentioned earlier) and k is the number of cluster centers. This paper chooses k = 3,000, a value commonly used when clustering the ImageNet dataset.
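As a concrete sketch of such a prototype matrix (the released SwAV code keeps the prototypes as a bias-free linear layer whose weight rows are the k centers; the sizes below simply follow the text):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d, k = 128, 3000                           # feature dimension and number of prototypes
prototypes = nn.Linear(d, k, bias=False)   # weight matrix of shape (k, d) holds the centers

with torch.no_grad():                      # keep every prototype on the unit sphere
    prototypes.weight.copy_(F.normalize(prototypes.weight, dim=1))

z = F.normalize(torch.randn(32, d), dim=1) # a batch of 32 normalized features
scores = prototypes(z)                     # (32, 3000): similarity of each feature to each center
```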

Through clustering, the features z and the prototypes C are used to generate targets, namely the q1 and q2 shown above (the paper calls them codes).
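The paper obtains these codes with the Sinkhorn-Knopp algorithm, which turns the batch's prototype scores into soft assignments while forcing the batch to spread evenly over the prototypes (otherwise everything could collapse onto a single center). A compact sketch of that idea, with illustrative values for eps and n_iters:

```python
import torch

@torch.no_grad()
def sinkhorn(scores, eps=0.05, n_iters=3):
    """Turn prototype scores (B, K) into soft codes q, alternately
    normalizing rows and columns so the batch spreads over all prototypes."""
    q = torch.exp(scores / eps).t()          # (K, B)
    q /= q.sum()
    K, B = q.shape
    for _ in range(n_iters):
        q /= q.sum(dim=1, keepdim=True)      # rows: each prototype gets equal mass
        q /= K
        q /= q.sum(dim=0, keepdim=True)      # columns: each sample's assignment
        q /= B
    return (q * B).t()                       # (B, K), each row sums to 1
```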

If x1 and x2 are a positive pair, then their features z1 and z2 should be very similar, just as in earlier contrastive learning: z1 and z2 are made as similar as possible. And if the two features are very similar, each should be able to predict the other's target. In other words, if I take the feature z1 and do a dot product with C, in principle I should be able to predict q2; and vice versa, z2 dotted with C should predict q1. The result of the dot product is thus the prediction. Training through this swapped prediction objective is what gives SwAV (Swapping Assignments between Views) its name.

The final objective function is this one: $$-\frac{1}{N} \sum_{n=1}^{N} \sum_{s, t \sim \mathcal{T}}\left[\frac{1}{\tau} \mathbf{z}_{n t}^{\top} \mathbf{C} \mathbf{q}_{n s}+\frac{1}{\tau} \mathbf{z}_{n s}^{\top} \mathbf{C} \mathbf{q}_{n t}-\log \sum_{k=1}^{K} \exp \left(\frac{\mathbf{z}_{n t}^{\top} \mathbf{c}_{k}}{\tau}\right)-\log \sum_{k=1}^{K} \exp \left(\frac{\mathbf{z}_{n s}^{\top} \mathbf{c}_{k}}{\tau}\right)\right]$$
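Expanded, this objective is simply a cross-entropy between the code of one view and the softmax prediction computed from the other view's features. A minimal sketch, assuming scores1 = z1^T C and scores2 = z2^T C of shape (B, K), with codes q1 and q2 such as those from the Sinkhorn sketch above:

```python
import torch.nn.functional as F

def swapped_prediction_loss(scores1, scores2, q1, q2, temperature=0.1):
    """Swapped prediction: view 1's scores must predict view 2's code and
    vice versa; scores are z^T C of shape (B, K), q are Sinkhorn codes."""
    p1 = F.log_softmax(scores1 / temperature, dim=1)
    p2 = F.log_softmax(scores2 / temperature, dim=1)
    loss_1_to_2 = -(q2 * p1).sum(dim=1).mean()  # z1 predicts q2
    loss_2_to_1 = -(q1 * p2).sum(dim=1).mean()  # z2 predicts q1
    return 0.5 * (loss_1_to_2 + loss_2_to_1)
```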

ImageNet has only 1,000 classes, so clustering into 3,000 centers is more than enough here. Let's look at which problems clustering solves.

  • First, the repetition problem: you now compare against cluster centers, and since the cluster centers are all distinct, they cannot repeat.
  • Next, the representativeness problem: clustering groups the many images into different categories, so comparing against the center of every category is representative by construction.
  • Finally, the resource waste caused by too many negatives: to approximate the full dataset you might need thousands of negative samples, and even then it is only an approximation. If you compare against cluster centers instead, a few hundred or at most 3,000 centers are enough, which greatly reduces the consumption of computing resources.

Besides clustering, SwAV also proposes a multi-crop augmentation method (sketched below); those who are interested can look into it further on their own.
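For the curious, the core idea of multi-crop is to replace the usual two full-resolution views with a couple of standard crops plus several cheap low-resolution crops of the same image, so that more views are compared at little extra cost. A rough torchvision sketch; the crop counts and scale ranges below follow the common 2 × 224 + 6 × 96 setting from the paper but are otherwise illustrative:

```python
from torchvision import transforms

def multi_crop(image, n_global=2, n_local=6):
    """Multi-crop augmentation: a few large crops plus many cheap small ones."""
    global_t = transforms.Compose([
        transforms.RandomResizedCrop(224, scale=(0.14, 1.0)),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
    ])
    local_t = transforms.Compose([
        transforms.RandomResizedCrop(96, scale=(0.05, 0.14)),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
    ])
    return [global_t(image) for _ in range(n_global)] + \
           [local_t(image) for _ in range(n_local)]
```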

Finally, let's take a look at its results.

[Figure: ImageNet linear-probe top-1 accuracy of SwAV versus earlier self-supervised methods and a supervised baseline, across model sizes]
Because this paper came out before BYOL, it only compares against the models that preceded it, as we can clearly see. It outperforms the unsupervised contrastive methods we discussed earlier, and its results come close to the supervised method. In fact, it even performs better than the later BYOL and SimSiam.

SwAV also keeps improving as larger models are used, that is, when the ResNet-50 in the figure is widened to 2×, 4×, or 5× its original width. With the largest model (5× wide), SwAV follows the supervised model very closely, with only a small gap, and it is also higher than SimCLR ×2 and SimCLR ×4. This paper is well worth reading: the authors improve on contrastive learning from a different angle, namely by making the samples more representative. I highly recommend it.