Deep Learning (Self-Supervised: CPC v2): Data-Efficient Image Recognition with Contrastive Predictive Coding
2022-07-28 06:09:00 【Food to doubt life】
Preface
This paper was published at ICML 2020.
It is an improvement over CPC v1, raising top-1 accuracy on ImageNet from 48.7% to 71.5%.
This post gives a brief introduction to CPC v2; the experiments are not summarized in full here.
Figure 1 of the paper presents an interesting result, shown below:
The blue line is a ResNet pretrained with CPC v1 and then fine-tuned on ImageNet; the red line is a ResNet trained on ImageNet from scratch. The horizontal axis is the amount of training data used for fine-tuning/training. As the amount of training data shrinks, the model trained from scratch degrades especially sharply, and even when all the data is used, the fine-tuned model still beats training from scratch. In other words, compared with a model trained from scratch, a self-supervised pretrained model needs far less training data to reach similar performance: when transferred to a downstream task, it may need only a small amount of data to perform well.
Introduction to CPC v1

The figure above shows the model structure of CPC v2; for ease of exposition, I discuss it in the CPC v1 section.
- The input image is divided into several overlapping patches, where $X_{i,j}$ denotes the patch in row $i$, column $j$.
- Every patch goes through a feature extractor (the blue model), producing a feature vector $Z_{i,j}$.
- The feature vector $Z_{i,j}$ at row $i$, column $j$ is concatenated with the feature vectors above it in the same column, $Z_{u,j}$ ($u < i$), and processed by a context network $g_{\phi}$ (the red model) to obtain a context vector $C_{i,j}$.
- A linear transformation with matrix $W_k$ is applied to $C_{i,j}$, i.e. $\hat{Z}_{i+k,j} = W_k C_{i,j}$, and contrastive learning is performed between $\hat{Z}_{i+k,j}$ and $Z_{i+k,j}$. This can be understood simply as using the features of the upper half of an image to predict the features of its lower half.
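The prediction step above can be sketched in a few lines of numpy. All sizes are toy values, and the mean-pooling "context network" below is an illustrative placeholder for $g_{\phi}$, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: a 7x7 grid of patch features, each a d-dimensional z_{i,j}.
rows, cols, d = 7, 7, 16
Z = rng.standard_normal((rows, cols, d))

def context(Z, i, j):
    """Context C_{i,j}: here a simple mean over z_{u,j} for u <= i,
    standing in for the autoregressive context network g_phi."""
    return Z[: i + 1, j].mean(axis=0)

# One linear prediction matrix W_k per row offset k.
K = 2
W = rng.standard_normal((K, d, d))

# Predict the feature vector k rows below from the context above it:
# \hat{Z}_{i+k,j} = W_k C_{i,j}
i, j, k = 2, 3, 1
c_ij = context(Z, i, j)
z_hat = W[k - 1] @ c_ij   # predicted feature
z_true = Z[i + k, j]      # target feature, to be matched contrastively
print(z_hat.shape, z_true.shape)  # (16,) (16,)
```

The contrastive objective then pulls `z_hat` toward `z_true` and away from negatives, as described next.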
The contrastive loss is InfoNCE, defined as follows:
The negative examples $Z_l$ come from patches of other images in the batch, or from other patches of the same image.
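As a rough sketch, InfoNCE amounts to a cross-entropy over dot-product scores: classify the positive target among the negatives. This minimal numpy version omits the paper's exact batching and scoring details:

```python
import numpy as np

def info_nce(z_hat, z_pos, z_negs):
    """InfoNCE loss for one prediction: score the positive z_pos and the
    negatives z_negs by dot product with z_hat, then take the negative
    log-probability of picking the positive."""
    logits = np.concatenate(([z_hat @ z_pos], z_negs @ z_hat))
    logits -= logits.max()                        # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[0]                          # positive sits at index 0

# The loss is small when z_hat matches the positive, large otherwise.
z_hat = np.array([1.0, 0.0])
negs = np.array([[0.0, 1.0], [-1.0, 0.0]])
good = info_nce(z_hat, np.array([1.0, 0.0]), negs)
bad = info_nce(z_hat, np.array([0.0, 1.0]), negs)
print(good < bad)  # True
```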
Personal view: the mechanism of CPC v1 is not hard to understand. Taking humans as an analogy: if we know what a dog looks like, then on seeing the top half of a dog in an image, we can naturally picture the shape of the dog in the bottom half. To drive the InfoNCE loss down, the model must build connections between the top and bottom halves of the dog in the image, and these connections may let the model understand what a dog looks like, i.e. what features dogs have.
Introduction to CPC v2
In self-supervised learning, tricks have a huge impact on performance; this is quite different from continual learning, which I studied before.
Compared with CPC v1, CPC v2 introduces more tricks. Specifically:
- A larger model. CPC v1 uses only the first three residual stacks of ResNet-101, while CPC v2 deepens the model to ResNet-161 (ImageNet top-1 accuracy improves by 5%) and raises the resolution of the input patches (from 60x60 to 80x80; ImageNet top-1 accuracy improves by 2%).
- Replacing batch normalization with layer normalization. CPC v1's prediction should depend only on a few patches, but BN leaks information from the other patches in the batch; much as in image generation, BN therefore hurts CPC v1's performance. The authors replace BN with layer normalization, improving ImageNet top-1 accuracy by 2%.
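To see why the swap helps, here is a minimal layer normalization: every patch is normalized with its own statistics, so nothing crosses patch boundaries, whereas batch norm would pool mean and variance over all patches in the batch:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each row (one patch's features) with its own mean and
    variance; no statistics are shared across patches."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

x = np.random.default_rng(0).standard_normal((4, 8))  # 4 patches, 8 features
y = layer_norm(x)

# Perturbing one patch leaves the other patches' outputs untouched --
# exactly the cross-patch leakage that batch norm would introduce.
x2 = x.copy()
x2[0, 0] += 100.0
y2 = layer_norm(x2)
print(np.allclose(y[1:], y2[1:]))  # True
```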
- A harder pretext task. Because large models overfit more easily, the authors increase the difficulty of the self-supervised task: to predict a patch, CPC v2 uses the feature vectors above, below, to the left of, and to the right of it, whereas CPC v1 uses only those above. Since CPC v2 touches more semantic context, extracting the semantic information relevant to the target patch also becomes harder. ImageNet top-1 accuracy improves by 2.5%.
- Better data augmentation. First, two of the three RGB channels are randomly kept (ImageNet top-1 accuracy improves by 3%); then geometric, color, elastic-deformation and other augmentations are applied (ImageNet top-1 accuracy improves by a further 4.5%). Clearly, data augmentation matters a great deal for self-supervised learning.
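One plausible reading of the channel trick is sketched below: randomly keep two of the three RGB channels by zeroing the third. Zeroing (rather than removing) the channel is an assumption made here to keep the tensor shape fixed; the paper's exact implementation may differ:

```python
import numpy as np

def keep_two_channels(img, rng):
    """Randomly keep two of the three RGB channels, zeroing out the third.
    Assumes img has shape (H, W, 3)."""
    dropped = rng.integers(0, 3)   # index of the channel to zero out
    out = img.copy()
    out[..., dropped] = 0.0
    return out

rng = np.random.default_rng(0)
img = rng.random((32, 32, 3))
aug = keep_two_channels(img, rng)
# Count channels that are entirely zero after augmentation.
print(int((aug == 0).all(axis=(0, 1)).sum()))  # 1
```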
The impact of the above tricks on CPC v1 is shown in the figure below.
Experiments
The experiments are not summarized in depth here; only a few interesting parts follow.
ResNet-200 is pretrained with supervision and fine-tuned with a linear classifier on top; ResNet-33 is pretrained with CPC v2 and fine-tuned with a linear classifier on top (in this case the feature extractor is fine-tuned as well, rather than frozen).
As the table above shows, the ResNet-33 pretrained with CPC v2 outperforms ResNet-200 when the amount of data is small, and is still better even with all the training data, despite ResNet-33 having less model capacity than ResNet-200. Self-supervision clearly has great potential.