Deep Learning (Self-Supervised: SimCLR) -- A Simple Framework for Contrastive Learning of Visual Representations
2022-07-28 06:09:00 【Food to doubt life】
Preface
SimCLR is a self-supervised learning paper from Hinton and Google, published at ICML 2020.
Code address: https://github.com/google-research/simclr
Honestly, you can tell as soon as you read the paper that it must be a Google work: the experimental results are extremely thorough, and it explores several properties of contrastive learning for us.
This post gives a brief introduction to SimCLR and records some of its interesting experiments.
SimCLR overview

The figure above shows the model structure of SimCLR. The pipeline is as follows:
- For an input image $x$, apply two different data augmentations to obtain two views $\tilde{x}_i$ and $\tilde{x}_j$
- Feed the two views into a CNN encoder $f(\cdot)$ to extract features, obtaining two feature vectors $h_i$ and $h_j$
- Pass the two feature vectors through an MLP head $g(\cdot)$ to obtain $z_i$ and $z_j$
Suppose the batch size is $N$. After data augmentation we obtain $2N$ images. SimCLR performs contrastive learning, which requires positive and negative example pairs.
For an image $x$, the two augmented views $\tilde{x}_i$ and $\tilde{x}_j$ are processed by the CNN and the MLP to produce $z_i$ and $z_j$. $z_i$ and $z_j$ form a positive pair, while $z_i$ and the feature vectors of the other images in the batch (including their augmented views) form negative pairs. Each image therefore has $1$ positive pair and $2N-2$ negative pairs. The loss for a single image is

$$\ell_{i,j} = -\log \frac{\exp(\mathrm{sim}(z_i, z_j)/\tau)}{\sum_{k=1}^{2N} \mathbb{1}_{[k \neq i]} \exp(\mathrm{sim}(z_i, z_k)/\tau)}$$

where $\mathrm{sim}(z_i, z_j)$ denotes the cosine similarity of the two vectors and $\tau$ is a temperature hyperparameter. Averaging the losses of all $2N$ images gives the final loss; in effect, this is a $(2N-1)$-way classification.
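As a sanity check, the loss above can be computed directly in NumPy. This is a minimal sketch under my own conventions (rows 2k and 2k+1 of `z` are assumed to be the two views of image k); it is not the official implementation.

```python
import numpy as np

def nt_xent_loss(z, tau=0.5):
    """NT-Xent loss over 2N projected vectors z of shape [2N, d].
    Rows 2k and 2k+1 are assumed to form the positive pair."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit vectors -> dot = cosine sim
    sim = (z @ z.T) / tau                              # [2N, 2N] scaled similarities
    np.fill_diagonal(sim, -np.inf)                     # exclude the k = i term
    pos = np.arange(len(z)) ^ 1                        # partner index: (0,1), (2,3), ...
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(len(z)), pos].mean())

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))       # batch size N = 4 -> 2N = 8 augmented views
print(nt_xent_loss(z))
```

With random vectors the loss sits near $\log(2N-1)$, the value of a uniform $(2N-1)$-way classifier; it drops as positives align and negatives separate.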
Algorithm pseudocode 
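The pseudocode figure did not survive extraction; the following restates the paper's Algorithm 1 in plain pseudocode, with symbols as defined above:

```
input: batch size N, temperature τ, encoder f, projection head g, augmentation family T
for each sampled minibatch {x_k}, k = 1..N:
    for k = 1..N:
        draw two augmentations t ~ T, t' ~ T
        x̃_{2k-1} = t(x_k);  h_{2k-1} = f(x̃_{2k-1});  z_{2k-1} = g(h_{2k-1})
        x̃_{2k}   = t'(x_k); h_{2k}   = f(x̃_{2k});    z_{2k}   = g(h_{2k})
    for all i, j in 1..2N:
        s_{i,j} = z_i · z_j / (‖z_i‖ ‖z_j‖)            # pairwise cosine similarity
    define ℓ(i, j) = -log( exp(s_{i,j}/τ) / Σ_{k≠i} exp(s_{i,k}/τ) )
    L = (1/2N) Σ_{k=1}^{N} [ ℓ(2k-1, 2k) + ℓ(2k, 2k-1) ]
    update f and g to minimize L
return encoder f; discard g
```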
Experiments
The experimental section contains many valuable findings; the paper explores how various tricks affect SimCLR and draws a number of conclusions.
Unless otherwise specified, all experimental results in this section follow the same protocol: pre-train a ResNet-50 with SimCLR on ImageNet-1000, then freeze the feature extractor and train a linear classifier on top of it; the reported number is that model's accuracy on the ImageNet-1000 test set.
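The linear-evaluation protocol above can be sketched in NumPy. This is a minimal sketch on toy stand-in features, not the paper's setup: the "frozen features" here are random blobs, and all names and dimensions are illustrative.

```python
import numpy as np

def linear_eval(feats, labels, n_classes, lr=0.5, steps=500):
    """Train a linear softmax classifier on frozen features (linear evaluation)."""
    n, d = feats.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    for _ in range(steps):
        logits = feats @ W + b
        logits -= logits.max(axis=1, keepdims=True)    # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        grad = (p - onehot) / n                        # softmax cross-entropy gradient
        W -= lr * feats.T @ grad                       # only the linear layer is updated;
        b -= lr * grad.sum(axis=0)                     # the encoder stays frozen
    return W, b

# toy stand-in for frozen encoder features: two linearly separable blobs
rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(0, 1, (50, 8)), rng.normal(3, 1, (50, 8))])
labels = np.array([0] * 50 + [1] * 50)
W, b = linear_eval(feats, labels, 2)
acc = ((feats @ W + b).argmax(axis=1) == labels).mean()
print(acc)
```

The point of the protocol is that only `W` and `b` are trained; accuracy then measures how linearly separable the frozen representation is.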
Impact of data augmentation on performance

Refer to the original figure for the full results; three conclusions can be drawn:
- With a single data augmentation alone, contrastive learning performs very poorly
- The combination of random cropping and random color distortion works best
- Contrastive learning is very sensitive to the choice of data augmentation; this is not a good property, since it often forces exhaustive trial and error
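To make the crop-plus-color-distortion combination concrete, here is a toy NumPy sketch of composing the two augmentations. It is illustrative only: the real SimCLR pipeline uses proper image transforms (random resized crop, color jitter, random grayscale, Gaussian blur), while this sketch fakes them with array operations.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_crop_resize(img, out=32):
    """Randomly crop a square region and resize it by nearest-neighbour sampling."""
    h, w, _ = img.shape
    size = int(rng.integers(out // 2, min(h, w) + 1))  # random crop size
    top = int(rng.integers(0, h - size + 1))
    left = int(rng.integers(0, w - size + 1))
    crop = img[top:top + size, left:left + size]
    idx = np.arange(out) * size // out                 # nearest-neighbour resize indices
    return crop[np.ix_(idx, idx)]

def color_distort(img, strength=0.5):
    """Jitter brightness and per-channel scale: a crude stand-in for color distortion."""
    img = img * rng.uniform(1 - strength, 1 + strength)            # global brightness
    img = img * rng.uniform(1 - strength, 1 + strength, size=3)    # per-channel jitter
    return np.clip(img, 0.0, 1.0)

def augment(img):
    return color_distort(random_crop_resize(img))

img = rng.random((64, 64, 3))                  # a fake RGB image in [0, 1]
view_i, view_j = augment(img), augment(img)    # two independent views of the same image
print(view_i.shape, view_j.shape)
```

Calling `augment` twice on the same image yields the two views $\tilde{x}_i$ and $\tilde{x}_j$ used to form a positive pair.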
Unsupervised contrastive learning benefits (more) from bigger models

The figure above shows how widening and deepening the network affect model performance; R18(2x) denotes ResNet-18 with doubled width, and the other symbols follow the same convention.
From the figure, I draw the following conclusions:
- When increasing model capacity, consider depth first: ResNet-152 reaches performance comparable to a widened ResNet-18 while its parameter count rises much less, so deepening the network is the first choice in practice
- Once the network is deep enough, consider width; the parameter count then soars and training may become much slower, so widening the network is the second choice in practice
A nonlinear projection head improves the representation quality of the layer before it

The figure above explores how the dimension of $z$ affects the model's linear-classification performance ($z$ is defined in the SimCLR overview section). Evidently the dimension of $z$ has little effect, while a nonlinear MLP head outperforms a linear one; this was also verified in MoCo v2.
SimCLR has two features that could be used for linear classification: the feature extractor's output $h$, and the MLP head's output $g(h)$ (see the SimCLR overview section). For linear classification, using $h$ is better than using $g(h)$ (by more than 10%), probably because the MLP filters out some useful information.
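For reference, the nonlinear projection head in the paper is a 2-layer MLP with one ReLU hidden layer. Below is a minimal NumPy sketch; the weight initialization is illustrative, and the 2048 → 2048 → 128 dimensions follow the ResNet-50 setup described in the paper.

```python
import numpy as np

def g(h, W1, b1, W2, b2):
    """Nonlinear projection head: z = W2 @ relu(W1 @ h + b1) + b2 (batched)."""
    return np.maximum(h @ W1 + b1, 0) @ W2 + b2

rng = np.random.default_rng(0)
d, proj = 2048, 128                          # ResNet-50 feature dim, projection dim
W1 = rng.normal(0, 0.01, (d, d)); b1 = np.zeros(d)
W2 = rng.normal(0, 0.01, (d, proj)); b2 = np.zeros(proj)

h = rng.normal(size=(4, d))                  # a batch of 4 encoder features
z = g(h, W1, b1, W2, b2)                     # used only in the contrastive loss
print(z.shape)
```

At evaluation time the head is discarded and the classifier is trained on `h`, consistent with the >10% gap noted above.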
Contrastive learning benefits (more) from larger batch sizes and longer training

Two conclusions can be drawn from the figure above for contrastive learning algorithms that use negative pairs:
- The larger the batch size, the better the result, and the improvement is significant; for contrastive methods that use only positive pairs (e.g. BYOL, SimSiam), batch size does not affect performance as strongly
- The more training epochs, the better the result; this also holds for contrastive methods that use only positive pairs