Moco V2 literature research [self supervised learning]
2022-07-05 02:30:00 【A classmate Wang】
Author profile:
Nanjing University of Posts and Telecommunications, Computer Science and Technology, undergraduate
● Previous article: 《MoCo v1 Literature Research [Self-Supervised Learning]》
● Previous article: 《SimCLR v1 Literature Research [Self-Supervised Learning]》
Table of contents
Ⅰ. Abstract
● Theoretical contribution: MoCo v2 integrates the MoCo v1 and SimCLR v1 algorithms, distilling the essence of both, and comprehensively surpasses SimCLR. It absorbs two important improvements from SimCLR:
① using an MLP projection head
② more data augmentation
● Additional note: unlike SimCLR, there is no need for an oversized batch size; an ordinary 8-GPU machine is enough for training.
● Experimental results: MoCo v2 uses fewer epochs and a smaller batch size than SimCLR, yet achieves higher accuracy.
Ⅱ. Introduction
● The Introduction is an extension of the Abstract and is omitted here.
Ⅲ. Background
● This section of the paper briefly restates MoCo v1.
Ⅳ. Experiments
4.1 Parameter settings [Settings]
● Dataset: the 1.28M-image ImageNet.
● Two common evaluation protocols are followed:
① ImageNet linear classification: freeze the features and train a supervised linear classifier; report Top-1 accuracy on a single 224×224 crop.
② Transferring to VOC object detection: evaluate on the VOC 07 test set using the COCO suite of metrics.
● They use the same hyperparameters as MoCo (unless otherwise noted) and the same codebase. All results use a standard-size ResNet-50.
4.2 MLP projection head [MLP head]
● They replace the fc head in MoCo with a 2-layer MLP head (hidden layer 2048-d, with ReLU). Note that this only affects the unsupervised pre-training stage; the linear classification and transferring stages do not use this MLP head.
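As a minimal sketch of what this replacement means, the head below has the 2-layer-MLP-with-ReLU shape the paper describes, but uses toy dimensions and hand-picked weights rather than the real 2048-d hidden layer:

```python
def mlp_head(x, w1, b1, w2, b2):
    """2-layer MLP with a ReLU after the hidden layer, the shape of
    MoCo v2's projection head (real hidden size is 2048; toy sizes here)."""
    # hidden = ReLU(W1 @ x + b1)
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    # output = W2 @ hidden + b2 (no activation on the output layer)
    return [sum(w * hi for w, hi in zip(row, hidden)) + b
            for row, b in zip(w2, b2)]

# 2-d input -> 2-d hidden -> 1-d output with hand-picked weights
out = mlp_head([1.0, -1.0],
               w1=[[1.0, 0.0], [0.0, 1.0]], b1=[0.0, 0.0],
               w2=[[1.0, 1.0]], b2=[0.0])
# hidden = ReLU([1, -1]) = [1, 0], so out = [1.0]
```

In the real model this head sits on top of the ResNet-50 pooled features and, as noted above, is used only during unsupervised pre-training.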
● They first search for the best temperature parameter τ of the following InfoNCE contrastive loss function:

$$\mathcal{L}_{q,k^+,\{k^-\}} = -\log\frac{\exp(q\cdot k^+/\tau)}{\exp(q\cdot k^+/\tau)+\sum_{k^-}\exp(q\cdot k^-/\tau)}$$
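The loss above can be computed directly from the dot products. A minimal plain-Python sketch for a single query (a real implementation would batch this over tensors):

```python
import math

def info_nce(sim_pos, sim_negs, tau=0.2):
    """InfoNCE loss for one query q, given the similarities
    sim_pos = q.k+ and sim_negs = [q.k-, ...]; tau is the temperature
    (0.2 is the default MoCo v2 settles on)."""
    logits = [sim_pos / tau] + [s / tau for s in sim_negs]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    return -math.log(exps[0] / sum(exps))

# The loss shrinks as the positive pair becomes more similar than the negatives.
easy = info_nce(0.9, [-0.9, -0.9])
hard = info_nce(0.0, [0.0, 0.0])  # all similarities equal -> loss = log(3)
```

This is just the softmax cross-entropy over one positive and K negatives, which is why frameworks often implement it via a standard cross-entropy call.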
● The results are as follows:
● They then take the best τ, with default value 0.2, for the follow-up experiments:
◆ Notes on the table above:
① The gray row is the self-supervised accuracy (Top-1).
② The second row is the intact MoCo v1.
③ The rows (a), (b), (c), (d), (e) are the different ablation experiments.
④ The left column, "unsup. pre-train", corresponds to ImageNet linear classification.
⑤ The right column, "VOC detection", corresponds to VOC object detection.
⑥ "MLP": with an MLP head.
⑦ "aug+": with extra blur augmentation.
⑧ "cos": with a cosine learning rate schedule.
● One can see that the gains on ImageNet linear classification are larger than those on VOC object detection.
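The cosine schedule that the "cos" entry refers to can be sketched as follows (the base learning rate 0.03 is just an illustrative value):

```python
import math

def cosine_lr(base_lr, epoch, total_epochs):
    """Half-cosine decay from base_lr at epoch 0 down to 0 at the
    final epoch, with no warm-up or restarts."""
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * epoch / total_epochs))

start = cosine_lr(0.03, 0, 200)     # = base_lr
middle = cosine_lr(0.03, 100, 200)  # = half of base_lr
end = cosine_lr(0.03, 200, 200)     # ~ 0
```

Compared with step decay, the learning rate falls smoothly, which is the only change the "cos" ablation introduces.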
4.3 Image augmentation [Augmentation]
● They extend the original data augmentation with a new blur augmentation, and they also found that the stronger color distortion used in SimCLR has diminishing returns in their model.
● Detailed results are in the table of Section 4.2. One last point: linear-classification accuracy and detection-transfer performance are not monotonically related; the former gains much more than the latter.
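For reference, a MoCo v2-style "aug+" pipeline could be written with torchvision roughly as below. The exact probabilities and jitter strengths are assumptions taken from commonly cited open-source recipes, not values quoted from the paper:

```python
import torchvision.transforms as T

# Sketch of a MoCo v2-style augmentation pipeline (parameter values
# are assumptions; the official repo uses a custom PIL-based blur):
augmentation = T.Compose([
    T.RandomResizedCrop(224, scale=(0.2, 1.0)),
    T.RandomApply([T.ColorJitter(0.4, 0.4, 0.4, 0.1)], p=0.8),  # color distortion
    T.RandomGrayscale(p=0.2),
    T.RandomApply([T.GaussianBlur(23, sigma=(0.1, 2.0))], p=0.5),  # the new blur aug
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
```

Two independently augmented views of the same image then form the positive pair for the contrastive loss.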
4.4 Comparison with SimCLR
● Clearly, MoCo v2 outperforms SimCLR across the board.
4.5 Computational cost
● MoCo v2 was trained on 8 "V100 16G" GPUs and implemented in PyTorch. A comparison of its space/time cost against the end-to-end mechanism is shown below:
Ⅴ. Discussion
● The paper contains no discussion section, so here is a personal opinion. Kaiming He's team is outstanding; every time I read a paper in their MoCo series I feel I am glimpsing the whole through a single spot: the logic is clear, the method novel, the algorithm concise, the results striking, and the summary on point. Though I cannot reach that level, my heart aspires to it.
Ⅵ. Summary
● MoCo v2 was published shortly after SimCLR and is a very short article, only 2 pages. In MoCo v2, the authors integrate SimCLR's two main improvements into MoCo and verify the effectiveness of the SimCLR algorithm. SimCLR's two ways of raising the score are:
① Using a strong data augmentation strategy, specifically adding a Gaussian blur image augmentation, together with a huge batch size, so that at every training step the self-supervised model sees enough negative samples; this helps the model learn better visual representations.
② Using a projection head g(·). In SimCLR, the two visual representations produced by the encoder are passed through the projection head g(·) for further feature extraction. The projection head is a 2-layer MLP that maps the 2048-dimensional visual representation into a 128-dimensional latent space, yielding a new representation on which the contrastive loss is computed during training. After training, the projection head g(·) is thrown away and the encoder is retained to produce visual representations.
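The "use g(·) for training, throw it away afterwards" pattern above can be sketched with stand-in functions (the real encoder is a ResNet-50 and g(·) the 2-layer MLP; the toy functions below are assumptions for illustration only):

```python
def encoder(x):
    """Stand-in for the encoder: input -> visual representation.
    (A real encoder outputs a 2048-d vector.)"""
    return [2.0 * v for v in x]

def g(h):
    """Stand-in for the projection head g(.): representation -> latent z.
    Only the contrastive loss ever sees z."""
    return [v + 1.0 for v in h]

# Pre-training: the contrastive loss is computed on z = g(encoder(x)).
z = g(encoder([1.0, 2.0]))
# Downstream use: g(.) is discarded; tasks consume encoder(x) directly.
features = encoder([1.0, 2.0])
```

The design choice is that z is specialized to the contrastive task, while the layer just before g(·) keeps more general-purpose information, so downstream tasks use the encoder output.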
References
[1] MoCo v2 original paper: https://arxiv.org/pdf/2003.04297.pdf
[2] 《7. Unsupervised learning: MoCo V2》