MoCo V2: Further Upgrade of the MoCo Series
2022-07-28 21:03:00 【ZOMI酱】
Kaiming He published MoCo V1 (Momentum Contrast for Unsupervised Visual Representation Learning) at CVPR 2020, and a few days ago MoCo V3 (An Empirical Study of Training Self-Supervised Visual Transformers) was posted on arXiv, so the MoCo series now has three versions.
This article introduces the second edition of the series, MoCo v2, which was published after SimCLR and absorbs the strengths of SimCLR's image self-supervised learning method. MoCo v1 and v2 are designed for CNNs, while MoCo v3 targets the Transformer architecture, reflecting the generality of the MoCo series across visual models.
[TOC]
MoCo V2 Improvements
After SimCLR v1 was released, the MoCo team quickly ported SimCLR's two accuracy-boosting techniques onto MoCo to see how the performance would change; the result is MoCo v2. The experiments show that MoCo v2 improves further and surpasses SimCLR v1, confirming the standing of the MoCo family of methods. Because MoCo v2 merely transplants SimCLR v1's tricks without major innovation, the authors wrote only a 2-page technical report, a good portion of which is references.
Interested readers can refer to the MoCo V2 paper, Improved Baselines with Momentum Contrastive Learning.
Related Work Before MoCo V2
Momentum Contrast (MoCo V1) showed that unsupervised pre-training can surpass its ImageNet-supervised counterpart on multiple detection and segmentation tasks, and SimCLR further narrowed the linear-classifier performance gap between unsupervised and supervised pre-training representations.

SimCLR still uses an end-to-end approach, as in figure (a), but improves the end-to-end variant of instance discrimination in three ways: (i) a much larger batch (4k or 8k), which provides more negative samples; (ii) replacing the output fc layer with an MLP projection head; (iii) stronger data augmentation.
Concretely, SimCLR uses a powerful data augmentation strategy (adding a Gaussian blur step) together with a large batch size, so that the self-supervised model sees enough negative samples at every training step, which helps it learn better visual representations.
Using a projection head. In SimCLR, the two visual representations produced by the encoder are passed through a projection head, a 2-layer MLP that maps the 2048-dimensional vectors h_i, h_j into a 128-dimensional latent space, yielding new representations z_i, z_j. The contrastive loss is computed on z_i, z_j during training; after training, the projection head is thrown away and the encoder is kept to produce visual representations.
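To make the h → z mapping concrete, here is a minimal PyTorch sketch of such a 2-layer MLP projection head. The 2048 → 128 dimensions follow the text; the 2048-d hidden width and the placement of the L2 normalization are assumptions based on common SimCLR implementations, not code from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProjectionHead(nn.Module):
    """Sketch of a SimCLR-style 2-layer MLP projection head."""
    def __init__(self, in_dim=2048, hidden_dim=2048, out_dim=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),   # first FC layer
            nn.ReLU(inplace=True),           # non-linearity between the two layers
            nn.Linear(hidden_dim, out_dim),  # FC down to the 128-d latent space
        )

    def forward(self, h):
        # h: 2048-d visual representation from the encoder
        z = self.mlp(h)
        # L2-normalize so the contrastive loss can use cosine similarity
        return F.normalize(z, dim=1)

# z_i, z_j feed the contrastive loss; after pre-training the head is
# discarded and only the encoder (producing h) is kept.
head = ProjectionHead()
z_i = head(torch.randn(8, 2048))
z_j = head(torch.randn(8, 2048))
```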

We continue with the end-to-end method shown in the figure. A batch contains N images; take a sample query q and its corresponding positive key k+, where q and k+ come from different data augmentations of the same image, and the rest of the batch serves as negative samples. The batch is fed to two encoders with the same architecture but different parameters, encoder f_q and encoder f_k. A contrastive loss on the two encoders' outputs pushes query q to be as similar as possible to the positive key k+ and as dissimilar as possible to the negative keys k-, thereby training f_q and f_k; this process is the self-supervised pre-training. After training, the encoder's output is the image's visual representation.
The drawback of the end-to-end method is that the parameters of both encoder f_q and encoder f_k are updated by back-propagation, so the batch size cannot be too large or GPU memory runs out. The batch size caps the number of negative samples, which in turn limits the performance of the self-supervised model. SimCLR comes from Google, backed by huge TPU clusters, so compute is never a worry, but ordinary practitioners cannot afford that.
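As a sketch of the objective this pipeline optimizes, below is a minimal InfoNCE-style contrastive loss in PyTorch. The einsum layout loosely mirrors the pseudocode in the MoCo paper, but the function name, shapes, and default temperature here are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(q, k_pos, k_neg, tau=0.07):
    """Contrastive loss sketch.

    q:     (N, D) query representations from encoder f_q
    k_pos: (N, D) positive keys (the other augmentation of the same images)
    k_neg: (K, D) negative keys (the rest of the batch)
    """
    q, k_pos, k_neg = (F.normalize(t, dim=1) for t in (q, k_pos, k_neg))
    l_pos = torch.einsum("nd,nd->n", q, k_pos).unsqueeze(1)  # (N, 1)
    l_neg = torch.einsum("nd,kd->nk", q, k_neg)              # (N, K)
    logits = torch.cat([l_pos, l_neg], dim=1) / tau
    # every query's positive key sits at index 0
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)
    return F.cross_entropy(logits, labels)
```

Minimizing this cross-entropy pulls q toward k+ and pushes it away from the negatives, which is exactly the behavior described above.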
MoCo V2: Straight to the Experiments
Back to today's protagonist: the highlight of MoCo v2 is that it needs no mighty Google TPUs; with just 8 GPUs it surpasses SimCLR v1. v2 transplants SimCLR's two accuracy-boosting techniques (a: an MLP projection head; b: a stronger data augmentation strategy) onto MoCo v1. The experiments are as follows.
Training set: the ImageNet dataset.
Evaluation protocols:
- Linear evaluation: the encoder (ResNet-50) is frozen and a classifier (an FC layer plus softmax) is attached after it; only the classifier's parameters are trained, using all of the ImageNet labels, while the encoder's parameters stay fixed. The performance of encoder + classifier is then measured (a minimal sketch follows this list).
- VOC object detection: a Faster R-CNN detector (C4 backbone) is fine-tuned end-to-end on the VOC 07+12 trainval set and evaluated on the VOC 07 test set.
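Here is the minimal linear-evaluation sketch promised above: freeze the pre-trained encoder and train only a linear classifier on ImageNet labels. Loading of the pre-trained weights is elided, and the hyperparameters are illustrative (the large learning rate follows the recipe commonly used for MoCo linear evaluation).

```python
import torch
import torch.nn as nn
import torchvision

encoder = torchvision.models.resnet50()
# encoder.load_state_dict(...)  # load self-supervised pre-trained weights here
encoder.fc = nn.Identity()      # expose the 2048-d representation
for p in encoder.parameters():
    p.requires_grad = False     # the encoder is frozen
encoder.eval()

classifier = nn.Linear(2048, 1000)  # the FC layer; softmax lives inside the loss
optimizer = torch.optim.SGD(classifier.parameters(), lr=30.0, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    with torch.no_grad():           # no gradients through the frozen encoder
        feats = encoder(images)
    loss = criterion(classifier(feats), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```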
Using a Projection Head
The projection head exists only during self-supervised pre-training and is removed for linear evaluation and downstream tasks. MoCo V1's encoder uses ResNet-50, whose output is L2-normalized to produce the final representation. In MoCo V2, ResNet's final FC layer for 1000-way classification is replaced by two FC layers with a ReLU in between, using a 2048-dimensional hidden layer.
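A sketch of that swap in PyTorch is below; it mirrors the change described in the text (the 128-d output is MoCo's contrastive feature size, an assumption carried over from MoCo v1 rather than stated in this paragraph).

```python
import torch.nn as nn
import torchvision

model = torchvision.models.resnet50()
dim_mlp = model.fc.in_features       # 2048 for ResNet-50
# Replace the 1000-way classification fc with two FC layers + ReLU,
# keeping a 2048-d hidden layer, as MoCo V2 does.
model.fc = nn.Sequential(
    nn.Linear(dim_mlp, dim_mlp),
    nn.ReLU(inplace=True),
    nn.Linear(dim_mlp, 128),
)
# The 128-d output is then L2-normalized before the contrastive loss.
```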
The linear evaluation results are as follows:

In the table, τ is the temperature hyperparameter in the contrastive loss. With the MLP projection head and τ = 0.07, accuracy improves from 60.6% to 62.9%.
Data Augmentation Strategy
For the data augmentation strategy, the authors add blur augmentation on top of MoCo v1 and find that stronger color distortion yields only marginal gains. Adding blur augmentation alone raises ImageNet linear-classification accuracy from 60.6% to 63.4%; combined with the MLP projection head, performance further improves to 67.3%.
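For reference, a sketch of an augmentation pipeline in this spirit is shown below, with the blur augmentation implemented as a random-sigma Gaussian blur. The exact parameters follow the publicly released MoCo v2 recipe as I understand it, so treat them as illustrative rather than authoritative.

```python
import random
from PIL import ImageFilter
from torchvision import transforms

class GaussianBlur:
    """Gaussian blur with a sigma sampled uniformly at random."""
    def __init__(self, sigma=(0.1, 2.0)):
        self.sigma = sigma

    def __call__(self, img):
        s = random.uniform(*self.sigma)
        return img.filter(ImageFilter.GaussianBlur(radius=s))

augmentation = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.2, 1.0)),
    transforms.RandomApply([transforms.ColorJitter(0.4, 0.4, 0.4, 0.1)], p=0.8),
    transforms.RandomGrayscale(p=0.2),
    transforms.RandomApply([GaussianBlur((0.1, 2.0))], p=0.5),  # the blur augmentation
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
```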

Summary
MoCo v2 brings SimCLR's two main improvements into MoCo: 1) a stronger data augmentation strategy, specifically the extra Gaussian blur step; 2) an MLP projection head. It verifies the effectiveness of these techniques, and the final MoCo v2 results surpass SimCLR v1, demonstrating the efficiency of the MoCo family of self-supervised pre-training methods.
References
[1] Hadsell, Raia, Sumit Chopra, and Yann LeCun. "Dimensionality reduction by learning an invariant mapping." 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06). Vol. 2. IEEE, 2006.
[2] Chen, Ting, et al. "A simple framework for contrastive learning of visual representations." International conference on machine learning. PMLR, 2020.
[3] He, Kaiming, et al. "Momentum contrast for unsupervised visual representation learning." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020.
[4] Chen, Xinlei, et al. "Improved baselines with momentum contrastive learning." arXiv preprint arXiv:2003.04297 (2020).
[5] He Kaiming's new work MoCo V3!!! Discussing its past and present. Zhihu.