MoCo V2: Further Upgrade of the MoCo Series
2022-07-28 21:03:00 【ZOMI酱】
Kaiming He published MoCo V1 (Momentum Contrast for Unsupervised Visual Representation Learning) at CVPR 2020, and a few days ago MoCo V3 (An Empirical Study of Training Self-Supervised Visual Transformers) was posted on arXiv, so the MoCo series now has three versions.
This article introduces the second edition of the series, MoCo v2, which was published after SimCLR and absorbs the strengths of SimCLR's image self-supervised learning method. MoCo v1 and v2 are designed for CNNs, while MoCo v3 targets the Transformer architecture, reflecting the generality of the MoCo series across visual models.
[TOC]
MoCo V2 Improvements
After SimCLR v1 was released, the MoCo team quickly ported SimCLR's two accuracy-boosting techniques onto MoCo to see how the performance would change; the result is MoCo v2. The experiments show that MoCo v2 improves further and surpasses SimCLR v1, confirming the standing of the MoCo family of methods. Because MoCo v2 merely transplants SimCLR v1's tricks without major innovation, the authors wrote only a 2-page technical report, a good portion of which is references.
Interested readers can refer to the MoCo V2 paper, Improved Baselines with Momentum Contrastive Learning.
Related Work Before MoCo V2
Momentum Contrast (MoCo V1) showed that unsupervised pre-training can surpass its ImageNet-supervised counterpart on multiple detection and segmentation tasks, and SimCLR further narrowed the linear-classifier performance gap between unsupervised and supervised pre-training representations.

SimCLR still uses an end-to-end approach, as in figure (a), but improves the end-to-end variant of instance discrimination in three ways: (i) a much larger batch (4k or 8k), which provides more negative samples; (ii) replacing the output fc layer with an MLP projection head; (iii) stronger data augmentation.
Concretely, SimCLR uses a powerful data augmentation strategy (adding a Gaussian blur step) together with a large batch size, so that the self-supervised model sees enough negative samples at every training step, which helps it learn better visual representations.
Using a projection head. In SimCLR, the two visual representations produced by the encoder are passed through a projection head, a 2-layer MLP that maps the 2048-dimensional vectors h_i, h_j into a 128-dimensional latent space, yielding new representations z_i, z_j. The contrastive loss is computed on z_i, z_j during training; after training, the projection head is thrown away and the encoder is kept to produce visual representations.
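To make the h → z mapping concrete, here is a minimal PyTorch sketch of such a 2-layer MLP projection head. The 2048 → 128 dimensions follow the text; the 2048-d hidden width and the placement of the L2 normalization are assumptions based on common SimCLR implementations, not code from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProjectionHead(nn.Module):
    """Sketch of a SimCLR-style 2-layer MLP projection head."""
    def __init__(self, in_dim=2048, hidden_dim=2048, out_dim=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),   # first FC layer
            nn.ReLU(inplace=True),           # non-linearity between the two layers
            nn.Linear(hidden_dim, out_dim),  # FC down to the 128-d latent space
        )

    def forward(self, h):
        # h: 2048-d visual representation from the encoder
        z = self.mlp(h)
        # L2-normalize so the contrastive loss can use cosine similarity
        return F.normalize(z, dim=1)

# z_i, z_j feed the contrastive loss; after pre-training the head is
# discarded and only the encoder (producing h) is kept.
head = ProjectionHead()
z_i = head(torch.randn(8, 2048))
z_j = head(torch.randn(8, 2048))
```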

We continue with the end-to-end method shown in the figure. A batch contains N images; take a sample query q and its corresponding positive key k+, where q and k+ come from different data augmentations of the same image, and the rest of the batch serves as negative samples. The batch is fed to two encoders with the same architecture but different parameters, encoder f_q and encoder f_k. A contrastive loss on the two encoders' outputs pushes query q to be as similar as possible to the positive key k+ and as dissimilar as possible to the negative keys k-, thereby training f_q and f_k; this process is the self-supervised pre-training. After training, the encoder's output is the image's visual representation.
The drawback of the end-to-end method is that the parameters of both encoder f_q and encoder f_k are updated by back-propagation, so the batch size cannot be too large or GPU memory runs out. The batch size caps the number of negative samples, which in turn limits the performance of the self-supervised model. SimCLR comes from Google, backed by huge TPU clusters, so compute is never a worry, but ordinary practitioners cannot afford that.
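As a sketch of the objective this pipeline optimizes, below is a minimal InfoNCE-style contrastive loss in PyTorch. The einsum layout loosely mirrors the pseudocode in the MoCo paper, but the function name, shapes, and default temperature here are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(q, k_pos, k_neg, tau=0.07):
    """Contrastive loss sketch.

    q:     (N, D) query representations from encoder f_q
    k_pos: (N, D) positive keys (the other augmentation of the same images)
    k_neg: (K, D) negative keys (the rest of the batch)
    """
    q, k_pos, k_neg = (F.normalize(t, dim=1) for t in (q, k_pos, k_neg))
    l_pos = torch.einsum("nd,nd->n", q, k_pos).unsqueeze(1)  # (N, 1)
    l_neg = torch.einsum("nd,kd->nk", q, k_neg)              # (N, K)
    logits = torch.cat([l_pos, l_neg], dim=1) / tau
    # every query's positive key sits at index 0
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)
    return F.cross_entropy(logits, labels)
```

Minimizing this cross-entropy pulls q toward k+ and pushes it away from the negatives, which is exactly the behavior described above.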
MoCo V2: Straight to the Experiments
Back to today's protagonist: the highlight of MoCo v2 is that it needs no mighty Google TPUs; with just 8 GPUs it surpasses SimCLR v1. v2 transplants SimCLR's two accuracy-boosting techniques (a: an MLP projection head; b: a stronger data augmentation strategy) onto MoCo v1. The experiments are as follows.
Training set: the ImageNet dataset.
Evaluation protocols:
- Linear evaluation: the encoder (ResNet-50) is frozen and a classifier (an FC layer plus softmax) is attached after it; only the classifier's parameters are trained, using all of the ImageNet labels, while the encoder's parameters stay fixed. The performance of encoder + classifier is then measured (a minimal sketch follows this list).
- VOC object detection: a Faster R-CNN detector (C4 backbone) is fine-tuned end-to-end on the VOC 07+12 trainval set and evaluated on the VOC 07 test set.
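Here is the minimal linear-evaluation sketch promised above: freeze the pre-trained encoder and train only a linear classifier on ImageNet labels. Loading of the pre-trained weights is elided, and the hyperparameters are illustrative (the large learning rate follows the recipe commonly used for MoCo linear evaluation).

```python
import torch
import torch.nn as nn
import torchvision

encoder = torchvision.models.resnet50()
# encoder.load_state_dict(...)  # load self-supervised pre-trained weights here
encoder.fc = nn.Identity()      # expose the 2048-d representation
for p in encoder.parameters():
    p.requires_grad = False     # the encoder is frozen
encoder.eval()

classifier = nn.Linear(2048, 1000)  # the FC layer; softmax lives inside the loss
optimizer = torch.optim.SGD(classifier.parameters(), lr=30.0, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    with torch.no_grad():           # no gradients through the frozen encoder
        feats = encoder(images)
    loss = criterion(classifier(feats), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```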
Using a Projection Head
The projection head exists only during self-supervised pre-training and is removed for linear evaluation and downstream tasks. MoCo V1's encoder uses ResNet-50, whose output is L2-normalized to produce the final representation. In MoCo V2, ResNet's final FC layer for 1000-way classification is replaced by two FC layers with a ReLU in between, using a 2048-dimensional hidden layer.
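A sketch of that swap in PyTorch is below; it mirrors the change described in the text (the 128-d output is MoCo's contrastive feature size, an assumption carried over from MoCo v1 rather than stated in this paragraph).

```python
import torch.nn as nn
import torchvision

model = torchvision.models.resnet50()
dim_mlp = model.fc.in_features       # 2048 for ResNet-50
# Replace the 1000-way classification fc with two FC layers + ReLU,
# keeping a 2048-d hidden layer, as MoCo V2 does.
model.fc = nn.Sequential(
    nn.Linear(dim_mlp, dim_mlp),
    nn.ReLU(inplace=True),
    nn.Linear(dim_mlp, 128),
)
# The 128-d output is then L2-normalized before the contrastive loss.
```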
The linear evaluation results are as follows:

In the table, τ is the temperature hyperparameter in the contrastive loss. With the MLP projection head and τ = 0.07, accuracy improves from 60.6% to 62.9%.
Data Augmentation Strategy
For the data augmentation strategy, the authors add blur augmentation on top of MoCo v1 and find that stronger color distortion yields only marginal gains. Adding blur augmentation alone raises ImageNet linear-classification accuracy from 60.6% to 63.4%; combined with the MLP projection head, performance further improves to 67.3%.
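For reference, a sketch of an augmentation pipeline in this spirit is shown below, with the blur augmentation implemented as a random-sigma Gaussian blur. The exact parameters follow the publicly released MoCo v2 recipe as I understand it, so treat them as illustrative rather than authoritative.

```python
import random
from PIL import ImageFilter
from torchvision import transforms

class GaussianBlur:
    """Gaussian blur with a sigma sampled uniformly at random."""
    def __init__(self, sigma=(0.1, 2.0)):
        self.sigma = sigma

    def __call__(self, img):
        s = random.uniform(*self.sigma)
        return img.filter(ImageFilter.GaussianBlur(radius=s))

augmentation = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.2, 1.0)),
    transforms.RandomApply([transforms.ColorJitter(0.4, 0.4, 0.4, 0.1)], p=0.8),
    transforms.RandomGrayscale(p=0.2),
    transforms.RandomApply([GaussianBlur((0.1, 2.0))], p=0.5),  # the blur augmentation
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
```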

Summary
MoCo v2 brings SimCLR's two main improvements into MoCo: 1) a stronger data augmentation strategy, specifically the extra Gaussian blur step; 2) an MLP projection head. It verifies the effectiveness of these techniques, and the final MoCo v2 results surpass SimCLR v1, demonstrating the efficiency of the MoCo family of self-supervised pre-training methods.
References
[1] Hadsell, Raia, Sumit Chopra, and Yann LeCun. "Dimensionality reduction by learning an invariant mapping." 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06). Vol. 2. IEEE, 2006.
[2] Chen, Ting, et al. "A simple framework for contrastive learning of visual representations." International conference on machine learning. PMLR, 2020.
[3] He, Kaiming, et al. "Momentum contrast for unsupervised visual representation learning." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020.
[4] Chen, Xinlei, et al. "Improved baselines with momentum contrastive learning." arXiv preprint arXiv:2003.04297 (2020).
[5] He Kaiming's new work MoCo V3!!! Discussing its past and present. Zhihu.