MoCo V2: A Further Upgrade of the MoCo Series
2022-07-28 21:03:00 【Zomi sauce】
Kaiming He published MoCo v1 (Momentum Contrast for Unsupervised Visual Representation Learning) at CVPR 2020, and a few days ago MoCo v3 (An Empirical Study of Training Self-Supervised Vision Transformers) was posted on arXiv, so the MoCo series now has three versions.
Today's post introduces MoCo v2, the second installment of the series, which absorbs the strengths of SimCLR's self-supervised image-learning method after SimCLR was published. MoCo v1 and v2 are designed around CNNs, while MoCo v3 targets Transformer architectures, reflecting how broadly the MoCo series applies across visual models.
[TOC]
MoCo V2 Improvements
After SimCLR v1 was released, the MoCo team quickly ported SimCLR's two score-boosting tricks onto MoCo to see how performance would change; the result is MoCo v2. The experiments show that MoCo v2 improves further and surpasses SimCLR v1, confirming the standing of the MoCo line of methods. Because MoCo v2 merely transplants SimCLR v1's tricks without major innovation, the authors wrote only a 2-page technical report, a good part of which is taken up by citations.
Interested readers can refer to the MoCo v2 paper, Improved Baselines with Momentum Contrastive Learning.
Related work before MoCo V2
Momentum Contrast (MoCo v1) showed that unsupervised pre-training can surpass its ImageNet-supervised counterpart on multiple detection and segmentation tasks, and SimCLR further narrowed the linear-classifier performance gap between unsupervised and supervised pre-trained representations.

SimCLR still uses an end-to-end approach, as in (a) in the figure, but improves this end-to-end variant of instance discrimination in three ways: (i) larger batches (4k or 8k), which provide more negative samples; (ii) replacing the output fc layer with an MLP projection head; (iii) stronger data augmentation.
Concretely, SimCLR uses a powerful data-augmentation strategy, adds an extra Gaussian blur step, and uses a huge batch size so that the self-supervised model sees enough negative samples at every training step, which helps it learn better visual representations.
Using a projection head. In SimCLR, the two visual representations produced by the encoder are passed through a projection head to extract further features. The projection head is a 2-layer MLP that maps the 2048-dimensional representation vectors h_i, h_j into a 128-dimensional latent space, yielding new representations z_i, z_j. The contrastive loss is computed on z_i, z_j during training; afterwards the projection head is thrown away and the encoder is kept to produce visual representations.
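To make the mapping concrete, here is a minimal numpy sketch of such a projection head's forward pass; the randomly initialized weights `W1` and `W2` are stand-ins for learned parameters and are not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in weights for a 2-layer MLP projection head (2048 -> 2048 -> 128);
# in real training these are learned jointly with the encoder.
W1 = rng.normal(scale=0.01, size=(2048, 2048))
W2 = rng.normal(scale=0.01, size=(2048, 128))

def projection_head(h):
    """Map 2048-d representations h to 128-d embeddings z, then L2-normalize."""
    hidden = np.maximum(h @ W1, 0.0)   # ReLU hidden layer
    z = hidden @ W2
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

h = rng.normal(size=(4, 2048))         # a toy batch of encoder outputs
z = projection_head(h)                 # shape (4, 128), unit-norm rows
```

The L2 normalization at the end means the contrastive loss on z_i, z_j reduces to cosine similarity between embeddings.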

Let us continue along the end-to-end route shown in the figure. Assume a batch of N images. For a query q and its positive sample k+, q and k+ come from different data augmentations of the same image, and the rest of the batch serves as negative samples. The batch is fed into two encoders with the same architecture but different parameters, encoder f_q and encoder f_k. A contrastive loss on the two encoders' outputs pushes the query q to be as similar as possible to the positive sample k+ and as dissimilar as possible to the negatives k-. Training f_q and f_k this way is the self-supervised pre-training; after training, the encoder's output is the image's visual representation.
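The contrastive loss described above is the InfoNCE objective. As a rough numpy sketch for a single query (the vectors are illustrative; only the temperature τ = 0.07 comes from the papers):

```python
import numpy as np

def info_nce(q, k_pos, k_negs, tau=0.07):
    """InfoNCE loss for one query: the positive logit competes with the
    negative logits in a temperature-scaled softmax cross-entropy."""
    logits = np.concatenate([[q @ k_pos], k_negs @ q]) / tau
    logits -= logits.max()                     # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                   # positive key sits at index 0

rng = np.random.default_rng(0)
q = rng.normal(size=128); q /= np.linalg.norm(q)
k_pos = q.copy()                               # a perfectly matching positive
k_negs = rng.normal(size=(8, 128))
k_negs /= np.linalg.norm(k_negs, axis=1, keepdims=True)
loss = info_nce(q, k_pos, k_negs)              # near zero here, since q matches k_pos
```

The loss shrinks as q aligns with k+ and grows when any negative becomes similar to q, which is exactly the pull/push behavior described above.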
The drawback of the end-to-end method is that, because the parameters of both encoder f_q and encoder f_k are updated by back-propagation, the batch size cannot be made too large or GPU memory runs out. The batch size caps the number of negative samples, which in turn caps the performance of the self-supervised model. SimCLR comes from Google, backed by huge TPU clusters and never short of compute, but ordinary labs cannot afford that.
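For context, MoCo v1's way around this limit (carried over into v2) is to stop back-propagating into the key encoder: f_k is updated as an exponential moving average of f_q, so negatives can come from a large queue rather than the current batch. A toy sketch of the momentum update rule θ_k ← m·θ_k + (1−m)·θ_q, with small vectors standing in for the encoders' weights:

```python
import numpy as np

theta_q = np.array([1.0, 2.0, 3.0])   # query-encoder weights (trained by back-prop)
theta_k = np.zeros_like(theta_q)      # key-encoder weights (momentum-updated only)

m = 0.999                             # the large momentum coefficient used in MoCo
for _ in range(5):                    # after each training step of theta_q:
    theta_k = m * theta_k + (1 - m) * theta_q
# theta_k drifts slowly toward theta_q: (1 - m**5) of the way after 5 steps
```

The large m keeps the key encoder slowly moving and consistent, which is what lets the queue of old keys remain usable as negatives.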
MoCo V2: straight to the experiments
Back to today's protagonist: MoCo v2's selling point is that it needs no mighty Google TPUs; with only 8 GPUs it surpasses SimCLR v1. v2 transplants SimCLR's two score-boosting tricks (a. use a projection head; b. use a strong data-augmentation strategy) onto MoCo v1. The experiments are as follows.
Training set: the ImageNet dataset.
Evaluation protocols:
- Linear evaluation: the encoder (ResNet-50) parameters are frozen, and a classifier (concretely, an FC layer + softmax activation) is attached behind the encoder. All ImageNet labels are used to train only the classifier's parameters, not the encoder's, and the final encoder + classifier performance is measured.
- VOC object detection: a Faster R-CNN detector (C4 backbone) is fine-tuned end-to-end on the VOC 07+12 trainval set and evaluated on the VOC 07 test set.
Using a projection head
The projection head exists only during self-supervised pre-training and is removed for linear evaluation and downstream tasks. MoCo v1's encoder is a plain ResNet-50 whose output is L2-normalized to give the final representation. In MoCo v2, the 1000-class FC layer at the end of ResNet is replaced by two FC layers with a ReLU in between, with a 2048-dimensional hidden layer.
The linear evaluation results are as follows:

In the table, τ is the temperature in the loss function. With the projection head and τ = 0.07, accuracy rises from 60.6% to 62.9%.
Data augmentation strategy
For data augmentation, the authors add blur augmentation on top of MoCo v1 and find that stronger color jitter brings only limited gains. Blur augmentation alone lifts ImageNet classification performance from 60.6% to 63.4%, and adding the projection head pushes it further to 67.3%.
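For intuition about what the blur augmentation does, here is a stand-alone numpy sketch of a separable Gaussian blur; the real pipelines use library implementations (e.g. PIL/torchvision), so this is only illustrative:

```python
import numpy as np

def gaussian_kernel(sigma):
    radius = int(3 * sigma)                    # truncate the kernel at ~3 sigma
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def gaussian_blur(img, sigma=1.5):
    """Blur a 2-D grayscale image by convolving rows then columns
    (edges are zero-padded by np.convolve's 'same' mode)."""
    k = gaussian_kernel(sigma)
    rows = np.apply_along_axis(np.convolve, 1, img, k, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="same")

rng = np.random.default_rng(0)
img = rng.random((32, 32))                     # toy grayscale image
out = gaussian_blur(img)                       # same shape, smoothed
```

In the SimCLR-style recipe the blur strength σ is sampled randomly per image, so each augmented view loses a different amount of high-frequency detail.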

Summary
MoCo v2 brings SimCLR's two main improvements into MoCo: 1) a strong data-augmentation strategy, specifically the extra Gaussian blur step; 2) a projection head. It verifies their effectiveness, and in the end MoCo v2's results beat SimCLR v1, demonstrating the efficiency of the MoCo family of self-supervised pre-training methods.
References
[1] Hadsell, Raia, Sumit Chopra, and Yann LeCun. "Dimensionality reduction by learning an invariant mapping." 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06). Vol. 2. IEEE, 2006.
[2] Chen, Ting, et al. "A simple framework for contrastive learning of visual representations." International conference on machine learning. PMLR, 2020.
[3] He, Kaiming, et al. "Momentum contrast for unsupervised visual representation learning." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020.
[4] Chen, Xinlei, et al. "Improved baselines with momentum contrastive learning." arXiv preprint arXiv:2003.04297 (2020).
[5] Kaiming He's new work MoCo v3: its past and present life - Zhihu