ECCV 2022 | Tencent Youtu proposes DisCo: rescuing small-model performance in self-supervised learning
2022-07-04 23:17:00 [Zhiyuan Community]
DisCo: Remedy Self-supervised Learning on Lightweight Models with Distilled Contrastive Learning
Paper: https://arxiv.org/abs/2104.09124
Code (open source): https://github.com/Yuting-Gao/DisCo-pytorch
Motivation
Self-supervised learning usually means that a model learns general-purpose representations on large-scale unlabeled data, which are then transferred to downstream tasks. Because these learned representations can significantly improve downstream performance, self-supervised learning is widely used in many scenarios. Generally speaking, the larger the model capacity, the better self-supervised learning works [1,2]. Conversely, lightweight models (EfficientNet-B0, MobileNet-V3, EfficientNet-B1) benefit far less from self-supervised learning than models with relatively large capacity (ResNet-50/101/152, ResNet-50×2).
At present, the main way to improve the performance of lightweight models in self-supervised learning is distillation: transferring the knowledge of a larger-capacity model to a student model. SEED [2], built on the MoCo-V2 framework [3,4], uses the large-capacity model as the Teacher and the lightweight model as the Student; the two share MoCo-V2's negative-sample space (the queue), and a cross-entropy loss forces the Student's distribution over the positive sample and the same negatives to match the Teacher's as closely as possible. CompRess [1] additionally tried letting the Teacher and Student maintain their own negative-sample queues, using a KL divergence to bring the two distributions together. These methods effectively transfer the Teacher's knowledge to the Student and thereby improve the lightweight Student model (this article uses "Student" and "lightweight model" interchangeably).
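The shared-queue distillation objective described above (as in SEED) can be sketched as follows. This is a minimal illustration rather than the authors' code; it assumes L2-normalized embeddings, a shared queue of negatives, and a softmax temperature `tau`:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def seed_distill_loss(student_z, teacher_z, queue, tau=0.07):
    """Cross-entropy between the Teacher's and Student's similarity
    distributions over the same (shared) negative-sample queue.
    student_z, teacher_z: (B, D) L2-normalized embeddings of one view.
    queue: (K, D) L2-normalized negatives shared by both models."""
    # Similarity to the positive (the teacher's own embedding)
    # followed by similarities to every key in the shared queue.
    t_logits = np.concatenate(
        [(teacher_z * teacher_z).sum(1, keepdims=True), teacher_z @ queue.T], axis=1)
    s_logits = np.concatenate(
        [(student_z * teacher_z).sum(1, keepdims=True), student_z @ queue.T], axis=1)
    p_t = softmax(t_logits / tau)                     # teacher distribution (target)
    log_p_s = np.log(softmax(s_logits / tau) + 1e-12)
    return -(p_t * log_p_s).sum(1).mean()             # cross-entropy H(p_t, p_s)
```

When the Student's embeddings exactly match the Teacher's, the loss reduces to the entropy of the Teacher's distribution, its minimum over all students for a fixed target.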
This paper proposes Distilled Contrastive Learning (DisCo), a simple and effective distillation-based self-supervised learning method for lightweight models. DisCo significantly improves the Student, and some lightweight models come very close to the Teacher's performance. The method is built on the following observations:
- In self-supervised distillation, the last-layer representation encodes both the global absolute position and the local relative position of each sample in the whole representation space, and the Teacher captures this information better than the Student. Therefore, simply pulling the Teacher's and Student's last-layer representations together may be the best choice.
- In CompRess [1], the gap between the Teacher and Student sharing one negative-sample queue (1q) and each maintaining its own queue (2q) is within 1%. When these methods are transferred to the downstream datasets CUB200 and Cars196, the separate-queue variant can even significantly outperform the shared-queue variant. This suggests that the Student does not learn enough effective knowledge from the Teacher's shared negative-sample space, and therefore does not need to rely on it.
- One benefit of abandoning the shared queue is that the framework as a whole no longer depends on MoCo-V2 and becomes more concise. The Teacher/Student models can then be combined with self-supervised/unsupervised representation learning methods more effective than MoCo-V2, further improving the final performance of the distilled lightweight model.
- In current self-supervised methods, the low dimension of the MLP hidden layer may be a bottleneck for distillation performance. Increasing the hidden-layer dimension of this structure during both self-supervised learning and distillation further improves the final lightweight model, at no extra cost in the deployment phase. Enlarging the hidden-layer dimension from 512 to 2048 improves ResNet-18 significantly, by 3.5%.
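As a rough illustration of the widened projection head (a sketch, not the paper's exact architecture; the layer sizes below are assumptions), note that the change only affects the two-layer MLP that is discarded after pre-training, so the deployed backbone is untouched:

```python
import numpy as np

def projection_head(x, w1, w2):
    """Two-layer MLP head used only during pre-training/distillation:
    Linear -> ReLU -> Linear. It is thrown away at deployment time,
    so widening its hidden layer adds no inference cost."""
    return np.maximum(x @ w1, 0.0) @ w2

rng = np.random.default_rng(0)
feat_dim, out_dim = 512, 128       # e.g. backbone features -> final embedding
for hidden in (512, 2048):         # 512 -> 2048 is the change studied in the paper
    w1 = rng.standard_normal((feat_dim, hidden)) * 0.01
    w2 = rng.standard_normal((hidden, out_dim)) * 0.01
    z = projection_head(rng.standard_normal((4, feat_dim)), w1, w2)
    print(hidden, z.shape)         # embedding shape is unchanged: (4, 128)
```

Only the head's parameter count grows; the input and output dimensions, and hence everything downstream, stay the same.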
Method
This paper proposes a simple but effective framework, Distilled Contrastive Learning (DisCo). While performing self-supervised learning, the Student simultaneously learns where the same sample lies in the Teacher's representation space.
DisCo Framework
As shown in the figure above, data augmentation generates two views of each image. In addition to self-supervised learning, a self-supervised pre-trained Teacher model is introduced, and for the same view of the same sample, the final representations produced by the Student and the fixed-parameter Teacher are required to be consistent. In the main experiments of this paper, the self-supervised learning is based on MoCo-V2 (contrastive learning), and the similarity between the Teacher's and Student's output representations of the same sample is enforced through consistency regularization: a mean-squared-error loss makes the Student learn the sample's corresponding position in the Teacher's space.
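The consistency term can be sketched as a mean-squared error between the final embeddings of the two branches. This is a minimal sketch; the L2 normalization and the accompanying MoCo-V2 contrastive term are assumptions here, not details taken from the paper:

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def disco_consistency_loss(student_z, teacher_z):
    """Mean squared error pulling the Student's final representation
    toward the frozen Teacher's representation of the same view.
    student_z, teacher_z: (B, D) embeddings of the same augmented view."""
    s = l2_normalize(student_z)
    t = l2_normalize(teacher_z)   # Teacher parameters are fixed (no gradient)
    return ((s - t) ** 2).sum(axis=1).mean()

# Total training objective (conceptually):
#   loss = moco_v2_contrastive_loss(student) + disco_consistency_loss(s_z, t_z)
```

The loss is zero exactly when the normalized Student and Teacher embeddings coincide, which is the behavior the consistency regularization is meant to encourage.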