Deep Learning (Self-Supervised: SimSiam): Exploring Simple Siamese Representation Learning
2022-07-28 06:09:00 【Food to doubt life】
Preface

This paper was published by Kaiming He's group at CVPR 2021, where it was nominated for the best paper award. It addresses the problem of collapse in self-supervised contrastive learning. Collapse means that, no matter what the input is, the feature extractor outputs the same feature vector.

This post briefly introduces SimSiam and records its interesting experimental results.

The authors do not fully explain why SimSiam avoids collapse, but the paper is brilliant nonetheless.
SimSiam overview

The figure above shows the overall structure of SimSiam. Concretely:
- Apply data augmentation to the input image $x$ to obtain two views $x_1$ and $x_2$.
- Feed $x_1$ and $x_2$ into the same feature extractor and process the outputs with a projection MLP, obtaining $z_1$ and $z_2$.
- Process $z_1$ with a prediction MLP to obtain $p_1$.
The contrastive learning loss is the negative cosine similarity

$$\mathcal{D}(p_1, z_2) = -\frac{p_1}{\|p_1\|_2} \cdot \frac{z_2}{\|z_2\|_2}$$

(in the paper this loss is symmetrized over the two views). During back-propagation, $\frac{z_2}{\|z_2\|_2}$ is treated as a constant (the stop-gradient), and only $\frac{p_1}{\|p_1\|_2}$ receives gradients. Note that the collapsed solution still exists in the solution space.
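As a minimal sketch of this loss (NumPy, not the authors' code; in a PyTorch implementation the stop-gradient would simply be `z2.detach()`):

```python
import numpy as np

def negative_cosine(p, z):
    """D(p, z) = -(p/||p||_2) . (z/||z||_2).
    z is used as a plain constant here, mimicking the stop-gradient:
    no gradient is ever computed through it."""
    p = p / np.linalg.norm(p)
    z = z / np.linalg.norm(z)
    return -float(np.dot(p, z))

p1 = np.array([1.0, 0.0])
z2 = np.array([1.0, 0.0])
print(negative_cosine(p1, z2))  # identical directions -> -1.0
```

The loss reaches its minimum of $-1$ when $p_1$ and $z_2$ point in the same direction; a collapsed encoder that maps every input to the same vector also achieves this minimum, which is why collapse lies in the solution space.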
To explain the above optimization process, the author hypothesizes the loss function

$$\mathcal{L}(\theta, \eta) = E_{x,T}\Big[\big\| F_\theta(T(x)) - \eta_x \big\|_2^2\Big]$$

Here $F_\theta(x)$ is the neural network, $T(x)$ denotes a data augmentation applied to $x$, and $\eta_x$ can be regarded as a parameter to be estimated; the parameters to be estimated are thus $\theta$ and $\eta$. Minimizing this loss proceeds like coordinate descent, alternating between

$$\theta^t = \arg\min_\theta \mathcal{L}(\theta, \eta^{t-1}), \qquad \eta^t = \arg\min_\eta \mathcal{L}(\theta^t, \eta)$$

$\eta^{t-1}$ denotes the value of $\eta$ after $t-1$ optimization steps, and $\theta^t$ is defined analogously. First $\eta^{t-1}$ is held constant to obtain $\theta^t$, the value of $\theta$ that minimizes $\mathcal{L}(\theta, \eta^{t-1})$; then $\eta^t$ is found the same way. This is exactly coordinate descent. The expression for $\eta^t$ can be obtained from
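To make the alternation concrete, here is a toy instance of this coordinate descent (a hypothetical elementwise-linear $F_\theta$, a single image, and Gaussian-noise "augmentations"; purely illustrative, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)
theta = rng.normal(size=4)                     # F_theta(v) = theta * v
augs = [x + rng.normal(scale=0.1, size=4) for _ in range(8)]  # samples of T(x)

def F(theta, v):
    return theta * v

def loss(theta, eta):
    return float(np.mean([np.sum((F(theta, t) - eta) ** 2) for t in augs]))

eta = F(theta, x)
history = []
for step in range(20):
    # theta-step: one gradient update with eta held constant (stop-gradient)
    grad = np.mean([2 * (F(theta, t) - eta) * t for t in augs], axis=0)
    theta = theta - 0.1 * grad
    # eta-step: closed form, eta_x = E_T[F_theta(T(x))]
    eta = np.mean([F(theta, t) for t in augs], axis=0)
    history.append(loss(theta, eta))

print(history[0], history[-1])  # the alternation lowers the loss
```

In this toy the loss is driven down by shrinking $\theta$ toward the collapsed solution, consistent with the earlier remark that the collapsed solution remains in the solution space.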
$$\frac{\partial \mathcal{L}(\theta, \eta)}{\partial \eta_x} = -E_T\big[2\big(F_{\theta^t}(T(x)) - \eta_x\big)\big] = 0$$
Solution 
adopt Monte Carlo approximation , We can approximate it with a sample 
T ′ ( x ) T'(x) T′(x) Said to x Apply data enhancement , and T ( x ) T(x) T(x) It's the same , This writing is helpful for the subsequent writing of mathematical expressions , Substitute the above formula into formula 7 Available in 
The above formula can be regarded as a picture x x x Apply two data enhancements , obtain T ( x ) 、 T ′ ( x ) T(x)、T'(x) T(x)、T′(x), After neural network processing , Do in feature space L2 distance , Back propagation , F θ t ( T ′ ( x ) ) F_{\theta^t}(T'(x)) Fθt(T′(x)) Look, it becomes a constant . When F θ t ( T ′ ( x ) ) 、 F θ ( T ( x ) ) F_{\theta^t}(T'(x))、F_{\theta}(T(x)) Fθt(T′(x))、Fθ(T(x)) after L2 After normalization , The above formula can be compared with SimSiam Of loss Make an equivalent .
Therefore, SimSiam can be viewed as an optimization problem with two sets of parameters to be estimated. To test this hypothesis, the author ran a set of experiments, shown below.

k-step means caching $k$ values of $F_{\theta^t}(T'(x))$ and treating them as constants, then performing $k$ gradient updates on $\theta$ through $F_\theta(T(x))$ to obtain $\theta^{t+k}$, analogous to the $\theta$-update above. Then $\eta$ is optimized: $F_\theta(T(x))$ is held constant and $F_{\theta^{t+k}}(T'(x))$ is updated, analogous to the $\eta$-update. The optimization results are very good, which supports the author's hypothesis.

In the derivation above, I deliberately omitted the prediction MLP. Because the single-sample estimate is only a rough approximation of the expectation, the author hypothesizes that the prediction MLP compensates for the error introduced by this approximation; this is verified experimentally, and not recorded here.
The algorithm's pseudocode is as follows.
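The pseudocode image from the original post did not survive; below is a sketch of one symmetrized SimSiam loss evaluation in NumPy. The encoder `f` (standing in for backbone plus projection MLP) and the predictor `h` are hypothetical linear stand-ins; a real implementation would use PyTorch modules and `z.detach()` for the stop-gradient.

```python
import numpy as np

def l2norm(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def D(p, z):
    """Batched negative cosine similarity; z enters as a plain
    constant, mimicking the stop-gradient."""
    return float(-np.sum(l2norm(p) * l2norm(z), axis=-1).mean())

def simsiam_loss(f, h, x1, x2):
    """Symmetrized SimSiam loss for two augmented views x1, x2:
    L = D(p1, stopgrad(z2))/2 + D(p2, stopgrad(z1))/2."""
    z1, z2 = f(x1), f(x2)
    p1, p2 = h(z1), h(z2)
    return 0.5 * D(p1, z2) + 0.5 * D(p2, z1)

# hypothetical linear stand-ins for encoder and predictor
f = lambda x: x @ np.eye(3)
h = lambda z: z @ np.eye(3)

x = np.random.default_rng(0).normal(size=(4, 3))
print(simsiam_loss(f, h, x, x))  # identical views, identity nets -> -1.0 (up to float precision)
```

With identical views and identity networks, $p_1 = z_2$ for every sample, so the loss sits at its minimum of $-1$; with real augmentations and learned networks the two branches differ and training pulls the predictions toward the (stop-gradient) projections.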
Experiments
The experiments verifying that SimSiam avoids collapse are not recorded here; only experimental results that are useful in practice are recorded.

SimSiam is a contrastive learning algorithm that uses no negative pairs, so it is insensitive to the batch size, as shown below.

In addition, the author demonstrates the role of the prediction MLP, as shown below; clearly the prediction MLP has a huge impact on SimSiam.

The author also explores the effect of adding a BN layer to the output of the prediction MLP and the projection MLP, as shown below. The BN layer likewise has a surprisingly large impact on SimSiam; contrastive learning appears to be extremely sensitive to such details.