2022-07-25 12:00:00 【chad_lee】
Understanding the Behaviour of Contrastive Loss (CVPR’21)
The temperature coefficient $\tau$ in the contrastive loss is a key hyperparameter, and most papers set $\tau$ to a small value. This paper starts from an analysis of the temperature parameter $\tau$ and shows that:
- The contrastive loss automatically mines hard negative samples, which is why it can learn high-quality self-supervised representations. In particular, negatives that are already far away need not be pushed further; the loss mainly acts on negatives that are still close (hard negatives), which makes the representation space more uniform (as in the red-circle figure below).
- The temperature coefficient $\tau$ controls how strongly hard negatives are mined: the smaller $\tau$ is, the more the loss focuses on the hardest negatives.
Hardness-Awareness
The most widely used contrastive loss is InfoNCE:
$$\mathcal{L}\left(x_{i}\right)=-\log \left[\frac{\exp \left(s_{i, i} / \tau\right)}{\sum_{k \neq i} \exp \left(s_{i, k} / \tau\right)+\exp \left(s_{i, i} / \tau\right)}\right]$$
This loss pushes the similarity $s_{i,i}$ between the $i$-th sample and its augmented (positive) view to be as large as possible, and the similarities $s_{i,k}$ to all other samples (negatives) to be as small as possible. But many loss functions satisfy this requirement, for example the simplest one, $\mathcal{L}_{\text{simple}}$:
$$\mathcal{L}_{\text{simple}}\left(x_{i}\right)=-s_{i, i}+\lambda \sum_{j \neq i} s_{i, j}$$
But the two losses differ considerably in training results:
| Dataset | Contrastive Loss | Simple Loss |
|---|---|---|
| CIFAR-10 | 79.75 | 74 |
| CIFAR-100 | 51.82 | 49 |
| ImageNet-100 | 71.53 | 74.31 |
| SVHN | 92.55 | 94.99 |
This is because the simple loss penalizes every negative similarity with the same weight: $\frac{\partial \mathcal{L}_{\text{simple}}}{\partial s_{i, k}}=\lambda$, i.e., the gradient of the loss with respect to every negative similarity is identical. The contrastive loss, by contrast, automatically penalizes negatives with higher similarity more heavily:
$$\text{positive pair: } \frac{\partial \mathcal{L}\left(x_{i}\right)}{\partial s_{i, i}}=-\frac{1}{\tau} \sum_{k \neq i} P_{i, k} \qquad \text{negative pair } (j \neq i)\text{: } \frac{\partial \mathcal{L}\left(x_{i}\right)}{\partial s_{i, j}}=\frac{1}{\tau} P_{i, j}$$
where $P_{i, j}=\frac{\exp \left(s_{i, j} / \tau\right)}{\sum_{k \neq i} \exp \left(s_{i, k} / \tau\right)+\exp \left(s_{i, i} / \tau\right)}$. For all negatives of a given anchor, the denominator of $P_{i, j}$ is the same, so the larger $s_{i, j}$ is, the larger the gradient on that negative, and the harder that negative is pushed away (much like focal loss: the harder the sample, the larger the gradient). This encourages all samples to spread out uniformly on the hypersphere.
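A minimal PyTorch sketch (not the paper's code; the $8\times 8$ similarity matrix is made up) that computes InfoNCE via row-wise cross-entropy and numerically confirms that the gradient on each negative similarity is $P_{i,j}/\tau$:

```python
import torch
import torch.nn.functional as F

tau = 0.1
torch.manual_seed(0)
s = torch.randn(8, 8, requires_grad=True)  # s[i, k]: similarity of anchor i to sample k; s[i, i] is the positive pair

# InfoNCE = cross-entropy over each row of s / tau, with the positive index as the target
loss = F.cross_entropy(s / tau, torch.arange(8))
loss.backward()

P = F.softmax(s.detach() / tau, dim=1)
i, j = 0, 3                # any anchor i and any negative j != i
print(s.grad[i, j] * 8)    # x8 undoes the batch-mean inside cross_entropy
print(P[i, j] / tau)       # matches: the gradient on a negative is P_{i,j} / tau
```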
To verify that the contrastive loss really owes its advantage to mining hard negatives, the paper adds explicit hard-negative selection to the simple loss (for each sample, the 4096 hardest negatives are selected), which improves its performance:
| Dataset | Contrastive Loss | Simple Loss + Hard |
|---|---|---|
| CIFAR-10 | 79.75 | 84.84 |
| CIFAR-100 | 51.82 | 55.71 |
| ImageNet-100 | 71.53 | 74.31 |
| SVHN | 92.55 | 94.99 |
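A hypothetical sketch of this "simple loss + hard negatives" variant (the exact selection procedure in the paper may differ): instead of summing over all negatives, each anchor only penalizes its $K$ most similar ones.

```python
import torch

def simple_loss_hard(s: torch.Tensor, k: int = 4096, lam: float = 1.0) -> torch.Tensor:
    """s[i, j]: similarity of anchor i to sample j; s[i, i] is the positive pair."""
    n = s.size(0)
    pos = s.diagonal()                                    # s_{i,i}
    neg = s.masked_fill(torch.eye(n, dtype=torch.bool), float("-inf"))
    hard = neg.topk(min(k, n - 1), dim=1).values          # K most similar (hardest) negatives per anchor
    return (-pos + lam * hard.sum(dim=1)).mean()
```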
The temperature coefficient $\tau$ controls the degree of hard-negative mining
The smaller the temperature coefficient $\tau$, the more the loss focuses on hard negatives. In particular:
As $\tau \rightarrow 0^{+}$, the contrastive loss degenerates into a loss that attends only to the hardest negative:

$$\lim _{\tau \rightarrow 0^{+}} \mathcal{L}\left(x_{i}\right)=\lim _{\tau \rightarrow 0^{+}} \frac{1}{\tau} \max \left[s_{\max }-s_{i, i},\ 0\right], \qquad s_{\max }:=\max _{k \neq i} s_{i, k}$$
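This limit follows from the fact that log-sum-exp tends to the max as $\tau \rightarrow 0^{+}$ (a quick derivation, paraphrasing the paper; note the sum below includes the positive term $k=i$, which is what produces the $\max[\cdot, 0]$):

$$\mathcal{L}\left(x_{i}\right)=-\frac{s_{i, i}}{\tau}+\log \sum_{k} \exp \left(\frac{s_{i, k}}{\tau}\right) \xrightarrow{\ \tau \rightarrow 0^{+}\ } \frac{\max _{k} s_{i, k}-s_{i, i}}{\tau}=\frac{1}{\tau} \max \left[s_{\max }-s_{i, i},\ 0\right]$$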
This means the loss pushes each negative, one by one, out to the same distance from the anchor:

When $\tau$ tends to infinity, the contrastive loss almost degenerates into the simple loss, weighting all negatives equally.
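A tiny numeric illustration of both regimes (the similarity values are made up): small $\tau$ concentrates almost all of the gradient weight on the most similar negative, while large $\tau$ spreads it nearly uniformly.

```python
import torch
import torch.nn.functional as F

s = torch.tensor([0.9, 0.8, 0.5, 0.1, -0.3])  # index 0: positive pair; indices 1-4: negatives
for tau in (0.07, 0.5, 5.0):
    P = F.softmax(s / tau, dim=0)
    weights = P[1:] / P[1:].sum()              # relative gradient weight of each negative
    print(f"tau={tau}: {weights.numpy().round(3)}")
```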
So the smaller the temperature coefficient $\tau$, the more uniformly the sample features are distributed. But this is not purely a good thing, because potential positives (false negatives, i.e., semantically similar samples treated as negatives) are also pushed away:
