Balanced Multimodal Learning via On-the-fly Gradient Modulation(CVPR2022 oral)
2022-07-06 22:37:00 【Rainylt】
paper: https://arxiv.org/pdf/2203.15332.pdf
One-sentence summary: addresses the problem that, during multimodal training, the dominant modality is trained too fast, leaving the auxiliary modality under-trained.
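Cross-entropy loss over $N$ samples with $M$ classes, where $y_i$ is the ground-truth label of sample $x_i$:

$$L = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{f(x_i)_{y_i}}}{\sum_{k=1}^{M} e^{f(x_i)_k}}$$

where $f(x_i)$ is the output of a linear fusion layer applied to the concatenated features of the two encoders:

$$f(x_i) = W\,[\varphi^a(\theta^a, x_i^a);\ \varphi^v(\theta^v, x_i^v)] + b$$

Decoupling $W$ into the blocks $W^a$ and $W^v$ that act on each modality's features:

$$f(x_i) = W^a \cdot \varphi^a(\theta^a, x_i^a) + W^v \cdot \varphi^v(\theta^v, x_i^v) + b$$

Here $a$ denotes the audio modality and $v$ the visual modality, $\varphi^u(\theta^u, \cdot)$ is the encoder of modality $u$ with parameters $\theta^u$, and $f(x_i)$ is the pre-softmax logit vector jointly produced by the two modalities. In this task audio is the dominant modality: for the ground-truth class, the audio branch outputs the larger logit.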
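The loss and the decoupling step can be checked numerically. The plain-Python sketch below uses made-up numbers and hypothetical function names: it computes the fused logits once as $W$ applied to the concatenated features and once as $W^a \varphi^a + W^v \varphi^v + b$, then evaluates the single-sample cross-entropy.

```python
import math


def matvec(W, x):
    """Multiply a matrix given as a list of rows by a vector."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def fused_logits_concat(W, phi_a, phi_v, b):
    """f(x) = W [phi_a; phi_v] + b: one weight matrix on the concatenated features."""
    z = matvec(W, phi_a + phi_v)  # list concatenation plays the role of [phi_a; phi_v]
    return [zk + bk for zk, bk in zip(z, b)]


def fused_logits_split(W, phi_a, phi_v, b):
    """f(x) = W^a phi_a + W^v phi_v + b: the same logits, decoupled per modality."""
    d_a = len(phi_a)
    W_a = [row[:d_a] for row in W]  # columns that multiply the audio features
    W_v = [row[d_a:] for row in W]  # columns that multiply the visual features
    return [za + zv + bk for za, zv, bk in zip(matvec(W_a, phi_a), matvec(W_v, phi_v), b)]


def cross_entropy(logits, y):
    """Single-sample loss -log softmax(logits)[y], using max subtraction for stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[y]


# Made-up example: M = 2 classes, 2-dim audio features, 1-dim visual features.
W = [[1.0, 2.0, 0.5], [0.0, -1.0, 3.0]]
b = [0.1, -0.2]
phi_a, phi_v = [0.3, 0.7], [1.5]
f = fused_logits_concat(W, phi_a, phi_v, b)
print(f, fused_logits_split(W, phi_a, phi_v, b))  # the two lists agree up to rounding
print(cross_entropy(f, 0))
```

Splitting the columns of $W$ is all the decoupling does; the logits are unchanged, which is why the per-modality gradients can be analyzed separately.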
Taking $W^a$ as an example, the gradient of $L$ with respect to $W^a$ is:

$$\frac{\partial L}{\partial W^a} = \sum_{i=1}^{N}\frac{\partial L}{\partial f(x_i)}\,\varphi^a(\theta^a, x_i^a)^{\top}$$

By the chain rule, $\varphi^a$ is the output of the audio branch, while the factor $\frac{\partial L}{\partial f(x_i)}$ is identical for both modalities. The difference between the two modalities' gradients therefore comes only from the second factor, the value of $\varphi$. Because the dominant modality usually outputs larger logits, i.e. its $\varphi$ and $W$ take larger values, the gradient it back-propagates is also larger and it converges faster.
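For a single sample, $\partial L / \partial W^u = \frac{\partial L}{\partial f(x_i)}\,\varphi^u(\theta^u, x_i^u)^{\top}$ is an outer product, so its Frobenius norm equals $\|\partial L / \partial f(x_i)\| \cdot \|\varphi^u\|$. With the shared factor fixed, the ratio of the two weight-gradient norms is exactly the ratio of the feature norms. A small sketch with made-up feature vectors, assuming softmax cross-entropy, whose logit gradient is $\mathrm{softmax}(f) - \mathrm{onehot}(y)$; the helper names are hypothetical.

```python
import math


def softmax(z):
    """Numerically stable softmax of a list of logits."""
    m = max(z)
    e = [math.exp(zi - m) for zi in z]
    s = sum(e)
    return [ei / s for ei in e]


def grad_wrt_logits(logits, y):
    """dL/df for one sample of softmax cross-entropy: softmax(f) - onehot(y)."""
    p = softmax(logits)
    return [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]


def weight_grad(g, phi):
    """dL/dW^u = g phi^T: outer product of the shared logit gradient and the modality's features."""
    return [[gk * pj for pj in phi] for gk in g]


def frobenius(mat):
    """Frobenius norm of a matrix given as a list of rows."""
    return math.sqrt(sum(x * x for row in mat for x in row))


logits, y = [2.0, 0.5, -1.0], 0   # fused logits f(x_i) and label (made-up)
phi_a = [3.0, 4.0]                # "dominant" audio features, norm 5
phi_v = [0.6, 0.8]                # "weak" visual features, norm 1
g = grad_wrt_logits(logits, y)    # identical for both modalities
ratio = frobenius(weight_grad(g, phi_a)) / frobenius(weight_grad(g, phi_v))
print(ratio)  # about 5.0, the ratio of the feature norms
```

This only covers the $W^u$ part; the gradient with respect to the encoder parameters $\theta^u$ additionally passes through $W^u$, which is why both $\varphi$ and $W$ matter.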
As a result, the dominant modality may be trained first and reach a low loss while the auxiliary modality is still poorly trained. Exactly why the auxiliary modality cannot be trained well remains to be explored.
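To slow down the training of the dominant modality, this paper multiplies its gradient by an attenuation coefficient during back-propagation. This reduces the gradient flowing into the dominant branch and is equivalent to lowering the learning rate of the dominant modality alone. The coefficient is obtained as follows.
For each sample $i$ in the mini-batch $B_t$, let $s^u_i$ be the softmax score that modality $u$'s own logits assign to the ground-truth class; the ratio of the summed scores decides which modality dominates:

$$\rho^a_t = \frac{\sum_{i \in B_t} s^a_i}{\sum_{i \in B_t} s^v_i}, \qquad \rho^v_t = \frac{1}{\rho^a_t}$$

The modality whose ratio is greater than 1 (the dominant one) is assigned an attenuation coefficient $k \in (0, 1)$; the auxiliary modality gets $k = 1$, i.e. its gradient is unchanged.
The coefficient multiplies the learning rate, so for the dominant modality it is equivalent to a smaller learning rate:

$$\theta^u_{t+1} = \theta^u_t - \eta\, k^u_t\, \tilde{g}(\theta^u_t), \qquad u \in \{a, v\}$$

where $\tilde{g}(\theta^u_t)$ is the mini-batch gradient for the parameters of modality $u$.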
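The exact form of $k$ is not spelled out above. To my understanding, the paper sets $k^u_t = 1 - \tanh(\alpha \cdot \rho^u_t)$ for the modality with $\rho^u_t > 1$ and $k^u_t = 1$ otherwise, where $\rho^u_t$ is that modality's summed ground-truth softmax score divided by the other modality's, and $\alpha > 0$ is a hyperparameter. The plain-Python sketch below works under that assumption; the unimodal logits are passed in directly (in the paper they are derived from each modality's block of the fusion layer), and all function names and numbers are hypothetical.

```python
import math


def softmax(z):
    """Numerically stable softmax of a list of logits."""
    m = max(z)
    e = [math.exp(zi - m) for zi in z]
    s = sum(e)
    return [ei / s for ei in e]


def gt_score(logits_batch, labels):
    """Sum over the mini-batch of the softmax probability of the ground-truth class."""
    return sum(softmax(z)[y] for z, y in zip(logits_batch, labels))


def modulation_coeffs(logits_a, logits_v, labels, alpha):
    """Return (k_a, k_v): 1 - tanh(alpha * rho) for the dominant modality, 1 otherwise."""
    rho_a = gt_score(logits_a, labels) / gt_score(logits_v, labels)
    rho_v = 1.0 / rho_a
    k_a = 1.0 - math.tanh(alpha * rho_a) if rho_a > 1 else 1.0
    k_v = 1.0 - math.tanh(alpha * rho_v) if rho_v > 1 else 1.0
    return k_a, k_v


# Made-up unimodal logits for a batch of two samples, both with label 0.
logits_a = [[3.0, 0.0], [2.0, 0.0]]   # audio is confident on the true class
logits_v = [[0.5, 0.0], [0.2, 0.0]]   # visual is barely above chance
k_a, k_v = modulation_coeffs(logits_a, logits_v, [0, 0], alpha=0.5)
print(k_a, k_v)  # k_a lies in (0, 1), k_v is 1.0
```

Since $\tanh$ maps positive inputs into $(0, 1)$, the dominant modality's coefficient stays strictly between 0 and 1, and the more it dominates (larger $\rho$), the more its gradient is damped.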
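In addition, following the analysis of SGD, the mini-batch gradient can be modeled as the full gradient plus Gaussian noise, $\tilde{g}(\theta_t) = \nabla L(\theta_t) + \epsilon_t$ with $\epsilon_t \sim \mathcal{N}\big(0, \Sigma^{sgd}(\theta_t)\big)$, so one SGD step is:

$$\theta_{t+1} = \theta_t - \eta \nabla L(\theta_t) + \xi_t, \qquad \xi_t \sim \mathcal{N}\big(0,\ \eta^2\, \Sigma^{sgd}(\theta_t)\big)$$

The higher the learning rate, the larger the covariance of this Gaussian noise, and larger SGD noise is associated with stronger generalization. Reducing the learning rate of the dominant modality is therefore equivalent to weakening its generalization ability: after the attenuation coefficient is applied, the noise covariance shrinks to $k^2$ times the original:

$$\theta_{t+1} = \theta_t - \eta k \nabla L(\theta_t) + \xi'_t, \qquad \xi'_t \sim \mathcal{N}\big(0,\ \eta^2 k^2\, \Sigma^{sgd}(\theta_t)\big)$$

Therefore, the paper deliberately adds an extra Gaussian noise term whose covariance equals that of the gradient noise within the current batch (the Generalization Enhancement, GE, step):

$$h(\theta_t) \sim \mathcal{N}\big(0,\ \Sigma^{sgd}(\theta_t)\big), \qquad \theta_{t+1} = \theta_t - \eta\big(k\,\tilde{g}(\theta_t) + h(\theta_t)\big)$$

Since $h$ is independent of $\epsilon_t$, the equivalent noise covariance is now larger than before:

$$\theta_{t+1} = \theta_t - \eta k \nabla L(\theta_t) + \xi''_t, \qquad \xi''_t \sim \mathcal{N}\big(0,\ \eta^2 (k^2 + 1)\, \Sigma^{sgd}(\theta_t)\big)$$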
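The covariance argument can be checked with a one-coordinate Monte Carlo simulation. The sketch below assumes a hypothetical true gradient of 1.0 and SGD noise standard deviation $\sigma$; `modulated_noise_variance` estimates the variance of the update direction $k\,\tilde{g}$ with and without the GE noise, which should come out near $k^2\sigma^2$ and $(k^2+1)\sigma^2$. `ogm_ge` is a hypothetical helper that applies modulation plus GE to a gradient vector; it simplifies $\Sigma^{sgd}$ to an isotropic variance equal to the empirical variance of the gradient's own entries, an assumption made for illustration rather than the paper's exact estimator.

```python
import random
import statistics


def modulated_noise_variance(k, sigma, n, add_ge, seed=0):
    """Monte Carlo variance of the update direction k * g_tilde (+ h) for one coordinate.

    g_tilde = g_true + eps, eps ~ N(0, sigma^2)   (SGD noise model)
    h ~ N(0, sigma^2), independent of eps         (GE noise, only if add_ge)
    """
    rng = random.Random(seed)
    g_true = 1.0  # hypothetical true gradient value
    samples = []
    for _ in range(n):
        direction = k * (g_true + rng.gauss(0.0, sigma))
        if add_ge:
            direction += rng.gauss(0.0, sigma)
        samples.append(direction)
    return statistics.pvariance(samples)


def ogm_ge(grad, k, rng):
    """Scale a gradient by k and add isotropic Gaussian noise with the gradient's own std.

    Simplifying assumption: Sigma_sgd is replaced by a scalar variance taken
    from the entries of `grad` itself.
    """
    sigma = statistics.pstdev(grad)
    return [k * g + rng.gauss(0.0, sigma) for g in grad]


if __name__ == "__main__":
    k, sigma, n = 0.5, 1.0, 100_000
    print(modulated_noise_variance(k, sigma, n, add_ge=False))  # near k^2 * sigma^2 = 0.25
    print(modulated_noise_variance(k, sigma, n, add_ge=True))   # near (k^2 + 1) * sigma^2 = 1.25
```

With $k = 0.5$, modulation alone cuts the noise variance to a quarter of the plain-SGD value $\sigma^2 = 1$, while adding GE pushes it above the original.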