Balanced Multimodal Learning via On-the-fly Gradient Modulation (CVPR 2022 Oral)
2022-07-06 22:37:00 【Rainylt】
paper: https://arxiv.org/pdf/2203.15332.pdf
One-sentence summary: the paper addresses the problem that, during multimodal training, the dominant modality is trained too fast, which leaves the auxiliary modality under-trained.
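Cross-entropy loss function (over $N$ samples and $M$ classes):
$$L = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{f(x_i)_{y_i}}}{\sum_{k=1}^{M} e^{f(x_i)_k}}$$
where $f(x_i)$ is
$$f(x_i) = W\,[\varphi^a(\theta^a, x_i^a);\ \varphi^v(\theta^v, x_i^v)] + b$$
Decoupling:
$$f(x_i) = W^a\,\varphi^a(\theta^a, x_i^a) + W^v\,\varphi^v(\theta^v, x_i^v) + b$$
Here $a$ denotes the audio modality, $v$ denotes the visual modality, $\varphi^a$ and $\varphi^v$ are the two encoders with parameters $\theta^a$ and $\theta^v$, and $f(x_i)$ is the logits jointly produced by the two modalities before the softmax. In this task $a$ is the dominant modality, i.e. for the ground-truth class the audio branch outputs the larger logit.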
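The decoupling step can be checked numerically. The sketch below is only an illustration under assumed toy values: it splits a fused weight matrix column-wise into an audio block and a visual block and confirms that the sum of the two per-modality terms reproduces the fused logits. The function names and the example numbers are hypothetical, not taken from the paper.

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def fused_logits(W, b, phi_a, phi_v):
    """f(x) = W [phi_a; phi_v] + b, using the concatenated feature."""
    z = matvec(W, phi_a + phi_v)  # list concatenation = feature concatenation
    return [zi + bi for zi, bi in zip(z, b)]


def decoupled_logits(W, b, phi_a, phi_v):
    """f(x) = W_a phi_a + W_v phi_v + b, with W split column-wise."""
    d_a = len(phi_a)
    W_a = [row[:d_a] for row in W]  # columns that act on the audio feature
    W_v = [row[d_a:] for row in W]  # columns that act on the visual feature
    z_a = matvec(W_a, phi_a)
    z_v = matvec(W_v, phi_v)
    return [za + zv + bi for za, zv, bi in zip(z_a, z_v, b)]


# Toy values (assumed): 2 classes, 2-d audio feature, 1-d visual feature.
W = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
b = [0.5, -0.5]
phi_a, phi_v = [1.0, 2.0], [3.0]
print(fused_logits(W, b, phi_a, phi_v))
print(decoupled_logits(W, b, phi_a, phi_v))
```

Both calls give the same logits, which is why the contribution of each modality to $f(x_i)$ can be analysed separately.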
Taking $W^a$ as an example, differentiating $L$ with respect to $W^a$ gives the update (with learning rate $\eta$):
$$W^a_{t+1} = W^a_t - \eta\,\frac{1}{N}\sum_{i=1}^{N}\frac{\partial L}{\partial f(x_i)}\,\varphi^a(\theta^a, x_i^a)$$
By the chain rule, $\varphi^a$ is the factor that depends on modality $a$, while $\frac{\partial L}{\partial f(x_i)}$ is the same for both modalities. The difference between the gradients of the two modalities therefore comes from the second factor, i.e. the value of $\varphi$. Because the dominant modality generally outputs higher logits, meaning its $\varphi$ and $W$ are larger, the gradient propagated back to it is also larger and it converges faster.
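To make the shared-factor argument concrete, here is a minimal sketch for a single sample. It assumes the standard softmax cross-entropy identity $\partial L/\partial f = \mathrm{softmax}(f) - \mathrm{onehot}(y)$ and uses hypothetical feature values: the error term is computed once, and the two weight gradients differ only through the feature each is multiplied by.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def head_gradients(logits, label, phi_a, phi_v):
    """Return dL/df plus dL/dW_a and dL/dW_v for one sample."""
    p = softmax(logits)
    # Shared error term: identical for both modalities.
    delta = [pc - (1.0 if c == label else 0.0) for c, pc in enumerate(p)]
    # Outer products: each gradient is the shared term times that modality's feature.
    grad_W_a = [[d * x for x in phi_a] for d in delta]
    grad_W_v = [[d * x for x in phi_v] for d in delta]
    return delta, grad_W_a, grad_W_v


def frobenius_norm(M):
    return math.sqrt(sum(v * v for row in M for v in row))


# Assumed toy features: the audio feature has 5x the norm of the visual one.
phi_a, phi_v = [3.0, 4.0], [0.6, 0.8]
_, g_a, g_v = head_gradients([2.0, 0.5], 0, phi_a, phi_v)
print(frobenius_norm(g_a) / frobenius_norm(g_v))
```

The ratio of the two gradient norms equals the ratio of the feature norms, since the shared term cancels. This only illustrates the gradient with respect to the last-layer weights; it says nothing about how the encoders themselves evolve during training.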
As a result, the dominant modality may be trained first and drive the loss down while the auxiliary modality is still poorly trained. Exactly why the auxiliary modality then cannot be trained well remains to be explored.
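To slow down training of the dominant modality, the paper multiplies the gradient by an attenuation coefficient, shrinking the gradient back-propagated to the dominant modality. This is equivalent to lowering the learning rate of the dominant modality alone:
$$\theta^u_{t+1} = \theta^u_t - \eta\,k^u_t\,\tilde{g}(\theta^u_t), \quad u \in \{a, v\}$$
The coefficient is determined by the ratio of the two modalities' scores, where each score is the softmax of that modality's own logits evaluated at the ground-truth class and summed over the mini-batch $B_t$:
$$s_i^a = \mathrm{softmax}\big(W^a_t\,\varphi^a_t(\theta^a, x_i^a) + \tfrac{b}{2}\big)_{y_i}, \quad s_i^v = \mathrm{softmax}\big(W^v_t\,\varphi^v_t(\theta^v, x_i^v) + \tfrac{b}{2}\big)_{y_i}, \quad \rho^v_t = \frac{\sum_{i \in B_t} s_i^v}{\sum_{i \in B_t} s_i^a}, \quad \rho^a_t = \frac{1}{\rho^v_t}$$
The modality whose ratio is greater than 1 (the dominant modality) gets an attenuation factor $k$ between 0 and 1; the auxiliary modality gets 1 (unchanged):
$$k^u_t = \begin{cases} 1 - \tanh(\alpha \cdot \rho^u_t), & \rho^u_t > 1 \\ 1, & \text{otherwise} \end{cases}$$
Since $k$ multiplies the learning rate, this is equivalent to reducing the learning rate.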
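A minimal sketch of how the attenuation coefficients could be computed for one mini-batch. It assumes each modality's logits are already available as plain lists, uses the $1 - \tanh(\alpha \cdot \rho)$ form from the paper for the dominant modality, and treats `alpha=0.1` as a placeholder hyperparameter value rather than a recommended setting. The function names are hypothetical.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def modulation_coefficients(logits_a, logits_v, labels, alpha=0.1):
    """Return (k_a, k_v) for one mini-batch.

    logits_a / logits_v: per-sample logits from each modality alone.
    labels: ground-truth class index per sample.
    alpha: assumed hyperparameter controlling how strongly to attenuate.
    """
    # Score of a modality = summed softmax probability of the true class.
    s_a = sum(softmax(z)[y] for z, y in zip(logits_a, labels))
    s_v = sum(softmax(z)[y] for z, y in zip(logits_v, labels))
    rho_a = s_a / s_v
    rho_v = s_v / s_a
    # Only the modality with ratio > 1 (the dominant one) is slowed down.
    k_a = 1.0 - math.tanh(alpha * rho_a) if rho_a > 1.0 else 1.0
    k_v = 1.0 - math.tanh(alpha * rho_v) if rho_v > 1.0 else 1.0
    return k_a, k_v


# Toy batch (assumed): audio is confident and correct, visual is uninformative.
labels = [0, 1]
logits_a = [[4.0, 0.0], [0.0, 4.0]]
logits_v = [[0.0, 0.0], [0.0, 0.0]]
k_a, k_v = modulation_coefficients(logits_a, logits_v, labels)
print(k_a, k_v)
```

In a training loop, each coefficient would scale the gradients of the matching encoder before the optimizer step, which has the same effect as giving that encoder a smaller learning rate for this iteration.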
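In addition, following the SGD update rule, the mini-batch gradient can be written as the full gradient plus Gaussian noise:
$$\theta_{t+1} = \theta_t - \eta\,\nabla_\theta L(\theta_t) + \eta\,\xi_t, \quad \xi_t \sim \mathcal{N}\big(0, \Sigma^{sgd}(\theta_t)\big)$$
A higher learning rate means a larger covariance of the Gaussian noise, which in turn means stronger generalization. Reducing the learning rate here is therefore equivalent to weakening the generalization ability of the dominant modality. After the attenuation coefficient is applied, the noise covariance shrinks to $k^2$ times the original:
$$\theta_{t+1} = \theta_t - \eta\,k_t\,\nabla_\theta L(\theta_t) + \eta\,\xi'_t, \quad \xi'_t \sim \mathcal{N}\big(0, k_t^2\,\Sigma^{sgd}(\theta_t)\big)$$
The paper therefore adds extra Gaussian noise by hand, whose variance is the variance estimated over the samples in the batch:
$$\theta_{t+1} = \theta_t - \eta\,\big(k_t\,\tilde{g}(\theta_t) + h(\theta_t)\big), \quad h(\theta_t) \sim \mathcal{N}\big(0, \Sigma^{sgd}(\theta_t)\big)$$
The resulting noise covariance is larger than before:
$$\theta_{t+1} = \theta_t - \eta\,k_t\,\nabla_\theta L(\theta_t) + \eta\,\xi'_t, \quad \xi'_t \sim \mathcal{N}\big(0, (k_t^2 + 1)\,\Sigma^{sgd}(\theta_t)\big)$$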
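The variance bookkeeping can be checked with a small simulation. The sketch below is a scalar toy model, not the paper's training procedure: it draws the SGD noise and the added noise as independent zero-mean Gaussians with the same standard deviation `sigma` (an assumed value) and compares the empirical variance of the plain, attenuated, and noise-enhanced terms.

```python
import random


def sample_variance(xs):
    """Unbiased sample variance."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def simulate_noise_variances(k, sigma=1.0, n=100_000, seed=0):
    """Return the variances of xi, k*xi, and k*xi + eps.

    xi  models the SGD gradient noise, eps the extra noise added by hand;
    both are drawn independently from N(0, sigma^2).
    """
    rng = random.Random(seed)
    xi = [rng.gauss(0.0, sigma) for _ in range(n)]
    eps = [rng.gauss(0.0, sigma) for _ in range(n)]
    plain = sample_variance(xi)
    modulated = sample_variance([k * x for x in xi])
    enhanced = sample_variance([k * x + e for x, e in zip(xi, eps)])
    return plain, modulated, enhanced


plain, modulated, enhanced = simulate_noise_variances(k=0.5)
print(plain, modulated, enhanced)
```

With `k = 0.5` the three estimates come out close to $\sigma^2$, $k^2\sigma^2$ and $(k^2 + 1)\sigma^2$ respectively, matching the claim that attenuation alone shrinks the noise while the added term makes it larger than the unmodulated case.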