Balanced Multimodal Learning via On-the-fly Gradient Modulation(CVPR2022 oral)
2022-07-06 22:37:00 【Rainylt】
paper: https://arxiv.org/pdf/2203.15332.pdf
One-sentence summary: in multimodal training the dominant modality trains too fast, which leaves the auxiliary modality under-trained; this paper addresses that imbalance.
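Cross-entropy loss function:
$$L = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{f(x_i)_{y_i}}}{\sum_{k=1}^{M} e^{f(x_i)_k}}$$
where $f(x_i)$ is
$$f(x_i) = W\,[\varphi^a(\theta^a, x_i^a);\ \varphi^v(\theta^v, x_i^v)] + b$$
Decoupling:
$$f(x_i) = W^a \cdot \varphi^a(\theta^a, x_i^a) + W^v \cdot \varphi^v(\theta^v, x_i^v) + b$$
Here $a$ denotes the audio modality, $v$ denotes the visual modality, and $f(x)$ is the logits jointly output by the two modalities before softmax. In this task $a$ is the dominant modality, i.e. for the ground-truth class the audio branch outputs larger logits.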
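The decoupling step can be checked numerically. The sketch below assumes a late-fusion head (concatenate the two encoder outputs, then apply one linear layer); all sizes and values are made up for illustration and the helper `matvec` is hypothetical.

```python
# Late-fusion head: concatenation followed by one linear layer.
# All sizes and values are illustrative assumptions, not from the paper.

def matvec(W, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

phi_a = [2.0, -1.0]        # audio encoder output (hypothetical)
phi_v = [0.5, 0.25, 1.0]   # visual encoder output (hypothetical)
W = [                      # 2 classes x (2 + 3) fused features
    [0.1, 0.2, 0.3, 0.4, 0.5],
    [-0.5, 0.4, -0.3, 0.2, -0.1],
]
b = [0.05, -0.05]

# Joint form: f(x) = W [phi_a; phi_v] + b
joint = [z + bi for z, bi in zip(matvec(W, phi_a + phi_v), b)]

# Decoupled form: split W column-wise into W_a and W_v
W_a = [row[:len(phi_a)] for row in W]
W_v = [row[len(phi_a):] for row in W]
audio_part = matvec(W_a, phi_a)
visual_part = matvec(W_v, phi_v)
decoupled = [a + v + bi for a, v, bi in zip(audio_part, visual_part, b)]

print(joint)
print(decoupled)
```

Both forms give the same logits, so each modality's contribution to $f(x)$ can be read off separately as `audio_part` and `visual_part`.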
Taking $W^a$ as an example, the derivative of $L$ with respect to $W^a$ is:
$$\frac{\partial L}{\partial W^a} = \sum_{i=1}^{N}\frac{\partial L}{\partial f(x_i)}\,\varphi^a(\theta^a, x_i^a)^{\top}$$
By the chain rule, $\varphi^a$ is the output that depends on modality $a$, while $\frac{\partial L}{\partial f(x_i)}$ is the same for both modalities. The difference between the two modalities' gradients therefore comes from the second factor, the value of $\varphi$. The dominant modality generally outputs higher logits, i.e. its $\varphi$ and $W$ are larger, so its back-propagated gradient is larger and it converges faster.
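A minimal sketch of this chain-rule argument for one sample: the gradient with respect to a modality's weight block is the outer product of the shared term (softmax output minus one-hot label) and that modality's feature vector, so it scales with the feature magnitude. The logits and feature values are made-up assumptions.

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def grad_W(logits, label, phi):
    """dL/dW for cross entropy on one sample: (softmax - onehot) outer phi."""
    p = softmax(logits)
    dL_df = [p_c - (1.0 if c == label else 0.0) for c, p_c in enumerate(p)]
    return [[d * x for x in phi] for d in dL_df]

def frobenius(M):
    """Frobenius norm of a matrix given as a list of rows."""
    return math.sqrt(sum(v * v for row in M for v in row))

logits = [1.2, -0.3]   # fused logits f(x_i), hypothetical values
label = 0              # ground-truth class
phi_a = [3.0, -2.0]    # larger-magnitude features (dominant modality)
phi_v = [0.3, -0.2]    # smaller-magnitude features (auxiliary modality)

g_a = grad_W(logits, label, phi_a)   # gradient w.r.t. W^a
g_v = grad_W(logits, label, phi_v)   # gradient w.r.t. W^v

print(frobenius(g_a), frobenius(g_v))
```

Because the shared term is identical in both calls, the ratio of the two gradient norms equals the ratio of the feature norms.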
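As a result, the dominant modality may finish training first and drive the loss down while the auxiliary modality is still under-trained. Exactly why the auxiliary modality cannot then be trained well remains to be explored. One plausible factor is that as the loss falls, the shared term $\frac{\partial L}{\partial f(x_i)}$ (softmax output minus the one-hot label) shrinks toward zero, which shrinks the auxiliary modality's gradient as well.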
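To slow down the training of the dominant modality, the paper multiplies the gradient by an attenuation coefficient, which reduces the gradient back-propagated to the dominant modality. This is equivalent to lowering the learning rate of the dominant modality alone:
$$\theta^u_{t+1} = \theta^u_t - \eta\,k^u_t\,\tilde{g}(\theta^u_t), \quad u \in \{a, v\}$$
where $\eta$ is the learning rate, $\tilde{g}$ is the mini-batch gradient, and $k^u_t$ is the attenuation coefficient of modality $u$.
The coefficient is determined by the ratio of the two modalities' softmax scores, each computed from the logits that modality outputs on its own.
The modality whose ratio is greater than 1 (the dominant modality) gets an attenuation coefficient $k \in (0, 1)$; the auxiliary modality gets 1 (unchanged).
Since the coefficient multiplies the learning rate, this is equivalent to reducing the learning rate.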
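The coefficient computation can be sketched as follows. The text above only states that the dominant modality gets some $k$ in $(0, 1)$; the `1 - tanh(alpha * rho)` form, the hyperparameter `alpha`, the function names, and the per-modality logits below are assumptions for illustration.

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def gt_score(logits, label):
    """Softmax score that one modality assigns to the ground-truth class."""
    return softmax(logits)[label]

def modulation_coeffs(audio_logits, visual_logits, labels, alpha=0.5):
    """Return (k_a, k_v) for one batch; alpha is an assumed hyperparameter."""
    s_a = sum(gt_score(z, y) for z, y in zip(audio_logits, labels))
    s_v = sum(gt_score(z, y) for z, y in zip(visual_logits, labels))
    rho_a = s_a / s_v   # > 1 means audio is dominant on this batch
    rho_v = s_v / s_a   # > 1 means visual is dominant on this batch
    # Assumed attenuation form: 1 - tanh(alpha * rho) for the dominant side.
    k_a = 1.0 - math.tanh(alpha * rho_a) if rho_a > 1.0 else 1.0
    k_v = 1.0 - math.tanh(alpha * rho_v) if rho_v > 1.0 else 1.0
    return k_a, k_v

# Hypothetical per-modality logits for a batch of two samples.
audio_logits = [[2.0, 0.0], [0.0, 3.0]]
visual_logits = [[0.2, 0.0], [0.0, 0.1]]
labels = [0, 1]

k_a, k_v = modulation_coeffs(audio_logits, visual_logits, labels)

base_lr = 0.01
lr_a, lr_v = base_lr * k_a, base_lr * k_v   # effective learning rates
print(k_a, k_v, lr_a, lr_v)
```

In this batch the audio branch is more confident about the ground-truth class, so only its effective learning rate is reduced.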
In addition, following the SGD update process, the mini-batch gradient can be written as the full gradient plus Gaussian noise:
$$\tilde{g}(\theta_t) = \nabla_\theta L(\theta_t) + \epsilon_t, \quad \epsilon_t \sim \mathcal{N}\big(0, \Sigma^{sgd}(\theta_t)\big)$$
A higher learning rate means a larger covariance of the Gaussian noise in the update, which in turn means stronger generalization. Reducing the learning rate here therefore weakens the generalization ability of the dominant modality. After the gradient is multiplied by the attenuation coefficient, the noise variance shrinks to $k^2$ times the original:
$$k\,\epsilon_t \sim \mathcal{N}\big(0, k^2\,\Sigma^{sgd}(\theta_t)\big)$$
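The $k^2$ scaling is a general property of variance and can be checked with a few lines. This is a one-dimensional sketch with made-up values for `k` and the noise standard deviation.

```python
import random

def variance(xs):
    """Population variance of a list of floats."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

random.seed(0)
k = 0.3                                                  # attenuation coefficient (made up)
noise = [random.gauss(0.0, 2.0) for _ in range(10_000)]  # stand-in for SGD noise
scaled = [k * e for e in noise]                          # noise after modulation

var_before = variance(noise)
var_after = variance(scaled)
ratio = var_after / var_before

print(ratio, k ** 2)
```

The ratio matches `k ** 2` up to floating-point rounding, whatever the original variance is.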
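The paper therefore adds an extra Gaussian noise term $h$ whose variance equals the variance of the samples within the batch:
$$\theta_{t+1} = \theta_t - \eta\,\big(k\,\tilde{g}(\theta_t) + h(\theta_t)\big), \quad h(\theta_t) \sim \mathcal{N}\big(0, \Sigma^{sgd}(\theta_t)\big)$$
where $\tilde{g}$ is the mini-batch gradient, $k$ is the attenuation coefficient, and $\Sigma^{sgd}$ is the covariance of the SGD noise. The resulting noise covariance is larger than the original:
$$k\,\epsilon_t + h(\theta_t) \sim \mathcal{N}\big(0, (k^2 + 1)\,\Sigma^{sgd}(\theta_t)\big)$$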
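A small simulation of this last step, assuming the added noise is independent of the attenuated noise and has the same variance as the original. Under that assumption the combined variance should be about $(k^2 + 1)$ times the original. The values of `k`, `sigma`, and `n` are made up.

```python
import random

def variance(xs):
    """Population variance of a list of floats."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

random.seed(0)
n = 100_000
sigma = 2.0   # standard deviation of the original SGD noise (made up)
k = 0.3       # attenuation coefficient (made up)

eps = [random.gauss(0.0, sigma) for _ in range(n)]   # original noise
h = [random.gauss(0.0, sigma) for _ in range(n)]     # added noise, independent of eps
combined = [k * e + extra for e, extra in zip(eps, h)]

var_original = variance(eps)
var_combined = variance(combined)
ratio = var_combined / var_original

print(ratio, k ** 2 + 1)
```

The empirical ratio lands close to `k ** 2 + 1`, which is above 1 for any `k`, so the noise ends up larger than before modulation even though the gradient itself is attenuated.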