Improving Multimodal Accuracy Through Modality Pre-training and Attention
2022-07-06 22:37:00 【Rainylt】
paper:
The authors find that the different modalities of a multimodal model converge at inconsistent speeds, so they pre-train each modality separately. They then use attention (not self-attention) to obtain a weight for each modality, multiply each modality's features by its weight, and pass the result through concat -> FC -> logits.
First, the attention. This is not the Q*K mechanism of self-attention. Instead, the features of the three modalities are concatenated directly and passed through an FC layer to produce the weights:
H holds the features of the three modalities (v, a, t) and has shape (3, m). The output is one weight per modality.
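The weighting and fusion steps described above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' code: the function names (`modality_attention`, `fuse`, `init_linear`), the softmax normalisation of the weights, the random initialisation, and the sizes `m = 4` and `num_classes = 6` are all assumptions made for the example. The note itself only says that the concatenated features go through an FC layer to produce three weights, and that the weighted features then go through concat -> FC -> logits.

```python
import math
import random


def linear(x, weight, bias):
    """Fully connected layer: weight @ x + bias for a 1-D input x."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def softmax(scores):
    """Numerically stable softmax (an assumed normalisation, not from the note)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def init_linear(rng, n_out, n_in):
    """Hypothetical helper: small random weights and zero bias."""
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weight, [0.0] * n_out


def modality_attention(H, attn_w, attn_b):
    """H: three feature vectors (v, a, t), each of length m. Returns 3 weights."""
    flat = [x for feat in H for x in feat]        # concat -> length 3 * m
    return softmax(linear(flat, attn_w, attn_b))  # FC -> one weight per modality


def fuse(H, weights, cls_w, cls_b):
    """Scale each modality by its weight, concat, then FC -> logits."""
    scaled = [w * x for w, feat in zip(weights, H) for x in feat]
    return linear(scaled, cls_w, cls_b)


rng = random.Random(0)
m, num_classes = 4, 6                             # illustrative sizes only
H = [[rng.gauss(0, 1) for _ in range(m)] for _ in range(3)]
attn_w, attn_b = init_linear(rng, 3, 3 * m)
cls_w, cls_b = init_linear(rng, num_classes, 3 * m)

weights = modality_attention(H, attn_w, attn_b)
logits = fuse(H, weights, cls_w, cls_b)
```

With the assumed softmax, the three weights are positive and sum to 1, so they can be read directly as the relative importance of video, audio, and text for a given sample.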
According to the authors' observation, when a multimodal model is trained directly, the loss of each modality falls at a different rate (that is, the modalities converge at different speeds):
The three plots correspond to three different datasets. The second and third datasets are slightly better balanced; on the first dataset, text converges too fast.
Now look at the weights of the different modalities:
On the first dataset, before pre-training, text accounts for most of the weight. This may be because text is more important, or because its features are of higher quality. After pre-training, video catches up, which suggests that the video features were simply not well trained before.
Which of the three modalities matters most depends on the case:
Here all three modalities can express fear.
Here text and audio can express surprise, but the image cannot: the facial expression is relatively flat (it is hard to tell).
This example is clearer. Although the speaker is apologizing, he is actually laughing, so the emotion should be happiness and audio should dominate here.
The attention weights proposed in the paper correspond to these importance judgments.