Improving Multimodal Accuracy Through Modality Pre-training and Attention
2022-07-06 22:37:00 【Rainylt】
paper:
The authors find that the different modalities of a multimodal model converge at inconsistent speeds. They therefore pre-train each modality separately, then use attention (not self-attention) to obtain a weight for each modality, multiply each modality's feature by its weight, and pass the result through concat -> FC -> logits.
First, the attention. It is not the Q*K mechanism of self-attention. Instead, the features of the three modalities are concatenated directly and passed through an FC layer to obtain the weights:
H holds the features of the three modalities (v, a, t) and has shape (3, m). The output is one weight per modality.
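To make the concat -> FC -> weights step concrete, here is a minimal NumPy sketch. It is not the authors' code: the function name, the parameter names (W_att, b_att, W_cls, b_cls), the random inputs, and the use of a softmax to normalize the weights are all assumptions made for illustration, since the notes only say that an FC layer produces the weights.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def fuse_and_classify(H, W_att, b_att, W_cls, b_cls):
    """H: (3, m) features for the modalities (v, a, t).

    Returns the three modality weights and the class logits.
    All parameter names are hypothetical.
    """
    flat = H.reshape(-1)                        # concat the three features -> (3*m,)
    weights = softmax(W_att @ flat + b_att)     # FC -> one weight per modality (softmax is an assumption)
    fused = (weights[:, None] * H).reshape(-1)  # scale each modality by its weight, then concat
    logits = W_cls @ fused + b_cls              # final FC -> logits
    return weights, logits

rng = np.random.default_rng(0)
m, num_classes = 4, 6                           # toy sizes, chosen arbitrarily
H = rng.normal(size=(3, m))
W_att, b_att = rng.normal(size=(3, 3 * m)), np.zeros(3)
W_cls, b_cls = rng.normal(size=(num_classes, 3 * m)), np.zeros(num_classes)

weights, logits = fuse_and_classify(H, W_att, b_att, W_cls, b_cls)
print(weights.shape, logits.shape)
```

Unlike self-attention, there are no query/key products here: the weights come from a single linear map over the concatenated features, so each weight can depend on all three modalities at once.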
According to the authors' observation, when a multimodal model is trained directly, the loss of each modality decreases at a different speed (that is, the convergence rates differ):
The three plots correspond to different datasets. The second and third datasets are slightly better; on the first dataset, text converges too fast.
Now look at the weights of the different modalities:
On the first dataset, text takes most of the weight before pre-training. This may be because text is more important, or because its features are of better quality. After pre-training, video catches up, which suggests that the video features were simply not trained well before.
Which of the three modalities matters most depends on the case:
Here all three modalities can express fear.
Here text and audio can express surprise, but the image cannot; the facial expression is fairly flat (it is hard to tell).
This example is clearer: although he is apologizing, he is actually laughing, so the emotion should be happy, and audio should be the dominant modality here.
The attention weights proposed in the paper correspond to these importance judgments: