Improving Multimodal Accuracy Through Modality Pre-training and Attention
2022-07-06 22:37:00 【Rainylt】
paper:
The authors found that the different modalities of a multimodal model converge at different speeds. They therefore pre-train each modality separately. They then use attention (not self-attention) to obtain a weight for each modality: each modality's feature is multiplied by its weight, and the weighted features go through concat -> FC -> logits.
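The fusion step just described (weights × features -> concat -> FC -> logits) can be sketched in plain Python so it runs without a deep-learning framework. This is a minimal sketch, not the authors' code. It assumes the per-modality weights have already been computed and that the FC layer is a plain affine map y = Wx + b. The function names `weighted_concat`, `linear` and `fusion_logits` are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def weighted_concat(features: Sequence[Sequence[float]], weights: Sequence[float]) -> Vector:
    """Scale each modality's feature vector by its weight, then concatenate them."""
    if len(features) != len(weights):
        raise ValueError("need exactly one weight per modality")
    fused: Vector = []
    for feat, w in zip(features, weights):
        fused.extend(w * x for x in feat)
    return fused


def linear(x: Sequence[float], weight: Matrix, bias: Sequence[float]) -> Vector:
    """Fully connected layer y = W x + b, with W stored as a list of rows."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def fusion_logits(features, weights, fc_weight, fc_bias) -> Vector:
    """Hypothetical fusion head: weights * features -> concat -> FC -> logits."""
    return linear(weighted_concat(features, weights), fc_weight, fc_bias)


# Toy example: three modalities (v, a, t) with 2-dim features, 2 output classes.
feats = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(fusion_logits(feats, [0.5, 0.25, 0.25],
                    [[1, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1]], [0.0, 0.5]))
```

In a real model the features would come from the separately pre-trained unimodal encoders, and the FC weights would be learned rather than hand-set as in this toy example.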
First, the attention. It is not the Q*K mechanism used in self-attention. Instead, the features of the three modalities are simply concatenated and passed through an FC layer to obtain the weights:
H holds the features of the three modalities (v, a, t) and has shape (3, m). The output is one weight per modality.
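The weighting step above can be sketched as follows: flatten H (shape (3, m)) into one vector of length 3m, apply an FC layer with 3 outputs, and normalise the 3 scores into weights. The post only says "concat -> FC -> weights". The softmax normalisation is an assumption made here so the weights sum to 1. The names `modality_attention` and `dominant_modality` are hypothetical.

```python
import math
from typing import List, Sequence


def modality_attention(H: Sequence[Sequence[float]],
                       W: Sequence[Sequence[float]],
                       b: Sequence[float]) -> List[float]:
    """H: (3, m) features of (v, a, t); W: (3, 3*m); b: (3,). Returns 3 weights."""
    flat = [x for row in H for x in row]  # concat the three modality features
    if any(len(row) != len(flat) for row in W):
        raise ValueError("each row of W must have length 3*m")
    # FC layer: one score per modality
    scores = [sum(w * x for w, x in zip(row, flat)) + bi for row, bi in zip(W, b)]
    # Softmax normalisation (assumption; subtract the max for numerical stability)
    mx = max(scores)
    exps = [math.exp(s - mx) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dominant_modality(weights: Sequence[float],
                      names: Sequence[str] = ("video", "audio", "text")) -> str:
    """Name of the modality that receives the largest weight."""
    return names[max(range(len(weights)), key=weights.__getitem__)]


H = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # toy features, m = 2
W = [[0.0] * 6 for _ in range(3)]          # toy FC weights
w = modality_attention(H, W, [0.0, 0.0, 10.0])
print(w, dominant_modality(w))
```

Unlike Q*K self-attention, no pairwise similarities are computed here. A single learned projection of the concatenated features scores each modality directly.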
According to the authors' observations, when a multimodal model is trained directly, the loss of each modality decreases at a different speed (i.e., the modalities have different convergence rates):
The three plots correspond to three different datasets. The gap is smaller on the second and third datasets. On the first dataset, text converges far faster than the other modalities.
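One simple way to quantify "convergence speed" when comparing per-modality loss curves like these is to count the epochs each curve needs to fall to a given fraction of its initial loss. This is an illustrative metric chosen here, not one the post or paper defines. The helper names are hypothetical, and the loss values in the example are made-up toy numbers, not results from the paper.

```python
from typing import Dict, Optional, Sequence


def epochs_to_converge(losses: Sequence[float], fraction: float = 0.5) -> Optional[int]:
    """First epoch index at which loss <= fraction * initial loss, else None."""
    if not losses:
        return None
    target = fraction * losses[0]
    for epoch, loss in enumerate(losses):
        if loss <= target:
            return epoch
    return None


def convergence_report(curves: Dict[str, Sequence[float]],
                       fraction: float = 0.5) -> Dict[str, Optional[int]]:
    """Apply epochs_to_converge to every modality's loss curve."""
    return {name: epochs_to_converge(c, fraction) for name, c in curves.items()}


# Toy curves only (not the paper's data): text drops fastest.
toy = {
    "text": [1.0, 0.4, 0.2],
    "video": [1.0, 0.9, 0.7, 0.5],
    "audio": [1.0, 0.8, 0.6],
}
print(convergence_report(toy))
```

A modality that reaches the threshold much earlier than the others, like text in this toy example, is the kind of imbalance the separate pre-training step is meant to address.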
Now look at the weights assigned to the different modalities:
On the first dataset, text receives most of the weight before pre-training. This may be because text is more important, or because its features are of better quality. After pre-training, video catches up, which suggests that video's features had simply been under-trained before.
Which of the three modalities matters most depends on the case:
Here, all three modalities can express fear.
Here, text and audio can express surprise, but the image cannot: the facial expression is fairly flat (hard to read).
This example is the most telling. Although the speaker is apologizing, they are actually laughing, so the emotion should be happy, and audio should dominate.
The attention weights proposed in this paper line up with these judgments of importance: