
Improving Multimodal Accuracy Through Modality Pre-training and Attention

2022-07-06 22:37:00 Rainylt

paper:


The paper finds that the different modalities of a multimodal model converge at inconsistent speeds. The authors therefore pre-train each modality separately, then use attention (not self-attention) to obtain a weight for each modality, multiply each modality's features by its weight, and pass the result through concat -> FC -> logits.
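To make the two-stage recipe concrete, here is a minimal PyTorch sketch. The names and sizes (`in_dims`, `feat_dim`, the tiny linear encoders) are my own placeholders, not the paper's actual backbones or datasets; the point is only the order of the stages:

```python
import torch
import torch.nn as nn

# Hypothetical input-feature sizes and encoders; the paper's actual backbones differ.
in_dims = {"video": 512, "audio": 256, "text": 768}
feat_dim, num_classes = 128, 7
encoders = {m: nn.Sequential(nn.Linear(d, feat_dim), nn.ReLU())
            for m, d in in_dims.items()}

# Stage 1: pre-train each modality on its own with a temporary classification head.
for m, enc in encoders.items():
    head = nn.Linear(feat_dim, num_classes)
    opt = torch.optim.Adam(list(enc.parameters()) + list(head.parameters()), lr=1e-3)
    x = torch.randn(8, in_dims[m])                  # stand-in unimodal batch
    y = torch.randint(0, num_classes, (8,))
    loss = nn.functional.cross_entropy(head(enc(x)), y)
    opt.zero_grad()
    loss.backward()
    opt.step()                                      # one illustrative step per modality

# Stage 2: keep the pre-trained encoders, drop the temporary heads, and train the
# attention-based fusion (weights -> weighted concat -> FC -> logits); see the
# fusion sketch further below.
```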

First, about the attention: it is not the Q*K self-attention mechanism. Instead, the features of the three modalities are concatenated directly and passed through an FC layer to produce the weights:
[Figure: modality attention weight computation]
H holds the features of the three modalities (v, a, t), with shape (3, m); the output is a weight for each of the three modalities.
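A minimal PyTorch sketch of this kind of attention, assuming a module name (`ModalityAttentionFusion`) and per-modality feature vectors of size m (`feat_dim`) of my own choosing; the paper's exact layer sizes and normalization may differ:

```python
import torch
import torch.nn as nn

class ModalityAttentionFusion(nn.Module):
    """Non-self attention over three modality features: concat -> FC -> softmax gives
    one weight per modality; the weighted features go through concat -> FC -> logits."""
    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        self.attn_fc = nn.Linear(3 * feat_dim, 3)            # concat(v, a, t) -> 3 weights
        self.cls_fc = nn.Linear(3 * feat_dim, num_classes)   # weighted concat -> logits

    def forward(self, h_v, h_a, h_t):
        # h_v, h_a, h_t: (batch, feat_dim) features from the pre-trained encoders
        h_cat = torch.cat([h_v, h_a, h_t], dim=-1)            # (batch, 3 * feat_dim)
        w = torch.softmax(self.attn_fc(h_cat), dim=-1)        # (batch, 3) modality weights
        weighted = torch.cat([w[:, 0:1] * h_v,
                              w[:, 1:2] * h_a,
                              w[:, 2:3] * h_t], dim=-1)       # re-weighted features
        return self.cls_fc(weighted), w                       # logits and the weights

# Dummy usage
model = ModalityAttentionFusion(feat_dim=128, num_classes=7)
h_v, h_a, h_t = (torch.randn(4, 128) for _ in range(3))
logits, weights = model(h_v, h_a, h_t)
print(logits.shape, weights.shape)  # torch.Size([4, 7]) torch.Size([4, 3])
```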
According to the authors' observation, when a multimodal model is trained directly, the losses of the different modalities descend at inconsistent speeds (convergence rates):
[Figure: per-modality loss curves on three datasets]
The three plots correspond to three different datasets. The second and third datasets look slightly better; on the first dataset, text converges too fast.
Now look at the weights of the different modalities:
[Figure: modality weights before and after pre-training]
You can see that on the first dataset, text accounts for most of the weight before pre-training, perhaps because it really is more important, or perhaps because its features are of better quality. After pre-training, video catches up, which suggests the video features were simply under-trained before.
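One way to reproduce this kind of inspection is to average the attention weights over a validation set. A hedged sketch, reusing the `ModalityAttentionFusion` instance from the example above and dummy feature batches in place of real validation data:

```python
import torch

@torch.no_grad()
def average_modality_weights(model, batches):
    """Average the (v, a, t) attention weights over a set of feature batches."""
    totals, n = torch.zeros(3), 0
    for h_v, h_a, h_t in batches:
        _, w = model(h_v, h_a, h_t)     # w: (batch, 3) weights for (video, audio, text)
        totals += w.sum(dim=0)
        n += w.shape[0]
    return totals / n

# Dummy usage with random features standing in for a validation set
batches = [tuple(torch.randn(4, 128) for _ in range(3)) for _ in range(5)]
print(average_modality_weights(model, batches))
```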


Which of the three modalities matters most depends on the case:
[Figure: example 1]
In this example, all three modalities can express fear.
[Figure: example 2]
Here, text and audio can express surprise, but the image cannot: the facial expression is relatively flat (hard to tell).
[Figure: example 3]
This example is clearer: although the speaker is apologizing, he is actually laughing, so the emotion should be happy, and audio should be the dominant modality here.
The attention weights proposed in this paper correspond well to these differences in modality importance:
[Figure: per-example attention weights]

Original site

Copyright notice
This article was written by [Rainylt]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/187/202207061453416245.html