Paper reading notes: Sensor-Augmented Egocentric-Video Captioning with Dynamic Modal Attention
2022-07-07 05:34:00 【hei_ hei_ hei_】
Sensor-Augmented Egocentric-Video Captioning with Dynamic Modal Attention
Summary
- Published: ACM MM 2021
- Code: MMAC
- Idea: This paper proposes a new video captioning task, egocentric video captioning, which describes video shot from a first-person rather than a third-person viewpoint and so supports closer, finer-grained description. To mitigate problems such as motion blur and occlusion caused by the recording device and other factors, sensor data is used as an auxiliary input for captioning.
The network has two main modules. The AMMT module merges the visual feature $h_V$ and the sensor feature $h_S$ into a merged feature $h_{V+S}$. All three features ($h_V$, $h_S$, $h_{V+S}$) are then fed into the DMA module, which learns to attend to them selectively. The attended feature is passed to a GRU, which generates the caption word by word.
Detailed design
1. Feature extraction
- Visual feature $h_V$: VGG16
- Sensor feature $h_S$: LSTM (the sensor data is sequential)
2. Asymmetric Multi-modal Transformation(AMMT)
In essence, this module performs feature merging.
Source: FiLM: Visual Reasoning with a General Conditioning Layer. The relevant concept is feature-wise linear modulation.
Note: with the initialization $W_c = I$, $b_c = 0$, the module starts out as plain concatenation. As training proceeds, it learns how to merge the two features.
Note that this module outputs three kinds of features:
(1) Visual feature $h_V$
(2) Sensor feature $h_S$
(3) Merged feature $h_{V+S}$
- Why the transformation is asymmetric:
On the one hand, it mitigates overfitting caused by redundant data. On the other hand, sensor data sometimes contains unwanted noise and therefore needs to be adjusted.
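The exact AMMT equation is not reproduced in these notes, so the following is only a minimal sketch of the two ideas mentioned above: FiLM-style feature-wise modulation, and an affine merge that reduces to plain concatenation when $W_c = I$ and $b_c = 0$. Modulating only the sensor feature, the helper names (`film`, `identity`, `affine`, `merge`), and passing `gamma` and `beta` in directly (in FiLM they are predicted from a conditioning input) are all assumptions made for illustration, not the paper's implementation.

```python
# Hypothetical sketch: FiLM-style modulation followed by an affine merge.
# Pure Python lists stand in for tensors; every name here is illustrative.

def film(x, gamma, beta):
    """Feature-wise linear modulation: scale and shift each feature of x."""
    return [g * xi + b for xi, g, b in zip(x, gamma, beta)]

def identity(n):
    """n x n identity matrix as nested lists."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def affine(W, x, b):
    """Compute W @ x + b for a nested-list matrix W and list vectors x, b."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]

def merge(h_v, h_s, gamma, beta, W_c, b_c):
    """Assumed asymmetric merge: modulate only the sensor feature,
    concatenate it with the visual feature, then apply W_c and b_c."""
    h_s_mod = film(h_s, gamma, beta)
    concat = h_v + h_s_mod  # list concatenation
    return affine(W_c, concat, b_c)

h_v = [0.5, -1.0]        # toy visual feature
h_s = [2.0, 3.0, 4.0]    # toy sensor feature
n = len(h_v) + len(h_s)

# Neutral modulation (gamma = 1, beta = 0) and identity initialization.
h_vs = merge(h_v, h_s, gamma=[1.0] * 3, beta=[0.0] * 3,
             W_c=identity(n), b_c=[0.0] * n)
print(h_vs)  # same values as h_v + h_s, i.e. plain concatenation
```

With these initial values the merged feature is just the two inputs side by side. Any learned departure of `W_c`, `b_c`, `gamma`, or `beta` from those values changes how the two modalities are mixed.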
3. Dynamic Modal Attention (DMA)
This module dynamically selects which of the three features to attend to.
Gumbel-Softmax is used here.
Note: the reason for keeping all three features is that in many cases it is desirable to use only a single modality (for example, when the sensor data contains unwanted noise).
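To make the selection step concrete, here is a minimal sketch of Gumbel-Softmax sampling over three candidate features. Gumbel-Softmax adds Gumbel(0, 1) noise to a set of scores and applies a temperature-scaled softmax, giving a differentiable approximation of picking one option. The scores, the temperature `tau`, the toy feature vectors, the assumption that all three features share one dimension, and the helper names are hypothetical. How the paper computes the scores is not covered in these notes.

```python
import math
import random

def gumbel_softmax(logits, tau=1.0, rng=random):
    """Draw one relaxed one-hot sample: softmax((logits + Gumbel noise) / tau)."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise by inverse transform: g = -log(-log(u)), u ~ U(0, 1).
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep both logs finite
        noisy.append((logit - math.log(-math.log(u))) / tau)
    m = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_sum(weights, features):
    """Combine equally sized feature vectors using the sampled weights."""
    return [sum(w * f[i] for w, f in zip(weights, features))
            for i in range(len(features[0]))]

rng = random.Random(0)
# Hypothetical scores for the three candidates: h_V, h_S, h_{V+S}.
logits = [0.2, -1.0, 1.5]
# Toy feature vectors, assumed to share one dimension.
features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

weights = gumbel_softmax(logits, tau=0.5, rng=rng)
attended = weighted_sum(weights, features)
print(weights, attended)
```

A lower `tau` pushes the weights toward a one-hot choice, which corresponds to relying on a single modality. A higher `tau` gives a softer mixture of the three features.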