Paper reading [Sensor-Augmented Egocentric-Video Captioning with Dynamic Modal Attention]
2022-07-07 05:34:00 【hei_ hei_ hei_】
Sensor-Augmented Egocentric-Video Captioning with Dynamic Modal Attention
Summary
- Venue: ACM MM 2021
- Code: MMAC
- Idea: This paper proposes a new video captioning task, egocentric video captioning (for example, from a first-person rather than a third-person perspective), which supports closer-range visual description. In addition, to mitigate motion blur, occlusion, and similar problems caused by the capture device and other factors, sensor data is used as an auxiliary input for captioning.
Network design: there are two main modules. The AMMT module merges the visual features $h_V$ and the sensor features $h_S$ into merged features $h_{V+S}$. The three features ($h_V$, $h_S$, $h_{V+S}$) are then fed to the DMA module, which learns to attend to them selectively. The result is finally passed to a GRU to generate words.
Detailed design
1. Feature extraction
- Visual features $h_V$: VGG16
- Sensor features $h_S$: LSTM (sequential data)
2. Asymmetric Multi-modal Transformation(AMMT)
In essence, it is feature merging
Source: FiLM: Visual Reasoning with a General Conditioning Layer. The relevant background is feature-wise linear modulation.
PS: With the initialization $W_c = I$, $b_c = 0$, the module starts out as plain concatenation; as training proceeds, it learns how to merge the two features.
Note that the module outputs three kinds of features:
(1) Visual features $h_V$
(2) Sensor features $h_S$
(3) Merged features $h_{V+S}$
- Why the transformation is asymmetric
On the one hand, it alleviates the overfitting caused by data redundancy; on the other hand, sensor data sometimes contains unwanted noise, so it needs to be adjusted.
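As a rough illustration of the idea behind AMMT, here is a minimal pure-Python sketch of feature-wise linear modulation followed by a merge layer. It is not the paper's exact formulation: the function names (`film`, `linear`, `fuse`), the choice to modulate only the sensor features, and passing `gamma` and `beta` in directly (in FiLM they are predicted from a conditioning input) are all assumptions made for illustration. What it does show is the initialization note above: with $W_c = I$ and $b_c = 0$, and a neutral modulation, the merged output is just the concatenation of the two features.

```python
def film(features, gamma, beta):
    """Feature-wise linear modulation: scale and shift each feature element."""
    return [g * x + b for x, g, b in zip(features, gamma, beta)]


def linear(x, weight, bias):
    """Plain affine layer y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def identity(n):
    """n x n identity matrix as nested lists."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def fuse(h_v, h_s, gamma, beta, w_c, b_c):
    """Hypothetical asymmetric merge (an assumption, not the paper's equation):
    only the sensor features are modulated, then concatenated with the
    visual features and passed through an affine layer (W_c, b_c)."""
    modulated_sensor = film(h_s, gamma, beta)
    return linear(h_v + modulated_sensor, w_c, b_c)


# Toy features (made-up numbers, for illustration only).
h_v = [0.5, -1.0, 2.0]
h_s = [1.0, 3.0]

# Neutral modulation plus W_c = I, b_c = 0 reduces the merge to concatenation.
h_vs = fuse(h_v, h_s, gamma=[1.0, 1.0], beta=[0.0, 0.0],
            w_c=identity(5), b_c=[0.0] * 5)
print(h_vs)
```

Training would then move `w_c`, `b_c`, and the modulation parameters away from this starting point, so the merge gradually departs from plain concatenation.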
3. Dynamic Modal Attention (DMA)
Attention over the three features is selected dynamically.
Gumbel-Softmax is used here.
PS: Why three features are used: in many cases it is desirable to rely on a single modality only (for example, when the sensor data contains unwanted noise).
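The selection step can be sketched with a generic Gumbel-Softmax sampler in pure Python. This is only an illustration of the technique, not the paper's DMA module: the helper names (`gumbel_softmax`, `attend`), the example logits, and the assumption that the three features share one dimension (for instance after a projection) are all hypothetical.

```python
import math
import random


def gumbel_softmax(logits, temperature=1.0, rng=random):
    """Draw a relaxed one-hot sample: softmax((logits + Gumbel noise) / T)."""
    noisy = []
    for logit in logits:
        # Clamp the uniform sample so that both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel_noise = -math.log(-math.log(u))
        noisy.append((logit + gumbel_noise) / temperature)
    # Numerically stable softmax.
    largest = max(noisy)
    exps = [math.exp(v - largest) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features, weights):
    """Weighted sum of equally sized feature vectors."""
    dim = len(features[0])
    return [sum(w * f[i] for w, f in zip(weights, features))
            for i in range(dim)]


# Toy stand-ins for h_V, h_S and h_{V+S}, assumed to share one dimension.
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
modal_logits = [0.2, 1.5, -0.3]  # made-up modality scores

weights = gumbel_softmax(modal_logits, temperature=0.5,
                         rng=random.Random(0))
context = attend(features, weights)
```

Lower temperatures push the weights toward a one-hot choice, which matches the motivation above of sometimes relying on a single modality, while the sample stays differentiable with respect to the logits when implemented in an autodiff framework.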