Paper reading: Sensor-Augmented Egocentric-Video Captioning with Dynamic Modal Attention
2022-07-07 05:34:00 【hei_ hei_ hei_】
Sensor-Augmented Egocentric-Video Captioning with Dynamic Modal Attention
Summary
- Published: ACM MM 2021
- Code: MMAC
- Idea: This paper proposes a new video captioning task, egocentric video captioning (first-person rather than third-person view), which allows for closer, more detailed visual description. To mitigate problems such as motion blur and occlusion caused by the recording device and other factors, sensor data is used as an auxiliary input for captioning.
In the network design there are two main modules. The AMMT module merges the visual features $h_V$ and the sensor features $h_S$ into the merged features $h_{V+S}$. All three features ($h_V$, $h_S$, $h_{V+S}$) are then fed into the DMA module, which learns to attend to them selectively. The result is passed to a GRU to generate the words of the caption.
Detailed design
1. Feature extraction
- Visual features $h_V$: VGG16
- Sensor features $h_S$: LSTM (sequential data)
2. Asymmetric Multi-modal Transformation(AMMT)
In essence, this is feature merging.
Source: FiLM: Visual Reasoning with a General Conditioning Layer; for background, see feature-wise linear modulation.
P.S. With the initialization $W_c = I$, $b_c = 0$, the module starts out as plain concatenation; as training proceeds, it learns how to merge the two features.
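The effect of this initialization can be checked with a small sketch. These notes do not reproduce the exact AMMT formula, so the code below only assumes that the merged feature is an affine map $W_c x + b_c$ applied to the concatenation of the two features. The helper names (`identity`, `affine`, `film`) and the toy vectors are hypothetical. The `film` function shows the generic feature-wise linear modulation $\gamma \odot x + \beta$ from the FiLM paper; how the authors wire it into AMMT is not covered here.

```python
def identity(n):
    """Return an n x n identity matrix as nested lists."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def affine(W, b, x):
    """Compute W @ x + b for a list-of-lists matrix W and vectors b, x."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def film(x, gamma, beta):
    """Generic FiLM: scale and shift each feature, gamma * x + beta."""
    return [g * xi + bt for g, xi, bt in zip(gamma, x, beta)]


# Toy features (hypothetical values, 2 dimensions each).
h_v = [0.5, -1.0]   # visual feature
h_s = [2.0, 0.25]   # sensor feature

concat = h_v + h_s                 # plain concatenation [h_V ; h_S]
W_c = identity(len(concat))        # W_c = I
b_c = [0.0] * len(concat)          # b_c = 0

# Assumed merge: an affine map over the concatenated features.
h_vs = affine(W_c, b_c, concat)

# With gamma = 1 and beta = 0, FiLM also leaves its input unchanged.
h_s_unmodulated = film(h_s, [1.0, 1.0], [0.0, 0.0])
```

At initialization `h_vs` equals `concat`, so training starts from simple concatenation and only departs from it once `W_c` and `b_c` are updated.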
Note that the module outputs three kinds of features:
(1) Visual features $h_V$
(2) Sensor features $h_S$
(3) Merged features $h_{V+S}$
- Why the transformation is asymmetric
On the one hand, it mitigates the overfitting caused by redundant data. On the other hand, sensor data sometimes contains unwanted noise, so it needs to be adjusted.
3. Dynamic Modal Attention (DMA)
Attention is applied dynamically over the three features.


Gumbel-Softmax is used here.
P.S. Why three features are used: in many cases it is preferable to rely on a single modality only (for example, when the sensor data contains unwanted noise).
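As a rough illustration of how Gumbel-Softmax can produce near-discrete weights over the three features, here is a minimal standard-library sketch. It is not the authors' implementation: the logits, the temperature `tau`, the toy feature vectors, and the function name `gumbel_softmax` are all assumptions made for the example, and the notes do not say how DMA computes its logits.

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Sample soft selection weights: softmax((logits + Gumbel noise) / tau)."""
    noisy = []
    for logit in logits:
        # Clamp u away from 0 and 1 so both logarithms are well defined.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        g = -math.log(-math.log(u))      # Gumbel(0, 1) sample
        noisy.append((logit + g) / tau)
    m = max(noisy)                       # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)                   # seeded for repeatability

# Hypothetical scores for h_V, h_S and h_{V+S}.
logits = [0.2, -1.0, 0.5]
weights = gumbel_softmax(logits, tau=0.5, rng=rng)

# Toy 2-dimensional features standing in for h_V, h_S, h_{V+S}.
features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
attended = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(2)]
```

A lower `tau` pushes the weights closer to a one-hot choice, which matches the motivation above: the model can effectively fall back to a single modality when another one is noisy.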