Paper reading [Sensor-Augmented Egocentric-Video Captioning with Dynamic Modal Attention]
2022-07-06 23:35:00 【hei_hei_hei_】
Sensor-Augmented Egocentric-Video Captioning with Dynamic Modal Attention
Overview
- Published: ACM MM 2021
- Code: MMAC
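- Idea: This paper proposes a new video captioning task: captioning egocentric (first-person) video, which captures activity at closer range than third-person video. To mitigate the motion blur, occlusion, and similar problems that the recording setup can cause, sensor data is used as an auxiliary input for captioning.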
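The network has two main modules. The AMMT module merges the visual feature $h_V$ and the sensor feature $h_S$ into a merged feature $h_{V+S}$. The three features ($h_V$, $h_S$, $h_{V+S}$) are then fed into the DMA module, which learns to attend to them selectively. The result is passed to a GRU for word generation.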
Detailed design
1. Feature extraction
- Visual feature $h_V$: VGG16
- Sensor feature $h_S$: LSTM (temporal)
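2. Asymmetric Multi-modal Transformation (AMMT)
In essence, this is feature fusion.
Source: FiLM: Visual Reasoning with a General Conditioning Layer; for background, see feature-wise linear modulation.
PS: the parameters are initialized as $W_c = I$, $b_c = 0$, so the module starts out as plain concatenation and learns a merged representation of the two modalities as training progresses.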
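Note that the module outputs three features:
(1) the visual feature $h_V$
(2) the sensor feature $h_S$
(3) the merged feature $h_{V+S}$
- Why the transformation is asymmetric
On the one hand, it mitigates the overfitting that redundant data can cause; on the other hand, sensor data sometimes contains unwanted noise, so it needs to be modulated.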
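The fusion step can be illustrated with a minimal sketch in plain Python. It is not the paper's implementation: the notes only state that AMMT builds on FiLM and that $W_c = I$, $b_c = 0$ makes it start as concatenation. The sketch therefore assumes that the sensor feature alone is modulated FiLM-style (`gamma * x + beta`) and that the merged feature is an affine map of the concatenation. The function names, the way `gamma` and `beta` are supplied, and the choice to return the unmodulated sensor feature are all hypothetical.

```python
def matvec(W, x):
    """Multiply a matrix W (list of rows) by a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def identity(n):
    """n x n identity matrix, used here for the W_c = I initialization."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def film(x, gamma, beta):
    """FiLM: feature-wise affine modulation, gamma * x + beta per channel."""
    return [g * xi + b for g, xi, b in zip(gamma, x, beta)]


def ammt_sketch(h_v, h_s, gamma, beta, W_c, b_c):
    """Hypothetical AMMT-like fusion; returns (h_V, h_S, h_{V+S}).

    Assumption: only the sensor branch is modulated (the asymmetric part).
    In FiLM, gamma and beta are predicted by a small network from a
    conditioning input; here they are passed in directly.
    """
    h_s_mod = film(h_s, gamma, beta)
    concat = h_v + h_s_mod  # list concatenation [h_V ; h_S']
    h_vs = [y + b for y, b in zip(matvec(W_c, concat), b_c)]
    return h_v, h_s, h_vs


h_v = [0.5, -1.0]
h_s = [2.0, 0.25, 1.0]
n = len(h_v) + len(h_s)

# Identity initialization with neutral modulation: the merged feature
# equals the plain concatenation of h_v and h_s.
_, _, merged_init = ammt_sketch(h_v, h_s, [1.0] * 3, [0.0] * 3,
                                identity(n), [0.0] * n)

# Setting gamma to 0 on one sensor channel suppresses it, which is one way
# a learned modulation could damp a noisy channel.
_, _, merged_gated = ammt_sketch(h_v, h_s, [0.0, 1.0, 1.0], [0.0] * 3,
                                 identity(n), [0.0] * n)
print(merged_init)
print(merged_gated)
```

With the identity initialization the first call reproduces the concatenation exactly, which matches the initialization note; anything the module learns afterwards is a departure from that baseline.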
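3. Dynamic Modal Attention (DMA)
Dynamically selects among the three features via attention.
Gumbel Softmax is used here.
PS: why three features are used: in many cases it is preferable to use only a single modality (for example, when the sensor data contains unwanted noise).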
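To make the selection step concrete, here is a minimal Gumbel Softmax sketch in plain Python. The sampling formula itself is standard: add Gumbel noise $g = -\log(-\log u)$ to each logit, divide by a temperature $\tau$, and apply softmax. Everything else is assumed for illustration: the names `gumbel_softmax` and `dma_sketch` are hypothetical, the logits are passed in by hand rather than produced by a learned attention module, and the three features are taken to be already projected to a common dimension.

```python
import math
import random


def gumbel_softmax(logits, tau, rng):
    """Draw one relaxed one-hot sample over the logits at temperature tau."""
    noisy = []
    for logit in logits:
        # Clamp u away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        g = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        noisy.append((logit + g) / tau)
    m = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in noisy]
    z = sum(exps)
    return [e / z for e in exps]


def dma_sketch(features, logits, tau, rng):
    """Weight same-sized feature vectors by a Gumbel Softmax sample."""
    weights = gumbel_softmax(logits, tau, rng)
    dim = len(features[0])
    mixed = [sum(w * f[i] for w, f in zip(weights, features))
             for i in range(dim)]
    return weights, mixed


# Hypothetical features for h_V, h_S, h_{V+S}, all of dimension 2.
features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
logits = [2.0, -1.0, 0.5]  # made-up modality scores

weights, mixed = dma_sketch(features, logits, tau=0.5, rng=random.Random(0))
print(weights, mixed)
```

Lower temperatures push the weights toward a one-hot choice of a single modality, while higher temperatures spread them more evenly; the relaxation keeps the choice differentiable with respect to the logits during training.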