Paper reading: Discriminative Latent Semantic Graph for Video Captioning
2022-07-01 18:44:00 【hei_hei_hei_】
Discriminative Latent Semantic Graph for Video Captioning
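Overview
- Published at: ACM MultiMedia 2021
- Code: D-LSG
- Idea: to strengthen object-level interactions and frame-level information (in practice, to make better use of the commonly used pre-extracted features: 2D-CNN, 3D-CNN, R-CNN), the authors split the work into three parts. Enhanced Object Proposal: a graph fuses spatio-temporal features into latent object proposals. Visual Knowledge: the features above are aggregated into latent nodes, which are then used to predict semantic words. Sentence Validation: a GAN-style discriminator judges the visual features reconstructed from the generated sentence.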
Detailed design
- Core design: how features are fused and aggregated (inside the graph)

PS: this feels a bit like attention.
1. Multiple Feature Extraction
- Standard preprocessing: a 2D-CNN extracts the appearance (frame-level) features $V^a$, a 3D-CNN extracts the motion features $V^m$, and an R-CNN extracts the region (object) features $R$.
2. Enhanced Object Proposal
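- Region features are aggregated into the motion features and the appearance features separately. A GNN is used, with every region feature treated as a node.

A literal reading of the formula: $v^a$ is connected by an edge to every region feature, so it aggregates the features of all regions.
Here $\Psi$ and $\Phi$ are each a linear function followed by a Tanh activation. $\hat v_t^m$ is computed in the same way.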
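
The aggregation described above can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the post does not reproduce the paper's formula, so the edge weights here are assumed to be a softmax over dot products between $\Psi(v^a_t)$ and $\Phi(r_i)$, and the enhanced feature is assumed to be the weighted sum of the projected regions. The helper names (`linear_tanh`, `enhance_frame_feature`, `random_layer`) and all dimensions are hypothetical.

```python
import math
import random

def linear_tanh(x, weight, bias):
    """Apply y = tanh(W x + b) to a single vector x (a list of floats)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weight, bias)]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def enhance_frame_feature(frame_feat, region_feats, psi, phi):
    """Aggregate all region features into one frame-level feature.

    Assumed form (not taken from the paper): the frame node has an edge to
    every region node, weighted by softmax(Psi(frame) . Phi(region)).
    """
    query = linear_tanh(frame_feat, *psi)
    keys = [linear_tanh(r, *phi) for r in region_feats]
    weights = softmax([sum(q * k for q, k in zip(query, key)) for key in keys])
    enhanced = [sum(w * key[d] for w, key in zip(weights, keys))
                for d in range(len(query))]
    return enhanced, weights

def random_layer(rng, out_dim, in_dim):
    """Hypothetical toy initialisation of a linear layer (weight, bias)."""
    weight = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
              for _ in range(out_dim)]
    return weight, [0.0] * out_dim

rng = random.Random(0)
frame = [rng.uniform(-1, 1) for _ in range(6)]                       # one v^a_t
regions = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(4)]  # 4 regions
psi = random_layer(rng, 3, 6)
phi = random_layer(rng, 3, 6)
enhanced, weights = enhance_frame_feature(frame, regions, psi, phi)
```

Because every region contributes with a non-zero weight, the enhanced frame feature mixes information from all regions, which matches the "connected to every region feature" reading above. The same function would be called with a motion feature to obtain $\hat v_t^m$.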
3. Visual Knowledge
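- The main step is to add new nodes (latent nodes) to the graph. They aggregate the information above and produce K candidate object visual words and K motion visual words (the two are computed in the same way).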

4. Discriminative Language Validation
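- The goal is to give the generated caption richer semantic content (semantic concepts). The authors reconstruct $P^o$ and $P^m$ from the generated caption, then use a discriminator to tell the reconstructed visual features $\hat P^o, \hat P^m$ apart from the real features $P^o, P^m$.
- In practice, the generated caption is passed through several layers of 1D CNN with residual connections to obtain a sentence feature $S$, and $P^o$ then "aggregates" the features of $S$.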
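
As a rough illustration of the "1D CNN + residual" sentence encoder only, here is a dependency-free sketch. The kernel size, the ReLU activation, the zero padding, and the function names `conv1d_same` and `residual_block` are all assumptions made for this example; the post does not specify them, and the step where $P^o$ aggregates $S$ is not covered.

```python
def conv1d_same(seq, kernels, bias):
    """1D convolution over time with zero padding, keeping the length T.

    seq:     list of T word vectors, each with C channels
    kernels: list of output filters; each filter is a list of K taps,
             and each tap is a list of C weights (K is assumed odd)
    bias:    one bias per output filter
    """
    taps = len(kernels[0])
    pad = taps // 2
    zero = [0.0] * len(seq[0])
    padded = [zero] * pad + seq + [zero] * pad
    out = []
    for t in range(len(seq)):
        window = padded[t:t + taps]
        out.append([
            b + sum(w * x
                    for tap, vec in zip(filt, window)
                    for w, x in zip(tap, vec))
            for filt, b in zip(kernels, bias)
        ])
    return out

def residual_block(seq, kernels, bias):
    """y_t = x_t + ReLU(conv(x)_t); needs as many filters as input channels."""
    conv = conv1d_same(seq, kernels, bias)
    return [[x + max(0.0, c) for x, c in zip(vec, cvec)]
            for vec, cvec in zip(seq, conv)]

# Toy "caption": 4 words with 2-dimensional embeddings (made-up numbers).
sentence = [[0.1, -0.2], [0.3, 0.4], [-0.5, 0.6], [0.7, -0.8]]
# Kernels that copy the centre word, so the block's effect is easy to check.
identity_kernels = [
    [[0.0, 0.0], [1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 1.0], [0.0, 0.0]],
]
features = residual_block(sentence, identity_kernels, bias=[0.0, 0.0])
```

The residual connection keeps the original word embedding in every output position, so stacking several such blocks widens the receptive field without losing the per-word signal.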

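- The generated visual feature $\hat P^o$ and the real visual feature $P^o$ are scored. They are treated as a pair, which is similar to computing their similarity.


- Output score of the discriminator (it learns to give generated features low scores and real features high scores)

- Discriminator loss (the second term is a regularization term)

- Generator loss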
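
The formulas themselves are not reproduced in this post, so the following is only a sketch of one common way to write such a pair of losses. It assumes a Wasserstein-style critic with a gradient-penalty regularizer; that choice, the coefficient `lam`, and the function names are assumptions, not something the post confirms. The gradient norms are passed in directly because this dependency-free sketch has no autograd.

```python
def mean(xs):
    return sum(xs) / len(xs)

def discriminator_loss(real_scores, fake_scores, grad_norms, lam=10.0):
    """Score real features high and generated ones low, plus a regularizer.

    grad_norms: norms of the discriminator's gradient at interpolated
    samples (assumed to be computed elsewhere).
    """
    score_gap = mean(fake_scores) - mean(real_scores)
    penalty = mean([(g - 1.0) ** 2 for g in grad_norms])  # regularization term
    return score_gap + lam * penalty

def generator_loss(fake_scores):
    """The generator is rewarded when its features receive high scores."""
    return -mean(fake_scores)

# Made-up discriminator scores for three real and three generated features.
real = [0.9, 1.1, 1.0]
fake = [-0.2, 0.1, 0.1]
d_loss = discriminator_loss(real, fake, grad_norms=[1.0, 1.0, 1.0])
g_loss = generator_loss(fake)
```

With these toy numbers the discriminator already separates the two sets, so its loss is negative, while the generator loss would drop only if the generated features started to receive higher scores.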

Code