Paper Reading: Discriminative Latent Semantic Graph for Video Captioning
2022-07-01 18:44:00 [hei_hei_hei_]
Discriminative Latent Semantic Graph for Video Captioning
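Overview
- Venue: ACM Multimedia 2021
- Code: D-LSG
- Idea: to strengthen object-level interactions and frame-level information (in practice, building on the commonly used pre-extracted features from a 2D-CNN, a 3D-CNN, and an R-CNN), the authors' work has three main parts. Enhanced Object Proposal: a graph fuses spatio-temporal features into latent objects. Visual Knowledge: these features are aggregated into latent nodes, which are then used to predict semantic words. Sentence Validation: a GAN-style discriminator judges the visual features reconstructed from the generated caption.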
Detailed Design
- Core design: how features are fused/aggregated within the graph
PS: it feels a bit like attention.
1. Multiple Feature Extraction
- Standard processing: as is common, a 2D-CNN extracts appearance (frame-level) features $V^a$, a 3D-CNN extracts motion features $V^m$, and an R-CNN extracts region (object) features $R$.
2. Enhanced Object Proposal
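- Region features are aggregated into the motion features and into the appearance features separately, using a GNN in which each region feature is treated as a node.
Reading the formula loosely: $v^a$ is connected by an edge to every region feature, so it aggregates the features of all regions.
Here $\Psi$ and $\Phi$ are each a linear function followed by a Tanh activation. $\hat v_t^m$ is computed analogously.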
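The aggregation above can be illustrated with a minimal plain-Python sketch. It takes from the notes that $\Psi$ and $\Phi$ are a linear layer followed by tanh and that a frame feature $v_t^a$ is connected to every region. Everything else is an assumption for illustration, not the paper's exact formula: the edge weight to region $r_i$ is taken as the dot product of $\Psi(v_t^a)$ and $\Phi(r_i)$, softmax-normalized over regions, and the weighted sum of $\Phi(r_i)$ is added back to $v_t^a$ as a residual. The helper names (`make_layer`, `linear_tanh`, `enhance_frame_feature`) are hypothetical.

```python
import math
import random

def make_layer(d_in, d_out, rng):
    """Random weights for a hypothetical linear layer (rows = output units)."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(d_in)] for _ in range(d_out)]
    return weights, [0.0] * d_out

def linear_tanh(weights, bias, x):
    """Psi / Phi as described in the notes: a linear map followed by tanh."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def enhance_frame_feature(v_t, regions, psi, phi):
    """Aggregate every region feature into one frame feature v_t.

    Assumptions (not the paper's exact formula): the edge weight to region r_i
    is <Psi(v_t), Phi(r_i)>, softmax-normalised over all regions, and the
    weighted sum of Phi(r_i) is added back to v_t as a residual.
    """
    query = linear_tanh(*psi, v_t)
    keys = [linear_tanh(*phi, r) for r in regions]
    edge_weights = softmax([sum(q * k for q, k in zip(query, key)) for key in keys])
    aggregated = [sum(w * key[j] for w, key in zip(edge_weights, keys))
                  for j in range(len(query))]
    return [v + a for v, a in zip(v_t, aggregated)], edge_weights

# Toy example: one frame (appearance feature v_t^a) and 5 region features, d = 4.
rng = random.Random(0)
d = 4
psi, phi = make_layer(d, d, rng), make_layer(d, d, rng)
v_t = [rng.gauss(0.0, 1.0) for _ in range(d)]
regions = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(5)]
v_hat, edge_weights = enhance_frame_feature(v_t, regions, psi, phi)
print(len(v_hat), round(sum(edge_weights), 6))  # 4 1.0
```

The softmax-weighted sum is what gives this step its attention-like flavor; swapping in the motion feature $v_t^m$ for `v_t` gives the analogous $\hat v_t^m$ under the same assumptions.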
3. Visual Knowledge
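- The key step is to introduce new nodes (latent nodes) into the graph. They aggregate the information above to produce K candidate object visual words and, computed analogously, K motion visual words.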
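A minimal sketch of the latent-node idea, assuming each of the K latent nodes is a learnable vector that pools all enhanced features with scaled dot-product attention (consistent with the "feels like attention" impression above, but not confirmed as the paper's update rule). Using the enhanced appearance features as the source of object words, and the names `latent_node_aggregation` and `object_latent_nodes`, are hypothetical choices for illustration; motion visual words would be produced the same way from the enhanced motion features.

```python
import math
import random

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def latent_node_aggregation(latent_nodes, features):
    """Let each latent node pool all (enhanced) features into one visual word.

    Assumption: scaled dot-product attention with the latent node as the query
    and the features as keys/values; the paper's graph update may differ.
    """
    d = len(features[0])
    words = []
    for node in latent_nodes:
        scores = [sum(a * b for a, b in zip(node, f)) / math.sqrt(d) for f in features]
        attn = softmax(scores)
        words.append([sum(w * f[j] for w, f in zip(attn, features)) for j in range(d)])
    return words

# Toy example: T = 6 enhanced appearance features, K = 3 latent nodes, d = 4.
rng = random.Random(1)
T, K, d = 6, 3, 4
enhanced_appearance = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(T)]
object_latent_nodes = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(K)]  # hypothetical learnable nodes
object_words = latent_node_aggregation(object_latent_nodes, enhanced_appearance)
print(len(object_words), len(object_words[0]))  # 3 4
```

Because each visual word is a convex combination of the input features, K stays fixed regardless of how many frames T the video has, which is what lets a fixed set of latent nodes summarize variable-length input.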
4. Discriminative Language Validation
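- The goal is to make the generated captions carry richer semantic information (semantic concepts). The authors reconstruct $P^o$ and $P^m$ from the generated captions, and a discriminator then distinguishes the reconstructed visual features $\hat P^o, \hat P^m$ from the real features $P^o, P^m$.
- Concretely, the generated caption passes through several 1D-CNN layers with residual connections to produce a sentence feature $S$, and $P^o$ then "aggregates" the features of $S$.
- The generated visual feature $\hat P^o$ and the real visual feature $P^o$ are scored together as a pair, much like computing their similarity.
- Output score of the discriminator (it learns to give generated features low scores and real features high scores)
- Discriminator loss (the second term is a regularization term)
- Generator loss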
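The validation step above can be sketched end to end in plain Python, as a toy under explicit assumptions rather than the paper's implementation. Assumed here: a single kernel-size-3 1D convolution with ReLU and a residual connection stands in for the sentence encoder; each row of $P^o$ attends over $S$ to produce $\hat P^o$; the pair score is a sigmoid of a learnable affine map of the mean row-wise cosine similarity; and the losses are the standard GAN binary cross-entropy terms with a non-saturating generator loss. The regularization term is not specified in these notes, so a hypothetical L2 penalty on the discriminator parameters stands in for it. All function names and parameter values are illustrative.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def conv1d_residual(x, kernel):
    """Sentence-encoder stand-in: kernel-3 1D conv (zero padding) + ReLU + residual.

    x: L word vectors of size d; kernel: 3 x d_out x d_in weights (d_out == d_in == d).
    """
    L, d = len(x), len(x[0])
    out = []
    for t in range(L):
        y = [0.0] * d
        for k, offset in enumerate((-1, 0, 1)):
            s = t + offset
            if 0 <= s < L:
                for o in range(d):
                    y[o] += dot(kernel[k][o], x[s])
        out.append([xi + max(0.0, yi) for xi, yi in zip(x[t], y)])
    return out

def attend(queries, keys):
    """Each query row pools the key rows (also used as values) with softmax attention."""
    d = len(keys[0])
    out = []
    for q in queries:
        attn = softmax([dot(q, k) / math.sqrt(d) for k in keys])
        out.append([sum(a * k[j] for a, k in zip(attn, keys)) for j in range(d)])
    return out

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a) * dot(b, b)) + 1e-8)

def pair_score(x, ref, w, b):
    """Hypothetical discriminator: sigmoid(w * mean row-wise cosine(x, ref) + b)."""
    sim = sum(cosine(xk, rk) for xk, rk in zip(x, ref)) / len(ref)
    return 1.0 / (1.0 + math.exp(-(w * sim + b)))

def discriminator_loss(real_score, fake_score, params, lam=1e-3):
    """GAN BCE: push real scores up and fake scores down.
    The L2 term is a stand-in for the (unspecified) regulariser."""
    eps = 1e-8
    reg = lam * sum(p * p for p in params)
    return -math.log(real_score + eps) - math.log(1.0 - fake_score + eps) + reg

def generator_loss(fake_score):
    """Non-saturating generator loss: make reconstructed features score as real."""
    return -math.log(fake_score + 1e-8)

# Toy example: K = 3 object visual words, a 5-word generated caption, d = 4.
rng = random.Random(2)
K, L, d = 3, 5, 4
P_o = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(K)]          # real P^o (stand-in)
caption_emb = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(L)]  # generated-caption embeddings
kernel = [[[rng.gauss(0.0, 0.1) for _ in range(d)] for _ in range(d)] for _ in range(3)]
S = conv1d_residual(caption_emb, kernel)  # sentence feature S
P_o_hat = attend(P_o, S)                  # P^o "aggregates" S -> reconstructed P^o hat
w, b = 5.0, -2.0                          # hypothetical discriminator parameters
real = pair_score(P_o, P_o, w, b)
fake = pair_score(P_o_hat, P_o, w, b)
print(discriminator_loss(real, fake, [w, b]), generator_loss(fake))
```

In training, the discriminator parameters would be updated to lower `discriminator_loss` while the caption generator is updated to lower `generator_loss`; a real model would use tensors with automatic differentiation instead of Python lists.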
Code