Paper Reading: Discriminative Latent Semantic Graph for Video Captioning
2022-07-01 18:44:00 【hei_hei_hei_】
Discriminative Latent Semantic Graph for Video Captioning
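Overview
- Venue: ACM Multimedia 2021
- Code: D-LSG
- Idea: To strengthen object-level interactions and frame-level information (in practice, to make better use of the commonly used pre-extracted features: 2D-CNN, 3D-CNN, R-CNN), the authors' work has three main parts. Enhanced Object Proposal: uses a graph to fuse spatio-temporal features into latent objects. Visual Knowledge: aggregates those features into latent nodes, which are then used to predict semantic words. Sentence Validation: uses a GAN to discriminate the reconstructed visual features.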
Detailed design
- Core design: how features are fused/aggregated (within the graph)

P.S. This feels somewhat like attention.
1. Multiple Feature Extraction
- Standard preprocessing: a 2D-CNN typically extracts the appearance (frame-level) features $V^a$, a 3D-CNN extracts the motion features $V^m$, and an R-CNN extracts the region (object) features $R$.
2. Enhanced Object Proposal
- Region features are aggregated into the motion features and into the appearance features separately. A GNN is used, with every region feature treated as a node.

Reading directly off the formula: $v^a$ has an edge to every region feature, so it aggregates the features of all regions.
Here $\Psi$ and $\Phi$ are each a linear function followed by a Tanh activation. $\hat v_t^m$ is computed analogously.
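The formula itself did not survive in these notes, so the following is only a minimal NumPy sketch of the idea described above: every frame node is connected to every region node, and $\Psi$ and $\Phi$ are Linear + Tanh maps. The softmax over dot-product similarity used as edge weights, the helper names `make_proj` and `enhance`, and all dimensions are assumptions for illustration, not the paper's exact definition.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_proj(d_in, d_out):
    """Return a Linear + Tanh map (the form the notes give for Psi and Phi)."""
    W = rng.normal(scale=d_in ** -0.5, size=(d_in, d_out))
    b = np.zeros(d_out)
    return lambda x: np.tanh(x @ W + b)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def enhance(frames, regions, psi, phi):
    """Aggregate all region nodes into each frame node.

    frames:  (T, d_v) appearance or motion features
    regions: (N, d_r) region (object) features
    Edge weights are a softmax over dot-product similarity
    (an assumption of this sketch).
    """
    q = psi(frames)               # (T, d)
    k = phi(regions)              # (N, d)
    weights = softmax(q @ k.T)    # (T, N), each row sums to 1
    return weights @ k, weights   # (T, d) enhanced features, edge weights

# Toy sizes, chosen only for the example.
T, N, d_a, d_r, d = 4, 6, 32, 16, 8
V_a = rng.normal(size=(T, d_a))   # stand-in appearance features
R = rng.normal(size=(N, d_r))     # stand-in region features
psi, phi = make_proj(d_a, d), make_proj(d_r, d)
V_a_hat, A = enhance(V_a, R, psi, phi)
print(V_a_hat.shape, A.shape)
```

The motion branch would reuse `enhance` with $V^m$ in place of $V^a$ and its own projections, which matches the remark that $\hat v_t^m$ is computed analogously.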
3. Visual Knowledge
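- The main step is to introduce new nodes (latent nodes) into the graph. They aggregate the information above to produce K candidate object visual words and K motion visual words (computed analogously).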
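As a rough illustration of latent nodes aggregating the enhanced features, here is a small NumPy sketch. It assumes the K latent nodes are learnable vectors that attend over the enhanced features with scaled dot-product weights; the function name `latent_aggregate`, the random stand-in inputs, and the weighting scheme are hypothetical, since the notes do not reproduce the formula.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def latent_aggregate(latent, feats):
    """Let K latent nodes aggregate M feature nodes.

    latent: (K, d) latent nodes (learned in a real model)
    feats:  (M, d) enhanced object or motion features
    Returns (K, d): one visual word per latent node.
    """
    scores = latent @ feats.T / np.sqrt(feats.shape[1])  # (K, M)
    weights = softmax(scores, axis=1)                    # rows sum to 1
    return weights @ feats

K, M, d = 5, 12, 8
latent_obj = rng.normal(size=(K, d))  # stand-in for learned latent nodes
enhanced = rng.normal(size=(M, d))    # stand-in for enhanced object features
P_o = latent_aggregate(latent_obj, enhanced)
print(P_o.shape)
```

Because each visual word is a convex combination of the input features, it stays within the per-dimension range of those features. The K motion visual words would come from a second set of latent nodes applied to the motion branch.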

4. Discriminative Language Validation
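- The goal is for the generated captions to carry better semantic information (semantic concepts). The authors reconstruct $P^o$ and $P^m$ from the generated captions, then use a discriminator to distinguish the reconstructed visual features $\hat P^o, \hat P^m$ from the real features $P^o, P^m$.
- Concretely, the generated caption is passed through several 1D CNN + residual layers to obtain the sentence feature $S$, and $P^o$ then "aggregates" the features of $S$.

- The generated visual feature $\hat P^o$ and the real visual feature $P^o$ are scored as a pair, similar to computing their similarity.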


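- Output score of the discriminator (it learns to give generated features low scores and real features high scores)

- Discriminator loss (the second term is a regularization term)

- Generator loss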
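The validation steps above can be sketched end to end in NumPy. This is a toy under several stated assumptions, not the authors' implementation: the residual block is `x + ReLU(conv(x))`, the reconstruction uses scaled dot-product weights, `critic` is a hypothetical scorer, and the losses take a Wasserstein-style form with the regularization term left out, because the notes do not give its formula.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def conv1d_same(x, w, b):
    """1D convolution over the word axis with zero padding.
    x: (L, d), w: (k, d, d) with odd k, b: (d,)."""
    k = w.shape[0]
    xp = np.pad(x, ((k // 2, k // 2), (0, 0)))
    rows = [np.einsum("kd,kde->e", xp[t:t + k], w) for t in range(x.shape[0])]
    return np.stack(rows) + b

def residual_block(x, w, b):
    """x + ReLU(conv(x)): one '1D CNN + residual' layer (assumed form)."""
    return x + np.maximum(conv1d_same(x, w, b), 0.0)

def reconstruct(P, S):
    """Each visual word in P (K, d) aggregates the sentence feature S (L, d)."""
    weights = softmax(P @ S.T / np.sqrt(S.shape[1]), axis=1)  # (K, L)
    return weights @ S                                        # (K, d)

def critic(P, W, u):
    """Hypothetical scorer: one scalar for a set of K visual words."""
    return float(np.mean(np.tanh(P @ W) @ u))

L_words, K, d, k = 7, 5, 8, 3
E = rng.normal(size=(L_words, d))      # stand-in caption word embeddings
w1, w2 = (rng.normal(scale=0.1, size=(k, d, d)) for _ in range(2))
b = np.zeros(d)
S = residual_block(residual_block(E, w1, b), w2, b)  # sentence feature

P_o = rng.normal(size=(K, d))          # stand-in "real" visual words
P_o_hat = reconstruct(P_o, S)          # reconstructed from the caption

W, u = rng.normal(size=(d, d)), rng.normal(size=d)
score_real, score_fake = critic(P_o, W, u), critic(P_o_hat, W, u)

# Wasserstein-style objectives (assumed form; regularizer omitted).
loss_D = score_fake - score_real  # low when real scores high and fake scores low
loss_G = -score_fake              # generator tries to raise the fake score
print(S.shape, P_o_hat.shape)
```

In a trainable version the regularization term would be added to `loss_D`, and both losses would be minimized alternately with an autodiff framework. The same pipeline applies to $P^m$ and $\hat P^m$.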

Code