Paper reading: Open-book Video Captioning with Retrieve-Copy-Generate Network
2022-07-07 05:34:00 【hei_ hei_ hei_】
Open-book Video Captioning with Retrieve-Copy-Generate Network
Summary
- Published: CVPR 2021
- Idea: The authors argue that earlier methods lack guidance while generating a caption, so the generated captions tend to be monotonous. Because the training set is fixed, the knowledge a model learns also cannot be extended after training. They therefore use a video-to-text retrieval task to fetch sentences from a corpus as guidance for the caption, much like an open-book exam (open-domain mechanism).
Detailed design
1. Effective Video-to-Text Retriever
All sentences in the corpus are mapped to a $d$-dimensional space by a textual encoder, and videos are mapped to the same $d$-dimensional space by a visual encoder. The similarity between the two embeddings is the selection criterion.
Textual Encoder: bi-LSTM
ps: $L$ is the length of the sentence, $W_s$ is a learnable embedding matrix, and $\eta_s$ denotes the LSTM parameters.
The length-$L$ sentence is aggregated into a single $d$-dimensional vector:
$v_s$ is the aggregation parameter.
Visual Encoder: appearance features and motion features
$v_a, v_m$ are the aggregation parameters.
Video-to-text similarity:
The top $k$ retrieved sentences are used as the guiding sentences.
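The retrieval step above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the softmax-weighted pooling with an aggregation vector, the cosine similarity, and the function names `attention_pool` and `retrieve_top_k` are choices made for this sketch, not the paper's exact formulas, and random vectors stand in for the bi-LSTM and visual encoder outputs.

```python
import numpy as np


def attention_pool(states, v):
    """Aggregate an (L, d) sequence of states into one d-dimensional vector.

    v plays the role of the aggregation parameter (v_s, v_a or v_m in the
    notes). The softmax-over-positions form is an assumption of this sketch.
    """
    scores = states @ v                        # (L,) one score per position
    weights = np.exp(scores - scores.max())    # numerically stable softmax
    weights /= weights.sum()
    return weights @ states                    # (d,) weighted sum of states


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def retrieve_top_k(video_vec, sentence_vecs, k):
    """Return indices and scores of the k sentences most similar to the video.

    Cosine similarity is assumed here as the similarity measure.
    """
    sims = l2_normalize(sentence_vecs) @ l2_normalize(video_vec)  # (N,)
    top = np.argsort(-sims)[:k]                # indices sorted by descending score
    return top, sims[top]


# Toy usage: random arrays stand in for encoder outputs.
rng = np.random.default_rng(0)
d, num_sentences = 8, 5
sentence_states = [rng.normal(size=(rng.integers(3, 7), d))
                   for _ in range(num_sentences)]
v_s = rng.normal(size=d)
sentence_vecs = np.stack([attention_pool(s, v_s) for s in sentence_states])
video_vec = rng.normal(size=d)
top, scores = retrieve_top_k(video_vec, sentence_vecs, k=3)
print(top, scores)
```

Since the corpus embeddings do not depend on the query video, `sentence_vecs` can be computed once and reused, which is what makes swapping or enlarging the corpus cheap.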
2. Copy-mechanism Caption Generator
Captions are generated by a Hierarchical Caption Decoder. At every step, a Dynamic Multi-pointers Module decides whether to copy a word from the guiding sentences.
2.1 Hierarchical Caption Decoder
It consists of an attention-LSTM and a language-LSTM. The attention-LSTM attends to the visual features; the language-LSTM aggregates the current state and the visual context to generate the probability distribution over the vocabulary, $p_{voc}$.
- attention-LSTM
$x = [x^m; x^a]$, and $y_{t-1}$ is the word generated at the previous step.
- language-LSTM
$W_{boc}, b_{boc}$ are learnable parameters.
2.2 Dynamic Multi-pointers Module
Premise: $K$ candidate sentences have already been retrieved, and each sentence has $L$ words.
Each sentence is processed separately. The decoder hidden state $h^l_t$ is used as the query $Q$ to attend over the $L$ words of the sentence, giving an attention probability distribution over those $L$ words.
$p_{ret,i}$ is the attention distribution over the words of the $i$-th sentence; $c_{i,t}^r$ is the attention-weighted result.
Decide whether to copy the selected word:
The final probability distribution over all words is then obtained ($p_{ret}$ is extended, $p_{copy}$ is broadcast):
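One decoding step of such a copy mechanism might look like the NumPy sketch below. It follows the description above (attention over each sentence's words, a copy decision per sentence, $p_{ret}$ extended to the vocabulary, $p_{copy}$ broadcast), but the details are assumptions of this sketch rather than the paper's formulas: the dot-product attention, the sigmoid gate with hypothetical parameters `w_gate` and `b_gate`, the pointer-generator-style mixing, and the averaging over the $K$ sentences.

```python
import numpy as np


def softmax(x):
    """Softmax along the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def multi_pointer_step(h, word_feats, word_ids, p_voc, w_gate, b_gate):
    """One decoding step of a copy mechanism over K retrieved sentences.

    h          : (d,)      decoder hidden state, used as the query
    word_feats : (K, L, d) encoded words of the K retrieved sentences
    word_ids   : (K, L)    vocabulary index of every retrieved word
    p_voc      : (V,)      generator distribution over the vocabulary
    w_gate, b_gate : hypothetical copy-gate parameters, shapes (2d,) and scalar
    """
    K, L, _ = word_feats.shape
    V = p_voc.shape[0]

    # Attention of the hidden state over the L words of each sentence.
    p_ret = softmax(word_feats @ h)                           # (K, L)
    # Attention-weighted summary of each sentence.
    context = np.einsum("kl,kld->kd", p_ret, word_feats)      # (K, d)

    # One copy probability per sentence (assumed sigmoid gate).
    gate_in = np.concatenate(
        [np.broadcast_to(h, context.shape), context], axis=1)  # (K, 2d)
    p_copy = sigmoid(gate_in @ w_gate + b_gate)               # (K,)

    # "Extend" p_ret from sentence positions to the vocabulary; repeated
    # words in a sentence accumulate their attention mass.
    p_ret_vocab = np.zeros((K, V))
    np.add.at(p_ret_vocab, (np.arange(K)[:, None], word_ids), p_ret)

    # "Broadcast" p_copy over the vocabulary and mix, then average over K.
    mixed = (1.0 - p_copy)[:, None] * p_voc[None, :] + p_copy[:, None] * p_ret_vocab
    return mixed.mean(axis=0)                                 # (V,)


# Toy usage with random stand-ins for the decoder state and sentence encodings.
rng = np.random.default_rng(0)
K, L, d, V = 3, 5, 4, 20
h = rng.normal(size=d)
word_feats = rng.normal(size=(K, L, d))
word_ids = rng.integers(0, V, size=(K, L))
p_voc = softmax(rng.normal(size=V))
w_gate = rng.normal(size=2 * d)
p_final = multi_pointer_step(h, word_feats, word_ids, p_voc, w_gate, 0.0)
print(p_final.shape, p_final.sum())
```

Each per-sentence mixture is a convex combination of two distributions, so the averaged result is still a valid distribution over the vocabulary.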
3. Training
- Strategy 1: So that the corpus can be expanded, the retriever can be kept fixed while the generator is fine-tuned.
- Strategy 2: The two can also be trained jointly. However, updating the retriever directly leaves the generator poorly trained from the start, so a constraint is added to the loss.
Experimental results
- Ablation experiments
Different K
Different corpus size
- Performance comparison
The results are actually fairly average; they do not surpass some results from 2020.