Paper reading: Open-book Video Captioning with Retrieve-Copy-Generate Network
2022-07-07 05:34:00 【hei_ hei_ hei_】
Open-book Video Captioning with Retrieve-Copy-Generate Network
Summary
- Published: CVPR 2021
- Idea: The authors argue that previous methods lack guidance when generating captions, so the generated captions are monotonous. Because the training dataset is fixed, the knowledge a model learns during training also cannot be extended. They propose using a video-to-text retrieval task to retrieve sentences from a corpus as guidance for caption generation, similar to an open-book exam (an open-domain mechanism).
Detailed design
1. Effective Video-to-Text Retriever
Every sentence in the corpus is mapped to a $d$-dimensional vector by a textual encoder, and each video is mapped to a $d$-dimensional vector by a visual encoder. The similarity between the two vectors is the selection criterion.
Textual Encoder: bi-LSTM
ps: $L$ is the length of the sentence, $W_s$ is a learnable embedding matrix, and $\eta_s$ denotes the parameters of the LSTM.
The sentence of length $L$ is aggregated into a single $d$-dimensional vector:
$v_s$ is the aggregation parameter.
Visual Encoder: appearance features and motion features
$v_a, v_m$ are the aggregation parameters.
Video-to-text similarity:
This yields the $k$ retrieved guidance sentences.
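The retrieval step above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the pooling form (softmax of the dot product between an aggregation parameter and each state) and the use of cosine similarity are assumptions, since the notes do not reproduce the exact formulas, and the names `attention_pool`, `cosine`, and `retrieve_top_k` are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(states, v):
    """Aggregate a sequence of d-dim states into one d-dim vector.

    v plays the role of the aggregation parameter (v_s, v_a, v_m in the notes).
    Assumed form: weights = softmax(v . h_i), output = sum_i weights_i * h_i.
    """
    weights = softmax([dot(v, h) for h in states])
    d = len(states[0])
    return [sum(w * h[j] for w, h in zip(weights, states)) for j in range(d)]

def cosine(u, v):
    # Assumption: cosine similarity stands in for the similarity measure.
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def retrieve_top_k(video_vec, sentence_vecs, k):
    """Return the indices of the k corpus sentences most similar to the video."""
    order = sorted(
        range(len(sentence_vecs)),
        key=lambda i: cosine(video_vec, sentence_vecs[i]),
        reverse=True,
    )
    return order[:k]

# Toy usage with made-up 2-dimensional vectors.
corpus = [[1.0, 0.0], [0.6, 0.8], [0.0, -1.0]]
video = [0.0, 1.0]
print(retrieve_top_k(video, corpus, 2))
```

Because the sentence vectors do not depend on the video, they can be computed once for the whole corpus and reused for every query.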
2. Copy-mechanism Caption Generator
Captions are generated by the Hierarchical Caption Decoder. At every step, the Dynamic Multi-pointers Module decides whether to copy a word from the guidance sentences.
2.1 Hierarchical Caption Decoder
It consists of an attention-LSTM and a language-LSTM. The attention-LSTM attends over the visual features; the language-LSTM aggregates the current state and the visual context to produce a probability distribution over the vocabulary, $p_{voc}$.
- attention-LSTM
$x = [x^m; x^a]$, and $y_{t-1}$ is the word generated at the previous step.
- language-LSTM
$W_{boc}, b_{boc}$ are learnable parameters.
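As a rough illustration of how a hidden state becomes a vocabulary distribution, the sketch below applies a linear layer followed by a softmax. This form is an assumption (the notes only say that the weight and bias are learnable and that the output is $p_{voc}$); `vocab_distribution`, `W`, and `b` are hypothetical names.

```python
import math

def vocab_distribution(h, W, b):
    """Assumed form: p_voc = softmax(W h + b).

    h: language-LSTM hidden state, length d
    W: |V| x d weight matrix (list of rows), b: bias of length |V|
    """
    logits = [sum(w * x for w, x in zip(row, h)) + bias for row, bias in zip(W, b)]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy usage: a 3-word vocabulary and a 2-dimensional hidden state.
p_voc = vocab_distribution([1.0, 2.0], [[0.1, 0.0], [0.0, 0.1], [0.0, 0.0]], [0.0, 0.0, 0.0])
print(p_voc)
```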
2.2 Dynamic Multi-pointers Module
Premise: $K$ candidate sentences have already been retrieved, and each sentence has $L$ words.
Each sentence is processed separately. The decoder hidden state $h^l_t$ is used as the query to attend over the $L$ words of the sentence, giving an attention probability distribution over the $L$ words.
$p_{ret,i}$ is the attention distribution over the words of the $i$-th sentence; $c_{i,t}^r$ is the attention-weighted result.
Decide whether to copy the selected word:
Finally, obtain the probability distribution over all words ($p_{ret}$ is extended, $p_{copy}$ is broadcast).
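The pointer and copy steps can be sketched as follows. This is a generic pointer-generator style mixture, not necessarily the paper's exact formula: dot-product attention, taking each copy probability as a given scalar, and giving each of the $K$ sentences an equal $1/K$ share are all assumptions, and the function names are hypothetical. "Extended" is read here as scattering the $L$ word probabilities into a vocabulary-sized vector, and "broadcast" as applying the scalar copy probability to every entry.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def pointer_attention(h, word_vecs):
    """Attention of the decoder state h (query) over the L words of one sentence.

    Assumption: plain dot-product scores followed by a softmax.
    """
    scores = [sum(a * b for a, b in zip(h, w)) for w in word_vecs]
    return softmax(scores)

def extend_to_vocab(p_ret, word_ids, vocab_size):
    """Scatter-add the L word probabilities into a vocabulary-sized vector."""
    out = [0.0] * vocab_size
    for p, idx in zip(p_ret, word_ids):
        out[idx] += p  # repeated words accumulate their probability
    return out

def final_distribution(p_voc, pointers):
    """Mix generation and copying.

    pointers: list of (p_copy, p_ret, word_ids), one tuple per retrieved sentence.
    Assumed mixture: p = (1/K) * sum_i [(1 - p_copy_i) * p_voc + p_copy_i * ext_i]
    """
    vocab_size = len(p_voc)
    k = len(pointers)
    out = [0.0] * vocab_size
    for p_copy, p_ret, word_ids in pointers:
        ext = extend_to_vocab(p_ret, word_ids, vocab_size)
        for j in range(vocab_size):
            out[j] += ((1.0 - p_copy) * p_voc[j] + p_copy * ext[j]) / k
    return out

# Toy usage: a 3-word vocabulary and one retrieved sentence of two words.
p_ret = pointer_attention([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print(final_distribution([0.5, 0.25, 0.25], [(0.5, p_ret, [2, 2])]))
```

Each per-sentence term is a convex combination of two distributions, so the averaged result still sums to 1.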
3. Training
- Strategy 1: To allow the corpus to be expanded, the retriever can be fixed while the generator is fine-tuned.
- Strategy 2: The two can also be trained jointly, but directly updating the retriever makes the generator train poorly from the start, so a restriction is added to the loss.
Experimental results
- Ablation experiments
different $K$
different corpus size
- Performance comparison
The results are actually average and do not surpass some results from 2020.