Paper Reading: Open-book Video Captioning with Retrieve-Copy-Generate Network
2022-07-07 05:34:00 【hei_ hei_ hei_】
Open-book Video Captioning with Retrieve-Copy-Generate Network
Summary
- Published: CVPR 2021
- Idea: The authors argue that previous methods have no guidance when generating captions, so the generated captions are monotonous. In addition, the training dataset is fixed, so the knowledge the model learns cannot be extended after training. They therefore add a video-to-text retrieval task: sentences retrieved from a corpus serve as guidance for caption generation. This is similar to an open-book exam (an open-domain mechanism).
Detailed design

1. Effective Video-to-Text Retriever
A textual encoder maps every sentence in the corpus to a $d$-dimensional vector, and a visual encoder maps each video to a $d$-dimensional vector. The similarity between the two is used as the selection criterion.
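The sketch below illustrates this selection step. It ranks corpus sentences by cosine similarity to a video vector and keeps the top $k$. It assumes both sides are already encoded into the same $d$-dimensional space. Cosine similarity is only a stand-in: the paper's actual similarity function, which combines appearance and motion features, may differ. The function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def retrieve_top_k(video_vec, sentence_vecs, k):
    """Return the indices of the k corpus sentences most similar to the video.

    Assumption: the video and all sentences are already d-dimensional
    vectors in a shared embedding space.
    """
    scores = [cosine(video_vec, s) for s in sentence_vecs]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


corpus = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]  # toy sentence embeddings
video = [0.9, 0.1]                              # toy video embedding
print(retrieve_top_k(video, corpus, 2))         # [0, 2]
```

Because the corpus is only scored at inference time, sentences can be added to it without retraining the generator. This is what the "open-book" framing relies on.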

Textual Encoder: bi-LSTM

ps: $L$ is the sentence length, $W_s$ is a learnable embedding matrix, and $\eta_s$ denotes the LSTM parameters.
The $L$ word-level outputs of a sentence are aggregated into a single $d$-dimensional vector:
$v_s$ is the aggregation parameter.

Visual Encoder: appearance features and motion features


$v_a$ and $v_m$ are the aggregation parameters.

Video-to-text similarity:

The top-$k$ sentences retrieved this way serve as the guidance sentences.
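Both encoders reduce a variable-length sequence to one $d$-dimensional vector using an aggregation parameter ($v_s$, $v_a$, $v_m$). The sketch below assumes this aggregation is attention pooling, with the aggregation parameter acting as a learnable query vector. The exact formula in the paper may differ, and `attention_pool` is a hypothetical helper.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, query):
    """Aggregate a sequence of d-dim vectors into one d-dim vector.

    Assumption: weights = softmax(query . feature_l), output = weighted sum.
    `query` plays the role of an aggregation parameter such as v_s.
    """
    scores = [sum(f_j * q_j for f_j, q_j in zip(f, query)) for f in features]
    weights = softmax(scores)
    d = len(features[0])
    return [sum(w * f[j] for w, f in zip(weights, features)) for j in range(d)]


word_states = [[1.0, 0.0], [0.0, 1.0]]   # toy bi-LSTM outputs for L = 2 words
v_s = [0.0, 0.0]                         # a zero query gives uniform weights
print(attention_pool(word_states, v_s))  # [0.5, 0.5]
```

The same helper would apply to appearance and motion features (with $v_a$ and $v_m$), under the same assumption.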
2. Copy-mechanism Caption Generator
A Hierarchical Caption Decoder generates the caption. At every step, a Dynamic Multi-pointers Module decides whether to copy a word from the guidance sentences.
2.1 Hierarchical Caption Decoder
The decoder consists of an attention-LSTM and a language-LSTM. The attention-LSTM attends over the visual features. The language-LSTM aggregates the current state with the attended visual context to produce the vocabulary distribution $p_{voc}$.
- attention-LSTM

$x = [x^m; x^a]$ is the concatenation of the motion and appearance features, and $y_{t-1}$ is the word generated at the previous step.
- language-LSTM

$W_{voc}$ and $b_{voc}$ are learnable parameters.
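The sketch below shows the output layer. It assumes the usual form $p_{voc} = \mathrm{softmax}(W_{voc} h^l_t + b_{voc})$, applied to the language-LSTM hidden state $h^l_t$. The LSTM cells themselves are omitted, and all numbers are toy values.

```python
import math


def vocab_distribution(h, W_voc, b_voc):
    """p_voc = softmax(W_voc h + b_voc).

    W_voc: |V| x d matrix given as a list of rows (assumed layout).
    h:     d-dim language-LSTM hidden state at step t.
    b_voc: bias vector of length |V|.
    """
    logits = [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(W_voc, b_voc)]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


h_t = [1.0, -1.0]                          # toy hidden state, d = 2
W = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]   # toy vocabulary of 3 words
b = [0.0, 0.0, 0.0]
p_voc = vocab_distribution(h_t, W, b)      # logits [1, -1, 0] -> word 0 most likely
```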
2.2 Dynamic Multi-pointers Module
Premise: $K$ candidate sentences have already been retrieved.
Each sentence has $L$ words.
Each sentence is handled separately. The decoder hidden state $h^l_t$ is used as the query Q and attends over the $L$ words of the sentence, giving an attention probability distribution over those $L$ words.

$p_{ret,i}$ is the attention distribution over the words of the $i$-th sentence, and $c^r_{i,t}$ is the corresponding attention-weighted result.
Next, decide whether to copy the selected word:

The final probability distribution over all words is then obtained ($p_{ret}$ is extended to the vocabulary size and $p_{copy}$ is broadcast):
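The whole module can be sketched as one decoding step. This is an illustrative sketch under stated assumptions, not the paper's exact formulation:
- The attention score is a plain dot product.
- Each sentence gets a scalar copy gate $p_{copy,i} = \sigma(w^\top [h^l_t; c^r_{i,t}])$, where `gate_w` is a hypothetical weight vector.
- The $K$ per-sentence mixtures are averaged so that the result remains a probability distribution.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def multi_pointer_step(h_t, p_voc, sentences, gate_w):
    """One decoding step of a pointer-style copy over K retrieved sentences.

    h_t:       decoder hidden state used as the query (d-dim list)
    p_voc:     generator distribution over the vocabulary (length |V|)
    sentences: K sentences, each a list of (word_id, word_vec) pairs
    gate_w:    hypothetical gate weights of length 2d
    """
    vocab_size = len(p_voc)
    d = len(h_t)
    final = [0.0] * vocab_size
    for sent in sentences:
        # Attention of h_t over the L words of this sentence -> p_ret_i.
        p_ret_i = softmax([dot(h_t, vec) for _, vec in sent])
        # Attention-weighted context c_i over the word vectors.
        c_i = [sum(p * vec[j] for p, (_, vec) in zip(p_ret_i, sent)) for j in range(d)]
        # Scalar copy gate for this sentence (broadcast over the vocabulary).
        p_copy_i = sigmoid(dot(gate_w, h_t + c_i))
        # Extend p_ret_i to vocabulary size by scattering onto word ids.
        extended = [0.0] * vocab_size
        for p, (word_id, _) in zip(p_ret_i, sent):
            extended[word_id] += p
        for w in range(vocab_size):
            final[w] += (1 - p_copy_i) * p_voc[w] + p_copy_i * extended[w]
    # Average the K mixtures so the result still sums to 1.
    return [x / len(sentences) for x in final]


h = [1.0, 0.0]
p_voc = [0.25, 0.25, 0.25, 0.25]
sents = [[(2, [1.0, 0.0]), (3, [0.0, 1.0])]]  # one sentence with word ids 2 and 3
p = multi_pointer_step(h, p_voc, sents, gate_w=[0.0] * 4)
```

With a zero gate vector the copy probability is 0.5, so half of the mass comes from $p_{voc}$ and half from the attention over the retrieved words. Word 2, which matches the query best, ends up most likely.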

3. Training
- Strategy 1: To expand the corpus, the retriever can be kept fixed while the generator is fine-tuned.
- Strategy 2: The two can also be trained jointly. However, updating the retriever directly makes the generator train poorly from the start, so a constraint is added to the loss.

Experimental results
- Ablation experiments
  - Different K
  - Different corpus sizes
- Performance comparison

The results are actually only average and do not surpass some of the 2020 methods.