Paper reading: Open-book Video Captioning with Retrieve-Copy-Generate Network
2022-07-07 05:34:00 【hei_ hei_ hei_】
Open-book Video Captioning with Retrieve-Copy-Generate Network
Summary
- Published: CVPR 2021
- Idea: The authors argue that earlier methods lack guidance while generating a caption, so the generated captions are monotonous. Because the training set is fixed, the knowledge a model learns also cannot be extended after training. The authors therefore use a video-to-text retrieval task to fetch sentences from a corpus and use them as guidance for the caption, much like an open-book exam (open-domain mechanism).
Detailed design
1. Effective Video-to-Text Retriever
All sentences in the corpus are mapped to a d-dimensional space by a textual encoder, and videos are mapped to the same d-dimensional space by a visual encoder. The similarity between the two embeddings is the selection criterion.
Textual Encoder: bi-LSTM
Note: $L$ is the length of the sentence, $W_s$ is a learnable embedding matrix, and $\eta_s$ denotes the LSTM parameters.
The length-$L$ sentence is then aggregated into a single $d$-dimensional vector, where $v_s$ is the aggregation parameter.
Visual Encoder: appearance features and motion features
$v_a, v_m$ are the aggregation parameters.
Video-to-text similarity:
The top $k$ retrieved sentences are used as the guiding sentences.
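The retrieval step described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the dot-product attention pooling with a single parameter vector, the use of cosine similarity, the single video vector (instead of separate appearance and motion branches), and the names `attention_pool`, `cosine`, and `retrieve_top_k` are all assumptions made for the example. The toy vectors stand in for real encoder outputs.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(states, v):
    """Aggregate a list of d-dim states into one d-dim vector.

    Each state is scored by its dot product with the parameter vector v
    (an assumed scoring form), and the states are averaged with the
    softmax of those scores as weights.
    """
    scores = [sum(h_j * v_j for h_j, v_j in zip(h, v)) for h in states]
    weights = softmax(scores)
    d = len(states[0])
    return [sum(w * h[j] for w, h in zip(weights, states)) for j in range(d)]

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_top_k(video_vec, sentence_vecs, k):
    """Return the indices of the k sentences most similar to the video."""
    ranked = sorted(
        range(len(sentence_vecs)),
        key=lambda i: cosine(video_vec, sentence_vecs[i]),
        reverse=True,
    )
    return ranked[:k]

# Toy data: two word states pooled with a zero parameter vector (uniform weights).
word_states = [[1.0, 0.0], [0.0, 1.0]]
sentence_vec = attention_pool(word_states, [0.0, 0.0])

# Toy corpus of three already-encoded sentences and one encoded video.
corpus = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
video = [1.0, 0.1]
top = retrieve_top_k(video, corpus, k=2)
print(sentence_vec, top)
```

Because the sentence embeddings do not depend on the video, they could be computed once for the whole corpus and reused for every query, which is what makes swapping or enlarging the corpus cheap in a setup like this.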
2. Copy-mechanism Caption Generator
A Hierarchical Caption Decoder generates the caption, and at every step a Dynamic Multi-pointers Module decides whether to copy a word from the guiding sentences.
2.1 Hierarchical Caption Decoder
It consists of an attention-LSTM and a language-LSTM. The attention-LSTM attends over the visual features; the language-LSTM aggregates the current state and the visual context to produce a probability distribution over the vocabulary, $p_{voc}$.
- attention-LSTM
$x = [x^m; x^a]$, and $y_{t-1}$ is the word generated at the previous step.
- language-LSTM
$W_{voc}, b_{voc}$ are learnable parameters.
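As a rough illustration of how a decoder hidden state is turned into a vocabulary distribution, the sketch below applies a linear projection followed by a softmax. The formula image from the original note is not available, so this common form is an assumption; `vocab_distribution`, `W_out`, and `b_out` are hypothetical names, and the numbers are toy values.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def vocab_distribution(h, W_out, b_out):
    """Assumed form: softmax(W_out @ h + b_out), one row of W_out per vocabulary word."""
    logits = [
        sum(w_j * h_j for w_j, h_j in zip(row, h)) + b_i
        for row, b_i in zip(W_out, b_out)
    ]
    return softmax(logits)

# Toy language-LSTM hidden state and a 3-word vocabulary.
h_t = [0.5, -0.5]
W_out = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b_out = [0.0, 0.0, 0.0]

p_voc = vocab_distribution(h_t, W_out, b_out)
print(p_voc)
```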
2.2 Dynamic Multi-pointers Module
Premise: $K$ candidate sentences have already been retrieved, and each sentence has $L$ words.
Each sentence is processed separately. The decoder hidden state $h^l_t$ is used as the query to attend over the $L$ words of the sentence, giving an attention probability distribution over those $L$ words.
$p_{ret,i}$ is the attention distribution over the words of the $i$-th sentence, and $c_{i,t}^r$ is the resulting weighted sum. The module then decides whether to copy the selected word.
Finally, the probability distribution over all words is obtained ($p_{ret}$ is extended, $p_{copy}$ is broadcast).
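The note does not reproduce the exact formulas, so the sketch below uses the standard pointer-generator mixture as an assumption: the attention weights over the words of one retrieved sentence are scattered ("extended") onto vocabulary indices, and a scalar copy probability weights them against the vocabulary distribution. It handles a single retrieved sentence only; how the paper combines all $K$ sentences is not covered here. The names `extend_to_vocab` and `mix` and all numbers are hypothetical.

```python
def extend_to_vocab(word_ids, attn_weights, vocab_size):
    """Scatter per-word attention weights onto vocabulary indices.

    A word that occurs several times in the sentence accumulates
    the weights of all its occurrences.
    """
    p = [0.0] * vocab_size
    for idx, w in zip(word_ids, attn_weights):
        p[idx] += w
    return p

def mix(p_voc, p_ret_vocab, p_copy):
    """Assumed mixture: (1 - p_copy) * p_voc + p_copy * p_ret, per vocabulary word."""
    return [(1.0 - p_copy) * pv + p_copy * pr for pv, pr in zip(p_voc, p_ret_vocab)]

# Toy 4-word vocabulary; word 3 has zero probability under the generator.
p_voc = [0.5, 0.3, 0.2, 0.0]

# One retrieved sentence of L = 3 words (as vocabulary ids) and its attention weights.
sentence_ids = [3, 1, 3]
attn = [0.5, 0.25, 0.25]

p_ret = extend_to_vocab(sentence_ids, attn, vocab_size=4)
p_final = mix(p_voc, p_ret, p_copy=0.5)
print(p_ret, p_final)
```

In this toy case word 3 can only be produced through the copy path, which is the situation the copy mechanism is meant to help with.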
3. Training
- Strategy 1: To allow the corpus to be expanded, the retriever can be kept fixed while the generator is fine-tuned.
- Strategy 2: The two can also be trained jointly. However, updating the retriever directly makes the generator train poorly from the start, so a constraint is added to the loss.
Experimental results
- Ablation experiments
Different K
Different corpus size
- Performance comparison
The results are actually only average; they do not surpass some results published in 2020.