CoVOS: No Decoding Needed! Accelerating Semi-Supervised VOS with Motion Vectors and Residuals from Compressed Video Bitstreams (CVPR 2022)
2022-07-26 19:04:00 【I love computer vision】
This article shares the CVPR 2022 paper "Accelerating Video Object Segmentation with Compressed Video". The paper discusses how to use the information already present in compressed video to accelerate video object segmentation (VOS), which makes it well suited to high-speed segmentation of the compressed-format videos found on the web.
The details are as follows:

Paper link: https://arxiv.org/pdf/2107.12192.pdf
Project link: https://github.com/kai422/CoVOS
01
Preface
Most current methods segment each object frame by frame on the fully decoded video, which is computationally expensive.
To solve this problem, the authors propose a plug-in acceleration framework that can be applied directly to existing VOS models. Specifically, the video is first encoded with an existing codec (such as HEVC) into I-frames, P-frames, and B-frames. Masks are then propagated using the motion vectors, with bidirectional propagation and connections across multiple frames. Finally, the propagated predictions are corrected by a residual-based correction network.
Main contributions:
A novel VOS acceleration module that uses information from the compressed video bitstream for segmentation propagation and correction.
A soft propagation module that takes motion vectors as input, propagates the masks, and outputs the propagated mask.
A mask correction module that corrects propagation errors using the motion residuals.
The framework can be applied directly to existing models, speeding them up by more than 3x with only a small drop in accuracy.
02
Method
The method performs video object segmentation on compressed video, so the raw video must first be compressed. Common video coding formats include HEVC, MPEG-4, and H.264. With any of these, the original video is encoded into I-frames, P-frames, and B-frames, which have the following characteristics:
I-frame: an I-frame is a keyframe. The frame is stored in full, so it can be decoded from its own data alone (it contains the whole picture).
P-frame: a P-frame is a unidirectional difference frame. It stores the difference between this frame and the previous keyframe or P-frame, so it holds no complete picture, only the data that differs from the previous frame.
B-frame: a B-frame is a bidirectional difference frame. It records the difference between this frame and both the preceding and the following frames.
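The dependencies between the three frame types can be illustrated with a small sketch. It assumes a simplified model in which a P-frame references only the nearest earlier I- or P-frame and a B-frame references the nearest I- or P-frame on each side; real codecs such as HEVC allow more flexible reference structures. The function name `reference_frames` and the example pattern `IBBPBBP` are made up for illustration.

```python
def reference_frames(frame_types):
    """Return, for each frame, the display indices it is predicted from.

    Simplified assumption: P uses the nearest earlier I/P frame,
    B uses the nearest I/P frame before and after it.
    """
    anchors = [i for i, t in enumerate(frame_types) if t in "IP"]
    refs = []
    for i, t in enumerate(frame_types):
        earlier = [a for a in anchors if a < i]
        later = [a for a in anchors if a > i]
        if t == "I":
            refs.append([])                       # decodable on its own
        elif t == "P":
            refs.append(earlier[-1:])             # one earlier reference
        else:
            refs.append(earlier[-1:] + later[:1])  # one on each side
    return refs


gop = "IBBPBBP"  # hypothetical frame pattern in display order
for index, (frame_type, refs) in enumerate(zip(gop, reference_frames(gop))):
    print(index, frame_type, refs)
```

In this simplified picture, every B-frame sits between two frames that are decoded independently of it, which is what makes bidirectional propagation possible.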

As shown in the figure above, the compressed frames are much smaller than the original frames, so propagating over I-, P-, and B-frames requires less computation than propagating over the original frames.

The method uses an ordinary VOS model to handle propagation to the P-frames, and then completes propagation to the B-frames through bidirectional prediction.
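The division of labour described here can be sketched as a simple schedule. This is only an illustration under assumptions: I- and P-frames are treated as keyframes handled by the base model, B-frames are filled in afterwards from the nearest keyframe on each side, and `base_model` and `propagate` are hypothetical callables standing in for the real networks.

```python
def segment_video(frame_types, base_model, propagate):
    """Run the (slow) base model on keyframes, then propagate to B-frames."""
    key_indices = [i for i, t in enumerate(frame_types) if t in "IP"]

    # Pass 1: the base VOS model segments every keyframe.
    masks = {i: base_model(i) for i in key_indices}

    # Pass 2: each B-frame borrows from the nearest keyframes around it.
    for i, t in enumerate(frame_types):
        if t != "B":
            continue
        prev_key = max((k for k in key_indices if k < i), default=None)
        next_key = min((k for k in key_indices if k > i), default=None)
        masks[i] = propagate(i, prev_key, next_key, masks)

    return [masks[i] for i in range(len(frame_types))]


# Stub callables so the schedule can be inspected without any model.
result = segment_video(
    "IBBP",
    base_model=lambda i: f"base({i})",
    propagate=lambda i, p, n, masks: f"prop({i} from {p},{n})",
)
print(result)
```

The speedup comes from the second pass being much cheaper than the first, so the more B-frames a video contains, the fewer calls to the base model are needed.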

Starting from the RGB images, the prediction unit produces a motion-compensated prediction of each frame (Predicted). The image residual (Residual) is then the difference between the actual frame and this motion-compensated prediction.


where w is the weight of the forward or backward propagation, e_i is the residual, and I_i is the RGB image.
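To make the roles of the motion vectors, the weight w, and the residual concrete, here is a minimal NumPy sketch of bidirectional motion compensation. It is a toy model, not the codec's actual procedure: it assumes one integer motion vector per pixel (real codecs use per-block vectors with sub-pixel precision), clips at the image border, and uses a single scalar weight `w_fwd`. All function names are illustrative.

```python
import numpy as np


def warp(ref, mv):
    """For every pixel, fetch the reference pixel its motion vector points to.

    ref: (H, W) array; mv: (H, W, 2) integer offsets (dy, dx).
    """
    h, w = ref.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(ys + mv[..., 0], 0, h - 1)
    src_x = np.clip(xs + mv[..., 1], 0, w - 1)
    return ref[src_y, src_x]


def predict_bidirectional(ref_prev, mv_prev, ref_next, mv_next, w_fwd=0.5):
    """Weighted sum of the predictions from the previous and next reference."""
    return w_fwd * warp(ref_prev, mv_prev) + (1.0 - w_fwd) * warp(ref_next, mv_next)


rng = np.random.default_rng(0)
prev_frame = rng.random((4, 6))
cur_frame = np.roll(prev_frame, 1, axis=1)   # content moves one pixel right
next_frame = np.roll(prev_frame, 2, axis=1)  # and one more pixel right

mv_prev = np.zeros((4, 6, 2), dtype=int)
mv_prev[..., 1] = -1                         # look one pixel left in prev_frame
mv_next = np.zeros((4, 6, 2), dtype=int)
mv_next[..., 1] = 1                          # look one pixel right in next_frame

predicted = predict_bidirectional(prev_frame, mv_prev, next_frame, mv_next)
residual = cur_frame - predicted             # what motion compensation misses
reconstructed = predicted + residual         # decoder side: prediction + residual
```

In this toy example the residual is zero wherever the motion vectors explain the frame and non-zero only at the borders, where the referenced pixel falls outside the image.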
2.1. Soft motion vector propagation module
This section describes how masks are propagated to non-keyframes using motion vectors.

As shown in the figure above, the base model first produces the keyframe masks, and a lightweight encoder extracts the keyframe image features Vk. For non-keyframes, the image features Vn are extracted by the lightweight encoder as well.

Next, a warping operation brings in the information from the two keyframes, yielding warped image features and mask features. Finally, the similarity between the non-keyframe image features and the image features of the preceding and following keyframes is computed and used to select among the mask features.
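The idea of warping both keyframes and then choosing between them by feature similarity can be sketched in NumPy. This is not the paper's learned module: it assumes per-pixel integer motion vectors, uses cosine similarity followed by a softmax as the selection rule, and every name (`soft_propagate`, `feat_n`, `feats_k`) is chosen for illustration only.

```python
import numpy as np


def warp(ref, mv):
    """Gather reference values along integer motion vectors (dy, dx)."""
    h, w = ref.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(ys + mv[..., 0], 0, h - 1)
    src_x = np.clip(xs + mv[..., 1], 0, w - 1)
    return ref[src_y, src_x]


def soft_propagate(feat_n, feats_k, masks_k, mvs):
    """Blend warped keyframe masks, weighted by feature similarity.

    feat_n: (H, W, C) non-keyframe features.
    feats_k, masks_k, mvs: per-keyframe lists of (H, W, C) features,
    (H, W) masks, and (H, W, 2) motion vectors toward that keyframe.
    """
    warped_masks, scores = [], []
    for feat_k, mask_k, mv in zip(feats_k, masks_k, mvs):
        warped_masks.append(warp(mask_k, mv))
        warped_feat = warp(feat_k, mv)
        # Per-pixel cosine similarity between non-keyframe and warped features.
        dot = (feat_n * warped_feat).sum(axis=-1)
        norms = np.linalg.norm(feat_n, axis=-1) * np.linalg.norm(warped_feat, axis=-1)
        scores.append(dot / (norms + 1e-8))
    scores = np.stack(scores)                       # (K, H, W)
    weights = np.exp(scores - scores.max(axis=0))   # softmax over keyframes
    weights /= weights.sum(axis=0)
    return (weights * np.stack(warped_masks)).sum(axis=0)


rng = np.random.default_rng(0)
h, w, c = 4, 4, 3
feat_n = rng.random((h, w, c))
feats_k = [rng.random((h, w, c)), rng.random((h, w, c))]
mask = np.zeros((h, w))
mask[1:3, 1:3] = 1.0
zero_mv = np.zeros((h, w, 2), dtype=int)

out = soft_propagate(feat_n, feats_k, [mask, mask], [zero_mv, zero_mv])
```

Because the weights sum to one at every pixel, the output stays a valid soft mask; when both keyframes agree, as in the usage above, the blend simply returns their common mask.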
2.2. Residual-based correction module
The residuals record what motion-vector-based prediction fails to capture in each frame, so they can be used as correction information.

First, the predicted mask is dilated to obtain the foreground region. The residual is then filtered by this region (only the foreground part of the residual is kept). Finally, the filtered residual is fed into the decoder together with the mask to correct the prediction.
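The first two steps (dilating the mask and keeping only the foreground residual) can be sketched with plain NumPy. The square dilation window, the `radius` parameter, and the 0.5 threshold are assumptions made for illustration; the decoder that consumes the result is a learned network and is not shown.

```python
import numpy as np


def dilate(mask, radius=1):
    """Binary dilation with a (2*radius+1) x (2*radius+1) square window."""
    h, w = mask.shape
    padded = np.pad(mask.astype(bool), radius)  # pads with False
    out = np.zeros((h, w), dtype=bool)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def foreground_residual(residual, mask, radius=1):
    """Zero the residual outside the dilated foreground region.

    residual: (H, W, 3) array; mask: (H, W) soft or binary mask.
    """
    region = dilate(mask > 0.5, radius)
    return residual * region[..., None]


mask = np.zeros((5, 5))
mask[2, 2] = 1.0                      # a single foreground pixel
residual = np.ones((5, 5, 3))
filtered = foreground_residual(residual, mask)
```

Dilating before filtering leaves a margin around the object, so residual signal just outside the propagated mask boundary, where propagation errors tend to appear, is still available for correction.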
03
Experiments
After training, the model was evaluated on the public datasets YouTube-VOS and DAVIS.
Test results

The experiments use MIVOS, STM, STCN, and other networks as base models. As can be seen, adding the proposed acceleration module (CoVOS) significantly improves the inference speed (FPS) of every base model.
Ablation study

04
Conclusion
This paper presents an acceleration framework for semi-supervised VOS that uses the motion vectors and residuals of compressed video bitstreams. It speeds up the inference of VOS models that are accurate but slow, at the cost of a slight drop in accuracy. Note that inference on non-keyframes depends on the keyframe results, so the segmentation of non-keyframes can only be completed after the keyframes have been segmented.
