
VCIP2021: Super Resolution Using Decoded Information

2022-06-22 14:10:00 Dillon2015

This article is based on the VCIP2021 paper "CNN-based Super Resolution for Video Coding Using Decoded Information".

Introduction


The growing amount of high-resolution video poses a great challenge to video transmission under limited bandwidth. One way to address this is resampling-based coding, shown in Fig.1: the video is downsampled before encoding, the low-resolution video is encoded, and the decoder upsamples the decoded frames to restore the original resolution. AV1 has a mode that encodes downsampled frames and upsamples them at the decoder, and VVC also supports this through RPR (Reference Picture Resampling).

With the development of CNN-based super resolution (SR), SR has shown great potential in video coding. This paper proposes an SR method that incorporates coding information from the codec. In existing work, SR and the encoder are usually treated as independent parts, whereas here the SR network uses not only the reconstructed frame but also side information such as the prediction signal and the QP.

Model Design


Because luma and chroma have different characteristics, separate SR models are designed for the luma and chroma components.

Fig.2 shows the structure of the luma model. Its inputs are the reconstructed luma, the prediction signal, and a QP map. The backbone is a single-scale EDSR; since each convolutional layer has only 64 feature channels, no residual scaling layer is used. RB in the figure denotes the residual block, of which there are 16 in total. The last convolutional layer outputs 4 channels, which are passed through a pixel-shuffle layer to produce the high-resolution reconstructed image.
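A minimal PyTorch sketch of such a luma network, assuming the three input planes are simply concatenated along the channel dimension (the exact fusion used in the paper may differ):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """EDSR-style residual block with 64 channels and no residual scaling."""
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)

class LumaSR(nn.Module):
    """Luma SR: reconstruction + prediction + QP map -> 2x upsampled luma."""
    def __init__(self, num_blocks=16, channels=64):
        super().__init__()
        # 3 input planes: reconstructed luma, predicted luma, QP map
        self.head = nn.Conv2d(3, channels, 3, padding=1)
        self.body = nn.Sequential(*[ResidualBlock(channels) for _ in range(num_blocks)])
        # 4 output channels, rearranged by pixel shuffle into one 2x larger plane
        self.tail = nn.Conv2d(channels, 4, 3, padding=1)
        self.shuffle = nn.PixelShuffle(2)

    def forward(self, rec, pred, qp_map):
        x = torch.cat([rec, pred, qp_map], dim=1)
        feat = self.head(x)
        feat = feat + self.body(feat)   # global skip connection, as in EDSR
        return self.shuffle(self.tail(feat))
```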

Fig.3 shows the structure of the chroma model. The main difference from the luma model lies in the inputs: to make full use of the texture information, the reconstructed luma is also used as input, and it is first downsampled to the chroma resolution by a stride-2 convolutional layer. The other inputs are the reconstructed U component, the reconstructed V component, and a QP map; the prediction signal is not included.
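A corresponding sketch of the chroma branch, reusing the ResidualBlock above. The fusion of the downsampled luma features and the number of output channels (8, i.e. two upsampled planes after pixel shuffle) are assumptions for illustration, not details taken from the paper:

```python
class ChromaSR(nn.Module):
    """Chroma SR: reconstructed U/V + downsampled reconstructed luma + QP map -> 2x U/V."""
    def __init__(self, num_blocks=16, channels=64):
        super().__init__()
        # bring the luma plane down to chroma resolution (YUV420) with a stride-2 conv
        self.luma_down = nn.Conv2d(1, channels, 3, stride=2, padding=1)
        # 3 chroma-resolution planes: reconstructed U, reconstructed V, QP map
        self.head = nn.Conv2d(3, channels, 3, padding=1)
        self.body = nn.Sequential(*[ResidualBlock(channels) for _ in range(num_blocks)])
        # assumed: 8 output channels -> pixel shuffle 2x -> 2 planes (U and V)
        self.tail = nn.Conv2d(channels, 8, 3, padding=1)
        self.shuffle = nn.PixelShuffle(2)

    def forward(self, rec_u, rec_v, rec_y, qp_map):
        feat = self.head(torch.cat([rec_u, rec_v, qp_map], dim=1)) + self.luma_down(rec_y)
        feat = feat + self.body(feat)
        return self.shuffle(self.tail(feat))  # (N, 2, H, W): upsampled U and V
```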

Experimental Results


Datasets

The model is trained on the DIV2K dataset. The images are converted to the YUV420 format and encoded with VTM11.0 under the RPR configuration with QP={22,27,32,37,42}. Each training image is downsampled by a factor of 2 before encoding, and the decoded low-resolution image together with the corresponding original-resolution image forms a training pair.
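A minimal sketch of how such LR/HR training pairs could be loaded, assuming the decoded low-resolution luma planes and the original planes have already been dumped to .npy files with a hypothetical naming scheme (the actual data layout and the handling of the prediction planes are not described in the article and would be loaded analogously):

```python
import glob
import numpy as np
import torch
from torch.utils.data import Dataset

class RPRPairDataset(Dataset):
    """Pairs decoded low-resolution luma planes with the original high-resolution ones."""
    def __init__(self, lr_dir, hr_dir, qp):
        # hypothetical layout: <lr_dir>/<name>_qp<QP>.npy for LR, <hr_dir>/<name>.npy for HR
        self.lr_files = sorted(glob.glob(f"{lr_dir}/*_qp{qp}.npy"))
        self.hr_dir = hr_dir
        self.qp = qp

    def __len__(self):
        return len(self.lr_files)

    def __getitem__(self, idx):
        lr_path = self.lr_files[idx]
        name = lr_path.split("/")[-1].replace(f"_qp{self.qp}.npy", "")
        lr = np.load(lr_path).astype(np.float32) / 255.0
        hr = np.load(f"{self.hr_dir}/{name}.npy").astype(np.float32) / 255.0
        # QP map: a constant plane at LR resolution, normalized by the VVC QP range
        qp_map = np.full_like(lr, self.qp / 63.0)
        return (torch.from_numpy(lr)[None], torch.from_numpy(qp_map)[None],
                torch.from_numpy(hr)[None])
```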

Experimental Configuration

The model is trained with the PyTorch framework on a Tesla V100 GPU. The mini-batch size is set to 16, the Adam optimizer is used with a learning rate of 1e-4, and the learning rate is reduced by a decay factor of 0.5 every 200 epochs.
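A minimal sketch of this training setup, reusing the LumaSR and RPRPairDataset sketches above; the L1 loss and the use of the reconstruction as a stand-in for the prediction plane are assumptions, since the article does not specify them:

```python
import torch
from torch.utils.data import DataLoader

model = LumaSR().cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
# halve the learning rate every 200 epochs, as described above
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=200, gamma=0.5)
criterion = torch.nn.L1Loss()  # assumption: the loss function is not given in the article

loader = DataLoader(RPRPairDataset("lr_dec", "hr_orig", qp=32),
                    batch_size=16, shuffle=True)

for epoch in range(1000):
    for lr_y, qp_map, hr_y in loader:
        lr_y, qp_map, hr_y = lr_y.cuda(), qp_map.cuda(), hr_y.cuda()
        # the prediction plane is approximated here by the reconstruction itself
        # (in the paper it is the decoder's actual prediction signal)
        sr_y = model(lr_y, lr_y, qp_map)
        loss = criterion(sr_y, hr_y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
```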

Results

The encoder uses the All Intra configuration with QP={22,27,32,37,42}. The results are shown in Table 1, and Fig.4 gives the RD curve of each sequence. The gain mainly comes from the low bit-rate range, which indicates that the method is better suited to low-bandwidth scenarios.

If you are interested, follow the WeChat official account "Video Coding".
