
AnimeSR: learnable degradation operators and a new real-world animation VSR dataset

2022-07-01 13:41:00 【I love computer vision】


Another strong piece of work from Xintao Wang's team. The paper 『AnimeSR: Learning Real-World Super-Resolution Models for Animation Videos』 proposes a new animation dataset for real-world animation video super-resolution (VSR). It also extends the basic operators used in real-world degradation models into learnable operators, and reaches state-of-the-art results on NIQE and other evaluation metrics.

Affiliation: ARC Lab, Tencent PCG
Paper link: https://arxiv.org/pdf/2206.07038

Highlights

The paper identifies three key improvements for real-world animation VSR:

Recent real-world VSR methods mostly build their degradation from basic operators that cannot learn, such as blur, noise, and compression. The paper proposes learning such basic operators from real LQ animation and adding the learned operators to the degradation process. These neural-network-based basic operators help capture the distribution of real degradations more accurately.
A large-scale HQ animation dataset, AVC, is built so that animation VSR models can be trained and evaluated.
An efficient multi-scale network structure, AnimeSR, is designed. It combines the efficiency of a unidirectional recurrent network with the effectiveness of the sliding-window approach and outperforms previous state-of-the-art methods.

Method

The AVC dataset

The training set, AVC-Train, contains 553 high-quality clips totaling 55,300 frames. The test set, AVC-Test, contains 30 clips totaling 3,000 frames, so both sets average 100 frames per clip. To evaluate methods in practical scenarios, the paper also builds a real-world test set, AVC-RealLQ, which consists of 44 low-quality clips. The figure below shows some examples from the dataset.

Learnable basic operators in degradation synthesis

Because paired LR-HR training data are lacking, recent work designs degradation models that mimic the real world as closely as possible and then uses them to synthesize LR frames from HR frames. Such a degradation can be written as n steps:

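$$x = \mathcal{D}^n(y) = (\mathcal{D}_n \circ \cdots \circ \mathcal{D}_2 \circ \mathcal{D}_1)(y)$$

where $y$ is the HR frame, $x$ is the synthesized LR frame, and each $\mathcal{D}_i$ is one basic degradation operator.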
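As a rough illustration of the n-step composition, the sketch below chains a few toy operators over a 1-D list of pixel values. The operators `box_blur`, `add_offset`, and `downsample` and the helper `compose` are hypothetical stand-ins chosen for this example; they are not the blur, noise, resize, or compression operators used in the paper.

```python
from functools import reduce
from typing import Callable, List

Signal = List[float]
Operator = Callable[[Signal], Signal]


def box_blur(x: Signal) -> Signal:
    """Toy blur: average each sample with its neighbours (edges are clamped)."""
    n = len(x)
    return [(x[max(i - 1, 0)] + x[i] + x[min(i + 1, n - 1)]) / 3.0 for i in range(n)]


def add_offset(x: Signal) -> Signal:
    """Toy stand-in for noise: a fixed, deterministic perturbation."""
    return [v + (0.5 if i % 2 == 0 else -0.5) for i, v in enumerate(x)]


def downsample(x: Signal) -> Signal:
    """Toy resize: keep every second sample."""
    return x[::2]


def compose(operators: List[Operator]) -> Operator:
    """Build D_n o ... o D_1 from [D_1, ..., D_n]; D_1 is applied first."""
    return lambda y: reduce(lambda acc, op: op(acc), operators, y)


# Three-step degradation: blur, then perturb, then downsample.
degrade = compose([box_blur, add_offset, downsample])
hr = [0.0, 3.0, 6.0, 9.0]
lr = degrade(hr)
```

Because `compose` only requires each step to map a signal to a signal, a learned operator could be dropped into the list in the same way as a hand-written one.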

The basic operators in classical degradation models include blur, noise, resizing, and JPEG/FFMPEG compression. These operators have no learning ability, which fundamentally limits how well they can synthesize real-world degradations, as shown in panel (a) of the figure below. Another line of work uses large neural networks and adversarial learning to synthesize LR samples.

However, learning the entire degradation process and its distribution with one large network is difficult. These methods are effective only for a limited range of images and often produce unpleasant artifacts, as shown in panel (b) of the figure below.

The paper instead proposes learning the basic operators used in degradation synthesis. Rather than one large network, it trains tiny neural networks of two or three convolution layers to capture the main characteristics of real degradations, and then incorporates these networks into the degradation synthesis process. The neural operators are learnable and can synthesize real degradations that classical operators cannot simulate. Learnable basic operators therefore greatly expand the degradation space so that it covers more real degradations.

Input-rescaling strategy

The learnable basic operators are trained in a supervised manner on LR-HR pairs. However, obtaining LR-HR pairs for real-world LQ videos is challenging. For real LQ animation, the paper first applies a VSR model trained with the classical basic operators; the preliminary results are shown in the figure below. As expected, the output is unsatisfactory. The input is then resized with different scaling factors (×1 to ×0.3).

It can be observed that artifacts gradually diminish as the input resolution decreases, but downscaling too aggressively causes loss of detail and information. For these video samples, rescaling the input by ×0.5 strikes a good balance between artifact removal and detail loss. A satisfactory output can therefore be selected manually as the pseudo HR. This is called the "input-rescaling strategy".
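The rescaling step can be sketched as enumerating candidate input sizes, one per scaling factor. This is only an illustration: the helper `candidate_sizes`, the 0.1 step between factors, and the 1920×1080 source resolution are assumptions for this example. Each rescaled input would then be passed through the VSR model and the best-looking output chosen by hand as the pseudo HR; that part is not shown.

```python
from typing import List, Tuple


def candidate_sizes(width: int, height: int,
                    scales: List[float]) -> List[Tuple[float, int, int]]:
    """Return (scale, new_width, new_height) for each rescaling factor."""
    out = []
    for s in scales:
        if not 0.0 < s <= 1.0:
            raise ValueError(f"scale must be in (0, 1], got {s}")
        out.append((s, max(1, round(width * s)), max(1, round(height * s))))
    return out


# Factors from x1 down to x0.3, as in the text; the 0.1 step is an assumption.
scales = [round(1.0 - 0.1 * k, 1) for k in range(8)]

# Hypothetical 1920x1080 LQ source clip.
sizes = candidate_sizes(1920, 1080, scales)
```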

Learnable basic operators

The paper selects several representative real-world LQ animation videos to train the learnable basic operators. First, LQ videos are screened for cases where the VSR model performs poorly at the original scale but produces better results under a suitable scaling factor, and the best scaling factor is determined for each video. About 2,000 frames are taken from each LQ video and fed into the VSR network to obtain pseudo-HR samples. The resulting LR and pseudo-HR pairs are then used to train the learnable basic operators.

Each neural operator consists of three 3 × 3 convolution layers with 64 hidden channels, with LeakyReLU activations between the convolution layers. The paper trains three learnable basic operators from different LQ videos and puts them into a pool. At each training iteration, one operator is randomly selected from the pool and incorporated into the degradation process.
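To give a sense of how small such an operator is, the sketch below counts its parameters and mimics drawing one operator from the pool per iteration. The layer widths follow the description above, but the 3-channel RGB input and output, the use of biases, uniform sampling, and the operator names are assumptions made for this example.

```python
import random
from typing import List, Tuple

# (in_channels, out_channels, kernel_size) for each convolution layer.
# 64 hidden channels and 3x3 kernels come from the text; 3-channel RGB
# input and output is an assumption made for this sketch.
LBO_LAYERS: List[Tuple[int, int, int]] = [(3, 64, 3), (64, 64, 3), (64, 3, 3)]


def conv_params(in_ch: int, out_ch: int, k: int, bias: bool = True) -> int:
    """Number of weights (plus optional biases) in a k x k 2-D convolution."""
    return in_ch * out_ch * k * k + (out_ch if bias else 0)


def total_params(layers: List[Tuple[int, int, int]]) -> int:
    """Sum the parameter counts of all convolution layers."""
    return sum(conv_params(i, o, k) for i, o, k in layers)


def pick_operator(pool: List[str], rng: random.Random) -> str:
    """Draw one learned operator for a training iteration (uniform: assumed)."""
    return rng.choice(pool)


# Hypothetical names for the three operators trained on different LQ videos.
pool = ["lbo_a", "lbo_b", "lbo_c"]
rng = random.Random(0)
chosen = [pick_operator(pool, rng) for _ in range(5)]

n_params = total_params(LBO_LAYERS)  # 1792 + 36928 + 1731 = 40451
```

Under these assumptions each operator has about 40k parameters, which fits the description of a tiny network.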

Network architecture

A practical animation VSR network needs a good balance between performance and efficiency. Current practical models such as Real-ESRGAN and RealBasicVSR are usually very large networks that are slow and resource-hungry. The problem becomes more serious when videos are super-resolved to 4K/8K. Practical VSR therefore usually adopts a unidirectional recurrent structure. However, the lack of subsequent frames limits the use of temporal information. The paper therefore adds a sliding-window structure on top of the efficient unidirectional design, so that the recurrent block receives a sequence of frames.

As shown in panel (b) of the figure above, the recurrent block arranges 10 residual blocks in a multi-scale design. Three scales are used, ×1, ×0.5, and ×0.25, and they are assigned 5, 3, and 2 blocks respectively. AnimeSR does not use optical flow, because the authors found empirically that it brings no significant visual improvement. In addition, computing optical flow slows down both training and inference.
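The combination of a unidirectional recurrence with a sliding window can be sketched as a loop that moves forward in time only, while each step sees a short window of neighbouring frames. This is a toy sketch and not the authors' implementation: the window radius of 1, clamping at the sequence borders, feeding the previous output back in, and the placeholder `toy_cell` are all assumptions. Frames are plain floats here instead of image tensors.

```python
from typing import Callable, Dict, List, Tuple

Frame = float  # toy stand-in for an image tensor
State = float  # toy stand-in for the hidden state
Cell = Callable[[List[Frame], State, Frame], Tuple[Frame, State]]

# Residual blocks per scale, as described in the text (10 in total).
BLOCKS_PER_SCALE: Dict[float, int] = {1.0: 5, 0.5: 3, 0.25: 2}


def window_indices(t: int, num_frames: int, radius: int = 1) -> List[int]:
    """Indices of the frames around t, clamped at the sequence borders."""
    return [min(max(t + d, 0), num_frames - 1) for d in range(-radius, radius + 1)]


def run_recurrent(frames: List[Frame], cell: Cell, radius: int = 1) -> List[Frame]:
    """Process frames front to back; state only flows forward in time."""
    hidden: State = 0.0
    prev_out: Frame = 0.0
    outputs: List[Frame] = []
    for t in range(len(frames)):
        window = [frames[i] for i in window_indices(t, len(frames), radius)]
        prev_out, hidden = cell(window, hidden, prev_out)
        outputs.append(prev_out)
    return outputs


def toy_cell(window: List[Frame], hidden: State, prev_out: Frame) -> Tuple[Frame, State]:
    """Placeholder for the multi-scale recurrent block: averages the window."""
    out = sum(window) / len(window)
    return out, hidden + out


outputs = run_recurrent([1.0, 2.0, 3.0], toy_cell)
```

The hidden state and previous output are carried in one direction only, so each frame costs one pass through the cell, while the window gives the cell limited access to the next frame.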

Experiments

Ablation study

Ablations on the dataset, the degradation model, the multi-scale structure, and the learnable basic operators (LBO)

Quantitative evaluation

The authors note that NR-IQA metrics are not always consistent with visual quality, especially at finer scales, and they use MANIQA because it agrees with perceived visual quality better than NIQE does.

Qualitative evaluation

Summary

This paper comes from Xintao Wang's team. The main contributions of AnimeSR are as follows: degradation operators are learned from real LQ animation to better capture the distribution of real degradations; a large-scale HQ animation video dataset, AVC, is built for training and evaluating animation VSR; an effective "input-rescaling strategy" makes it possible to learn these neural operators; and an efficient multi-scale network structure allows AnimeSR to reach state-of-the-art results. On the paper as a whole, my comments are as follows:

The input-rescaling strategy is somewhat subjective, and a more objective screening scheme could be studied in future work. In addition, compared with the inputs of earlier unidirectional VSR methods, adding the output to the sliding-window input seems a slightly forced novelty, and with these extra inputs, is the network still strictly "unidirectional"?
Optical flow is used to filter out static scenes when selecting the training set, yet the VSR network is said to gain little from optical flow, and this claim is not backed by experiments or theoretical analysis. Is the poor result caused by optical flow itself, in which case other alignment methods could be tried? Or is it caused by the limited range of motion in the dataset? In my view, a more detailed argument is needed.
The paper relies mainly on MANIQA as its quantitative metric. Could a few more metrics be added, such as NRQM, PI, and BRISQUE? Finally, does synthetic data also count as real-world data?