AnimeSR: Learnable Degradation Operators and a New Real-World Animation VSR Dataset
2022-07-01 13:41:00 【I love computer vision】
Another notable work from Xintao's team. The paper "AnimeSR: Learning Real-World Super-Resolution Models for Animation Videos" proposes a new animation dataset for real-world animation VSR and extends the real-world degradation operators to learnable operators, reaching SOTA on NIQE and other evaluation metrics.

Affiliation: Tencent PCG ARC Lab
Paper link: https://arxiv.org/pdf/2206.07038
01
Highlights
The paper presents three key improvements for practical animation VSR:
Recent real-world VSR methods mostly build their degradation from basic operators without learning ability, such as blur, noise, and compression. The paper proposes learning such basic operators from real LQ animation videos and adding the learned operators to the degradation process. These neural-network-based basic operators help capture the distribution of real degradations better.
A large-scale HQ animation dataset, AVC, is built for training and evaluating animation VSR.
An efficient multi-scale network structure, AnimeSR, is studied. It combines the efficiency of unidirectional recurrent networks with the effectiveness of sliding-window methods, and outperforms previous state-of-the-art methods.

02
Method
The AVC dataset
The training set AVC-Train contains 553 high-quality clips with 55,300 frames in total. The test set AVC-Test contains 30 clips with 3,000 frames in total. To evaluate methods in real scenarios, the paper also builds a real-world test set, AVC-RealLQ, consisting of 44 low-quality clips. The figure below shows some examples from the datasets.

Learnable basic operators in degradation synthesis
Because paired LR-HR training data are lacking, recent work designs degradation models that are as close to the real world as possible and then uses them to synthesize LR from HR. This degradation can be described as n steps:
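$$x = \mathcal{D}^{n}(y) = (\mathcal{D}_n \circ \cdots \circ \mathcal{D}_2 \circ \mathcal{D}_1)(y)$$
where $y$ is the HR input, $x$ is the synthesized LR output, and each $\mathcal{D}_i$ is one basic operator.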
The basic operators in classical degradation models include blur, noise, resizing, and JPEG/FFMPEG compression. These operators have no learning ability, which inherently limits how well they can synthesize real-world degradations, as shown in figure (a) below. Another line of work uses large neural networks and adversarial learning to synthesize LR samples.
However, learning the whole degradation process and its distribution with one large neural network is challenging. Such methods work only on a limited range of images and often produce unpleasant artifacts, as shown in figure (b) below.
The paper instead proposes learning the basic operators used in degradation synthesis. Rather than one large network, it trains tiny neural networks with two or three convolution layers to capture the main characteristics of real degradations, and these networks are then incorporated into the degradation synthesis process. The neural operators are learnable and can synthesize real degradations that classical operators cannot simulate. Learnable basic operators greatly expand the degradation space so that it covers more real degradations.

Input rescaling strategy
The learnable basic operators are trained in a supervised manner on LR-HR pairs. However, obtaining LR-HR pairs for real-world LQ videos is challenging. For real LQ animation videos, the paper first runs a VSR model trained with a degradation model built only from basic operators and obtains preliminary results, shown in the figure below. As expected, the outputs are unsatisfactory. The input is then resized with different scaling factors (×1 to ×0.3).
It can be observed that artifacts gradually decrease as the input resolution decreases, but downscaling too aggressively loses details and information. For these video samples, rescaling the input by ×0.5 strikes a good balance between artifact removal and detail loss. A satisfactory output can therefore be selected manually as the pseudo HR. This is called the "input rescaling strategy".
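The candidate-generation step of the input rescaling strategy can be sketched as follows. This is a minimal illustration, not the authors' code: `resize_nearest`, `pseudo_hr_candidates`, the nearest-neighbour resizing, and the intermediate factors between ×1 and ×0.3 are all assumptions made for the example, and `vsr_model` stands in for whatever pretrained VSR network is used. The final choice of pseudo HR stays manual, as in the text.

```python
def resize_nearest(frame, factor):
    """Nearest-neighbour resize of a frame given as a 2D list of pixel values.

    Hypothetical helper: a real pipeline would use a proper image library.
    """
    h, w = len(frame), len(frame[0])
    new_h = max(1, round(h * factor))
    new_w = max(1, round(w * factor))
    return [
        [frame[min(h - 1, int(r / factor))][min(w - 1, int(c / factor))]
         for c in range(new_w)]
        for r in range(new_h)
    ]


def pseudo_hr_candidates(frame, vsr_model, factors=(1.0, 0.7, 0.5, 0.3)):
    """Run the VSR model on the input rescaled by each candidate factor.

    Returns {factor: output}. A person then inspects the outputs and picks
    the one that best balances artifact removal against detail loss.
    The factor list is an assumed sampling of the x1 to x0.3 range.
    """
    return {f: vsr_model(resize_nearest(frame, f)) for f in factors}


if __name__ == "__main__":
    # Toy 4x4 "frame" and an identity function standing in for the VSR model.
    toy_frame = [[r * 4 + c for c in range(4)] for r in range(4)]
    for factor, out in pseudo_hr_candidates(toy_frame, lambda x: x).items():
        print(factor, len(out), "x", len(out[0]))
```

Keeping the selection step outside the function mirrors the manual choice described above; an automatic criterion would be a different method from the one in the paper.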

Learnable basic operators
The paper selects several representative real-world LQ animation videos to train the learnable basic operators. It first screens for LQ videos on which the VSR model performs poorly at the original scale but produces better results under a suitable scaling factor, and determines the best scaling factor for each video. About 2,000 frames are taken from each LQ video and fed into the VSR network to obtain pseudo-HR samples. The LR and pseudo-HR pairs are then used to train the learnable basic operators.
Each neural operator consists of three 3 × 3 convolution layers with a hidden channel dimension of 64, with LeakyReLU activations between the convolution layers. The paper trains three learnable basic operators from different LQ videos and puts them into a pool. At each training iteration, one operator is randomly selected from the pool and incorporated into the degradation process.
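The pool mechanism described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the paper's implementation: signals are 1D lists instead of video frames, `box_blur` stands in for a classical operator, `make_stub_lbo` returns a simple gain function as a placeholder for a trained three-layer convolutional operator, and applying the learned operator last in the chain is an assumption (the text does not say where it is inserted).

```python
import random


def box_blur(signal):
    """Classical operator stand-in: 3-tap moving average with edge clamping."""
    n = len(signal)
    return [
        (signal[max(i - 1, 0)] + signal[i] + signal[min(i + 1, n - 1)]) / 3
        for i in range(n)
    ]


def make_stub_lbo(gain):
    """Placeholder for a trained learnable basic operator (LBO).

    A real LBO would be a tiny conv net; a scalar gain keeps the sketch runnable.
    """
    def lbo(signal):
        return [gain * v for v in signal]
    return lbo


def degrade(hr, classical_ops, lbo_pool, rng):
    """Synthesize an LR sample: classical operators, then one LBO drawn from the pool.

    Placing the LBO at the end of the chain is an assumption of this sketch.
    """
    lbo = rng.choice(lbo_pool)  # a new random pick on every call / training iteration
    x = hr
    for op in list(classical_ops) + [lbo]:
        x = op(x)
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    pool = [make_stub_lbo(g) for g in (0.5, 0.8, 1.2)]  # three operators, as in the text
    for step in range(3):
        print(step, degrade([0.0, 3.0, 0.0], [box_blur], pool, rng))
```

Sampling per iteration means every training batch can see a different learned degradation, which is what widens the degradation space compared with a single fixed pipeline.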
Network architecture
A network for practical animation VSR needs a good balance between performance and efficiency. Current practical models such as Real-ESRGAN and RealBasicVSR are usually very large networks that are slow and resource-hungry. This drawback becomes more severe when video super-resolution targets 4K/8K resolution. Practical VSR usually adopts a unidirectional recurrent structure. However, the lack of subsequent frames hinders the use of temporal information. The paper therefore adds a sliding-window structure on top of the efficient unidirectional structure, so that the recurrent block receives a sequence of frames.

As shown in figure (b) above, the recurrent block uses a multi-scale design with 10 residual blocks. Three scales are used, ×1, ×0.5 and ×0.25, and they are assigned 5, 3 and 2 blocks respectively. AnimeSR does not use optical flow, because the authors found empirically that it brings no significant visual improvement. In addition, computing optical flow slows down both training and inference.
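A rough back-of-the-envelope estimate shows why spreading the 10 residual blocks over three scales helps efficiency. The sketch below assumes that the cost of one residual block is proportional to the number of pixels it processes, that all scales use the same channel width, and that down/up-sampling costs are negligible. None of these assumptions comes from the paper, and `relative_cost` is a hypothetical helper. Under these assumptions, the 5/3/2 layout costs about 59% of running all 10 blocks at full resolution.

```python
def relative_cost(blocks_per_scale):
    """Estimate cost in units of "one residual block at full resolution".

    blocks_per_scale maps a spatial scale factor to a block count.
    A block at scale s sees s*s as many pixels, so it is charged s*s units
    (assumption: cost is proportional to pixel count, equal channel width).
    """
    return sum(count * scale * scale for scale, count in blocks_per_scale.items())


if __name__ == "__main__":
    multi_scale = {1.0: 5, 0.5: 3, 0.25: 2}  # layout described in the text
    single_scale = {1.0: 10}                 # same block count, all at x1
    ms = relative_cost(multi_scale)
    ss = relative_cost(single_scale)
    print(f"multi-scale: {ms}, single-scale: {ss}, ratio: {ms / ss:.4f}")
```

The estimate ignores memory traffic and the recurrent state, so it indicates the direction of the saving rather than a measured speed-up.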
03
Experiments
Ablation experiments
Ablation experiments on the dataset, the degradation model, the multi-scale structure, and the learnable basic operators (LBO).

Quantitative evaluation
The paper's authors note that NR-IQA metrics do not always agree with visual quality, especially at finer scales, and that MANIQA matches perceived visual quality better than NIQE.

Qualitative evaluation

04
Summary
This paper comes from Xintao's team. The main contributions of AnimeSR are as follows: learning degradation operators from real LQ animation videos to better capture the distribution of real degradations; building a large-scale HQ animation video dataset, AVC, for training and evaluating animation VSR; an effective "input rescaling strategy" that makes learning these neural operators possible; and an efficient multi-scale network structure that brings AnimeSR to SOTA. On the paper as a whole, this writer thinks:
The input rescaling strategy is somewhat subjective, so a more objective screening scheme could be a topic for later study. Second, compared with the inputs of previous unidirectional VSR methods, the novelty of adding the output to the sliding window seems somewhat strained, and with the additional input, is the structure still strictly 'unidirectional'?
Optical flow is used to filter out static scenes when selecting the training set, yet the VSR part states that using optical flow gives poor results, and this part has no experimental or theoretical support. Is the poor result caused by optical flow itself, in which case could other alignment methods be used? Or is it caused by the limited range of motion in the dataset? This writer believes a more detailed justification is needed.
The paper relies mostly on MANIQA as its quantitative metric. Could a few more metrics be added, such as NRQM, PI, and BRISQUE? Also, does synthetic data count as a kind of real-world data?
