AnimeSR: Learnable Degradation Operators and a New Real-World Animation VSR Dataset
Posted 2022-07-01 13:41:00 by I Love Computer Vision
Another strong piece of work from Xintao Wang's team. The paper "AnimeSR: Learning Real-World Super-Resolution Models for Animation Videos" proposes a new dataset for real-world animation VSR and extends the basic operators of real-world degradation models into learnable operators, achieving SOTA results on NIQE and other metrics.

Affiliation: Tencent PCG ARC Lab
Paper: https://arxiv.org/pdf/2206.07038
01
Highlights
The paper proposes three key improvements for animation VSR:
Recent real-world VSR methods build their degradation models mostly from basic operators with no learning capacity, such as blur, noise, and compression. The paper proposes learning such basic operators from real LQ animation videos and adding the learned operators to the degradation pipeline. These neural-network-based operators help capture the distribution of real degradations more faithfully.
It builds a large-scale HQ animation video dataset, AVC, for training and evaluating animation VSR.
It designs an efficient multi-scale network, AnimeSR, which combines the efficiency of unidirectional recurrent networks with the effectiveness of sliding-window methods and outperforms previous state-of-the-art methods.

02
Method
AVC dataset
The training set, AVC-Train, contains 553 high-quality clips with 55,300 frames in total. The test set, AVC-Test, contains 30 clips with 3,000 frames in total. To evaluate methods in real scenarios, the paper also builds a real-world test set, AVC-RealLQ, consisting of 44 low-quality clips. The figure below shows some examples from the dataset.

Learnable basic operators in degradation synthesis
Because real LR-HR training pairs are unavailable, recent work designs degradation models that are as close to the real world as possible and then uses them to synthesize LR frames from HR frames. This degradation can be described as n steps:
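$$x = D^n(y) = (D_n \circ \cdots \circ D_2 \circ D_1)(y)$$
where $y$ is the HR frame, $x$ is the synthesized LR frame, and each $D_i$ is one degradation step built from basic operators.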
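To make the composition concrete, here is a toy sketch of the n-step degradation on 1-D signals. The operators `box_blur`, `make_noise`, `downsample2`, and `quantize` are hypothetical stand-ins for the real blur, noise, resize, and compression operators; this is not the paper's implementation, only an illustration of how the steps $D_1, \dots, D_n$ are chained.

```python
import random
from functools import reduce
from typing import Callable, List, Sequence

Signal = List[float]
Operator = Callable[[Signal], Signal]


def box_blur(x: Signal) -> Signal:
    """3-tap box blur with edge replication (toy stand-in for a blur kernel)."""
    n = len(x)
    return [(x[max(i - 1, 0)] + x[i] + x[min(i + 1, n - 1)]) / 3.0 for i in range(n)]


def make_noise(sigma: float, seed: int) -> Operator:
    """Additive Gaussian noise with its own seeded generator (toy stand-in)."""
    rng = random.Random(seed)

    def add_noise(x: Signal) -> Signal:
        return [v + rng.gauss(0.0, sigma) for v in x]

    return add_noise


def downsample2(x: Signal) -> Signal:
    """Keep every other sample (toy stand-in for resizing)."""
    return x[::2]


def quantize(step: float) -> Operator:
    """Round values to a coarse grid (toy stand-in for lossy compression)."""

    def q(x: Signal) -> Signal:
        return [round(v / step) * step for v in x]

    return q


def compose(steps: Sequence[Operator]) -> Operator:
    """Return D_n o ... o D_1: steps[0] is applied first, steps[-1] last."""
    return lambda y: reduce(lambda acc, d: d(acc), steps, y)


def one_step(seed: int) -> Operator:
    """One degradation step D_i built from basic operators."""
    return compose([box_blur, make_noise(0.01, seed), downsample2, quantize(0.05)])


hr = [i / 15 for i in range(16)]                   # toy 1-D "HR frame"
degrade = compose([one_step(seed=i) for i in range(2)])  # n = 2
lr = degrade(hr)                                   # each step halves the length
```

Each step halves the signal length here, so with n = 2 the 16-sample input becomes 4 samples. In the paper's setting, a learned operator would simply become one more callable inside a step.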
The basic operators in classical degradation models include blur, noise, resizing, and JPEG/FFmpeg compression. These operators have no learning capacity, which fundamentally limits their ability to synthesize real-world degradations (panel (a) of the figure below). Another line of work synthesizes LR samples with large neural networks and adversarial learning.
However, learning the entire degradation process and its distribution with a large neural network is difficult. Such methods work only for a limited range of images and often produce unpleasant artifacts (panel (b) of the figure below).
This paper instead proposes learning the basic operators used in degradation synthesis. Rather than a large network, it trains tiny neural networks with only two or three convolution layers to capture the main characteristics of real degradations, and then incorporates them into the degradation synthesis process. Because these neural operators are learnable, they can synthesize real degradations that classical operators cannot simulate. The learnable basic operators greatly enlarge the degradation space, so it covers more real-world degradations.

Input rescaling strategy
The paper trains the learnable basic operators in a supervised way with LR-HR pairs. However, obtaining LR-HR training pairs for real-world LQ videos is challenging. For real LQ animations, the paper first obtains preliminary results with a VSR model trained on data degraded only by the classical basic operators (see the figure below). As expected, the outputs are unsatisfactory. The input is then resized with different scaling factors (×1 to ×0.3).
As the input resolution decreases, artifacts gradually diminish, but too small a scaling factor loses details and information. For these video samples, scaling the input by ×0.5 strikes a good balance between removing artifacts and preserving details. A satisfactory output can therefore be selected manually and used as pseudo-HR; the paper calls this the "input rescaling strategy".
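The bookkeeping behind the input rescaling strategy can be sketched as below. This assumes a ×4 VSR model (`UPSCALE`), candidate factors from ×1.0 down to ×0.3 in steps of 0.1, and rounding to the nearest pixel; `resize` and `vsr` in `make_pseudo_hr` are hypothetical callables, and the final choice of scale stays manual, as in the post.

```python
from typing import Callable, Dict, List, Sequence, Tuple

UPSCALE = 4  # assumption: the VSR model upsamples by x4


def candidate_scales(hi: float = 1.0, lo: float = 0.3, step: float = 0.1) -> List[float]:
    """Scaling factors to try, from hi down to lo (x1 ... x0.3 in the post)."""
    n = int(round((hi - lo) / step))
    return [round(hi - k * step, 2) for k in range(n + 1)]


def rescaled_sizes(h: int, w: int, scale: float) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """(VSR input size, VSR output size) for an LQ frame of size h x w."""
    in_h, in_w = round(h * scale), round(w * scale)
    return (in_h, in_w), (in_h * UPSCALE, in_w * UPSCALE)


def size_table(h: int, w: int) -> Dict[float, Tuple[int, int]]:
    """Output size per candidate scale, to help a person compare results side by side."""
    return {s: rescaled_sizes(h, w, s)[1] for s in candidate_scales()}


def make_pseudo_hr(frames: Sequence, scale: float,
                   resize: Callable, vsr: Callable) -> List:
    """Hypothetical glue: resize(frame, scale) and vsr(frames) are supplied by the caller.

    The scale is chosen manually per video after inspecting the outputs.
    """
    return vsr([resize(f, scale) for f in frames])
```

For example, `rescaled_sizes(360, 480, 0.5)` gives a 180 × 240 VSR input and a 720 × 960 output, so under the ×4 assumption a ×0.5 input yields a pseudo-HR at twice the original LQ resolution.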

Learnable basic operators
The paper selects several representative real-world LQ animations to train the learnable basic operators. It first picks LQ videos on which the VSR model performs poorly at the original scale but produces better results at a suitable scaling factor, and determines the best scaling factor for each video. About 2,000 frames are extracted from each LQ video and fed into the VSR network to obtain pseudo-HR samples. The LR/pseudo-HR pairs are then used to train the learnable basic operators.
Each neural operator consists of three 3 × 3 convolution layers with 64 hidden channels and LeakyReLU activations between the convolution layers. Three learnable basic operators are trained on different LQ videos and placed in a pool. At each training iteration, one operator is randomly selected from the pool and incorporated into the degradation process.
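The sketch below shows how small such an operator is and how per-iteration sampling from the pool works. It assumes 3-channel RGB input and output, bias terms in every convolution, and hypothetical operator names; it only describes the layer spec and the sampling, not the actual training code.

```python
import random
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Conv3x3:
    in_ch: int
    out_ch: int

    def num_params(self) -> int:
        # 3x3 kernel weights for every (in, out) channel pair, plus one bias per output channel
        return self.out_ch * self.in_ch * 3 * 3 + self.out_ch


def lbo_layers(img_ch: int = 3, hidden: int = 64) -> List[Conv3x3]:
    """Layer spec of one learnable basic operator: conv -> LeakyReLU -> conv -> LeakyReLU -> conv.

    LeakyReLU has no parameters, so only the convolutions are listed.
    """
    return [Conv3x3(img_ch, hidden), Conv3x3(hidden, hidden), Conv3x3(hidden, img_ch)]


def lbo_param_count(img_ch: int = 3, hidden: int = 64) -> int:
    return sum(layer.num_params() for layer in lbo_layers(img_ch, hidden))


def sample_schedule(pool: List[str], iterations: int, seed: int = 0) -> Dict[str, int]:
    """Pick one operator from the pool per training iteration and count how often each is used."""
    rng = random.Random(seed)
    counts = {name: 0 for name in pool}
    for _ in range(iterations):
        counts[rng.choice(pool)] += 1
    return counts


# Hypothetical names for the three operators trained on different LQ videos.
POOL = ["lbo_video_a", "lbo_video_b", "lbo_video_c"]
```

Under these assumptions one operator has 1,792 + 36,928 + 1,731 = 40,451 parameters, a few tens of thousands, which is what makes it cheap to run inside the data pipeline on every iteration.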
Network architecture
A network for practical animation VSR needs a good balance between performance and efficiency. Current practical models such as Real-ESRGAN and RealBasicVSR are usually very large networks that are slow to run and consume a lot of resources, and this drawback becomes more serious when videos are super-resolved to 4K/8K. Practical VSR therefore usually adopts a unidirectional recurrent structure. However, without access to subsequent frames, such a structure cannot fully exploit temporal information. The paper therefore builds on the efficient unidirectional structure and additionally adopts a sliding-window design: the recurrent block receives a sequence of neighboring frames.
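The combination of forward-only recurrence and a sliding window can be sketched as an input schedule. This assumes a three-frame window (t-1, t, t+1, i.e. `radius=1`), index clamping at the sequence ends, and that step t reuses the output/state of step t-1; the exact inputs of the AnimeSR block are not spelled out here, so treat this as an illustration only.

```python
from typing import List, Optional, Tuple

Step = Tuple[List[int], Optional[int]]


def window_indices(t: int, num_frames: int, radius: int = 1) -> List[int]:
    """Frames fed to the recurrent block at step t; indices are clamped at the sequence ends."""
    return [min(max(t + k, 0), num_frames - 1) for k in range(-radius, radius + 1)]


def unidirectional_schedule(num_frames: int, radius: int = 1) -> List[Step]:
    """For each step t: (input frame indices, index of the previous step whose output/state is reused).

    State only flows forward (t-1 -> t), so a single pass over the video suffices,
    while the window still lets step t see `radius` future frames.
    """
    schedule: List[Step] = []
    for t in range(num_frames):
        prev = t - 1 if t > 0 else None  # no recurrent input for the first frame
        schedule.append((window_indices(t, num_frames, radius), prev))
    return schedule
```

Unlike a bidirectional network, which must process the whole clip before emitting its first output, this schedule only needs to buffer `radius` future frames before producing frame t.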

As shown in panel (b) of the figure above, the recurrent block uses a multi-scale design with 10 residual blocks spread over three scales, ×1, ×0.5, and ×0.25, which are assigned 5, 3, and 2 blocks respectively. AnimeSR does not use optical flow, because the authors found empirically that it brings no significant visual improvement; computing optical flow would also slow down both training and inference.
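A quick way to see why the multi-scale layout is efficient is to estimate the relative cost of the residual blocks. The sketch assumes that each block's cost is proportional to its feature-map area (scale squared), that channel widths are identical at all scales, and it ignores the down/up-sampling layers; these are simplifications, not figures from the paper.

```python
from fractions import Fraction
from typing import List, Tuple

# (feature-map scale, number of residual blocks) as described in the post
SCALES: List[Tuple[Fraction, int]] = [(Fraction(1), 5), (Fraction(1, 2), 3), (Fraction(1, 4), 2)]


def feature_sizes(h: int, w: int) -> List[Tuple[int, int]]:
    """Feature-map size at each scale for an h x w input."""
    return [(int(h * s), int(w * s)) for s, _ in SCALES]


def relative_block_cost() -> Fraction:
    """Cost of the 5/3/2 layout relative to running all 10 blocks at x1."""
    total = sum(n for _, n in SCALES)
    multi_scale = sum(s * s * n for s, n in SCALES)
    return multi_scale / total
```

Under these assumptions the cost is (5 + 3/4 + 2/16) / 10 = 47/80, roughly 59% of running all ten blocks at full resolution, while the coarser scales still enlarge the receptive field.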
03
Experiments
Ablation study
Ablation experiments on the dataset, the degradation model, the multi-scale structure, and the learnable basic operators (LBO)

Quantitative evaluation
The authors note that NR-IQA (no-reference image quality assessment) metrics do not always agree with visual quality, especially at a fine-grained level, and they use MANIQA, which matches perceived visual quality better than NIQE.

Qualitative evaluation

04
Summary
This paper comes from Xintao Wang's team. AnimeSR's main contributions are: learning degradation operators from real LQ animations to better capture the distribution of real degradations; building a large-scale HQ animation video dataset, AVC, for training and evaluating animation VSR; an effective "input rescaling strategy" that makes learning these neural operators possible; and an efficient multi-scale network structure that lets AnimeSR reach SOTA. On the paper as a whole, the author of this post has the following thoughts:
The input rescaling strategy is somewhat subjective, and a more objective selection scheme could be future work. Second, compared with the inputs of previous unidirectional VSR methods, the paper's innovation of additionally feeding the previous output in a sliding-window manner feels somewhat forced; and if the output is also used as an input, is the method still strictly "unidirectional"?
Optical flow is used to filter out static scenes when building the training set, yet for the VSR network the paper states that optical flow does not help, without any experimental or theoretical support. Is the poor result caused by optical flow itself, and would other alignment methods work better? Or is it caused by the limited range of motion in the dataset? The author of this post thinks a more detailed argument is needed.
The paper relies mainly on MANIQA as its quantitative metric; more metrics such as NRQM, PI, and BRISQUE could be added. In addition, can synthetic data really be counted as real-world data?
