当前位置:网站首页>Research and practice of super-resolution technology in the field of real-time audio and video

Research and practice of super-resolution technology in the field of real-time audio and video

2022-07-07 19:18:00 Netease Yunxin

Preface

In recent days, , Top level conference on computer vision and pattern recognition CVPR Held in New Orleans, USA , At the same time, it is the most influential global top event in the field of computer image restoration NTIRE Award at the meeting , Netease Yunxin audio and video laboratory obtained NTIRE High efficiency super-resolution challenge overall performance track champion , And the running time is the third place of the track . This article will focus on AI The landing problem of super separation technology from research to deployment , Introduce the current situation of super-resolution Technology , And the opportunities and challenges faced by the application of video hyperscore on the mobile terminal .

 

 

Overview of super-resolution Technology

In recent years , Internet video data is growing explosively . meanwhile , The resolution of video is getting higher and higher , To meet people's quality of video experience (Quality of Experience, QoE) Growing demand . however , Due to the limitation of bandwidth , Video transmitted over the network is usually downsampled and compressed , This will inevitably lead to the decline of video quality , And then affect the user's experience and appearance . Hyperspectral technology aims to recover high-resolution output with better visual quality from low-resolution input , It can effectively solve the problem of poor video quality , So as to meet the needs of users at the playback end for extreme HD image quality . Live on demand 、 Monitoring equipment 、 Video codec 、 Cell phone shooting 、 Medical imaging 、 Digital HD and video restoration have very important application value .

 

The following two scenes are the problems we often encounter in life , Can be solved by super Division Technology :

Scene one : Zhang San watches the video on the subway , Because the subway network is bad , So you can only watch standard definition or low definition videos , If you improve the resolution, it will be easy to get stuck , Experience is very poor , So we can only accept low quality video , You can improve the resolution of the video by super dividing , Increase image quality .

Scene two : When Li Si, a primary school student, uses his smart watch to make a video call with his mother's mobile phone , I found that watches are easy to get hot , Flow is also used fast . Because limited by chip performance , Sending high-resolution video consumes more power , Only low resolution video can be sent , You can improve the image quality by using super score technology on your mother's mobile phone .

Classification and development direction of super-resolution Technology  

In a broad sense, super-resolution technology includes 3 In this case :  Single image super-resolution Super resolution reconstruction of single frame image from multi frame continuous image Super resolution reconstruction of video sequence .

Single image amplification mainly uses the prior knowledge of high-resolution images and high-frequency information in the form of aliasing to restore . In the latter two cases, in addition to using prior knowledge and single image information , The complementary information between adjacent images can also be used for super-resolution reconstruction , Get a high-resolution image with higher resolution than any low resolution image , However, these two situations often bring unacceptable computational costs and the risk of discontinuous reconstruction of adjacent frames . therefore , On actual landing , Prefer single image super-resolution Technology .

Classify according to time and effect , Single image super-resolution algorithms can be divided into Traditional algorithms and Deep learning algorithm Two types of .

 

Traditional super-resolution reconstruction algorithm

Traditional super-resolution reconstruction algorithms mainly rely on basic digital image processing technology for reconstruction , The common ones are as follows :

  • Super resolution reconstruction based on interpolation : Based on interpolation, each pixel in the image is regarded as a point on the image plane , Then the estimation of super-resolution image can be seen as the process of fitting the known pixel information to the unknown pixel information on the plane , This is usually done by a predefined transform function or interpolation kernel . The method based on interpolation is simple 、 Easy to understand , But there are obvious defects . The restored image is often blurred 、 Sawtooth, etc . Common interpolation based methods include nearest neighbor interpolation 、 Bilinear interpolation and bicube interpolation .

  • Super resolution reconstruction based on degenerate model : This kind of method starts from the degradation model of image , It is assumed that the high-resolution image has undergone appropriate motion transformation 、 Blur and noise to get a low resolution image . This method extracts key information from low resolution images , Combined with the prior knowledge of unknown super-resolution image, the generation of super-resolution image is constrained . Common methods include iterative back projection 、 Convex set projection method and maximum a posteriori probability method .

  • Super resolution reconstruction based on learning : The learning based method is to use a large amount of training data , Learn some correspondence between low resolution images and high resolution images , Then the high-resolution image corresponding to the low-resolution image is predicted according to the learned mapping relationship , So as to realize the super-resolution reconstruction process of the image . Common learning based methods include manifold learning 、 Sparse coding method .

Super resolution reconstruction algorithm based on deep learning

SRCNN It is the first attempt of deep learning method in super-resolution problem , It's a simple convolution network , from 3 It's made up of two convolution layers , Each convolution layer is responsible for different functions . The first convolution layer is mainly responsible for extracting high-frequency features , The second convolution layer is responsible for nonlinear mapping from low definition features to high definition features , Finally, the role of the convolution layer is to reconstruct high-resolution images .SRCNN Our network structure is relatively simple , The super-resolution effect also needs to be improved , However, it establishes the basic idea of deep learning method in dealing with super-resolution problems . The later deep learning method , Basically follow this idea to carry out super-resolution reconstruction .

Later, ESPCN be based on SRCNN Some improvements have been made , However, due to the limited capacity of network reconstruction , The super-resolution effect is not particularly ideal . Because at that time , There are some problems in the training of deep convolution network . In general, for convolutional neural networks , As the number of network layers increases , Performance will also increase , But in practice , People find that when the number of network layers increases to a certain extent , Because of the back propagation principle , There will be the problem of gradients disappearing , The convergence of the network becomes worse , Reduced model performance . This question goes on until ResNet After putting forward the residual network structure , To get a better solution . But it is worth noting that ,ESPCN The sub-pixel convolution layer is proposed for the first time in the network , It eliminates the pre acquisition operation before the low resolution image is sent to the neural network , Greatly reduced SRCNN Amount of computation , Improved reconstruction efficiency .

VDSR It is the first application of residual network and residual learning in super-resolution problem , Increase the number of super-resolution layers to the first time 20 layer . Using residual learning , Residual characteristics of network learning , The network converges fast , Be more sensitive to details . Later, some convolutional neural networks proposed more complex structures , such as RGAN This paper proposes to use generative countermeasure network to generate high resolution image ,SRGAN from 2 Part of it is made up of , One is generating networks , The other is the discriminant network . The function of generating network is to generate a high resolution image based on a low resolution image , The function of discriminant network is to judge the high resolution image generated by the generated network as false , So when the network is training , There is a constant game between the generation network and the decision network , And finally to balance , So as to generate a high resolution image with realistic detail texture , It has better subjective visual effect . Other deep convolution network methods, such as SRDenseNet、EDSR、RDN, Using a more complex network structure , The convolution layer of the network is getting deeper and deeper , The super-resolution effect on single image is also getting better and better .

However , Due to high computing costs and memory consumption , Many jobs are difficult to deploy on equipment with limited resources . So , Super-resolution and efficient model design has also attracted widespread attention .FSRCNN Adopt a compact hourglass architecture to accelerate for the first time SR The Internet ;DRCN and DRRN The recursive layer is used to build a deep network with fewer parameters .CARN By combining effective residual blocks with group convolution , Less SR Network computing . Attention mechanism is also introduced to find the area with the largest amount of information , To better reconstruct high-resolution images . in addition , Knowledge distillation has also been cited in lightweight super-resolution networks , To improve their performance .

The challenge of real-time video super-resolution  

In the era of mobile Internet , As the most important platform for video content, mobile terminal , Responsible for a large number of PGC and UGC Playback of video content , But limited by the insufficient generalization ability of the model , The computing power of the mobile terminal is limited , The algorithm has high computational complexity , be based on AI The following characteristics of super-resolution algorithm make it face great challenges in real-time deployment on mobile terminals :

  • The subjective effect is poor , Directly use the above-mentioned super-resolution algorithm based on deep learning , You will find its subjective effect and Bicubic Such as traditional algorithms , The effect of improving video quality is very limited .

  • educational circles SOTA Method the network model parameters are too large , Even many networks that are lightweight , Parameter quantities are also greater than 500K, This leads to excessive computation of the model , Reasoning is slow , It cannot meet the requirements of real-time video processing on mobile terminals .

Yunxin AI Over score

Training data based on real down sampling

The existing training data of super-resolution algorithm based on deep learning , Often through Bicubic Or other known down sampling methods . However, the real scene is often not , As a result, there is a large gap between the model training data and the actual prediction data , So the effect of the super score algorithm is not ideal .

We adopt a real down sampling generation method which is also based on the confrontation generation network . As shown in the figure below , For a high resolution image , Let's train the sampling generator G Discriminator D, bring G The generated low resolution image is close to the real low resolution image , So as to get the real down sampling G. Got it G Later, we can use high-resolution images to generate a large number of training data pairs that meet the degradation of real down sampling .

 

Yunxin super resolution algorithm  

Netease Yunxin video lab proposes an edge oriented and efficient feature distillation network (EFDN), stay 2022 year CVPR NTIRE High efficiency super-resolution challenge ,Overall Performance Racetrack Yunxin won the first place with obvious advantages ,Runtime The track finished third .

 

Complete report of the competition :

https://arxiv.org/abs/2205.05675

This method is to improve the accuracy of the model and reduce the cost of the model , Based on the idea of structural heavy parameters , In the training phase, the edge oriented convolution block is used (ECB) Instead of residual characteristic distillation module (RFDB) Medium SRB Shallow residual block , In the reasoning stage, the edge oriented convolution block (ECB) Convert to normal 3x3 Convolution layer , This method can extract texture information and edge information of image more efficiently , Improve network performance while reducing overhead ; At the same time, it can enhance spatial attention (ESA) Module for tailoring , Reduce the number of parameters and increase the step size of the pool layer , The algorithm overhead is further reduced .

 

In order to further implement the project , Get a model that can run in real time on mobile devices , Comprehensive efficiency and effect , Yunxin team adopts the following optimization methods :

  • The model of compression : In the actual landing process , In order to meet the requirements of real-time processing , We are CVPR NTIRE 2022 High efficiency super score challenge competition model - An edge oriented efficient feature distillation module (EFDN) On the basis of , Use the passage to prune , Knowledge distillation and other model compression technologies further reduce the redundant parameters in the optimized model architecture , Remove channels that contribute little to model performance , To reduce the complexity of the model . At the same time, quantization technology is used to store the weight in low bits , So as to reduce the model volume , Accelerate Computing .

  • Engineering optimization : Calculate power on mobile devices 、 With limited memory bandwidth , It should not only meet the requirements of real-time video processing with super division algorithm , It can't increase too much power consumption , The requirements for engineering deployment are very high . Our optimization on the engineering side is mainly through SIMD, Model memory optimization , Optimization methods such as data layout optimization save memory overhead and reasoning time , At the same time, the algorithm realizes zero copy of memory between rendering pipeline and device by deeply combining business scenarios , Complete the high-performance landing of the algorithm .

The following table shows yunxinchao on different platforms / A single frame on the device takes time .

 

 

Effect display and future outlook  

The video super division of mobile terminal can break through the limitation of codec and the efficiency bottleneck of video transmission , Optimize video transmission speed 、 User experience such as playback fluency , And bring many practical effects :

  • Improve video clarity , Take advantage of the high resolution of the high-end computer screen , Low definition video HD playback , HD video provides Ultra HD image quality , Improve users' video consumption experience .

  • Reduce bandwidth , Through the sender / The server reduces the resolution of transcoding and distributing video , Combined with the super-resolution processing at the receiving end, it presents a high-resolution effect , Lower the threshold of HD playback , Improve fluency , Reduce user network pressure .

future , We will continue to optimize video enhancement algorithms, including hyperfractional algorithms , Create the industry's top image restoration and image enhancement technology , Help customers improve video quality , Reduce video playback costs , Provide lower consumption of time , Lower power consumption , Better subjective quality , Algorithms that cover more models and save more bit rates , Let users on different mobile phones 、 Enjoy Ultra HD video experience in different network environments .

  Zuo Wei  Bicubic Upsampling result , On the right is the result of super fractional optimization

Reference material  

[1] Yawei Li, Kai Zhang, Luc Van Gool, Radu Timofte, et al. Ntire 2022 challenge on efficient super-resolution: Methods and results. In IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2022.

[2] Zheng Hui, Xinbo Gao, Yunchu Yang, and Xiumei Wang. Lightweight image super-resolution with information multi-distillation network. In Proceedings of the ACM International Conference on Multimedia, pages 2024–2032, 2019.

[3] Jie Liu, Jie Tang, and Gangshan Wu. Residual feature distillation network for lightweight image super-resolution. In European Conference on Computer Vision Workshops, pages41–55. Springer, 2020.

[4] Zhang, Xindong and Zeng, Hui and Zhang, Lei. Edge-oriented Convolution Block for Real-time Super Resolution on Mobile Devices. In Proceedings of the 29th ACM International Conference on Multimedia, pages4034--4043. 2021.

 

原网站

版权声明
本文为[Netease Yunxin]所创,转载请带上原文链接,感谢
https://yzsam.com/2022/188/202207071706158845.html