当前位置:网站首页>Research and practice of super-resolution technology in the field of real-time audio and video
Research and practice of super-resolution technology in the field of real-time audio and video
2022-07-07 19:18:00 【Netease Yunxin】
Preface
In recent days, , Top level conference on computer vision and pattern recognition CVPR Held in New Orleans, USA , At the same time, it is the most influential global top event in the field of computer image restoration NTIRE Award at the meeting , Netease Yunxin audio and video laboratory obtained NTIRE High efficiency super-resolution challenge overall performance track champion , And the running time is the third place of the track . This article will focus on AI The landing problem of super separation technology from research to deployment , Introduce the current situation of super-resolution Technology , And the opportunities and challenges faced by the application of video hyperscore on the mobile terminal .
Overview of super-resolution Technology
In recent years , Internet video data is growing explosively . meanwhile , The resolution of video is getting higher and higher , To meet people's quality of video experience (Quality of Experience, QoE) Growing demand . however , Due to the limitation of bandwidth , Video transmitted over the network is usually downsampled and compressed , This will inevitably lead to the decline of video quality , And then affect the user's experience and appearance . Hyperspectral technology aims to recover high-resolution output with better visual quality from low-resolution input , It can effectively solve the problem of poor video quality , So as to meet the needs of users at the playback end for extreme HD image quality . Live on demand 、 Monitoring equipment 、 Video codec 、 Cell phone shooting 、 Medical imaging 、 Digital HD and video restoration have very important application value .
The following two scenes are the problems we often encounter in life , Can be solved by super Division Technology :
Scene one : Zhang San watches the video on the subway , Because the subway network is bad , So you can only watch standard definition or low definition videos , If you improve the resolution, it will be easy to get stuck , Experience is very poor , So we can only accept low quality video , You can improve the resolution of the video by super dividing , Increase image quality .
Scene two : When Li Si, a primary school student, uses his smart watch to make a video call with his mother's mobile phone , I found that watches are easy to get hot , Flow is also used fast . Because limited by chip performance , Sending high-resolution video consumes more power , Only low resolution video can be sent , You can improve the image quality by using super score technology on your mother's mobile phone .
Classification and development direction of super-resolution Technology
In a broad sense, super-resolution technology includes 3 In this case : Single image super-resolution 、 Super resolution reconstruction of single frame image from multi frame continuous image 、 Super resolution reconstruction of video sequence .
Single image amplification mainly uses the prior knowledge of high-resolution images and high-frequency information in the form of aliasing to restore . In the latter two cases, in addition to using prior knowledge and single image information , The complementary information between adjacent images can also be used for super-resolution reconstruction , Get a high-resolution image with higher resolution than any low resolution image , However, these two situations often bring unacceptable computational costs and the risk of discontinuous reconstruction of adjacent frames . therefore , On actual landing , Prefer single image super-resolution Technology .
Classify according to time and effect , Single image super-resolution algorithms can be divided into Traditional algorithms and Deep learning algorithm Two types of .
Traditional super-resolution reconstruction algorithm
Traditional super-resolution reconstruction algorithms mainly rely on basic digital image processing technology for reconstruction , The common ones are as follows :
Super resolution reconstruction based on interpolation : Based on interpolation, each pixel in the image is regarded as a point on the image plane , Then the estimation of super-resolution image can be seen as the process of fitting the known pixel information to the unknown pixel information on the plane , This is usually done by a predefined transform function or interpolation kernel . The method based on interpolation is simple 、 Easy to understand , But there are obvious defects . The restored image is often blurred 、 Sawtooth, etc . Common interpolation based methods include nearest neighbor interpolation 、 Bilinear interpolation and bicube interpolation .
Super resolution reconstruction based on degenerate model : This kind of method starts from the degradation model of image , It is assumed that the high-resolution image has undergone appropriate motion transformation 、 Blur and noise to get a low resolution image . This method extracts key information from low resolution images , Combined with the prior knowledge of unknown super-resolution image, the generation of super-resolution image is constrained . Common methods include iterative back projection 、 Convex set projection method and maximum a posteriori probability method .
Super resolution reconstruction based on learning : The learning based method is to use a large amount of training data , Learn some correspondence between low resolution images and high resolution images , Then the high-resolution image corresponding to the low-resolution image is predicted according to the learned mapping relationship , So as to realize the super-resolution reconstruction process of the image . Common learning based methods include manifold learning 、 Sparse coding method .
Super resolution reconstruction algorithm based on deep learning
SRCNN It is the first attempt of deep learning method in super-resolution problem , It's a simple convolution network , from 3 It's made up of two convolution layers , Each convolution layer is responsible for different functions . The first convolution layer is mainly responsible for extracting high-frequency features , The second convolution layer is responsible for nonlinear mapping from low definition features to high definition features , Finally, the role of the convolution layer is to reconstruct high-resolution images .SRCNN Our network structure is relatively simple , The super-resolution effect also needs to be improved , However, it establishes the basic idea of deep learning method in dealing with super-resolution problems . The later deep learning method , Basically follow this idea to carry out super-resolution reconstruction .
Later, ESPCN be based on SRCNN Some improvements have been made , However, due to the limited capacity of network reconstruction , The super-resolution effect is not particularly ideal . Because at that time , There are some problems in the training of deep convolution network . In general, for convolutional neural networks , As the number of network layers increases , Performance will also increase , But in practice , People find that when the number of network layers increases to a certain extent , Because of the back propagation principle , There will be the problem of gradients disappearing , The convergence of the network becomes worse , Reduced model performance . This question goes on until ResNet After putting forward the residual network structure , To get a better solution . But it is worth noting that ,ESPCN The sub-pixel convolution layer is proposed for the first time in the network , It eliminates the pre acquisition operation before the low resolution image is sent to the neural network , Greatly reduced SRCNN Amount of computation , Improved reconstruction efficiency .
VDSR It is the first application of residual network and residual learning in super-resolution problem , Increase the number of super-resolution layers to the first time 20 layer . Using residual learning , Residual characteristics of network learning , The network converges fast , Be more sensitive to details . Later, some convolutional neural networks proposed more complex structures , such as RGAN This paper proposes to use generative countermeasure network to generate high resolution image ,SRGAN from 2 Part of it is made up of , One is generating networks , The other is the discriminant network . The function of generating network is to generate a high resolution image based on a low resolution image , The function of discriminant network is to judge the high resolution image generated by the generated network as false , So when the network is training , There is a constant game between the generation network and the decision network , And finally to balance , So as to generate a high resolution image with realistic detail texture , It has better subjective visual effect . Other deep convolution network methods, such as SRDenseNet、EDSR、RDN, Using a more complex network structure , The convolution layer of the network is getting deeper and deeper , The super-resolution effect on single image is also getting better and better .
However , Due to high computing costs and memory consumption , Many jobs are difficult to deploy on equipment with limited resources . So , Super-resolution and efficient model design has also attracted widespread attention .FSRCNN Adopt a compact hourglass architecture to accelerate for the first time SR The Internet ;DRCN and DRRN The recursive layer is used to build a deep network with fewer parameters .CARN By combining effective residual blocks with group convolution , Less SR Network computing . Attention mechanism is also introduced to find the area with the largest amount of information , To better reconstruct high-resolution images . in addition , Knowledge distillation has also been cited in lightweight super-resolution networks , To improve their performance .
The challenge of real-time video super-resolution
In the era of mobile Internet , As the most important platform for video content, mobile terminal , Responsible for a large number of PGC and UGC Playback of video content , But limited by the insufficient generalization ability of the model , The computing power of the mobile terminal is limited , The algorithm has high computational complexity , be based on AI The following characteristics of super-resolution algorithm make it face great challenges in real-time deployment on mobile terminals :
The subjective effect is poor , Directly use the above-mentioned super-resolution algorithm based on deep learning , You will find its subjective effect and Bicubic Such as traditional algorithms , The effect of improving video quality is very limited .
educational circles SOTA Method the network model parameters are too large , Even many networks that are lightweight , Parameter quantities are also greater than 500K, This leads to excessive computation of the model , Reasoning is slow , It cannot meet the requirements of real-time video processing on mobile terminals .
Yunxin AI Over score
Training data based on real down sampling
The existing training data of super-resolution algorithm based on deep learning , Often through Bicubic Or other known down sampling methods . However, the real scene is often not , As a result, there is a large gap between the model training data and the actual prediction data , So the effect of the super score algorithm is not ideal .
We adopt a real down sampling generation method which is also based on the confrontation generation network . As shown in the figure below , For a high resolution image , Let's train the sampling generator G Discriminator D, bring G The generated low resolution image is close to the real low resolution image , So as to get the real down sampling G. Got it G Later, we can use high-resolution images to generate a large number of training data pairs that meet the degradation of real down sampling .
Yunxin super resolution algorithm
Netease Yunxin video lab proposes an edge oriented and efficient feature distillation network (EFDN), stay 2022 year CVPR NTIRE High efficiency super-resolution challenge ,Overall Performance Racetrack Yunxin won the first place with obvious advantages ,Runtime The track finished third .
Complete report of the competition :
https://arxiv.org/abs/2205.05675
This method is to improve the accuracy of the model and reduce the cost of the model , Based on the idea of structural heavy parameters , In the training phase, the edge oriented convolution block is used (ECB) Instead of residual characteristic distillation module (RFDB) Medium SRB Shallow residual block , In the reasoning stage, the edge oriented convolution block (ECB) Convert to normal 3x3 Convolution layer , This method can extract texture information and edge information of image more efficiently , Improve network performance while reducing overhead ; At the same time, it can enhance spatial attention (ESA) Module for tailoring , Reduce the number of parameters and increase the step size of the pool layer , The algorithm overhead is further reduced .
In order to further implement the project , Get a model that can run in real time on mobile devices , Comprehensive efficiency and effect , Yunxin team adopts the following optimization methods :
The model of compression : In the actual landing process , In order to meet the requirements of real-time processing , We are CVPR NTIRE 2022 High efficiency super score challenge competition model - An edge oriented efficient feature distillation module (EFDN) On the basis of , Use the passage to prune , Knowledge distillation and other model compression technologies further reduce the redundant parameters in the optimized model architecture , Remove channels that contribute little to model performance , To reduce the complexity of the model . At the same time, quantization technology is used to store the weight in low bits , So as to reduce the model volume , Accelerate Computing .
Engineering optimization : Calculate power on mobile devices 、 With limited memory bandwidth , It should not only meet the requirements of real-time video processing with super division algorithm , It can't increase too much power consumption , The requirements for engineering deployment are very high . Our optimization on the engineering side is mainly through SIMD, Model memory optimization , Optimization methods such as data layout optimization save memory overhead and reasoning time , At the same time, the algorithm realizes zero copy of memory between rendering pipeline and device by deeply combining business scenarios , Complete the high-performance landing of the algorithm .
The following table shows yunxinchao on different platforms / A single frame on the device takes time .
Effect display and future outlook
The video super division of mobile terminal can break through the limitation of codec and the efficiency bottleneck of video transmission , Optimize video transmission speed 、 User experience such as playback fluency , And bring many practical effects :
Improve video clarity , Take advantage of the high resolution of the high-end computer screen , Low definition video HD playback , HD video provides Ultra HD image quality , Improve users' video consumption experience .
Reduce bandwidth , Through the sender / The server reduces the resolution of transcoding and distributing video , Combined with the super-resolution processing at the receiving end, it presents a high-resolution effect , Lower the threshold of HD playback , Improve fluency , Reduce user network pressure .
future , We will continue to optimize video enhancement algorithms, including hyperfractional algorithms , Create the industry's top image restoration and image enhancement technology , Help customers improve video quality , Reduce video playback costs , Provide lower consumption of time , Lower power consumption , Better subjective quality , Algorithms that cover more models and save more bit rates , Let users on different mobile phones 、 Enjoy Ultra HD video experience in different network environments .
Zuo Wei Bicubic Upsampling result , On the right is the result of super fractional optimization
Reference material
[1] Yawei Li, Kai Zhang, Luc Van Gool, Radu Timofte, et al. Ntire 2022 challenge on efficient super-resolution: Methods and results. In IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2022.
[2] Zheng Hui, Xinbo Gao, Yunchu Yang, and Xiumei Wang. Lightweight image super-resolution with information multi-distillation network. In Proceedings of the ACM International Conference on Multimedia, pages 2024–2032, 2019.
[3] Jie Liu, Jie Tang, and Gangshan Wu. Residual feature distillation network for lightweight image super-resolution. In European Conference on Computer Vision Workshops, pages41–55. Springer, 2020.
[4] Zhang, Xindong and Zeng, Hui and Zhang, Lei. Edge-oriented Convolution Block for Real-time Super Resolution on Mobile Devices. In Proceedings of the 29th ACM International Conference on Multimedia, pages4034--4043. 2021.
边栏推荐
- [Blue Bridge Cup training 100 questions] sort scratch from small to large. Blue Bridge Cup scratch competition special prediction programming question centralized training simulation exercise question
- IP netns command (memo)
- 炒股如何开户?请问一下手机开户股票开户安全吗?
- 6.关于jwt
- 微信网页调试8.0.19换掉X5内核,改用xweb,所以x5调试方式已经不能用了,现在有了解决方案
- 2022.07.05
- Charles+drony的APP抓包
- 企业MES制造执行系统的分类与应用
- A hodgepodge of ICER knowledge points (attached with a large number of topics, which are constantly being updated)
- 【HDU】5248-序列变换(贪心+二分)「建议收藏」
猜你喜欢
Wechat web debugging 8.0.19 replace the X5 kernel with xweb, so the X5 debugging method can no longer be used. Now there is a solution
Basic operation of chain binary tree (implemented in C language)
Version 2.0 of tapdata, the open source live data platform, has been released
5billion, another master fund was born in Fujian
ES6笔记一
I feel cheated. Wechat tests the function of "size number" internally, and two wechat can be registered with the same mobile number
数据验证框架 Apache BVal 再使用
Review of network attack and defense
Numpy——axis
Redis cluster and expansion
随机推荐
數據驗證框架 Apache BVal 再使用
ip netns 命令(备忘)
How to estimate the value of "not selling pens" Chenguang?
编译原理 实验一:词法分析器的自动实现(Lex词法分析)
How to implement safety practice in software development stage
Embedded interview questions (algorithm part)
面试唯品会实习测试岗、抖音实习测试岗【真实投稿】
unity2d的Rigidbody2D的MovePosition函数移动时人物或屏幕抖动问题解决
国内首次!这家中国企业的语言AI实力被公认全球No.2!仅次于谷歌
SlashData开发者工具榜首等你而定!!!
5billion, another master fund was born in Fujian
Cadre de validation des données Apache bval réutilisé
3.关于cookie
Redis
How to choose the appropriate automated testing tools?
UVALive – 4621 Cav 贪心 + 分析「建议收藏」
ES6 note 1
[Blue Bridge Cup training 100 questions] sort scratch from small to large. Blue Bridge Cup scratch competition special prediction programming question centralized training simulation exercise question
链式二叉树的基本操作(C语言实现)
前首富,沉迷种田