Behind the ultra-clear picture quality of the NBA live broadcast: an in-depth look at Alibaba Cloud Video Cloud's "Narrowband HD 2.0" technology
2022-07-05 14:46:00 【Alibaba cloud video cloud】
In the NBA Finals that ended half a month ago, Baishi TV was the only live broadcast platform on the whole network to offer a "watch the NBA together with the host" mode, using this companion-style viewing to compete on differentiated content. At the same time, Baishi TV adopted the "Narrowband HD 2.0" live transcoding technology to give viewers a further improvement in picture quality.
Simply put, "Narrowband HD" is a set of video coding technologies optimized for the best subjective experience. The comparison below gives a sense of the improvement in picture quality:
The upper picture is the original picture from the host's pushed stream; the lower picture is the repaired picture
The upper half of the figure above is the original picture from the host's pushed stream, and the lower half is the picture after Narrowband HD 2.0 transcoding. After transcoding, the numbers on the jerseys, the English letters on the floor, the net, and the boundary lines all become clearer. The overall clarity of the picture is also significantly improved: even the texture of the floor and the outlines of the spectators off court are visibly sharper.
The rest of this article explains in depth the "Narrowband HD" technical principles behind the ultra-clear picture quality of the NBA live broadcast.
1. Narrowband HD technology
Alibaba Cloud proposed the concept of "Narrowband HD" in 2015, and officially launched and commercialized the Narrowband HD technology brand in 2016. Narrowband HD represents a video service concept that reconciles cost and experience; it is a video coding technology built around the best subjective perception of the human eye.
"Narrowband HD" schematic diagram
Narrowband HD is essentially a problem of quality improvement plus compression. Its main goal is the optimal balance of quality, bit rate, and cost. There are two versions in this direction: Narrowband HD 1.0 and Narrowband HD 2.0.
Narrowband HD 1.0 is the balanced version. Its main purpose is to achieve content-adaptive processing and encoding at the lowest cost, saving bit rate while improving picture quality. To that end, Narrowband HD 1.0 makes full use of information already available inside the encoder to assist video processing, so that low-cost preprocessing is enough to achieve content-adaptive processing and encoding. Inside the encoder, it relies mainly on subjective rate control.
Compared with Narrowband HD 1.0, Narrowband HD 2.0 uses more, richer, and more complex techniques to ensure adaptive capability, including JND content-adaptive encoding, ROI encoding, SDR+, and more natural detail enhancement. Narrowband HD 2.0 also adds content repair capabilities aimed at high-popularity content, so it improves quality while saving even more bit rate.
2. The challenge of live broadcast
At present, Narrowband HD technology is widely used in long video, short video, pan-entertainment, online education, e-commerce live streaming, and other scenarios.
Compared with scenarios such as long video and e-commerce live streaming, live NBA basketball games feature fast scene switching and intense motion, so they usually require high-bit-rate streams. However, high-bit-rate live streams, NBA games in particular, may suffer from network quality fluctuations during cross-border transmission, causing audio and video stuttering and delay.
To ensure live broadcast stability and a silky-smooth viewing experience on the playback side, Baishi TV chose a source stream with a lower bit rate. This choice brings several challenges in the real scenario:
Challenge 1: the low-bit-rate stream makes the game picture blurry and distorted
Compared with a high-bit-rate stream, a low-bit-rate stream shows obvious compression distortion, blurred details, and loss of weak textures. In basketball scenes, this blurs the lettering on players' jerseys, blurs the net, and leaves many burrs on the boundary lines and text edges on the floor, resulting in a poor viewing experience.
Challenge 2: "de-interlacing" residue in high-motion pictures
Besides the compression distortion and blurred details caused by low-bit-rate streams, sports scenes have a problem of their own. The original signal is generally captured by interlaced scanning and must be de-interlaced for Internet transmission. For high-motion pictures, perfect de-interlacing is hard to guarantee, so some interlacing usually remains and forms residual noise.
Challenge 3: picture quality loss after repeated transcoding
In addition, under the current business logic of enterprise customers, live video is transcoded several times on its way from capture to end users, and every transcoding introduces some compression distortion and picture quality loss.
To better balance smoothness, stability, and HD picture quality, Baishi TV chose a relatively low bit rate for the NBA Finals broadcast to achieve stable cross-border transmission, pulled the source stream into China, and then repaired it. For that repair step, Baishi TV used Alibaba Cloud Video Cloud's "Narrowband HD 2.0" technology.
3. Solutions for sports events
For sports video, simply applying Alibaba Cloud's regular online Narrowband HD transcoding has two drawbacks:
First, it has difficulty repairing the noise unique to sports video and may even amplify some of it, which hurts the viewing experience.
Second, regular Narrowband HD cannot perfectly repair elements unique to basketball scenes, such as the numbers on the jerseys, the net, and the boundary lines.
Therefore, for sports events, Narrowband HD 2.0 optimizes the existing atomic algorithm capabilities, and some algorithms are additionally tuned for basketball game scenes.
The final transcoding pipeline is shown in the figure below:
Live transcoding algorithm pipeline
4. Analysis of key technologies
4.1 Video processing
Extreme repair generation
As mentioned earlier, the image quality of our input source itself is not high , At the same time, it has been transcoded many times , Therefore, the first processing step is repair generation , Its main purpose is to fix many flaws in the video , Such as compressed block effect 、 Compression artifacts 、 Edge burrs 、 Residual noise after de interleaving 、 Fuzzy etc , At the same time, some detail textures lost due to compression are generated .
There are many scholars who use deep learning to compress distortion 、 Specialized in the research of deblurring . For example, early image compression ARCNN[1], Do video compression MFQE[2], Early end-to-end deblurring Algorithm DeepDeblur[3].
Relatively new methods are : Image de compression algorithm with its own compression degree estimation FBCNN[4], Video compression algorithm based on deformable convolution STDF[5], There is no need for nonlinear activation NAFNet[6] wait .
Most of these algorithms are aimed at constructing data sets and designing network structure for model training for a single task , The resulting model can only deal with a single degradation type , But in this case, Baishi TV NBA Game live transcoding , The video we want to process contains many kinds at the same time “ Degradation ”, In addition to typical video compression , And the camera is out of focus and blurred / Motion blur , Residual noise after de interleaving .
Image de compression algorithm ARCNN Network structure
VIDEO DE compression algorithm MFQE Network structure
End to end deblurring Algorithm DeepDeblur Network structure
In order to solve the above many “ degeneration ”, One way is to train a model for each degradation , Then run these models in turn . The advantage of this method is that the task of each model becomes simpler , It is convenient to construct data sets and train , But in actual use, the effect is not good , Because other degradation will bring great interference , This leads to a sharp decline in algorithm performance .
therefore , We used the second way , That is to use a model to deal with a variety of degradation . The advantage of the second method is that it can achieve a relatively better processing effect , The difficulty lies in the complex construction of training data , Higher requirements for network capacity , Multiple degradation modes need to be considered at the same time , There can also be a variety of permutations .
In terms of training data construction , We draw lessons from the field of image hypersegmentation BSRGAN[7]/Real-ESRGAN[8] And video hyperdomain RealBasicVSR[9] Data degradation in , At the same time, some degradation modes unique to the live broadcast scene of sports events are added to simulate the sawtooth at the boundary line of the venue 、 Defects such as white edges .
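The idea of building training pairs by applying several degradations in random order can be sketched as follows. This is only an illustrative toy, not the pipeline actually used: the function names, the parameter values, and the choice of operations (box blur, Gaussian noise, coarse quantization as a crude stand-in for compression, and a row shift that mimics de-interlacing residue) are all assumptions, and images are plain 2D lists of floats in [0, 1].

```python
import random

def box_blur(img, radius=1):
    """Blur a 2D list of floats with an edge-clamped box filter."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out

def add_noise(img, rng, sigma=0.02):
    """Add Gaussian noise and clamp the result back to [0, 1]."""
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
            for row in img]

def quantize(img, levels=16):
    """Crude stand-in for compression loss: keep only a few gray levels."""
    step = 1.0 / (levels - 1)
    return [[round(p / step) * step for p in row] for row in img]

def comb_rows(img, shift=1):
    """Mimic de-interlacing residue by shifting every odd row sideways."""
    w = len(img[0])
    return [row if y % 2 == 0
            else [row[min(max(x - shift, 0), w - 1)] for x in range(w)]
            for y, row in enumerate(img)]

def degrade(clean, rng):
    """Apply a random subset of degradations in a random order."""
    ops = [
        lambda im: box_blur(im, 1),
        lambda im: add_noise(im, rng),
        lambda im: quantize(im),
        lambda im: comb_rows(im),
    ]
    rng.shuffle(ops)
    count = rng.randint(1, len(ops))
    out = clean
    for op in ops[:count]:
        out = op(out)
    return out

# A small horizontal gradient serves as the "clean" training target.
clean = [[x / 7.0 for x in range(8)] for _ in range(8)]
degraded = degrade(clean, random.Random(0))
```

Each call yields a (degraded, clean) pair; shuffling the order is what exposes a single model to many combinations of degradations.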
For the network structure, to reduce the amount of computation we process single frames, using the classic ESRGAN[10] model, a common UNet[12] structure, or the VGG-style structure described in RepSR[13].
For the loss function, because details lost to various degradations must be restored, we use perceptual loss and GAN loss in addition to the common L1/L2 loss.
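A combined objective of this kind is commonly written as a weighted sum. The form below is a generic sketch in the style of ESRGAN-like training; the weights $\lambda_{1}$, $\lambda_{p}$, and $\lambda_{g}$ are hypothetical symbols introduced here for illustration, and the article does not state the weights or the exact terms actually used.

```latex
\mathcal{L}_{\text{total}}
  = \lambda_{1}\,\mathcal{L}_{1}
  + \lambda_{p}\,\mathcal{L}_{\text{perceptual}}
  + \lambda_{g}\,\mathcal{L}_{\text{GAN}}
```

The pixel term keeps the output close to the target, while the perceptual and GAN terms encourage the network to regenerate plausible texture rather than an over-smoothed average.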
Image degradation methods proposed by BSRGAN
One of the main problems of GAN-based generative networks is that their robustness and temporal consistency are not good enough. Robustness here means whether the network can stably generate natural-looking textures. Some GAN models occasionally generate strange, unnatural detail textures, which is especially disturbing when they appear in face regions.
Temporal consistency means whether the textures generated for adjacent frames agree with each other. If they do not, the video flickers, which degrades the viewing experience.
To address robustness, particularly in face regions, we borrowed the idea from LDL[14] of detecting regions of fine-scale details and applying an additional penalty there to improve how fine-scale details are generated. We obtain the face region through face region segmentation and apply an additional penalty on the generated face region to make face detail generation more robust.
Face region segmentation
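One simple way to picture an "additional penalty on the face region" is a pixel loss whose weight is larger inside the segmentation mask. The sketch below is an assumption made for illustration: the function name `weighted_l1`, the `face_weight` value, and the use of a plain L1 term are not taken from the article, which does not specify the form of its penalty.

```python
def weighted_l1(pred, target, face_mask, face_weight=2.0):
    """Mean absolute error where face pixels count face_weight times as much.

    pred, target: 2D lists of floats; face_mask: 2D list of 0/1 flags.
    """
    total, norm = 0.0, 0.0
    for p_row, t_row, m_row in zip(pred, target, face_mask):
        for p, t, m in zip(p_row, t_row, m_row):
            w = face_weight if m else 1.0
            total += w * abs(p - t)
            norm += w
    return total / norm

# One wrong pixel, located inside the face mask.
pred = [[0.5, 0.5], [0.5, 0.5]]
target = [[0.7, 0.5], [0.5, 0.5]]
mask = [[1, 0], [0, 0]]

loss_weighted = weighted_l1(pred, target, mask, face_weight=2.0)
loss_plain = weighted_l1(pred, target, mask, face_weight=1.0)
```

With the mask weight raised, the same error inside the face contributes more to the loss than it would elsewhere, which pushes training to be more careful there.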
For temporal consistency, we use the TCRnet network as an additional supervision signal. TCRnet was originally designed for super-resolution tasks and can be used for repair tasks after simple modification. The network combines an IRRO offset iterative refinement module with deformable convolution to improve the accuracy of motion compensation, and uses ConvLSTM to compensate temporal information and prevent information errors, which improves temporal consistency.
TCRnet network structure
The following two pictures compare the source with the repaired result.
In the first comparison, the edges of the letters GARDEN on the floor become very clear and sharp after repair; the boundary lines, the players' silhouettes, and the number 22 on the jersey also become clearer, and the floor texture is restored as well.
In the second comparison, the outlines of the spectators off court and the patterns on their clothes become clearer, and the floor boundary line that had been distorted into a jagged shape is straightened.
Model acceleration
To achieve the best possible repair generation quality, deep-learning-based AI algorithms are usually the first choice. One problem with deep learning algorithms, however, is their heavy computation, and for low-level vision tasks such as video repair the computation is much heavier than for typical high-level vision tasks.
On the one hand, the input to a video repair generation model is usually the video at its original resolution. High-level models for tasks such as detection and classification can take inputs far smaller than the original resolution with little effect on detection or classification performance. For the same network structure, the larger the input resolution, the more computation is needed, so a video repair model is far more expensive.
On the other hand, the output of a video repair generation model is a video frame with the same resolution as the input. The second half of the model is therefore also expensive, because it too must compute on relatively high-resolution feature maps. High-level tasks such as detection and classification output only bounding boxes or class labels, so although the second half of such a model has many channels, its feature maps are small and the overall computation is much lower.
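The effect of resolution on cost can be made concrete with the standard multiply-accumulate count of a convolution layer, which grows linearly with the number of output pixels. In the sketch below, the 64-channel 3x3 layer and the 224x224 classification input are assumptions chosen only for illustration; the article does not give layer sizes.

```python
def conv_macs(height, width, c_in, c_out, kernel=3):
    """Multiply-accumulates of one stride-1 convolution with same-size output."""
    return height * width * c_in * c_out * kernel * kernel

# The same hypothetical layer applied at two input resolutions.
restoration = conv_macs(1080, 1920, 64, 64)   # full 1080p frame
classification = conv_macs(224, 224, 64, 64)  # typical downscaled input
ratio = restoration / classification
```

Under these assumptions a single layer at 1080p costs about 76 billion multiply-accumulates, roughly 41 times the cost of the same layer at 224x224, and a repair model pays that price in every layer because it never reduces the frame to a small feature map.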
In addition, live sports video usually runs at 50 fps, and the Blu-ray quality tier usually has a resolution of 1080p. The deep learning model must therefore run at no less than 50 fps on 1080p input, which is a great challenge for a deep learning algorithm.
To meet it, we accelerate model inference along several dimensions.
First, we compress the deep learning model, for example by reducing its size through Neural Architecture Search (NAS) or pruning. To compensate for the performance lost when the model shrinks, the compressed model is trained with knowledge distillation. The amount of computation can be reduced further with 8-bit integer quantization or FP16 half precision.
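As a rough picture of what 8-bit integer quantization does to a layer's weights, the sketch below uses symmetric per-tensor quantization, one common scheme. It is an assumption for illustration only: the article does not say which quantization scheme or toolchain was used, and the function names are hypothetical.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the int8 codes."""
    return [v * scale for v in q]

weights = [0.5, -1.0, 0.25, 0.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

The reconstruction error of each weight is bounded by half a quantization step, which is why accuracy usually drops only slightly while storage and arithmetic become much cheaper.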
Second, choosing suitable hardware and a suitable inference framework yields the greatest speedup, for example by pairing a high-performance GPU with its matching inference framework. To raise inference speed further, multiple GPU cards can also compute in parallel.
With these combined accelerations, processing speed on 1080p input rose from 8 fps to 67 fps, fully meeting the 50 fps requirement of live transcoding.
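The frame rates quoted above translate directly into a per-frame time budget, which is often the easier number to reason about. The short calculation below uses only the figures from the text (50 fps required, 8 fps before, 67 fps after); the helper name is hypothetical.

```python
def frame_budget_ms(fps):
    """Average time available (or spent) per frame, in milliseconds."""
    return 1000.0 / fps

budget = frame_budget_ms(50)   # real-time requirement: 20 ms per frame
before = frame_budget_ms(8)    # before acceleration: 125 ms per frame
after = frame_budget_ms(67)    # after acceleration: just under 15 ms
speedup = 67 / 8               # overall throughput gain
```

In other words, the model went from taking about six times longer than the real-time budget to fitting inside it with some headroom, an overall speedup of a little more than 8x.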
Categories of deep learning acceleration techniques
Sharpness enhancement
To improve the viewing experience, we apply further sharpness enhancement on top of the repair generation described above.
The simplest sharpness enhancement algorithm is sharpening, such as unsharp and cas, two simple sharpening filters that ship with ffmpeg. Both unsharp and cas are designed on the USM (UnSharp Mask) framework, which can be described by the following formula [15]:
$$\text{sharpened} = \text{original} + (\text{original} - \text{blurred}) \times \text{amount}$$
Here, original is the image to be sharpened and blurred is a blurred version of original, for example a Gaussian-blurred one, which is where the name unsharp comes from. (original - blurred) represents the details of the original image; multiplying it by amount and adding it back onto the original yields sharpened, an image with crisper details that looks clearer.
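The USM relation, sharpened = original + (original - blurred) × amount, can be tried on a one-dimensional edge. The sketch below is a minimal illustration and not ffmpeg's implementation: it uses a box blur instead of a Gaussian to stay short, works on a 1D list rather than an image, and the function names are hypothetical.

```python
def blur_1d(signal, radius=1):
    """Edge-clamped box blur of a 1D list of floats."""
    n = len(signal)
    out = []
    for i in range(n):
        window = [signal[min(max(j, 0), n - 1)]
                  for j in range(i - radius, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out

def unsharp_mask(signal, amount=1.0, radius=1):
    """sharpened = original + (original - blurred) * amount"""
    blurred = blur_1d(signal, radius)
    return [o + amount * (o - b) for o, b in zip(signal, blurred)]

# A soft step edge from dark (0.2) to bright (0.8).
edge = [0.2, 0.2, 0.2, 0.8, 0.8, 0.8]
sharpened = unsharp_mask(edge, amount=1.0)
```

Flat regions are left unchanged, while the samples on either side of the edge are pushed apart (towards 0.0 and 1.0 here), which is what makes the edge look crisper. A real pipeline would also clamp the result to the valid pixel range, since larger amounts overshoot.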
Besides sharpening, clarity can also be improved by adjusting contrast, brightness, color, and so on. For the Baishi TV live basketball broadcast, we used self-developed sharpening, brightness, contrast, and color enhancement algorithms to further improve clarity.
Compared with open-source sharpening algorithms such as unsharp, the sharpening algorithm developed by Alibaba Cloud Video Cloud has the following characteristics:
- More refined extraction of image texture details: it extracts texture structures of different sizes and with different characteristics, which gives a better enhancement effect;
- By analyzing the texture structure of the image content, it applies locally adaptive enhancement according to the texture complexity of each region;
- It is coupled with encoding: based on feedback from the encoder's coding information, it adaptively adjusts the enhancement strategy.
Detail enhancement (sharpening) algorithm flow
4.2 Rate allocation
JND
After the repair generation and sharpness enhancement described above, the picture contains far more detail, and we want to preserve as much of it as possible through compression and encoding. Traditional video coding is grounded in information theory, so it has always worked on removing temporal redundancy, spatial redundancy, statistical redundancy, and so on, but it exploits visual redundancy far from enough. The figure below is taken from a paper by Dr. Wang Haiqiang. Its idea is that under traditional RDO the rate-distortion relationship is a continuous convex curve, whereas to the human eye it is a staircase. If we can locate the steps of that staircase, we can save bit rate without affecting subjective quality. JND (Just Noticeable Difference) exploits visual redundancy based on this idea.
The relationship between bit rate and perceptual distortion
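The staircase idea can be illustrated with a toy model in which perceived quality only changes when the bit rate crosses a threshold. Everything in this sketch is hypothetical: the threshold values are invented, and a real JND model works on local spatial and temporal features rather than on a global bit rate table.

```python
def perceived_level(bitrate_kbps, thresholds):
    """Number of perceptual steps reached at this bit rate.

    thresholds: ascending bit rates at which viewers notice an improvement.
    """
    return sum(1 for t in thresholds if bitrate_kbps >= t)

def cheapest_equivalent(bitrate_kbps, thresholds):
    """Lowest bit rate on the same perceptual step as the requested one."""
    level = perceived_level(bitrate_kbps, thresholds)
    # Below the first step there is no known lower bound, so keep the input.
    return thresholds[level - 1] if level > 0 else bitrate_kbps

steps = [1000, 2000, 3500]            # hypothetical JND points in kbps
target = 3000
chosen = cheapest_equivalent(target, steps)
saving = 1 - chosen / target          # fraction of bit rate saved
```

In this toy case a 3000 kbps target sits on the same step as 2000 kbps, so about a third of the bit rate can be dropped with no visible difference to the viewer.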
The JND algorithm developed by Alibaba Cloud Video Cloud works in both the spatial and temporal dimensions to fully exploit visual redundancy. In general scenarios, it saves more than 30% of the bit rate at the same subjective quality.
With this self-developed JND algorithm, the details obtained through repair generation and sharpness enhancement are still preserved when encoded at a lower bit rate.
JND algorithm flow
ROI
As mentioned above, the JND algorithm can save more than 30% of the bit rate, but this saving is based entirely on low-level statistical information and does not consider high-level semantic information.
In live sports, viewers care a great deal about close-up shots of people, and we want those close-ups to reach viewers as clearly as possible. Besides obtaining clear close-ups through repair generation, we also need a way to keep them clear after encoding. This is where our self-developed ROI coding technology comes in.
ROI (Region Of Interest) coding is a video coding technique based on regions of interest. Simply put, it allocates more bit rate to the regions of interest in the image to improve their quality and less bit rate to the non-ROI regions, which improves the overall viewing experience while keeping the total bit rate basically unchanged.
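A common way to express "more bits for the ROI, fewer elsewhere" is a per-block quantization parameter (QP) map, where a lower QP means finer quantization and therefore more bits. The sketch below is a generic illustration, not the article's rate control: the function name, the block grid, and the offset values are assumptions. It relies only on the fact that QP in H.264/H.265 8-bit coding ranges from 0 to 51.

```python
def roi_qp_map(rows, cols, roi_blocks, base_qp=30, roi_delta=-4, bg_delta=1):
    """Build a per-block QP map.

    roi_blocks: set of (row, col) block coordinates inside the ROI.
    ROI blocks get a lower QP (more bits); other blocks get a slightly
    higher QP so the total bit rate stays roughly unchanged.
    """
    qp_map = []
    for r in range(rows):
        row = []
        for c in range(cols):
            delta = roi_delta if (r, c) in roi_blocks else bg_delta
            row.append(min(51, max(0, base_qp + delta)))
        qp_map.append(row)
    return qp_map

# A 3x4 block grid with a face covering two blocks in the middle row.
qp = roi_qp_map(3, 4, {(1, 1), (1, 2)})
```

Because the ROI is usually a small part of the frame, a small QP increase spread over the background can pay for a much larger QP decrease inside the ROI.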
The main difficulties of ROI coding are:
1) Obtaining an ROI algorithm that is cheap enough and fast enough to meet the high-resolution, high-frame-rate requirements of live sports;
2) Making rate control decisions based on the ROI so that the subjective quality of ROI regions improves, the subjective quality of non-ROI regions does not drop, and the result stays temporally consistent without flicker.
For low-cost ROI computation, we developed a face detection and tracking algorithm based on adaptive decision-making. Most of the time only face tracking is needed, which requires very little computation, and face detection runs only a small fraction of the time. This achieves ultra-low-cost, fast ROI acquisition while maintaining high accuracy.
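The "track most of the time, detect only occasionally" policy can be sketched as a simple per-frame decision rule. This is a hypothetical rule written for illustration, since the article does not describe its adaptive decision logic: here the expensive detector runs on the first frame, whenever the tracker's confidence falls below a threshold, or after a maximum number of consecutively tracked frames. The confidences are supplied up front to keep the example self-contained.

```python
def plan_face_ops(track_confidences, min_conf=0.6, max_tracked=25):
    """Decide, frame by frame, whether to run detection or cheap tracking.

    track_confidences: tracker confidence per frame (assumed to be given).
    """
    ops = []
    tracked = max_tracked  # forces a detection on the very first frame
    for conf in track_confidences:
        if tracked >= max_tracked or conf < min_conf:
            ops.append("detect")
            tracked = 0
        else:
            ops.append("track")
            tracked += 1
    return ops

# The tracker loses confidence on the fourth frame, e.g. after a camera cut.
confidences = [0.9, 0.9, 0.9, 0.3, 0.9, 0.9]
plan = plan_face_ops(confidences, min_conf=0.6, max_tracked=4)
```

With stable footage the detector runs only once every few frames, and it is re-triggered early when tracking becomes unreliable.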
For rate control decisions, on the one hand the algorithm is coupled with the encoder to strike a balance between subjective and objective quality and to keep the result temporally consistent. On the other hand it is combined with JND to reach a subjective balance between ROI and non-ROI regions, achieving scene-adaptive and quality-adaptive bit rate allocation.
ROI algorithm flow
4.3 Coding kernel
For live sports, we optimized the video coding kernel with subjective block partitioning and blocking artifact optimization, which raise the subjective clarity of the compressed video, reduce blocking artifacts, and improve the overall viewing experience.
Subjective block partitioning
The encoder decides the block partition mode according to the rate-distortion optimization (RDO) model, choosing the mode that minimizes the rate-distortion cost:
$$J = D + \lambda R$$
Here D is the distortion, R is the number of bits required to encode the current mode, and λ is the Lagrange multiplier that trades distortion against bit rate.
In block partition decisions, the encoder sometimes settles on large blocks even though dividing into small blocks looks better subjectively. This happens because the large-block mode has larger distortion D but smaller R, which leads the encoder to choose the large block.
To address this, we modified the distortion expression for the different block partition modes, adding different weight coefficients for blocks of different sizes so that the final partition better matches subjective perception.
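The effect of a size-dependent distortion weight can be seen with the usual rate-distortion cost J = D + λR, where λ is the Lagrange multiplier, extended to J = w·D + λR. The numbers, the weight value, and the function names below are hypothetical and serve only to show how the decision can flip; the article does not publish its weights.

```python
def rd_cost(distortion, bits, lam, weight=1.0):
    """Rate-distortion cost with an optional weight on the distortion term."""
    return weight * distortion + lam * bits

def choose_partition(candidates, lam, weights=None):
    """Return the block size with the lowest (weighted) RD cost.

    candidates: list of dicts with keys "size", "D" (distortion), "R" (bits).
    weights: optional {block_size: distortion_weight}; missing sizes use 1.0.
    """
    weights = weights or {}
    best = min(
        candidates,
        key=lambda c: rd_cost(c["D"], c["R"], lam, weights.get(c["size"], 1.0)),
    )
    return best["size"]

candidates = [
    {"size": 64, "D": 1000.0, "R": 100},  # large block: more distortion, fewer bits
    {"size": 32, "D": 700.0, "R": 160},   # small blocks: less distortion, more bits
]
plain_choice = choose_partition(candidates, lam=6.0)
weighted_choice = choose_partition(candidates, lam=6.0, weights={64: 1.3})
```

With plain RDO the large block wins (cost 1600 versus 1660). Penalizing the large block's distortion by a factor of 1.3 raises its cost to 1900, so the encoder now picks the smaller blocks.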
Block partition before optimization and after optimization
Blocking artifact optimization
The rate-distortion theory of video coding matches the human eye fairly well, so an encoder built on it is also largely optimized for subjective quality. The one exception is blocking artifacts: the human eye amplifies straight lines and is therefore sensitive to them.
We observed that, under objective RDO (Rate Distortion Optimization), encoding with some modes amplifies blocking artifacts, and the deblocking filter in the H.265 standard fails in this scenario. We also found that in flat regions, blur plus noise looks better than sharp blocking artifacts.
Based on these observations, we adopted the following blocking artifact optimization strategy to minimize blocking artifacts and improve the viewing experience.
Blocking artifact optimization algorithm flow
The figure below compares the picture before and after blocking artifact optimization. Blocking artifacts are significantly reduced in the optimized result on the right.
Before optimization (left) and after optimization (right)
4.4 Video results
Through the video processing, rate allocation optimization, and coding kernel optimization described above, we achieved high-quality picture repair together with 50 fps live transcoding at 1080p, giving viewers a smooth, stable, HD viewing experience.
Left: source stream. Right: repaired result
The cooperation with Baishi TV on the NBA games demonstrates the value of "Narrowband HD 2.0" technology for improving the visual experience of live basketball. It uses less traffic at the same picture quality and delivers higher definition at the same bandwidth, balancing commercial value with viewing experience.
In the future, Narrowband HD technology will continue to be upgraded, further improving repair generation quality, reducing bit rate, and optimizing cost through stronger algorithms. The technology will also be applied to more top-tier events, delivering a new upgrade in visual experience on the basis of optimized and balanced cost.
References:
[1] ARCNN: Chao Dong, et al., Compression Artifacts Reduction by a Deep Convolutional Network, ICCV2015
[2] MFQE: Ren Yang, et al., Multi-Frame Quality Enhancement for Compressed Video, CVPR2018
[3] DeepDeblur: Seungjun Nah, et al., Deep Multi-scale Convolutional Neural Network for Dynamic Scene Deblurring, CVPR2017
[4] FBCNN: Towards Flexible Blind JPEG Artifacts Removal, ICCV2021
[5] STDF: Jianing Deng, et al., Spatio-Temporal Deformable Convolution for Compressed Video Quality Enhancement, AAAI2020
[6] NAFNet: Liangyu Chen, et al., Simple Baselines for Image Restoration, https://arxiv.org/abs/2204.04676
[7] BSRGAN: Kai Zhang, et al., Designing a Practical Degradation Model for Deep Blind Image Super-Resolution, CVPR2021
[8] Real-ESRGAN: Xintao Wang, et al., Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data, ICCVW2021
[9] RealBasicVSR: Kelvin C.K. Chan, et al., Investigating Tradeoffs in Real-World Video Super-Resolution, CVPR2022
[10] ESRGAN: Xintao Wang, et al., ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks, ECCVW2018
[11] ESRGAN: Xintao Wang, et al., ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks, ECCVW2018
[12] UNet: Olaf Ronneberger, et al., U-Net: Convolutional Networks for Biomedical Image Segmentation, MICCAI2015
[13] RepSR: Xintao Wang, et al., RepSR: Training Efficient VGG-style Super-Resolution Networks with Structural Re-Parameterization and Batch Normalization, https://arxiv.org/abs/2205.05671
[14] LDL: Jie Liang, et al., Details or Artifacts: A Locally Discriminative Learning Approach to Realistic Image Super-Resolution, CVPR2022
[15] USM: https://en.wikipedia.org/wiki/Unsharp_masking