Introduction to H.265 coding principles
2022-07-05 09:45:00 【Baidu geek said】
The purpose of video coding is to compress the original video. The main idea of compression is to remove redundant information along several dimensions: spatial, temporal, coding, and visual. Thanks to its excellent compression ratio and video quality, H.264 has become the most popular codec standard on the market. H.265 builds on H.264 and, at the same video quality, can reduce the bit rate of the video stream by a further 50%. As the H.265 format becomes more and more popular, this article introduces its coding principles. The H.265 coding framework is shown in the flow chart below.
01 Coding structure
The H.265 coding structure is divided into the Video Coding Layer (VCL) and the Network Abstraction Layer (NAL).
VCL (Video Coding Layer): mainly covers the video compression engine and the syntax definitions for picture partitioning. The original video is encoded into video data in the VCL. A simplified version of the coding process is as follows:
Partition each frame into blocks and add the block information to the bitstream;
Apply predictive coding to each block: intra prediction generates residuals, and inter prediction performs motion estimation and motion compensation;
Transform the residuals, then quantize and scan the transform coefficients;
Entropy-code the quantized transform coefficients, motion information, prediction information, and so on, to form the compressed video bitstream that is output.
NAL (Network Abstraction Layer): mainly defines the encapsulation format of the data. It packs the video data generated by the VCL into individual NAL units so that the data can be adapted to, and transmitted over, different network environments.
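To make "encapsulation" a little more concrete, the sketch below parses the two-byte header that starts every H.265 NAL unit. The header layout (1-bit forbidden_zero_bit, 6-bit nal_unit_type, 6-bit nuh_layer_id, 3-bit nuh_temporal_id_plus1) is not described in this article; it is taken from the NAL unit header syntax of the standard. The function and class names are illustrative only, and the input is assumed to have its start code already stripped.

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    forbidden_zero_bit: int
    nal_unit_type: int
    nuh_layer_id: int
    nuh_temporal_id_plus1: int


def parse_nal_header(data: bytes) -> NalHeader:
    """Parse the first two bytes of an H.265 NAL unit (start code removed)."""
    if len(data) < 2:
        raise ValueError("an H.265 NAL unit header is 2 bytes long")
    b0, b1 = data[0], data[1]
    return NalHeader(
        forbidden_zero_bit=b0 >> 7,                   # 1 bit
        nal_unit_type=(b0 >> 1) & 0x3F,               # 6 bits
        nuh_layer_id=((b0 & 0x01) << 5) | (b1 >> 3),  # 6 bits across both bytes
        nuh_temporal_id_plus1=b1 & 0x07,              # 3 bits
    )


header = parse_nal_header(bytes([0x40, 0x01]))
print(header.nal_unit_type)  # 32, the video parameter set (VPS) type
```

A receiver can use nal_unit_type to tell parameter sets apart from coded slice data before handing the payload to the decoder.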
02 Block partitioning
In terms of coding order and structure, H.265 first divides a video into several sequences, and each sequence into several groups of pictures (GOP); each GOP represents a set of consecutive video frames. Before applying predictive coding and transform coding to a picture, H.265 first partitions it, and the partitioning method is a quadtree. In quadtree partitioning, the whole video frame is divided into square coding tree blocks (CTB); a CTB can be further divided into coding blocks (CB), and a CB can in turn be divided into prediction blocks (PB) and transform blocks (TB). The resulting structure of an H.265 video is shown in the figure below:
One luma CB and the two chroma CBs at the same position, together with the corresponding syntax elements, form a coding unit (CU). The CU is the unit at which the encoder decides between intra prediction, inter prediction, and Skip/Merge modes.
One luma CTB and the two chroma CTBs at the same position, together with the corresponding syntax elements and the CUs they contain, form a coding tree unit (CTU). The CTU corresponds to the macroblock in H.264. The difference is that the CTU size is chosen by the encoder, with a maximum of 64x64 and a minimum of 16x16, whereas the macroblock size is fixed at 16x16.
When a CTU is coded, its CUs are coded in depth-first order, like a quadtree in data structures: a large square is a parent node, and the four smaller squares inside it are its four child nodes.
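The depth-first order can be sketched as a short recursive function. This is only an illustration under stated assumptions: the split decision `should_split` is a hypothetical callback (a real encoder would typically decide by comparing rate-distortion costs), and the minimum CU size of 8 used in the demo is an assumed setting rather than something stated above.

```python
from typing import Callable, List, Tuple

Block = Tuple[int, int, int]  # (x, y, size)


def split_ctu(x: int, y: int, size: int, min_size: int,
              should_split: Callable[[int, int, int], bool]) -> List[Block]:
    """Return the leaf CUs of one CTU in depth-first order."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves: List[Block] = []
        # Children are visited top-left, top-right, bottom-left, bottom-right,
        # and each child is fully processed before moving to the next one.
        for dy in (0, half):
            for dx in (0, half):
                leaves += split_ctu(x + dx, y + dy, half, min_size, should_split)
        return leaves
    return [(x, y, size)]


def demo_rule(x: int, y: int, size: int) -> bool:
    """Hypothetical rule: keep splitting only the block in the top-left corner."""
    return x == 0 and y == 0


cus = split_ctu(0, 0, 64, 8, demo_rule)
for cu in cus:
    print(cu)
```

With this rule the 64x64 CTU ends up as ten CUs: four 8x8 blocks first, then three 16x16 blocks, then three 32x32 blocks, which is exactly the order in which a depth-first traversal reaches the leaves.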
03 Prediction
A video is essentially a series of consecutive video frames, and there is a great deal of redundancy both within a single frame and across multiple frames. Spatially, neighbouring pixels within a single frame differ very little in value. Temporally, two consecutive frames share many identical pixels. Predictive coding is a data compression method based on the statistical properties of images: it exploits the temporal and spatial correlation of pictures and predicts the pixel currently being encoded from pixel data that has already been reconstructed.
3.1 Intra prediction
In intra prediction, the pixels used for prediction and the pixel currently being encoded are all in the same video frame, and generally in adjacent regions. Because adjacent pixels are strongly correlated, their values are generally very close and abrupt changes are rare, so the difference is 0 or a very small number. After intra prediction, therefore, what is transmitted is the difference between the predicted value and the real value, that is, a value near 0 called the prediction error or residual. This takes fewer bits to transmit and so achieves compression.
H.265 intra prediction works on blocks: the block being encoded is predicted from the reconstructed values of adjacent, already reconstructed blocks. Prediction is done separately for luma and chroma, and the corresponding prediction blocks are the luma prediction block and the chroma prediction block. To suit the content characteristics of HD video and improve prediction accuracy, H.265 adopts a richer set of prediction block sizes and prediction modes.
H.265 luma prediction blocks range in size from 4*4 to 32*32, and every size has 35 prediction modes, which fall into 3 classes: Planar mode, DC mode, and Angular mode.
Planar mode: luma mode 0. It suits regions where pixel values change slowly, for example a pixel gradient. A different predicted value is used for each pixel in the prediction block: the average of the linear interpolations of that pixel in the horizontal and vertical directions.
DC mode: luma mode 1. It suits large flat regions of a picture; this mode uses the same predicted value for all pixels in the prediction block.
If the prediction block is square, the predicted value is the average of the left and upper reference pixels;
If the prediction block is rectangular, the predicted value is the average of the reference pixels along the longer side;
Angular mode: luma modes 2~34, 33 prediction directions in total, where mode 10 is horizontal and mode 26 is vertical. In angular mode, the predicted value of each pixel is obtained by projecting it along the prediction direction onto the previously reconstructed reference samples in the horizontal or vertical direction.
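The two non-angular modes are simple enough to sketch directly. The code below is a simplified illustration for square N*N blocks, assuming N is a power of two and that all reference samples are available; it omits the reference-sample smoothing and the boundary filtering that a real H.265 codec applies. The function names and argument layout are illustrative, not taken from any library.

```python
from typing import List


def predict_dc(left: List[int], top: List[int]) -> List[List[int]]:
    """DC mode: every pixel gets the rounded average of the 2N reference samples."""
    n = len(top)
    shift = n.bit_length()  # log2(n) + 1 when n is a power of two
    dc = (sum(left) + sum(top) + n) >> shift
    return [[dc] * n for _ in range(n)]


def predict_planar(left: List[int], top: List[int],
                   top_right: int, bottom_left: int) -> List[List[int]]:
    """Planar mode: average of a horizontal and a vertical linear interpolation."""
    n = len(top)
    shift = n.bit_length()
    pred = []
    for y in range(n):
        row = []
        for x in range(n):
            horizontal = (n - 1 - x) * left[y] + (x + 1) * top_right
            vertical = (n - 1 - y) * top[x] + (y + 1) * bottom_left
            row.append((horizontal + vertical + n) >> shift)
        pred.append(row)
    return pred


def residual(block: List[List[int]], pred: List[List[int]]) -> List[List[int]]:
    """Prediction error that is passed on to the transform stage."""
    return [[b - p for b, p in zip(brow, prow)] for brow, prow in zip(block, pred)]


left, top = [10, 10, 10, 10], [20, 20, 20, 20]
print(predict_dc(left, top)[0])  # [15, 15, 15, 15]
```

On a flat area the residual is close to zero for both modes, which is why so few bits are needed after prediction.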
In colour video, the chroma signal and the luma signal at the same position have similar characteristics, so the prediction modes of the chroma prediction block are similar to those of the luma prediction block. In H.265 the chroma prediction block has 5 prediction modes: Planar mode, vertical mode, horizontal mode, DC mode, and derived mode:
Planar mode: chroma mode 0, the same as luma mode 0.
Vertical mode: chroma mode 1, the same as luma mode 26.
Horizontal mode: chroma mode 2, the same as luma mode 10.
DC mode: chroma mode 3, the same as luma mode 1.
Derived mode: chroma mode 4, which uses the same prediction mode as the corresponding luma prediction block. If the mode of the corresponding luma prediction block is one of 0, 1, 10, 26, the chroma mode among 0~3 that would duplicate it is replaced with mode 34.
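The mapping from chroma mode index to actual prediction mode can be written as a small lookup. This sketch follows the rule as the standard's mode derivation table is usually summarised: index 4 copies the luma mode, and any of the four explicit chroma modes that coincides with the luma mode is replaced by angular mode 34, so the five candidates always stay distinct. The function and constant names are illustrative.

```python
PLANAR, DC, HORIZONTAL, VERTICAL, ANGULAR_34 = 0, 1, 10, 26, 34

# Chroma mode index -> luma-style mode number, as listed above.
EXPLICIT_CHROMA_MODES = {0: PLANAR, 1: VERTICAL, 2: HORIZONTAL, 3: DC}


def chroma_intra_mode(chroma_mode_index: int, luma_mode: int) -> int:
    """Return the prediction mode actually used for the chroma block."""
    if chroma_mode_index == 4:
        return luma_mode  # derived mode: reuse the luma mode
    mode = EXPLICIT_CHROMA_MODES[chroma_mode_index]
    # An explicit mode that duplicates the luma mode would be redundant
    # with the derived mode, so it is swapped for angular mode 34.
    return ANGULAR_34 if mode == luma_mode else mode


print(chroma_intra_mode(4, 17))  # 17
print(chroma_intra_mode(1, 26))  # 34
```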
3.2 Inter prediction
In inter prediction, the pixels used for prediction and the pixel currently being encoded are not in the same video frame, but are generally at the same or a nearby position. In general, inter prediction compresses better than intra prediction, mainly because the correlation between video frames is very strong. If the moving objects in a frame change slowly, the pixel differences between frames are very small and the temporal redundancy is very large.
Inter prediction uses motion estimation to measure the motion of moving objects. The main idea is to search a given range of the reference frame for the block that best matches the prediction block, and to compute the relative displacement between the matching block and the prediction block; this relative displacement is the motion vector. Once the motion vector is available, the prediction needs to be corrected, which is motion compensation: the motion vector is fed into the motion compensation module, which "compensates" the reference frame to obtain the predicted frame for the frame currently being encoded. The difference between the predicted frame and the current frame is the inter prediction error.
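A minimal sketch of motion estimation and motion compensation follows. It assumes an exhaustive (full) search over integer positions with the sum of absolute differences (SAD) as the matching cost; real encoders use faster search patterns and sub-pixel interpolation, and all names here are illustrative.

```python
from typing import List, Tuple

Frame = List[List[int]]


def sad(cur: Frame, ref: Frame, bx: int, by: int, rx: int, ry: int, size: int) -> int:
    """Sum of absolute differences between a current block and a reference block."""
    return sum(abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
               for j in range(size) for i in range(size))


def motion_search(cur: Frame, ref: Frame, bx: int, by: int,
                  size: int, search_range: int) -> Tuple[Tuple[int, int], int]:
    """Full search: return the motion vector (dx, dy) with the lowest SAD."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), sad(cur, ref, bx, by, bx, by, size)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = bx + dx, by + dy
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue  # candidate block would fall outside the reference frame
            cost = sad(cur, ref, bx, by, rx, ry, size)
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


def compensate(ref: Frame, bx: int, by: int, mv: Tuple[int, int], size: int) -> Frame:
    """Motion compensation: fetch the predicted block the motion vector points to."""
    dx, dy = mv
    return [ref[by + dy + j][bx + dx:bx + dx + size] for j in range(size)]


# Toy frames: a 2x2 object moves one pixel right and one pixel down.
ref = [[0] * 8 for _ in range(8)]
cur = [[0] * 8 for _ in range(8)]
ref[3][2], ref[3][3], ref[4][2], ref[4][3] = 50, 60, 70, 80
cur[4][3], cur[4][4], cur[5][3], cur[5][4] = 50, 60, 70, 80

mv, cost = motion_search(cur, ref, bx=3, by=4, size=2, search_range=2)
pred = compensate(ref, 3, 4, mv, 2)
print(mv, cost)  # (-1, -1) 0
```

Here the motion vector points from the current block back to where the object was in the reference frame, and the prediction error for this block is zero.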
If inter prediction uses only a preceding frame, it is called forward inter prediction or unidirectional prediction. The predicted frame is a P frame, and a P frame can reference a preceding I frame or P frame.
If inter prediction uses not only a preceding frame but also a following frame to predict the current block, it is bidirectional prediction. The predicted frame is a B frame, and a B frame can reference a preceding I frame or P frame and a following P frame.
A P frame needs to reference a preceding I or P frame, and a B frame needs to reference a preceding I or P frame and a following P frame. If a B frame arrived first in a video stream, before the I and P frames it depends on, that B frame could not be decoded immediately. So how is the playback order guaranteed? In video coding, a PTS and a DTS are generated for each frame. Usually, after generating an I frame, the encoder skips ahead a few frames and encodes a P frame using the preceding I frame as the reference; the frames between the I frame and the P frame are then encoded as B frames. The frames in the transmitted stream have already been arranged at encoding time according to the dependencies among I, P, and B frames, so they can be decoded directly on arrival. It is therefore impossible to receive a B frame before the I and P frames it depends on.
PTS (Presentation Time Stamp): the display timestamp, which tells the player when to display this frame.
DTS (Decoding Time Stamp): the decoding timestamp, which tells the player when to decode this frame.
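The difference between the two timestamps is easiest to see on a tiny group of frames. In the sketch below the timestamp values are made-up frame ticks, and the `Frame` tuple is a hypothetical stand-in for whatever a real demuxer would return.

```python
from typing import List, NamedTuple


class Frame(NamedTuple):
    kind: str  # "I", "P" or "B"
    pts: int   # when to display
    dts: int   # when to decode


# Frames as they appear in the stream: the P frame is sent before the
# two B frames that reference it, even though it is displayed after them.
stream: List[Frame] = [
    Frame("I", pts=1, dts=0),
    Frame("P", pts=4, dts=1),
    Frame("B", pts=2, dts=2),
    Frame("B", pts=3, dts=3),
]

decode_order = [f.kind for f in sorted(stream, key=lambda f: f.dts)]
display_order = [f.kind for f in sorted(stream, key=lambda f: f.pts)]

print(decode_order)   # ['I', 'P', 'B', 'B']
print(display_order)  # ['I', 'B', 'B', 'P']
```

The player decodes in DTS order, holds the decoded P frame in a buffer, and presents the frames in PTS order.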
04 Transform
Transform coding maps the spatial-domain signal of a picture into the transform domain (frequency domain) and then encodes the resulting transform coefficients. In the spatial domain the data are strongly correlated, the residual after predictive coding varies little, and there is a lot of data redundancy; this is especially noticeable in flat regions where the luma value changes slowly. After transformation to the frequency domain, residual data that were scattered across the spatial domain become concentrated in a few coefficients, which reduces correlation and data redundancy and so removes spatial redundancy.
In H.265, a coding block (CB) can be divided by a quadtree into several prediction blocks (PB) and transform blocks (TB). Because the quadtree partition from CB to TB mainly serves the transform of the residual, this quadtree is also called the residual quadtree (RQT). The figure below shows an example RQT partition, in which a 32*32 residual CB is divided into 13 TBs of different sizes.
A TB can be one of four sizes: 4*4, 8*8, 16*16, and 32*32, and each size corresponds to an integer transform coefficient matrix. Large TBs suit flat regions where the luma value changes slowly, and small TBs suit complex regions where the luma value changes sharply. All sizes can use the discrete cosine transform (DCT). In addition, 4*4 intra-predicted luma residual blocks can also use the discrete sine transform (DST).
Intra prediction is based on the data of the already coded blocks to the left and above, so the closer a pixel in the prediction block is to the coded blocks, the stronger the correlation and the smaller the prediction error; the farther away it is, the weaker the correlation and the larger the prediction error. This distribution of the prediction error closely resembles the sine basis function of the DST: smallest at the start and then growing. However, because the DST costs more computation than the DCT and would require additional transform-type flags, the DST is used only for 4*4 intra-predicted luma residual blocks.
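The energy-compaction effect can be demonstrated with a few lines of code. Note the assumption: this sketch uses a floating-point orthonormal DCT-II, whereas H.265 itself uses integer approximations of the transform matrices, so the numbers are illustrative rather than bit-exact. All function names are made up for the example.

```python
import math
from typing import List

Matrix = List[List[float]]


def dct_matrix(n: int) -> Matrix:
    """Orthonormal DCT-II basis; row k holds the k-th frequency."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def dct2d(block: Matrix) -> Matrix:
    """Separable 2-D transform: C * block * C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


flat_residual = [[10.0] * 4 for _ in range(4)]
coeffs = dct2d(flat_residual)
print(round(coeffs[0][0], 6))  # 40.0
```

All sixteen residual samples of the flat block collapse into a single DC coefficient in the top-left corner; the other fifteen coefficients are zero up to floating-point error, so there is far less to encode.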
05 Quantization
Transform coding only converts the picture data from a spatial-domain matrix into a frequency-domain matrix of transform coefficients; the number of coefficients and the amount of data in the matrix do not decrease. To compress the data, the coefficients must also be quantized and encoded according to their statistical characteristics in the frequency domain.
Common quantization methods fall into two types, scalar quantization (SQ) and vector quantization (VQ):
Scalar quantization: divide the data in the picture into several intervals, then use one value in each interval to represent all sample values in that interval.
Vector quantization: divide the data in the picture into several regions, then use one vector in each region to represent all vectors in that region.
Because vector quantization takes the correlation between multiple pixels into account and uses probabilistic methods, its compression ratio is generally higher than that of scalar quantization. Its computational complexity is high, however, so the quantization method in widespread use is scalar quantization.
The compression ratio of quantization depends on the size of the intervals, that is, the quantization step size. The larger the quantization step, the coarser the quantization, the lower the video bit rate, and the greater the distortion; the smaller the quantization step, the finer the quantization, the higher the video bit rate, and the smaller the distortion.
H.265 quantization uses the transform unit (TU) as its basic unit and processes both the luma component and the chroma components of the TU. H.265 uses non-linear scalar quantization, with the quantization parameter (QP) controlling the quantization step size of each coding block; the relationship between QP and the quantization step size is approximately exponential. QP is an integer: the luma QP ranges over 0~51 and the chroma QP over 0~45. For QP values in the range 0~29 the luma and chroma quantization steps are equal; from QP = 30 onward the two start to differ. The relationship between QP and the quantization step size is shown in the figure below:
On the encoder side, quantization can be understood simply as dividing each DCT transform coefficient by the quantization step to obtain the quantized value. The corresponding inverse quantization on the decoder side multiplies the quantized value by the quantization step to obtain an approximation of the DCT transform coefficient.
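The divide-then-multiply view of quantization can be sketched as follows. The step-size formula Qstep = 2^((QP - 4) / 6) is the commonly quoted approximation for the luma component (the step doubles every 6 QP values); it is an assumption added here, not a formula given above, and real codecs implement it with integer scaling tables and rounding offsets rather than floating point.

```python
def quant_step(qp: int) -> float:
    """Approximate luma quantization step: doubles every 6 QP values."""
    return 2.0 ** ((qp - 4) / 6.0)


def quantize(coeff: float, qp: int) -> int:
    """Encoder side: divide by the step and round to an integer level."""
    return int(round(coeff / quant_step(qp)))


def dequantize(level: int, qp: int) -> float:
    """Decoder side: multiply the level by the step."""
    return level * quant_step(qp)


coeff = 100.0
level = quantize(coeff, qp=28)   # step is 16, so 100 / 16 = 6.25 -> 6
restored = dequantize(level, qp=28)
print(level, restored)  # 6 96.0
```

The difference between 100 and 96 is the quantization distortion; a larger QP gives a larger step, fewer distinct levels to encode, and a larger error.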
06 Entropy coding
Entropy coding is coding that, following the entropy principle, loses no information. Quantization is a lossy compression method, whereas entropy coding represents the original data through a more compact mapping and is lossless. Common entropy codes include Shannon coding, Huffman coding, arithmetic coding, and run-length coding.
6.1 Huffman code
Huffman coding is a variable-length code, that is, different characters are encoded with different lengths. It uses the probability of each character to build a Huffman binary tree, with the goal of giving short codes to characters that occur with high probability (close to the root node) and long codes to characters that occur with low probability (far from the root node), so as to minimize the average codeword length.
Codeword: the code obtained by Huffman-encoding a character.
Example: the characters A, B, C, D, E, F occur with probabilities 0.32, 0.22, 0.18, 0.16, 0.08, 0.04. The Huffman tree is built as follows:
Select the two characters with the lowest probability, E and F, as leaf nodes, and use the sum of their probabilities as their parent node;
Rank the value of the parent node together with the probabilities of the remaining A, B, C, D, then again select the two smallest trees and sum them;
Repeat the above process until only one tree remains;
The final Huffman binary tree is shown in the figure below:
Labelling the path to each left child 0 and the path to each right child 1 gives the codes for A, B, C, D, E, F:
Character | A | B | C | D | E | F |
Probability | 0.32 | 0.22 | 0.18 | 0.16 | 0.08 | 0.04 |
Codeword | 11 | 01 | 00 | 101 | 1001 | 1000 |
Code length | 2 | 2 | 2 | 3 | 4 | 4 |
Average codeword length = 0.32*2 + 0.22*2 + 0.18*2 + 0.16*3 + 0.08*4 + 0.04*4 = 2.4 bit
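The construction can be checked with a short script. This sketch only tracks code lengths, not the actual bit patterns, because the assignment of 0 and 1 to left and right branches is a free choice; it uses Python's standard heapq module as the priority queue, and the function name is illustrative.

```python
import heapq
from itertools import count
from typing import Dict


def huffman_code_lengths(probabilities: Dict[str, float]) -> Dict[str, int]:
    """Build a Huffman tree and return the codeword length of each symbol."""
    tie_breaker = count()  # keeps heap entries comparable when probabilities tie
    heap = [(p, next(tie_breaker), {symbol: 0})
            for symbol, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least probable trees; every symbol inside them
        # moves one level further from the root.
        p1, _, left = heapq.heappop(heap)
        p2, _, right = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**left, **right}.items()}
        heapq.heappush(heap, (p1 + p2, next(tie_breaker), merged))
    return heap[0][2]


probs = {"A": 0.32, "B": 0.22, "C": 0.18, "D": 0.16, "E": 0.08, "F": 0.04}
lengths = huffman_code_lengths(probs)
average = sum(probs[s] * lengths[s] for s in probs)
print(lengths)
print(round(average, 2))  # 2.4
```

The lengths agree with the table above: 2 bits for A, B, C, 3 bits for D, and 4 bits for E and F.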
6.2 Arithmetic coding
Although Huffman coding gives the best coding result in theory, in practice the smallest unit of data a computer processes is 1 bit, so a codeword length with a fractional part can only be handled as a whole number of bits, and the actual coding efficiency is often slightly worse than the theoretical one. In image compression, arithmetic coding is therefore usually used instead of Huffman coding. The theoretical basis of arithmetic coding is nevertheless the same as that of Huffman coding: characters with high probability get short codes, and characters with low probability get long codes.
Arithmetic coding comes in several variants: fixed-model arithmetic coding, adaptive arithmetic coding (AAC), binary arithmetic coding, context-based adaptive binary arithmetic coding (CABAC), and so on. H.265 uses CABAC. Only the fixed-model arithmetic coding process is described here:
Count each character in the input symbol sequence and its probability of occurrence;
According to the probability distribution, divide the interval [0, 1) into several subintervals. Each subinterval represents one character, and the size of the subinterval represents the probability of that character; the sizes of all subintervals sum to 1. Denote the subinterval of a character by [L, H);
Set the initial variables low = 0 and high = 1. Read the characters of the symbol sequence one by one, look up the interval [L, H) of each character, and update low and high (both right-hand sides use the values of low and high from before the update):
low = low + (high - low) * L
high = low + (high - low) * H
After the whole symbol sequence has been traversed, take the final low and high, and output a value from that interval in binary form to obtain the encoded data;
Example: the input symbol sequence is ADBCD. Count the occurrence probability of each character:
Character | Occurrences | Probability | Probability interval |
A | 1 | 0.2 | [0, 0.2) |
B | 1 | 0.2 | [0.2, 0.4) |
C | 1 | 0.2 | [0.4, 0.6) |
D | 2 | 0.4 | [0.6, 1) |
Reading the first character A: low = 0, high = 1, L = 0, H = 0.2
low = low + (high - low) * L = 0
high = low + (high - low) * H = 0.2
Reading the second character D: low = 0, high = 0.2, L = 0.6, H = 1
low = low + (high - low) * L = 0.12 (note: the low computed here must not be substituted into the formula for high below)
high = low + (high - low) * H = 0.2
Reading the third character B: low = 0.12, high = 0.2, L = 0.2, H = 0.4
low = low + (high - low) * L = 0.136
high = low + (high - low) * H = 0.152
Reading the fourth character C: low = 0.136, high = 0.152, L = 0.4, H = 0.6
low = low + (high - low) * L = 0.1424
high = low + (high - low) * H = 0.1456
Reading the fifth character D: low = 0.1424, high = 0.1456, L = 0.6, H = 1
low = low + (high - low) * L = 0.14432
high = low + (high - low) * H = 0.1456
The final [low, high) interval is [0.14432, 0.1456). Any value in this interval, converted to binary, is an arithmetic code for "ADBCD". The coding process can be summarized in the figure below:
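The worked example can be reproduced with a short script. This is a sketch of fixed-model arithmetic coding only (not CABAC); it uses Python's fractions module so that the interval bounds are exact, and it assumes subintervals are assigned to characters in alphabetical order, which matches the table above. The function names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from typing import Dict, Tuple

Interval = Tuple[Fraction, Fraction]


def build_intervals(message: str) -> Dict[str, Interval]:
    """Split [0, 1) into one subinterval per character, sized by its probability."""
    counts = Counter(message)
    total = len(message)
    intervals: Dict[str, Interval] = {}
    cumulative = Fraction(0)
    for symbol in sorted(counts):
        width = Fraction(counts[symbol], total)
        intervals[symbol] = (cumulative, cumulative + width)
        cumulative += width
    return intervals


def arithmetic_encode(message: str, intervals: Dict[str, Interval]) -> Interval:
    """Narrow [low, high) once per character and return the final interval."""
    low, high = Fraction(0), Fraction(1)
    for symbol in message:
        sym_low, sym_high = intervals[symbol]
        span = high - low  # computed once, before low or high changes
        high = low + span * sym_high
        low = low + span * sym_low
    return low, high


intervals = build_intervals("ADBCD")
low, high = arithmetic_encode("ADBCD", intervals)
print(float(low), float(high))  # 0.14432 0.1456
```

Storing the width of the interval in `span` before updating either bound avoids the pitfall flagged in the note above, where an already updated low is accidentally used when computing high.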
07 Loop filtering
Because H.265 uses block-based coding, some distortion appears when a picture is reconstructed through inverse quantization and inverse transform, for example blocking artifacts and ringing artifacts. To address these problems, H.265 uses in-loop filtering, which comprises the deblocking filter (DBF) and sample adaptive offset (SAO).
DBF acts on boundary pixels and is used to reduce blocking artifacts. A blocking artifact is a visible discontinuity in gray value at the boundary between adjacent coding blocks. There are two main causes:
The encoder's DCT transform and quantization of the residual are block-based and ignore the correlation between blocks, so different blocks are processed inconsistently;
In inter prediction the motion-compensated block does not match perfectly, so there is an error; moreover, the reference frames used for prediction during coding usually come from these reconstructed pictures, so the predicted picture is distorted as well;
Depending on the boundary type, DBF applies strong filtering, weak filtering, or no filtering; the boundary type is determined by a gradient threshold on the boundary pixels and by the quantization parameters of the blocks at the boundary. DBF first applies horizontal filtering to the vertical edges of the whole picture and then vertical filtering to the horizontal edges. Filtering is in effect a process of modifying pixel values so that the block structure looks less visible. H.264 also has DBF, but applies it to 4*4 processing blocks, whereas H.265 applies it to 8*8 processing blocks.
SAO is an error compensation mechanism for the reconstructed picture newly introduced in H.265, and it is used to reduce ringing artifacts. Ringing is the oscillation that appears where the gray value of the picture changes sharply; its main cause is the loss of high-frequency information after the DCT coefficients are quantized. SAO works by adding a negative offset to the peak pixels of the reconstructed curve and a positive offset to the trough pixels, thereby reducing the distortion of high-frequency information. Unlike DBF, which acts only on boundary pixels, SAO acts on all pixels in the block.
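The peak and trough compensation can be illustrated with a one-dimensional sketch of the SAO edge-offset idea. It classifies each pixel against its two horizontal neighbours into the usual four edge categories (trough, concave corner, convex corner, peak) and adds a per-category offset. The offset values below are hypothetical; in a real encoder they are chosen per CTB and signalled in the bitstream, results are clipped to the valid sample range, and the band-offset variant of SAO is not shown.

```python
from typing import Dict, List


def edge_category(left: int, cur: int, right: int) -> int:
    """Classify a pixel relative to its two neighbours along one direction."""
    if cur < left and cur < right:
        return 1  # trough (local minimum)
    if (cur < left and cur == right) or (cur == left and cur < right):
        return 2  # concave corner
    if (cur > left and cur == right) or (cur == left and cur > right):
        return 3  # convex corner
    if cur > left and cur > right:
        return 4  # peak (local maximum)
    return 0      # flat or monotonic: no offset


def sao_edge_offset_row(row: List[int], offsets: Dict[int, int]) -> List[int]:
    """Apply edge offsets along one row; classification uses the unmodified input."""
    out = list(row)
    for i in range(1, len(row) - 1):
        category = edge_category(row[i - 1], row[i], row[i + 1])
        out[i] = row[i] + offsets.get(category, 0)
    return out


# Hypothetical offsets: positive for troughs, negative for peaks.
offsets = {1: 2, 2: 1, 3: -1, 4: -2}
row = [10, 10, 14, 10, 10, 6, 10]
filtered = sao_edge_offset_row(row, offsets)
print(filtered)  # [10, 11, 12, 11, 9, 8, 10]
```

The overshoot at 14 and the undershoot at 6 are both pulled toward their neighbours, which is the smoothing of oscillation described above.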
08 Summary
Starting from the overall H.265 coding process, this article has introduced the technical points involved in H.265 coding: block partitioning, prediction, transform, quantization, entropy coding, and in-loop filtering. Understanding these coding principles lays a solid foundation for further study of audio and video development.
---------- END ----------