H264 (I): I/P/B frames, GOP, IDR and other parameters
2022-07-05 08:00:00 【Drink more hot water-】
Contents:
1. I frame
2. P frame
3. B frame
4. GOP
5. IDR frame
6. SPS and PPS
7. Slice, field, frame, macroblock
8. Coding mode
9. QP
10. Profile and level
11. DTS and PTS
1. I frame
I frame: an intra-coded picture. An I frame is a key frame; you can think of it as a complete copy of the picture, so decoding it needs only the frame's own data.
Also called an intra picture, the I frame is usually the first frame of each GOP (group of pictures, a structure used in MPEG video compression). It is moderately compressed, serves as a reference point for random access, and can be treated as a still image. During MPEG encoding, some frames of the video sequence are compressed into I frames, some into P frames, and some into B frames. The I frame method is intra-frame compression, also called "key frame" compression. It is based on the discrete cosine transform (DCT) and is similar to the JPEG compression algorithm. I frame compression can reach a compression ratio of about 1/6 without obvious compression artifacts.
【I frame features】
1. It is a fully intra-coded frame: the whole picture is compressed in a JPEG-like way, encoded and transmitted;
2. The complete picture can be reconstructed from the I frame's data alone;
3. The I frame describes the image background and the details of the moving subject;
4. The I frame is generated without reference to other pictures;
5. The I frame is the reference frame for P and B frames (its quality directly affects the quality of later frames in the same group);
6. The I frame is the base frame (first frame) of a GOP; in this classic structure a group contains only one I frame;
7. The I frame needs no motion vectors;
8. The I frame carries a relatively large amount of data.
【I frame coding process】
(1) Perform intra prediction and decide which intra prediction mode to use.
(2) Subtract the predicted values from the pixel values to get the residual.
(3) Transform and quantize the residual.
(4) Entropy-code the result with variable-length coding or arithmetic coding.
(5) Reconstruct the image and filter it; the result is used as a reference frame for other frames.
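Steps (1), (2), (3) and (5) can be sketched on a single 4x4 block. This is a minimal illustration under stated assumptions, not the H.264 algorithm itself: it uses only DC prediction (one of several intra modes), skips the transform, the entropy coding and the filter, and quantizes with a plain integer step. All function names are hypothetical.

```python
def dc_predict(top, left):
    """Step 1: DC intra prediction, every pixel is predicted as the
    rounded mean of the already decoded neighbours above and to the left."""
    neighbours = list(top) + list(left)
    dc = (sum(neighbours) + len(neighbours) // 2) // len(neighbours)
    return [[dc] * len(top) for _ in left]


def encode_block(block, pred, qstep):
    """Steps 2 and 3: residual = pixel - prediction, then uniform
    quantization (the transform is omitted in this sketch)."""
    return [[round((pix - p) / qstep) for pix, p in zip(brow, prow)]
            for brow, prow in zip(block, pred)]


def reconstruct_block(levels, pred, qstep):
    """Step 5: rescale the quantized residual, add the prediction back
    and clip to the 8-bit range."""
    return [[max(0, min(255, p + lvl * qstep)) for lvl, p in zip(lrow, prow)]
            for lrow, prow in zip(levels, pred)]


top = [100, 102, 104, 106]     # decoded pixels above the block
left = [98, 100, 102, 104]     # decoded pixels left of the block
block = [[101, 103, 105, 107],
         [99, 101, 103, 105],
         [100, 100, 100, 100],
         [110, 108, 106, 104]]

pred = dc_predict(top, left)
levels = encode_block(block, pred, qstep=4)
recon = reconstruct_block(levels, pred, qstep=4)
```

The reconstructed block differs from the original by at most half a quantization step per pixel, which is the loss introduced in step (3).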
For example, in a video conference system the terminal does not send the MCU a complete picture every time (nor does the MCU send one to the terminal); it sends only the parts that changed relative to the previous picture. If the network is in bad condition, packets are lost on the way to or from the remote end, and the picture becomes corrupted or stutters. Without an I frame mechanism that lets the remote end resend a new, complete picture to the local end (or the local end resend one to the remote end), the corruption and stuttering in the terminal's output get worse and worse, until the meeting cannot proceed normally.
During playback, if an I frame is lost, the P frames after it cannot be decoded and the video goes black; if a P frame is lost, the picture shows corruption, mosaic artifacts and similar problems.
In a video conference system, I frames are only sent within the bandwidth limit of the conference; they do not take effect beyond it. The I frame mechanism exists not only in the MCU but also in the TV wall server and the recording and broadcasting server. Its purpose is to deal with the picture corruption and stuttering that packet loss causes on a bad network, which would otherwise disrupt the meeting.
2. P frame
P frame: a predictive-coded picture (forward-predicted frame). A P frame stores the difference between this frame and the preceding key frame (or P frame).
When decoding, the difference defined by this frame is superimposed on the previously cached picture to produce the final picture. (It is a difference frame: a P frame has no complete picture data, only the data that differs from the previous frame.)
【P frame prediction and reconstruction】
A P frame uses an I frame as its reference frame. For each point of the P frame, the encoder finds the predicted value and the motion vector in the I frame, then transmits the prediction difference together with the motion vector. The receiver uses the motion vector to locate the predicted value of that point in the I frame and adds the difference to it to get the sample value of the point. Doing this for every point yields the complete P frame.
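The receiver side of this process can be sketched as follows. It is a simplified illustration: it assumes whole-pixel motion vectors and a block that stays inside the reference frame, whereas real H.264 decoders also handle sub-pixel interpolation and frame edges. The function names are hypothetical.

```python
def motion_compensate(ref, x, y, mv, size):
    """Fetch the size x size block that the motion vector points at
    in the reference frame (whole-pixel motion only)."""
    dx, dy = mv
    return [row[x + dx:x + dx + size] for row in ref[y + dy:y + dy + size]]


def reconstruct_p_block(ref, x, y, mv, residual):
    """Predicted value from the reference frame plus the transmitted
    difference gives the sample values of the P frame block."""
    pred = motion_compensate(ref, x, y, mv, len(residual))
    return [[max(0, min(255, p + r)) for p, r in zip(prow, rrow)]
            for prow, rrow in zip(pred, residual)]


# 4x4 reference frame whose pixel at (row r, column c) is 10*r + c
ref = [[10 * r + c for c in range(4)] for r in range(4)]

# Encoder side: the 2x2 block at (x=1, y=1) is best matched one pixel
# right and one pixel down in the reference frame.
current = [[25, 23], [30, 33]]
mv = (1, 1)
pred = motion_compensate(ref, 1, 1, mv, 2)
residual = [[c - p for c, p in zip(crow, prow)]
            for crow, prow in zip(current, pred)]

# Decoder side: only mv and residual were transmitted.
decoded = reconstruct_p_block(ref, 1, 1, mv, residual)
```

Only the motion vector and the (usually small) residual travel in the stream, which is why a P frame is much smaller than an I frame.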
【P frame features】
1. A P frame is a coded frame separated from the I frame by 1 to 2 frames;
2. A P frame uses motion compensation to transmit its difference from the preceding I or P frame (the prediction error) together with the motion vectors;
3. When decoding, the predicted value from the I frame must be added to the prediction error to reconstruct the complete P frame picture;
4. A P frame is forward-predicted inter-frame coding; it references only the nearest preceding I or P frame;
5. A P frame can be the reference frame of the P frame after it, and of the B frames before and after it;
6. Because a P frame is a reference frame, it can propagate decoding errors;
7. Because only differences are transmitted, the compression ratio of P frames is relatively high.
3. B frame
B frame: a bidirectionally predicted picture. A B frame is a two-way difference frame: it records the difference between this frame and both the preceding and the following frames.
In other words, to decode a B frame you need not only the previously cached picture but also the decoded picture that follows it; the final picture is obtained by combining the preceding and following pictures with this frame's data.
B frames have a high compression ratio, but decoding them costs more CPU.
【B frame prediction and reconstruction】
A B frame uses the preceding I or P frame and the following P frame as reference frames. For each point of the B frame, the encoder finds the predicted value and two motion vectors, then transmits the prediction difference together with the motion vectors. The receiver uses the motion vectors to find (compute) the predicted value in the two reference frames and adds the difference to it to get the sample value of the point. Doing this for every point yields the complete B frame. This is bidirectional inter-frame predictive coding based on motion prediction.
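A minimal sketch of the bidirectional step, assuming the two motion-compensated blocks have already been fetched from the preceding and following reference frames, and assuming the plain rounded average of the two as the prediction (weighted prediction is ignored). The function names are hypothetical.

```python
def bi_predict(fwd_block, bwd_block):
    """Average the block fetched from the preceding reference frame and
    the block fetched from the following one, rounding half up."""
    return [[(f + b + 1) >> 1 for f, b in zip(frow, brow)]
            for frow, brow in zip(fwd_block, bwd_block)]


def reconstruct_b_block(fwd_block, bwd_block, residual):
    """Bidirectional prediction plus the transmitted difference."""
    pred = bi_predict(fwd_block, bwd_block)
    return [[max(0, min(255, p + r)) for p, r in zip(prow, rrow)]
            for prow, rrow in zip(pred, residual)]


fwd = [[100, 102], [104, 106]]   # block from the preceding I or P frame
bwd = [[110, 103], [104, 100]]   # block from the following P frame
residual = [[-1, 0], [2, 0]]     # transmitted prediction difference

pred = bi_predict(fwd, bwd)
decoded = reconstruct_b_block(fwd, bwd, residual)
```

Because the prediction draws on pictures on both sides in time, the residual tends to be smaller than for a P frame, at the cost of having to decode the later reference frame first.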
【B frame features】
1. A B frame is predicted from the preceding I or P frame and the following P frame;
2. A B frame transmits the prediction error between itself and the preceding I or P frame and the following P frame, together with the motion vectors;
3. A B frame is a bidirectionally predicted coded frame;
4. B frames have the highest compression ratio: a B frame only reflects how the moving subject changes between its two reference frames, so the prediction is more accurate;
5. In this scheme a B frame is not a reference frame, so it does not propagate decoding errors.
【Why B frames】
From the above we know that decoding I and P frames is relatively simple and uses few resources. An I frame is decoded on its own. For a P frame, the decoder only needs to cache the previous picture and apply the P frame to it.
If a video stream contains only I and P frames, the decoder never has to look at later data: it decodes while reading and moves straight ahead, which is comfortable for everyone. So why introduce B frames?
Many movies on the Internet use B frames, because a B frame records differences from both the previous and the following frames and so saves more space than a P frame. The file gets smaller, but the decoder has more work to do.
When decoding, it needs not only the previously cached picture but also the picture of the next I or P frame (that is, it has to read ahead and decode ahead). B frames also cannot simply be discarded,
because they carry real picture information. Dropping them and repeating the previous picture makes the video stutter (frames are in fact lost).
And because online movies often use a large number of B frames to save space, the more B frames there are, the more trouble a player without B frame support has, and the more the picture stutters.
4. Sequence (GOP)
In H.264, images are organized into sequences; a sequence is the data stream produced by encoding a series of images.
The first image of a sequence is called an IDR image (instantaneous decoding refresh image), and IDR images are always I frame images. H.264 introduced the IDR image so that decoding can resynchronize: when the decoder reaches an IDR image, it immediately clears the reference frame queue, outputs or discards all decoded data, looks up the parameter sets again, and starts a new sequence. That way, if the previous sequence contained a serious error, there is a chance to resynchronize here. Images after an IDR image never use data from images before it for decoding.
A sequence is the data stream generated by encoding a run of images whose content differs little. When there is little motion, a sequence can be very long: little motion means the picture content changes little, so one I frame can be followed by P and B frames for a long time. When there is a lot of motion, a sequence may be short, for example one I frame and 3 or 4 P frames.
In a coded video sequence, GOP (group of pictures) is the distance between two I frames, and Reference (the reference period) is the distance between two P frames (see Figure 3.1). An I frame takes more bytes than a P frame, and a P frame takes more bytes than a B frame (also shown in Figure 3.1).
So with the bit rate unchanged (the bit rate is the amount of data spent per unit of time; the higher it is, the closer the processed file is to the original), the larger the GOP value, the more P and B frames there are and the more bytes each I, P and B frame gets on average, which makes better image quality easier to achieve. The larger Reference is, the more B frames there are, and in the same way better image quality is easier to achieve.
Note that raising the GOP value improves image quality only up to a point. When the scene switches, an H.264 encoder typically forces an I frame automatically, so the actual GOP is shortened. On the other hand, within a GOP the P and B frames are predicted from the I frame; if the I frame's image quality is poor, it affects the quality of the following P and B frames in that GOP, and recovery is only possible at the start of the next GOP. So the GOP value should not be set too large.
Also, because P and B frames are more complex than I frames, too many P and B frames reduce coding efficiency. An overly long GOP also slows the response of Seek operations: because P and B frames are predicted from the preceding I or P frame, a Seek that lands directly on a P or B frame must first decode the I frame of that GOP and the N prediction frames before the target. The longer the GOP, the more prediction frames must be decoded and the longer the seek takes.
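The seek cost described above can be estimated with a small sketch. It rests on simplifying assumptions that are not stated in the text: the frame list is in decoding order, every GOP is closed (nothing references across an I frame), and B frames are never used as references, so only the target B frame itself has to be decoded. The function name is hypothetical.

```python
def frames_to_decode_for_seek(frame_types, target):
    """Return the indices (in decoding order) that must be decoded to
    show frame `target`: the I frame of its GOP, every P frame in
    between, and the target itself."""
    start = target
    while start >= 0 and frame_types[start] != "I":
        start -= 1
    if start < 0:
        raise ValueError("no I frame at or before the target")
    return [i for i in range(start, target + 1)
            if i == target or frame_types[i] != "B"]


# Two GOPs of 7 frames each, in decoding order
stream = list("IPBBPBB" + "IPBBPBB")

needed_for_b = frames_to_decode_for_seek(stream, 6)    # last B of GOP 1
needed_for_p = frames_to_decode_for_seek(stream, 11)   # second P of GOP 2
```

With a longer GOP the `while` loop walks back further and the returned list grows, which is the extra seek latency the paragraph refers to.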
5. IDR
IDR (Instantaneous Decoding Refresh): instantaneous decoding refresh.
I frame: an intra-coded frame is an independent frame that carries all of its own information and can be decoded without reference to other images. The first frame in a video sequence is always an I frame.
Both I and IDR frames use intra prediction, so in that sense they are the same thing. For convenience in encoding and decoding, the first I frame is distinguished from the other I frames and called the IDR frame, which makes the encoding and decoding process easier to control. The role of an IDR frame is to refresh immediately so that errors do not spread: a new sequence starts from the IDR frame and coding begins afresh. An ordinary I frame does not provide random access; that function is carried by the IDR frame. An IDR frame causes the DPB (Decoded Picture Buffer, the reference frame list; this is the key point) to be emptied, while an I frame does not. An IDR image is always an I image, but an I image is not necessarily an IDR image. A sequence can contain many I images, and images after an I image can use the images between I images as motion references.
For an IDR frame, no frame after it can reference anything from before the IDR frame. For an ordinary I frame, by contrast, the B and P frames after it can reference I frames that come before it. For random access into a video stream, the player can always start playing from an IDR frame, because no frame after it references a frame before it. In a video with no IDR frames, however, playback cannot start at an arbitrary point, because later frames always reference earlier ones.
When it receives an IDR frame, the decoder also has to update all the PPS and SPS parameters.
IDR frame processing (the same as I frame processing): (1) Perform intra prediction and decide which intra prediction mode to use. (2) Subtract the predicted values from the pixel values to get the residual. (3) Transform and quantize the residual. (4) Entropy-code the result with variable-length coding or arithmetic coding. (5) Reconstruct the image and filter it; the result is used as a reference frame for other frames.
With multiple reference frames, take the following frame sequence as an example: IPPPP I P PPP ..., encoded with 3 reference frames.
Because 3 reference frames are used, the reference frame queue has length 3.
When the second I is reached, the reference frame queue is not emptied; this I frame is added to the queue (the I frame itself needs no reference frame for coding, of course). When the following P frame is processed, the three preceding frames, P P I, are used as references.
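The difference between I and IDR in this example can be sketched with a fixed-length queue. This is a simplified model, assuming every frame becomes a reference and the oldest one is dropped when the queue is full; long-term references and explicit reference marking are ignored. The function name and the "IDR" label are illustrative.

```python
from collections import deque


def available_references(frame_types, max_refs=3):
    """For each frame, list the (index, type) pairs sitting in the
    reference queue at the moment that frame is coded."""
    queue = deque(maxlen=max_refs)   # oldest entry falls out automatically
    result = []
    for index, ftype in enumerate(frame_types):
        if ftype == "IDR":
            queue.clear()            # only an IDR frame empties the queue
        # intra frames are coded without references
        refs = [] if ftype in ("I", "IDR") else list(queue)
        result.append(refs)
        queue.append((index, ftype))
    return result


with_plain_i = available_references(["IDR", "P", "P", "P", "P", "I", "P"])
with_idr = available_references(["IDR", "P", "P", "P", "P", "IDR", "P"])
```

In `with_plain_i` the last P frame can still see the two P frames before the I frame, matching the "P P I" case in the text; in `with_idr` it sees only the IDR frame.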
6. SPS PPS
6.1 SPS syntax elements and their meaning
H.264 defines many NAL unit types. Type 7 means the NAL unit carries a Sequence Parameter Set. Of all the H.264 syntax elements, the information in the SPS is crucial: if it is lost or corrupted, decoding is very likely to fail. In the video frameworks of some platforms (such as VideoToolbox on iOS), the SPS and the picture parameter set (PPS) described below are also usually used as initialization information for a decoder instance.
SPS stands for Sequence Parameter Set. It stores the global parameters of a coded video sequence, that is, the sequence formed by encoding the pixel data of the original video frame by frame. The parameters that the coded data of each frame depends on are stored in the picture parameter set. The SPS and PPS NAL units normally sit at the start of the whole bitstream, but in some special cases they can also appear in the middle of it, mainly because:
The decoder needs to start decoding in the middle of the stream;
The encoder changes the parameters of the stream during encoding (such as the image resolution).
6.2 PPS syntax elements and their meaning
Besides the sequence parameter set SPS, the other important parameter set in H.264 is the Picture Parameter Set (PPS). Like the SPS, the PPS is usually stored in its own NAL unit in a raw H.264 bitstream, except that the nal_unit_type of a PPS NAL unit is 8. In container formats, the PPS is usually stored together with the SPS in the header of the video file.
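As an illustration of how the nal_unit_type values 7 and 8 show up in practice, here is a small sketch that splits a raw byte stream at its start codes and reads the type from the low 5 bits of the first byte of each NAL unit. It assumes the Annex B byte-stream format (start codes 00 00 01 or 00 00 00 01); the function names are hypothetical and the sample payload bytes are made up for the demonstration.

```python
NAL_TYPE_NAMES = {1: "non-IDR slice", 5: "IDR slice", 7: "SPS", 8: "PPS"}


def split_annexb(stream: bytes):
    """Split an Annex B byte stream into NAL units (start codes removed)."""
    starts = []
    pos = stream.find(b"\x00\x00\x01")
    while pos != -1:
        starts.append(pos + 3)
        pos = stream.find(b"\x00\x00\x01", pos + 3)
    nals = []
    for n, begin in enumerate(starts):
        end = starts[n + 1] - 3 if n + 1 < len(starts) else len(stream)
        # A 4-byte start code leaves one extra zero byte at the end of
        # the previous unit; strip such trailing zeros.
        nal = stream[begin:end].rstrip(b"\x00")
        if nal:
            nals.append(nal)
    return nals


def nal_unit_type(nal: bytes) -> int:
    """nal_unit_type is the low 5 bits of the NAL header byte."""
    return nal[0] & 0x1F


sample = (b"\x00\x00\x00\x01\x67\x42\x00\x1e"   # header 0x67 -> type 7
          b"\x00\x00\x00\x01\x68\xce\x3c\x80"   # header 0x68 -> type 8
          b"\x00\x00\x01\x65\x88\x84")          # header 0x65 -> type 5
types = [nal_unit_type(n) for n in split_annexb(sample)]
names = [NAL_TYPE_NAMES.get(t, "other") for t in types]
```

A decoder that joins a stream in the middle can use the same test to wait until it has seen an SPS, a PPS and an IDR slice before it starts decoding.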
7. Slice, field, frame, macroblock
In the H264 structure, the coded data of one video image is called a frame. A frame consists of one or more slices, a slice consists of one or more macroblocks (MB), and a macroblock consists of 16×16 YUV data. The macroblock is the basic unit of H264 coding.
When the CIF and QCIF formats are used, the video signal has a four-level structure: picture, group of blocks (GOB), macroblock (MB, 16×16) and block (B, 8×8).
1 frame = n slices
1 slice = n macroblocks
1 macroblock = 16x16 YUV data
Field and frame: either a field or a frame of video can be used to produce a coded image. In television, a frame is split into two interlaced fields to reduce large-area flicker.
Macroblock: a coded image is usually divided into macroblocks. A macroblock consists of one 16×16 block of luma pixels plus one 8×8 Cb block and one 8×8 Cr block of chroma pixels.
Slice: within each image, macroblocks are arranged into slices. Slices are divided into I slices, B slices, P slices and some other types.
An I slice contains only I macroblocks, a P slice may contain P and I macroblocks, and a B slice may contain B and I macroblocks.
An I macroblock is intra predicted, using already decoded pixels of the current slice as reference.
A P macroblock is inter predicted, using a previously coded image as the reference image.
A B macroblock is inter predicted using bidirectional reference images (a previous frame and a following frame).
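To make the frame and macroblock relationship concrete, the following sketch counts the macroblocks of a picture and the raw samples in one macroblock. It assumes 16×16 macroblocks with the 8×8 Cb and 8×8 Cr layout described above (4:2:0 sampling) and rounds the picture size up to whole macroblocks; the names are hypothetical.

```python
MB_SIZE = 16
# One 16x16 luma block plus one 8x8 Cb and one 8x8 Cr block
SAMPLES_PER_MB_420 = 16 * 16 + 8 * 8 + 8 * 8


def macroblock_grid(width, height):
    """Return (columns, rows, total) of macroblocks for a picture,
    rounding partial macroblocks at the edges up to a full one."""
    cols = -(-width // MB_SIZE)    # ceiling division
    rows = -(-height // MB_SIZE)
    return cols, rows, cols * rows


cif = macroblock_grid(352, 288)
full_hd = macroblock_grid(1920, 1080)
```

For 1920×1080 the height is not a multiple of 16, so the coded picture is 68 macroblock rows (1088 lines) tall and the extra lines are cropped on output.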
8. Coding mode
VBR:
Variable BitRate (dynamic bit rate). The bit rate varies with the complexity of the image, so coding efficiency is relatively high and mosaic artifacts are rare when motion occurs. The rate control algorithm decides the bit rate from the image content: simple content gets fewer bits (strictly speaking, fewer codewords), and complex content gets more codewords. This ensures quality while taking the bandwidth limit into account. The algorithm gives priority to image quality.
ABR:
Average BitRate, an interpolated variant of VBR. Within the specified file size, ABR treats every 50 frames as a segment (30 frames is about 1 second), uses a relatively low rate for low-frequency and insensitive content, and uses a high rate for high-frequency, highly dynamic content. It can be seen as a compromise between VBR and CBR.
CBR:
Constant BitRate: encoding at a constant bit rate. When motion occurs, the bit rate cannot rise, so the only way to shrink the codewords is to raise QP, and image quality drops; when the scene is still, image quality improves again. Image quality is therefore unstable. The advantage is fast compression; the disadvantage is that the traffic per second is always the same, which easily wastes space.
CVBR:
Constrained Variable BitRate, an improvement on VBR that combines the advantages of CBR and VBR: when the image content is still it saves bandwidth, and when motion occurs it spends the bandwidth saved earlier to raise image quality as far as possible, balancing bandwidth and image quality. This method usually lets the user set a maximum and a minimum bit rate. When the scene is still, the bit rate settles at the minimum; when there is motion, the bit rate rises above the minimum but never exceeds the maximum.
[Figure: ideal CVBR bit rate model]
9. QP
Quantizer Parameter (QP): the quantization parameter, which reflects how strongly spatial detail is compressed. The smaller the value, the finer the quantization, the higher the image quality and the longer the generated bitstream. With a small QP most detail is kept; as QP increases, some detail is lost and the bit rate falls, but distortion increases and quality degrades.
When encoding H264, each frame is divided into many macroblocks and each macroblock is encoded with its own QP value (the QP of different macroblocks is not necessarily equal). So every frame has a largest and a smallest QP value, namely max_qp and min_qp.
Min QP: sets the smallest quantizer x264 may use. The smaller the quantization parameter, the closer the output is to the input. Below a certain value, the output of x264 looks the same as the input even though it is not exactly identical, and there is usually no need to spend more bits on the macroblocks. If adaptive quantization is on (it is by default), raising qpmin is discouraged, because it may reduce the quality of flat areas of the frame.
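The link between QP and detail loss can be illustrated with the H.264 quantization step size, which doubles every time QP increases by 6. The sketch below uses the commonly tabulated step sizes for QP 0 to 5 and assumes the 8-bit QP range of 0 to 51. It is conceptual only: real encoders use integer scaling tables rather than floating-point division, and the function names are hypothetical.

```python
# Step sizes for QP 0..5; every further +6 in QP doubles the step
QSTEP_BASE = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp):
    """Quantization step size for a QP in the 8-bit range 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be between 0 and 51")
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))


def quantize_and_rescale(coefficient, qp):
    """Quantize one transform coefficient and scale it back, to show
    how much precision survives at a given QP."""
    step = qstep(qp)
    level = round(coefficient / step)   # the value that gets coded
    return level, level * step          # the value the decoder sees


fine = quantize_and_rescale(37, 16)     # small QP, small step
coarse = quantize_and_rescale(37, 28)   # larger QP, larger step
```

At QP 16 the coefficient 37 comes back as 36, while at QP 28 it comes back as 32: the larger step discards more detail but produces smaller levels that cost fewer bits.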
10. H264 profile and level
The profiles are BP, EP, MP and HP:
1. BP (Baseline Profile): basic quality. Supports I/P frames, progressive (non-interlaced) video only, and CAVLC only;
2. EP (Extended Profile): advanced quality. Supports I/P/B/SP/SI frames, progressive and interlaced video, and CAVLC only;
3. MP (Main Profile): mainstream quality. Supports I/P/B frames, both progressive and interlaced video, and both CAVLC and CABAC;
4. HP (High Profile): high quality. On top of Main Profile, the High profile family adds 8x8 intra prediction, custom quantization, lossless video coding and more YUV formats.
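The profile and level of a stream are signalled in the SPS: the byte after the NAL header is profile_idc, followed by one byte of constraint flags and then level_idc. The sketch below reads them from an SPS NAL unit. It is a simplified illustration: it maps only the four profiles listed above, treats level_idc as ten times the level number, and ignores the special signalling of level 1b. The function name and the sample bytes are made up for the demonstration.

```python
PROFILE_NAMES = {66: "Baseline", 77: "Main", 88: "Extended", 100: "High"}


def sps_profile_level(sps_nal: bytes):
    """Return (profile name, level) from the first bytes of an SPS NAL
    unit: header, profile_idc, constraint flags, level_idc."""
    if len(sps_nal) < 4 or sps_nal[0] & 0x1F != 7:
        raise ValueError("not an SPS NAL unit")
    profile_idc = sps_nal[1]
    level_idc = sps_nal[3]
    name = PROFILE_NAMES.get(profile_idc, f"profile_idc {profile_idc}")
    return name, level_idc / 10


baseline = sps_profile_level(bytes.fromhex("6742001e"))
high = sps_profile_level(bytes.fromhex("67640028"))
```

Here 0x42 is 66 (Baseline) with level_idc 30, and 0x64 is 100 (High) with level_idc 40.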
11. PTS and DTS
【Why PTS and DTS exist】
As described above, a P frame needs the preceding I or P frame to produce a complete picture, and a B frame needs the preceding I or P frame and the following P frame. This raises a problem: a B frame that arrives first in the video stream cannot be decoded immediately; it has to wait until the I and P frames it depends on are decoded. Playback time and decoding time then no longer match and the order is scrambled, so how should these frames be played? This is where two more concepts come in: DTS and PTS.
First, the basic concepts:
DTS (Decoding Time Stamp): the decoding timestamp, which tells the player when to decode this frame's data.
PTS (Presentation Time Stamp): the presentation timestamp, which tells the player when to display this frame's data.
Although DTS and PTS guide the player's behavior, they are generated by the encoder at encoding time.
During video capture, each frame is recorded, encoded and sent in turn, and the PTS is generated at encoding time. The way frames are encoded deserves special attention here. In a typical scenario, the codec encodes an I frame, then skips ahead a few frames and encodes a future P frame using the I frame as reference, then jumps back to the frame right after the I frame. The frames between the coded I frame and P frame are encoded as B frames. After that the encoder skips ahead a few frames again, encodes another P frame using the first P frame as reference, then jumps back and fills the gap in display order with B frames. The process continues, with a new I frame inserted every 12 to 15 P and B frames. A P frame is predicted from the preceding I or P frame image, and a B frame is predicted from two P frames or from an I frame and a P frame, so the coding order and the display order of frames differ, as shown below:
Suppose the frames captured by the encoder look like this:
I B B P B B P
Their display order, that is the PTS, is:
1 2 3 4 5 6 7
The encoder's coding order is:
1 4 2 3 7 5 6
The stream is pushed in coding order, namely:
I P B B P B B
So the received video stream is:
I P B B P B B
The receiver decodes the stream in the order the frames arrive, one frame at a time. Because the encoder already arranged the frames according to the I, B, P dependencies, the received data can be decoded directly. The decoding order is therefore:
I P B B P B B
DTS: 1 2 3 4 5 6 7
PTS: 1 4 2 3 7 5 6
The PTS values of the decoded frames are not in order. To display the video correctly, the decoded frames must be rearranged by PTS:
I B B P B B P
DTS: 1 3 4 2 6 7 5
PTS: 1 2 3 4 5 6 7
Also, B frames are not mandatory. Real-time interactive live streaming systems rarely use them, mainly because compressing and decoding B frames needs bidirectional references, so more data must be buffered and CPU usage is higher. Given the real-time requirement, B frames are generally not used there. For a player, however, H264 data containing B frames is common. Without B frames, frames are stored in the same order as they are displayed, and the PTS and DTS values are the same.
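The reordering in this section can be reproduced with a short sketch. It assumes the simple pattern used in the example: each run of B frames is coded right after the next I or P frame in display order, and timestamps are just frame counters starting at 1. The function names are hypothetical.

```python
def to_decode_order(display_types):
    """Take frame types in display order and return (dts, pts, type)
    tuples in decoding order: each I or P frame is moved in front of
    the B frames that precede it in display order."""
    ordered = []      # (pts, type) in decoding order
    pending_b = []    # B frames waiting for their following reference
    for pts, ftype in enumerate(display_types, start=1):
        if ftype == "B":
            pending_b.append((pts, ftype))
        else:
            ordered.append((pts, ftype))
            ordered.extend(pending_b)
            pending_b.clear()
    ordered.extend(pending_b)   # trailing B frames with no later reference
    return [(dts, pts, ftype)
            for dts, (pts, ftype) in enumerate(ordered, start=1)]


def to_display_order(decoded):
    """Player side: sort the decoded frames by PTS before showing them."""
    return sorted(decoded, key=lambda frame: frame[1])


stream = to_decode_order("IBBPBBP")
shown = to_display_order(stream)
```

For the input I B B P B B P, `stream` carries the PTS values 1 4 2 3 7 5 6 in decoding order, and `shown` lists the DTS values 1 3 4 2 6 7 5 in display order, the same numbers as in the tables above. For a stream without B frames the two orders coincide and DTS equals PTS for every frame.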