
An example analysis of MP4 file format parsing

2022-07-07 23:43:00 Tinghua_ M



I、Preface

I had worked with the MP4 file format before, but never sat down to study it carefully. This article summarizes and records what I learned, filling in gaps along the way. To begin with, MP4 is also known as MPEG-4 Part 14. It is a slight extension of the ISO base media file format from MPEG-4 Part 12 and is defined in the standard ISO/IEC 14496-14. It is a standard digital multimedia container format, widely used to package video and audio streams, posters, subtitles, metadata, and so on. The currently popular video coding format AVC/H.264 is defined in MPEG-4 Part 10.


II、MP4 Format Overview

1、Basic structure

The data of an MP4 file is encapsulated in units called boxes. The box is the most basic unit in MP4: a file is a sequence of self-contained boxes placed one after another. Every box is divided into a header and data. The header contains the box type and size; the data contains child boxes or payload data. An MP4 file has exactly one “ftyp” box, which marks the file as MP4 and carries some information about the file. It also has exactly one “moov” box (Movie Box), a container box whose child boxes hold the media metadata. The media data itself is stored in “mdat” boxes (Media Data Box). An “mdat” box holds raw media data rather than child boxes; a file may have several of them or none at all (when all media data is referenced from other files). The structure of the media data is described by the metadata.
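The header/data layout described above can be walked with a few lines of Python. This is a minimal sketch, not a full parser: `iter_boxes` is a hypothetical helper name, the input is a hand-built buffer rather than a real file, and the handling of size values 1 (a 64-bit size follows the type) and 0 (box runs to the end of its container) follows the ISO base media file format rather than anything stated in this article.

```python
import struct

def iter_boxes(data, start=0, end=None):
    """Yield (type, payload_start, box_end) for each box in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        # Box header: 32-bit big-endian size, then a 4-character type.
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:
            # A 64-bit size follows the type field.
            (size,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16
        elif size == 0:
            # The box extends to the end of the enclosing container.
            size = end - pos
        if size < header or pos + size > end:
            raise ValueError(f"bad box size {size} at offset {pos}")
        yield box_type.decode("latin-1"), pos + header, pos + size
        pos += size

# Hand-built buffer: an empty "free" box, then a "moov" wrapping an empty "mvhd".
sample = (
    struct.pack(">I4s", 8, b"free")
    + struct.pack(">I4s", 16, b"moov")
    + struct.pack(">I4s", 8, b"mvhd")
)
top_level = [t for t, _, _ in iter_boxes(sample)]
moov_children = [
    child
    for t, s, e in iter_boxes(sample)
    if t == "moov"
    for child, _, _ in iter_boxes(sample, s, e)
]
print(top_level, moov_children)
```

Because a container box is just a box whose data is more boxes, the same function is reused on the payload range of “moov” to list its children.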


2、Overall structure

The figure below shows the basic structure of a typical MP4 file:

Boxes come in different types with different data structures, and a box can contain other boxes. See the following table for the box types (an asterisk * marks a box that must be present whenever its parent box exists):


III、Worked example

First, here are the mediainfo screenshots of the MP4 file we are going to analyze:

And a diagram of its overall structure:

Note the hierarchy of this file; it makes the analysis that follows easier to understand.


1、File Type Box(ftyp)

The File Type Box is usually at the beginning of the file and describes the file's version, the specifications it is compatible with, and so on. There is exactly one such box; it may appear only at the file level and cannot be contained in another box. ftyp acts as the MP4 file's opening declaration: it tells the demuxer the basic version to decode against and the compatible formats. In short, it tells the client which specification this MP4 follows, which is why ftyp sits at the start of the file.

Its format is:

aligned(8) class FileTypeBox
   extends Box('ftyp') {
   unsigned int(32) major_brand;
   unsigned int(32) minor_version;
   unsigned int(32) compatible_brands[]; // to the end of the box
}

The fields above are all placed in the data part of the box (see the box description earlier).

  • major_brand: Compatibility can broadly be divided into recommended and default compatibility; major_brand corresponds to the recommended one. In general the catch-all brand isom is fine. If you need a specific format, you can specify it here.
  • minor_version: An informative minor version number of the major brand.
  • compatible_brands: Similar to major_brand. It usually lists the additional specifications the MP4 file is compatible with, for example brands tied to audio and video coding formats such as AVC and AAC.

  As mentioned above, every box starts with a header. The first 4 bytes, 00 00 00 20, are the box size (32 bytes). The next 4 bytes, 66 74 79 70, are the box type, namely ftyp. The next 4 bytes, 69 73 6F 6D, are the major brand, which indicates the specification the file follows; here it is isom, i.e. the ISO Base Media File Format. The next 4 bytes, 00 00 02 00, are minor_version. The remaining 16 bytes are the compatible brands, i.e. the other specifications this file is compatible with; here they are isom, iso2, avc1, and mp41.
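The byte walk-through above can be reproduced in code. The sketch below feeds exactly those 32 bytes to a small parser; `parse_ftyp` is a hypothetical helper name, and the code assumes a plain 32-bit box size.

```python
import struct

# The 32 bytes discussed above: size, 'ftyp', major brand, minor version, 4 brands.
ftyp_box = bytes.fromhex(
    "00000020" "66747970" "69736f6d" "00000200"
    "69736f6d" "69736f32" "61766331" "6d703431"
)

def parse_ftyp(box):
    """Return (size, major_brand, minor_version, compatible_brands)."""
    size, box_type = struct.unpack_from(">I4s", box, 0)
    if box_type != b"ftyp":
        raise ValueError("not an ftyp box")
    major_brand, minor_version = struct.unpack_from(">4sI", box, 8)
    # Everything after the first 16 bytes is a list of 4-byte brand codes.
    brands = [box[i:i + 4].decode("ascii") for i in range(16, size, 4)]
    return size, major_brand.decode("ascii"), minor_version, brands

size, major, minor, brands = parse_ftyp(ftyp_box)
print(size, major, hex(minor), brands)
```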


2、Movie Box(moov)

This box contains the file's media metadata. “moov” is a container box whose actual content is defined by its child boxes. Like the ftyp box, there is exactly one of it and it appears only at the file level; in general moov follows ftyp. The moov box can be called the most important box in an MP4 file: a typical player has to read the moov data before it can start playback.

In general, “moov” contains one “mvhd” and several “trak” boxes. “mvhd” is the header box and usually appears as the first child of “moov” (for other container boxes as well, the header box should come first). Each “trak” holds the information for one track and is itself a container box.

2.1 Movie Header Box(mvhd)

The Movie Header Box records information describing the whole media file, such as the creation time, modification time, time scale, and playable duration.

Field | Bytes | Meaning
box size | 4 | Box size
box type | 4 | Box type
version | 1 | Box version, 0 or 1; usually 0 (the byte counts below assume version=0)
flags | 3 |
creation time | 4 | Creation time (seconds since 1904-01-01 00:00 UTC)
modification time | 4 | Modification time
time scale | 4 | Number of time units in one second for the file's media
duration | 4 | Duration in time scale units. The length in seconds is duration / time scale. For example, an audio track with time scale = 8000 and duration = 560128 lasts 70.016 s; a video track with time scale = 600 and duration = 42000 lasts 70 s
rate | 4 | Recommended playback rate. The upper 16 bits are the integer part and the lower 16 bits the fractional part ([16.16] format); 1.0 (0x00010000) means normal forward playback
volume | 2 | Like rate, but in [8.8] format; 1.0 (0x0100) means full volume
reserved | 10 | Reserved
matrix | 36 | Transformation matrix for the video. The default value is { 0x00010000,0,0,0,0x00010000,0,0,0,0x40000000 }
pre-defined | 24 |
next track id | 4 | ID to be used by the next track
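The arithmetic behind several of these fields is easy to check by hand. The following sketch uses hypothetical helper names and assumes unsigned fixed-point values; it converts a 1904-based timestamp, decodes the [16.16] and [8.8] formats, and recomputes the two durations quoted in the table.

```python
from datetime import datetime, timedelta, timezone

# MP4 timestamps count seconds from 1904-01-01 00:00 UTC.
MP4_EPOCH = datetime(1904, 1, 1, tzinfo=timezone.utc)

def mp4_time(seconds):
    """Convert a creation/modification time field to a datetime."""
    return MP4_EPOCH + timedelta(seconds=seconds)

def fixed_16_16(value):
    """Decode a [16.16] fixed-point field such as rate."""
    return value / 65536

def fixed_8_8(value):
    """Decode an [8.8] fixed-point field such as volume."""
    return value / 256

def duration_seconds(duration, time_scale):
    """Duration is expressed in time-scale units per second."""
    return duration / time_scale

audio_seconds = duration_seconds(560128, 8000)
video_seconds = duration_seconds(42000, 600)
rate = fixed_16_16(0x00010000)
volume = fixed_8_8(0x0100)
print(audio_seconds, video_seconds, rate, volume)
```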

As the figure below shows, the MP4 file being analyzed is 130838 ms long.


3、Video/AudioTrack Box

Although the track sits under moov, it is important and contains a lot, so it is introduced separately here.

“trak” is a container box whose child boxes hold the references to and descriptions of the track's media data (hint tracks are an exception: a hint track contains packetization information for streaming protocols). An MP4 file can contain more than one track and must contain at least one. The tracks are independent of each other, each with its own temporal and spatial information. “trak” must contain one “tkhd” and one “mdia”, plus a number of optional boxes (omitted here). “tkhd” is the track header box; “mdia” is the media box, a container box whose children describe the track's media data.


3.1 Track Header Box(tkhd)

tkhd, the Track Header Box, contains the track's creation time, the ID that identifies the track, the track's playback duration, its volume, and its width and height. The structure of “tkhd” is as follows:

Field | Bytes | Meaning
box size | 4 | Box size
box type | 4 | Box type
version | 1 | Box version, 0 or 1; usually 0 (the byte counts below assume version=0)
flags | 3 | Bitwise OR of the following predefined values: 0x000001 track_enabled (otherwise the track is not played); 0x000002 track_in_movie (the track is used in playback); 0x000004 track_in_preview (the track is used in the preview). The value is usually 7. If none of a media's tracks has track_in_movie or track_in_preview set, all tracks are treated as having both set. For a hint track the value is 0
creation time | 4 | Creation time (seconds since 1904-01-01 00:00 UTC)
modification time | 4 | Modification time
track id | 4 | Track ID; must be unique and must not be 0
reserved | 4 | Reserved
duration | 4 | Duration of the track
reserved | 8 | Reserved
layer | 2 | Video layer; the default is 0, and smaller values are on top
alternate group | 2 | Track grouping information; the default 0 means the track is not grouped with any other track
volume | 2 | [8.8] format; 1.0 (0x0100) means full volume for an audio track, otherwise 0
reserved | 2 | Reserved
matrix | 36 | Video transformation matrix
width | 4 | Width
height | 4 | Height. Width and height are both [16.16] values; relative to the actual picture size in the sample description, they give the size at which the track is displayed during playback
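As an illustration of the flags and [16.16] fields in the table, the sketch below decodes a flags value of 7 and the raw width/height values that would correspond to a 640x360 track. The constant and function names are hypothetical, and the raw values are derived from 640 and 360 rather than read from the file.

```python
TRACK_ENABLED = 0x000001
TRACK_IN_MOVIE = 0x000002
TRACK_IN_PREVIEW = 0x000004

def describe_tkhd_flags(flags):
    """Split the 24-bit tkhd flags field into its three predefined bits."""
    return {
        "enabled": bool(flags & TRACK_ENABLED),
        "in_movie": bool(flags & TRACK_IN_MOVIE),
        "in_preview": bool(flags & TRACK_IN_PREVIEW),
    }

def fixed_16_16(value):
    """Decode a [16.16] fixed-point field such as width or height."""
    return value / 65536

flags = describe_tkhd_flags(7)
hint_flags = describe_tkhd_flags(0)
width = fixed_16_16(0x02800000)
height = fixed_16_16(0x01680000)
print(flags, width, height)
```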

The current track is 130806 ms long, 640 wide, and 360 high.


3.2 Media Box

“mdia” is also a container box, and the structure and types of its child boxes are fairly complex. Let's look at the structure tree of an “mdia” instance.

Overall, “mdia” defines the track's media type and sample data and describes the sample information. It usually contains one “mdhd”, one “hdlr”, and one “minf”, where “mdhd” is the media header box, “hdlr” is the handler reference box, and “minf” is the media information box. Let's look at the structure of each in turn.


3.2.1 media header box (mdhd)

mdhd, the Media Header Box, stores the creation time, duration, and related information of the media stream.

The structure of “mdhd” is as follows:

Field | Bytes | Meaning
box size | 4 | Box size
box type | 4 | Box type
version | 1 | Box version, 0 or 1; usually 0 (the byte counts below assume version=0)
flags | 3 |
creation time | 4 | Creation time (seconds since 1904-01-01 00:00 UTC)
modification time | 4 | Modification time
time scale | 4 | Same as in the mvhd table
duration | 4 | Duration of the track
language | 2 | Media language code. The highest bit is 0; the remaining 15 bits encode 3 characters (see the ISO 639-2/T standard)
pre-defined | 2 |
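The packing of the language field deserves a small example. The table only says that 15 bits hold 3 characters; the sketch below additionally assumes the usual ISO base media convention that each character is stored in 5 bits as its ASCII code minus 0x60. The function names are hypothetical.

```python
def decode_language(code):
    """Unpack a 16-bit mdhd language field into a 3-letter ISO 639-2/T code."""
    # Three 5-bit groups, most significant first; each is (ASCII - 0x60).
    chars = [((code >> shift) & 0x1F) + 0x60 for shift in (10, 5, 0)]
    return bytes(chars).decode("ascii")

def encode_language(lang):
    """Pack a 3-letter lowercase code into the 16-bit field (top bit stays 0)."""
    a, b, c = (ord(ch) - 0x60 for ch in lang)
    return (a << 10) | (b << 5) | c

undetermined = decode_language(0x55C4)
english = encode_language("eng")
print(undetermined, hex(english))
```

Here 0x55C4 decodes to "und" (undetermined), a value commonly seen when no language was set.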

  The example information is as follows :


3.2.2 handler reference box(hdlr)

hdlr describes how the track's data is processed; the corresponding types include video track, audio track, and hint track. This box can also be contained in a meta box (meta). The structure of “hdlr” is as follows.

Field | Bytes | Meaning
box size | 4 | Box size
box type | 4 | Box type
version | 1 | Box version, 0 or 1; usually 0 (the byte counts below assume version=0)
flags | 3 |
pre-defined | 4 |
handler type | 4 | Inside a media box, a 4-character value: “vide” (video track), “soun” (audio track), “hint” (hint track)
reserved | 12 |
name | variable | Track type name, a null-terminated ('\0') string

The example information is as follows :

Two fields need additional explanation:

  • handler_type: the specific processing type of the trak, i.e. one of the vide, soun, and hint values listed above.
  • name: a free-form name. It is meant for people rather than machines, so you can fill in anything that makes the purpose clear.

The value stored in handler_type is simply the string converted to its hexadecimal byte values. For example:

  • vide is 0x76, 0x69, 0x64, 0x65
  • soun is 0x73, 0x6F, 0x75, 0x6E
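The string-to-hex conversion shown in the list above can be checked in a couple of lines; `fourcc_to_int` and `int_to_fourcc` are hypothetical helper names.

```python
def fourcc_to_int(code):
    """Interpret a 4-character code as a big-endian 32-bit integer."""
    return int.from_bytes(code.encode("ascii"), "big")

def int_to_fourcc(value):
    """Turn a 32-bit integer back into its 4-character code."""
    return value.to_bytes(4, "big").decode("ascii")

vide_bytes = [hex(b) for b in b"vide"]
print(vide_bytes, hex(fourcc_to_int("soun")), int_to_fourcc(0x68696E74))
```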

3.2.3 Media Information Box(minf)

“minf” stores the handler-specific information for interpreting the track's media data; the media handler uses this information to map media time to media data and process it. The format and content of the information in “minf” are tied to the media type and to the media handler that interprets the data; other media handlers do not know how to interpret it. “minf” is a container box whose actual content is defined by its child boxes.

In general, “minf” contains one header box, one “dinf”, and one “stbl”. The header box depends on the track type (i.e. the media handler type) and is one of “vmhd”, “smhd”, “hmhd”, or “nmhd”; “dinf” is the data information box and “stbl” is the sample table box. They are introduced below.

3.2.3.1 Video Media Header Box(vmhd) 

vmhd and smhd describe the current trak: vmhd is for video and smhd is for audio. Neither box is strictly indispensable for decoding (this sometimes depends on the player), but if one is missing the file may be considered malformed.

The structure of vmhd is defined in the following table:

Field | Bytes | Meaning
box size | 4 | Box size
box type | 4 | Box type
version | 1 | Box version, 0 or 1; usually 0 (the byte counts below assume version=0)
flags | 3 |
graphics mode | 2 | Video composition mode; 0 means copy the original image, otherwise compose using opcolor
opcolor | 2×3 | {red, green, blue}

Example diagram : 

The structure of smhd is defined as follows:

Field | Bytes | Meaning
box size | 4 | Box size
box type | 4 | Box type
version | 1 | Box version, 0 or 1; usually 0 (the byte counts below assume version=0)
flags | 3 |
balance | 2 | Stereo balance, an [8.8] value; usually 0. -1.0 means all left channel, 1.0 means all right channel
reserved | 2 |

Example diagram :

3.2.3.2 Data Information Box(dinf)

“dinf” explains how to locate the media data; it is a container box. It usually contains one “dref”, the data reference box, which in turn holds a number of “url” or “urn” boxes. These form a table used to locate the track data. Put simply, a track can be split into several segments, each of which can be fetched from the address its “url” or “urn” points to; the sample description refers to these segments by index so that together they form the complete track. In general, when the data is entirely contained in the file, the location string in the “url” or “urn” is empty.

The byte structure of “dref” is shown in the following table:

Field | Bytes | Meaning
box size | 4 | Box size
box type | 4 | Box type
version | 1 | Box version, 0 or 1; usually 0 (the byte counts below assume version=0)
flags | 3 |
entry count | 4 | Number of elements in the “url” or “urn” table
“url” or “urn” list | variable |

“url” and “urn” are both boxes. The content of “url” is one string (the location string); the content of “urn” is a pair of strings (the name string and the location string). When the box flag of a “url” or “urn” is 1, the strings are empty.

Below is a byte-level example of “dinf”. The yellow box is the box header of “dinf”. The part in the red box tells us that the number of “url” or “urn” entries is 1, and what follows it is the content of the “url” box. The purple box is the box header of the “url” (the box type tells us it is a “url”). The green box is the box flag; its value is 1, which means the string in the “url” is empty and the track data is contained in this file.
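The layout just described (a “dref” holding one “url” entry with flag 1) can be rebuilt and parsed in a few lines. This is a sketch on synthetic bytes, not the bytes from the figure; `parse_dref` is a hypothetical helper, and the dictionary keys are chosen for illustration only.

```python
import struct

def parse_dref(box):
    """Return a list of entries found in a version-0 dref box."""
    size, box_type, version_flags, entry_count = struct.unpack_from(">I4sII", box, 0)
    if box_type != b"dref":
        raise ValueError("not a dref box")
    entries = []
    pos = 16  # 8-byte header + version/flags + entry count
    for _ in range(entry_count):
        e_size, e_type, e_version_flags = struct.unpack_from(">I4sI", box, pos)
        flags = e_version_flags & 0xFFFFFF
        entries.append({
            "type": e_type.decode("ascii"),
            "self_contained": bool(flags & 1),  # flag 1: data is in this file
            "payload": box[pos + 12:pos + e_size],
        })
        pos += e_size
    return entries

# Synthetic dref: one "url " entry whose flag is 1 and whose string is empty.
dref = struct.pack(">I4sII", 28, b"dref", 0, 1) + struct.pack(">I4sI", 12, b"url ", 1)
entries = parse_dref(dref)
print(entries)
```

Note that the box type of a URL entry is the four characters "url " with a trailing space.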

3.2.3.1 Sample Table Box(stbl)

The Sample Table Box (stbl) is the most important part of the mdia described above: it stores the information for every sample. Before parsing stbl, we need to distinguish two concepts, chunk and sample.

In an MP4 file, a sample is the basic unit of a media stream; for example, one sample of a video stream represents the actual NAL data. A chunk is the basic unit of data storage: a run of samples stored together. One chunk can contain one or more samples.

“stbl” contains all the time and position information of the track's samples, as well as their codec information. With this table, one can determine each sample's timing, type, size, and position within its storage container. “stbl” is a container box whose child boxes include: sample description box (stsd), time to sample box (stts), sample size box (stsz or stz2), sample to chunk box (stsc), chunk offset box (stco or co64), composition time to sample box (ctts), sync sample box (stss), and others.
“stsd” is mandatory and contains at least one entry. It holds the information that the data reference box uses to retrieve sample data; without “stsd” the storage location of a media sample cannot be computed. “stsd” also contains the coding information, and what it stores varies with the media type.


3.2.3.1.1  Sample Description Box(stsd)

stsd, the Sample Description Box, mainly contains the details of the sample data, including the coding type and the initialization data required for decoding. After the box header and version fields comes an entry count field, followed by that many entries. Each entry carries type information, and the sample description provides different information depending on the track type (such as “vide” or “soun”): for a video track there is “VisualSampleEntry” information, and for an audio track there is “AudioSampleEntry” information.
The video's coding type, width, height, and length, and the audio's channels, sampling rate, and similar information all appear in this box.

The stsd box of the video track:

As seen above, the avcC entry contains the video's SPS and PPS, which are needed to decode the video. SPS and PPS are the parameter sets (metadata) of the H.264 stream; in an MP4 file they are stored separately in avcC. When converting to a raw H.264 stream, the SPS and PPS must be extracted, prefixed with the start code 0x00000001, and placed at the beginning of the H.264 stream. For H.265, this metadata lives in a box of type hvcC.
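The conversion step described above (extract SPS and PPS, prefix each with 0x00000001) can be sketched as follows. The field layout assumed here is the usual AVC decoder configuration record (five fixed bytes, then length-prefixed SPS and PPS lists), which this article does not spell out. The function names are hypothetical and the parameter-set bytes are placeholders, not a decodable SPS or PPS.

```python
import struct

START_CODE = b"\x00\x00\x00\x01"

def avcc_parameter_sets(avcc):
    """Return (nal_length_size, [sps, ...], [pps, ...]) from an avcC payload."""
    # Byte 4: low 2 bits hold (NAL length field size - 1).
    nal_length_size = (avcc[4] & 0x03) + 1
    pos = 5
    sps_count = avcc[pos] & 0x1F  # low 5 bits
    pos += 1
    sps_list = []
    for _ in range(sps_count):
        (length,) = struct.unpack_from(">H", avcc, pos)
        sps_list.append(avcc[pos + 2:pos + 2 + length])
        pos += 2 + length
    pps_count = avcc[pos]
    pos += 1
    pps_list = []
    for _ in range(pps_count):
        (length,) = struct.unpack_from(">H", avcc, pos)
        pps_list.append(avcc[pos + 2:pos + 2 + length])
        pos += 2 + length
    return nal_length_size, sps_list, pps_list

def annexb_header(avcc):
    """Concatenate every SPS and PPS, each prefixed with the start code."""
    _, sps_list, pps_list = avcc_parameter_sets(avcc)
    return b"".join(START_CODE + nal for nal in sps_list + pps_list)

# Synthetic payload: one 3-byte placeholder SPS and one 2-byte placeholder PPS.
avcc = (
    bytes([1, 0x42, 0x00, 0x1E, 0xFF, 0xE1])
    + struct.pack(">H", 3) + b"\x67\x42\x00"
    + bytes([1])
    + struct.pack(">H", 2) + b"\x68\xCE"
)
header = annexb_header(avcc)
print(header.hex())
```

The returned `nal_length_size` is what a converter would also need for the samples themselves, since each NAL unit in the media data is length-prefixed rather than start-code-prefixed.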

3.2.3.1.2 Time To Sample Box(stts)

“stts” is the table that maps timestamps to sample numbers. It stores the duration of each sample and describes how samples are laid out on the timeline, so that the sample at any given time can be found. “stts” holds a compact table mapping time to sample numbers; other tables provide each sample's size and position. Each entry gives the number of consecutive samples that share the same duration, together with that duration (the delta). Accumulating these deltas builds the complete time-to-sample table.
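Accumulating the deltas can be written out directly. In this sketch each stts entry is assumed to be a (sample_count, sample_delta) pair, the function names are hypothetical, and the entries are made-up numbers rather than values from the example file.

```python
def decode_times(stts_entries):
    """Expand (sample_count, sample_delta) runs into per-sample decode times."""
    times = []
    t = 0
    for count, delta in stts_entries:
        for _ in range(count):
            times.append(t)
            t += delta
    return times

def sample_at(stts_entries, target):
    """Return the 1-based number of the sample that covers media time target."""
    t = 0
    number = 1
    for count, delta in stts_entries:
        run = count * delta
        if target < t + run:
            return number + (target - t) // delta
        t += run
        number += count
    raise ValueError("time is past the end of the track")

entries = [(3, 1000), (2, 500)]  # illustrative values only
times = decode_times(entries)
print(times, sample_at(entries, 2500), sample_at(entries, 3500))
```

The lookup walks the runs without expanding them, which is why the compact form is enough to seek by time.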

3.2.3.1.3 Sync Sample Box(stss)

“stss” identifies the key frames in the media. For compressed media data, a key frame is the first frame of a compressed sequence: it can be decoded without any earlier frame, and the frames after it depend on it. “stss” marks the random access points in the media very compactly. It contains a table of sample numbers, strictly in increasing order, that says which samples in the media are key frames. If this table does not exist, every sample is a key frame and therefore a random access point.

The following shows the key frame distribution of the video. You can check the key frames in the bitstream with a stream analyzer such as VideoEye; they match.
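Because the sample numbers in “stss” are sorted, finding the key frame to start decoding from when seeking is a binary search. A minimal sketch, with a hypothetical function name and made-up key frame numbers; `None` stands for a track that has no stss box at all.

```python
from bisect import bisect_right

def nearest_sync_sample(sync_samples, sample_number):
    """Return the last key frame at or before sample_number (1-based)."""
    if sync_samples is None:
        # No stss box: every sample is a random access point.
        return sample_number
    i = bisect_right(sync_samples, sample_number)
    if i == 0:
        raise ValueError("no sync sample at or before this sample")
    return sync_samples[i - 1]

key_frames = [1, 31, 61]  # illustrative values only
print(nearest_sync_sample(key_frames, 45))
```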

3.2.3.1.4 Sample To Chunk Box(stsc)

stsc, the Sample To Chunk Box, contains the mapping between samples and chunks. Organizing samples into chunks makes data access easier to optimize; one chunk contains one or more samples. “stsc” uses a table to describe the mapping between samples and chunks: by looking at this table you can find the chunk that contains a given sample, and from there the sample itself.

Here the entry count is 5, and the stco information shows that the video track has 3876 chunks. The stsc entry table therefore says: chunks 1 through 589 each hold 1 sample, chunk 590 holds 2 samples, chunks 591 through 789 each hold 1 sample, chunk 790 holds 2 samples, and chunks 791 through the last chunk (3876) each hold 1 sample. The total is (589 * 1) + (1 * 2) + (199 * 1) + (1 * 2) + (3876 - 791 + 1) * 1 = 3878 samples, which matches the sample count shown in the stsz example.
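The sample count above can be recomputed by expanding the run-length table. The sketch assumes each stsc entry is reduced to a (first_chunk, samples_per_chunk) pair (the sample description index that a real entry also carries is left out), with the 2-sample chunks at 590 and 790 as implied by the sum; `samples_per_chunk` is a hypothetical helper name.

```python
def samples_per_chunk(stsc_entries, chunk_count):
    """Expand (first_chunk, samples_per_chunk) runs into one value per chunk."""
    counts = []
    for i, (first_chunk, per_chunk) in enumerate(stsc_entries):
        # A run lasts until the next entry's first chunk, or the last chunk.
        if i + 1 < len(stsc_entries):
            next_first = stsc_entries[i + 1][0]
        else:
            next_first = chunk_count + 1
        counts.extend([per_chunk] * (next_first - first_chunk))
    return counts

entries = [(1, 1), (590, 2), (591, 1), (790, 2), (791, 1)]
counts = samples_per_chunk(entries, 3876)
total = sum(counts)
print(len(counts), total)
```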

3.2.3.1.5 Sample Size Box(stsz)

stsz, the Sample Size Box, records the size of each sample.

3.2.3.1.6  Chunk Offset Box(stco)

stco, the Chunk Offset Box, stores the offset of each chunk, measured from the start of the file. One consequence is that modifying any box located before the mdat box changes the chunk offsets, so the records here must be updated accordingly. If the file is so large that a 32-bit offset is not enough, the co64 box is used instead, which stores 64-bit offsets.

In an MP4 file, the chunk, not the sample, is the smallest unit whose position is recorded directly. A chunk can contain one or more samples; this is done to make data I/O more efficient.
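Since only chunk offsets are stored, the position of an individual sample has to be derived by combining stco, stsc, and stsz: start at the chunk offset and add the sizes of the samples that precede it in the same chunk. The sketch below assumes the per-chunk sample counts have already been expanded from stsc; the function name and all numbers are illustrative.

```python
def sample_offsets(chunk_offsets, per_chunk_counts, sample_sizes):
    """Return the file offset of every sample, in sample order."""
    offsets = []
    sample = 0
    for chunk_offset, count in zip(chunk_offsets, per_chunk_counts):
        pos = chunk_offset
        for _ in range(count):
            offsets.append(pos)
            # Samples inside a chunk are stored back to back.
            pos += sample_sizes[sample]
            sample += 1
    return offsets

offsets = sample_offsets(
    chunk_offsets=[1000, 5000],    # from stco
    per_chunk_counts=[2, 1],       # expanded from stsc
    sample_sizes=[100, 200, 50],   # from stsz
)
print(offsets)
```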

4、Free Space Box(free)

The content of “free” is irrelevant and can be ignored. Deleting this box has no effect on playback.

5、Media Data Box(mdat)

This box lives at the file level. There can be several of them or none at all (when all media data is referenced from external files), and it stores the media data. The data follows directly after the box type field; interpreting its structure requires the metadata (mainly the description in the sample table).
