Mp4 format details
2022-07-06 16:41:00 【Dog egg L】
MP4 overview
All data in an MP4 file is stored in boxes (called atoms in QuickTime). In other words, an MP4 file is made up of a number of boxes, each with a type and a length; a box can be thought of as a block of data. A box can contain other boxes, in which case it is called a container box. An MP4 file has exactly one "ftyp" box, which identifies the file format and carries some information about the file, and exactly one "moov" box (Movie Box), a container box whose child boxes hold the media metadata. The media data itself is stored in "mdat" boxes (Media Data Box). There can be several of them, or none at all (when all media data is referenced from other files). The structure of the media data is described by the metadata.
Here are some concepts:
- track: a collection of related samples. For media data, a track represents a video or audio sequence.
- hint track: a special track that contains no media data. Instead, it contains instructions for packaging other tracks into streaming media.
- sample: for non-hint tracks, a video sample is one video frame or a group of consecutive video frames, and an audio sample is a contiguous section of compressed audio. For hint tracks, a sample defines the format of one or more streaming packets.
- sample table: a table that specifies the timing and physical layout of the samples.
- chunk: a unit made up of several samples of one track.
Basic concepts
- File: made up of many Boxes and FullBoxes.
- Box: each Box consists of a Header and Data.
- FullBox: an extension of Box that adds an 8-bit version and 24-bit flags to the Header.
- Header: contains the length (size) and type (type) of the whole Box. When size == 0, this is the last Box in the file. When size == 1, the length needs more bits, and a 64-bit largesize field holds it. When type is uuid, the data in the Box is a user-defined extension type.
- Data: the actual payload of the Box, either raw data or more child Boxes.
- When the Data of a Box is a series of child Boxes, the Box is also called a Container Box.
The structure is as follows:
MP4 file format overview
An MP4 file is made up of multiple boxes. Each box stores different information, and the boxes form a tree structure, as shown in the figure below.
There are many box types. These are the three most important top-level boxes:
- ftyp: File Type Box, describes the specification and version the MP4 file conforms to;
- moov: Movie Box, holds the media metadata; there is exactly one;
- mdat: Media Data Box, stores the actual media data; there may be more than one.
Although there are many box types, the basic structure is the same. The next section starts with the box structure and then explains the common boxes in more detail.
The following table lists common boxes. A quick look to get a general impression is enough.
MP4 Box introduction
A box is made up of two parts: the box header and the box body.
- box header: metadata of the box, such as box type and box size.
- box body: the data part of the box. What is actually stored depends on the box type; for example, the body of mdat stores media data.
In the box header, only type and size are required fields. When size==1, a largesize field is present. Some boxes also have version and flags fields; such a box is called a Full Box. When the box body nests other boxes, the box is called a container box.
Box Header
The fields are defined as follows:
- type: the box type, 4 bytes. It is either a predefined type or a custom extension type:
- Predefined types: ftyp, moov, mdat, and so on;
- Custom extension types: if type==uuid, the box is a custom extension type. The 16 bytes that follow the type field (or largesize, when present) hold the value of the custom type (extended_type);
- size: the size of the whole box including the box header, in bytes. The values 0 and 1 require special treatment:
- size equal to 1: the box size is given by the largesize field that follows (generally only the mdat box, which holds media data, uses largesize);
- size equal to 0: the current box is the last box in the file and extends to the end of the file; this usually happens with the mdat box;
- largesize: the box size, 8 bytes;
- extended_type: the custom extension type, 16 bytes.
The Box pseudo-code is as follows:
aligned(8) class Box (unsigned int(32) boxtype, optional unsigned int(8)[16] extended_type) {
unsigned int(32) size;
unsigned int(32) type = boxtype;
if (size==1) {
unsigned int(64) largesize;
} else if (size==0) {
// box extends to end of file
}
if (boxtype==‘uuid’) {
unsigned int(8)[16] usertype = extended_type;
}
}
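The header rules above can be sketched in Python. This is a minimal illustration rather than a complete parser; the function name `parse_box_header` and the hand-built `sample` bytes are assumptions made for the example.

```python
import struct

def parse_box_header(buf, offset=0):
    """Return (box_type, box_size, header_size) for the box at `offset`.

    A returned box_size of 0 means the box extends to the end of the file.
    """
    size, box_type = struct.unpack_from(">I4s", buf, offset)
    header_size = 8
    if size == 1:
        # A 64-bit largesize follows the type field.
        (size,) = struct.unpack_from(">Q", buf, offset + 8)
        header_size = 16
    if box_type == b"uuid":
        # A 16-byte extended type follows size/type (and largesize, if any).
        header_size += 16
    return box_type.decode("latin-1"), size, header_size

# Hypothetical hand-built box: 8-byte header plus 8 bytes of payload.
sample = struct.pack(">I4s", 16, b"free") + bytes(8)
print(parse_box_header(sample))  # ('free', 16, 8)
```

The body of the box then starts at `offset + header_size` and ends at `offset + box_size`.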
Box Body
The box body holds the box data. Its content differs from box to box, so refer to the definition of the specific box. Some box bodies are simple, such as ftyp; others are more complex and may nest other boxes, such as moov.
Box vs FullBox
FullBox is an extension of Box. Compared with Box, FullBox adds the version and flags fields.
- version: the version of the current box, reserved for future extension, 1 byte;
- flags: flag bits, 24 bits; their meaning is defined by each specific box.
The FullBox pseudocode is as follows:
aligned(8) class FullBox(unsigned int(32) boxtype, unsigned int(8) v, bit(24) f) extends Box(boxtype) {
unsigned int(8) version = v;
bit(24) flags = f;
}
FullBox is mainly used by boxes inside moov, such as moov.mvhd, which is introduced later.
aligned(8) class MovieHeaderBox extends FullBox(‘mvhd’, version, 0) {
// The fields are omitted ...
}
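As a small sketch of how the two extra FullBox fields can be read, the snippet below splits the first four body bytes into version and flags. The helper name `parse_fullbox_prefix` is an assumption for illustration.

```python
import struct

def parse_fullbox_prefix(body):
    """Split the first 4 body bytes of a FullBox into (version, flags)."""
    (word,) = struct.unpack_from(">I", body, 0)
    version = word >> 24        # top 8 bits
    flags = word & 0xFFFFFF     # low 24 bits
    return version, flags

# Hypothetical prefix: version 0, flags 0x000007.
print(parse_fullbox_prefix(bytes([0, 0, 0, 7])))  # (0, 7)
```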
ftyp(File Type Box)
ftyp indicates the specifications that the current file conforms to. Before going into the details of ftyp, let's first introduce isom.
What is isom
isom (ISO Base Media File Format) is a base file format defined in MPEG-4 Part 12. Common container formats such as MP4 and 3gp are derived from this base format, which was itself modeled on Apple's QuickTime (QT) file format.
The specifications an MP4 file may conform to include mp41 and mp42, both of which are derived from isom.
- 3gp (3GPP): a container format mainly used on 3G mobile phones;
- QT: short for QuickTime; a .qt file is an Apple QuickTime media file.
ftyp definition
ftyp is defined as follows:
aligned(8) class FileTypeBox extends Box(‘ftyp’) {
unsigned int(32) major_brand;
unsigned int(32) minor_version;
unsigned int(32) compatible_brands[]; // to end of the box
}
Here is the description of a brand. A brand is the code for a specific container format, written as a 4-byte code such as mp41.
A brand is a four-letter code representing a format or subformat. Each
file has a major brand (or primary brand), and also a compatibility
list of brands.
The meaning of the ftyp fields:
- major_brand: for example isom, mp41, mp42, avc1, or qt. It states the format that is "best" used to parse the current file. For example, if major_brand is A and compatible_brands contains A1, a decoder that supports both A and A1 should preferably decode the file according to A; a decoder that does not support A but supports A1 can decode it according to A1;
- minor_version: informative details about major_brand, such as a version number. It must not be used to determine whether a media file conforms to a given standard;
- compatible_brands: the list of brands the file is compatible with. For example, mp41 lists isom as a compatible brand. A decoder that supports one of the listed brands can decode the file in part (or in full). In practice, isom should not be used as major_brand; a specific brand (such as mp41) is used instead, so no file extension or MIME type is defined for isom.
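The ftyp layout is simple enough to decode by hand. The following sketch assumes `body` holds the bytes after the 8-byte box header; the function name `parse_ftyp_body` and the sample brands are chosen only for illustration.

```python
import struct

def parse_ftyp_body(body):
    """Parse an ftyp body: major_brand, minor_version, compatible_brands[]."""
    major_brand, minor_version = struct.unpack_from(">4sI", body, 0)
    # The remaining bytes are a list of 4-byte brand codes up to the box end.
    compatible = [body[i:i + 4].decode("ascii")
                  for i in range(8, len(body) - 3, 4)]
    return major_brand.decode("ascii"), minor_version, compatible

# Hypothetical body: major brand mp42, minor version 0, three compatible brands.
body = b"mp42" + struct.pack(">I", 0) + b"mp42mp41isom"
print(parse_ftyp_body(body))  # ('mp42', 0, ['mp42', 'mp41', 'isom'])
```

A player following the rule described above would first look for `major_brand` among the brands it supports and fall back to any entry of the compatible list.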
Below are some common brands, with the corresponding file extensions and MIME types.
A screenshot of a real example follows; it is not discussed further.
About AVC/AVC1
When discussing MP4 specifications, AVC sometimes means the "AVC file format" and sometimes the "AVC compression standard (H.264)". Here is a simple distinction.
- AVC file format: derived from the ISO base media file format and carrying video in the AVC compression standard. It can be thought of as an extension of MP4; the corresponding brand is usually avc1, and it is defined in MPEG-4 Part 15.
- AVC compression standard (H.264): defined in MPEG-4 Part 10.
- ISO base media file format (Base Media File Format): defined in MPEG-4 Part 12.
moov(Movie Box)
The Movie Box stores the metadata of the MP4 file and is generally located at the beginning of the file.
aligned(8) class MovieBox extends Box(‘moov’){
}
The two most important boxes in moov are mvhd and trak:
- mvhd: Movie Header Box, the overall information of the MP4 file, such as creation time and duration;
- trak: Track Box. An MP4 file can contain one or more tracks (such as a video track and an audio track), and the information for each track is in a trak. trak is a container box with at least two child boxes, tkhd and mdia.
mvhd applies to the whole movie, tkhd to a single track, mdhd to the media, vmhd to video, and smhd to audio. They go from general to specific, and the values in the former are generally derived from the latter.
mvhd(Movie Header Box)
The overall information of the MP4 file, independent of any specific video or audio stream, such as creation time and duration.
The definition is as follows:
aligned(8) class MovieHeaderBox extends FullBox(‘mvhd’, version, 0) {
if (version==1) {
unsigned int(64) creation_time;
unsigned int(64) modification_time;
unsigned int(32) timescale;
unsigned int(64) duration;
} else {
// version==0
unsigned int(32) creation_time;
unsigned int(32) modification_time;
unsigned int(32) timescale;
unsigned int(32) duration;
}
template int(32) rate = 0x00010000; // typically 1.0
template int(16) volume = 0x0100; // typically, full volume
const bit(16) reserved = 0;
const unsigned int(32)[2] reserved = 0;
template int(32)[9] matrix =
{
0x00010000,0,0,0,0x00010000,0,0,0,0x40000000 };
// Unity matrix
bit(32)[6] pre_defined = 0;
unsigned int(32) next_track_ID;
}
The fields have the following meanings:
- creation_time: file creation time;
- modification_time: file modification time;
- timescale: the number of time units in one second (an integer). For example, if timescale is 1000, one second contains 1000 time units. Later time values such as a track duration are converted with it: a duration of 10,000 corresponds to 10,000/1000 = 10 s;
- duration: the duration of the movie (an integer), derived from the tracks in the file; it equals the duration of the longest track;
- rate: recommended playback rate, a 32-bit integer whose high 16 bits and low 16 bits are the integer and fractional parts ([16.16]). For example, 0x0001 0000 means 1.0, normal playback speed;
- volume: playback volume, a 16-bit integer whose high 8 bits and low 8 bits are the integer and fractional parts ([8.8]). For example, 0x01 00 means 1.0, full volume;
- matrix: video transformation matrix, which can generally be ignored;
- next_track_ID: a nonzero 32-bit integer, which can generally be ignored. It indicates a track ID that can be used when adding a new track to the movie, and it must be larger than any track ID currently in use. In other words, when adding a new track, all existing tracks have to be traversed to confirm an available track ID.
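The timescale conversion and the fixed-point fields can be illustrated with a short sketch. It assumes `body` starts at the FullBox version byte (right after the 8-byte box header); the name `parse_mvhd` and the sample values are hypothetical.

```python
import struct

def parse_mvhd(body):
    """Parse the leading fields of an mvhd body (version 0 or 1)."""
    version = body[0]
    if version == 1:
        _ctime, _mtime, timescale, duration = struct.unpack_from(">QQIQ", body, 4)
        pos = 4 + 28
    else:
        _ctime, _mtime, timescale, duration = struct.unpack_from(">IIII", body, 4)
        pos = 4 + 16
    rate, volume = struct.unpack_from(">ih", body, pos)
    return {
        "timescale": timescale,
        "duration": duration,
        "seconds": duration / timescale,  # convert time units to seconds
        "rate": rate / 65536,             # 16.16 fixed point
        "volume": volume / 256,           # 8.8 fixed point
    }

# Hypothetical version-0 body: timescale 1000, duration 10000 units.
body = (bytes(4) + struct.pack(">IIII", 0, 0, 1000, 10000)
        + struct.pack(">ih", 0x00010000, 0x0100))
info = parse_mvhd(body)
print(info["seconds"], info["rate"], info["volume"])  # 10.0 1.0 1.0
```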
tkhd(Track Header Box)
The metadata of a single track. It contains the following fields:
- version: the version of the tkhd box;
- flags: obtained by a bitwise OR of the values below. The default is 7 (0x000001 | 0x000002 | 0x000004), meaning the track is enabled, used in playback, and used in preview:
- Track_enabled: value 0x000001, the track is enabled; when this bit is clear (0x000000), the track is not enabled;
- Track_in_movie: value 0x000002, the track is used during playback;
- Track_in_preview: value 0x000004, the track is used in preview mode;
- creation_time: the creation time of the current track;
- modification_time: the last modification time of the current track;
- track_ID: the unique identifier of the current track; it must not be 0 and must not be repeated;
- duration: the full duration of the current track (divide by timescale to get the number of seconds);
- layer: the stacking order of video tracks. Smaller numbers are closer to the viewer: 1 is above 2, and 0 is above 1;
- alternate_group: the group ID of the current track. Tracks with the same alternate_group value belong to the same group, and only one track of a group can be playing at a time. When alternate_group is 0, the track is not grouped with any other track. A group may also contain just one track;
- volume: the volume of an audio track, between 0.0 and 1.0;
- matrix: video transformation matrix;
- width, height: the width and height of the video, stored as 16.16 fixed-point numbers.
The definition is as follows:
aligned(8) class TrackHeaderBox
extends FullBox(‘tkhd’, version, flags){
if (version==1) {
unsigned int(64) creation_time;
unsigned int(64) modification_time;
unsigned int(32) track_ID;
const unsigned int(32) reserved = 0;
unsigned int(64) duration;
} else {
// version==0
unsigned int(32) creation_time;
unsigned int(32) modification_time;
unsigned int(32) track_ID;
const unsigned int(32) reserved = 0;
unsigned int(32) duration;
}
const unsigned int(32)[2] reserved = 0;
template int(16) layer = 0;
template int(16) alternate_group = 0;
template int(16) volume = {if track_is_audio 0x0100 else 0};
const unsigned int(16) reserved = 0;
template int(32)[9] matrix= {
0x00010000,0,0,0,0x00010000,0,0,0,0x40000000 }; // unity matrix
unsigned int(32) width;
unsigned int(32) height;
}
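A short sketch of how the tkhd flags and the fixed-point width and height might be interpreted once the raw integers have been read. The helper `describe_tkhd` and its inputs are assumptions for illustration only.

```python
TRACK_ENABLED = 0x000001
TRACK_IN_MOVIE = 0x000002
TRACK_IN_PREVIEW = 0x000004

def describe_tkhd(flags, width_fixed, height_fixed):
    """Interpret raw tkhd flags and the 16.16 fixed-point width/height."""
    return {
        "enabled": bool(flags & TRACK_ENABLED),
        "in_movie": bool(flags & TRACK_IN_MOVIE),
        "in_preview": bool(flags & TRACK_IN_PREVIEW),
        "width": width_fixed / 65536,
        "height": height_fixed / 65536,
    }

# Hypothetical values: default flags (7) and a 1920x1080 video track.
print(describe_tkhd(7, 1920 << 16, 1080 << 16))
```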
Examples are as follows :
hdlr(Handler Reference Box)
Declares the type of the current track and the corresponding handler.
The values of handler_type include:
- vide (0x76 69 64 65): video track;
- soun (0x73 6f 75 6e): audio track;
- hint (0x68 69 6e 74): hint track.
name is a UTF-8 string that describes the handler, such as L-SMASH Video Handler.
aligned(8) class HandlerBox extends FullBox(‘hdlr’, version = 0, 0) {
unsigned int(32) pre_defined = 0;
unsigned int(32) handler_type;
const unsigned int(32)[3] reserved = 0;
string name;
}
stbl(Sample Table Box)
The media data of an MP4 file is in the mdat box, while stbl contains the index and timing information for that media data. Understanding stbl is critical for decoding and rendering MP4 files.
In an MP4 file, media data is divided into multiple chunks. Each chunk can contain multiple samples, and a sample is made up of frames (usually one sample corresponds to one frame).
The key child boxes of stbl are stsd, stco, stsc, stsz, stts, stss, and ctts. Here is a brief introduction, followed by the details of each.
stco / stsc / stsz / stts / stss / ctts / stsd overview
A brief introduction to these boxes:
- stsd: gives the video and audio codec, width and height, volume and similar information, as well as how many frames each sample contains;
- stco: the offset of each chunk in the file;
- stsc: how many samples each chunk contains;
- stsz: the size of each sample (in bytes);
- stts: the duration of each sample;
- stss: which samples are keyframes;
- ctts: the time difference between decoding and rendering of a frame, usually needed when B-frames are present.
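To show how these tables work together, here is a minimal sketch that locates every sample in the file from the chunk offsets (stco), the number of samples in each chunk (derived from stsc), and the sample sizes (stsz). The function name and the tiny input lists are hypothetical.

```python
def sample_offsets(chunk_offsets, samples_in_chunk, sample_sizes):
    """Return the absolute file offset of every sample.

    chunk_offsets:    one file offset per chunk (from stco/co64)
    samples_in_chunk: one sample count per chunk (expanded from stsc)
    sample_sizes:     one size in bytes per sample (from stsz/stz2)
    """
    offsets = []
    sample = 0
    for chunk_offset, count in zip(chunk_offsets, samples_in_chunk):
        pos = chunk_offset
        for _ in range(count):
            offsets.append(pos)
            pos += sample_sizes[sample]  # samples are contiguous in a chunk
            sample += 1
    return offsets

# Hypothetical track: two chunks holding 2 and 1 samples.
print(sample_offsets([100, 500], [2, 1], [10, 20, 30]))  # [100, 110, 500]
```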
stsd(Sample Description Box)
stsd gives the description of the samples, including any initialization information needed in the decoding phase, such as the codec. The required initialization information differs between video and audio; video is taken as the example here.
The pseudocode is as follows:
aligned(8) abstract class SampleEntry (unsigned int(32) format) extends Box(format){
const unsigned int(8)[6] reserved = 0;
unsigned int(16) data_reference_index;
}
// Visual Sequences
class VisualSampleEntry(codingname) extends SampleEntry (codingname){
unsigned int(16) pre_defined = 0;
const unsigned int(16) reserved = 0;
unsigned int(32)[3] pre_defined = 0;
unsigned int(16) width;
unsigned int(16) height;
template unsigned int(32) horizresolution = 0x00480000; // 72 dpi
template unsigned int(32) vertresolution = 0x00480000; // 72 dpi
const unsigned int(32) reserved = 0;
template unsigned int(16) frame_count = 1;
string[32] compressorname;
template unsigned int(16) depth = 0x0018;
int(16) pre_defined = -1;
}
// AudioSampleEntry、HintSampleEntry The definition omits
aligned(8) class SampleDescriptionBox (unsigned int(32) handler_type) extends FullBox('stsd', 0, 0){
int i ;
unsigned int(32) entry_count;
for (i = 1 ; i <= entry_count ; i++) {
switch (handler_type){
case ‘soun’: // for audio tracks
AudioSampleEntry();
break;
case ‘vide’: // for video tracks
VisualSampleEntry();
break;
case ‘hint’: // Hint track
HintSampleEntry();
break;
}
}
}
In SampleDescriptionBox, the handler_type parameter is the type of the track (soun, vide, hint), and the entry_count variable is the number of sample description entries in the current box.
In stsc, sample_description_index is the index that points to these sample descriptions.
Depending on handler_type, SampleDescriptionBox uses a different SampleEntry type; for a video track it is VisualSampleEntry.
VisualSampleEntry contains the following fields:
- data_reference_index: the data part of an MP4 file can be split into multiple segments, each with its own index and each retrieved through its own URL. In that case data_reference_index points to the corresponding segment (rarely used);
- width, height: the width and height of the video, in pixels;
- horizresolution, vertresolution: horizontal and vertical resolution (pixels per inch), as 16.16 fixed-point numbers; the default is 0x00480000 (72 dpi);
- frame_count: how many frames one sample contains; for a video track the default is 1;
- compressorname: an informative name, usually used for display, occupying 32 bytes, for example AVC Coding. The first byte gives the number of bytes N actually used by the name, bytes 2 to N+1 store the name, and bytes N+2 to 32 are padding. compressorname can be set to all zeros;
- depth: bitmap depth; for example, 0x0018 (24) means an image without an alpha channel.
In video tracks, the frame_count field must be 1 unless the specification for the media format explicitly documents this template field and permits larger values. That specification must document both how the individual frames of video are found (their size information) and their timing established. That timing might be as simple as dividing the sample duration by the frame count to establish the frame duration.
Examples are as follows :
stco(Chunk Offset Box)
The offset of each chunk in the file. There are two box types, stco for small files and co64 for large files. They have the same structure and differ only in the width of the offset field.
chunk_offset is an offset within the file itself, not an offset inside some box.
When building an MP4 file, pay special attention to where moov is placed, because it affects the chunk_offset values. Some MP4 files have moov at the end of the file. To speed up display of the first frame, moov has to be moved to the front of the file, and chunk_offset then has to be rewritten.
stco is defined as follows:
# Box Type: ‘stco’, ‘co64’
# Container: Sample Table Box (‘stbl’)
# Mandatory: Yes
# Quantity: Exactly one variant must be present
aligned(8) class ChunkOffsetBox
extends FullBox(‘stco’, version = 0, 0) {
unsigned int(32) entry_count;
for (i=1; i <= entry_count; i++) {
unsigned int(32) chunk_offset;
}
}
aligned(8) class ChunkLargeOffsetBox
extends FullBox(‘co64’, version = 0, 0) {
unsigned int(32) entry_count;
for (i=1; i <= entry_count; i++) {
unsigned int(64) chunk_offset;
}
}
In the example, the offset of the first chunk is 47564, the offset of the second chunk is 120579, and so on.
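The chunk_offset rewrite needed when moov is moved in front of mdat can be sketched as follows. The sketch assumes the only change is that `moov_size` bytes are inserted before the media data; the function name and the moov size of 4238 bytes are hypothetical.

```python
def shift_chunk_offsets(offsets, moov_size):
    """Return the new chunk offsets after inserting `moov_size` bytes
    in front of the media data, plus a flag telling whether any offset
    no longer fits in the 32-bit stco field (co64 would be needed)."""
    shifted = [off + moov_size for off in offsets]
    needs_co64 = any(off > 0xFFFFFFFF for off in shifted)
    return shifted, needs_co64

# The two example offsets quoted above, shifted by a 4238-byte moov.
print(shift_chunk_offsets([47564, 120579], 4238))  # ([51802, 124817], False)
```

Switching from stco to co64 changes the size of moov itself, so a real tool has to account for that before computing the final shift.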
stsc(Sample To Chunk Box)
Samples are grouped into chunks. Chunks can differ in size, and the samples inside a chunk can differ in size.
- entry_count: the number of table entries (each entry contains first_chunk, samples_per_chunk, and sample_description_index);
- first_chunk: the number of the first chunk that the current entry applies to;
- samples_per_chunk: the number of samples in each chunk;
- sample_description_index: the index of the sample description in stsd (see the stsd section);
aligned(8) class SampleToChunkBox
extends FullBox(‘stsc’, version = 0, 0) {
unsigned int(32) entry_count;
for (i=1; i <= entry_count; i++) {
unsigned int(32) first_chunk;
unsigned int(32) samples_per_chunk;
unsigned int(32) sample_description_index;
}
}
The description above is rather abstract, so here is an example. The table below means:
- chunks 1 to 15 each contain 15 samples;
- chunk 16 contains 30 samples;
- chunk 17 and all later chunks each contain 28 samples;
- the samples in all of these chunks use sample description index 1.
first_chunk | samples_per_chunk | sample_description_index |
---|---|---|
1 | 15 | 1 |
16 | 30 | 1 |
17 | 28 | 1 |
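The run-length style of the stsc table can be expanded into a per-chunk sample count, as in the sketch below. The total number of chunks is not stored in stsc (it comes from the stco entry_count), so the value 18 used here is an assumption for the example.

```python
def samples_per_chunk(stsc_entries, chunk_count):
    """Expand stsc entries into a list with one sample count per chunk.

    stsc_entries: list of (first_chunk, samples_per_chunk,
                  sample_description_index) tuples, chunk numbers 1-based.
    """
    counts = []
    for i, (first, per_chunk, _desc) in enumerate(stsc_entries):
        # An entry applies up to the chunk before the next entry's first_chunk.
        if i + 1 < len(stsc_entries):
            last = stsc_entries[i + 1][0] - 1
        else:
            last = chunk_count
        counts.extend([per_chunk] * (last - first + 1))
    return counts

entries = [(1, 15, 1), (16, 30, 1), (17, 28, 1)]
counts = samples_per_chunk(entries, chunk_count=18)
print(counts[14], counts[15], counts[16], sum(counts))  # 15 30 28 311
```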
stsz(Sample Size Boxes)
The size of each sample (in bytes). The sample_count field tells us how many samples (or frames) the current track contains.
There are two box types, stsz and stz2.
stsz:
- sample_size: the default sample size (in bytes), usually 0. If sample_size is not 0, all samples have that same size. If sample_size is 0, the samples may differ in size;
- sample_count: the number of samples in the current track. If sample_size==0, sample_count equals the number of entries that follow;
- entry_size: the size of a single sample (present when sample_size==0);
aligned(8) class SampleSizeBox extends FullBox(‘stsz’, version = 0, 0) {
unsigned int(32) sample_size;
unsigned int(32) sample_count;
if (sample_size==0) {
for (i=1; i <= sample_count; i++) {
unsigned int(32) entry_size;
}
}
}
stz2:
- field_size: the number of bits occupied by each entry_size in the entry table; allowed values are 4, 8, and 16. 4 is a special case: one byte then contains two entries, with the high 4 bits as entry[i] and the low 4 bits as entry[i+1];
- sample_count: the number of entries that follow;
- entry_size: the sample size.
aligned(8) class CompactSampleSizeBox extends FullBox(‘stz2’, version = 0, 0) {
unsigned int(24) reserved = 0;
unsigned int(8) field_size;
unsigned int(32) sample_count;
for (i=1; i <= sample_count; i++) {
unsigned int(field_size) entry_size;
}
}
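The packed entry table of stz2 can be decoded along these lines. This is a sketch under the assumption that `data` holds exactly the bytes of the entry table; the function name is hypothetical.

```python
def decode_stz2_entries(data, field_size, sample_count):
    """Decode the stz2 entry table for field_size 4, 8 or 16 bits."""
    if field_size not in (4, 8, 16):
        raise ValueError("field_size must be 4, 8 or 16")
    if field_size == 4:
        sizes = []
        for byte in data:
            sizes.append(byte >> 4)     # high 4 bits: entry[i]
            sizes.append(byte & 0x0F)   # low 4 bits: entry[i+1]
        # With an odd sample_count, the last nibble is padding.
        return sizes[:sample_count]
    step = field_size // 8
    return [int.from_bytes(data[i * step:(i + 1) * step], "big")
            for i in range(sample_count)]

# Three 4-bit entries packed into two bytes.
print(decode_stz2_entries(bytes([0x12, 0x30]), 4, 3))  # [1, 2, 3]
```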
stts(Decoding Time to Sample Box)
stts contains the mapping table from DTS to sample number and is mainly used to derive the duration of each frame.
aligned(8) class TimeToSampleBox extends FullBox(’stts’, version = 0, 0) {
unsigned int(32) entry_count;
int i;
for (i=0; i < entry_count; i++) {
unsigned int(32) sample_count;
unsigned int(32) sample_delta;
}
}
- entry_count: the number of entries in stts;
- sample_count: within a single entry, the number of consecutive samples that have the same duration (sample_delta);
- sample_delta: the duration of a sample (measured in timescale units).
For example, with an entry_count of 3, the first 250 samples have a duration of 1000, the 251st sample has a duration of 999, and samples 252 to 283 have a duration of 1000.
Assuming timescale is 1000, the actual duration in seconds is obtained by dividing by 1000.
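The example above can be turned into decoding timestamps by accumulating the deltas. A minimal sketch, with the function name `decode_times` chosen for illustration:

```python
def decode_times(stts_entries, timescale):
    """Return the decoding time in seconds of every sample.

    stts_entries: list of (sample_count, sample_delta) tuples.
    """
    times = []
    t = 0  # running decode time in timescale units
    for sample_count, sample_delta in stts_entries:
        for _ in range(sample_count):
            times.append(t / timescale)
            t += sample_delta
    return times

# The three entries from the example, with a timescale of 1000.
entries = [(250, 1000), (1, 999), (32, 1000)]
times = decode_times(entries, 1000)
print(len(times), times[250], times[251])  # 283 250.0 250.999
```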
stss(Sync Sample Box)
The sample numbers of the keyframes in the MP4 file. If stss is absent, every sample is a keyframe.
- entry_count: the number of entries, which can be thought of as the number of keyframes;
- sample_number: the sample number of a keyframe (counting from 1);
aligned(8) class SyncSampleBox
extends FullBox(‘stss’, version = 0, 0) {
unsigned int(32) entry_count;
int i;
for (i=0; i < entry_count; i++) {
unsigned int(32) sample_number;
}
}
In the example, samples 1, 31, 61, 91, 121, …, 271 are keyframes.
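One typical use of this table is seeking: given a target sample, find the last keyframe at or before it and start decoding there. The sketch below assumes the sync sample list is sorted in ascending order; the function name is hypothetical.

```python
import bisect

def nearest_sync_sample(sync_samples, target):
    """Return the last sync sample number <= target (both 1-based)."""
    i = bisect.bisect_right(sync_samples, target)
    if i == 0:
        raise ValueError("no sync sample at or before target")
    return sync_samples[i - 1]

# Keyframes every 30 samples, as in the example: 1, 31, 61, ..., 271.
sync = list(range(1, 272, 30))
print(nearest_sync_sample(sync, 100))  # 91
```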
ctts(Composition Time to Sample Box)
The difference between decoding time (DTS) and presentation time (PTS).
For video with only I-frames and P-frames, the decoding order and the rendering order are the same, so ctts does not need to exist.
For video with B-frames, ctts needs to exist. It is required whenever PTS and DTS differ, and the formula is CT(n) = DT(n) + CTTS(n).
aligned(8) class CompositionOffsetBox extends FullBox(‘ctts’, version = 0, 0) {
unsigned int(32) entry_count;
int i;
for (i=0; i < entry_count; i++) {
unsigned int(32) sample_count;
unsigned int(32) sample_offset;
}
}
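The formula CT(n) = DT(n) + CTTS(n) can be applied by expanding the run-length ctts entries, as in this sketch. The four-frame sequence (decoded as I, P, B, B and displayed as I, B, B, P) and its numbers are made up for illustration.

```python
def composition_times(dts, ctts_entries):
    """Apply CT(n) = DT(n) + CTTS(n) to a list of decoding times.

    ctts_entries: list of (sample_count, sample_offset) tuples.
    """
    offsets = []
    for sample_count, sample_offset in ctts_entries:
        offsets.extend([sample_offset] * sample_count)
    return [dt + off for dt, off in zip(dts, offsets)]

# Hypothetical samples in decode order I, P, B, B with sample_delta 1000.
dts = [0, 1000, 2000, 3000]
ctts = [(1, 1000), (1, 3000), (2, 0)]
print(composition_times(dts, ctts))  # [1000, 4000, 2000, 3000]
```

Sorting the samples by the resulting composition time gives the display order I, B, B, P.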
fMP4(Fragmented mp4)
fMP4 has the same basic file structure as ordinary MP4. Ordinary MP4 is used for on-demand playback, while fMP4 is usually used for live streaming.
They differ as follows:
- The duration and content of an ordinary MP4 are usually fixed. The duration and content of an fMP4 are usually not fixed, so it can be played while it is being generated;
- In an ordinary MP4, the complete metadata is in moov, and the moov box must be loaded before the media data in mdat can be decoded and rendered;
- In fMP4, the metadata of the media data is in moof boxes, and moof and mdat (usually) appear in pairs. moof contains information such as sample duration and sample size, which is why fMP4 can be played while it is being generated.
For example, the top-level box structure of an ordinary MP4 and of an fMP4 may look like this. The listing below was printed by an MP4 parsing tool written by the author; the code is given at the end of the article.
// Ordinary mp4
ftyp size=32(8+24) curTotalSize=32
moov size=4238(8+4230) curTotalSize=4270
mdat size=1124105(8+1124097) curTotalSize=1128375
// fmp4
ftyp size=36(8+28) curTotalSize=36
moov size=1227(8+1219) curTotalSize=1263
moof size=1252(8+1244) curTotalSize=2515
mdat size=65895(8+65887) curTotalSize=68410
moof size=612(8+604) curTotalSize=69022
mdat size=100386(8+100378) curTotalSize=169408
How can you tell whether an MP4 file is an ordinary MP4 or an fMP4? Generally, check whether an mvex (Movie Extends Box) is present.
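The check can be sketched by walking the top-level boxes, descending into moov, and looking for mvex. This is not the author's parsing tool; `iter_boxes`, `looks_fragmented`, and the hand-built test files are assumptions made for this illustration.

```python
import struct

def iter_boxes(buf, start=0, end=None):
    """Yield (type, body_start, box_end) for each box in buf[start:end]."""
    end = len(buf) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", buf, pos)
        header = 8
        if size == 1:            # 64-bit largesize follows the type
            (size,) = struct.unpack_from(">Q", buf, pos + 8)
            header = 16
        elif size == 0:          # box extends to the end of the data
            size = end - pos
        if size < header:
            break                # malformed box; stop instead of looping
        yield box_type, pos + header, pos + size
        pos += size

def looks_fragmented(buf):
    """Heuristic: True when moov contains an mvex box."""
    for box_type, body_start, box_end in iter_boxes(buf):
        if box_type == b"moov":
            return any(t == b"mvex"
                       for t, _, _ in iter_boxes(buf, body_start, box_end))
    return False

def box(box_type, payload=b""):
    """Build a box with a 32-bit size field (helper for the demo only)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload

ftyp = box(b"ftyp", b"mp42" + bytes(4))
plain = ftyp + box(b"moov", box(b"mvhd", bytes(100))) + box(b"mdat")
frag = (ftyp + box(b"moov", box(b"mvhd", bytes(100)) + box(b"mvex"))
        + box(b"moof") + box(b"mdat"))
print(looks_fragmented(plain), looks_fragmented(frag))  # False True
```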
mvex(Movie Extends Box)
When mvex is present, the current file is an fMP4 (this is not a rigorous test). In that case the sample-related metadata is not in moov and has to be obtained by parsing the moof boxes.
The pseudocode is as follows:
aligned(8) class MovieExtendsBox extends Box(‘mvex’){
}
mehd(Movie Extends Header Box)
mehd is optional and declares the full duration of the movie (fragment_duration). If it is absent, all fragments have to be traversed to obtain the full duration. In live fMP4 scenarios, fragment_duration cannot be known in advance.
aligned(8) class MovieExtendsHeaderBox extends FullBox(‘mehd’, version, 0) {
if (version==1) {
unsigned int(64) fragment_duration;
} else {
// version==0
unsigned int(32) fragment_duration;
}
}
trex(Track Extends Box)
Sets default values for the samples of an fMP4, such as duration and size.
aligned(8) class TrackExtendsBox extends FullBox(‘trex’, 0, 0){
unsigned int(32) track_ID;
unsigned int(32) default_sample_description_index;
unsigned int(32) default_sample_duration;
unsigned int(32) default_sample_size;
unsigned int(32) default_sample_flags;
}
The fields have the following meanings:
- track_ID: the ID of the corresponding track, such as the ID of the video track or the audio track;
- default_sample_description_index: the default sample description index (points into stsd);
- default_sample_duration: the default sample duration, usually 0;
- default_sample_size: the default sample size, usually 0;
- default_sample_flags: the default sample flags, usually 0.
default_sample_flags occupies 4 bytes and is more complex. Its structure is as follows:
In older versions of the specification the first 6 bits were reserved; in the newer specification only the first 4 bits are reserved. The meaning of is_leading is not very intuitive and is covered in the next section.
- reserved: 4 bits, reserved;
- is_leading: 2 bits, whether the sample is a leading sample. Possible values:
- 0: it is unknown whether the current sample is a leading sample (this value is generally used);
- 1: the current sample is a leading sample that depends on a sample before the referenced I-frame, so it cannot be decoded;
- 2: the current sample is not a leading sample;
- 3: the current sample is a leading sample that does not depend on any sample before the referenced I-frame, so it can be decoded;
- sample_depends_on: 2 bits, whether the sample depends on other samples. Possible values:
- 0: it is unknown whether the sample depends on other samples;
- 1: the sample depends on other samples (not an I-frame);
- 2: the sample does not depend on other samples (an I-frame);
- 3: reserved;
- sample_is_depended_on: 2 bits, whether other samples depend on the sample. Possible values:
- 0: it is unknown whether other samples depend on the current sample;
- 1: other samples may depend on the current sample;
- 2: no other sample depends on the current sample;
- 3: reserved;
- sample_has_redundancy: 2 bits, whether redundant coding is present. Possible values:
- 0: it is unknown whether there is redundant coding;
- 1: there is redundant coding;
- 2: there is no redundant coding;
- 3: reserved;
- sample_padding_value: 3 bits, padding value;
- sample_is_non_sync_sample: 1 bit, set when the sample is not a keyframe;
- sample_degradation_priority: 16 bits, the degradation priority (generally relevant to problems during transmission).
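The bit layout listed above (4 + 2 + 2 + 2 + 2 + 3 + 1 + 16 = 32 bits, most significant bit first) can be unpacked with shifts and masks. The function name and the two sample values are chosen for illustration only.

```python
def decode_sample_flags(flags):
    """Split a 32-bit sample flags word into its fields."""
    return {
        "is_leading": (flags >> 26) & 0x3,
        "sample_depends_on": (flags >> 24) & 0x3,
        "sample_is_depended_on": (flags >> 22) & 0x3,
        "sample_has_redundancy": (flags >> 20) & 0x3,
        "sample_padding_value": (flags >> 17) & 0x7,
        "sample_is_non_sync_sample": (flags >> 16) & 0x1,
        "sample_degradation_priority": flags & 0xFFFF,
    }

# 0x02000000: does not depend on other samples (I-frame), sync sample.
key = decode_sample_flags(0x02000000)
# 0x01010000: depends on other samples, not a sync sample.
non_key = decode_sample_flags(0x01010000)
print(key["sample_depends_on"], key["sample_is_non_sync_sample"])          # 2 0
print(non_key["sample_depends_on"], non_key["sample_is_non_sync_sample"])  # 1 1
```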
Examples are as follows :
About is_leading
is_leading is not particularly easy to explain, so the original text of the specification is quoted here to help understanding.
A leading sample (usually a picture in video) is defined relative to a reference sample, which is the immediately prior sample that is marked as “sample_depends_on” having no dependency (an I picture). A leading sample has both a composition time before the reference sample, and possibly also a decoding dependency on a sample before the reference sample. Therefore if, for example, playback and decoding were to start at the reference sample, those samples marked as leading would not be needed and might not be decodable. A leading sample itself must therefore not be marked as having no dependency.
For convenience, a leading frame below corresponds to a leading sample, and a referenced frame to a referenced sample.
Take the H264 codec as an example. H264 has I-frames, P-frames, and B-frames. Because of B-frames, the decoding order and the rendering order of video frames may differ.
One feature of MP4 files is support for playback from an arbitrary position. On a video site, for example, you can drag the progress bar to fast-forward.
Often the position the progress bar lands on is not an I-frame. To be able to play, we need to look back for the nearest preceding I-frame and, if possible, start decoding and playing from it (that is, playback does not necessarily start from the nearest preceding I-frame).
The frame at the seek position described above is called the leading frame. The nearest I-frame before the leading frame is called the referenced frame.
Looking back at the cases where is_leading is 1 or 3: both are leading frames, so when is one decodable and when is it not decodable?
1: this sample is a leading sample that has a dependency before the referenced I‐picture (and is therefore not decodable);
3: this sample is a leading sample that has no dependency before the referenced I‐picture (and is therefore decodable);
1. Example with is_leading = 1: as shown below, decoding frame 2 (the leading frame) depends on frame 1 and frame 3 (the referenced frame). In the video stream, searching backward from frame 2, the nearest I frame is frame 3. Even after frame 3 has been decoded, frame 2 still cannot be decoded.
2. Example with is_leading = 3: as shown below, in this case frame 2 (the leading frame) can be decoded.
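The rule quoted above can be sketched as a filter applied after a seek. This is only an illustration under stated assumptions: the Sample class and samples_to_decode function are hypothetical, the samples are given in decoding order, and leading samples are skipped only until the next sample with no dependency, which is one reading of "defined relative to a reference sample":

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    number: int             # position in decoding order (illustrative)
    is_leading: int         # 0..3, as in sample_flags
    sample_depends_on: int  # 2 means "no dependency" (an I picture)


def samples_to_decode(samples: List[Sample], start: int) -> List[Sample]:
    """Return the samples worth decoding when playback starts at samples[start]."""
    if samples[start].sample_depends_on != 2:
        raise ValueError("playback must start at a sample with no dependency")
    kept = [samples[start]]
    before_next_i_picture = True
    for sample in samples[start + 1:]:
        if sample.sample_depends_on == 2:
            # A new reference sample: everything it needs has been decoded.
            before_next_i_picture = False
        if before_next_i_picture and sample.is_leading == 1:
            # Depends on data before the reference sample, so it is skipped.
            continue
        kept.append(sample)
    return kept


stream = [
    Sample(0, 2, 2),  # I picture, not a leading sample
    Sample(1, 1, 1),  # leading, not decodable after a seek to sample 0
    Sample(2, 3, 1),  # leading, but decodable
    Sample(3, 2, 1),  # ordinary dependent sample
    Sample(4, 2, 2),  # next I picture
    Sample(5, 1, 1),  # leading relative to sample 4
]
kept_numbers = [s.number for s in samples_to_decode(stream, 0)]
```

Samples with is_leading = 3 are kept because, per the quoted definition, they have no dependency before the reference sample.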
moof(Movie Fragment Box)
moof is a container box; the related metadata lives in the boxes nested inside it, such as mfhd, tfhd, and trun.
The pseudocode is as follows:
aligned(8) class MovieFragmentBox extends Box('moof') {
}
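Because moof only nests other boxes, reading it comes down to walking box headers. A minimal sketch, assuming the header rules described earlier in this article (32-bit size, 4-byte type, size == 1 meaning a 64-bit largesize follows, size == 0 meaning the box runs to the end of the data). The names iter_boxes and walk are hypothetical, and only moof and traf are treated as containers here:

```python
import struct
from typing import Iterator, List, Optional, Tuple

CONTAINER_TYPES = {b"moof", b"traf"}  # the containers discussed in this section


def iter_boxes(data: bytes, start: int = 0,
               end: Optional[int] = None) -> Iterator[Tuple[bytes, int, int]]:
    """Yield (type, payload_start, box_end) for each box in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:      # 64-bit largesize follows the type
            (size,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16
        elif size == 0:    # box extends to the end of the enclosing data
            size = end - pos
        if size < header or pos + size > end:
            raise ValueError("malformed box")
        yield box_type, pos + header, pos + size
        pos += size


def walk(data: bytes, start: int = 0, end: Optional[int] = None,
         depth: int = 0, out: Optional[List[Tuple[int, str]]] = None):
    """Collect (depth, type) pairs, descending into container boxes."""
    out = [] if out is None else out
    for box_type, payload_start, box_end in iter_boxes(data, start, end):
        out.append((depth, box_type.decode("ascii")))
        if box_type in CONTAINER_TYPES:
            walk(data, payload_start, box_end, depth + 1, out)
    return out


# Hand-built fragment: moof -> (mfhd, traf -> tfhd); all values are illustrative.
mfhd = struct.pack(">I4sII", 16, b"mfhd", 0, 1)
tfhd = struct.pack(">I4sII", 16, b"tfhd", 0, 1)
traf = struct.pack(">I4s", 8 + len(tfhd), b"traf") + tfhd
moof = struct.pack(">I4s", 8 + len(mfhd) + len(traf), b"moof") + mfhd + traf
tree = walk(moof)
```

The walker returns depths rather than printing, so the same function can drive either a dump tool or a parser that dispatches on box type.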
mfhd(Movie Fragment Header Box)
The structure is simple: sequence_number is the ordinal number of the movie fragment. It increases in the order in which the movie fragments are produced, starting from 1.
aligned(8) class MovieFragmentHeaderBox extends FullBox('mfhd', 0, 0) {
    unsigned int(32) sequence_number;
}
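A minimal sketch of reading this box, assuming the payload passed in starts right after the 8-byte box header and so begins with the FullBox version (8 bits) and flags (24 bits). The function name parse_mfhd is hypothetical:

```python
import struct


def parse_mfhd(payload: bytes) -> dict:
    """Parse the payload of an mfhd box (the bytes after the box header)."""
    version_and_flags, sequence_number = struct.unpack_from(">II", payload, 0)
    return {
        "version": version_and_flags >> 24,
        "flags": version_and_flags & 0xFFFFFF,
        "sequence_number": sequence_number,
    }


# Illustrative payload: version 0, flags 0, sequence_number 1.
first_fragment = parse_mfhd(bytes.fromhex("0000000000000001"))
```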
traf(Track Fragment Box)
aligned(8) class TrackFragmentBox extends Box('traf') {
}
In fMP4, the data is split across multiple movie fragments. A movie fragment can contain multiple track fragments (zero or more per track). Each track fragment can contain multiple samples of its track.
Each track fragment contains multiple track runs, and each track run represents a contiguous set of samples.
tfhd(Track Fragment Header Box)
tfhd sets the default values of the metadata for the samples in a track fragment.
The pseudocode is as follows. Apart from track_ID, all fields are optional.
aligned(8) class TrackFragmentHeaderBox extends FullBox('tfhd', 0, tf_flags) {
    unsigned int(32) track_ID;
    // all the following are optional fields
    unsigned int(64) base_data_offset;
    unsigned int(32) sample_description_index;
    unsigned int(32) default_sample_duration;
    unsigned int(32) default_sample_size;
    unsigned int(32) default_sample_flags;
}
sample_description_index, default_sample_duration, and default_sample_size need no further explanation, so only tf_flags and base_data_offset are discussed here.
First, tf_flags. The individual flag values are as follows (as before, they are combined with bitwise OR):
- 0x000001 base-data-offset-present: the base_data_offset field is present; it gives the base offset relative to which data positions are computed;
- 0x000002 sample-description-index-present: the sample_description_index field is present;
- 0x000008 default-sample-duration-present: the default_sample_duration field is present;
- 0x000010 default-sample-size-present: the default_sample_size field is present;
- 0x000020 default-sample-flags-present: the default_sample_flags field is present;
- 0x010000 duration-is-empty: indicates that there are no samples for the current time interval; the duration given by default_sample_duration, if present, describes that empty interval;
- 0x020000 default-base-is-moof: if base-data-offset-present is 1, this flag is ignored. If base-data-offset-present is 0, the base_data_offset of the current track fragment is counted from the first byte of the enclosing moof;
The position of the first sample in a track run is base_data_offset + data_offset, where data_offset is defined per track run in trun; the remaining samples of the run follow contiguously. If base_data_offset is not provided explicitly, sample positions are usually relative to the moof.
For instance, tf_flags equal to 57 (0x39 = 0x01 + 0x08 + 0x10 + 0x20) means that base_data_offset, default_sample_duration, default_sample_size, and default_sample_flags are present.
In that example, base_data_offset is 1263 (the sizes of ftyp and moov add up to 1263).
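The flag handling above can be sketched as a small parser that reads each optional field only when its bit is set. This is a minimal sketch, not a complete implementation: the function name parse_tfhd is hypothetical, the payload is assumed to start at the FullBox version/flags word, and apart from tf_flags = 57 and base_data_offset = 1263 the example values are made up:

```python
import struct

# (flag bit, field name, struct format) in the order the fields appear in tfhd.
TFHD_OPTIONAL_FIELDS = [
    (0x000001, "base_data_offset", ">Q"),
    (0x000002, "sample_description_index", ">I"),
    (0x000008, "default_sample_duration", ">I"),
    (0x000010, "default_sample_size", ">I"),
    (0x000020, "default_sample_flags", ">I"),
]


def parse_tfhd(payload: bytes) -> dict:
    """Parse a tfhd payload (the bytes after the 8-byte box header)."""
    version_and_flags, track_id = struct.unpack_from(">II", payload, 0)
    tf_flags = version_and_flags & 0xFFFFFF
    box = {
        "tf_flags": tf_flags,
        "track_ID": track_id,
        "duration_is_empty": bool(tf_flags & 0x010000),
        "default_base_is_moof": bool(tf_flags & 0x020000),
    }
    pos = 8
    for bit, name, fmt in TFHD_OPTIONAL_FIELDS:
        if tf_flags & bit:  # the field exists only when its flag is set
            (box[name],) = struct.unpack_from(fmt, payload, pos)
            pos += struct.calcsize(fmt)
    return box


# tf_flags = 57: base_data_offset, duration, size, and flags defaults present.
payload = struct.pack(">IIQIII", 57, 1, 1263, 1000, 0, 0x01010000)
tfhd_box = parse_tfhd(payload)
```

Driving the parser from a table keeps the field order and the flag bits in one place, which makes a mismatch between the two easier to spot.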
trun(Track Fragment Run Box)
The pseudocode for trun is as follows:
aligned(8) class TrackRunBox extends FullBox('trun', version, tr_flags) {
    unsigned int(32) sample_count;
    // the following are optional fields
    signed int(32) data_offset;
    unsigned int(32) first_sample_flags;
    // all fields in the following array are optional
    {
        unsigned int(32) sample_duration;
        unsigned int(32) sample_size;
        unsigned int(32) sample_flags;
        if (version == 0)
            { unsigned int(32) sample_composition_time_offset; }
        else
            { signed int(32) sample_composition_time_offset; }
    }[ sample_count ]
}
As mentioned earlier, a track run denotes a contiguous set of samples, where:
- sample_count: the number of samples;
- data_offset: the offset of the run's data, added to base_data_offset;
- first_sample_flags: optional; applies only to the first sample of the current track run;
tr_flags are as follows, and work much like tf_flags:
- 0x000001 data-offset-present: the data_offset field is present;
- 0x000004 first-sample-flags-present: the first_sample_flags field is present; its value overrides the flags of the first sample only; when first_sample_flags is present, sample_flags is not;
- 0x000100 sample-duration-present: each sample has its own sample_duration; otherwise the default value is used;
- 0x000200 sample-size-present: each sample has its own sample_size; otherwise the default value is used;
- 0x000400 sample-flags-present: each sample has its own sample_flags; otherwise the default value is used;
- 0x000800 sample-composition-time-offsets-present: each sample has its own sample_composition_time_offset;
- 0x000004 first-sample-flags-present overrides the settings of the first sample, so the first frame of a group of samples can be marked as a keyframe while the others are marked as non-keyframes;
For example, when tr_flags is 2565 (0xA05 = 0x800 + 0x200 + 0x004 + 0x001), data_offset, first_sample_flags, sample_size, and sample_composition_time_offset are present.
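A minimal sketch of a trun parser following the pseudocode and flags above. The function name parse_trun is hypothetical, the payload is assumed to start at the FullBox version/flags word, and apart from tr_flags = 2565 the example values are made up:

```python
import struct


def parse_trun(payload: bytes) -> dict:
    """Parse a trun payload (the bytes after the 8-byte box header)."""
    version_and_flags, sample_count = struct.unpack_from(">II", payload, 0)
    version = version_and_flags >> 24
    tr_flags = version_and_flags & 0xFFFFFF
    run = {"tr_flags": tr_flags, "sample_count": sample_count, "samples": []}
    pos = 8
    if tr_flags & 0x000001:   # data-offset-present (signed)
        (run["data_offset"],) = struct.unpack_from(">i", payload, pos)
        pos += 4
    if tr_flags & 0x000004:   # first-sample-flags-present
        (run["first_sample_flags"],) = struct.unpack_from(">I", payload, pos)
        pos += 4
    # Version 0 stores unsigned composition offsets, other versions signed ones.
    cts_format = ">I" if version == 0 else ">i"
    per_sample_fields = [
        (0x000100, "sample_duration", ">I"),
        (0x000200, "sample_size", ">I"),
        (0x000400, "sample_flags", ">I"),
        (0x000800, "sample_composition_time_offset", cts_format),
    ]
    for _ in range(sample_count):
        sample = {}
        for bit, name, fmt in per_sample_fields:
            if tr_flags & bit:  # absent fields fall back to tfhd/trex defaults
                (sample[name],) = struct.unpack_from(fmt, payload, pos)
                pos += 4
        run["samples"].append(sample)
    return run


# tr_flags = 2565: data_offset, first_sample_flags, sizes, composition offsets.
payload = (struct.pack(">IIiI", 2565, 2, 100, 0x02000000)
           + struct.pack(">IIII", 500, 0, 300, 1024))
run = parse_trun(payload)
```

Given the run's data_offset and the per-sample sizes, the byte position of each sample could then be derived by starting at base_data_offset + data_offset and adding the sizes of the preceding samples in the run.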