FFmpeg notes (I): Fundamentals of audio and video
2022-07-25 05:42:00 【Hello,C++!】
1 Basic image concepts
1.1 Pixels
A pixel is the basic unit of a picture. "Pix" is an abbreviation of the English word "picture"; adding the word "element" gives "pixel", abbreviated px, so a pixel is literally a picture element. A 2500 × 2000 picture has 2500 pixels horizontally and 2000 pixels vertically, for a total of 5 million pixels, which is why it is called a 5-megapixel photo.
1.2 Resolution
Resolution is the size of an image in pixels. For example, a 1920 x 1080 picture is 1920 pixels wide and 1080 pixels high.
Common resolutions: 360P (640 x 360), 720P (1280 x 720), 1080P (1920 x 1080), 4K (3840 x 2160), 8K (7680 x 4320), and so on.
The familiar names 1080 and 720 refer to the number of vertical pixels. With a 16:9 width-to-height ratio, 720p has 720 ÷ 9 × 16 = 1280 horizontal pixels, for a total of 921,600 pixels, or about 0.92 megapixels. 1080p has 1080 ÷ 9 × 16 = 1920 horizontal pixels, for a total of 2,073,600 pixels (about 2 megapixels), which is 2.25 times as many as 720p. With other factors equal, more pixels give a sharper picture, so 1080p video is generally sharper than 720p video.
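The arithmetic above can be checked with a few lines of Python. This is only a sketch: the helper names `width_for_16_9` and `total_pixels` are made up for illustration, and the code assumes an exact 16:9 aspect ratio.

```python
def width_for_16_9(height: int) -> int:
    """Horizontal pixel count of a 16:9 frame with the given height."""
    return height * 16 // 9


def total_pixels(height: int) -> int:
    """Total pixel count of a 16:9 frame with the given height."""
    return width_for_16_9(height) * height


for name, height in [("360P", 360), ("720P", 720), ("1080P", 1080),
                     ("4K", 2160), ("8K", 4320)]:
    print(f"{name}: {width_for_16_9(height)} x {height} "
          f"= {total_pixels(height):,} pixels")

# How many times more pixels 1080p has than 720p.
ratio = total_pixels(1080) / total_pixels(720)
print(ratio)
```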


For a more detailed explanation of resolution, see:
https://segmentfault.com/a/1190000023769775
1.3 Bit depth
Bit depth, also called bit resolution, is the number of binary bits used to represent each pixel of an image.
With a bit depth of 1, each pixel carries a single bit of information, so the image can contain only pure black and pure white. A bit depth of 8 (2 to the power of 8) gives 256 gray levels or colors. A bit depth of 16 (2 to the power of 16) gives 65,536 possible colors. A bit depth of 24 gives about 16.7 million different colors.
The eyes of an ordinary person can distinguish only about 12 to 14 million colors and shades, so 24-bit color is also called photographic color or true color. 24-bit color usually allocates 8 bits to each color channel: red, green, and blue, the three primary colors, each have 256 levels. In other words, 3 bytes represent one 24-bit color (8 x 3 = 24).
When a computer stores R, G, and B with 8 bits each, it can represent 256 × 256 × 256 = 16,777,216 colors, about 16.77 million. Each distinct combination of values represents one color.
The greater the bit depth of each channel, the more color values can be represented. For example, high-end TVs now advertise 10-bit color: each channel uses 10 bits and has 1024 levels, so 1024 × 1024 × 1024 = 1,073,741,824, about 1.07 billion colors, which is 64 times as many as 8-bit color.
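The color counts follow directly from powers of two. A minimal sketch, assuming three channels with the same bit depth each; `color_count` is a hypothetical helper name:

```python
def color_count(bits_per_channel: int, channels: int = 3) -> int:
    """Number of distinct colors with the given bit depth per channel."""
    return 2 ** (bits_per_channel * channels)


print(color_count(8))                      # 8 bits per channel
print(color_count(10))                     # 10 bits per channel
print(color_count(10) // color_count(8))   # how many times more colors
```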
1.4 Frame rate
Frame rate is the number of picture frames transmitted in 1 second. It can also be understood as how many times per second the graphics processor can refresh. For example, 25 fps means that one second contains 25 pictures.
The higher the frame rate (FPS), the smoother the video; the lower it is, the more the video stutters.
Because of persistence of vision (an image lingers briefly on the retina), once the frame rate reaches about 24 frames per second we generally perceive the image as continuous.
A higher frame rate gives a smoother picture but also demands more performance from the device.
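One way to make frame rate concrete is to look at how long each frame stays on screen. The sketch below assumes a constant frame rate; the function names are illustrative only.

```python
def frame_interval_ms(fps: float) -> float:
    """Time between two consecutive frames, in milliseconds."""
    return 1000.0 / fps


def frame_count(fps: float, duration_s: float) -> int:
    """Number of frames shown in duration_s seconds at a constant rate."""
    return int(fps * duration_s)


for fps in (24, 25, 60):
    print(f"{fps} fps: one frame every {frame_interval_ms(fps):.2f} ms")
```

At 25 fps a decoder has 40 ms to produce each frame; at 60 fps the budget is under 17 ms, which is why higher frame rates need faster hardware.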
1.5 Bit rate
Bit rate is the amount of data a video file uses per unit of time, for example 1 Mbps.
In most cases, a higher bit rate goes together with a higher resolution and a clearer picture. This is not guaranteed, though: a blurry video can still have a very large file size (bit rate), and a low-resolution video can look clearer than a high-resolution one.
For the same source material and the same encoding algorithm, a higher bit rate means less distortion and a clearer video.
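Bit rate and file size are linked by simple arithmetic. The sketch below assumes 1 Mbps means 1,000,000 bits per second and a constant bit rate, and it uses 1.5 bytes per pixel for uncompressed 4:2:0 video; the function names are hypothetical.

```python
def stream_size_bytes(bit_rate_bps: float, duration_s: float) -> float:
    """Approximate size of a constant-bit-rate stream, in bytes."""
    return bit_rate_bps * duration_s / 8


def raw_bit_rate_bps(width: int, height: int, fps: float,
                     bytes_per_pixel: float) -> float:
    """Bit rate of uncompressed video, in bits per second."""
    return width * height * bytes_per_pixel * 8 * fps


# One minute at 1 Mbps.
print(stream_size_bytes(1_000_000, 60))
# Uncompressed 1080p at 25 fps with 1.5 bytes per pixel.
print(raw_bit_rate_bps(1920, 1080, 25, 1.5))
```

Uncompressed 1080p at 25 fps comes to roughly 622 Mbps, which shows how much work the encoder does to reach bit rates of a few Mbps.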
1.6 Stride
Stride is the space that each row of pixels occupies in memory. To satisfy memory alignment, the space a row occupies in memory is not necessarily equal to the width of the image.
Stride, also known as pitch, is the name for this padded row size. If there is padding at the end of each row of pixels, the stride is greater than the size of the row's pixel data.
Alignment examples:
Take an RGB24 image with a resolution of 638x480 that we want to align to 16 bytes. Each row holds 638 × 3 = 1914 bytes, and 1914 / 16 = 119.625 is not an integer, so the row is not 16-byte aligned. We pad 6 bytes at the end of each row, which is the same as widening the row from 638 to 640 pixels: 640 × 3 = 1920 and 1920 / 16 = 120. The stride is then 1920 bytes.
Take a YUV420P image with a resolution of 638x480 that we want to align to 16 bytes. 638 is not divisible by 16, so we pad 2 bytes at the end of each row of the Y plane to reach 640. The Y stride is then 640 bytes.
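Both alignment examples come down to rounding the row size up to the next multiple of the alignment. A minimal sketch, assuming 8-bit samples; `aligned_stride` and `pixel_offset` are illustrative names, not FFmpeg functions.

```python
def aligned_stride(width_px: int, bytes_per_pixel: int,
                   alignment: int = 16) -> int:
    """Row size in bytes, rounded up to a multiple of alignment."""
    row_bytes = width_px * bytes_per_pixel
    return (row_bytes + alignment - 1) // alignment * alignment


def pixel_offset(x: int, y: int, stride: int, bytes_per_pixel: int) -> int:
    """Byte offset of pixel (x, y) in a buffer with padded rows."""
    return y * stride + x * bytes_per_pixel


rgb24_stride = aligned_stride(638, 3)   # RGB24: 3 bytes per pixel
y_stride = aligned_stride(638, 1)       # Y plane: 1 byte per pixel
print(rgb24_stride, y_stride)
print(pixel_offset(0, 1, rgb24_stride, 3))  # first pixel of the second row
```

The second function shows why stride matters in practice: moving down one row means advancing by the stride, not by the image width.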
2 YUV overview
2.1 Definition of YUV
In YUV, the "Y" component represents luminance (that is, the grayscale value) and the "UV" components represent chrominance: "U" is the blue color difference and "V" is the red color difference.
Representing luminance Y separately from chrominance UV has two benefits:
1、 The components do not interfere with each other. Y alone is enough to display a complete black-and-white picture, which solved the compatibility problem between black-and-white and color TV.
2、 Lowering the sampling rate of the chrominance (UV) components does not affect image quality much, and it reduces the bandwidth needed to transmit the video signal. Lowering the UV sampling rate therefore saves network traffic and indirectly reduces video latency.
2.2 YUV formats
YUV is a general term. Depending on how the components are arranged in memory, it is divided into specific formats:
1、 Packed format: the Y, U, and V components of each pixel are interleaved, and the pixels are stored consecutively in a single array. Usually several adjacent pixels form a macro pixel.
In plain terms, packed mode bundles several small items together into one larger unit.
2、 Planar format: three arrays store the Y, U, and V components separately and contiguously, one component per array.
In plain terms, planar mode lays out the Y, U, and V components as separate tiles.
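The difference between the two layouts is easiest to see on a tiny buffer. The sketch below assumes a simple packed 4:4:4 layout with 3 bytes per pixel in Y, U, V order; real packed formats such as YUYV subsample chroma and interleave differently. Both function names are made up for illustration.

```python
def packed_to_planar(packed: bytes) -> bytes:
    """[Y0 U0 V0 Y1 U1 V1 ...] -> [Y0 Y1 ... U0 U1 ... V0 V1 ...]."""
    return packed[0::3] + packed[1::3] + packed[2::3]


def planar_to_packed(planar: bytes) -> bytes:
    """Inverse of packed_to_planar for the same 4:4:4 layout."""
    n = len(planar) // 3
    y, u, v = planar[:n], planar[n:2 * n], planar[2 * n:]
    return bytes(sample for triple in zip(y, u, v) for sample in triple)


# Two pixels: (Y=1, U=2, V=3) and (Y=4, U=5, V=6).
packed = bytes([1, 2, 3, 4, 5, 6])
planar = packed_to_planar(packed)
print(list(planar))
```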
2.3 YUV sampling
The main sampling schemes:
1、 YUV 4:4:4 sampling: the chroma channels are not downsampled, so each Y component has its own U component and V component.
2、 YUV 4:2:2 sampling: 2:1 horizontal downsampling and no vertical downsampling, so every two Y components share one U component and one V component.
3、 YUV 4:2:0 sampling: 2:1 horizontal downsampling and 2:1 vertical downsampling, so every four Y components share one U component and one V component.
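The three schemes translate directly into frame sizes. A minimal sketch, assuming 8 bits per sample and no row padding; `frame_bytes` is a hypothetical helper.

```python
# (horizontal divisor, vertical divisor) applied to each chroma plane.
CHROMA_DIVISORS = {
    "4:4:4": (1, 1),
    "4:2:2": (2, 1),
    "4:2:0": (2, 2),
}


def frame_bytes(width: int, height: int, scheme: str) -> int:
    """Size in bytes of one frame: a Y plane plus U and V planes."""
    h_div, v_div = CHROMA_DIVISORS[scheme]
    chroma_w = (width + h_div - 1) // h_div   # round up for odd sizes
    chroma_h = (height + v_div - 1) // v_div
    return width * height + 2 * chroma_w * chroma_h


for scheme in CHROMA_DIVISORS:
    print(scheme, frame_bytes(1920, 1080, scheme))
```

For a 1920 x 1080 frame, 4:2:0 needs half the bytes of 4:4:4, which is the bandwidth saving described in section 2.1.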

2.4 YUV data storage in FFmpeg
1、 I444 (YUV444P) format
The corresponding FFmpeg pixel format is AV_PIX_FMT_YUV444P. It is a planar format.
2、 I422 (YUV422P) format
The corresponding FFmpeg pixel format is AV_PIX_FMT_YUV422P. It is a planar format.
3、 4:2:0 format YUV420P
The corresponding FFmpeg pixel format is AV_PIX_FMT_YUV420P (that is, I420). It is a planar format and takes (4+1+1)/4 = 1.5 bytes per pixel.
4、 4:2:0 format NV12
The corresponding FFmpeg pixel format is AV_PIX_FMT_NV12. It has a Y plane followed by a single plane of interleaved U and V samples.
5、 Differences in memory layout
The difference between YV12 and I420:
YYYYYYYY VV UU (YV12)
YYYYYYYY UU VV (I420)
The order of the U and V planes is swapped.
The difference between NV12 and NV21:
YYYYYYYY UV UV (NV12)
YYYYYYYY VU VU (NV21)
The positions of U and V within each interleaved pair are swapped.
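The layouts above can be converted into one another by slicing. This is a sketch under simplifying assumptions: 8-bit samples, even width and height, and tightly packed rows with no stride padding. The function names are illustrative; production code would normally use swscale or libyuv.

```python
def nv12_to_i420(nv12: bytes, width: int, height: int) -> bytes:
    """YYYY... UVUV... (NV12) -> YYYY... UU... VV... (I420)."""
    y_size = width * height
    y_plane = nv12[:y_size]
    uv_plane = nv12[y_size:]
    return y_plane + uv_plane[0::2] + uv_plane[1::2]


def i420_to_yv12(i420: bytes, width: int, height: int) -> bytes:
    """Swap the U and V planes: YYYY... UU VV -> YYYY... VV UU."""
    y_size = width * height
    c_size = y_size // 4
    y_plane = i420[:y_size]
    u_plane = i420[y_size:y_size + c_size]
    v_plane = i420[y_size + c_size:]
    return y_plane + v_plane + u_plane


# A 4x2 frame: 8 luma samples, then U/V samples tagged 100.. and 200..
nv12 = bytes(range(8)) + bytes([100, 200, 101, 201])
i420 = nv12_to_i420(nv12, 4, 2)
yv12 = i420_to_yv12(i420, 4, 2)
print(list(i420[8:]), list(yv12[8:]))
```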
2.5 Conversion between RGB and YUV
Conversion between RGB and YUV is usually done by calling a library such as FFmpeg's swscale or libyuv.
YUV (256 levels) can be computed directly from 8-bit RGB:
Y =  0.299*R + 0.587*G + 0.114*B
U = -0.169*R - 0.331*G + 0.5  *B + 128
V =  0.5  *R - 0.419*G - 0.081*B + 128
For 8-bit depth:
TV range is 16~235 (Y) and 16~240 (UV), also called Limited Range.
PC range is 0~255, also called Full Range.
RGB makes no such range distinction here; it uses the full 0~255.
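To illustrate the two ranges, the sketch below stretches limited-range samples to full range. The linear scaling used here (219 luma steps and 224 chroma steps mapped onto 255) is the usual convention and is an addition for illustration, not something stated in these notes; the function names are hypothetical.

```python
def clamp8(value: float) -> int:
    """Round to an integer and clamp to the 8-bit range 0..255."""
    return max(0, min(255, round(value)))


def luma_limited_to_full(y: int) -> int:
    """Map limited-range luma (16..235) onto full range (0..255)."""
    return clamp8((y - 16) * 255 / 219)


def chroma_limited_to_full(c: int) -> int:
    """Map limited-range chroma (16..240) onto full range, keeping 128 fixed."""
    return clamp8((c - 128) * 255 / 224 + 128)


print(luma_limited_to_full(16), luma_limited_to_full(235))
print(chroma_limited_to_full(128), chroma_limited_to_full(240))
```

Mixing up the two ranges is a common cause of washed-out or overly dark video, because black at 16 is displayed as dark gray when interpreted as full range.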
Conversely, RGB can be computed directly from YUV (256 levels):
R = Y + 1.402 * (V - 128)
G = Y - 0.34414 * (U - 128) - 0.71414 * (V - 128)
B = Y + 1.772 * (U - 128)
When converting from YUV to RGB, clamp the result: a value below 0 becomes 0, and a value above 255 becomes 255.
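As a rough sketch of the full-range conversion in both directions, the code below uses the coefficients from this section with a +128 offset on U and V, so that neutral gray has U = V = 128. It works on a single pixel and is meant only to show the arithmetic and the clamping step; the function names are illustrative, and real code would use swscale or libyuv.

```python
def clamp8(value: float) -> int:
    """Round to an integer and clamp to the 8-bit range 0..255."""
    return max(0, min(255, round(value)))


def rgb_to_yuv(r: int, g: int, b: int) -> tuple:
    """Full-range RGB -> YUV for one pixel."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.169 * r - 0.331 * g + 0.5 * b + 128
    v = 0.5 * r - 0.419 * g - 0.081 * b + 128
    return clamp8(y), clamp8(u), clamp8(v)


def yuv_to_rgb(y: int, u: int, v: int) -> tuple:
    """Full-range YUV -> RGB for one pixel, with clamping."""
    r = y + 1.402 * (v - 128)
    g = y - 0.34414 * (u - 128) - 0.71414 * (v - 128)
    b = y + 1.772 * (u - 128)
    return clamp8(r), clamp8(g), clamp8(b)


print(rgb_to_yuv(255, 255, 255))
print(yuv_to_rgb(*rgb_to_yuv(200, 100, 50)))
```

Because of rounding and the truncated coefficients, a round trip can differ from the input by a level or two per channel.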
Question: why does a green screen appear when decoding fails?
Analysis: when decoding fails, all YUV components are filled with 0. Substituting Y = U = V = 0 into the YUV-to-RGB formulas gives:
R = 0 + 1.402 * (0 - 128) = -179.456
G = 0 - 0.34414 * (0 - 128) - 0.71414 * (0 - 128) = 44.04992 + 91.40992 = 135.45984
B = 0 + 1.772 * (0 - 128) = -226.816
RGB values are limited to the range [0, 255], so after clamping the result is:
R = 0, G = 135.45984, B = 0. Only the G component is nonzero, so the picture is green.
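The green-screen result can be reproduced numerically. This sketch keeps the unclamped intermediate values visible; `yuv_to_rgb_raw` is a hypothetical helper that assumes full-range YUV with U and V centered on 128.

```python
def yuv_to_rgb_raw(y: float, u: float, v: float) -> tuple:
    """YUV -> RGB without clamping, to expose out-of-range values."""
    r = y + 1.402 * (v - 128)
    g = y - 0.34414 * (u - 128) - 0.71414 * (v - 128)
    b = y + 1.772 * (u - 128)
    return r, g, b


def clamp(value: float) -> float:
    """Limit a value to the range [0, 255]."""
    return max(0.0, min(255.0, value))


raw = yuv_to_rgb_raw(0, 0, 0)            # a zero-filled frame
green = tuple(clamp(c) for c in raw)
black = tuple(clamp(c) for c in yuv_to_rgb_raw(0, 128, 128))
print(raw)
print(green, black)
```

The comparison with (0, 128, 128) shows that a black frame needs neutral chroma of 128, not 0, which is why a zero-filled buffer does not look black.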
3 Audio and video basics
3.1 How audio and video recording works

3.2 How audio and video playback works

3.3 I, P, and B frames



3.4 Common video compression algorithms
MPEG-2: MPEG camp
H.264: MPEG camp
H.265: MPEG camp
AVS: Chinese camp
VP8: Google camp
VP9: Google camp
3.5 Container formats
A container format (also called an encapsulation or package format) puts the encoded and compressed video stream, audio stream, and subtitles into one file according to a defined scheme, so that player software can play it.
Generally speaking, the file extension of a video file indicates its container format.
Different container formats use different extensions.
An analogy: the same filling can be wrapped into dumplings or steamed buns. Video works the same way: the same audio and video streams can be carried in different containers.

In the example file, stream 0 is a video stream compressed with H.264, and stream 1 is an audio stream compressed with MP3.
Common video container formats and file extensions:
AVI, MKV, MPE, MPG, MPEG
MP4, WMV, MOV, 3GP
M2V, M1V, M4V, OGM
RM, RMS, RMM, RMVB, IFO
SWF, FLV, F4V
ASF, PMF, XMB, DIVX, PART
DAT, VOB, M2TS, TS, PS
Of these, H.264 + AAC packaged in FLV or MP4 is the most popular combination.
3.6 Audio and video synchronization
Concepts used in audio and video synchronization:
DTS (Decoding Time Stamp): the decoding timestamp, which tells the player when to decode this frame's data.
PTS (Presentation Time Stamp): the presentation timestamp, which tells the player when to display this frame's data.
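DTS and PTS differ when a stream contains B frames, because a B frame can only be decoded after the later frame it references. The sketch below uses made-up timestamp values for a four-frame sequence to show the reordering; the `Packet` type is hypothetical and not an FFmpeg structure.

```python
from typing import NamedTuple


class Packet(NamedTuple):
    frame_type: str
    dts: int  # when to decode
    pts: int  # when to display


# Packets as they arrive, in decode (DTS) order. Values are illustrative.
decode_order = [
    Packet("I", dts=0, pts=1),
    Packet("P", dts=1, pts=4),
    Packet("B", dts=2, pts=2),
    Packet("B", dts=3, pts=3),
]

# The player displays frames in PTS order.
display_order = sorted(decode_order, key=lambda p: p.pts)
print([p.frame_type for p in decode_order])
print([p.frame_type for p in display_order])
```

The P frame is decoded second but displayed last, because both B frames depend on it.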
Audio and video synchronization modes:
Audio Master: sync video to audio.
Video Master: sync audio to video.
External Clock Master: sync both audio and video to an external clock.
In general, the order of preference is Audio Master > External Clock Master > Video Master.
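A very reduced sketch of the Audio Master idea follows: compare each video frame's presentation time with the audio clock and decide whether to show, delay, or drop the frame. It assumes timestamps are counted in units of a stream time base (as FFmpeg does) and uses an arbitrary 40 ms tolerance; the function names and the threshold are illustrative, and real players use more elaborate logic.

```python
from fractions import Fraction


def pts_to_seconds(pts: int, time_base: Fraction) -> float:
    """Convert a timestamp counted in time_base units to seconds."""
    return float(pts * time_base)


def sync_video_to_audio(video_pts_s: float, audio_clock_s: float,
                        threshold_s: float = 0.04) -> str:
    """Decide what to do with a video frame, treating audio as the master."""
    diff = video_pts_s - audio_clock_s
    if diff > threshold_s:
        return "wait"   # video is ahead of audio: delay the frame
    if diff < -threshold_s:
        return "drop"   # video is behind audio: skip to catch up
    return "show"       # close enough: display now


video_time = pts_to_seconds(90000, Fraction(1, 90000))
print(video_time, sync_video_to_audio(video_time, audio_clock_s=1.01))
```

Audio is usually chosen as the master because the ear notices audio glitches more readily than the eye notices a dropped or delayed frame.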
To finish, here is a website that offers test videos in various formats for download:
https://sample-videos.com/
The site provides downloadable files in a range of audio and video formats, which is convenient for testing.