author :vivo Internet database team - Li Shihai
This paper mainly introduces the outline process and principle of lossless compression pictures , as well as Lepton The problems and solutions of lossless compression found in the previous research .
One 、 Start with a game
1.1 Game fault finding
Please take out your stopwatch to time , stay 15 Find out the difference between the following pictures in seconds .
time out , Did you find the difference between the two pictures ?
Two 、 The growth of the wise
In the game above , You may not find any difference between the two pictures , In fact, one of them is 3.7MB Of jpg Original drawing of format , The other one is of size 485KB Of jpg Format compressed pictures , It's just different sizes . You may be a little angry , Indignant to this is cheating , However, smart you soon have a series of questions in your brain , These question marks let you uncover the veil of the game layer by layer , No longer regret for fooling , Instead, get happiness from new knowledge .
2.1 Socrates: midwifery
- Why does the picture above become smaller ?
- Where is the lost information ?
- Why is the picture quality declining , I can't see it ?
- Can I make it smaller ?
- Can I restore it to its original size ?
- Why compress my pictures ?
Why is the picture above smaller ? Picture from 3.7MB become 485KB It's because I used the image viewing tool to save the original image as a new image , In the process of saving , There is a parameter for image quality selection , I chose the lowest quality , After saving, a smaller picture is generated . But the quality of the picture has declined , Why can't you see ? This requires understanding the principle of image compression .
2.2 Explore the story behind the appearance
Take advantage of the weakness of the human eye .
There are two kinds of cells in human retina , Cones and rods . Cone cells are used to sense color , Rod cells are used to sense brightness . And relative to color , Our perception of light and dark is more obvious .
Therefore, color information can be compressed to reduce the size of the picture .
So we will transform the color space before image compression ,JPEG Pictures are usually transformed into YCbCr Color space ,Y For brightness ,Cb Blue chroma ,Cr Red chroma , After the transformation, it is easier for us to deal with the color part . Then we cut a picture into pieces 8*8 Pixel block of , Then use the discrete cosine transform algorithm (DCT) Calculate the high frequency region and low frequency region .
Because the human eye is not sensitive to complex information in the high-frequency region , Therefore, this part can be compressed , This process is called quantification . Finally, package the new file . This process completes the image compression .
The basic flow chart is as follows :
JPEG Compression is lossy .
In the above process , After the color space conversion of the prediction module , By discarding some color density information , Increase the compression ratio . Common options are 4:2:0, After this step, we need 8 Information represented by numbers , Now it's just a matter of 2 individual , Directly abandoned 75% Of Cb Cr Information , However, this step is irreversible , This leads to the loss of image compression . In addition, in the entropy coding module , Will further use stroke length coding or Huffman Encoding further compresses the picture information , And the compression of this part is lossless , It's reversible .
(YCbCr Space conversion )
The principle of Huffman coding is as follows :
If the total number of characters to be encoded is 38 Symbol data , Count them , The symbols and corresponding frequencies obtained are shown in the following table :
First , Sort all symbols by frequency , After sorting, see the following figure :
then , Select two leaf nodes with the lowest frequency , The one with the least frequency is the left child node , The other is the right child node , The root node is the sum of the frequencies of the two leaf nodes .
(Huffman Trees )
Go through the above steps , It forms a Huffman Trees ,Huffman Coding is often used in lossless compression , Its basic idea is to use short encoding to represent characters with high frequency , Use long encoding to represent characters with low frequency , This makes the average length of the encoded string 、 The expectation of length decreases , So as to achieve the purpose of compression .
3、 ... and 、 The protagonist of the story Lepton
Not perfect .
above JPEG Although the compression reduces the size of the image and the quality is good, it is difficult for the human eye to distinguish the differences , But because of the lossy compression , The image quality cannot be restored to the original quality , And actually at this time jpg The picture still has compressed space .
Lepton You can do it in JPEG On this basis, the image is further lossless compressed .
3.1 Why choose Lepton
And lepton Similar compression tools include jpegcan,MozJPEG,PackJPG,PAQ8PX. But these tools have some defects more or less , Make it inferior to lepton More suitable for industrial production .
such as PackJPG You need to rearrange all compressed pixel values in the file in the order of global sorting . This means that decompression is single threaded , At the same time, the whole image needs to be put into memory, which leads to high delay and low throughput of image processing .
The picture below is lepton The comparison of several tools in the paper :
3.2 Lepton What optimizations have been made .
First, on the algorithm Lepton Divide the image into two parts header And the image data itself ,header Use DEFLATE Do lossless compression , The image itself uses arithmetic coding instead of Holman coding for lossless compression . because JPEG Use Huffman code , This makes it difficult to use multithreading ,Lepton Use "Huffman Switch word " Improved .
secondly Lepton A complex adaptive probability model is used , This model was developed by testing on a large number of field images . The goal of this model is to produce the most accurate prediction of the value of each coefficient , This produces smaller files ; Allow multithreading concurrent processing in Engineering , Allow distributed processing across multiple servers in chunks , Stream processing line by line effectively controls the memory , At the same time, it also ensures the safety of data reading and output .
It is Lepton In the optimization of the above key issues , So it can be used in production environment .
3.3 Lepton stay vivo Exploration in storage
Expected earnings :
At present, there are about 100PB data , Among them, picture data accounts for about 70%, And in the picture 90% All the pictures are jpeg Type picture , If on average 23% Compression ratio of , that 100PB * 70% * 90% * 23% = 14.5PB, Will achieve approximately 14.5PB Cost savings .
At the same time, because it is lossless compression , The user experience is well guaranteed . At present lepton The design of compression function is shown in the figure below :
Current challenges :
- lepton Compression and decompression require high computing performance of the server 、 High consumption .
- Expect to make full use of idle servers CPU resources , To achieve the purpose of reducing cost and increasing efficiency .
- It has the ability to dynamically expand and shrink capacity in the face of tidal phenomena .
The main problems faced at present :
At present, most of the pictures are in 4M-5M, Tested for 4M-5M The file compression delay of size is 1s Left and right , The server is required at least 16 The core 、 bearing 5QPS. At this time, the utilization rate of each core is 95% above . so Lepton Compression of requires high computing performance . The current common solution is to use FPGA Card for hardware acceleration 、 And the horizontal expansion of a large number of computing nodes .FPGA The use of will increase the cost of hardware , Reduce the costs and benefits of compression .
Solution :
In order to solve the above problems and challenges , We try to use physical servers and Kubernetes The hybrid deployment method solves the problems of the use of computing resources and dynamic expansion , The schematic diagram of the structure is as follows :
The management of physical servers and the expansion of capacity are elastically expanded through the registration and discovery of services 、 Through this cgroup/Taskset And so on cpu Use to manage . At the same time, connect and use Kubernetes Manage as a container 、 The flexibility of the container is more suitable for this kind of computing service .
3.4 performance evaluation
Whether synchronous compression , Or asynchronous compression , Usually, we pay more attention to the delay of image reading . Reading a large number of pictures will bring great pressure to the server , The pressure mainly comes from the image decompression calculation . To improve decompression efficiency , And make full use of the company's resources , We will lepton Compressed services are distributed in cpu Idle server , According to the degree of resource idleness , free time , Make full use of the peak and valley of resources to improve computing performance .
Piezometric data :
We selected image files of different sizes , Compression and decompression tests are carried out in a stand-alone environment , The test results are as follows :
The compression ratio is kept at 22% about .
The above figure shows the time scale of file compression and decompression of different sizes , Orange is the decompression time , Blue is compression time .
The above picture shows pictures of different sizes , stay 32 Threads concurrent , Each thread processes 100 Test data of files .
Four 、 Common problems of image compression
4.1 Distinguish lossy and lossless compression by file format
4.2 Common lossless compression algorithms
5、 ... and 、 summary
Lepton Lossless compression can provide a relatively high compression ratio , At the same time, it does not affect the user's image quality and use experience 、 In the scenario of large amount of data, you will get obvious benefits .
deficiencies It requires high computing performance 、 Only support jpeg Type of picture . For performance requirements, the industry also has relatively mature solutions , As mentioned above FPGA And elastic calculation scheme . The key is to choose a reasonable plan according to the needs of the enterprise .
quote :
- 《The Design, Implementation, and Deployment of a System to Transparently Compress Hundreds of Petabytes of Image Files For a File-Storage Service》
- 《 Based on deep learning JPEG Research on image cloud storage 》
- 《JPEG-Lepton Key modules of compression technology VLSI Research on structural design 》
Lepton More related articles on the principle and performance analysis of lossless compression
- MYSQL Index structure principle 、 Performance analysis and optimization
[ turn ]MYSQL Index structure principle . Performance analysis and optimization The first part : Basic knowledge of Indexes The official introduction index is to help MySQL Data structure for efficient data acquisition . I understand that the index is equivalent to the catalogue of a book , You need to know where the information is through the catalog , Not page by page ...
- 【 turn 】 Explore from the simple to the deep mysql Index structure principle 、 Performance analysis and optimization
Abstract : The first part : Basic knowledge of The second part :MYISAM and INNODB Index structure 1. Brief introduction B-tree B+ tree Trees 2.MyisAM Index structure 3.Annode Index structure 4.MyisAM Index and Inno ...
- DDS Working principle and performance analysis
DDS Working principle and performance analysis Statement : Please indicate the source of the quotation http://blog.csdn.net/lg1259156776/ Series blog description : This series of blogs belongs to the author's knowledge about electronic circuit design and other hardware stored in the third and fourth stages of his freshman year ...
- PHP Function implementation principle and performance analysis
Preface In any language , Functions are the most basic constituent units . about php Function of , What are its characteristics ? How function calls are implemented ?php What is the performance of the function , Any suggestions ? This paper will start from the principle analysis, combined with the actual performance test, try to solve these problems ...
- PHP Foundation Series ( 3、 ... and ) 【 turn 】PHP Function implementation principle and performance analysis
author :HDK ( Baidu ) Preface In any language , Functions are the most basic constituent units . about PHP Function of , What are its characteristics ? How function calls are implemented ?php What is the performance of the function , Any suggestions ? This paper will analyze from the principle and combine with the reality ...
- ( turn )PHP Function implementation principle and performance analysis
Preface In any language , Functions are the most basic constituent units . about php Function of , What are its characteristics ? How function calls are implemented ?php What is the performance of the function , Any suggestions ? this paper This paper will analyze these problems from the principle, combined with the actual performance test ...
- DevTools Implementation principle and performance analysis
One . introduction from 2008 year Google Release the first version of Chrome after , Whole Web The development field seems to be injected with fresh blood , Gradually broken IE The era of a dominant family .Chrome and Firefox yes W3 ...
- Explore from the simple to the deep mysql Index structure principle 、 Performance analysis and optimization turn
The first part : Basic knowledge of The second part :MYISAM and INNODB Index structure 1. Brief introduction B-tree B+ tree Trees 2. MyisAM Index structure 3. Annode Index structure 4. MyisAM Index and Inno ...
- Explore from the simple to the deep mysql Index structure principle 、 Performance analysis and optimization
Abstract : The first part : Basic knowledge of The second part :MYISAM and INNODB Index structure 1. Brief introduction B-tree B+ tree Trees 2.MyisAM Index structure 3.Annode Index structure 4.MyisAM Index and Inno ...
- [ Reprint ] Explore from the simple to the deep mysql Index structure principle 、 Performance analysis and optimization
The first part : Basic knowledge part II :MYISAM and INNODB Index structure 1. Brief introduction B-tree B+ tree Trees 2. MyisAM Index structure 3. Annode Index structure 4. MyisAM Index and InnoDB ...
Random recommendation
- A quick victory (3) - PHP: Function basis , Function parameter , Function return value , Variable function , Anonymous functions , Closure function , Callback function
[ Source download ] A quick victory (3) - PHP: Function basis , Function parameter , Function return value , Variable function , Anonymous functions , Closure function , Callback function author :webabcd Introduce the quick fight and quick decision And PHP Function basis Function parameter Letter ...
- Deeply understand and apply XMLHttpRequest
This is a reprinted article , Happy to see the hunter , Worried about being lost , So post this for a rainy day . Original address : delivery You really can use XMLHttpRequest Do you ? xmlhttprequest http cors ajax ruoyiqing ...
- Hbase Installation (hadoop-2.6.0,hbase1.0)
Hbase The installation of is relatively simple ... As long as you install Hadoop loading Hbase It's just a matter of minutes If you want to install hadoop Word of the cluster hadoop The classified cluster is installed , If the stand-alone version has been installed ~ Then configure it as follows ~ One .vi ...
- -webkit-appearance Change the browser default style of any element
Some time ago , The company has an emergency press conference , You need to create an invitation page on the mobile terminal . But when implementing the drop-down box ,IOS The effect is always different from that of Android . After my search , I found... By chance -webkit-appearance This style attribute . after ...
- Memcached User manual
memcached brief introduction 1.memcached Is a high-performance distributed memory object caching system , By maintaining a unified giant in memory hash surface , It can be used to store data in various formats , Include images . video . File and database search results ...
- Python Beginners
Recommended by students , I've learned Python Language , see Python Introduction to , It is an object-oriented interpretive scripting language , When I first saw this sentence, I was thinking , A scripting language is also made object-oriented ? Is this necessary ? Forgive me for being superficial . It is also commonly known as ...
- SpringCloud Comb the overall modules of the micro framework series
The following is a Spring Cloud Core functions : Distributed / Versioned configuration services register and discover routing services and call load balancing circuit breaker distributed messaging between services Through this picture , Let's take a look at the running process of each component configuration : 1. Ask for uniform passage A ...
- Calculate the MD5 value
/// <summary> /// Calculate the MD5 value /// </summary> /// <param name="fileName"> ...
- well-known APP( Alipay 、 WeChat 、 Petals, etc ) Home page design skills and prototype examples
APP Home page design right APP In itself is crucial , An excellent APP product , Its home page design not only needs to clearly show the core functions of the product , Create a good user experience for users , But also need to show the company's brand image , Enhance brand awareness in the hearts of users . today , Just ...
- Use command wsimport Generate WebService client
Use command wsimport Generate WebService client wsimpost The command has several important parameters : -keep: Whether to generate java Source file -d: Specify output directory -s: Specify the source code output directory -p ...