当前位置:网站首页>Principle and performance analysis of lepton lossless compression
Principle and performance analysis of lepton lossless compression
2022-07-05 12:56:00 【51CTO】
author :vivo Internet database team - Li Shihai
This paper mainly introduces the outline process and principle of lossless compression pictures , as well as Lepton The problems and solutions of lossless compression found in the previous research .
One 、 Start with a game
1.1 Game fault finding
Please take out your stopwatch to time , stay 15 Find out the difference between the following pictures in seconds .
 Lepton Principle and performance analysis of lossless compression _ lossless compression](/img/63/be42c0017a871376fc3ac5bbcf6d64.png)
time out , Did you find the difference between the two pictures ?
Two 、 The growth of the wise
In the game above , You may not find any difference between the two pictures , In fact, one of them is 3.7MB Of jpg Original drawing of format , The other one is of size 485KB Of jpg Format compressed pictures , It's just different sizes . You may be a little angry , Indignant to this is cheating , However, smart you soon have a series of questions in your brain , These question marks let you uncover the veil of the game layer by layer , No longer regret for fooling , Instead, get happiness from new knowledge .
2.1 Socrates: midwifery
- Why does the picture above become smaller ?
- Where is the lost information ?
- Why is the picture quality declining , I can't see it ?
- Can I make it smaller ?
- Can I restore it to its original size ?
- Why compress my pictures ?
Why is the picture above smaller ? Picture from 3.7MB become 485KB It's because I used the image viewing tool to save the original image as a new image , In the process of saving , There is a parameter for image quality selection , I chose the lowest quality , After saving, a smaller picture is generated . But the quality of the picture has declined , Why can't you see ? This requires understanding the principle of image compression .
2.2 Explore the story behind the appearance
Take advantage of the weakness of the human eye .
There are two kinds of cells in human retina , Cones and rods . Cone cells are used to sense color , Rod cells are used to sense brightness . And relative to color , Our perception of light and dark is more obvious . Therefore, color information can be compressed to reduce the size of the picture . So we will transform the color space before image compression ,JPEG Pictures are usually transformed into YCbCr Color space ,Y For brightness ,Cb Blue chroma ,Cr Red chroma , After the transformation, it is easier for us to deal with the color part . Then we cut a picture into pieces 8*8 Pixel block of , Then use the discrete cosine transform algorithm (DCT) Calculate the high frequency region and low frequency region . Because the human eye is not sensitive to complex information in the high-frequency region , Therefore, this part can be compressed , This process is called quantification . Finally, package the new file . This process completes the image compression . The basic flow chart is as follows :
JPEG Compression is lossy .
In the above process , After the color space conversion of the prediction module , By discarding some color density information , Increase the compression ratio . Common options are 4:2:0, After this step, we need 8 Information represented by numbers , Now it's just a matter of 2 individual , Directly abandoned 75% Of Cb Cr Information , However, this step is irreversible , This leads to the loss of image compression . In addition, in the entropy coding module , Will further use stroke length coding or Huffman Encoding further compresses the picture information , And the compression of this part is lossless , It's reversible .
 Lepton Principle and performance analysis of lossless compression _ lossless compression _03](/img/b0/e031e321fc47403f448d0f422dfaf1.png)
(YCbCr Space conversion ) The principle of Huffman coding is as follows : If the total number of characters to be encoded is 38 Symbol data , Count them , The symbols and corresponding frequencies obtained are shown in the following table :
 Lepton Principle and performance analysis of lossless compression _ lossless compression _04](/img/8b/71ddcea07ecf8f3e4ca5c5e6bd9b24.png)
First , Sort all symbols by frequency , After sorting, see the following figure :
 Lepton Principle and performance analysis of lossless compression _lepton_05](/img/c6/bb1314ea93960a73031cb0bf3b3484.png)
then , Select two leaf nodes with the lowest frequency , The one with the least frequency is the left child node , The other is the right child node , The root node is the sum of the frequencies of the two leaf nodes .
 Lepton Principle and performance analysis of lossless compression _ lossless compression _06](/img/f7/b04a436737a00469e1c47fb7c1c3d0.png)
 Lepton Principle and performance analysis of lossless compression _ Object storage _07](/img/5e/b0ca0ba3e809d0e3c40409d2e2a234.png)
 Lepton Principle and performance analysis of lossless compression _lepton_08](/img/7d/d0d596f14cec175672a59c11238ff0.png)
(Huffman Trees ) Go through the above steps , It forms a Huffman Trees ,Huffman Coding is often used in lossless compression , Its basic idea is to use short encoding to represent characters with high frequency , Use long encoding to represent characters with low frequency , This makes the average length of the encoded string 、 The expectation of length decreases , So as to achieve the purpose of compression .
3、 ... and 、 The protagonist of the story Lepton
Not perfect .
above JPEG Although the compression reduces the size of the image and the quality is good, it is difficult for the human eye to distinguish the differences , But because of the lossy compression , The image quality cannot be restored to the original quality , And actually at this time jpg The picture still has compressed space . Lepton You can do it in JPEG On this basis, the image is further lossless compressed .
3.1 Why choose Lepton
And lepton Similar compression tools include jpegcan,MozJPEG,PackJPG,PAQ8PX. But these tools have some defects more or less , Make it inferior to lepton More suitable for industrial production . such as PackJPG You need to rearrange all compressed pixel values in the file in the order of global sorting . This means that decompression is single threaded , At the same time, the whole image needs to be put into memory, which leads to high delay and low throughput of image processing . The picture below is lepton The comparison of several tools in the paper :
 Lepton Principle and performance analysis of lossless compression _ The server _09](/img/fa/94cea5ab96a985a5a34714bec6e8b6.png)
3.2 Lepton What optimizations have been made .
First, on the algorithm Lepton Divide the image into two parts header And the image data itself ,header Use DEFLATE Do lossless compression , The image itself uses arithmetic coding instead of Holman coding for lossless compression . because JPEG Use Huffman code , This makes it difficult to use multithreading ,Lepton Use "Huffman Switch word " Improved . secondly Lepton A complex adaptive probability model is used , This model was developed by testing on a large number of field images . The goal of this model is to produce the most accurate prediction of the value of each coefficient , This produces smaller files ; Allow multithreading concurrent processing in Engineering , Allow distributed processing across multiple servers in chunks , Stream processing line by line effectively controls the memory , At the same time, it also ensures the safety of data reading and output . It is Lepton In the optimization of the above key issues , So it can be used in production environment .
3.3 Lepton stay vivo Exploration in storage
Expected earnings :
At present, there are about 100PB data , Among them, picture data accounts for about 70%, And in the picture 90% All the pictures are jpeg Type picture , If on average 23% Compression ratio of , that 100PB * 70% * 90% * 23% = 14.5PB, Will achieve approximately 14.5PB Cost savings . At the same time, because it is lossless compression , The user experience is well guaranteed . At present lepton The design of compression function is shown in the figure below :
Current challenges :
- lepton Compression and decompression require high computing performance of the server 、 High consumption .
- Expect to make full use of idle servers CPU resources , To achieve the purpose of reducing cost and increasing efficiency .
- It has the ability to dynamically expand and shrink capacity in the face of tidal phenomena .
The main problems faced at present :
At present, most of the pictures are in 4M-5M, Tested for 4M-5M The file compression delay of size is 1s Left and right , The server is required at least 16 The core 、 bearing 5QPS. At this time, the utilization rate of each core is 95% above . so Lepton Compression of requires high computing performance . The current common solution is to use FPGA Card for hardware acceleration 、 And the horizontal expansion of a large number of computing nodes .FPGA The use of will increase the cost of hardware , Reduce the costs and benefits of compression .
Solution :
In order to solve the above problems and challenges , We try to use physical servers and Kubernetes The hybrid deployment method solves the problems of the use of computing resources and dynamic expansion , The schematic diagram of the structure is as follows : Lepton Principle and performance analysis of lossless compression _ The server _11](/img/6f/50c77170ff3df2242cdc50e352e399.jpg)
The management of physical servers and the expansion of capacity are elastically expanded through the registration and discovery of services 、 Through this cgroup/Taskset And so on cpu Use to manage . At the same time, connect and use Kubernetes Manage as a container 、 The flexibility of the container is more suitable for this kind of computing service .
3.4 performance evaluation
Whether synchronous compression , Or asynchronous compression , Usually, we pay more attention to the delay of image reading . Reading a large number of pictures will bring great pressure to the server , The pressure mainly comes from the image decompression calculation . To improve decompression efficiency , And make full use of the company's resources , We will lepton Compressed services are distributed in cpu Idle server , According to the degree of resource idleness , free time , Make full use of the peak and valley of resources to improve computing performance . Piezometric data : We selected image files of different sizes , Compression and decompression tests are carried out in a stand-alone environment , The test results are as follows :
 Lepton Principle and performance analysis of lossless compression _ The server _12](/img/ef/07a954f0d6e09f759c35b09eda2855.png)
The compression ratio is kept at 22% about .
 Lepton Principle and performance analysis of lossless compression _ Picture compression _13](/img/4e/95bde36b601f92777e951791165e08.png)
The above figure shows the time scale of file compression and decompression of different sizes , Orange is the decompression time , Blue is compression time .
 Lepton Principle and performance analysis of lossless compression _lepton_14](/img/e1/aeef0eeccbf6473276944cf0cdba68.png)
The above picture shows pictures of different sizes , stay 32 Threads concurrent , Each thread processes 100 Test data of files .
Four 、 Common problems of image compression
4.1 Distinguish lossy and lossless compression by file format
 Lepton Principle and performance analysis of lossless compression _ The server _15](/img/ff/71004d3d5bad249e3c8c5d50e49dac.png)
4.2 Common lossless compression algorithms
 Lepton Principle and performance analysis of lossless compression _lepton_16](/img/5f/e3429fc2bc18864ebc9789b564f9e0.png)
5、 ... and 、 summary
Lepton Lossless compression can provide a relatively high compression ratio , At the same time, it does not affect the user's image quality and use experience 、 In the scenario of large amount of data, you will get obvious benefits .
deficiencies It requires high computing performance 、 Only support jpeg Type of picture . For performance requirements, the industry also has relatively mature solutions , As mentioned above FPGA And elastic calculation scheme . The key is to choose a reasonable plan according to the needs of the enterprise .
quote :
- 《The Design, Implementation, and Deployment of a System to Transparently Compress Hundreds of Petabytes of Image Files For a File-Storage Service》
- 《 Based on deep learning JPEG Research on image cloud storage 》
- 《JPEG-Lepton Key modules of compression technology VLSI Research on structural design 》
边栏推荐
- 前几年外包干了四年,秋招感觉人生就这样了..
- Taobao short videos are automatically released in batches without manual RPA open source
- 跨平台(32bit和64bit)的 printf 格式符 %lld 输出64位的解决方式
- leetcode:221. 最大正方形【dp状态转移的精髓】
- 滴滴开源DELTA:AI开发者可轻松训练自然语言模型
- 2021-12-21 transaction record
- 开发者,云原生数据库是未来吗?
- Using docker for MySQL 8.0 master-slave configuration
- 10 minute fitness method reading notes (2/5)
- Compile kernel modules separately
猜你喜欢

SAP UI5 FlexibleColumnLayout 控件介绍

MySQL giant pit: update updates should be judged with caution by affecting the number of rows!!!

Lepton 无损压缩原理及性能分析

Taobao flag insertion remarks | logistics delivery interface

CVPR 2022 | single step 3D target recognizer based on sparse transformer

谈谈我写作生涯的画图技巧

CVPR 2022 | 基于稀疏 Transformer 的单步三维目标识别器

简单上手的页面请求和解析案例

How to connect the API interface of Taobao open platform (super detailed)

SAP UI5 视图里的 OverflowToolbar 控件
随机推荐
A possible investment strategy and a possible fuzzy fast stock valuation method
10 minute fitness method reading notes (2/5)
[Nacos cloud native] the first step of reading the source code is to start Nacos locally
Introduction aux contrôles de la page dynamique SAP ui5
滴滴开源DELTA:AI开发者可轻松训练自然语言模型
SAP UI5 ObjectPageLayout 控件使用方法分享
在家庭智能照明中应用的测距传感芯片4530A
insmod 提示 Invalid module format
Time conversion error
10 minute fitness method reading notes (1/5)
【云原生】Nacos-TaskManager 任务管理的使用
RHCAS6
SAP SEGW 事物码里的 Association 建模方式
Notes for preparation of information system project manager --- information knowledge
946. Verify stack sequence
Kotlin process control and circulation
Comprehensive upgrade of Taobao short video photosynthetic platform
OPPO小布推出预训练大模型OBERT,晋升KgCLUE榜首
#yyds干货盘点# 解决名企真题:搬圆桌
Simply take stock reading notes (3/8)