Principle and performance analysis of lepton lossless compression
2022-07-05 12:56:00 【51CTO】
Author: vivo Internet database team - Li Shihai
This article outlines the process and principles of lossless image compression, along with the problems found during our preliminary evaluation of Lepton lossless compression and their solutions.
1. Start with a game
1.1 Spot the difference
Take out a stopwatch and give yourself 15 seconds to find the differences between the two pictures below.
Time's up. Did you find the difference between the two pictures?
2. The growth of the wise
In the game above, you probably did not find any difference between the two pictures. In fact, one of them is the 3.7MB original in jpg format and the other is a 485KB compressed jpg; they differ only in file size. You may be a little annoyed and feel cheated, but a series of questions soon forms in your mind. These questions let you peel back the game layer by layer, so that instead of resenting the trick you get the pleasure of learning something new.
2.1 The Socratic method
- Why did the picture above become smaller?
- Where did the lost information go?
- Why can't I see the drop in picture quality?
- Can I make it even smaller?
- Can I restore it to its original size?
- Why compress my pictures at all?
Why did the picture above become smaller? It went from 3.7MB to 485KB because I used an image viewer to save the original as a new image. The save dialog has an image quality parameter, and I chose the lowest quality, so a smaller picture was generated. But if the quality of the picture has dropped, why can't you see it? Answering that requires understanding how image compression works.
2.2 The story behind the appearance
Exploiting the weaknesses of the human eye.
The human retina contains two kinds of photoreceptor cells, cones and rods. Cones sense color and rods sense brightness, and we perceive differences in brightness more readily than differences in color. Color information can therefore be compressed to reduce the size of a picture. For this reason the color space is transformed before compression: JPEG pictures are usually converted to the YCbCr color space, where Y is luminance, Cb is blue chroma, and Cr is red chroma, which makes the color part easier to process separately. The picture is then cut into blocks of 8*8 pixels, and the discrete cosine transform (DCT) is applied to each block to separate its high-frequency and low-frequency components. Because the human eye is not sensitive to fine detail in the high-frequency region, that part can be stored with less precision; this step is called quantization. Finally the result is packaged into a new file, which completes the image compression. The basic flow chart is as follows:
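The color transform, DCT, and quantization steps described above can be sketched in a few lines of Python. This is a minimal illustration, not a JPEG encoder: the conversion uses the full-range JFIF formulas, while the sample block (a smooth gradient) and the quantization table `step` are assumptions made up for this example and are not the tables a real encoder ships with.

```python
import math

N = 8  # JPEG works on 8*8 blocks


def rgb_to_ycbcr(r, g, b):
    """Full-range JFIF conversion of one 8-bit RGB pixel to (Y, Cb, Cr)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an 8*8 block given as a list of 8 rows."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out


def quantize(coeffs, step):
    """Divide each coefficient by its step size and round to an integer."""
    return [[round(coeffs[u][v] / step[u][v]) for v in range(N)]
            for u in range(N)]


# Hypothetical luminance block: a smooth gradient, level-shifted by 128.
block = [[100 + 2 * x + y - 128 for y in range(N)] for x in range(N)]
# Hypothetical quantization table: coarser steps for higher frequencies.
step = [[8 + 4 * (u + v) for v in range(N)] for u in range(N)]

quantized = quantize(dct_2d(block), step)
nonzero = sum(1 for row in quantized for value in row if value != 0)
print(f"{nonzero} of {N * N} quantized coefficients are non-zero")
```

For a smooth block like this one, almost all quantized coefficients come out as zero, and those zeros are what the later entropy coding stage stores very cheaply.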
JPEG compression is lossy.
In the prediction module of the process above, after the color space conversion, part of the chroma information is discarded to raise the compression ratio. A common choice is 4:2:0 subsampling: for each 2*2 group of pixels, chroma information that used to take 8 numbers now takes only 2, so 75% of the Cb and Cr information is thrown away. This step is irreversible, which is what makes the compression lossy (quantization of the DCT coefficients likewise discards information). In the entropy coding module, run-length coding and Huffman coding then compress the picture information further; this part is lossless and reversible.
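The "8 numbers become 2" arithmetic can be checked with a small sketch. It assumes that 4:2:0 subsampling is done by averaging each 2*2 neighbourhood, which is one common choice (encoders may filter differently), and the chroma values are invented for the example.

```python
def subsample_420(plane):
    """Average each 2*2 neighbourhood of a chroma plane.

    Assumes the plane has even width and height.
    """
    height, width = len(plane), len(plane[0])
    return [[(plane[y][x] + plane[y][x + 1]
              + plane[y + 1][x] + plane[y + 1][x + 1]) / 4
             for x in range(0, width, 2)]
            for y in range(0, height, 2)]


def count_values(plane):
    return sum(len(row) for row in plane)


# Hypothetical chroma samples for one 2*2 group of pixels.
cb = [[100, 102], [104, 106]]
cr = [[140, 138], [136, 134]]

cb_sub = subsample_420(cb)
cr_sub = subsample_420(cr)

before = count_values(cb) + count_values(cr)        # 4 Cb + 4 Cr values
after = count_values(cb_sub) + count_values(cr_sub)  # 1 Cb + 1 Cr value
discarded = 1 - after / before
print(before, after, discarded)
```

The four luminance samples of the same 2*2 group are kept untouched, which is why the loss is hard to see.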
(YCbCr color space conversion)
The principle of Huffman coding is as follows. Suppose the data to be encoded consists of 38 symbols in total. Counting them gives the symbols and corresponding frequencies shown in the following table:
First, sort all symbols by frequency. The sorted result is shown in the following figure:
Then select the two nodes with the lowest frequencies. The one with the lower frequency becomes the left child and the other becomes the right child, and their parent node holds the sum of the two frequencies. The parent is put back among the remaining nodes and the step is repeated until only one node, the root, is left.
(Huffman tree)
These steps produce a Huffman tree. Huffman coding is widely used in lossless compression. Its basic idea is to give short codes to symbols with high frequency and long codes to symbols with low frequency, which lowers the average (expected) length of the encoded string and so achieves compression.
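The construction above can be sketched with a priority queue. The frequency table below is hypothetical: it was chosen only so that the counts add up to 38 symbols, and it is not the table from the original figure.

```python
import heapq
from itertools import count


def huffman_codes(freqs):
    """Build a Huffman code table from a {symbol: frequency} mapping.

    Symbols are assumed to be strings, so a tuple always marks an
    internal node of the tree.
    """
    tie_breaker = count()  # keeps heap entries comparable when frequencies tie
    heap = [(freq, next(tie_breaker), symbol) for symbol, freq in freqs.items()]
    heapq.heapify(heap)

    while len(heap) > 1:
        # The two lowest-frequency nodes become left and right children.
        freq_left, _, left = heapq.heappop(heap)
        freq_right, _, right = heapq.heappop(heap)
        heapq.heappush(
            heap, (freq_left + freq_right, next(tie_breaker), (left, right)))

    _, _, root = heap[0]
    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"  # single-symbol input still gets a code

    walk(root, "")
    return codes


# Hypothetical frequencies for 38 symbols in total.
freqs = {"A": 10, "B": 8, "C": 3, "D": 4, "E": 5, "F": 8}
codes = huffman_codes(freqs)
total_bits = sum(freqs[s] * len(codes[s]) for s in freqs)
fixed_bits = sum(freqs.values()) * 3  # 6 symbols need 3 bits each at fixed length
print(codes)
print(total_bits, "bits with Huffman codes versus", fixed_bits, "bits fixed-length")
```

Frequent symbols such as "A" end up with 2-bit codes while rare ones such as "C" get 4-bit codes, so the whole message is shorter than with a fixed 3-bit code.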
3. The protagonist of the story: Lepton
Not perfect.
The JPEG compression described above reduces the size of the image while keeping the quality good enough that the human eye can hardly tell the difference. But because the compression is lossy, the image can never be restored to its original quality, and in practice the resulting jpg file can still be compressed further. Lepton can take such a JPEG file and compress it further without any additional loss.
3.1 Why choose Lepton
Compression tools similar to lepton include JPEGrescan, MozJPEG, PackJPG, and PAQ8PX. Each of these tools has some shortcomings that make it less suitable than lepton for production use. PackJPG, for example, needs to rearrange all the compressed pixel values in the file into a globally sorted order. This means that decompression is single threaded and that the whole image has to be held in memory, which leads to high latency and low throughput. The picture below is the comparison of these tools from the lepton paper:
3.2 What optimizations Lepton makes
First, on the algorithm side, Lepton divides the image into two parts, the header and the image data itself. The header is losslessly compressed with DEFLATE, and the image data is losslessly compressed with arithmetic coding in place of Huffman coding. Because JPEG uses Huffman codes, it is hard to process a file with multiple threads; Lepton works around this with "Huffman handover words". Second, Lepton uses a sophisticated adaptive probability model that was developed by testing on a large number of real-world images. The goal of the model is to predict the value of each coefficient as accurately as possible, because better predictions produce smaller files. On the engineering side, Lepton allows concurrent multithreaded processing, allows an image to be processed in chunks distributed across multiple servers, and processes data as a stream, row by row, which keeps memory usage under control while also keeping data input and output safe. It is these optimizations of the key problems that make Lepton usable in a production environment.
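Why accurate predictions lead to smaller files can be illustrated with a toy model. The sketch below is not Lepton's probability model. It assumes a single adaptive counter over a hypothetical bit sequence and computes the ideal cost an arithmetic coder would pay, which is -log2(p) bits for a symbol predicted with probability p. A symbol-by-symbol Huffman code, by contrast, can never spend less than one whole bit per symbol.

```python
import math


def adaptive_code_length(bits):
    """Ideal arithmetic-coding cost, in bits, under a simple adaptive model.

    The model predicts each bit from the counts of 0s and 1s seen so far
    (both counts start at 1), then updates the counts.
    """
    counts = [1, 1]
    total = 0.0
    for bit in bits:
        probability = counts[bit] / (counts[0] + counts[1])
        total += -math.log2(probability)
        counts[bit] += 1
    return total


# Hypothetical, highly skewed input: 95 zeros and 5 ones.
bits = [0] * 95 + [1] * 5

arithmetic_cost = adaptive_code_length(bits)
huffman_cost = len(bits)  # at least one whole bit per symbol
print(f"adaptive arithmetic coding: about {arithmetic_cost:.1f} bits")
print(f"per-symbol Huffman coding:  {huffman_cost} bits")
```

The more confidently the model predicts the next value, the closer the cost per symbol gets to zero, which is the property Lepton's much richer model exploits for DCT coefficients.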
3.3 Exploring Lepton in vivo storage
Expected benefits:
We currently store about 100PB of data. Pictures account for about 70% of it, and about 90% of those pictures are jpeg images. With an average size reduction of 23%, 100PB * 70% * 90% * 23% ≈ 14.5PB, so roughly 14.5PB of storage cost would be saved. And because the compression is lossless, the user experience is fully preserved. The current design of the lepton compression feature is shown in the figure below:
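The expected-savings estimate above is a simple product of the figures quoted in the text. The sketch below only reproduces that arithmetic; the variable names are chosen for this example.

```python
total_pb = 100        # total stored data, in PB
image_share = 0.70    # share of the data that is pictures
jpeg_share = 0.90     # share of the pictures that are jpeg
size_reduction = 0.23 # average size reduction from lepton

jpeg_pb = total_pb * image_share * jpeg_share
saved_pb = jpeg_pb * size_reduction

print(f"jpeg data: {jpeg_pb:.1f} PB, expected saving: {saved_pb:.2f} PB")
```

Only the jpeg portion, about 63PB, is eligible for lepton compression, and 23% of that is about 14.5PB.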
Current challenges:
- lepton compression and decompression demand high computing performance from the server and consume a lot of resources.
- We want to make full use of idle server CPU resources to cut costs and improve efficiency.
- The service must be able to scale out and in dynamically to cope with tidal traffic.
The main problems faced at present:
Most of our pictures are currently 4MB-5MB in size. In our tests, compressing a 4MB-5MB file takes about 1s, and sustaining 5 QPS requires a server with at least 16 cores, each running at more than 95% utilization. Lepton compression therefore demands high computing performance. The common solutions are hardware acceleration with FPGA cards and horizontal scaling across a large number of computing nodes. Using FPGAs raises hardware cost, which eats into the savings gained from compression.
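The test figures above can be turned into a rough capacity estimate. The sketch below uses only the numbers quoted in the text (16 cores, 5 QPS, 95% utilization) and assumes that throughput scales linearly with the number of servers; the target of 1000 QPS is a hypothetical value chosen for the example.

```python
import math

CORES_PER_SERVER = 16
QPS_PER_SERVER = 5
CORE_UTILIZATION = 0.95

# CPU time spent on one 4MB-5MB picture, in core-seconds.
core_seconds_per_image = CORES_PER_SERVER * CORE_UTILIZATION / QPS_PER_SERVER


def servers_needed(target_qps):
    """Number of 16-core servers for a target load, assuming linear scaling."""
    return math.ceil(target_qps / QPS_PER_SERVER)


hypothetical_target_qps = 1000
print(f"about {core_seconds_per_image:.2f} core-seconds per image")
print(f"{servers_needed(hypothetical_target_qps)} servers "
      f"for {hypothetical_target_qps} QPS")
```

Each picture costs roughly three core-seconds of CPU, which is why idle CPU capacity on existing servers is attractive for this workload.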
Solution:
To address these problems and challenges, we use a hybrid deployment of physical servers and Kubernetes to solve both the use of computing resources and dynamic scaling. The schematic diagram of the structure is as follows:
Physical servers are managed and scaled elastically through service registration and discovery, and their CPU usage is controlled with tools such as cgroup and taskset. The service is also connected to Kubernetes and managed as containers, whose flexibility suits this kind of computing service.
3.4 Performance evaluation
Whether compression is synchronous or asynchronous, we usually care most about image read latency. Reading large numbers of pictures puts heavy pressure on the servers, and that pressure comes mainly from decompression. To improve decompression efficiency and make full use of the company's resources, we distribute the lepton compression service across servers with idle CPU and schedule work according to how idle the resources are and when they are idle, using the peaks and valleys of resource usage to improve computing performance. Load-test data: we selected image files of different sizes and ran compression and decompression tests in a standalone environment. The test results are as follows:
The compression ratio stays at about 22%.
The figure above shows compression and decompression time for files of different sizes. Orange is decompression time and blue is compression time.
The figure above shows test data for pictures of different sizes with 32 concurrent threads, each thread processing 100 files.
4. Common questions about image compression
4.1 Distinguish lossy and lossless compression by file format
4.2 Common lossless compression algorithms
5. Summary
Lepton lossless compression provides a relatively high compression ratio without affecting image quality or the user experience, so it brings clear benefits when the amount of data is large.
Its shortcomings are that it demands high computing performance and supports only jpeg pictures. For the performance requirement, the industry has relatively mature solutions, such as the FPGA and elastic computing approaches mentioned above. The key is to choose a reasonable solution according to the needs of the enterprise.
References:
- "The Design, Implementation, and Deployment of a System to Transparently Compress Hundreds of Petabytes of Image Files For a File-Storage Service"
- "Research on JPEG image cloud storage based on deep learning"
- "Research on VLSI architecture design of key modules of JPEG-Lepton compression technology"