当前位置：网站首页>[elt.zip] openharmony paper Club - witness file compression system erofs

[elt.zip] openharmony paper Club - witness file compression system erofs

2022-06-25 18:32:00 【ELT. ZIP】

This article from the ELT.ZIP The team ,ELT<=>Elite( elite ),.ZIP In compressed format ,ELT.ZIP That is to compress the elite .
member ：
- Sophomore at Shanghai University of engineering and Technology
- Sophomores of Hefei Normal University
- Sophomores at Tsinghua University
- Freshman of Chengdu University of Information Engineering
- Freshmen of Heilongjiang University
- Freshman of South China University of Technology
We are from 6 A place to Classmate , We are OpenHarmony Growth plan gnawing paper Club in , And Huawei 、 Soft power 、 Runhe software 、 Rio d information 、 Shenkaihong Wait for the company , Study and Research Operating system technology …

List of articles

【 This issue focuses on 】

Uncover Google Apps EROFS The reason for going to Android
EROFS Why can it shine in the field of file compression systems
The mystery of the LZ4m How to bring new vitality to the memory compression field

【 technology DNA】

Please add a picture description

【 Intelligent scene 】

**********	********************	********************	********************	********************	********************	********************	********************	********************	********************	********************	********************	********************	********************	********************	*****************	*****************	********************
scene	Autopilot / AR	Voice signals	Streaming video	GPU Rendering	science 、 Cloud computing	Memory reduction	Scientific application	Medical images	database server	Artificial intelligence image	Text transfer	GAN Media compression	Image compression	File synchronization	Database system	General data	System data reading and writing
technology	Point cloud compression	‎ Sparse fast Fourier transform ‎	Lossy video compression	Grid compression	Dynamic selection compression algorithm framework	lossless compression	Hierarchical data compression	Medical image compression	Lossless universal compression	Artificial intelligence image compression	Short string compression	GAN Compressed on-line multi particle distillation	Image compression	File transfer compression	Fast random access string compression	High throughput parallel lossless compression	Enhanced read-only file systems
Open source project	Draco / Based on deep learning algorithm /PCL/OctNet	SFFT	AV1 / H.266 code / H.266 decode /VP9	MeshOpt / Draco	Ares	LZ4	HCompress	DICOM	Brotli	RAISR	AIMCS	OMGD	OpenJPEG	rsync	FSST	ndzip	EROFS

introduction

In a highly information-based modern society , A smart phone , Has become an integral part of our lives .

Insert picture description here

For the growing Phubber , A smart phone Performance is particularly important .

Insert picture description here

The current problem

Smart phones usually have limited storage space and runtime memory .

1. Low user perceived storage space

Due to cost constraints , Smart phones are usually resource scarce . meanwhile ,Android The space occupied by the operating system It's also increasing . meanwhile ,Android Storage consumption of applications And it's growing , therefore , The storage capacity of low-end smartphones available to users is quite small . Besides , Many top applications of smart phones tend to consume a lot of memory , Even on high-end smartphones , also Only a small amount of memory is left for the system startup operation .

2. The design purpose of the system is contrary to the application reality

Read only file systems are designed to minimize the use of storage . However , Using the existing compression to read only the file system on the smart phone with scarce resources will lead to huge overhead in terms of performance and memory consumption , therefore , In the long term , the nation is in peril .

3. Limitations of the old scheme

· Btrfs

Btrfs It's a universal file system , Therefore, its internal structure must consider effective data modification , Not for compression And actively optimize . Besides , Compression is not the only measure . decompression In process Memory consumption is limited . For devices like smartphones , performance and Response ability Is an unshakable key indicator . therefore , With the burden of efficient data modification ,Btrfs It's hard to satisfy at the same time performance and Compression efficiency The requirements of .

· Squashfs

Fixed size input compression

Existing file systems compress raw data in fixed size blocks , Generate compressed data of variable size .
This will cause great I/O Amplification and computation waste , Even if the number of compressed blocks is very large , The file system must read all related compressed blocks . meanwhile , The process of decompressing useless data can also cause huge CPU waste , This leads to other high-performance disturbances in operation .

Massive memory consumption and data movement .

When file read request ,Squashfs The metadata will be searched first , Get the number of related compressed blocks . therefore , During decompression Squashfs Need a lot of temporary memory .

New scheme EROFS（ Enhanced compressed file system ）

Android be based on Kernel Developed from , So it's been used all the time Linux Standard file system for ext4, but ext4 The storage image structure of is not suitable for mobile phone flash memory .2016 year , Huawei has launched optimized F2FS(Flash-Friendly File System) file system , It replaces the traditional ext4, Improved file reading and writing fluency , However, the system partition where the operating system read-only files are running has not been changed . To solve this problem ,2018 year , Huawei has further launched its own EROFS(Enhanced Read-Only File System), It uses an optimized fixed output size lz4 Compression algorithm , For the first time EMUI9.1, The equipment performance has been greatly improved . And more recently ,Android 13 Primordial also announced to follow up .

Insert picture description here

Naturally, ,HarmonyOS And OpenHarmony It also has its shadow in ：

Insert picture description here

characteristic

Use fixed size output compression .
Compress file data into multiple fixed size blocks , To significantly alleviate the read amplification problem , And minimize unnecessary calculations .
Efficient decompression of memory , High performance with little additional memory overhead .
By using the compression algorithm （ Such as LZ4） Characteristics ,EROFS Different memory efficient decompression schemes are designed , To reduce additional memory usage during decompression .
EROFS It also adopts a set of optimization schemes that carefully ensure a guaranteed user experience .

Detailed explanation ：

1. Fixed size output compression

Insert picture description here

reason ： In order to overcome the read amplification caused by fixed size input compression ,EROFS A different compression method is used ： Fixed size output compression
Implementation method ： Fixed size sliding window
technological process ： The compression algorithm will be called many times , Until all file data is compressed .
for example , about 1MB Sliding window of ,EROFS Each time... Is provided to the compression algorithm 1MB Raw data . Then the algorithm compresses the original data as much as possible , Until all the 1MB All the data is consumed , Or the consumed data can just be generated 4KB Of compressed data .（ The minimum size of the requested underlying storage device is 4KB） The remaining raw data is combined with more data , Form another 1MB Raw data , Used for the next call to compress .
Compression ratio ： Because fixed size output compression can compress data as much as possible , Until the output limit is reached , The fixed size input compression can only compress the fixed size data at a time .

Insert picture description here

During decompression , Only compressed blocks containing the requested data are read and processed . When a single raw block is requested , Up to two compressed blocks will be read and decompressed .
Post - sized output compression makes in - place decompression possible , This further reduces EROFS Memory consumption in .

2. cache I/O And local I/O（I/O—— Input / Output ）

starting point ： Before the actual decompression ,EROFS Need space to store compressed data retrieved from storage . Each compression retrieves only a maximum of two compressed blocks .
Strategy ：

cache I/O( Partially decomposed data block )

EROFS Manage a special The index node , Its Page caching Storage press Physical block ordinal index The compression block of . therefore , For cached I/O, EROFS Will be in a special inode Allocate a page to the page cache of launch I/O（ Input / Output ） request , In this way, the compressed data will be Storage drive Get directly to the assigned page .
- Tips:
- Page caching ：
  Because the speed of reading and writing hard disk is much slower than that of reading and writing memory . To avoid every time you read or write a file , All need to read and write to the hard disk ,Linux The kernel will be in page size （4KB） In units of , Divide the file into multiple data blocks , When a user reads and writes a data block in a file , The kernel first requests a memory page （ This is called page caching ） Cache with data blocks in the file .
- inode：
  inode Stores some meta information about file system objects , The address of the corresponding file system storage block, etc , got it 1 File inode number , You can go to inode Find out the storage address of the file content data in the metadata .

In situ I/O（ Fully decomposed compressed block ）

Every time you read a file ,VFS（ Virtual file system ） In the page cache , Assign pages to the file system , To store file data . For any of these pages that do not contain meaningful data before decompression , It is called a reusable page . For local I/O,EROFS Will use the last reusable page , To initialize the I/O request .
Tips：VFS（ Virtual file system ）：
Interfaces that allow operating systems to use different file system implementations .

3. decompression

Take a single block as an example

Insert picture description here

front 5 Block (D0 To D4) And partial blocks D5 Compressed into blocks C0, The remaining blocks are compressed into blocks C1.

Three decompression methods are introduced below ：

Vmap decompression

Decompress to get the block D3 and D4 Data in ,EROFS First read the compressed block from the memory C0, And store it in memory . then EROFS The following steps will be followed to decompress .

Find the data stored in the compressed block (C0) The maximum encoding of the required block in , This is the fifth block in the example (D4).EROFS Just before unzipping 5 Block (D0 To D4), Instead of decompressing all the original data blocks .

Insert picture description here

For each data block that needs to be decomposed , Find storage space to store them . In the legend ,EROFS Allocated 3 individual Temporary physical pages to store D0、D1 and D2. For the two blocks requested D3 and D4, EROFS Reuse in the page cache VFS Two physical pages that have been allocated .

Insert picture description here

Because the decompression algorithm needs Continuous memory As Destination for decompression , EROFS adopt vmap Interface Will be prepared in the previous step Physical pages are mapped to contiguous virtual memory areas .

Insert picture description here

If it is local I/O, under these circumstances , Compression block (C0) Stored in Page cache pages in ,EROFS You also need to compress the data (C0) Copied to the A temporary pre——CPU page （ The next scenario will introduce ）, In this way, the decompressed data will not overwrite the compressed data during the decompression process .

Insert picture description here

Finally, the decompression algorithm is called , Extract the data in the compressed block to the continuous storage area . After the compression , All three temporary physical pages and virtual memory areas can be reclaimed , The requested data has been written to the corresponding page cache page .

Existing problems ：

Need to dynamically allocate physical memory pages , This increases memory pressure on memory constrained devices .
Use vmap and vunmap The efficiency of each decompression is very low .

There are two optimization schemes for dynamic memory allocation

Per-CPU buffer decompression

EROFS utilize percpu buffer To mitigate that the decompressed data is less than 4 Page problems .

Insert picture description here

1. A four page memory buffer is pre allocated to each CPU As every CPU buffer , For extraction no more than 4 Decompression of data blocks ,EROFS Decompress the data to per-cpu buffer .

Insert picture description here

D8 Data in is requested ,C1 Compressed data in Extract directly to per-cpu buffer .

Then copy the requested data to the page cache page .

Insert picture description here

D8 The content of memcpy complex Make it to the page cache page .

advantage ：

per-CPU The length of the buffer is determined by experience , According to this fixed length, data is extracted for caching ; meanwhile ,per-CPU Buffers can be reused in different decompression processes , To sum up, it can effectively eliminate vmap Dynamic memory allocation problem .
pre-cpu Buffer decompression is a cost-effective trade-off , It relieves vmap Problems in decompression , Additional memory copies are also introduced .

Scroll to extract

principle ：

EROFS Allocated a large virtual memory area as well as for each CPU Distribute 16 A physical page . Before each compression ,EROFS Will use this 16 Physics pages , Populate with the physical pages of the page cache VM ( fictitious ) Area .

Insert picture description here

Tips：
EROFS Use LZ4 As a compression algorithm , You need to look back and decompress it before it exceeds 64KB The data of . therefore , For an extract more than 16 Compression of pages ,EROFS Maps can be reused 16 Physical pages of virtual pages （ namely 64 kb）.

Roll

In the picture , Used to store D0 To D15 The virtual address of the block is determined by 16 Physical pages support .

Insert picture description here

because D16 Each virtual address in is associated with D0 The distance between the corresponding addresses in is 16 A physical page , Start a new round of rolling .

Insert picture description here

therefore D16 Virtual pages of can be created by and D0 The same physical page to support ,D17 adopt D1 The physical pages used are supported in the same way .
D18 Is requested by the file being read , Physical pages using page cache .

Insert picture description here

Sum up , By using this scrolling decompression 16 A physical page is enough to handle any decompression situation . up to now , We have a preliminary understanding of ：EROFS choose 4k The reason for the fixed output size is —— 4k It is the smallest unit for page management and data storage , Can effectively avoid I/O A larger version of ; choose lz4 As a compression algorithm, I like its extremely fast decompression speed and good compression ratio ; Only the file data is in EROFS Will be compressed ,inode Metadata such as catalog entries remains the same .
The file of the operating system contains the file data itself , There are also many permissions and attributes attached . Permissions and attributes are placed in inode, The actual data is placed in the data block . without doubt , The data block records the actual contents of the file . and inode In addition to the extended attributes of the record file , The number of the corresponding data block is also recorded , And one file will occupy one inode. The super block records the overall information of the entire file system , Include inode And the total number of data blocks 、 Usage quantity 、 The remaining amount, file system format and related information, etc .

Structural layout

Similar to many file systems ,EROFS There is also a Super block At the front end of the whole structure . After that , Metadata and actual data can be stored in order or in mixed form .

Insert picture description here

As shown in the figure , Each block behind the super block corresponds to a file . Expand it , You can see that the beginning stores inode, Followed by the xattrs Extended attribute and block index , Last , An encoded block representing compressed or uncompressed file data is stored at the end of each file . The block index is used to quickly locate the corresponding encoded block of the read request , It is an inclusion of 8 An array of byte length entries , Each entry corresponds to a data block before compression , It also indicates whether the data block is a header block （ head? ）, To start a new encoding block . If it's a header , The address of the encoding block will also be stored （ blkaddr? ）、 Offset of the first byte in the new encoding block （ offset? ）、 Whether the encoding block has been compressed （ cmpr? ） And whether this block can be decompressed in place （ dip? ）; If it wasn't for the header , You must try to determine a header block before uncompressed blocks , The difference in the number of uncompressed blocks and header blocks is recorded in dist in .
The overall structural layout is now almost ready , When we need to use these data ,EROFS How to operate it ？
First , Read requests for uncompressed data blocks ,EROFS The entry of the block index will be obtained based on the requested block number . secondly , If it is for the head block ,EROFS Will be taken from blkaddr Decompress the data of the encoding block at , If the offset is non-zero , May need to be from blkaddr The last code block stored before begins to decompress . in addition , For non header blocks ,EROFS Based on stored dist Value reverses the position of the corresponding header block , From the beginning blkaddr Start decompression at , Until the data of the requested block is completely decompressed . Some data blocks are still large after compression , Therefore, most of them are not compressed , Instead, it is stored directly in the form of coding blocks , Avoid excessive time and space costs . Above situation , It is embodied in the block index cmpr The field value is set to N(false) The entry of .

Technical optimization scheme

1. Index memory optimization

EROFS Hundreds of pages of raw data can be compressed into only one compressed block , But in this case , Hundreds of pointers will be required to track the pages where the original data should be stored , It may consume a lot of memory . So ,EROFS Will try to use reusable pages to store this information . If there are more than one VFS Assigned pages can be reused , Select the last page to store compressed data , Pointers to the remaining pages are moved to the stack .

2. Scheduling optimization

In image Btrfs In such a file system , When compressed data is read into memory , Will wake up a dedicated thread to decompress the data . After the decompression , Used to implement I/O The reader thread will be awakened , To get the unzipped data from the page cache . therefore , This scheduling overhead between each other cannot be ignored . therefore ,EROFS The strategy of using reader threads throughout the process is adopted .

3. Queue decompression algorithm

You can make multiple requests to decompress at the same time . When a thread A The requested data is being processed by the thread B When decompressing , Threads A You don't need to decompress it again , You can reuse threads directly from the page cache B Extracted data , So as to realize cooperative reuse of decompressed data , Prevent single data from being decompressed multiple times .

4. Mirror patch

Even though EROFS Compressed is a read-only partition , However, the data may still change during system update and other operations . Even if only a few bits in the file data are simply modified , A large number of other relevant scattered data may also need to be modified , This causes the entire partition to recompress .EROFS Provides an image patch that can be placed at the end of the image , Exempt the modification of compressed read-only partition , Instead, apply a patch to update and patch the original data after decompression .

Scene performance

1. Compression ratio and memory usage

The use sizes are 953.67MB、202.1MB Of enwik9 And silesia.tar Test the compression ratio as a data set , The results are as follows . You can see , And 4K Size input Squashfs comparison ,4K Size output EROFS The compression ratio is better 10% and 9%.

Insert picture description here

The following figure shows understanding compression enwik9 The amount of memory used by ：

Insert picture description here

2. Throughput and storage space

Insert picture description here

The picture above depicts EROFS and ext4 Performance of throughput and storage space saving under different conditions . Under normal circumstances ,ext4 The throughput of is stable ;EROFS The performance is to save space while increasing , The overall level is gradually rising , When the amount of space saved is high enough , Achieved very good throughput . When the amount of space saved is low ,EROFS The performance for random reads is similar to ext4, Compared with the performance of sequential reading ext4 Poor .

3. Application start time

The R & D personnel have tested and evaluated the startup time of several applications , It turns out that , And ext4 comparison ,EROFS The average number of low-end mobile phones app Reduced startup time 5%, High end phones are shorter 2.3%. Take one of the cameras app For example ：

Insert picture description here