
Snappy format parsing

2022-06-22 03:35:00 A rainy spring night

(Owned by: Happy rain in spring night, http://blog.csdn.net/chunyexiyu, personal essay)
Reference: https://en.wikipedia.org/wiki/Snappy_(compression)

Overview

Snappy is an open-source compression method from Google. It is used much like lz4 and is aimed mainly at high-speed compression and decompression.

A Snappy-compressed string is encoded as follows:

  1. The length of the original string, stored as a variable-length integer (7 significant bits per byte, with the high bit set when another byte follows);
  2. One or more (1-N) elements of type 00, 01, 10, or 11, appearing one after another as needed.

Unlike lz4, Snappy stores the original string length first, followed by the compressed data.
This is convenient for decompression: the decoder reads the original length first and can allocate a buffer of exactly that size to hold the decompressed data.
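
As a rough illustration of the length prefix, the sketch below decodes a little-endian base-128 varint. The function name `read_varint` is made up for this example and is not part of any Snappy library.

```python
def read_varint(buf, pos=0):
    """Decode a little-endian base-128 varint starting at buf[pos].

    Returns (value, next_pos). Hypothetical helper for illustration only.
    """
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift   # low 7 bits carry data
        if byte & 0x80 == 0:              # high bit clear: last byte
            return value, pos
        shift += 7


# The sample later in this article starts with 0x70, i.e. length 112.
length, next_pos = read_varint(b"\x70")
print(length, next_pos)
```

A length below 128 fits in one byte, which is why the sample's prefix is the single byte 0x70.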

Description of the 00/01/10/11 element types

A Snappy-compressed string starts with the length stored 7 bits per byte, as described above, followed by elements of the 4 types.

The 00/01/10/11 element types are described below.
The lower two bits of an element's first byte mark which type it is (00/01/10/11); the bytes that follow are then parsed according to that type.
The remaining bytes in the stream are encoded using one of four element types. The element type is encoded in the lower two bits of the first byte (tag byte) of the element.

00 - Literal (raw string) type:
The low 2 bits store 00;
The high 6 bits store a value giving the length of the raw string that follows, minus 1 (a length of 0 does not exist, so the value 0 stands for length 1);
Four of these values have a special meaning:
0b111100=60 means the length (minus 1) is stored in the next 1 byte
0b111101=61 means the length (minus 1) is stored in the next 2 bytes
0b111110=62 means the length (minus 1) is stored in the next 3 bytes
0b111111=63 means the length (minus 1) is stored in the next 4 bytes
Multi-byte lengths are stored little-endian.
For example: 0b10000000 means the raw string has length 32+1=33;
For example: 0b11110000 0b10000000 means the raw string has length 128+1=129;
Note: once the length is known, that many of the following bytes must be copied to the output; only then is this element fully decoded.
00 – Literal – uncompressed data; upper 6 bits are used to store length (len-1) of data. Lengths larger than 60 are stored in a 1-4 byte integer indicated by a 6 bit length of 60 (1 byte) to 63 (4 bytes).
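
The literal rules can be sketched as follows. This is a minimal illustration, not the reference implementation; `read_literal` is a hypothetical name, and the sketch assumes the input is well-formed.

```python
def read_literal(buf, pos):
    """Decode one literal element whose tag byte is at buf[pos].

    Returns (literal_bytes, next_pos). Hypothetical helper for illustration.
    """
    tag = buf[pos]
    if tag & 0x03 != 0b00:
        raise ValueError("not a literal tag")
    pos += 1
    n = tag >> 2                      # upper 6 bits: length-1, or 60..63
    if n >= 60:
        extra = n - 59                # 60..63 -> 1..4 extra length bytes
        n = int.from_bytes(buf[pos:pos + extra], "little")
        pos += extra
    length = n + 1                    # stored value is length-1
    return buf[pos:pos + length], pos + length


# Tag 0b10000000: upper 6 bits are 32, so 33 raw bytes follow.
data, next_pos = read_literal(bytes([0b10000000]) + b"x" * 33, 0)
print(len(data), next_pos)
```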

01 - Copy from existing data: 1 byte follows the tag byte
The low 2 bits store 01;
The next 3 bits store the copy length minus 4 (copy lengths start at 4);
The remaining 3 bits, together with the next byte, form the 11-bit offset.
For example: in 0b00010001 0b10000000 the length bits are 0b100=4, so the length is 4+4=8; the offset is 0b000 10000000 = 128, i.e. the copy starts 128 bytes back from the current end of the output.
01 – Copy with length stored as 3 bits and offset stored as 11 bits; one byte after tag byte is used for part of offset;
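
A minimal sketch of the 01 element follows. The names `read_copy1` and `apply_copy` are invented for this example, and the copy is done byte by byte so that it also behaves correctly when the offset is smaller than the length (the source and destination ranges overlap).

```python
def read_copy1(buf, pos):
    """Decode a type-01 copy element at buf[pos].

    Returns (length, offset, next_pos). Hypothetical helper for illustration.
    """
    tag = buf[pos]
    if tag & 0x03 != 0b01:
        raise ValueError("not a 01 copy tag")
    length = ((tag >> 2) & 0x07) + 4          # 3 bits, stored as length-4
    offset = ((tag >> 5) << 8) | buf[pos + 1]  # 3 high bits + next byte
    return length, offset, pos + 2


def apply_copy(out, length, offset):
    """Append `length` bytes read from `offset` bytes before the end of out."""
    start = len(out) - offset
    for i in range(length):
        out.append(out[start + i])   # byte by byte, so overlap is fine


out = bytearray(b"Snappy encoding is not bit-oriented, but byte")
length, offset, _ = read_copy1(bytes([0x15, 0x13]), 0)
apply_copy(out, length, offset)
print(length, offset, out[-13:])
```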

10 - Copy from existing data: 2 bytes follow the tag byte
The low 2 bits store 10;
The other 6 bits store the copy length minus 1 (copy lengths run from 1 to 64);
The next two bytes store the offset as a little-endian integer.
For example: in 0b10000010 0b00000000 0b00000001 the length bits are 0b100000=32, so the length is 32+1=33; the offset bytes read little-endian give 0x0100 = 256, i.e. the copy starts 256 bytes back.
10 – Copy with length stored as 6 bits of tag byte and offset stored as two-byte integer after the tag byte;

11 - Copy from existing data: 4 bytes follow the tag byte
The low 2 bits store 11;
As with type 10, the other 6 bits store the copy length minus 1;
The next four bytes store the offset as a little-endian integer.
For example: in 0b10000011 0b00000000 0b00000000 0b00000001 0b00000000 the length is 32+1=33; the offset bytes read little-endian give 0x00010000 = 65536, i.e. the copy starts 65536 bytes back.
11 – Copy with length stored as 6 bits of tag byte and offset stored as four-byte little-endian integer after the tag byte;
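
The 10 and 11 elements differ only in the width of the offset, so one sketch can cover both. Per the Snappy format description, these two types store the copy length minus 1 in the upper 6 bits, and the offset is little-endian. `read_copy_wide` is a hypothetical name used only here.

```python
def read_copy_wide(buf, pos):
    """Decode a type-10 (2-byte offset) or type-11 (4-byte offset) copy.

    Returns (length, offset, next_pos). Hypothetical helper for illustration.
    """
    tag = buf[pos]
    kind = tag & 0x03
    if kind == 0b10:
        width = 2
    elif kind == 0b11:
        width = 4
    else:
        raise ValueError("not a 10/11 copy tag")
    length = (tag >> 2) + 1                      # 6 bits, stored as length-1
    offset = int.from_bytes(buf[pos + 1:pos + 1 + width], "little")
    return length, offset, pos + 1 + width


print(read_copy_wide(bytes([0b10000010, 0x00, 0x01]), 0))
print(read_copy_wide(bytes([0b10000011, 0x00, 0x00, 0x01, 0x00]), 0))
```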

Decompression example

The following example compresses a string and then walks through its decompression.

Original string:
Length: 112
Content:

Snappy encoding is not bit-oriented, but byte-oriented (only whole bytes are emitted or consumed from a stream).

Compressed string:
Length: 108
Content:

0x0000021CB4585340  70 b0 53 6e 61 70 70 79 20 65 6e 63 6f 64 69 6e  p?Snappy encodin
0x0000021CB4585350  67 20 69 73 20 6e 6f 74 20 62 69 74 2d 6f 72 69  g is not bit-ori
0x0000021CB4585360  65 6e 74 65 64 2c 20 62 75 74 20 62 79 74 65 15  ented, but byte.
0x0000021CB4585370  13 e4 20 28 6f 6e 6c 79 20 77 68 6f 6c 65 20 62  .? (only whole b
0x0000021CB4585380  79 74 65 73 20 61 72 65 20 65 6d 69 74 74 65 64  ytes are emitted
0x0000021CB4585390  20 6f 72 20 63 6f 6e 73 75 6d 65 64 20 66 72 6f   or consumed fro
0x0000021CB45853A0  6d 20 61 20 73 74 72 65 61 6d 29 2e 00 00 00 00  m a stream).....

The compressed string is decoded byte by byte :

  1. 0x70 = 112: the source string length is 112

  2. 0xb0 = 0b10110000: low bits (00) mean literal; 0b101100 = 32+8+4 = 44, +1 gives a length of 45

  3. Literal: the next 45 bytes are copied to the output
    0x53 6e 61 70 70 79 20 65 6e 63 6f 64 69 6e
    0x67 20 69 73 20 6e 6f 74 20 62 69 74 2d 6f 72 69
    0x65 6e 74 65 64 2c 20 62 75 74 20 62 79 74 65

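  4. 0x15 = 0b00010101: low bits (01) mean a copy with 3-bit length and 11-bit offset; len = 0b101+4 = 9 (lengths start at 4)
    0x13 = 0b00010011: the high 3 offset bits in the tag are 0b000, so offset = 0b00010011 = 19
    Going back 19 bytes from the end of the output lands on the start of "-oriented"; the 9 bytes from there are copied to the output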

  5. 0xe4 = 0b11100100: low bits (00) mean literal; 0b111001 = 32+16+8+1 = 57, +1 gives a length of 58

  6. Literal: the next 58 bytes are copied to the output
    0x20 28 6f 6e 6c 79 20 77 68 6f 6c 65 20 62
    0x79 74 65 73 20 61 72 65 20 65 6d 69 74 74 65 64
    0x20 6f 72 20 63 6f 6e 73 75 6d 65 64 20 66 72 6f
    0x6d 20 61 20 73 74 72 65 61 6d 29 2e

After these steps, the compressed string has been restored to the original string.
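
To tie the walkthrough together, here is a small decoder sketch run against the 108 sample bytes from the dump. It is a simplified illustration of the raw format under the rules described in this article, not the official Snappy implementation; `snappy_decompress` is a made-up name, and error handling is minimal.

```python
def snappy_decompress(data):
    """Decode a raw Snappy block (illustrative sketch, minimal validation)."""
    # 1. Length prefix: little-endian varint, 7 bits per byte.
    expected, shift, pos = 0, 0, 0
    while True:
        byte = data[pos]
        pos += 1
        expected |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            break
        shift += 7

    out = bytearray()
    while pos < len(data):
        tag = data[pos]
        pos += 1
        kind = tag & 0x03
        if kind == 0b00:                       # literal
            n = tag >> 2
            if n >= 60:                        # length-1 in next 1..4 bytes
                extra = n - 59
                n = int.from_bytes(data[pos:pos + extra], "little")
                pos += extra
            out += data[pos:pos + n + 1]
            pos += n + 1
            continue
        if kind == 0b01:                       # copy, 11-bit offset
            length = ((tag >> 2) & 0x07) + 4
            offset = ((tag >> 5) << 8) | data[pos]
            pos += 1
        else:                                  # copy, 2- or 4-byte offset
            width = 2 if kind == 0b10 else 4
            length = (tag >> 2) + 1
            offset = int.from_bytes(data[pos:pos + width], "little")
            pos += width
        if offset == 0 or offset > len(out):
            raise ValueError("invalid copy offset")
        start = len(out) - offset
        for i in range(length):                # byte by byte: overlap-safe
            out.append(out[start + i])

    if len(out) != expected:
        raise ValueError("length mismatch")
    return bytes(out)


# The 108 sample bytes: length prefix, literal, copy, literal.
sample = (b"\x70\xb0" + b"Snappy encoding is not bit-oriented, but byte"
          + b"\x15\x13\xe4"
          + b" (only whole bytes are emitted or consumed from a stream).")
print(len(sample))
print(snappy_decompress(sample).decode("ascii"))
```

The sample bytes are written as ASCII text plus the five control bytes (0x70, 0xb0, 0x15, 0x13, 0xe4) so they can be compared directly with the hex dump; the trailing 00 bytes in the dump lie beyond the 108-byte compressed string and are left out.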
