Snappy format parsing
2022-06-22 03:35:00 【A rainy spring night】
(Owned by: Happy rain in spring night http://blog.csdn.net/chunyexiyu, personal essay)
Reference: https://en.wikipedia.org/wiki/Snappy_(compression)
Executive summary
Snappy is an open-source compression library from Google. Its use cases are close to those of LZ4: it is aimed mainly at high-speed compression and decompression rather than at the best compression ratio.
A Snappy-compressed block is encoded as follows:
- The length of the original (uncompressed) data, stored as a varint: each byte carries 7 significant bits, lowest group first, and a set high bit means another byte follows;
- One or more elements of type 00, 01, 10 or 11, which appear one after another in whatever order the data requires.
Unlike LZ4, Snappy stores the original length first and the compressed content after it.
This is convenient for decompression: the decoder reads the original length and can allocate an output buffer of exactly that size before decoding anything else.
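The length preamble can be read with a few lines of code. The sketch below is only an illustration of the 7-bits-per-byte scheme described above; the function name read_uvarint is made up for this example and is not part of the Snappy library.

```python
def read_uvarint(buf, pos=0):
    """Decode a little-endian base-128 varint.

    Returns (value, position of the first byte after the varint).
    read_uvarint is a hypothetical helper name used for illustration.
    """
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift  # the low 7 bits carry the payload
        if byte & 0x80 == 0:             # high bit clear: this was the last byte
            return value, pos
        shift += 7


# A single byte 0x70 encodes the length 112.
print(read_uvarint(bytes([0x70])))
```

Lengths below 128 fit in one byte, which is why the sample later in this post starts with the single byte 0x70.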
Description of the 00/01/10/11 element types
A Snappy-compressed block starts with the original length, stored 7 bits per byte, followed by elements of the four types.
The four element types are described below.
The lower two bits of an element's first byte (the tag byte) mark which type it is: 00, 01, 10 or 11. The bytes that follow are then parsed according to that type.
The remaining bytes in the stream are encoded using one of four element types. The element type is encoded in the lower two bits of the first byte (tag byte) of the element.
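As a small illustration of the tag byte layout, the sketch below splits a tag into its type and its remaining six bits. The names ELEMENT_TYPES and split_tag are invented for this example.

```python
# Hypothetical labels for the four element types, keyed by the low two bits.
ELEMENT_TYPES = {
    0b00: "literal",
    0b01: "copy, 1-byte offset",
    0b10: "copy, 2-byte offset",
    0b11: "copy, 4-byte offset",
}


def split_tag(tag):
    """Return (element type name, upper six bits) for a tag byte."""
    return ELEMENT_TYPES[tag & 0b11], tag >> 2


# The three tag bytes that appear in the sample at the end of this post.
for tag in (0xB0, 0x15, 0xE4):
    print(hex(tag), split_tag(tag))
```

How the upper six bits are interpreted depends on the type, as the following sections describe.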
00 - literal (raw, uncompressed bytes):
The lower 2 bits store 00;
The upper 6 bits store the length of the raw data that follows, minus 1 (a length of 0 cannot occur, so the value 0 means length 1);
The values 60 to 63 have a special meaning: the length minus 1 is stored in the bytes after the tag byte, as a little-endian integer:
0b111100=60: the length is stored in the next 1 byte
0b111101=61: the length is stored in the next 2 bytes
0b111110=62: the length is stored in the next 3 bytes
0b111111=63: the length is stored in the next 4 bytes
For example: 0b10000000 has upper bits 0b100000=32, so the literal length is 32+1=33;
For example: 0b11110000 0b10000000 has upper bits 0b111100=60, so the next byte 0b10000000=128 holds the length, and the literal length is 128+1=129;
Note: once the length has been decoded, that many of the following bytes must be copied to the output; only then is the element fully decoded.
00 – Literal – uncompressed data; upper 6 bits are used to store length (len-1) of data. Lengths larger than 60 are stored in a 1-4 byte integer indicated by a 6 bit length of 60 (1 byte) to 63 (4 bytes).
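The literal length rule can be sketched as follows. This assumes the extra length bytes are little-endian, as in the upstream Snappy format description; read_literal_length is a hypothetical helper name.

```python
def read_literal_length(buf, pos):
    """Decode the length of a literal element whose tag byte is buf[pos].

    Returns (literal length, position of the first literal data byte).
    Hypothetical helper for illustration only.
    """
    tag = buf[pos]
    if tag & 0b11 != 0b00:
        raise ValueError("not a literal tag")
    n = tag >> 2
    pos += 1
    if n < 60:
        return n + 1, pos          # short form: length-1 sits in the tag itself
    extra = n - 59                 # 60..63 -> 1..4 extra length bytes
    n = int.from_bytes(buf[pos:pos + extra], "little")
    return n + 1, pos + extra


# The two examples from the text: lengths 33 and 129.
print(read_literal_length(bytes([0b10000000]), 0))
print(read_literal_length(bytes([0b11110000, 0b10000000]), 0))
```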
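01 - copy from already-decoded data; the tag byte is followed by 1 byte
The lower 2 bits store 01;
The next 3 bits (bits 2-4) store the copy length minus 4, so the copy length ranges from 4 to 11;
The upper 3 bits (bits 5-7) and the next byte together form the 11-bit offset, with the tag bits as the high-order part.
For example: 0b00010001 0b10000000: the length bits are 0b100=4, so the length is 4+4=8; the offset bits are 0b000 10000000 = 128, so the copy starts 128 bytes back from the current end of the output (offset -128)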
01 – Copy with length stored as 3 bits and offset stored as 11 bits; one byte after tag byte is used for part of offset;
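A minimal sketch of decoding this element, following the bit layout in the upstream Snappy format description (length minus 4 in bits 2-4, high offset bits in bits 5-7). The name decode_copy1 is made up for this example.

```python
def decode_copy1(tag, next_byte):
    """Decode a type-01 copy element from its tag byte and the byte after it.

    Returns (copy length, offset back from the end of the output).
    Hypothetical helper for illustration only.
    """
    if tag & 0b11 != 0b01:
        raise ValueError("not a type-01 copy tag")
    length = ((tag >> 2) & 0b111) + 4      # bits 2-4 hold length-4
    offset = ((tag >> 5) << 8) | next_byte  # bits 5-7 are the top 3 offset bits
    return length, offset


# The pair 0x15 0x13 from the sample at the end of this post.
print(decode_copy1(0x15, 0x13))
```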
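10 - copy from already-decoded data; the tag byte is followed by 2 bytes
The lower 2 bits store 10;
The upper 6 bits store the copy length minus 1, so the copy length ranges from 1 to 64 (unlike type 01, the length does not start at 4);
The next two bytes store the offset as a 16-bit little-endian integer.
For example: 0b10000010 0b00000000 0b00000001: the length bits are 0b100000=32, so the length is 32+1=33; the offset bytes 0x00 0x01 read little-endian give 0x0100 = 256, so the offset is -256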
10 – Copy with length stored as 6 bits of tag byte and offset stored as two-byte integer after the tag byte;
11 - copy from already-decoded data; the tag byte is followed by 4 bytes
The lower 2 bits store 11;
As with type 10, the upper 6 bits store the copy length minus 1;
The next four bytes store the offset as a 32-bit little-endian integer.
For example: 0b10000011 0b00000000 0b00000000 0b00000001 0b00000000: the length is 32+1=33; the offset bytes read little-endian give 0x00010000 = 65536, so the offset is -65536
11 – Copy with length stored as 6 bits of tag byte and offset stored as four-byte little-endian integer after the tag byte;
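Types 10 and 11 differ only in the width of the offset, so one sketch can cover both. It assumes the conventions of the upstream Snappy format description: the upper six tag bits hold the length minus 1, and the offset is a little-endian integer of 2 or 4 bytes. The name decode_wide_copy is invented for this example.

```python
def decode_wide_copy(buf, pos):
    """Decode a type-10 or type-11 copy element whose tag byte is buf[pos].

    Returns (copy length, offset, position of the next element).
    Hypothetical helper for illustration only.
    """
    tag = buf[pos]
    kind = tag & 0b11
    if kind == 0b10:
        width = 2                  # 16-bit offset
    elif kind == 0b11:
        width = 4                  # 32-bit offset
    else:
        raise ValueError("not a type-10 or type-11 copy tag")
    length = (tag >> 2) + 1        # upper six bits hold length-1
    start = pos + 1
    offset = int.from_bytes(buf[start:start + width], "little")
    return length, offset, start + width


# Length 33 with offset 256 (type 10) and offset 65536 (type 11).
print(decode_wide_copy(bytes([0b10000010, 0x00, 0x01]), 0))
print(decode_wide_copy(bytes([0b10000011, 0x00, 0x00, 0x01, 0x00]), 0))
```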
Decompression example
The following example walks through decompressing a compressed string.
The original string:
Length: 112
Content:
Snappy encoding is not bit-oriented, but byte-oriented (only whole bytes are emitted or consumed from a stream).
Compressed string:
Length: 108
Content:
0x0000021CB4585340 70 b0 53 6e 61 70 70 79 20 65 6e 63 6f 64 69 6e p?Snappy encodin
0x0000021CB4585350 67 20 69 73 20 6e 6f 74 20 62 69 74 2d 6f 72 69 g is not bit-ori
0x0000021CB4585360 65 6e 74 65 64 2c 20 62 75 74 20 62 79 74 65 15 ented, but byte.
0x0000021CB4585370 13 e4 20 28 6f 6e 6c 79 20 77 68 6f 6c 65 20 62 .? (only whole b
0x0000021CB4585380 79 74 65 73 20 61 72 65 20 65 6d 69 74 74 65 64 ytes are emitted
0x0000021CB4585390 20 6f 72 20 63 6f 6e 73 75 6d 65 64 20 66 72 6f or consumed fro
0x0000021CB45853A0 6d 20 61 20 73 74 72 65 61 6d 29 2e 00 00 00 00 m a stream).....
The compressed string is decoded byte by byte:
0x70 - 112: the source string length is 112
0xb0 - 0b10110000: type (00), literal; 0b101100=32+8+4=44, +1 gives length 45
Literal of 45 bytes, copied to the output:
0x53 6e 61 70 70 79 20 65 6e 63 6f 64 69 6e
0x67 20 69 73 20 6e 6f 74 20 62 69 74 2d 6f 72 69
0x65 6e 74 65 64 2c 20 62 75 74 20 62 79 74 65
0x15 - 0b00010101: type (01), 3-bit length and 11-bit offset; len=0b101+4=9 (lengths start at 4)
0x13 - 0b00010011: the upper 3 offset bits in the tag are 0b000, so offset=0b00010011=19
offset=19 points back to just before "-oriented"; with length 9, those 9 bytes are copied to the output
0xe4 - 0b11100100: type (00), literal; 0b111001=32+16+8+1=57, +1 gives length 58
Literal of 58 bytes, copied to the output:
0x20 28 6f 6e 6c 79 20 77 68 6f 6c 65 20 62
0x79 74 65 73 20 61 72 65 20 65 6d 69 74 74 65 64
0x20 6f 72 20 63 6f 6e 73 75 6d 65 64 20 66 72 6f
0x6d 20 61 20 73 74 72 65 61 6d 29 2e
After these steps, the compressed string has been restored to the original string.