当前位置:网站首页>Experiment III LZW
Experiment III LZW
2022-07-23 07:58:00 【Myster_ KID】
LZW Codec algorithm
List of articles
LZW code (LZW Encoding) also called “ String table compression algorithm ”, from J.Ziv and A.Lempel stay 1978 First introduction in , And by the Terry A.Welch stay 1984 Improved in , Finally, the coding method is named after three people .
This coding method belongs to dictionary compression coding method . Dictionary coding is a general coding method , It is applicable to the statistical characteristics of new sources that cannot be observed , Or although it can be observed, the statistical characteristics are not fixed .
LZW Encoding can be applied to general file compression ( Such as WinZip)、 Animation image compression ( Such as GIF、TIFF) Other fields .
Coding principle
LZW The core idea of coding is to create a “ Phrase dictionary (Dictionary of Phrases)”, This phrase can be any combination of characters . When encountering something that has appeared in the dictionary in the process of encoding data “ The phrase ” when , The encoder outputs the “ Reference no. ”, Not the phrase itself , To achieve compression .
The coding process is as follows :
for example : Yes “abbababac” Conduct LZW code .
Initialization Dictionary :a = 97,b = 98,c = 99(ASCII code )
| step | P | C | PC In the dictionary ? | If not , Output P | Add a new entry | explain |
|---|---|---|---|---|---|---|
| 1 | NULL | a | initialization , Don't deal with | |||
| 2 | a | b | no | 97(a) | ab:256 | ab Not in the dictionary , expand ,P = C |
| 3 | b | b | no | 98(b) | bb:257 | bb Not in the dictionary , expand ,P = C |
| 4 | b | a | no | 98(b) | ba:258 | ba Not in the dictionary , expand ,P = C |
| 5 | a | b | yes | ab In the dictionary ,P = PC | ||
| 6 | ab | a | no | 256(ab) | aba:259 | aba Not in the dictionary , expand ,P = C |
| 7 | a | b | yes | ab In the dictionary ,P = PC | ||
| 8 | ab | a | yes | aba In the dictionary ,P = PC | ||
| 9 | aba | c | no | 259(aba) | abac:260 | abac Not in the dictionary , expand ,P = C |
| 10 | c | NULL | 99(c) | end , Output uncoded characters |
That is, the coding result is :
“97 98 98 256 259 99”
Decoding principle
First of all, we need to be clear about , The dictionary established in the coding process , In fact, it is not transmitted with the code stream . This mainly takes into account the following two points : First, the code table may take up a lot of space , The compression ratio can be maximized without transmitting the code table ; Second, if it is necessary to decode after receiving the dictionary at the encoder , Then it is impossible to realize the synchronous operation at both ends of encoding and decoding .
that , We need to establish a dictionary synchronously at the decoder according to the same rules as at the encoder (Trie Trees ), This is LZW The core idea of decoding .
The decoding process is shown in the figure below :
For example, the encoded code stream output above “97 98 98 256 259 99” decode :
| step | pW | cW | cW In the dictionary ? | Output | Add a new entry | explain |
|---|---|---|---|---|---|---|
| 1 | 97 | yes | a | In the dictionary , Output cW; newly added pW + cW First character | ||
| 2 | 97 | 98 | yes | b | ab:256 | In the dictionary , Output cW; newly added pW + cW First character |
| 3 | 98 | 98 | yes | b | bb:257 | In the dictionary , Output cW; newly added pW + cW First character |
| 4 | 98 | 256 | yes | ab | ba:258 | In the dictionary , Output cW; newly added pW + cW First character |
| 5 | 256 | 259 | no | aba | aba:259 | Not in the dictionary , Output pW + pW First character , And add to the dictionary |
| 6 | 259 | 99 | yes | c | abac:260 | In the dictionary , Output cW; newly added pW + cW First character |
Decode according to the above process “abbababac”, That is, the original data is successfully restored .
Code implementation
declarations.h
#pragma once
#include "BitIO.h"
#define DICT_CAPACITY 65535 // Capacity of the dictionary
/* Global variables */
struct {
int suffix;
int parent;
int firstChild;
int nextSibling;
} dictionary[DICT_CAPACITY + 1];
extern int nextNodeIdx; // Index of next node (i.e. next dictionary entry)
extern int decStack[DICT_CAPACITY]; // Stack for decoding a phrase
/* Functions */
void LzwEncoding(FILE* inFilePtr, BITFILE* outBitFilePtr);
void LzwDecoding(BITFILE* inBitFilePtr, FILE* outFilePtr);
BitIO.h
#pragma once
#include <iostream>
typedef struct {
FILE* fp;
unsigned char mask;
int rack;
}BITFILE;
BITFILE* OpenBitFileInput(char* fileName);
BITFILE* OpenBitFileOutput(char* fileName);
void CloseBitFileInput(BITFILE* bf);
void CloseBitFileOutput(BITFILE* bf);
int BitInput(BITFILE* bf);
unsigned long BitsInput(BITFILE* bf, int count);
void BitOutput(BITFILE* bf, int bit);
void BitsOutput(BITFILE* bf, unsigned long code, int count);
BitIO.cpp
/* Definitions for bitwise IO */
#include <iostream>
#include <stdlib.h>
#include "BitIO.h"
BITFILE* OpenBitFileInput(char* fileName) {
//BITFILE* bf = (BITFILE*)malloc(sizeof(BITFILE));
BITFILE* bf = new BITFILE;
if (bf == NULL) {
return NULL;
}
if (fileName == NULL) {
bf->fp = stdin;
} else {
fopen_s(&bf->fp, fileName, "rb");
}
if (bf->fp == NULL) {
return NULL;
}
bf->mask = 0x80;
bf->rack = 0;
return bf;
}
BITFILE* OpenBitFileOutput(char* fileName) {
//BITFILE* bf = (BITFILE*)malloc(sizeof(BITFILE));
BITFILE* bf = new BITFILE;
if (bf == NULL) {
return NULL;
}
if (fileName == NULL) {
bf->fp = stdout;
} else {
fopen_s(&bf->fp, fileName, "wb");
}
if (bf->fp == NULL) {
return NULL;
}
bf->mask = 0x80;
bf->rack = 0;
return bf;
}
void CloseBitFileInput(BITFILE* bf) {
fclose(bf->fp);
//free(bf);
delete bf;
}
void CloseBitFileOutput(BITFILE* bf) {
/* Output the remaining bits */
if (bf->mask != 0x80) {
fputc(bf->rack, bf->fp);
}
fclose(bf->fp);
//free(bf);
delete bf;
}
int BitInput(BITFILE* bf) {
int value;
if (bf->mask == 0x80) {
bf->rack = fgetc(bf->fp);
if (bf->rack == EOF) {
fprintf(stderr, "Reached the end of file.\n");
exit(-1);
}
}
value = bf->mask & bf->rack;
bf->mask >>= 1;
if (0 == bf->mask) {
bf->mask = 0x80;
}
return((value == 0) ? 0 : 1);
}
unsigned long BitsInput(BITFILE* bf, int count) {
unsigned long mask;
unsigned long value;
mask = 1L << (count - 1);
value = 0L;
while (mask != 0) {
if (BitInput(bf) == 1)
value |= mask;
mask >>= 1;
}
return value;
}
void BitOutput(BITFILE* bf, int bit) {
if (bit != 0) {
bf->rack |= bf->mask;
}
bf->mask >>= 1;
if (bf->mask == 0) {
// 8 bits in rack
fputc(bf->rack, bf->fp);
bf->rack = 0;
bf->mask = 0x80;
}
}
void BitsOutput(BITFILE* bf, unsigned long code, int count) {
unsigned long mask;
mask = 1L << (count - 1);
while (mask != 0) {
BitOutput(bf, (int)((code & mask) == 0 ? 0 : 1));
mask >>= 1;
}
}
LzwED.cpp
#include <iostream>
#include "declarations.h"
#include "BitIO.h"
using namespace std;
/* Global variables */
int nextNodeIdx;
int decStack[DICT_CAPACITY];
/* Macros */
#define Input(f) ((int)BitsInput(f, 16))
#define Output(f, x) BitsOutput(f, (unsigned long)(x), 16)
void InitialiseDict() {
// Dictionary initialisation (initialise root node 0-255)
for (int i = 0; i < 256; i++) {
dictionary[i].suffix = i; // The suffix of each node is the corresponding ASCII code
dictionary[i].parent = -1; // Temporarily doesn't have a parent node (i.e. prefix)
dictionary[i].firstChild = -1; // Temporarily doesn't have any child nodes
dictionary[i].nextSibling = i + 1; // The index of the next sibling root node is the next ASCII code
}
dictionary[255].nextSibling = -1; // No next sibling for the last root node
nextNodeIdx = 256; // The index of next dictionary entry
}
int InDict(int P, int C) {
if (P == -1) {
/* In this case, the current character is the start of the file, and it's evidently in the dictionary, thus return the corresponding ASCII code (let P = this character). */
return C;
}
/* Traverse all child node(s) of node P from left to right (i.e. all sibling nodes of the first child node) */
int sibling = dictionary[P].firstChild; // Start from the first child of P
while (sibling > -1) {
// sibling == -1 indicates the end of sibling traversal
/* If a C-suffixed sibling is found, then return the code of PC (i.e. the index of this sibling) */
if (C == dictionary[sibling].suffix) {
return sibling;
}
/* If the suffixes don't match, then look for the next */
sibling = dictionary[sibling].nextSibling;
}
/* The suffix of all siblings don't match PC, thus PC isn't in the dictionary */
return -1;
}
void NewDictEntry(int P, int C) {
if (P < 0) {
return;
}
dictionary[nextNodeIdx].suffix = C;
dictionary[nextNodeIdx].parent = P;
dictionary[nextNodeIdx].nextSibling = -1; // Indicates that the node is the last sibling
dictionary[nextNodeIdx].firstChild = -1; // Temporarily this node doesn't have a child
int pFirstChild = dictionary[P].firstChild; // The first child of P
int pChild;
/* Set up the new sibling-relation */
if (pFirstChild > -1) {
/* Parent of the new node originally have a child node */
pChild = pFirstChild; // Start from the first child of P
/* Look for the youngest child of P (i.e. the last sibling) */
while (dictionary[pChild].nextSibling > -1) {
pChild = dictionary[pChild].nextSibling;
}
dictionary[pChild].nextSibling = nextNodeIdx; // Set the new node as the next sibling of the current last sibling
} else {
/* Parent of the new node originally doesn't have a child */
dictionary[P].firstChild = nextNodeIdx; // Set the new node as PC (i.e. the first child of P)
}
nextNodeIdx++; //Index of the next entry + 1
}
void LzwEncoding(FILE* inFilePtr, BITFILE* outBitFilePtr) {
int previousStr; // P
int currentChar; // C
int PC; // P & C combined as 1 character
/* Compute the size of file */
fseek(inFilePtr, 0, SEEK_END);
int inFileSize = ftell(inFilePtr);
fseek(inFilePtr, 0, SEEK_SET);
BitsOutput(outBitFilePtr, inFileSize, 4 * 8);
/* Initialise the dictionary and P */
InitialiseDict();
previousStr = -1; // Initialise P
//fprintf(outFilePtr, "LZW-encoded string: ");
while ( (currentChar = fgetc(inFilePtr)) != EOF ) {
/* Check if PC is in the dictionary */
PC = InDict(previousStr, currentChar);
if (PC >= 0) {
/* PC is in the dictionary */
previousStr = PC; // Set P = PC
} else {
/* PC isn't in the dictionary */
Output(outBitFilePtr, previousStr); // Output P
if (nextNodeIdx < DICT_CAPACITY) {
/* Enough space to add PC into the dictionary */
NewDictEntry(previousStr, currentChar);
}
previousStr = currentChar; // Set P = C
}
}
Output(outBitFilePtr, previousStr); // Output the last unencoded character(s)
}
int DecodeString(int start, int code) {
int count = start;
while (code >= 0) {
/* Look for the root node */
decStack[count] = dictionary[code].suffix; // Store the original string in inverted order
code = dictionary[code].parent; // Set the parent of the current node as the next node
count++; // Points to the next node
}
return count; // The distance between the current node and the root
}
void LzwDecoding(BITFILE* inBitFilePtr, FILE* outFilePtr) {
int character;
int previousCode; // pW
int currentCode; // cW
int phraseLen; // Length of phrase
unsigned long inFileSize = BitsInput(inBitFilePtr, 4 * 8);
if (inFileSize == -1) {
inFileSize = 0;
}
/* Initialise dictionary and pW*/
InitialiseDict();
previousCode = -1;
while (inFileSize > 0) {
currentCode = Input(inBitFilePtr);
if (currentCode < nextNodeIdx) {
/* cW is in dictionary */
phraseLen = DecodeString(0, currentCode); // The length of cW
} else {
/* When cW ¡Ý next node index, which means cW > current node index, cW isn't in dictionary */
decStack[0] = character; // The last character in stack of the last loop, i.e. 1st character of pW
phraseLen = DecodeString(1, previousCode); // The length of pW + 1
}
character = decStack[phraseLen - 1]; // The last character in the stack, i.e. the 1st character of pW or cW
while (phraseLen > 0) {
phraseLen--;
fputc(decStack[phraseLen], outFilePtr); // Output the decoded string (in inverted order of decStack)
inFileSize--;
}
if (nextNodeIdx < DICT_CAPACITY) {
/* Add the new phrase into dictionary */
NewDictEntry(previousCode, character); // Add "pW + 1st character of cW" or "pW + 1st character of pW" into dictionary
}
previousCode = currentCode; // Set pW = cW
}
}
main.cpp
#include <iostream>
#include "declarations.h"
#include "BitIO.h"
using namespace std;
int main(int argc, char* argv[]) {
FILE* fp;
BITFILE* bf;
if (argc < 4) {
fprintf(stdout, "Usage: \n%s <options> <inFile> <outFile>\n", argv[0]);
fprintf(stdout, "\t<options>: E for LZW encoding or D for LZW decoding.\n");
fprintf(stdout, "\t<inFile>: name of input file.\n");
fprintf(stdout, "\t<outFile>: name of output file.\n");
return -1;
}
if ('E' == argv[1][0]) {
/* Do LZW encoding */
/* Open the files */
if (fopen_s(&fp, argv[2], "rb") != 0) {
cout << "Failed to open \"" << argv[2] << "\"." << endl;
exit(-1);
}
bf = OpenBitFileOutput(argv[3]);
if ((fp != NULL) && (bf != NULL)) {
LzwEncoding(fp, bf);
//LZWEncode(fp, bf);
fclose(fp);
CloseBitFileOutput(bf);
fprintf(stdout, "LZW encoding done.\n");
}
} else if ('D' == argv[1][0]) {
/* Do LZW decoding */
/* Open the files */
bf = OpenBitFileInput(argv[2]);
if (fopen_s(&fp, argv[3], "wb") != 0) {
cout << "Failed to open \"" << argv[2] << "\"." << endl;
exit(-1);
}
if ((fp != NULL) && (bf != NULL)) {
LzwDecoding(bf, fp);
//LZWDecode(bf, fp);
fclose(fp);
CloseBitFileInput(bf);
fprintf(stdout, "LZW decoding done.\n");
}
} else {
fprintf(stderr, "Unsupported operation.\n");
}
}
The original file is string.txt, The encoded file is written as a binary file string.dat, The decoded file is string_D.txt:
Open the encoded and decoded file :

It can be seen that the encoding and decoding process is correct .
边栏推荐
- Scala gets all files in the specified directory
- 微信小程序项目实战
- URL的结构解读
- Bottom compulsory of large factory: "communication realization between application program and AMS"
- Mysql无法访问,navicat提示:is not allowed to connect to this MySQL server
- Ftxui basic notes (Hello World)
- Qt+VTK+PCL图片转灰度图且以灰度为Y轴显示
- 主题域模型
- ProSci LAG3抗体:改善体外研究,助力癌症免疫治疗
- 图的存储结构及方法(一)
猜你喜欢

Kubernetes deployment strategy

VScode配置用户代码片段

PNY file to picture

图的存储结构及方法(一)

Application of workflow engine in vivo marketing automation

程序员最想干的三件事 |漫画

實驗二 YUV

工作流引擎在vivo营销自动化中的应用实践

Bottom compulsory of large factory: "communication realization between application program and AMS"

Alibaba Cloud Security Center's best practices for vulnerability repair
随机推荐
1.10 API 和字符串
Building a sylixos environment in VMWare
使用路由协议配置 IPv6 over IP手动隧道
Customize flick es source
[original dry goods] found a useful tool for data analysis
Worthington纯化酶制剂助力新生儿心肌细胞分离系统研究
无代码生产新模式探索
ROS based navigation framework
Rust——关于Option详解
How to open the file in keil is the real path in the 109th blog of fledgling Xiao Li
Codeforces Round #809 (Div. 2)(C和D1两题)
golang--module
6-14漏洞利用-rpcbind漏洞利用
etcdv3·watch操作实现及相关重点说明
The new idea 2022.2 was officially released, and the new features are really fragrant
使用Hystrix实现容错处理
亚马逊旗下Zoox通过安全测试 并在加州申请试驾
1.11 ArrayList&学生管理系统
RN底层原理 -- 1. Component和PureComponent解析
工作流引擎在vivo营销自动化中的应用实践