UTF encoding and character set in golang

2022-07-04 21:04:00 Nanyidao street

I. UTF encoding and character sets in Golang

1. Character set

A bit is either 1 or 0, and on its own it can never give us a letter such as A. What we can do is define a mapping between characters (say, A-Z) and numbers, for example letting 0100 0001 represent A. Collecting all of these mappings into one character-to-number lookup table gives us what is called a character set.

2. ASCII character set

ASCII defines only 128 characters (codes 0-127); extended ASCII character sets have 256 (codes 0-255).
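Because ASCII stops at 127, checking whether a string is plain ASCII only requires looking at each byte. A minimal Go sketch, where `isASCII` is a hypothetical helper name and `unicode.MaxASCII` (0x7F) is the standard library constant for the last ASCII code:

```go
package main

import (
	"fmt"
	"unicode"
)

// isASCII reports whether every byte of s lies in the 128-character
// ASCII range 0-127 (unicode.MaxASCII is 0x7F).
func isASCII(s string) bool {
	for i := 0; i < len(s); i++ {
		if s[i] > unicode.MaxASCII {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(isASCII("Hello, Go")) // true
	fmt.Println(isASCII("eggo世界"))    // false: 世 and 界 are outside ASCII
}
```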

3. GB2312 character set

ASCII does not support Chinese characters, which led to the GB2312 character set.

4. Unicode character set

Many characters are still missing from the character sets above. The goal was a single universal character set, and that is what the Unicode Consortium set out to build.

5. Fixed-length encoding and variable-length encoding

5.1 Fixed-length encoding

Suppose we want to represent "eggo世界" ("eggo world"). Looking each character up in the Unicode character set gives us its number, but once those numbers are written out as one stream of bits, deciding where one character ends and the next begins is a separate problem: split the bits at arbitrary points and you get entirely different characters.

Solution: no matter how long each character's number is, pad every one to the width of the longest, filling the unused high-order bits with 0. This solves the character-boundary problem.

The new problem: this wastes memory. The more symbols a character set contains, the wider its range of numbers, and the more space fixed-length encoding wastes. We need a way to reduce the memory consumption.
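To put numbers on the waste, the following Go sketch assumes a fixed width of 4 bytes per character, which is the size of Go's `rune` type (int32) and enough for any Unicode code point. It compares that with the UTF-8 bytes Go actually stores for the string literal. The helper `fixedLenBytes` is hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// fixedLenBytes is the storage needed if every character is padded to
// 4 bytes (the size of a Go rune), i.e. a fixed-length encoding.
func fixedLenBytes(s string) int {
	return utf8.RuneCountInString(s) * 4
}

func main() {
	s := "eggo世界"
	fmt.Println("characters:", utf8.RuneCountInString(s)) // 6
	fmt.Println("fixed 4-byte:", fixedLenBytes(s))        // 24
	fmt.Println("UTF-8 bytes:", len(s))                   // 10
}
```

The four ASCII letters cost 16 bytes under the fixed-length scheme but only 4 in UTF-8, which is exactly the waste described above.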

5.2 Variable-length encoding

Since fixed-length encoding is wasteful, we switch to variable-length encoding: small numbers use fewer bytes and large numbers use more.

The scheme is as follows:

[0, 127]: one byte, whose highest bit is the flag 0 (0xxxxxxx)

[128, 2047]: two bytes; the first byte begins with the flag bits 110 and the second with the fixed flag 10 (110xxxxx 10xxxxxx)

[2048, 65535]: three bytes; the first byte begins with 1110 and each of the other two begins with the fixed flag 10 (1110xxxx 10xxxxxx 10xxxxxx)

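01100101: the highest bit is 0, so this is a single byte. Removing the flag bit leaves 1100101, which corresponds to e.

11100100 10111000 10010110 starts with 1110, so it is a three-byte sequence. Remove the flag bits (1110, 10 and 10), join the remaining bits together, and you get the character "世" (the first character of "世界", "world").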
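The decoding steps just described (read the flag bits of the first byte, strip every flag, concatenate the payload bits) can be sketched in Go as below. `decode3` is a hypothetical helper that assumes its input is exactly one well-formed 1-, 2- or 3-byte sequence and performs no validation.

```go
package main

import "fmt"

// decode3 strips the flag bits from a 1-, 2- or 3-byte sequence and
// concatenates the remaining payload bits into a code point.
// It assumes b is a single well-formed sequence (no validation).
func decode3(b []byte) rune {
	switch {
	case b[0]&0x80 == 0x00: // 0xxxxxxx
		return rune(b[0])
	case b[0]&0xE0 == 0xC0: // 110xxxxx 10xxxxxx
		return rune(b[0]&0x1F)<<6 | rune(b[1]&0x3F)
	default: // assumed 1110xxxx 10xxxxxx 10xxxxxx
		return rune(b[0]&0x0F)<<12 | rune(b[1]&0x3F)<<6 | rune(b[2]&0x3F)
	}
}

func main() {
	e := decode3([]byte{0b01100101})
	shi := decode3([]byte{0b11100100, 0b10111000, 0b10010110})
	fmt.Printf("%c %c\n", e, shi) // e 世
}
```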

6. UTF-8 in detail

UTF-8 is a variable-length encoding that uses 1 to 4 bytes per character.

The encoding rules are as follows:

1. For a one-byte character, the first bit is 0 and the remaining 7 bits hold the character's Unicode code point.

2. For an n-byte character (n > 1), the first n bits of the first byte are 1 and bit n+1 is 0; every following byte begins with the two bits 10.

Copyright notice
This article was written by [Nanyidao street]; please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/185/202207041946308430.html