UTF encoding and character set in golang
2022-07-04 21:04:00 【Nanyidao street】
I. UTF encoding and character sets in Golang
1. Character set
A bit is either 1 or 0, and on its own it can never be the letter A. What we can do is define a mapping between characters such as A-Z and numbers, for example letting 0100 0001 represent A. Collecting these mappings gives a table of characters and their numbers, and that table is called a character set.
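The mapping described above can be seen directly in Go, where a character literal is just an integer. The following is a minimal sketch; the helper name codeOf is made up for illustration and is not part of any library.

```go
package main

import "fmt"

// codeOf is a hypothetical helper: it returns the number that the character
// set assigns to a character. In Go a rune already is that number, so this
// is a plain conversion.
func codeOf(r rune) int {
	return int(r)
}

func main() {
	// Character -> number -> bit pattern.
	fmt.Printf("%c -> %d -> %08b\n", 'A', codeOf('A'), codeOf('A'))
	// The mapping works in the other direction too: the bits 0100 0001 are 'A'.
	fmt.Printf("%08b -> %c\n", 0b01000001, rune(0b01000001))
}
```

The first line printed is A -> 65 -> 01000001, matching the 0100 0001 example in the text.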


2. ASCII character set
ASCII defines only 128 characters; the extended (8-bit) character sets have 256.

3. GB2312 character set
ASCII does not support Chinese characters, which is why the GB2312 character set was created.

4. Unicode character set
Many characters are not included in the character sets above, so a universal character set is needed. This is what the Unicode Consortium provides.
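In Go, the rune type holds a Unicode code point, so the numbers Unicode assigns can be printed directly. This is a small sketch; the sample string "A世界" and the helper name codePoints are chosen here only for illustration.

```go
package main

import "fmt"

// codePoints is a hypothetical helper that returns the Unicode code point
// of every character in s. Converting a string to []rune decodes it.
func codePoints(s string) []rune {
	return []rune(s)
}

func main() {
	for _, r := range codePoints("A世界") {
		// %U prints the code point in the usual U+XXXX notation.
		fmt.Printf("%c %U\n", r, r)
	}
}
```

A Latin letter and a Chinese character come out of the same table: A is U+0041, 世 is U+4E16 and 界 is U+754C.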
5. Fixed-length and variable-length encoding
5.1 Fixed-length encoding
Suppose we want to represent "eggo世界". We can look up each character's number directly in the Unicode character set, but once the numbers are written one after another as bits, deciding where one character ends and the next begins is a separate problem. If the bit stream is split at the wrong boundaries, it decodes to completely different characters.


Solution: no matter how many bits each character needs, encode every character at the width of the longest one and pad the high-order bits with 0. This solves the character boundary problem.
The new problem: memory is wasted. The more symbols the character set contains, the wider the encoding has to be, and the more obvious the waste of fixed-length encoding becomes. We need a way to reduce the memory consumption.
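The waste can be made concrete with a rough sketch. It assumes a fixed width of four bytes per character (enough for any Unicode code point) and takes the sample string to be "eggo世界"; the helper name encodeFixed32 is hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeFixed32 is a hypothetical helper that stores every code point in
// exactly four bytes (big-endian), so the high-order bits are padded with 0.
func encodeFixed32(s string) []byte {
	runes := []rune(s)
	out := make([]byte, 4*len(runes))
	for i, r := range runes {
		binary.BigEndian.PutUint32(out[4*i:], uint32(r))
	}
	return out
}

func main() {
	s := "eggo世界"
	fixed := encodeFixed32(s)
	fmt.Printf("fixed-length: %d bytes\n", len(fixed))
	// Go strings are stored as UTF-8, so len(s) is the variable-length size.
	fmt.Printf("UTF-8:        %d bytes\n", len(s))
	// The letter e alone takes four bytes, three of them padding.
	fmt.Printf("first character: % x\n", fixed[:4])
}
```

With these assumptions the fixed-length form takes 24 bytes for six characters, while the UTF-8 form of the same string takes 10.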

5.2 Variable-length encoding
Since fixed-length encoding is wasteful, we use variable-length encoding: small numbers use fewer bytes and large numbers use more bytes.
The scheme is as follows:
[0,127]: one byte, the highest bit is the flag bit 0
[128,2047]: two bytes, the first byte starts with the flag bits 110 and the second byte starts with the fixed flag bits 10
[2048,65535]: three bytes, the first byte starts with the flag bits 1110 and the other two bytes each start with the fixed flag bits 10
01100101: the highest bit is 0, so this is a single-byte character; remove the flag bit and the remaining bits correspond to e
11100100 10111000 10010110: the first byte starts with 1110, so this is a three-byte character; remove the flag bits from the three bytes and combine the remaining bits to get 世, the first character of "世界" (the world)
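The two decoding examples can be checked by hand in Go with a few bit masks. This is only a sketch of the three-byte case, with the third byte written out in full as 10010110; the helper name decode3 is made up, and it does not validate its input.

```go
package main

import "fmt"

// decode3 is a hypothetical helper for the three-byte layout
// 1110xxxx 10xxxxxx 10xxxxxx. It strips the flag bits and joins
// the remaining 4 + 6 + 6 payload bits. It assumes well-formed input.
func decode3(b0, b1, b2 byte) rune {
	return rune(b0&0x0F)<<12 | rune(b1&0x3F)<<6 | rune(b2&0x3F)
}

func main() {
	// Single byte: the flag bit is 0, the other 7 bits are the character.
	fmt.Printf("%c\n", rune(0b01100101&0x7F))

	// Three bytes: payload 0100 111000 010110 = 0x4E16.
	r := decode3(0b11100100, 0b10111000, 0b10010110)
	fmt.Printf("%U %c\n", r, r)
}
```

The single byte decodes to e, and the three bytes combine to U+4E16, which is 世.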

6. UTF-8 in detail
UTF-8 is a variable-length encoding that uses 1 to 4 bytes per character.
The encoding rules are as follows:
1. For a single-byte character, the first bit is 0 and the remaining 7 bits hold the Unicode code point.
2. For a character of n bytes (n>1), the first n bits of the first byte are 1 and bit n+1 is 0; every following byte starts with 10. The remaining bits hold the Unicode code point.
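The rules above can be written out as a small hand-rolled encoder and compared with Go's standard library function utf8.EncodeRune. This is an illustrative sketch rather than how the standard library implements it: the name encodeUTF8 is hypothetical, and it assumes the input is a valid code point (no checks for surrogates or out-of-range values).

```go
package main

import (
	"bytes"
	"fmt"
	"unicode/utf8"
)

// encodeUTF8 is a hypothetical encoder following the rules above.
// It assumes r is a valid Unicode code point and performs no validation.
func encodeUTF8(r rune) []byte {
	switch {
	case r < 0x80: // 0xxxxxxx
		return []byte{byte(r)}
	case r < 0x800: // 110xxxxx 10xxxxxx
		return []byte{0xC0 | byte(r>>6), 0x80 | byte(r&0x3F)}
	case r < 0x10000: // 1110xxxx 10xxxxxx 10xxxxxx
		return []byte{0xE0 | byte(r>>12), 0x80 | byte((r>>6)&0x3F), 0x80 | byte(r&0x3F)}
	default: // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
		return []byte{
			0xF0 | byte(r>>18),
			0x80 | byte((r>>12)&0x3F),
			0x80 | byte((r>>6)&0x3F),
			0x80 | byte(r&0x3F),
		}
	}
}

func main() {
	// One sample code point for each length from 1 to 4 bytes.
	for _, r := range []rune{'e', 0xE9, '世', 0x1F600} {
		buf := make([]byte, utf8.UTFMax)
		n := utf8.EncodeRune(buf, r)
		mine := encodeUTF8(r)
		fmt.Printf("%U: %08b matches standard library: %v\n",
			r, mine, bytes.Equal(mine, buf[:n]))
	}
}
```

In everyday Go code there is no need for such a function, since strings are already UTF-8 and the unicode/utf8 package covers encoding and decoding.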