UTF Encoding and Character Sets in Golang
2022-07-04 21:04:00 【Nanyidao street】
I. UTF Encoding and Character Sets in Golang
1. Character set
A bit is either 1 or 0, so on its own it can never be the letter A. Instead, we define a mapping between characters (such as A-Z) and numbers, for example 0100 0001 (65) represents A. Collecting all of these mappings into one table of characters and their numbers gives us what is called a character set.
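Go makes this mapping visible: a character literal such as 'A' is simply a number of type rune (an alias for int32). The following is a small sketch; the helper name charCode is made up for illustration.

```go
package main

import "fmt"

// charCode returns the number assigned to r and that number as an
// 8-bit, zero-padded binary string (only meaningful for values below 256).
func charCode(r rune) (int32, string) {
	return int32(r), fmt.Sprintf("%08b", r)
}

func main() {
	n, bits := charCode('A')
	fmt.Println(n, bits) // 65 01000001
}
```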
2. ASCII character set
ASCII has only 128 characters; the extended ASCII character set has 256.
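As a rough illustration of the 128-character ASCII range, the sketch below checks whether a Go string contains only ASCII. It relies on the fact that Go strings hold UTF-8 bytes and that every byte of a non-ASCII character in UTF-8 is 128 or above. The helper isASCII is hypothetical; unicode.MaxASCII (127) is a standard library constant.

```go
package main

import (
	"fmt"
	"unicode"
)

// isASCII reports whether every byte of s lies in the ASCII range [0, 127].
// Checking bytes is enough because, in UTF-8, every byte of a non-ASCII
// character has its highest bit set (value >= 128).
func isASCII(s string) bool {
	for i := 0; i < len(s); i++ {
		if s[i] > unicode.MaxASCII {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(isASCII("eggo"), isASCII("eggo世界")) // true false
}
```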
3. GB2312 character set
ASCII does not support Chinese characters, which led to the GB2312 character set.
4. Unicode character set
Many characters are not included in the character sets above. What we want is one universal character set, and that is exactly what the Unicode Consortium maintains.
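Ranging over a Go string decodes it into Unicode code points (runes), so the numbers Unicode assigns can be printed with the %U verb. A minimal sketch; codePoints is an illustrative helper, not a library function.

```go
package main

import "fmt"

// codePoints returns "char=U+XXXX" for each character in s.
// A for-range loop over a string yields one rune (code point) per step.
func codePoints(s string) []string {
	var out []string
	for _, r := range s {
		out = append(out, fmt.Sprintf("%c=%U", r, r))
	}
	return out
}

func main() {
	fmt.Println(codePoints("eggo世界"))
}
```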
5. Fixed-length and variable-length encoding
5.1 Fixed-length encoding
Suppose we want to represent "eggo世界" ("eggo world"). We can look up each character's number directly in the Unicode character set, but once we have the numbers, how to split them back into characters is a separate problem: the bits run together, and if they are cut at arbitrary points they decode to the wrong characters.
Solution: no matter how long each character's number is, make every character as wide as the longest one and pad the missing high-order bits with 0. This solves the character-boundary problem.
New problem: this wastes memory. The more symbols a character set contains, the wider its code range becomes, and the more space fixed-length encoding wastes. We need a way to reduce memory consumption.
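To put a number on the waste, assume a fixed width of 4 bytes per character (enough for every Unicode code point, similar in spirit to UTF-32; a Go []rune stores characters this way). The sketch compares that with the byte length of the same string, which Go stores as UTF-8. fixedWidthSize is a hypothetical helper.

```go
package main

import "fmt"

// fixedWidthSize returns the bytes needed if every character is padded
// to 4 bytes (32 bits), as a []rune does.
func fixedWidthSize(s string) int {
	return len([]rune(s)) * 4
}

func main() {
	s := "eggo世界"
	// len(s) counts UTF-8 bytes, because Go strings are UTF-8 encoded.
	fmt.Println(fixedWidthSize(s), len(s)) // 24 10
}
```

For "eggo世界" the fixed-width layout needs 24 bytes, while the UTF-8 string needs 10 (1 byte each for e, g, g, o and 3 bytes each for 世 and 界).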
5.2 Variable-length encoding
Since fixed-length encoding doesn't work well, we use variable-length encoding: small numbers use fewer bytes, large numbers use more bytes.
The scheme is as follows:
[0,127]: one byte; the highest bit is the flag 0 (0xxxxxxx)
[128,2047]: two bytes; the first byte starts with the flag 110, and the second starts with the fixed flag 10 (110xxxxx 10xxxxxx)
[2048,65535]: three bytes; the first byte starts with the flag 1110, and each of the other two starts with the fixed flag 10 (1110xxxx 10xxxxxx 10xxxxxx)
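01100101: the highest bit is 0, so this is a one-byte character. Removing the flag bit leaves 1100101 (101), which is e.
11100100 10111000 10010110: the first byte starts with 1110, so this is a three-byte character. Remove the flag bits of all three bytes (1110, 10, 10) and concatenate what remains, 0100 111000 010110 (0x4E16), and you get the character "世".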
6. UTF-8 in detail
UTF-8 is a variable-length encoding that uses 1 to 4 bytes per character.
Its encoding rules are:
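1. For a one-byte character, the first bit is 0 and the remaining 7 bits hold the Unicode code point.
2. For an n-byte character (n > 1), the first n bits of the first byte are 1 and the next bit is 0; the first two bits of each remaining byte are 10. All other bits hold the Unicode code point.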