ASCII, Unicode, and UTF-8

2022-06-11 12:42:00 Xiaobai, a vegetable seller

This post is based on the article "ASCII, Unicode and UTF-8".
I. ASCII
A byte has 8 bits, so it can represent 256 distinct values, from 00000000 to 11111111. ASCII defines codes for 128 characters: for example, the space character is 32 (binary 00100000) and the capital letter A is 65 (binary 01000001). These 128 characters include 32 non-printable control characters (codes 0 to 31); code 127, DEL, is also a control code. ASCII uses only the low 7 bits of a byte; the leading bit is always 0.
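As a quick check of the numbers above, here is a minimal Python sketch that prints the code and the 8-bit binary form of the two characters mentioned. The variable names are illustrative only.

```python
# Look up the ASCII codes mentioned above and show them as 8-bit binary.
for ch in (" ", "A"):
    code = ord(ch)              # numeric code of the character
    bits = format(code, "08b")  # zero-padded to one full byte
    print(repr(ch), code, bits)

space_bits = format(ord(" "), "08b")
a_bits = format(ord("A"), "08b")

# Every ASCII code fits in 7 bits, so the leading bit of the byte is 0.
leading_bits = {format(code, "08b")[0] for code in range(128)}
```

The set `leading_bits` contains only "0", which matches the rule that the first bit of an ASCII byte is always 0.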
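II. Non-ASCII encodings
The first 128 codes are fixed by ASCII, but 128 characters are not enough for languages that use other letters. Various countries therefore used the free values 128 to 255 for their own characters. Each did so in its own way, so the same byte value can stand for different letters in different encodings.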
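To make this concrete, the sketch below decodes one byte from the 128 to 255 range under two single-byte encodings. The choice of Latin-1 and Windows-1251 is an assumption made for illustration; the post itself does not name specific encodings.

```python
raw = bytes([0xE9])  # a single byte with value 233, outside the ASCII range

# The same byte read under two different single-byte encodings
# (both chosen here only as examples).
as_latin1 = raw.decode("latin-1")  # Western European
as_cp1251 = raw.decode("cp1251")   # Cyrillic

print(as_latin1, as_cp1251)
```

The byte is identical in both cases; only the table used to interpret it changes.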
III. Unicode
When every country uses its own encoding, text becomes garbled whenever it is decoded with a different encoding from the one used to write it. For example, if the sender of an email uses one encoding and the receiver assumes another, the receiver sees garbage. Unicode addresses this by giving every character a single number, its code point. It has room for more than one million code points, enough to cover the characters of all the world's writing systems.
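The garbling described above can be reproduced in a few lines. This is a sketch under the assumption that the sender uses UTF-8 and the receiver wrongly assumes Latin-1; any mismatched pair would show a similar effect.

```python
import sys

text = "严"

sent = text.encode("utf-8")        # sender writes the text as UTF-8 bytes
garbled = sent.decode("latin-1")   # receiver wrongly assumes Latin-1
correct = sent.decode("utf-8")     # receiver uses the matching encoding

print(repr(garbled), repr(correct))

# Unicode code points run from 0 to sys.maxunicode (0x10FFFF).
code_point_count = sys.maxunicode + 1
print(code_point_count)
```

With the wrong decoder the single character turns into three unrelated ones; with the right decoder the original text comes back unchanged.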
IV. The problem with Unicode
Unicode by itself only assigns numbers to characters; it does not say how those numbers are stored as bytes. For example, the Chinese character 严 (yan) has the code point U+4E25, which takes 15 bits in binary, so it needs at least two bytes. Characters with larger code points may need three or four bytes. This raises two questions. First, how does a computer tell Unicode from ASCII: how does it know that a run of two, three, or four bytes is one character rather than that many separate ASCII characters? Second, if every character were stored at the maximum width, each ASCII character would carry leading bytes of all zeros, which wastes storage.
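Both points can be checked with a short sketch. UTF-32 is used here only as a stand-in for "store every character at the maximum width"; the sample string is made up for the example.

```python
code_point = ord("严")
bits_needed = code_point.bit_length()  # bits required to hold U+4E25
print(hex(code_point), bits_needed)

ascii_text = "hello"
one_byte_each = ascii_text.encode("ascii")        # 1 byte per character
four_bytes_each = ascii_text.encode("utf-32-be")  # fixed 4 bytes per character

print(len(one_byte_each), len(four_bytes_each))
print(four_bytes_each[:4])  # the first character, padded with zero bytes
```

For plain ASCII text the fixed-width form is four times larger, and three of every four bytes are zero.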
V. UTF-8
UTF-8 is one of several ways to implement Unicode; others include UTF-16 (two or four bytes per character) and UTF-32 (always four bytes per character). UTF-8 is a variable-length encoding: a character takes between one and four bytes.
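The byte counts for the three schemes can be compared directly. In this sketch the three sample characters are arbitrary picks, and the big-endian codec names are used so that Python does not prepend a byte-order mark to the output.

```python
# One ASCII letter, one CJK character, one emoji outside the 16-bit range.
samples = ["a", "严", "\U0001F600"]
encodings = ("utf-8", "utf-16-be", "utf-32-be")

# Number of bytes each character occupies under each encoding.
sizes = {
    ch: {enc: len(ch.encode(enc)) for enc in encodings}
    for ch in samples
}

for ch, per_encoding in sizes.items():
    print(f"U+{ord(ch):04X}", per_encoding)
```

UTF-8 is the only one of the three that stores an ASCII character in a single byte.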
UTF-8 encoding rules:
1. An ASCII character is encoded as a single byte whose first bit is 0, identical to its ASCII code. For example, the UTF-8 encoding of a is 01100001.
2. A non-ASCII character is encoded as multiple bytes. Reading from the left, the first byte begins with as many 1 bits as there are bytes in the sequence, followed by a 0. Each remaining byte begins with 10. The other bit positions hold the bits of the character's Unicode code point.
For example, the code point of 严 is U+4E25 (binary 100111000100101). It needs three bytes, so the template is 1110xxxx 10xxxxxx 10xxxxxx. Fill the x positions from right to left with the bits of 100111000100101, and pad any leftover x with 0. The resulting UTF-8 encoding of 严 is 11100100 10111000 10100101, or E4B8A5 in hexadecimal.
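The two rules can be sketched as a small hand-written encoder and compared with Python's built-in one. The function name `utf8_encode` is hypothetical, and the sketch skips checks a real encoder performs, such as rejecting surrogate code points.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point following the two rules above."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point out of Unicode range")

    # Rule 1: ASCII characters are a single byte with a leading 0 bit.
    if code_point < 0x80:
        return bytes([code_point])

    # Rule 2: pick the sequence length from the size of the code point.
    if code_point < 0x800:
        length = 2
    elif code_point < 0x10000:
        length = 3
    else:
        length = 4

    # Fill the trailing bytes from right to left, 6 bits each, prefix 10.
    out = []
    for _ in range(length - 1):
        out.append(0b10000000 | (code_point & 0b00111111))
        code_point >>= 6

    # First byte: `length` one-bits, then a 0, then the remaining bits.
    lead_prefix = (0xFF << (8 - length)) & 0xFF  # 0xC0, 0xE0 or 0xF0
    out.append(lead_prefix | code_point)
    return bytes(reversed(out))


encoded = utf8_encode(0x4E25)
print(" ".join(format(b, "08b") for b in encoded))
print(encoded.hex().upper())
```

For U+4E25 this produces the same three bytes as the worked example, and the result agrees with `"严".encode("utf-8")`.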

Copyright notice
This article was written by [Xiaobai, a vegetable seller]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/03/202203012126084191.html