
While queuing to pick up a parcel, I sorted out the various character encodings

2022-06-11 17:44:00 Poplar branch

Foreword

Mention character sets, and most of us think back to when we were learning to program: we ran someone else's code, or did some web development, opened the browser, and... why is everything garbled?


When in doubt, ask Baidu.


A quick search told me the character sets didn't match. Quick fix, problem solved, back to studying.
But I wonder whether anyone else is like me: when I see garbled text I know to switch to UTF-8; I know the ASCII code of the lowercase a is 97 and that of the capital A is 65. But ask me what UTF-8 actually is, or how these encodings differ, and I really can't say.


Okay, enough chatter. We are serious people, so let's solve today's problem: how to tell these encodings apart systematically.

The difference between bytes and characters

1. Bytes

Bytes?! I know this one: ByteDance.


Sigh, I dream of getting into a big tech company too.



Back to the topic.
In a computer, the byte is a unit for measuring storage capacity. In English it is written Byte, and it is the capital "B" at the end of the common storage units MB and GB. The smallest unit of information a computer stores is the bit: roughly speaking, a single '0' or a single '1' in the computer is one bit. The relationship between the two is:
eight bits make one byte: 1 Byte = 8 bit
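To make the bit/byte relationship concrete, here is a small sketch in Python. The example value 97 and the binary convention 1 MB = 1024 * 1024 bytes are my own assumptions for illustration, not something the text above fixes.

```python
# One byte holds 8 bits, so it can take 2**8 = 256 distinct values (0..255).
BITS_PER_BYTE = 8

value = 97  # an arbitrary example value that fits in one byte
bits = format(value, "08b")  # zero-padded binary string, 8 digits wide

print(bits)                # 01100001
print(len(bits))           # 8
print(2 ** BITS_PER_BYTE)  # 256

# Assumption: 1 MB is taken as 1024 * 1024 bytes (the binary convention).
megabyte_in_bits = 1024 * 1024 * BITS_PER_BYTE
print(megabyte_in_bits)    # 8388608
```

Each of the eight digits printed for 97 is one bit; together they fill exactly one byte.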

2. Characters

Characters are the letters, digits and symbols used in a computer. Things like "1, 2, 3, A, B, C, ~!·#¥%…*()+" are all characters.

ASCII code

ASCII is probably the simplest and most naive of the four encodings covered here, and the one science and engineering students meet most often. A freshman C language assignment almost certainly featured turning the character a into the character A.


ASCII stands for American Standard Code for Information Interchange.


In ASCII, an English letter (upper or lower case) takes up one byte. ASCII itself has no codes for Chinese characters; the figure of two bytes per Chinese character applies to the extended encodings described in the next section, such as GB2312.
ASCII can represent 128 characters.

What I remember best about ASCII is the ASCII code table.
A few values worth memorizing: the ASCII code of 0 is 48, of A is 65, and of a is 97. The others, such as 1, b and B, follow by adding 1 each time. Beyond that, there is not much more to remember about ASCII (just kidding).
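The values above are easy to check with Python's built-in ord() and chr(). The helper to_upper_ascii below is a hypothetical name of my own, a rough sketch of the classic "turn a into A" homework rather than how any library does it.

```python
# ord() gives the code of a character, chr() goes the other way.
print(ord("0"), ord("A"), ord("a"))  # 48 65 97

# Neighbours follow by adding 1.
print(ord("1"), ord("B"), ord("b"))  # 49 66 98


def to_upper_ascii(ch: str) -> str:
    """Hypothetical helper: upper-case a single ASCII letter by hand."""
    if "a" <= ch <= "z":
        # 'a' (97) and 'A' (65) are 32 apart, as is every other letter pair.
        return chr(ord(ch) - (ord("a") - ord("A")))
    return ch  # anything that is not a lowercase ASCII letter is left alone


print(to_upper_ascii("a"))       # A
print(len("a".encode("ascii")))  # 1
```

The last line shows that an ASCII character occupies a single byte once encoded.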

[Image: ASCII code table]

I've put the table here so you don't have to look it up on Baidu. You were sneaking a look at CSDN during a meeting anyway, so there is no need to switch over to Baidu.


ANSI code

ANSI encoding is an extension of ASCII, because the 128 characters ASCII can represent are not enough to meet our needs.
ANSI encodings use 1 byte in the range 0x00~0x7f (decimal 0 to 127) to represent one English character, and values beyond one byte, in the range 0x80~0xFFFF, to represent characters of other languages. In other words, only the first 128 codes (0-127) are the same as ASCII; everything after that belongs to a particular national language.
ANSI encoding is really an umbrella term for many encodings: China created GB2312 to encode Chinese, Japan encodes Japanese in Shift_JIS, South Korea encodes Korean in EUC-KR, and every country has its own standard. Because of the constraints of the time, the ANSI encodings of different languages cannot be converted into one another, so text that mixes several languages ends up garbled.
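A rough sketch of how that garbling comes about, using the gb2312, shift_jis and euc_kr codecs that ship with Python. The sample string "中文" is my own choice, and errors="replace" is there only so the demo cannot crash on byte sequences that are invalid in the wrong encoding.

```python
text = "中文"

gb_bytes = text.encode("gb2312")  # 2 bytes per Chinese character
print(len(gb_bytes))              # 4

# ASCII characters keep their single-byte codes in all three encodings.
print("A".encode("gb2312"), "A".encode("shift_jis"), "A".encode("euc_kr"))

# Reading GB2312 bytes as if they were Shift_JIS gives garbled text.
garbled = gb_bytes.decode("shift_jis", errors="replace")
print(garbled == text)            # False

# Decoding with the encoding that wrote the bytes recovers the text.
restored = gb_bytes.decode("gb2312")
print(restored == text)           # True
```

The same bytes mean different characters depending on which national encoding the reader assumes, which is exactly the conflict described above.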

Unicode code

Unicode was born to resolve the conflicts between the ANSI encodings of different countries: if every symbol in the world is given a unique code, the confusion disappears.


The Unicode standard keeps evolving, but its most common fixed-width form (UCS-2/UTF-16) uses two bytes to represent a character, and very rare characters need 4 bytes. Modern operating systems and most programming languages support Unicode directly.
The problem is that an English letter, which used to need only one byte, now needs two in this form (the rule is to put a zero byte in front of the letter's ASCII code), and that is wasteful. So is there an encoding that eliminates garbled text and also avoids the waste? This is where our lovely UTF-8 comes in.
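The waste is easy to see in Python. In this sketch I use the utf-16-be codec (big-endian, no byte-order mark) as a stand-in for the two-byte form described above; that choice, and the sample characters, are my assumptions.

```python
# UTF-16 (big-endian, no byte-order mark) stands in for the two-byte form.
letter = "A".encode("utf-16-be")
print(letter)       # b'\x00A'  -> a zero byte in front of ASCII code 65
print(len(letter))  # 2

hanzi = "中".encode("utf-16-be")
print(len(hanzi))   # 2

rare = "\U00020000".encode("utf-16-be")  # U+20000, a rarely used CJK character
print(len(rare))    # 4

# Plain English text doubles in size compared with one byte per letter.
english = "hello"
print(len(english.encode("ascii")), len(english.encode("utf-16-be")))  # 5 10
```

For text that is mostly English, half of the bytes in this form are zeros.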

UTF-8 code

UTF-8 is a variable-length encoding. It uses 1~4 bytes to represent a symbol, and the number of bytes depends on the symbol.
A character in the ASCII range takes just one byte, and its one-byte ASCII code is kept unchanged, so UTF-8 can also be seen as an extension of ASCII.
What's more interesting is this:
in the two-byte Unicode form, a common Chinese character takes 2 bytes, while in UTF-8 it takes 3 bytes. Going from Unicode to UTF-8 is not a direct one-to-one copy; it follows an algorithm and a set of rules.
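Those rules can be sketched in a few lines. The function utf8_encode below is a hypothetical helper of my own that follows the standard UTF-8 bit layout; it is simplified, does no validation of surrogates or out-of-range values, and in real code you would just call str.encode("utf-8").

```python
def utf8_encode(code_point: int) -> bytes:
    """Hypothetical sketch: turn one Unicode code point into UTF-8 bytes."""
    if code_point < 0x80:     # 1 byte:  0xxxxxxx (same as ASCII)
        return bytes([code_point])
    if code_point < 0x800:    # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:  # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


print(utf8_encode(ord("a")).hex())   # 61
print(utf8_encode(ord("中")).hex())  # e4b8ad

# The hand-written version agrees with Python's built-in encoder.
for ch in ["a", "é", "中", "\U00020000"]:
    assert utf8_encode(ord(ch)) == ch.encode("utf-8")
```

The character 中 is code point U+4E2D, which falls in the third branch, hence the 3 bytes mentioned above.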

A short summary

In computer memory, text is handled uniformly as Unicode; when it needs to be saved to disk or transmitted, it is converted to UTF-8.

When you edit with Notepad, the UTF-8 characters read from the file are converted to Unicode characters in memory; when you save after editing, the Unicode is converted back to UTF-8 and written to the file. It is a clever approach: the characters are unified, the garbled-text problem is solved, and space is saved as well.
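The same load/save round trip can be sketched in Python, where a str plays the role of the in-memory Unicode text. The sample string and the temporary file are assumptions for the demo, and this only illustrates the idea; it says nothing about how Notepad is implemented.

```python
import os
import tempfile

text = "ASCII and 中文 together"  # a str is Unicode text in memory

fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    # Saving: Unicode text in memory -> UTF-8 bytes on disk.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # On disk: 1 byte per ASCII character, 3 bytes per Chinese character here.
    size_on_disk = os.path.getsize(path)

    # Loading: UTF-8 bytes on disk -> Unicode text in memory.
    with open(path, "r", encoding="utf-8") as f:
        loaded = f.read()
finally:
    os.remove(path)

print(loaded == text)           # True
print(len(text), size_on_disk)  # 21 25
```

The 21 characters become 25 bytes on disk because only the two Chinese characters need 3 bytes each.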



Copyright notice
This article was written by [Poplar branch]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/03/202203011903167611.html