While Waiting in Line for a Parcel, I Sorted Out All the Character Encodings
2022-06-11 17:44:00 【Poplar branch】
Four Encodings Under Siege: ANSI, ASCII, Unicode and UTF-8
Preface: Some Thoughts
| When it comes to character sets, most of you probably think back to learning programming, reading someone else's code, or doing web development: you open the page in a browser and, huh, why is everything garbled? |

| When in doubt, ask Baidu. |

| After a quick search you find out that the character sets don't match. You switch everything to the same one, the problem goes away, and you go back to studying~ |
| But I wonder whether some of you are like me. When I see garbled text, I know to switch to UTF-8. I know the ASCII code of the letter a is 97, and the ASCII code of uppercase A is 65. But ask me what UTF-8 actually is, or how these encodings differ, and I really can't tell you. |

| All right, enough chatter. We are serious people here, so let's tackle today's problem: how to tell these encodings apart systematically. |
The distinction between bytes and characters
1. Bytes
| Bytes?! I know this one: ByteDance! |

| Honey, I'm dreaming of a job at a big tech company too. |

| Back to the topic. |
| In computing, the byte (Byte) is the unit used to measure storage capacity. It is the capital "B" at the end of common units such as MB and GB. The smallest unit of information a computer stores is the bit: a single '0' or '1'. Eight bits make one byte: 1 Byte = 8 bits. |
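The 1 Byte = 8 bits relationship is easy to check in Python. The sample byte values below are arbitrary and chosen only for illustration:

```python
# One byte is 8 bits, so a single byte can hold 2**8 = 256 distinct values (0..255).
BITS_PER_BYTE = 8
values_per_byte = 2 ** BITS_PER_BYTE

data = bytes([65, 97, 48])  # three arbitrary bytes
size_in_bits = len(data) * BITS_PER_BYTE

print(values_per_byte)          # 256
print(size_in_bits)             # 24
# format(..., "08b") shows the 8 individual bits stored in one byte
print(format(data[0], "08b"))   # 01000001
```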
2. Characters
| Characters are the letters, digits and symbols used in computers, for example "1, 2, 3, A, B, C, ~!·#¥%…*()+". |
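A character is not the same thing as a byte. Depending on the encoding, one character can take one byte or several. Here is a quick Python check that uses UTF-8 (covered later) purely as an example encoding:

```python
text = "A1~中"

# Count characters vs. bytes after encoding the text as UTF-8
per_char = [(ch, len(ch.encode("utf-8"))) for ch in text]

print(len(text))                  # 4 characters
print(len(text.encode("utf-8")))  # 6 bytes: 1 + 1 + 1 + 3
print(per_char)
```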
ASCII
ASCII is probably the most innocent and adorable of the four encodings, and the one science and engineering students run into most often. Almost every freshman C assignment includes converting the character a into the character A.

ASCII stands for American Standard Code for Information Interchange.
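| ASCII is a 7-bit code that is normally stored as one byte per character. Uppercase and lowercase letters alike take one byte. ASCII contains no Chinese characters at all. In the GB2312/GBK encodings described below, a Chinese character takes two bytes. |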
| ASCII can represent 128 characters (codes 0 to 127). |
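The short Python sketch below shows the 128-character limit in practice. The helper `is_ascii` is a hypothetical name used only for illustration; Python 3.7+ also offers the built-in `str.isascii()`.

```python
# ASCII defines codes 0..127, which fit in 7 bits.
ASCII_SIZE = 128
assert ASCII_SIZE == 2 ** 7

def is_ascii(text: str) -> bool:
    """Return True if every character has a code below 128."""
    return all(ord(ch) < ASCII_SIZE for ch in text)

print(is_ascii("Hello, world!"))  # True
print(is_ascii("你好"))            # False

# Chinese characters simply have no ASCII code, so encoding fails.
try:
    "你好".encode("ascii")
except UnicodeEncodeError as err:
    print("not representable in ASCII:", err.reason)
```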
| What stuck with me most about ASCII is the ASCII table. A few values are worth remembering: '0' is 48, 'A' is 65 and 'a' is 97. The rest follow in order, adding 1 each time: '1' is 49, 'B' is 66 and 'b' is 98. Beyond that, there is nothing more to remember about ASCII (just kidding!). |
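The three anchor values, and the classic freshman exercise of turning a into A, can be sketched in Python. `to_upper_ascii` is a hypothetical helper written for illustration. It relies on the fact that lowercase letters sit exactly 32 positions after uppercase letters (97 - 65 = 32).

```python
print(ord("0"), ord("A"), ord("a"))  # 48 65 97
print(chr(ord("A") + 1))             # B  (each next letter is +1)

def to_upper_ascii(ch: str) -> str:
    """Convert an ASCII lowercase letter to uppercase; leave others unchanged."""
    if "a" <= ch <= "z":
        # 'a'..'z' are exactly ord('a') - ord('A') == 32 positions after 'A'..'Z'
        return chr(ord(ch) - (ord("a") - ord("A")))
    return ch

print(to_upper_ascii("a"))  # A
```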

I've put the table here so you don't have to look it up on Baidu. I was reading CSDN during a meeting anyway, so there was no need to switch over to Baidu.
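If you would rather generate the printable part of the ASCII table yourself (codes 32 to 126), a small Python sketch will do it. The function name `ascii_table` and the six-column layout are arbitrary choices for illustration.

```python
def ascii_table(start: int = 32, stop: int = 127, columns: int = 6) -> str:
    """Return printable ASCII characters as 'code char' cells, several per row."""
    cells = [f"{code:3d} {chr(code):<3}" for code in range(start, stop)]
    rows = [" | ".join(cells[i:i + columns]) for i in range(0, len(cells), columns)]
    return "\n".join(rows)

print(ascii_table())
```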

ANSI
| ANSI encoding is an extension of ASCII, because the 128 characters of ASCII are not enough for our needs. |
| An ANSI encoding uses 1 byte in the range 0x00 to 0x7F (decimal 0 to 127) for each English character. Characters of other languages use two bytes, with values in the range 0x80 to 0xFFFF. In other words, the first 128 codes (0 to 127) are identical to ASCII, and everything after that represents characters of one particular national language. |
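Here is a Python sketch of how a program can walk through such double-byte text. It uses GBK (the Chinese code page) as the example. The rule is deliberately simplified and is an assumption for illustration: a byte below 0x80 is a one-byte ASCII character, and a byte of 0x80 or above starts a two-byte character. The helper `split_dbcs` is hypothetical.

```python
from typing import List

def split_dbcs(data: bytes) -> List[bytes]:
    """Split GBK-style double-byte data into per-character byte chunks.

    Simplified rule (assumption): < 0x80 means a single ASCII byte,
    >= 0x80 means the lead byte of a 2-byte character.
    """
    chunks, i = [], 0
    while i < len(data):
        step = 1 if data[i] < 0x80 else 2
        chunks.append(data[i:i + step])
        i += step
    return chunks

encoded = "Hi中文".encode("gbk")
print(encoded)              # b'Hi\xd6\xd0\xce\xc4'
print(split_dbcs(encoded))  # [b'H', b'i', b'\xd6\xd0', b'\xce\xc4']
```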
| "ANSI" actually covers many different encodings; on Windows the name simply refers to the system's default code page. China created GB2312 to encode Chinese, Japan put Japanese into Shift_JIS, South Korea put Korean into EUC-KR, and every country had its own standard. Given the constraints of the time, the ANSI encodings of different languages could not be converted into one another. As a result, text that mixed several languages came out garbled. |
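The conflict is easy to reproduce with Python's built-in codecs. The two characters below are encoded with GB2312 and then decoded with other code pages. The ASCII range comes through unchanged everywhere, but the Chinese bytes turn into unrelated characters, or into replacement marks when decoded as UTF-8.

```python
gb_bytes = "中文".encode("gb2312")
print(gb_bytes)  # b'\xd6\xd0\xce\xc4'

# The ASCII range is shared: every one of these code pages maps "A" to b"A".
for codec in ("gb2312", "shift_jis", "euc_kr"):
    assert "A".encode(codec) == b"A"

# The same non-ASCII bytes mean something else (or nothing) in other encodings.
for codec in ("shift_jis", "euc_kr", "utf-8"):
    print(codec, gb_bytes.decode(codec, errors="replace"))
```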
Unicode
| Unicode was created to resolve the conflicts between different countries' ANSI encodings. If every symbol in the world gets its own unique code, the confusion disappears. |

| The Unicode standard keeps evolving, but the most common form uses two bytes per character. Very rare characters need 4 bytes. Modern operating systems and most programming languages support Unicode directly. |
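In Python you can see both the unique code (code point) and the "two bytes, or four for rare characters" behaviour. The two-byte form corresponds to UTF-16, used here in big-endian form (`utf-16-be`) so that no byte-order mark is added. The emoji is just one example of a character outside the two-byte range.

```python
for ch in ("A", "中", "\U0001F600"):    # U+0041, U+4E2D, U+1F600
    code_point = ord(ch)
    utf16 = ch.encode("utf-16-be")      # big-endian, no byte-order mark
    print(f"U+{code_point:04X}", len(utf16), utf16.hex())
# U+0041 2 0041
# U+4E2D 2 4e2d
# U+1F600 4 d83dde00
```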
| But here's the problem: an English letter that used to need only one byte takes two bytes in Unicode (the rule is to pad the letter's ASCII code with a leading zero byte), which wastes space. Is there an encoding that both prevents garbled text and avoids the waste? This is where our lovely UTF-8 comes in. |
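The padding with zero bytes, and the resulting waste for English text, shows up directly in a two-byte encoding (UTF-16 big-endian here) compared with UTF-8:

```python
text = "hello"
utf16 = text.encode("utf-16-be")   # each ASCII letter gets a leading 0x00 byte
utf8 = text.encode("utf-8")        # each ASCII letter stays a single byte

print(utf16)                 # b'\x00h\x00e\x00l\x00l\x00o'
print(len(utf16), len(utf8)) # 10 5
```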
UTF-8
| UTF-8 is a variable-length encoding: it uses 1 to 4 bytes per symbol, and the length depends on the symbol. |
| A character in the ASCII range takes a single byte, identical to its ASCII code. This is why UTF-8 can also be seen as an extension of ASCII. |
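You can check the 1-to-4-byte range and the ASCII compatibility with Python's UTF-8 codec. The four sample characters are arbitrary picks from each length class:

```python
samples = ["A", "é", "中", "\U0001F600"]  # U+0041, U+00E9, U+4E2D, U+1F600
sizes = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch, n in sizes.items():
    print(f"U+{ord(ch):04X} -> {n} byte(s)")

# ASCII text produces exactly the same bytes in ASCII and in UTF-8
assert "Hello".encode("ascii") == "Hello".encode("utf-8")
```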
| Interestingly, a common Chinese character takes 2 bytes in Unicode but 3 bytes in UTF-8. Converting from Unicode to UTF-8 is not a direct byte-for-byte copy. It follows a fixed set of bit-packing rules. |
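The "rules" are the UTF-8 bit patterns. The leading bits of the first byte say how many bytes follow, and every continuation byte starts with `10`. Below is a minimal teaching sketch of an encoder for a single code point; `utf8_encode` is a hypothetical name. It is checked against Python's built-in codec, and for brevity it does not reject surrogate code points (U+D800 to U+DFFF).

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point with the UTF-8 bit patterns (sketch)."""
    if code_point < 0x80:          # 0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:         # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:       # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    if code_point <= 0x10FFFF:     # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return bytes([0xF0 | (code_point >> 18),
                      0x80 | ((code_point >> 12) & 0x3F),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    raise ValueError(f"not a Unicode code point: {code_point:#x}")

# U+4E2D ("中") becomes three bytes: E4 B8 AD
print(utf8_encode(ord("中")).hex())  # e4b8ad

# Cross-check against Python's own UTF-8 codec
for ch in ("A", "é", "中", "\U0001F600"):
    assert utf8_encode(ord(ch)) == ch.encode("utf-8")
```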
Summary
| In memory, text is handled uniformly as Unicode. When it has to be saved to disk or transmitted, it is converted to UTF-8. For example, when you edit a file in Notepad, the UTF-8 characters read from the file are converted to Unicode in memory. When you save, the Unicode is converted back to UTF-8 and written to the file. This neat approach unifies the characters, solves the garbled-text problem and saves space at the same time. |
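Python follows the same "Unicode in memory, UTF-8 at the boundary" pattern, at least as a model: a `str` is a sequence of Unicode code points (CPython's internal storage format is an implementation detail), and bytes appear only when you encode for a file. The sketch below writes to a temporary file; the file name `note.txt` is arbitrary.

```python
import os
import tempfile

text = "编码 encoding"  # in memory: a str, i.e. a sequence of code points

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "note.txt")
    # Encode to UTF-8 bytes only when writing to disk
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    size_on_disk = os.path.getsize(path)
    # Decode back to str when reading
    with open(path, encoding="utf-8") as f:
        restored = f.read()

print(len(text), size_on_disk)  # 11 15  (2 Chinese chars x 3 bytes + 9 ASCII bytes)
assert restored == text
```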
