What is the difference between UTF-8, UTF-16, and UTF-32 character encoding? [graphic explanation]
2022-07-28 15:11:00 【Zi Yan Ruoshui】
Link to the English version of this article: https://javarevisited.blogspot.com/2015/02/difference-between-utf-8-utf-16-and-utf.html#axzz7aIy1al00
The main difference between the UTF-8, UTF-16, and UTF-32 character encodings is how many bytes each needs to represent a character in memory. UTF-8 uses at least 1 byte, while UTF-16 uses at least 2 bytes. If a character's code point is greater than 127 (the maximum value of a Java byte and the top of the ASCII range), UTF-8 needs 2, 3, or 4 bytes, whereas UTF-16 needs either 2 or 4 bytes. UTF-32, on the other hand, is a fixed-width encoding scheme that always uses 4 bytes to encode a Unicode code point. But first, what is character encoding, and why does it matter? Character encoding is the key concept in converting a byte stream into characters that can be displayed.
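As a rough illustration of why the encoding matters when bytes are turned back into characters, the sketch below decodes the same two bytes with two different charsets. The class name DecodeDemo and the sample bytes are assumptions chosen for illustration; they do not come from the original article.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // Decodes raw bytes into a String using the given charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Illustrative input: the UTF-8 encoding of U+00E9 (e with acute accent).
        byte[] bytes = {(byte) 0xC3, (byte) 0xA9};

        // Decoded as UTF-8, the two bytes form a single character.
        String asUtf8 = decode(bytes, StandardCharsets.UTF_8);
        // Decoded as ISO-8859-1, each byte becomes its own character.
        String asLatin1 = decode(bytes, StandardCharsets.ISO_8859_1);

        System.out.println(asUtf8.length());   // 1
        System.out.println(asLatin1.length()); // 2
    }
}
```

The bytes are identical in both calls; only the charset passed to the decoder changes, and with it the text that would be displayed.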
Two things matter when converting bytes to characters: the character set and the encoding. Because the world has so many characters and symbols, we need a character set that covers all of them. A character set is simply a list of characters in which each symbol or character is mapped to a numeric value, also known as a code point.
UTF-16, UTF-32, and UTF-8, on the other hand, are encoding schemes: they describe how these values (code points) are mapped to bytes, each using a different code unit size (16 bits for UTF-16, 32 bits for UTF-32, and 8 bits for UTF-8). UTF stands for Unicode Transformation Format; each format defines an algorithm that maps every Unicode code point to a unique sequence of bytes.
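To make the idea of "an algorithm that maps a code point to bytes" more concrete, here is a minimal hand-written sketch of the UTF-8 mapping. The class Utf8Sketch and the method encodeUtf8 are hypothetical names introduced for illustration; real code would normally rely on the JDK's built-in charset support, which main uses only as a cross-check.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8Sketch {
    // Hand-rolled UTF-8 encoder for a single code point (illustration only).
    static byte[] encodeUtf8(int cp) {
        if (cp < 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw new IllegalArgumentException("Not a Unicode scalar value: " + cp);
        }
        if (cp < 0x80) {            // 1 byte: 0xxxxxxx
            return new byte[] {(byte) cp};
        }
        if (cp < 0x800) {           // 2 bytes: 110xxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xC0 | (cp >> 6)),
                (byte) (0x80 | (cp & 0x3F))};
        }
        if (cp < 0x10000) {         // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xE0 | (cp >> 12)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F))};
        }
        return new byte[] {         // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            (byte) (0xF0 | (cp >> 18)),
            (byte) (0x80 | ((cp >> 12) & 0x3F)),
            (byte) (0x80 | ((cp >> 6) & 0x3F)),
            (byte) (0x80 | (cp & 0x3F))};
    }

    public static void main(String[] args) {
        // Sample code points picked for illustration, one per UTF-8 length.
        int[] samples = {0x41, 0xE9, 0x20AC, 0x1F600};
        for (int cp : samples) {
            byte[] expected = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
            // Compare the sketch with the JDK's own encoder.
            System.out.println(Arrays.equals(encodeUtf8(cp), expected)); // true
        }
    }
}
```

The leading bits of the first byte announce how many bytes follow, and every continuation byte starts with the bits 10, which is what lets a decoder recover the code point unambiguously.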
For example, for the character A (Latin capital letter A), the Unicode code point is U+0041, the UTF-8 encoded byte is 41, the UTF-16 encoding is 0041, and the Java character literal is '\u0041'. In short, you need a character encoding scheme to interpret a stream of bytes; without one, you cannot display them correctly. The Java programming language has broad support for different character sets and character encodings. Its default charset is UTF-8 as of JDK 18; earlier releases use the platform's default encoding.
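The values quoted for the letter A can be reproduced with a short sketch. The class EncodeLetterA and the helper hex are hypothetical names introduced here. The big-endian charset UTF_16BE is used on the assumption that the article's 0041 refers to the code unit without a byte order mark, which the plain UTF_16 charset would prepend when encoding.

```java
import java.nio.charset.StandardCharsets;

public class EncodeLetterA {
    // Formats a byte array as space-separated uppercase hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        char c = '\u0041';                 // Java character literal for U+0041
        String a = String.valueOf(c);

        System.out.println(c == 'A');                                   // true
        System.out.println(hex(a.getBytes(StandardCharsets.UTF_8)));    // 41
        System.out.println(hex(a.getBytes(StandardCharsets.UTF_16BE))); // 00 41
    }
}
```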
Differences between UTF-32, UTF-16, and UTF-8 encoding
As I said before, UTF-8, UTF-16, and UTF-32 are just different ways of storing Unicode code points (those U+ magic numbers) in computer memory, using 8-, 16-, and 32-bit units. Once a Unicode character has been converted to bytes, it can easily be saved to disk, transmitted over the network, and recreated at the other end.
1. UTF-8 uses at least one byte to encode a character, while UTF-16 uses at least two bytes.
In UTF-8, every code point from 0 to 127 is stored in a single byte. Code points 128 and above are stored using 2, 3, or at most 4 bytes. In short, UTF-8 is a variable-length encoding that takes 1 to 4 bytes depending on the code point. UTF-16 is also a variable-length encoding, but it takes either 2 or 4 bytes. UTF-32, on the other hand, is fixed at 4 bytes.
2. UTF-8 is compatible with ASCII, whereas UTF-16 is not.
The advantage of UTF-8 is that ASCII characters are the most commonly used, and in that case most characters need only one byte. A UTF-8 file that contains only ASCII characters has the same encoding as an ASCII file, which means English text looks exactly the same in UTF-8 as it does in ASCII. Given ASCII's dominance in the past, this was the main reason for the initial acceptance of Unicode and UTF-8.
[Figure: how different characters are mapped to bytes under the UTF-16, UTF-8, and UTF-32 encoding schemes, with each scheme using a different number of bytes to represent the same characters.]
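The byte counts and the ASCII compatibility described above can be checked with a short sketch. The class EncodingWidths, its helper methods, and the four sample code points are assumptions chosen for illustration. The sketch also assumes that the JDK in use provides a UTF-32BE charset; common JDKs do, but it is not one of the charsets guaranteed by StandardCharsets.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodingWidths {
    // Number of bytes needed to encode one code point in the given charset.
    static int byteCount(int codePoint, Charset charset) {
        return new String(Character.toChars(codePoint)).getBytes(charset).length;
    }

    // True if the text encodes to exactly the same bytes as it does in US-ASCII.
    static boolean sameBytesAsAscii(String text, Charset charset) {
        return Arrays.equals(text.getBytes(StandardCharsets.US_ASCII), text.getBytes(charset));
    }

    public static void main(String[] args) {
        // Assumption: this JDK ships a UTF-32BE charset (not guaranteed by the spec).
        Charset utf32 = Charset.forName("UTF-32BE");

        // Sample code points: A, e with acute accent, euro sign, an emoji.
        int[] samples = {0x41, 0xE9, 0x20AC, 0x1F600};
        for (int cp : samples) {
            System.out.printf("U+%04X  UTF-8: %d  UTF-16: %d  UTF-32: %d%n",
                cp,
                byteCount(cp, StandardCharsets.UTF_8),
                byteCount(cp, StandardCharsets.UTF_16BE),
                byteCount(cp, utf32));
        }

        // ASCII-only text: identical bytes in UTF-8, different bytes in UTF-16.
        System.out.println(sameBytesAsAscii("Hello", StandardCharsets.UTF_8));    // true
        System.out.println(sameBytesAsAscii("Hello", StandardCharsets.UTF_16BE)); // false
    }
}
```

For these samples, UTF-8 takes 1, 2, 3, and 4 bytes respectively, UTF-16 takes 2, 2, 2, and 4 bytes, and UTF-32 takes 4 bytes every time. The big-endian variants are used so that no byte order mark is counted.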
