当前位置:网站首页>What is the matter that programmers often say "the left hand is knuckled and the right hand is hot"?
What is the matter that programmers often say "the left hand is knuckled and the right hand is hot"?
2022-08-03 01:32:00 【InfoQ】
Actually Kunjinpao/Tengtangtang/Tuntuntun/Nuo Nuo Nuo are all problems between character set conversions, and some special-purpose bytes are represented as these special characters in another character set encoding.
Next, let's take a look at their origins one by one.
Kunjinkao
originated from the conversion between GBK character set and Unicode character set.As we all know, Unicode is one of the most common encoding methods. When it is converted with other old encoding systems, there must be some words that Unicode cannot represent (because Unicode is not fully compatible with all encoding systems when it is designed, and there is no necessary), so Unicode officially uses a placeholder to represent these unrepresentable characters, namely: U+FFFD (REPLACEMENT CHARATER), so we examine the UTF-8 encoding representation of U+FFFD, which is "0xef0xbf0xbd", if Repeat several times, such as "0xef0xbf0xbd0xef0xbf0xbd", and then decode it in the GBK/CP936/GB2312/GB18030 character set, a Chinese character with two bytes, the final result is Kunjin copy - Kun (0xEFBF), catties (0xBDEF), copy (0xBFBD).
#!/usr/bin/env python
# -*- coding=utf8 -*-
"""
# File Name : kunjinkao.py
# Author: SangYu
# Email: [email protected]
# Created Time: Tuesday, August 02, 2022 21:53:16
# Description:
"""
if __name__ == "__main__":
for i in u'\uFFFD'.encode('utf-8'):
print("%x" % ord(i)),
print ""
print((u'\uFFFD'.encode('utf-8')*2).decode('gbk'))
for i in u'Kunjinkao'.encode('gbk'):
print("%x" % ord(i)),
passExecution result:

TangTangTangTunTunTun
In the Windows platform, in order to debug the memory problem, the VC compiler of MicroSoft will use 0xcc for the uninitialized stack memory when the user turns on the Debug mode.Filling, all uninitialized heap memory is filled with 0xcd. For the same reason, we look at this worth GBK code as follows:
[hex(ord(i)) for i in u''.encode('gbk')]
[hex(ord(i)) for i in u'tuntuntun'.encode('gbk')]Test result:div>


锘锘锘
UTF encoding scheme has a standard mark for identifying encoding, such as FF FE in UTF-16, when weWhen writing a web page, use Notepad to edit. At this time, GBK encoding is used, then a Unicode reserved character U+FEFF will be added after conversion. Similarly, if this character header is represented as UTF-8 encoding, it is "0xEF0xBB0xBF", when converted to GBK encoding, it will be the following characters:
- 锘EFBB
- Kuan BFEF
- 豢BBBF
[hex(ord(i)) for i in u'锘kuangzhu'.encode('gbk')]Test result:

边栏推荐
猜你喜欢
随机推荐
Test | ali internship 90 days in life: from the perspective of interns, talk about personal growth
创建型模式 - 单例模式Singleton
语音合成模型小抄(1)
today‘s task
IDEA 重复代码的黄色波浪线取消设置
Matplotlib drawing core principles explain (more detailed)
MYSQL查看表结构
Find My技术|智能防丢还得看苹果Find My技术
Kubernetes 进阶训练营 网络
No code development platform data ID introductory tutorial
创建型模式 - 抽象工厂模式AbstractFactory
Mock工具之Moco使用教程
程序员如何优雅地解决线上问题?
执子手,到永恒
qt静态编译出现Project ERROR: Library ‘odbc‘ is not defined
2022中国眼博会,山东眼健康展,视力矫正仪器展,护眼产品展
1 - vector R language self-study
【斯坦福计网CS144项目】Lab5: NetworkInterface
图像识别从零写出dnf脚本关键要点
思源笔记 本地存储无使用第三方同步盘,突然打不开文件。









