What is the story behind the programmer saying "锟斤拷 in the left hand, 烫烫烫 in the right"?
2022-08-03 01:32:00 【InfoQ】
In fact, 锟斤拷, 烫烫烫, 屯屯屯 and 锘锘锘 are all character set conversion problems: bytes that serve a special purpose in one encoding show up as these characters when they are interpreted in another character set encoding.
Next, let's take a look at their origins one by one.
锟斤拷
锟斤拷 originates from conversion between the GBK character set and the Unicode character set. As we all know, Unicode is one of the most common encoding methods. When text in older encoding systems is converted to Unicode, some characters inevitably cannot be represented (Unicode was not designed to be fully compatible with every encoding system, and there is no need for it to be), so Unicode officially uses a placeholder for these unrepresentable characters: U+FFFD (REPLACEMENT CHARACTER). The UTF-8 encoding of U+FFFD is "0xef 0xbf 0xbd". If it is repeated, for example "0xef 0xbf 0xbd 0xef 0xbf 0xbd", and then decoded in the GBK/CP936/GB2312/GB18030 character set, where one Chinese character takes two bytes, the final result is 锟斤拷: 锟 (0xEFBF), 斤 (0xBDEF), 拷 (0xBFBD).
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
# File Name : kunjinkao.py
# Author: SangYu
# Email: [email protected]
# Created Time: Tuesday, August 02, 2022 21:53:16
# Description:
"""
if __name__ == "__main__":
    # UTF-8 bytes of U+FFFD (REPLACEMENT CHARACTER)
    replacement = '\uFFFD'.encode('utf-8')
    print(' '.join('%x' % b for b in replacement))
    # Two copies of those bytes, decoded as GBK
    print((replacement * 2).decode('gbk'))
    # GBK bytes of 锟斤拷, for comparison
    print(' '.join('%x' % b for b in '锟斤拷'.encode('gbk')))
Execution result:
ef bf bd
锟斤拷
ef bf bd ef bf bd
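The script above starts from U+FFFD directly. As a minimal sketch of how those replacement characters might get there in the first place, the following chains the three steps: decode bytes that are not valid UTF-8 (substituting U+FFFD), save the text as UTF-8, then read it back as GBK. The function name `garble` and the sample input `b"\xff\xfe"` are illustrative assumptions, not part of the original article.

```python
def garble(raw: bytes) -> str:
    """Hypothetical helper: reproduce the 锟斤拷 round trip for undecodable bytes."""
    # Step 1: decode as UTF-8; each undecodable byte becomes U+FFFD.
    text = raw.decode("utf-8", errors="replace")
    # Step 2: store the text as UTF-8, so each U+FFFD becomes EF BF BD.
    utf8_bytes = text.encode("utf-8")
    # Step 3: read those bytes as GBK, two bytes per Chinese character.
    return utf8_bytes.decode("gbk", errors="replace")


if __name__ == "__main__":
    # 0xFF and 0xFE can never appear in valid UTF-8, so each one is replaced.
    print(garble(b"\xff\xfe"))
```

Two replacement characters give six bytes, which GBK reads as exactly three characters. An odd number of replacement characters leaves a dangling byte, which is why the sketch passes `errors="replace"` to the final decode as well.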

烫烫烫 and 屯屯屯
On the Windows platform, to help debug memory problems, Microsoft's VC compiler fills uninitialized stack memory with 0xcc and uninitialized heap memory with 0xcd when the user turns on Debug mode. Read as GBK, two bytes per character, 0xcccc is 烫 and 0xcdcd is 屯. Following the same reasoning as before, we can look at the GBK codes of these characters:
[hex(b) for b in '烫烫烫'.encode('gbk')]
[hex(b) for b in '屯屯屯'.encode('gbk')]
Test result:
['0xcc', '0xcc', '0xcc', '0xcc', '0xcc', '0xcc']
['0xcd', '0xcd', '0xcd', '0xcd', '0xcd', '0xcd']
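The reverse direction can also be sketched: build a buffer filled with the debug fill byte and view it as GBK text, which is roughly what happens when uninitialized memory is printed as a string. This is only a simulation in Python, not actual MSVC behavior; the names `STACK_FILL`, `HEAP_FILL` and `view_as_gbk` are hypothetical.

```python
STACK_FILL = 0xCC  # fill byte for uninitialized stack memory, per the text
HEAP_FILL = 0xCD   # fill byte for uninitialized heap memory, per the text


def view_as_gbk(fill_byte: int, size: int) -> str:
    """Hypothetical helper: decode `size` copies of `fill_byte` as GBK text."""
    buffer = bytes([fill_byte]) * size
    # errors="replace" keeps an odd-sized buffer from raising on the last byte.
    return buffer.decode("gbk", errors="replace")


if __name__ == "__main__":
    print(view_as_gbk(STACK_FILL, 6))
    print(view_as_gbk(HEAP_FILL, 6))
```

With a six-byte buffer, each pair of fill bytes becomes one character, so the two calls yield three 烫 and three 屯 respectively.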


锘锘锘
UTF encoding schemes have a standard mark for identifying the encoding, such as FF FE in UTF-16. When we write a web page and edit it in Notepad, the text is in GBK at that point, and a reserved Unicode character, U+FEFF, is added after conversion. Similarly, this header character encoded as UTF-8 is "0xEF 0xBB 0xBF". When those bytes are repeated and read as GBK, two bytes per character, they become the following characters:
- 锘 EFBB
- 匡 BFEF
- 豢 BBBF
[hex(b) for b in '锘匡豢'.encode('gbk')]
Test result:
['0xef', '0xbb', '0xbf', '0xef', '0xbb', '0xbf']
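As a rough sketch of the same effect from the file's point of view, the following uses Python's standard `codecs.BOM_UTF8` constant and the `utf-8-sig` codec to show the header bytes, what they look like when read as GBK, and one way a reader can drop them. The helper names `bom_as_gbk` and `read_without_bom` are assumptions made for illustration.

```python
import codecs


def bom_as_gbk(repeat: int = 2) -> str:
    """Hypothetical helper: view repeated UTF-8 BOM bytes as GBK text."""
    # codecs.BOM_UTF8 is b'\xef\xbb\xbf', the UTF-8 encoding of U+FEFF.
    return (codecs.BOM_UTF8 * repeat).decode("gbk", errors="replace")


def read_without_bom(data: bytes) -> str:
    """Hypothetical helper: decode UTF-8 data, dropping a leading BOM if present."""
    return data.decode("utf-8-sig")


if __name__ == "__main__":
    print(codecs.BOM_UTF8 == "\ufeff".encode("utf-8"))
    print(bom_as_gbk())
    # 'utf-8-sig' writes the BOM when encoding and strips it when decoding.
    print(read_without_bom("hello".encode("utf-8-sig")))
```

Decoding with `utf-8-sig` instead of plain `utf-8` is one way to keep the header character out of the resulting text; whether that fits depends on how the file was produced.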
