当前位置:网站首页>What is the matter that programmers often say "the left hand is knuckled and the right hand is hot"?
What is the matter that programmers often say "the left hand is knuckled and the right hand is hot"?
2022-08-03 01:32:00 【InfoQ】
Actually Kunjinpao/Tengtangtang/Tuntuntun/Nuo Nuo Nuo are all problems between character set conversions, and some special-purpose bytes are represented as these special characters in another character set encoding.
Next, let's take a look at their origins one by one.
Kunjinkao
originated from the conversion between GBK character set and Unicode character set.As we all know, Unicode is one of the most common encoding methods. When it is converted with other old encoding systems, there must be some words that Unicode cannot represent (because Unicode is not fully compatible with all encoding systems when it is designed, and there is no necessary), so Unicode officially uses a placeholder to represent these unrepresentable characters, namely: U+FFFD (REPLACEMENT CHARATER), so we examine the UTF-8 encoding representation of U+FFFD, which is "0xef0xbf0xbd", if Repeat several times, such as "0xef0xbf0xbd0xef0xbf0xbd", and then decode it in the GBK/CP936/GB2312/GB18030 character set, a Chinese character with two bytes, the final result is Kunjin copy - Kun (0xEFBF), catties (0xBDEF), copy (0xBFBD).
#!/usr/bin/env python
# -*- coding=utf8 -*-
"""
# File Name : kunjinkao.py
# Author: SangYu
# Email: [email protected]
# Created Time: Tuesday, August 02, 2022 21:53:16
# Description:
"""
if __name__ == "__main__":
for i in u'\uFFFD'.encode('utf-8'):
print("%x" % ord(i)),
print ""
print((u'\uFFFD'.encode('utf-8')*2).decode('gbk'))
for i in u'Kunjinkao'.encode('gbk'):
print("%x" % ord(i)),
pass
Execution result:
TangTangTangTunTunTun
In the Windows platform, in order to debug the memory problem, the VC compiler of MicroSoft will use 0xcc for the uninitialized stack memory when the user turns on the Debug mode.Filling, all uninitialized heap memory is filled with 0xcd. For the same reason, we look at this worth GBK code as follows:
[hex(ord(i)) for i in u''.encode('gbk')]
[hex(ord(i)) for i in u'tuntuntun'.encode('gbk')]
Test result:div>
锘锘锘
UTF encoding scheme has a standard mark for identifying encoding, such as FF FE in UTF-16, when weWhen writing a web page, use Notepad to edit. At this time, GBK encoding is used, then a Unicode reserved character U+FEFF will be added after conversion. Similarly, if this character header is represented as UTF-8 encoding, it is "0xEF0xBB0xBF", when converted to GBK encoding, it will be the following characters:
- 锘EFBB
- Kuan BFEF
- 豢BBBF
[hex(ord(i)) for i in u'锘kuangzhu'.encode('gbk')]
Test result:
边栏推荐
- 程序员如何优雅地解决线上问题?
- Task 4 Machine Learning Library Scikit-learn
- Directing a non-relational database introduction and deployment
- 厌倦了安装数据库?改用 Docker
- mysql 错误:The driver has not received any packets from the server.
- 数据库主键一定要自增吗?有哪些场景不建议自增?
- Week 7 - Distributional Representations
- vscode 自定义快捷键——设置eslint
- 停止使用 Storyboards 和 Interface Builder
- 分库分表索引设计:二级索引、全局索引的最佳设计实践
猜你喜欢
随机推荐
FastCorrect:语音识别快速纠错模型丨RTC Dev Meetup
令人心动的AI综述(1)
TCP三次握手与四次挥手
WAF WebShell Trojan free to kill
合并两个excel表格工具
# DWD层及DIM层构建## ,220801 ,
VS保存后Unity不刷新
聚乙二醇衍生物4-Arm PEG-DSPE,四臂-聚乙二醇-磷脂
Unity WallFxPack使用
threejs 动态调整相机位置,使相机正好能看到对象
程序员的七夕浪漫时刻
严格反馈非线性系统基于事件触发的自抗扰预设有限时间跟踪控制
如何通过开源数据库管理工具 DBeaver 连接 TDengine
目前为止 DAO靠什么盈利?
典型相关分析CCA计算过程
VMware workstation program starts slowly
1 - vector R language self-study
最近公共祖先(LCA)学习笔记 | P3379 【模板】最近公共祖先(LCA)题解
airflow db init 报错
MySQL 与InnoDB 下的锁做朋友 (四)行锁/记录锁