What is the encoding that starts with &#x?
2022-07-31 09:32:00 【Flying fat meow】
When I used cheerio in Node to parse a web page, all the Chinese content in the output came out as what looked like garbled text starting with "&#x". I tried every encoding I could think of, to no avail. The strange part is that if I saved this "garbled text" as a web page and opened it in a browser, it displayed normally. What exactly is it?
A reduced sample is as follows:
    const cheerio = require('cheerio');
    const $ = cheerio.load('<div id="content">你好</div>');
    console.log($('#content').html()); // &#x4F60;&#x597D;
In fact, this pile of "garbled text" has a proper name: entity encoding.
The following is quoted from an answer found on Zhihu.
In HTML, certain characters are reserved, such as the less-than sign "<" and the greater-than sign ">", and browsers treat them as part of a tag. To display these reserved characters in HTML, we need to use character entities. The most familiar ones are the space "&nbsp;", the less-than sign "&lt;", and the greater-than sign "&gt;". This named format is readable and easy to remember, but character entities can also be written in other formats:
&name;  &#dddd;  &#xhhhh;
- All three escaping methods are called character references. The first is a character entity reference: the "&" symbol is followed by a predefined entity name.
- The latter two are numeric character references, whose numeric value is the Unicode code point of the target character. The form starting with "&#" is followed by a decimal number, and the form starting with "&#x" is followed by a hexadecimal number.
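The three forms listed above can be told apart by their shape alone. The following is a minimal sketch; `classifyReference` is a hypothetical helper introduced here for illustration, and it only inspects the syntax of a reference, not whether an entity name is actually predefined in HTML.

```javascript
// Hypothetical helper: classify one character reference by its syntactic form.
// It does not check that a named entity really exists in the HTML entity list.
function classifyReference(ref) {
  if (/^&#[xX][0-9a-fA-F]+;$/.test(ref)) {
    return 'hexadecimal numeric character reference';
  }
  if (/^&#[0-9]+;$/.test(ref)) {
    return 'decimal numeric character reference';
  }
  if (/^&[A-Za-z][A-Za-z0-9]*;$/.test(ref)) {
    return 'character entity reference';
  }
  return 'not a character reference';
}

console.log(classifyReference('&lt;'));
console.log(classifyReference('&#20320;'));
console.log(classifyReference('&#x4F60;'));
```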
Starting with HTML4, numeric character references always refer to Unicode, regardless of the document's encoding. The two characters "你好" are the Unicode characters U+4F60 and U+597D; their code points are "4F60" and "597D" in hexadecimal, which are "20320" and "22909" in decimal. So typing either of the following in HTML
&#x4F60;&#x597D;
&#20320;&#22909;
will display "你好".
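The relationship between a character, its code point, and the two numeric forms can be checked directly in Node. This is an illustrative sketch only; `toNumericReferences` is a hypothetical name, and leaving ASCII characters unescaped is an assumption of this example rather than a description of what cheerio does internally.

```javascript
// Hypothetical helper: rewrite every non-ASCII character as a numeric
// character reference, in hexadecimal (default) or decimal form.
function toNumericReferences(text, hex = true) {
  return Array.from(text, (ch) => {
    const codePoint = ch.codePointAt(0);
    if (codePoint < 128) {
      return ch; // assumption: ASCII is left as-is
    }
    return hex
      ? '&#x' + codePoint.toString(16).toUpperCase() + ';'
      : '&#' + codePoint.toString(10) + ';';
  }).join('');
}

console.log(toNumericReferences('你好'));        // &#x4F60;&#x597D;
console.log(toNumericReferences('你好', false)); // &#20320;&#22909;
```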
Now that we know the cause, how do we solve the problem?
Method 1: Use the option provided by cheerio
By default cheerio processes entities (decodeEntities is on), so we just need to turn this feature off:
    const cheerio = require('cheerio');
    const $ = cheerio.load('<div id="content">你好</div>', { decodeEntities: false });
    console.log($('#content').html()); // 你好
Method 2: Decode manually

    function decode(str) {
      // Optionally convert literal \uXXXX sequences to characters first
      // (only needed when the returned data contains many "\u" escapes).
      str = unescape(str.replace(/\\u/g, "%u"));
      // Then decode numeric character references.
      // $1 captures the optional "x" (present means hexadecimal);
      // $2 captures the digits, which are parsed in base 16 or base 10 accordingly.
      str = str.replace(/&#(x)?(\w+);/g, function ($, $1, $2) {
        return String.fromCharCode(parseInt($2, $1 ? 16 : 10));
      });
      return str;
    }
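Manual decoding can be tried out without cheerio. The sketch below is a variant, not the original author's function: `decodeNumericReferences` is a hypothetical name, it assumes `String.fromCodePoint` so that characters above U+FFFF also decode, and it leaves any reference outside the Unicode range untouched.

```javascript
// Hypothetical variant: decode decimal and hexadecimal numeric character
// references. Named entities such as &lt; are not handled here.
function decodeNumericReferences(str) {
  return str.replace(/&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/g, (match, hex, dec) => {
    const codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // Keep the original text if the value is not a valid Unicode code point.
    return codePoint <= 0x10FFFF ? String.fromCodePoint(codePoint) : match;
  });
}

console.log(decodeNumericReferences('&#x4F60;&#x597D;')); // 你好
console.log(decodeNumericReferences('&#20320;&#22909;')); // 你好
```

Matching hexadecimal and decimal digits separately, instead of using `\w+`, means that text such as `&#xyz;` is simply not matched and passes through unchanged.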
Reprinted from: What code starts with &#x? - Cannon~ - Blog Park









