What is the encoding that starts with &#x?
2022-07-31 09:32:00 【Flying fat meow】
What encoding starts with &#x?
When using cheerio at the Node layer to parse a web page, all Chinese content in the output turned into a pile of garbled-looking text starting with &#x. I tried all kinds of encodings to no avail. The strange part is that after saving this "garbled" text as a web page, a browser displays it correctly. What exactly is it?
The reduced sample code is as follows:
const cheerio = require('cheerio');
const $ = cheerio.load('<div id="content">你好</div>')
console.log($('#content').html()) // &#x4F60;&#x597D;
In fact, the proper name for this pile of "garbled" text is entity encoding.
The following is quoted from an answer found on Zhihu.
In HTML, certain characters are reserved, such as the less-than sign "<" and the greater-than sign ">", and browsers treat them as part of tags. To display these reserved characters in HTML, we need to use character entities. The ones we know best are the space "&nbsp;", the less-than sign "&lt;", the greater-than sign "&gt;" and so on. This format is semantic and easy to remember, but character entities can in fact be written in other formats as well:
&name; &#dddd; &#xhhhh;
- All three escaping methods are called character references. The first is a character entity reference: the "&" symbol is followed by a predefined entity name.
- The latter two are numeric character references, where the number is the Unicode code point of the target character. A reference starting with "&#" is followed by a decimal number, and one starting with "&#x" is followed by a hexadecimal number.
Since HTML 4, numeric character references refer to Unicode code points, regardless of the document's encoding. The two characters "你好" are the Unicode characters U+4F60 and U+597D; their code point values are "4F60" and "597D" in hexadecimal, or "20320" and "22909" in decimal. So typing either of the following in HTML
&#x4F60;&#x597D;
&#20320;&#22909;
will display "你好".
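To make the link between code points and numeric character references concrete, here is a minimal sketch that builds both forms from a string. The helper names `toHexReferences` and `toDecimalReferences` are made up for illustration and are not part of cheerio or any other library.

```javascript
// Hypothetical helpers: turn every character of a string into a numeric
// character reference. Array.from iterates by code point, so characters
// outside the Basic Multilingual Plane are handled as a single unit.
function toHexReferences(text) {
  return Array.from(text)
    .map((ch) => '&#x' + ch.codePointAt(0).toString(16).toUpperCase() + ';')
    .join('');
}

function toDecimalReferences(text) {
  return Array.from(text)
    .map((ch) => '&#' + ch.codePointAt(0) + ';')
    .join('');
}

console.log(toHexReferences('你好'));     // &#x4F60;&#x597D;
console.log(toDecimalReferences('你好')); // &#20320;&#22909;
```

Both outputs describe the same two code points, only written in base 16 and base 10.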
Now that we know the cause, how do we solve the problem?
Method 1: Use the option provided by cheerio
cheerio decodes entities by default; we just need to turn this feature off.
const cheerio = require('cheerio');
const $ = cheerio.load('<div id="content">你好</div>', { decodeEntities: false })
console.log($('#content').html()) // 你好
Method 2: Decode manually

function decode(str) {
  // Optional first step: convert literal \uXXXX escapes into real characters
  // (only needed when the returned data contains many "\u" sequences).
  str = unescape(str.replace(/\\u/g, "%u"));
  // Then decode the numeric character references.
  // $1 captures the optional "x" (present for hexadecimal references);
  // $2 captures the number, which is parsed in base 16 or base 10 accordingly.
  str = str.replace(/&#(x)?(\w+);/g, function($, $1, $2) {
    return String.fromCharCode(parseInt($2, $1 ? 16 : 10));
  });
  return str;
}
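One caveat with the manual approach: `String.fromCharCode` only handles values up to 0xFFFF, so references to characters outside the Basic Multilingual Plane (emoji, for example) come out wrong. The following is a sketch of a stricter variant, not the original author's code. The function name `decodeNumericReferences` is hypothetical, and it deliberately leaves named entities such as `&lt;` untouched.

```javascript
// Hypothetical helper: decode &#dddd; and &#xhhhh; references only.
function decodeNumericReferences(str) {
  // Group 1 matches hexadecimal digits after "&#x"; group 2 matches decimal digits after "&#".
  return str.replace(/&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/g, (match, hex, dec) => {
    const codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // Values above U+10FFFF are not Unicode code points; keep the original text.
    if (codePoint > 0x10ffff) return match;
    // fromCodePoint, unlike fromCharCode, supports the full Unicode range.
    return String.fromCodePoint(codePoint);
  });
}

console.log(decodeNumericReferences('&#x4F60;&#x597D;')); // 你好
console.log(decodeNumericReferences('&#20320;&#22909;')); // 你好
console.log(decodeNumericReferences('&#x1F600; &lt;'));   // 😀 &lt;
```

Restricting the pattern to digits also avoids feeding non-numeric text to `parseInt`, which the looser `\w+` pattern allows.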
Reprinted from: What encoding starts with &#x? - Cannon~ - cnblogs (Blog Park)








