What is the encoding that starts with &#x?
2022-07-31 09:32:00 【Flying fat meow】
When using cheerio in a Node layer to parse a web page, all the Chinese content in the output came out as what looked like garbled text starting with &#x. I tried all kinds of encodings to no avail, and the strange part is that after saving this "garbled text" as a web page, it displayed normally when opened in a browser. What exactly is this?
The reduced sample code is as follows:
const cheerio = require('cheerio');
const $ = cheerio.load('<div id="content">你好</div>');
console.log($('#content').html()); // &#x4F60;&#x597D;

In fact, this pile of "garbled" text has a proper name: entity encoding.
The following is quoted from an answer found on Zhihu.
In HTML, certain characters are reserved, such as the less-than sign "<" and the greater-than sign ">", and browsers treat them as part of tags. If we want to display these reserved characters in HTML, we need to use character entities. The ones we know best are the space "&nbsp;", the less-than sign "&lt;", the greater-than sign "&gt;" and so on. This format is semantic and easy to remember, but character entities can in fact be written in three formats:
&name; &#dddd; &#xhhhh;
- All three escaping methods are called character references. The first is a character entity reference: the "&" symbol is followed by a predefined entity name.
- The latter two are numeric character references, where the number is the Unicode code point of the target character; the form starting with "&#" is followed by a decimal number, and the form starting with "&#x" is followed by a hexadecimal number.
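The three forms can be told apart by shape alone. The following is a minimal sketch; `classifyReference` is a hypothetical helper introduced here for illustration, and it only checks the syntax of a reference, not whether an entity name is actually predefined.

```javascript
// Hypothetical helper: report which of the three forms a character reference uses.
// It only looks at the shape of the string; it does not check entity names
// against the list of predefined HTML entities.
function classifyReference(ref) {
  if (/^&#[xX][0-9a-fA-F]+;$/.test(ref)) return 'hexadecimal numeric reference';
  if (/^&#[0-9]+;$/.test(ref)) return 'decimal numeric reference';
  if (/^&[A-Za-z][A-Za-z0-9]*;$/.test(ref)) return 'character entity reference';
  return 'not a character reference';
}

// All three of these stand for the less-than sign "<" (U+003C, decimal 60).
console.log(classifyReference('&lt;'));   // character entity reference
console.log(classifyReference('&#60;'));  // decimal numeric reference
console.log(classifyReference('&#x3C;')); // hexadecimal numeric reference
```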
Since HTML 4, numeric character references refer to Unicode code points, regardless of the document's encoding. The two characters of "你好" are the Unicode characters U+4F60 and U+597D; their code point values are "4F60" and "597D" in hexadecimal, or "20320" and "22909" in decimal. So typing either of the following in HTML:
&#20320;&#22909;
&#x4F60;&#x597D;
will display as "你好".
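The relationship between a character and its numeric character references can be checked with a few lines of JavaScript. This is only a sketch; `toNumericReferences` and its `hex` option are hypothetical names introduced here, not part of cheerio or any other library.

```javascript
// Hypothetical helper: encode every character of a string as a numeric
// character reference, either hexadecimal (&#x...;) or decimal (&#...;).
function toNumericReferences(text, { hex = true } = {}) {
  // Array.from iterates by code point, so characters outside the
  // Basic Multilingual Plane are kept whole instead of split into surrogates.
  return Array.from(text, (ch) => {
    const codePoint = ch.codePointAt(0);
    return hex
      ? `&#x${codePoint.toString(16).toUpperCase()};`
      : `&#${codePoint};`;
  }).join('');
}

console.log(toNumericReferences('你好'));                 // &#x4F60;&#x597D;
console.log(toNumericReferences('你好', { hex: false })); // &#20320;&#22909;
```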
Now that we know the cause, how do we solve the problem?
Method 1: Use the option provided by cheerio
cheerio handles entities by default, which is what produces the numeric character references in the output; we just need to turn this feature off with the decodeEntities option:
const cheerio = require('cheerio');
const $ = cheerio.load('<div id="content">你好</div>', { decodeEntities: false });
console.log($('#content').html()); // 你好

Method 2: Decode manually

function decode(str) {
  // Optional first step: if the data also contains many \uXXXX escape sequences,
  // convert them to real characters. Note that unescape() is deprecated and
  // also decodes any %XX sequences already present in the string.
  str = unescape(str.replace(/\\u/g, "%u"));
  // Then decode the numeric character references.
  // $1 captures the optional "x" (present means hexadecimal);
  // $2 captures the digits, which are parsed in base 16 or base 10 accordingly.
  str = str.replace(/&#(x)?(\w+);/g, function($, $1, $2) {
    return String.fromCharCode(parseInt($2, $1 ? 16 : 10));
  });
  return str;
}
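A somewhat stricter variant of the manual decoder is sketched below. It is not the original author's code: `decodeNumericReferences` is a hypothetical name, and the sketch assumes that only numeric character references need decoding (named entities such as &lt; are left untouched). It accepts only valid digits, and uses String.fromCodePoint so that characters above U+FFFF also decode correctly.

```javascript
// Hypothetical helper: decode decimal (&#...;) and hexadecimal (&#x...;)
// numeric character references; everything else is returned unchanged.
function decodeNumericReferences(html) {
  const pattern = /&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/g;
  return html.replace(pattern, (match, hexDigits, decDigits) => {
    const codePoint = hexDigits !== undefined
      ? parseInt(hexDigits, 16)
      : parseInt(decDigits, 10);
    // Leave out-of-range values as they are instead of letting
    // String.fromCodePoint throw a RangeError.
    if (codePoint > 0x10FFFF) return match;
    return String.fromCodePoint(codePoint);
  });
}

console.log(decodeNumericReferences('&#x4F60;&#x597D;')); // 你好
console.log(decodeNumericReferences('&#20320;&#22909;')); // 你好
console.log(decodeNumericReferences('&lt;p&gt;'));        // &lt;p&gt;
```

Unlike String.fromCharCode, String.fromCodePoint accepts the full Unicode range, which matters for references such as &#x1F600; that lie outside the Basic Multilingual Plane.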
Reprinted from: What encoding starts with &#x? - Cannon~ - Blog Park