What is the encoding that starts with &#x?
2022-07-31 09:32:00 【Flying fat meow】
What encoding starts with &#x?
When using cheerio in a Node service to parse a web page, all of the Chinese content in the output came out as a string of seemingly garbled characters beginning with &#x. Trying various encodings did not help. Strangely, after saving this "garbled" text as a web page, it displayed normally when opened in a browser. So what exactly is it?
A reduced sample is as follows:
    const cheerio = require('cheerio');
    // The tag name is assumed here; only the #content id is known from the selector
    const $ = cheerio.load('<div id="content">你好</div>');
    console.log($('#content').html()); // &#x4F60;&#x597D;
This seemingly garbled text actually has a proper name: entity encoding.
The following is quoted from an answer found on Zhihu.
In HTML, certain characters are reserved, such as the less-than sign "<" and the greater-than sign ">", which browsers treat as tag delimiters. To display these reserved characters in HTML, we need to use character entities. The most familiar ones are the non-breaking space "&nbsp;", the less-than sign "&lt;", and the greater-than sign "&gt;". This named format is semantic and easy to remember, but character entities can also be written in other formats:
&name; &#dddd; &#xhhhh;
- All three escaping forms are called character references. The first is a character entity reference: the "&" symbol is followed by a predefined entity name.
- The latter two are numeric character references, where the number is the Unicode code point of the target character. The form starting with "&#" is followed by a decimal number, and the form starting with "&#x" is followed by a hexadecimal number.
Starting with HTML 4, numeric character references use Unicode regardless of the document encoding. The two characters of "你好" are the Unicode characters U+4F60 and U+597D; their code points are "4F60" and "597D" in hexadecimal, or "20320" and "22909" in decimal. So typing either of the following in HTML
&#x4F60;&#x597D;
&#20320;&#22909;
will display as "你好".
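The code-point arithmetic above can be checked with a short script. This is only a sketch: `toNumericReferences` is a hypothetical helper name introduced for illustration, not part of cheerio or any other library. It produces both the hexadecimal and the decimal reference for each character of a string.

```javascript
// Sketch: build numeric character references for every code point in a string.
// toNumericReferences is a hypothetical helper, not a library function.
function toNumericReferences(text) {
  const hex = [];
  const dec = [];
  // for...of iterates by code point, so characters outside the BMP stay intact
  for (const ch of text) {
    const codePoint = ch.codePointAt(0);
    hex.push('&#x' + codePoint.toString(16).toUpperCase() + ';');
    dec.push('&#' + codePoint.toString(10) + ';');
  }
  return { hex: hex.join(''), dec: dec.join('') };
}

const refs = toNumericReferences('你好');
console.log(refs.hex); // &#x4F60;&#x597D;
console.log(refs.dec); // &#20320;&#22909;
```

Both strings refer to the same two code points, which is why a browser renders them identically.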
Now that we know the cause, how do we solve the problem?
Method 1: Use the option provided by cheerio
cheerio handles entities by default (the decodeEntities option); we just need to turn this feature off.
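    const cheerio = require('cheerio');
    // The tag name is assumed here; only the #content id is known from the selector
    const $ = cheerio.load('<div id="content">你好</div>', { decodeEntities: false });
    console.log($('#content').html()); // 你好
Method 2: Decode manually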

    function decode(str) {
      // Optionally convert \uXXXX escapes first
      // (only needed when the returned data contains many literal \u sequences)
      str = unescape(str.replace(/\\u/g, "%u"));
      // Then decode the numeric character references.
      // $1 captures an optional "x", which means the number is hexadecimal;
      // $2 captures the digits, which are parsed in base 16 or base 10 accordingly.
      str = str.replace(/&#(x)?(\w+);/g, function($, $1, $2) {
        return String.fromCharCode(parseInt($2, $1 ? 16 : 10));
      });
      return str;
    }
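As a possible variation on the manual approach, the sketch below decodes only well-formed numeric references and uses `String.fromCodePoint`, so characters above U+FFFF (such as emoji) are handled as well. The name `decodeNumericReferences` is hypothetical, and named entities such as `&lt;` are deliberately left untouched; this is an illustration under those assumptions, not a full HTML entity decoder.

```javascript
// Sketch: decode hexadecimal (&#x...;) and decimal (&#...;) character references.
// decodeNumericReferences is a hypothetical name; named entities are not handled.
function decodeNumericReferences(str) {
  const pattern = /&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/g;
  return str.replace(pattern, (match, hex, dec) => {
    const codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // Leave the reference unchanged if it is outside the Unicode range
    if (codePoint > 0x10FFFF) return match;
    return String.fromCodePoint(codePoint);
  });
}

console.log(decodeNumericReferences('&#x4F60;&#x597D;')); // 你好
console.log(decodeNumericReferences('&#20320;&#22909;')); // 你好
```

Restricting the pattern to hex or decimal digits avoids passing non-numeric text to `parseInt`, which the looser `\w+` match would allow.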
Reprinted from: What encoding starts with &#x? - Cannon~ - 博客园 (cnblogs)