Crawlers: Solving Encoding Problems in GET/POST Requests
2022-06-26 08:30:00 【冯大少】
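Most HTML pages are encoded as UTF-8 or GBK (i.e. charset="utf-8" or charset="gbk"). When sending GET/POST requests you may run into an encoding problem in which a field's content is displayed as "unable to decode value". This post explains how to solve it.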
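Take a GET request on 唯品会 (Vipshop) as an example. The value shown for title is "unable to decode value" (Figure 1). Seeing the word "decode", you might suspect JavaScript encryption, but the GET request data shows that the content behind "unable to decode value" is simply the title value in the request: %u96C0%u5DE2NESTLE%u996E%u6599%u51B2%u8C03%u4E13%u573A_%u552F%u54C1%u4F1A (Figure 2). In other words, the real title has been converted to Unicode escapes: each Chinese character is written as %u followed by four hexadecimal digits, while ASCII characters such as NESTLE and _ are left unchanged. For a GET request, you can put this data directly into the request as a key-value pair.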
To decode "%u96C0%u5DE2NESTLE%u996E%u6599%u51B2%u8C03%u4E13%u573A_%u552F%u54C1%u4F1A", replace each % with \ so that every %uXXXX becomes a \uXXXX escape (Figure 3). The title then prints as "雀巢NESTLE饮料冲调专场_唯品会" (Figure 4).
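Replacing % with \ by hand only works when the value is pasted into source code as a string literal. For a value held in a variable at run time, the same conversion can be sketched with a regular expression. The function name decode_percent_u is hypothetical, and the sketch assumes every escape is exactly %u plus four hexadecimal digits, as in the title above.

```python
import re


def decode_percent_u(text: str) -> str:
    """Turn every %uXXXX sequence into the character with hex value XXXX.

    Hypothetical helper for illustration; characters that are not part
    of a %uXXXX sequence (e.g. "NESTLE" and "_") are left untouched.
    """
    return re.sub(
        r"%u([0-9A-Fa-f]{4})",
        lambda match: chr(int(match.group(1), 16)),
        text,
    )


raw = "%u96C0%u5DE2NESTLE%u996E%u6599%u51B2%u8C03%u4E13%u573A_%u552F%u54C1%u4F1A"
title = decode_percent_u(raw)
print(title)  # 雀巢NESTLE饮料冲调专场_唯品会
```

This sketch does not handle ordinary %XX escapes or characters outside the four-digit range; those cases would need extra handling.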
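For a POST request it is the other way round: the Chinese title has to be encoded before it is sent as POST data. First check the encoding of the HTML page, which is utf-8 here (Figure 5), then encode the content with the encode() method (Figure 6).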
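As a minimal sketch of the POST side, the snippet below encodes the title as UTF-8 bytes and builds a form-encoded request body with the standard library. Only the "ce", "vs" and "title" keys come from the request shown in Figure 6; treating the body as form-encoded is an assumption, since the post does not say how the site expects the data to be sent.

```python
from urllib.parse import parse_qs, urlencode

title = "雀巢NESTLE饮料冲调专场_唯品会"

# Subset of the key-value pairs from Figure 6; the title is encoded
# with the page's charset (utf-8) before it goes into the payload.
payload = {
    "ce": "1",
    "vs": "",
    "title": title.encode("utf-8"),
}

# urlencode percent-encodes the UTF-8 bytes (assumed form-encoded body).
body = urlencode(payload)
print(body)

# Round trip: decoding the body as UTF-8 recovers the original title.
recovered = parse_qs(body, keep_blank_values=True)["title"][0]
print(recovered == title)
```

If the page declared charset="gbk" instead, the same approach would use title.encode("gbk") and decode with encoding="gbk" on the way back.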
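With these decode and encode techniques, you should be able to handle similar encoding problems by analogy.
Other common anti-crawler encoding mechanisms include base64, md5, and font mapping; I will cover them when I have time.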
Figure 1
Figure 2

Figure 3
value = '\u96C0\u5DE2NESTLE\u996E\u6599\u51B2\u8C03\u4E13\u573A_\u552F\u54C1\u4F1A'
print(value)
Figure 4
Figure 5

Figure 6
(some key-value pairs omitted)
"bv":"mozilla/5.0 (windows nt 6.1; win64; x64) applewebkit/537.36 (khtml, like gecko) chrome/94.0.4606.71 safari/537.36",
"ce":"1",
"vs":"",
"title": "雀巢NESTLE饮料冲调专场_唯品会".encode('utf-8'),
"tab_page_id":"1633709750465_a2405b86-4b0c-37b8-8342-7a179d42d569",