Solving encoding problems a crawler encounters in GET/POST requests

2022-06-26 08:51:00 Feng Dashao

   Most HTML pages are encoded as either UTF-8 or GBK (i.e. charset="utf-8" or charset="gbk"). When a crawler sends GET/POST requests, encoding problems can appear, and a value is displayed as "unable to decode value". Here is how to solve this problem.
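Before decoding anything, it helps to know which charset the page declares. The following is a minimal sketch of one way to read it from the raw bytes; `detect_charset` and the sample page are hypothetical names made up for illustration, and the regular expression only handles the common `charset=...` forms.

```python
import re

def detect_charset(raw_html: bytes, default: str = "utf-8") -> str:
    """Return the charset declared in the HTML, or `default` if none is found."""
    match = re.search(rb'charset\s*=\s*["\']?\s*([\w-]+)', raw_html, re.IGNORECASE)
    return match.group(1).decode("ascii").lower() if match else default

# Hypothetical page bytes, encoded as GBK for the demonstration.
sample = '<meta charset="gbk"><title>唯品会</title>'.encode("gbk")
charset = detect_charset(sample)
text = sample.decode(charset)  # decode the bytes with the declared charset
print(charset, text)
```

Decoding with the declared charset, rather than assuming UTF-8, avoids garbled text on GBK pages.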

   Take a vipshop GET request as an example. The value shown after title is "unable to decode value" (Figure 1). On seeing a decode failure, you might suspect that JavaScript encryption is the cause. But the data of the GET request shows that the undecodable content is simply the value of title in the request: %u96C0%u5DE2NESTLE%u996E%u6599%u51B2%u8C03%u4E13%u573A_%u552F%u54C1%u4F1A (Figure 2). In other words, the real title has been converted to Unicode escapes: each %uXXXX stands for one UTF-16 code unit, the non-standard format produced by JavaScript's escape() function. For a GET request, this data can be written directly as a key-value pair.


   To decode "%u96C0%u5DE2NESTLE%u996E%u6599%u51B2%u8C03%u4E13%u573A_%u552F%u54C1%u4F1A", replace each % with \ so that the string becomes a sequence of \uXXXX escapes (Figure 3). The title turns out to be "雀巢NESTLE饮料冲调专场_唯品会" (Figure 4).
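When the value arrives at runtime as a string, the manual replacement can be automated. The sketch below is one possible approach, not the author's code: it swaps the replace-% trick for a regular expression so that ordinary %XX escapes in the same value are not damaged. `decode_percent_u` is a hypothetical helper name, and the sketch assumes every character lies in the Basic Multilingual Plane (no surrogate pairs).

```python
import re
from urllib.parse import unquote

def decode_percent_u(value: str) -> str:
    """Decode %uXXXX escapes first, then any ordinary %XX escapes."""
    # Turn each %uXXXX into the character with that code point.
    text = re.sub(
        r"%u([0-9A-Fa-f]{4})",
        lambda m: chr(int(m.group(1), 16)),
        value,
    )
    # Handle the remaining standard percent-escapes such as %20.
    return unquote(text)

encoded = "%u96C0%u5DE2NESTLE%u996E%u6599%u51B2%u8C03%u4E13%u573A_%u552F%u54C1%u4F1A"
print(decode_percent_u(encoded))
```

Decoding the %uXXXX escapes before calling `unquote` matters: a literal `%25u96C0` then correctly ends up as the text `%u96C0` instead of being decoded twice.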


   A POST request works the other way around: the Chinese title has to be encoded before it is sent as POST data. First, confirm that the HTML is encoded as utf-8 (Figure 5), then encode the content with the encode() method (Figure 6).
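As a rough illustration of the encode step, the sketch below builds a form-encoded POST body using only the standard library. `build_post_body` is a hypothetical helper, the two fields are taken from the example request, and the actual sending of the request is left out.

```python
from urllib.parse import parse_qs, urlencode

def build_post_body(fields: dict, charset: str = "utf-8") -> str:
    """URL-encode form fields, encoding str values with the page's charset first."""
    encoded = {
        key: value.encode(charset) if isinstance(value, str) else value
        for key, value in fields.items()
    }
    return urlencode(encoded)

body = build_post_body({"ce": "1", "title": "雀巢NESTLE饮料冲调专场_唯品会"})
print(body)                         # percent-encoded UTF-8 bytes
print(parse_qs(body)["title"][0])   # round-trips back to the original title
```

For a GBK page, passing `charset="gbk"` would produce the byte sequence that server expects; this is an assumption about the target site and should be checked against a real request captured in the browser.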

   With these decode and encode techniques, similar encoding problems that come up in the future can be solved by analogy.

   Other encoding mechanisms commonly used for anti-crawling include base64, md5, and font mapping. I will continue to share when I have time.


Figure 1
[Image: the title value displayed as "unable to decode value"]

Figure 2
[Image: the GET request data, with title given as %uXXXX escapes]

Figure 3

value = '\u96C0\u5DE2NESTLE\u996E\u6599\u51B2\u8C03\u4E13\u573A_\u552F\u54C1\u4F1A'
print(value)


Figure 4
[Image: the printed output showing the decoded title]


Figure 5
[Image: the HTML source showing that the encoding is utf-8]


Figure 6

Part of the key-value code is omitted

"bv":"mozilla/5.0 (windows nt 6.1; win64; x64) applewebkit/537.36 (khtml, like gecko) chrome/94.0.4606.71 safari/537.36",
"ce":"1",
"vs":"",
"title": "雀巢NESTLE饮料冲调专场_唯品会".encode('utf-8'),
"tab_page_id":"1633709750465_a2405b86-4b0c-37b8-8342-7a179d42d569",

Copyright notice
This article was created by [Feng Dashao]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/177/202206260830111753.html