Solving the encoding problems a crawler encounters in GET/POST requests
2022-06-26 08:51:00 【Feng Dashao】
Most HTML pages are encoded as UTF-8 or GBK (i.e. charset="utf-8" or charset="gbk"). When you inspect GET/POST requests, encoding problems can appear: a parameter value is displayed as "unable to decode value". Here is how to solve this problem.
Take a GET request to Vipshop as an example. The value of title is displayed as "unable to decode value" (Figure 1). On seeing a decoding failure, you might suspect that JS encryption is the cause. But the raw GET request data shows that the content behind "unable to decode value" is simply the value of the title parameter, %u96C0%u5DE2NESTLE%u996E%u6599%u51B2%u8C03%u4E13%u573A_%u552F%u54C1%u4F1A (Figure 2). In other words, the real title content has just been converted to Unicode escapes. For a GET request, you can write this data directly as a key-value pair.
To decode "%u96C0%u5DE2NESTLE%u996E%u6599%u51B2%u8C03%u4E13%u573A_%u552F%u54C1%u4F1A", replace each % with \ (Figure 3). The title content then reads "雀巢NESTLE饮料冲调专场_唯品会" (Figure 4).
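The replacement of % with \ can also be done programmatically. The following is a minimal sketch, not the author's exact code: the helper name `decode_percent_u` is hypothetical, and it assumes the value contains only `%uXXXX` sequences (the form that resembles the output of JavaScript's legacy `escape()`) mixed with plain ASCII.

```python
import re


def decode_percent_u(text: str) -> str:
    """Turn every %uXXXX sequence into the character with that code point.

    Hypothetical helper for illustration; characters that are not part
    of a %uXXXX sequence are passed through unchanged.
    """
    return re.sub(
        r"%u([0-9A-Fa-f]{4})",
        lambda match: chr(int(match.group(1), 16)),
        text,
    )


# The title value taken from the GET request in the article.
raw = "%u96C0%u5DE2NESTLE%u996E%u6599%u51B2%u8C03%u4E13%u573A_%u552F%u54C1%u4F1A"
title = decode_percent_u(raw)
print(title)
```

Using a regular expression instead of a plain string replace avoids touching a literal % that is not followed by `u` and four hex digits.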
For a POST request it is the other way round: the Chinese title has to be encoded before it is sent as POST data. First, check the encoding of the HTML page, which here is utf-8 (Figure 5). Then encode the content with the encode() method (Figure 6).
With these decode and encode techniques, similar encoding problems in the future should be solvable by analogy.
Other common anti-crawling encoding mechanisms include base64, md5, and font mapping. I will continue to share when I have time.
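The POST direction can be sketched as follows. This is an illustration under assumptions rather than the site's actual request: the constant `PAGE_CHARSET` and the reduced payload are hypothetical, and only the "ce" and "title" keys mirror fields shown in the article. The sketch encodes the title with the page's charset, builds a form body with the standard library, and parses it back to check that nothing was lost.

```python
from urllib.parse import parse_qs, urlencode

# Assumption: the charset was read from the page's HTML, as in Figure 5.
PAGE_CHARSET = "utf-8"

title = "雀巢NESTLE饮料冲调专场_唯品会"
title_bytes = title.encode(PAGE_CHARSET)  # str -> bytes in the page's charset

# Hypothetical, shortened payload; a real request carries more keys.
payload = {"ce": "1", "title": title_bytes}

# urlencode percent-encodes the bytes value for the request body.
body = urlencode(payload)
print(body)

# Round trip: parsing the body with the same charset restores the title.
decoded = parse_qs(body, encoding=PAGE_CHARSET)
print(decoded["title"][0])
```

If the page declared gbk instead, changing `PAGE_CHARSET` would be the only edit needed in this sketch.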
Figure 1
Figure 2

Figure 3
value = '\u96C0\u5DE2NESTLE\u996E\u6599\u51B2\u8C03\u4E13\u573A_\u552F\u54C1\u4F1A'
print(value)
Figure 4
Figure 5

Figure 6
Some key-value pairs are omitted:
"bv":"mozilla/5.0 (windows nt 6.1; win64; x64) applewebkit/537.36 (khtml, like gecko) chrome/94.0.4606.71 safari/537.36",
"ce":"1",
"vs":"",
"title": "雀巢NESTLE饮料冲调专场_唯品会".encode('utf-8'),
"tab_page_id":"1633709750465_a2405b86-4b0c-37b8-8342-7a179d42d569",