Solving encoding problems a crawler runs into with GET/POST requests
2022-06-26 08:51:00 【Feng Dashao】
Most HTML pages are encoded in UTF-8 or GBK (i.e. charset="utf-8" or charset="gbk"). When a crawler sends GET/POST requests, it can run into an encoding problem where a parameter's content is shown as "unable to decode value". Here's how to solve it.
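The page's declared charset decides how its text should be decoded, so a crawler can read that charset from the `<meta>` tag before decoding. The sketch below is minimal and assumes the page bytes are already downloaded. `sniff_charset` and `decode_html` are hypothetical helper names, and the regex only covers the two common meta-tag forms.

```python
import codecs
import re

# Matches charset="utf-8", charset='gbk' or charset=gbk inside a <meta> tag,
# covering both <meta charset=...> and the http-equiv Content-Type form.
_CHARSET_RE = re.compile(
    rb"<meta[^>]*charset\s*=\s*[\"']?([A-Za-z0-9_-]+)", re.IGNORECASE
)


def sniff_charset(html_bytes: bytes, default: str = "utf-8") -> str:
    """Return the charset declared in the page's <meta> tag, or `default`."""
    match = _CHARSET_RE.search(html_bytes)
    if not match:
        return default
    name = match.group(1).decode("ascii").lower()
    try:
        codecs.lookup(name)  # reject names Python has no codec for
    except LookupError:
        return default
    return name


def decode_html(html_bytes: bytes) -> str:
    """Decode raw page bytes using the declared charset."""
    return html_bytes.decode(sniff_charset(html_bytes), errors="replace")


if __name__ == "__main__":
    page = '<meta charset="gbk"><title>唯品会</title>'.encode("gbk")
    print(sniff_charset(page))  # gbk
    print(decode_html(page))
```

Falling back to UTF-8 when no charset is declared is a design choice, not a rule. Some sites declare the charset only in the HTTP `Content-Type` header, which this sketch does not read.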
Take a Vipshop GET request as an example. The value after title is shown as "unable to decode value" (Figure 1). Seeing "decode", you might suspect JS encryption. However, the GET request data shows that the value behind "unable to decode value" is actually %u96C0%u5DE2NESTLE%u996E%u6599%u51B2%u8C03%u4E13%u573A_%u552F%u54C1%u4F1A (Figure 2). That's right: this is the real title, with each Chinese character written as a %uXXXX Unicode escape. This is the format produced by JavaScript's legacy escape() function. It is not standard %XX percent-encoding, so it cannot be decoded as such. For a GET request, simply write this data into the request directly as key-value pairs.
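When you build the GET request yourself, the captured %u value has to survive the query-string encoder. The sketch below uses the standard library's `urllib.parse.urlencode`. It assumes the server expects the title exactly as the browser sent it (option 1). Option 2 shows the standard UTF-8 form for comparison. Which form a given site accepts has to be checked against the captured request.

```python
from urllib.parse import urlencode

# Title exactly as captured from the browser's GET request (%uXXXX escapes).
captured_title = "%u96C0%u5DE2NESTLE%u996E%u6599%u51B2%u8C03%u4E13%u573A_%u552F%u54C1%u4F1A"

# Option 1: keep the captured value verbatim. safe="%" stops urlencode from
# turning each "%" into "%25", which would double-encode the escapes.
query_verbatim = urlencode({"title": captured_title}, safe="%")

# Option 2: send the decoded Chinese text; urlencode percent-encodes it as UTF-8.
query_utf8 = urlencode({"title": "雀巢NESTLE饮料冲调专场_唯品会"})

print(query_verbatim)  # title=%u96C0%u5DE2NESTLE...
print(query_utf8)      # title=%E9%9B%80%E5%B7%A2NESTLE...
```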
To decode "%u96C0%u5DE2NESTLE%u996E%u6599%u51B2%u8C03%u4E13%u573A_%u552F%u54C1%u4F1A", replace each % with \ (Figure 3). The title then turns out to be "雀巢NESTLE饮料冲调专场_唯品会" ("Nestle NESTLE beverage special _ Vipshop") (Figure 4).
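The "% to \" trick can be done in code instead of by hand. The sketch below shows two approaches. `unescape_via_replace` mirrors the trick literally. `unescape_percent_u` is a regex-based alternative that also handles %XX escapes and surrogate pairs. Both function names are hypothetical helpers, not part of any library.

```python
import re

# One %uXXXX escape (4 hex digits) or one %XX escape (2 hex digits),
# the two forms produced by JavaScript's escape().
_ESCAPE_RE = re.compile(r"%u([0-9A-Fa-f]{4})|%([0-9A-Fa-f]{2})")


def unescape_via_replace(text: str) -> str:
    # The "% -> \" trick: only safe for pure-ASCII input that contains
    # no other backslashes or %XX escapes.
    return text.replace("%u", "\\u").encode("ascii").decode("unicode_escape")


def unescape_percent_u(text: str) -> str:
    """Decode %uXXXX / %XX escapes into the characters they stand for."""
    decoded = _ESCAPE_RE.sub(
        lambda m: chr(int(m.group(1) or m.group(2), 16)), text
    )
    # Characters outside the BMP arrive as two %u surrogate halves;
    # round-tripping through UTF-16 joins each pair into one character.
    return decoded.encode("utf-16", "surrogatepass").decode("utf-16", "surrogatepass")


if __name__ == "__main__":
    raw = "%u96C0%u5DE2NESTLE%u996E%u6599%u51B2%u8C03%u4E13%u573A_%u552F%u54C1%u4F1A"
    print(unescape_percent_u(raw))  # 雀巢NESTLE饮料冲调专场_唯品会
```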
A POST request works the other way round: the Chinese title has to be encoded before it is sent as POST data. First, confirm that the page is encoded as utf-8 (Figure 5). Then encode the content to UTF-8 bytes with the encode() method (Figure 6).
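The sketch below shows what the encoding step does to the POST body. It uses `urllib.parse.urlencode` so that it runs without a network call. The GBK line is an assumption for pages that declare charset="gbk"; the Vipshop page in this example is UTF-8.

```python
from urllib.parse import parse_qs, urlencode

title = "雀巢NESTLE饮料冲调专场_唯品会"

# The page declares charset="utf-8", so the title is encoded to UTF-8 bytes,
# as in the Figure 6 code. urlencode percent-encodes bytes values byte by byte.
body_utf8 = urlencode({"title": title.encode("utf-8")})

# Assumption: on a page declaring charset="gbk" the server would expect GBK
# bytes instead; urlencode's `encoding` argument applies to str values.
body_gbk = urlencode({"title": title}, encoding="gbk")

print(body_utf8)  # title=%E9%9B%80%E5%B7%A2NESTLE...
print(body_gbk)

# Decoding with the matching charset recovers the original title.
print(parse_qs(body_gbk, encoding="gbk")["title"][0])
```

The same text produces different bytes under each charset. This is why checking the page's charset comes before calling encode().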
With these decode and encode methods, similar encoding problems you meet in the future can be solved by analogy.
Other common anti-crawling encoding mechanisms include base64 encoding, md5 hashing and font mapping. I will share more about them when I have time.
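base64 and md5 behave very differently when a crawler meets them. base64 can be decoded back to the original, while md5 can only be recomputed and compared. The minimal illustration below uses the standard library; `token` is an arbitrary sample value, not anything a real site sends.

```python
import base64
import hashlib

token = "雀巢NESTLE"

# base64 is a reversible encoding: the original bytes can always be recovered.
encoded = base64.b64encode(token.encode("utf-8")).decode("ascii")
decoded = base64.b64decode(encoded).decode("utf-8")

# md5 is a one-way hash: a 32-hex-digit digest that cannot be decoded back.
digest = hashlib.md5(token.encode("utf-8")).hexdigest()

print(encoded, decoded, digest)
```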
Figure 1
Figure 2

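Figure 3
```python
# After replacing each "%u" with "\u", the string is an ordinary Python
# literal with \uXXXX escapes, which Python turns into characters when parsing it.
value = '\u96C0\u5DE2NESTLE\u996E\u6599\u51B2\u8C03\u4E13\u573A_\u552F\u54C1\u4F1A'
print(value)  # 雀巢NESTLE饮料冲调专场_唯品会
```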
Figure 4
Figure 5

Figure 6
```python
# Part of the key-value data is omitted
data = {
    "bv": "mozilla/5.0 (windows nt 6.1; win64; x64) applewebkit/537.36 (khtml, like gecko) chrome/94.0.4606.71 safari/537.36",
    "ce": "1",
    "vs": "",
    # The page is UTF-8, so the Chinese title
    # ("Nestle NESTLE beverage special _ Vipshop") is sent as UTF-8 bytes.
    "title": "雀巢NESTLE饮料冲调专场_唯品会".encode('utf-8'),
    "tab_page_id": "1633709750465_a2405b86-4b0c-37b8-8342-7a179d42d569",
}
```