Anti-crawling strategies (IP proxies, random sleep times, crawling Bilibili video information, obtaining real URLs, handling special characters, handling timestamps, and multithreading)
2022-06-13 02:01:00 【Triumph19】
Common anti-crawler strategies
1. Anti-crawling based on request headers
- Checking the headers of incoming requests is the most common anti-crawling strategy. Many websites check the User-Agent header of the HTTP request to decide whether it comes from a browser. Some check the Referer header (resource sites use this for hotlink protection). Others check the Cookie header (you must be logged in to get more data).
2. Anti-crawling based on user behavior
- Another common strategy is to decide whether a request comes from a crawler by looking at user behavior. For example, if the same IP address makes many requests in a short time, or the same account performs many operations in a short time, the website may take anti-crawler measures.
3. Anti-crawling with dynamically loaded data
- Some web pages are generated dynamically by JavaScript, so the required data is not in the page source that is returned, and the crawler cannot get it by fetching the page directly.
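As a rough illustration of items 1 and 2, the sketch below builds browser-like request headers and pauses for a random interval between requests. The names `USER_AGENTS`, `build_headers`, `polite_pause`, and `fetch` are hypothetical helpers introduced here, and the User-Agent strings and delay bounds are placeholder values rather than recommendations from the original article.

```python
import random
import time

# Placeholder browser User-Agent strings (assumed values for illustration).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/15.5 Safari/605.1.15",
]


def build_headers(referer=None, cookie=None):
    """Return a headers dict covering the three fields sites commonly check."""
    headers = {"User-Agent": random.choice(USER_AGENTS)}
    if referer:
        headers["Referer"] = referer  # for sites with hotlink protection
    if cookie:
        headers["Cookie"] = cookie  # for pages that require a login
    return headers


def polite_pause(min_seconds=1.0, max_seconds=3.0):
    """Sleep for a random time so requests are not evenly spaced."""
    delay = random.uniform(min_seconds, max_seconds)
    time.sleep(delay)
    return delay


def fetch(url, referer=None, cookie=None):
    """Send one GET request with the headers above, then pause."""
    import requests  # third-party; imported here so the helpers work without it

    response = requests.get(url, headers=build_headers(referer, cookie), timeout=10)
    polite_pause()
    return response
```

Calling `fetch("https://example.com/", referer="https://example.com/")` would send a single request and then wait one to three seconds; the URL is only an example.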
Countering anti-crawler measures
1. Using proxy IPs
- When a website's anti-crawler policy is based on the visiting IP address, you can use a proxy IP. A proxy IP is an IP address that fetches network information on the user's behalf. It hides the crawler's real IP address, which helps get past per-IP access limits and avoids being blocked by the site's anti-crawler program.
- Using a proxy IP with the requests library is straightforward: build a proxy dictionary and pass it through the proxies parameter when sending the HTTP request. To use several proxy IPs, put all the proxy dictionaries in a list and pick one from the list at random.
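A minimal sketch of the approach just described: a list of proxy dictionaries, one of which is chosen at random and passed to `requests.get` through its `proxies` parameter. The proxy addresses are placeholders, and `PROXY_POOL`, `pick_proxy`, and `fetch_via_proxy` are hypothetical names, not code from the original article.

```python
import random

# Placeholder proxy addresses; replace with proxies you are allowed to use.
# Each entry maps a URL scheme to the proxy that should handle it.
PROXY_POOL = [
    {"http": "http://10.10.1.10:3128", "https": "http://10.10.1.10:3128"},
    {"http": "http://10.10.1.11:1080", "https": "http://10.10.1.11:1080"},
]


def pick_proxy(pool=PROXY_POOL):
    """Pick one proxy dictionary at random from the pool."""
    return random.choice(pool)


def fetch_via_proxy(url, pool=PROXY_POOL, timeout=10):
    """Send a GET request through a randomly chosen proxy."""
    import requests  # third-party; imported here so pick_proxy works without it

    return requests.get(url, proxies=pick_proxy(pool), timeout=timeout)
```

Choosing a proxy per request spreads traffic across addresses. A real crawler would likely also need to drop proxies that time out or are refused, which this sketch does not handle.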