Anti-crawling strategies (IP proxies, random sleep intervals, crawling Bilibili video information, obtaining real URLs, handling special characters, handling timestamps, and multithreading)
2022-06-13 02:01:00 【Triumph19】
Common anti-crawler strategies
1. Anti-crawling based on request headers
- Inspecting the headers of incoming requests is the most common anti-crawler strategy. Many websites check the User-Agent header of an HTTP request to determine whether the request comes from a browser; some check the Referer header (hotlink protection on resource sites); and some check the Cookie header (login is required to obtain more data).
2. Anti-crawling based on user behavior
- Another common strategy is to detect user behavior to decide whether a request comes from a crawler. For example, if the same IP address visits many times within a short period, or the same account performs many operations within a short period, the website may take anti-crawler measures.
3. Anti-crawling through dynamically loaded data
- Some web pages are generated dynamically by JavaScript, so the required data cannot be obtained by fetching the page source directly, which makes crawling harder.
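To make the header checks in item 1 concrete, here is a minimal sketch of a request that sets the three headers mentioned there. The helper names (build_headers, fetch), the User-Agent string, and the example URL are illustrative assumptions, not taken from the original article; the requests library is assumed to be installed.

```python
# Assumed example value: any realistic browser User-Agent string would do.
DEFAULT_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def build_headers(referer=None, cookie=None, user_agent=DEFAULT_USER_AGENT):
    """Return a headers dict covering the fields websites commonly check."""
    headers = {"User-Agent": user_agent}
    if referer:
        headers["Referer"] = referer  # checked by hotlink protection
    if cookie:
        headers["Cookie"] = cookie    # checked by pages that require login
    return headers


def fetch(url, **header_options):
    """Fetch url with browser-like headers (hypothetical helper)."""
    # Third-party library, imported here so build_headers works without it.
    import requests

    response = requests.get(url, headers=build_headers(**header_options), timeout=10)
    response.raise_for_status()
    return response.text


if __name__ == "__main__":
    # Only builds and prints the headers; no request is sent.
    print(build_headers(referer="https://example.com/"))
```

A call such as fetch("https://example.com/page", referer="https://example.com/") would then send the request with those headers; whether that is enough depends on what the target site actually checks.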
Countermeasures against anti-crawling
1. Using proxy IPs
- To counter anti-crawler policies that monitor access by IP address, you can use proxy IPs. A proxy IP is an IP address that fetches network information on the user's behalf. It hides the crawler's real IP address and gets around per-IP access limits, so the crawler is less likely to be blocked by the website's anti-crawler program.
- Using a proxy IP with the requests library is straightforward: build a proxy dictionary and pass it through the proxies parameter when sending the HTTP request. To use multiple proxy IPs, put all the proxy dictionaries in a list and pick one at random from the list for each request.
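The proxy approach described above can be sketched as follows. The proxy addresses are placeholders from a documentation-only IP range, and the names PROXY_POOL, pick_proxy, and fetch_via_proxy are hypothetical; only the proxies parameter of requests.get comes from the text. The requests library is assumed to be installed.

```python
import random

# Placeholder proxies (203.0.113.0/24 is reserved for documentation).
# Replace them with proxies you are actually allowed to use.
PROXY_POOL = [
    {"http": "http://203.0.113.10:8080", "https": "http://203.0.113.10:8080"},
    {"http": "http://203.0.113.11:8080", "https": "http://203.0.113.11:8080"},
    {"http": "http://203.0.113.12:8080", "https": "http://203.0.113.12:8080"},
]


def pick_proxy(pool=PROXY_POOL):
    """Pick one proxy dictionary at random from the pool."""
    return random.choice(pool)


def fetch_via_proxy(url, pool=PROXY_POOL, timeout=10):
    """Send a GET request through a randomly chosen proxy (hypothetical helper)."""
    # Third-party library, imported here so pick_proxy works without it.
    import requests

    proxy = pick_proxy(pool)
    response = requests.get(url, proxies=proxy, timeout=timeout)
    response.raise_for_status()
    return response.text


if __name__ == "__main__":
    # Only shows which proxy would be used; no request is sent.
    print(pick_proxy())
```

Each proxy dictionary maps a URL scheme to the proxy that should handle it, which is the shape requests expects for its proxies parameter. Choosing a new entry per request spreads traffic across the pool, so no single IP address accumulates many visits in a short time.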