Web crawlers: proxy theory explained and applied
2022-07-06 03:28:00 【Master Zheng fried chestnuts】
Proxy theory: explanation and application
I. Proxy theory explained
When writing a crawler you will often run into the following situation: the crawler initially runs normally and captures data as expected, but after a while errors start to appear. If you open the target site in a browser, you may see a message such as "Your IP's access frequency is too high". The cause is an anti-crawling measure taken by the website.
For example, the server counts the number of requests each IP makes per unit of time. If the count for some IP exceeds a certain threshold within that time, the server rejects further requests from it and returns an error message to the client.
In this situation the requesting IP is said to have been "banned" (blacklisted) by the server.
Because the server tracks the number of requests per IP per unit of time, we can avoid the ban by disguising our IP so that the server cannot tell the requests come from our local machine. This is an effective countermeasure against the IP-ban mechanism.
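The per-IP threshold check described above can be sketched as a small sliding-window counter. This is only an illustration of the idea, not the logic of any particular website; the class name `IPRateLimiter`, the threshold of 3 requests per second, and the sample addresses are all assumptions made for the example.

```python
from collections import defaultdict, deque


class IPRateLimiter:
    """Toy per-IP sliding-window limiter (hypothetical, for illustration only)."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Maps each IP to the timestamps of its recently accepted requests.
        self.history = defaultdict(deque)

    def allow(self, ip, now):
        timestamps = self.history[ip]
        # Forget requests that have fallen out of the time window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_requests:
            return False  # threshold exceeded: the server rejects the request
        timestamps.append(now)
        return True


limiter = IPRateLimiter(max_requests=3, window_seconds=1.0)
# Five quick requests from one IP: the first three pass, the rest are rejected.
results = [limiter.allow("10.0.0.1", now=0.1 * i) for i in range(5)]
# A request from a different IP at the same moment is counted separately.
other = limiter.allow("10.0.0.2", now=0.4)
```

Because the counter is keyed by IP, a request arriving from a different address starts with a clean history, which is exactly what a proxy exploits.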
1. The core purpose of a proxy
- Defeat the IP-ban anti-crawling mechanism.
2. What is a proxy?
- A proxy here means a proxy server. Its function is to fetch network information on behalf of network users.
- A proxy server can be thought of as a relay station for network traffic. Normally, when we request a website, we send the request directly to the web server, and the web server sends the response data back to the client.
- With a proxy server configured, a bridge is placed between our local machine and the web server. The local machine no longer sends its request directly to the web server; it sends the request to the proxy server, which relays it to the web server and then passes the web server's response back to our client (the local machine). We can still access the pages on the web server as usual.
- In this process, however, the IP the web server sees is no longer our local IP. Because the request reaches the web server from the proxy server rather than from our own machine, our IP is successfully disguised. This is the basic principle of a proxy.
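The relay described above can be modelled in a few lines. The sketch below is a pure in-memory simulation with no real networking; the classes `WebServer` and `ProxyServer` and both IP addresses (taken from the reserved documentation ranges) are made up for illustration.

```python
class WebServer:
    def handle(self, peer_ip, path):
        # The server only sees the address of whoever connected to it.
        return {"path": path, "seen_ip": peer_ip}


class ProxyServer:
    def __init__(self, ip, upstream):
        self.ip = ip
        self.upstream = upstream

    def handle(self, peer_ip, path):
        # Relay the request; the upstream server sees the proxy's address,
        # not the address of the original client (peer_ip).
        return self.upstream.handle(self.ip, path)


LOCAL_IP = "192.0.2.10"   # hypothetical address of our own machine
PROXY_IP = "203.0.113.5"  # hypothetical address of the proxy server

server = WebServer()
direct = server.handle(LOCAL_IP, "/index.html")    # no proxy
proxy = ProxyServer(PROXY_IP, server)
relayed = proxy.handle(LOCAL_IP, "/index.html")    # through the proxy
```

In the direct case the server records `192.0.2.10`; in the relayed case it records `203.0.113.5`, while the response content is the same.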
3. What a proxy is for:
- Get around access restrictions placed on your own IP.
- Hide your real IP to avoid being attacked.
4. Proxy-related websites:
- Kuaidaili (快代理)
- Xici Daili (西祠代理)
- https://m.goubanjia.com/
- https://proxy.seofangfa.com/
5. Types of proxy IP:
- http: used for URLs that use the http protocol
- https: used for URLs that use the https protocol
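Matching a proxy to the protocol of the URL can be sketched as a lookup keyed by scheme, which is also the shape of the `proxies` dictionary that `requests` accepts (keys `http` and `https`). The helper `pick_proxy` and the proxy addresses below are hypothetical and exist only for this example.

```python
from urllib.parse import urlparse

# Hypothetical proxy addresses; substitute real ones.
PROXIES = {
    "http": "http://203.0.113.5:8080",
    "https": "http://203.0.113.5:8443",
}


def pick_proxy(url, proxies):
    """Return the proxy registered for the URL's scheme, or None if there is none."""
    scheme = urlparse(url).scheme.lower()
    return proxies.get(scheme)


http_choice = pick_proxy("http://www.baidu.com/s?wd=IP", PROXIES)
https_choice = pick_proxy("https://www.baidu.com/s?wd=IP", PROXIES)
```

A URL whose scheme has no entry in the dictionary gets `None`, meaning the request would go out without a proxy.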
6. Anonymity of proxy IPs
(1) Transparent: the server knows the request came through a proxy and also knows the real IP behind it
(2) Anonymous: the server knows a proxy was used but does not know the real IP
(3) High anonymity: the server does not know a proxy was used and does not know the real IP
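One way to picture the three levels is from the server's side: what it can infer from the connecting address and the request headers. The sketch below assumes the proxy reveals itself through the conventional `Via`, `X-Forwarded-For`, or `Proxy-Connection` headers; real proxies vary, header names are matched exactly here, and the function `classify_anonymity` is a hypothetical helper rather than part of any library.

```python
def classify_anonymity(real_ip, remote_addr, headers):
    """Classify how much a server learns, given what it received.

    real_ip     -- the client's true address (known to us, the tester)
    remote_addr -- the address the server saw the connection come from
    headers     -- request headers as received by the server
    """
    if remote_addr == real_ip:
        return "no proxy"
    forwarded = headers.get("X-Forwarded-For", "")
    forwarded_ips = [part.strip() for part in forwarded.split(",")]
    if real_ip in forwarded_ips:
        return "transparent"      # proxy visible, real IP leaked
    proxy_headers = ("Via", "X-Forwarded-For", "Proxy-Connection")
    if any(name in headers for name in proxy_headers):
        return "anonymous"        # proxy visible, real IP hidden
    return "high anonymity"       # neither proxy nor real IP visible


REAL, PROXY = "192.0.2.10", "203.0.113.5"  # hypothetical addresses
levels = [
    classify_anonymity(REAL, PROXY, {"Via": "1.1 proxy", "X-Forwarded-For": REAL}),
    classify_anonymity(REAL, PROXY, {"Via": "1.1 proxy"}),
    classify_anonymity(REAL, PROXY, {}),
]
```

The three sample calls correspond, in order, to the transparent, anonymous, and high-anonymity cases listed above.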
II. Applying a proxy in a crawler
1. Using a proxy:
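import requests

url = 'http://www.baidu.com/s?wd=IP'
headers = {
    'User-Agent': 'Moz...'
}
# requests selects a proxy by URL scheme, so the key is 'http', not 'http://'.
# The original gives only the proxy's IP; append ':<port>' if the proxy
# listens on a port other than the default.
proxies = {'http': 'http://183.247.199.111'}
# A timeout keeps the script from hanging when the proxy is dead.
page_text = requests.get(url=url, headers=headers, proxies=proxies, timeout=10).text
# Persist the page. Open ip.html: if the IP it shows is your local IP,
# the proxy was not used.
with open('ip.html', 'w', encoding='utf-8') as fp:
    fp.write(page_text)
print('Saved to ip.html')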
2. Anti-crawling mechanism: IP bans
3. Countermeasure: send requests through a proxy
Note: the point here is only to learn the method. Most free proxy IPs do not work; for a real project, consider buying paid proxy IPs.
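Since many proxies in a list may be dead, a crawler usually tries several in turn. The following is a minimal sketch of that idea, not a production proxy pool: `fetch_with_rotation` and `requests_fetch` are hypothetical helpers, and the fetch function is passed in so the rotation logic can be tried without any network access.

```python
def fetch_with_rotation(url, proxy_list, fetch):
    """Try each proxy in order; return (proxy, body) for the first that works."""
    errors = []
    for proxy in proxy_list:
        try:
            return proxy, fetch(url, proxy)
        except Exception as exc:  # dead proxy, timeout, HTTP error, ...
            errors.append((proxy, repr(exc)))
    raise RuntimeError(f"all proxies failed for {url}: {errors}")


def requests_fetch(url, proxy, timeout=5):
    """Real fetcher built on requests (third-party; imported lazily)."""
    import requests
    resp = requests.get(url, proxies={"http": proxy, "https": proxy},
                        timeout=timeout)
    resp.raise_for_status()
    return resp.text


# Offline demonstration with a stand-in fetcher: only the second proxy "works".
def fake_fetch(url, proxy):
    if proxy != "http://203.0.113.7:8080":
        raise ConnectionError("dead proxy")
    return "ok"


used, body = fetch_with_rotation(
    "http://www.baidu.com/s?wd=IP",
    ["http://203.0.113.5:8080", "http://203.0.113.7:8080"],  # hypothetical
    fake_fetch,
)
```

In a real crawler, pass `requests_fetch` instead of `fake_fetch` and supply proxies you have purchased or verified.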