A preliminary study of Scrapy downloader middleware
2022-07-03 22:42:00 【Keep a low profile】
This is my first look at downloader middleware, and it is fairly complicated.
Most of the complexity comes from how a middleware can change requests and responses; if nothing is intercepted, it is much simpler.
Enable the middleware in settings.py:
```python
DOWNLOADER_MIDDLEWARES = {
    'test_middle_demo.middlewares.TestMiddleDemoDownloaderMiddleware': 543,
}
```
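The number 543 is the middleware's order. As far as I understand Scrapy's behaviour, the project's `DOWNLOADER_MIDDLEWARES` is merged with the built-in `DOWNLOADER_MIDDLEWARES_BASE`, entries set to `None` are dropped, and the rest are sorted by value. The sketch below models that under stated assumptions: `BASE` is a small assumed subset of the built-in defaults (UserAgentMiddleware at 500, RetryMiddleware at 550), and `build_order` is a hypothetical helper, not a Scrapy function.

```python
# Simplified model (an assumption, not Scrapy's code) of how the final
# downloader middleware order is built from the base dict plus the project dict.
BASE = {
    # A small assumed subset of Scrapy's built-in defaults
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": 500,
    "scrapy.downloadermiddlewares.retry.RetryMiddleware": 550,
}

PROJECT = {
    "test_middle_demo.middlewares.TestMiddleDemoDownloaderMiddleware": 543,
}


def build_order(base, custom):
    """Hypothetical helper: return middleware paths from engine side to downloader side."""
    merged = {**base, **custom}  # project values override the base values
    # An entry set to None is treated as disabled and dropped
    enabled = {path: order for path, order in merged.items() if order is not None}
    # Lower number = closer to the engine = its process_request runs earlier
    return sorted(enabled, key=enabled.get)


for path in build_order(BASE, PROJECT):
    print(path)
```

With these assumed values the custom middleware (543) lands between UserAgentMiddleware (500) and RetryMiddleware (550), so picking the number is how you decide which built-in middlewares see a request before yours does.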
In middlewares.py, the middleware class generated by the project template looks like this, with print calls added to trace the order of execution. from_crawler connects spider_opened to the spider_opened signal, so the two methods work together: Scrapy builds the middleware through from_crawler, and spider_opened then runs once when the spider starts.

```python
from scrapy import signals


class TestMiddleDemoDownloaderMiddleware:
    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this to create the middleware instance.
        s = cls()
        # Run spider_opened() when the spider_opened signal fires
        crawler.signals.connect(s.spider_opened, signal=signals.spider_opened)
        return s

    def spider_opened(self, spider):
        spider.logger.info('Spider opened: %s' % spider.name)
        print('1. The crawler is running')

    def process_request(self, request, spider):
        # Called for each request that goes through the downloader
        # middleware.
        # Must either:
        # - return None: continue processing this request
        # - or return a Response object
        # - or return a Request object
        # - or raise IgnoreRequest: process_exception() methods of
        #   installed downloader middleware will be called
        print('2. Reached process_request', request.url, request.headers)
        return None

    def process_response(self, request, response, spider):
        # Called with the response returned from the downloader.
        # Must either:
        # - return a Response object  # passed up towards the engine
        # - return a Request object   # returned to the engine, which hands it to the scheduler
        # - or raise IgnoreRequest
        print('3. Reached process_response', response.status, response.headers)
        return response
```

What process_request returns decides what happens next:
- None: nothing is intercepted; the request continues to the next middleware and then to the downloader.
- A Response: that response is returned directly; the remaining process_request methods and the downloader are skipped, and the response is passed back through process_response.
- A Request: the new request goes back to the engine, which hands it to the scheduler, and it then goes through the normal flow again.
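The return-value rules of process_request can be modelled in a few lines of plain Python. This is a simplified sketch, not Scrapy's implementation: `Request`, `Response`, `CacheHit`, `download` and `fake_downloader` are hypothetical stand-ins, and exceptions and `process_exception` are left out. It shows that returning a Response from process_request means the downloader is never called.

```python
from dataclasses import dataclass


@dataclass
class Request:  # hypothetical stand-in for scrapy.Request
    url: str


@dataclass
class Response:  # hypothetical stand-in for a Scrapy response
    url: str
    status: int = 200


class CacheHit:
    """Hypothetical middleware that answers URLs ending in /cached by itself."""

    def process_request(self, request, spider):
        if request.url.endswith("/cached"):
            return Response(request.url)  # short-circuits the downloader
        return None  # no interception

    def process_response(self, request, response, spider):
        return response


def download(middlewares, request, downloader, spider=None):
    """Simplified model of the download path; not Scrapy's real code."""
    for mw in middlewares:  # ascending order: engine -> downloader
        result = mw.process_request(request, spider)
        if isinstance(result, Request):
            return result  # handed back to the engine to be rescheduled
        if isinstance(result, Response):
            response = result  # later process_request calls and the download are skipped
            break
    else:
        response = downloader(request)  # only reached if nobody intercepted
    for mw in reversed(middlewares):  # descending order: downloader -> engine
        response = mw.process_response(request, response, spider)
        if isinstance(response, Request):
            return response
    return response


downloaded = []


def fake_downloader(request):
    downloaded.append(request.url)  # records which URLs were "downloaded"
    return Response(request.url, status=200)


print(download([CacheHit()], Request("https://example.com/cached"), fake_downloader))
print(downloaded)  # [] because the middleware answered before the downloader
```

Swapping the URL for one that does not end in /cached makes process_request return None, and the request then reaches `fake_downloader` as usual.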
The spider:

```python
import scrapy
from bs4 import BeautifulSoup


class TestMSpider(scrapy.Spider):
    name = 'test_m'
    allowed_domains = ['baidu.com']
    start_urls = ['https://www.baidu.com/']

    def parse(self, response, **kwargs):
        print('4. The response finally reaches the spider; page parsing happens here')
        soup = BeautifulSoup(response.text, 'lxml')
        title = soup.find('title').text
        print(title)
```
Running the spider prints the numbered messages, which show the order in which the middleware hooks and the spider are called.
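For example, with more than one downloader middleware, as in the code below.
Key point: the values 100 and 200 are each middleware's distance from the engine. A lower value sits closer to the engine, a higher value closer to the downloader.
Requests and responses move through the chain in a straight line: a request passes the middlewares in ascending order on its way to the downloader, and the response passes them in descending order on its way back.
So the printed order is 1, 3, 4, 2.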
```python
DOWNLOADER_MIDDLEWARES = {
    'test_middle_demo.middlewares.TestMiddleDemoDownloaderMiddleware_01': 100,
    'test_middle_demo.middlewares.TestMiddleDemoDownloaderMiddleware_02': 200,
}
```
```python
class TestMiddleDemoDownloaderMiddleware_01:
    def process_request(self, request, spider):
        print(1)
        return None

    def process_response(self, request, response, spider):
        print(2)
        return response


class TestMiddleDemoDownloaderMiddleware_02:
    def process_request(self, request, spider):
        print(3)
        return None

    def process_response(self, request, response, spider):
        print(4)
        return response
```
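To check the 1, 3, 4, 2 order without running a crawl, the traversal can be modelled in plain Python. This is a simplified sketch, not Scrapy's code: the two classes mirror the ones above but record their number in a list instead of printing it, the settings dict uses the classes themselves as keys, and `run_once` is a hypothetical helper standing in for the engine and the downloader.

```python
# Simplified model (an assumption, not Scrapy's code) of how two
# downloader middlewares are traversed, using the orders from settings.py.
calls = []


class TestMiddleDemoDownloaderMiddleware_01:
    def process_request(self, request, spider):
        calls.append(1)
        return None

    def process_response(self, request, response, spider):
        calls.append(2)
        return response


class TestMiddleDemoDownloaderMiddleware_02:
    def process_request(self, request, spider):
        calls.append(3)
        return None

    def process_response(self, request, response, spider):
        calls.append(4)
        return response


DOWNLOADER_MIDDLEWARES = {
    TestMiddleDemoDownloaderMiddleware_01: 100,
    TestMiddleDemoDownloaderMiddleware_02: 200,
}


def run_once(middleware_orders, request="request", downloaded="response"):
    """Hypothetical helper: send one request through the chain and back."""
    chain = [cls() for cls in sorted(middleware_orders, key=middleware_orders.get)]
    for mw in chain:  # engine -> downloader: ascending order
        mw.process_request(request, None)
    response = downloaded  # stands in for the actual download
    for mw in reversed(chain):  # downloader -> engine: descending order
        response = mw.process_response(request, response, None)
    return response


run_once(DOWNLOADER_MIDDLEWARES)
print(calls)  # [1, 3, 4, 2]
```

Giving _01 a value larger than 200 (say 300) reverses their positions in this model, and the recorded order becomes 3, 1, 2, 4.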