A Preliminary Study of Scrapy Downloader Middleware
2022-07-03 22:42:00 【Keep a low profile】
These are my first notes on Scrapy downloader middleware, which turns out to be fairly complicated.
Most of the complexity lies in how requests and responses can be changed along the way; as long as nothing is intercepted, things are much simpler.
The middleware is enabled in settings.py:
DOWNLOADER_MIDDLEWARES = {
    'test_middle_demo.middlewares.TestMiddleDemoDownloaderMiddleware': 543,
}
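A hedged aside on this setting: besides enabling your own class, the same dictionary can switch off one of Scrapy's built-in downloader middlewares by mapping its path to None. The sketch below assumes you want to disable the built-in UserAgentMiddleware; that choice is only an illustration and not something the original project does.

```python
# settings.py (sketch): enable the custom middleware and, as an assumed
# example, disable one built-in middleware by mapping its path to None.
DOWNLOADER_MIDDLEWARES = {
    'test_middle_demo.middlewares.TestMiddleDemoDownloaderMiddleware': 543,
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
}
```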
from scrapy import signals


class TestMiddleDemoDownloaderMiddleware:

    @classmethod
    def from_crawler(cls, crawler):
        # This method is used by Scrapy to create your spiders.
        s = cls()
        crawler.signals.connect(s.spider_opened, signal=signals.spider_opened)
        return s
First, the spider_opened signal connected in from_crawler works together with the following method:
    def spider_opened(self, spider):
        spider.logger.info('Spider opened: %s' % spider.name)
        print('1. The crawler is running ')
    def process_request(self, request, spider):
        # Called for each request that goes through the downloader
        # middleware.
        # Must either:
        # - return None: continue processing this request
        # - or return a Response object
        # - or return a Request object
        # - or raise IgnoreRequest: process_exception() methods of
        #   installed downloader middleware will be called
        print('2. Request arrived: ', request.url, request.headers)
        return None
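To make the return options listed in the comments above more concrete, here is a small pure-Python model of them. It is only a sketch and not Scrapy's real implementation: FakeRequest, FakeResponse, the three middleware classes and run_process_request are all hypothetical names made up for illustration, and the IgnoreRequest case is left out.

```python
from dataclasses import dataclass


@dataclass
class FakeRequest:
    """Stand-in for scrapy.Request (hypothetical, illustration only)."""
    url: str


@dataclass
class FakeResponse:
    """Stand-in for a Scrapy response (hypothetical, illustration only)."""
    url: str
    body: str


class PassThrough:
    def process_request(self, request, spider):
        return None  # no interception


class ServeFromCache:
    """Answers known URLs itself, so the downloader is never reached."""

    def __init__(self, cache):
        self.cache = cache

    def process_request(self, request, spider):
        if request.url in self.cache:
            return FakeResponse(request.url, self.cache[request.url])
        return None


class ForceHttps:
    """Replaces plain-http requests with a new https request."""

    def process_request(self, request, spider):
        if request.url.startswith("http://"):
            return FakeRequest("https://" + request.url[len("http://"):])
        return None


def run_process_request(middlewares, request, spider=None):
    """Call process_request in order and report where the request goes next."""
    for mw in middlewares:
        result = mw.process_request(request, spider)
        if result is None:
            continue  # hand the same request to the next middleware
        if isinstance(result, FakeResponse):
            return "skip downloader", result
        if isinstance(result, FakeRequest):
            return "back to scheduler", result
        raise TypeError("process_request returned an unexpected object")
    return "send to downloader", request


chain = [PassThrough(), ServeFromCache({"https://a.test/": "cached"}), ForceHttps()]
print(run_process_request(chain, FakeRequest("https://b.test/")))
print(run_process_request(chain, FakeRequest("https://a.test/")))
print(run_process_request(chain, FakeRequest("http://b.test/")))
```

The three calls at the end exercise the three branches in turn: nothing intercepts, a Response short-circuits the download, and a new Request is sent back for scheduling.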
"""
return None: keep passing the request on to the next middleware or the downloader (no interception)
return Response: return the response directly; the remaining middleware and the downloader are skipped, and the response is passed back
return Request: return the request object to the engine, which hands it to the scheduler; the normal flow then continues
"""
    def process_response(self, request, response, spider):
        # Called with the response returned from the downloader.
        # Must either:
        # - return a Response object  # pass the response up to the engine
        # - return a Request object   # hand the request to the engine, which gives it to the scheduler
        # - or raise IgnoreRequest
        print('3. Response arrived: ', response.status, response.headers)
        return response
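As an illustration of the "return a Request object" branch of process_response, the sketch below re-schedules a request when the server answers 503. Everything in it is an assumption for demonstration: RetryOn503Middleware, MAX_RETRIES, the demo_retry_count meta key and the two Fake classes are hypothetical, and Scrapy already ships its own RetryMiddleware for real projects. The stand-in replace() only imitates the idea of Scrapy's Request.replace().

```python
class FakeRequest:
    """Minimal stand-in for scrapy.Request (hypothetical, illustration only)."""

    def __init__(self, url, meta=None, dont_filter=False):
        self.url = url
        self.meta = dict(meta or {})
        self.dont_filter = dont_filter

    def replace(self, **changes):
        # Copy this request with some fields changed.
        fields = {"url": self.url, "meta": self.meta, "dont_filter": self.dont_filter}
        fields.update(changes)
        return FakeRequest(**fields)


class FakeResponse:
    """Minimal stand-in for a Scrapy response (hypothetical)."""

    def __init__(self, status):
        self.status = status


class RetryOn503Middleware:
    MAX_RETRIES = 2  # assumed limit, not taken from the original post

    def process_response(self, request, response, spider):
        attempts = request.meta.get("demo_retry_count", 0)
        if response.status == 503 and attempts < self.MAX_RETRIES:
            # dont_filter=True so the duplicate filter does not drop the retry
            retry = request.replace(dont_filter=True)
            retry.meta["demo_retry_count"] = attempts + 1
            return retry  # a Request: back to the engine, then the scheduler
        return response  # a Response: passed on towards the engine and spider


mw = RetryOn503Middleware()
first = FakeRequest("https://www.baidu.com/")
again = mw.process_response(first, FakeResponse(503), spider=None)
ok = mw.process_response(first, FakeResponse(200), spider=None)
print(type(again).__name__, again.meta, type(ok).__name__)
```

The retry counter lives in request.meta so that it travels with the request each time it is re-scheduled.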
import scrapy
from bs4 import BeautifulSoup


class TestMSpider(scrapy.Spider):
    name = 'test_m'
    allowed_domains = ['baidu.com']
    start_urls = ['https://www.baidu.com/']

    def parse(self, response, **kwargs):
        print('4. The response finally reaches the spider; parse the page here')
        soup = BeautifulSoup(response.text, 'lxml')
        title = soup.find('title').text
        print(title)
Running the spider then prints the numbered messages in order, 1 through 4.
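For example:
With multiple downloader middlewares, as in the code below, the key point is the number assigned to each one.
The numbers 100 and 200 are each middleware's distance from the engine: the smaller the number, the closer to the engine.
Requests and responses move along this line in order: a request travels outward from the engine to the downloader, and the response travels back the same way.
So the print order for the code below is 1, 3, 4, 2.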
DOWNLOADER_MIDDLEWARES = {
    'test_middle_demo.middlewares.TestMiddleDemoDownloaderMiddleware_01': 100,
    'test_middle_demo.middlewares.TestMiddleDemoDownloaderMiddleware_02': 200,
}
class TestMiddleDemoDownloaderMiddleware_01:
    def process_request(self, request, spider):
        print(1)
        return None

    def process_response(self, request, response, spider):
        print(2)
        return response


class TestMiddleDemoDownloaderMiddleware_02:
    def process_request(self, request, spider):
        print(3)
        return None

    def process_response(self, request, response, spider):
        print(4)
        return response
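The 1, 3, 4, 2 order can be checked without running a crawl. The following is a simplified simulation, not Scrapy's own code: Middleware01, Middleware02, the calls list and simulate are hypothetical names, and the "download" is faked with a plain string. It sorts the middlewares by their number, calls process_request in ascending order, then calls process_response in descending order.

```python
calls = []  # records the order in which the hooks run


class Middleware01:
    def process_request(self, request, spider):
        calls.append(1)
        return None

    def process_response(self, request, response, spider):
        calls.append(2)
        return response


class Middleware02:
    def process_request(self, request, spider):
        calls.append(3)
        return None

    def process_response(self, request, response, spider):
        calls.append(4)
        return response


def simulate(middlewares, request="request"):
    """Walk one request out to a fake downloader and the response back."""
    # Smaller number = closer to the engine, so sort ascending by priority.
    ordered = [mw for mw, _ in sorted(middlewares.items(), key=lambda item: item[1])]
    for mw in ordered:  # engine -> downloader
        mw.process_request(request, None)
    response = "response"  # pretend the downloader fetched the page
    for mw in reversed(ordered):  # downloader -> engine
        response = mw.process_response(request, response, None)
    return response


simulate({Middleware01(): 100, Middleware02(): 200})
print(calls)  # [1, 3, 4, 2]
```

Swapping the two numbers in the dictionary passed to simulate reverses which middleware sits closer to the engine, and the recorded order changes accordingly.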