A first look at Scrapy downloader middleware

2022-07-03 22:42:00 【Keep a low profile】

This is a first look at downloader middleware, which turns out to be fairly complicated.

The complexity comes mainly from how a middleware can change the request and the response. If nothing is intercepted, the flow is much simpler.

The middleware is enabled in settings.py:

DOWNLOADER_MIDDLEWARES = {
    'test_middle_demo.middlewares.TestMiddleDemoDownloaderMiddleware': 543,
}

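from scrapy import signals


class TestMiddleDemoDownloaderMiddleware:

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this method to create the middleware instance.
        s = cls()
        # Run spider_opened() when the spider_opened signal fires.
        crawler.signals.connect(s.spider_opened, signal=signals.spider_opened)
        return s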

spider_opened runs first, when the spider starts. The methods below then handle each request and response:

    def spider_opened(self, spider):
        spider.logger.info('Spider opened: %s' % spider.name)
        print('1. The spider has started')

    def process_request(self, request, spider):
        # Called for each request that goes through the downloader
        # middleware.

        # Must either:
        # - return None: continue processing this request; it is passed on
        #   to the next middleware or to the downloader (no interception)
        # - or return a Response object: the remaining middleware and the
        #   downloader are skipped and the response is passed back
        # - or return a Request object: the request goes back to the engine,
        #   which hands it to the scheduler, and the flow starts over
        # - or raise IgnoreRequest: process_exception() methods of
        #   installed downloader middleware will be called
        print('2. Request arrived', request.url, request.headers)
        return None

    def process_response(self, request, response, spider):
        # Called with the response returned from the downloader.

        # Must either:
        # - return a Response object: passed up the chain to the engine
        # - return a Request object: goes to the engine, then the scheduler
        # - or raise IgnoreRequest
        print('3. Response arrived', response.status, response.headers)
        return response
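
The three return values of process_request described in the comments above can be modelled in plain Python. This is only a simplified sketch and not Scrapy's actual implementation: the Request and Response classes, fake_download, run_process_request and the three middleware functions are all hypothetical stand-ins invented for illustration.

```python
# Simplified model of how a process_request() result is dispatched.
# Every name here is a stand-in for illustration, not a Scrapy class.

class Request:
    def __init__(self, url):
        self.url = url


class Response:
    def __init__(self, url, body=""):
        self.url = url
        self.body = body


def fake_download(request):
    """Stand-in for the downloader: pretend to fetch the page."""
    return Response(request.url, body="downloaded")


def run_process_request(middlewares, request):
    """Call each middleware in order and act on the first non-None result."""
    for mw in middlewares:
        result = mw(request)
        if result is None:
            continue  # no interception: move on to the next middleware
        if isinstance(result, Response):
            return "response", result  # the downloader is skipped
        if isinstance(result, Request):
            return "reschedule", result  # goes back to the scheduler
        raise TypeError("expected None, a Response or a Request")
    # Nobody intercepted, so the request reaches the downloader.
    return "response", fake_download(request)


def pass_through(request):
    return None


def serve_from_cache(request):
    return Response(request.url, body="cached")


def redirect_to_https(request):
    if request.url.startswith("http://"):
        return Request("https://" + request.url[len("http://"):])
    return None


outcome_plain = run_process_request(
    [pass_through], Request("https://www.baidu.com/"))
outcome_cached = run_process_request(
    [pass_through, serve_from_cache], Request("https://www.baidu.com/"))
outcome_redirect = run_process_request(
    [redirect_to_https, serve_from_cache], Request("http://www.baidu.com/"))

print(outcome_plain[0], outcome_plain[1].body)
print(outcome_cached[0], outcome_cached[1].body)
print(outcome_redirect[0], outcome_redirect[1].url)
```

In this model the first middleware that returns something other than None decides what happens next, which is why an intercepting middleware makes the flow harder to follow.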

import scrapy
from bs4 import BeautifulSoup


class TestMSpider(scrapy.Spider):
    name = 'test_m'
    allowed_domains = ['baidu.com']
    start_urls = ['https://www.baidu.com/']

    def parse(self, response, **kwargs):
        print('4. Finally reached the spider; the page is parsed here')
        soup = BeautifulSoup(response.text, 'lxml')
        title = soup.find('title').text
        print(title)

Running the spider prints the numbered messages in order: 1 (spider opened), 2 (request), 3 (response), 4 (parse).

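For example
Suppose there are multiple downloader middleware, configured as in the following code.

Key point
The numbers 100 and 200 are each middleware's distance from the engine: the lower the number, the closer to the engine; the higher the number, the closer to the downloader.
Movement through the chain is linear: a request passes through the middleware in increasing order on its way to the downloader, and the response comes back through them in decreasing order.

So the print order is 1, 3, 4, 2.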

DOWNLOADER_MIDDLEWARES = {
    'test_middle_demo.middlewares.TestMiddleDemoDownloaderMiddleware_01': 100,
    'test_middle_demo.middlewares.TestMiddleDemoDownloaderMiddleware_02': 200,
}

class TestMiddleDemoDownloaderMiddleware_01:
    # Priority 100: closest to the engine.

    def process_request(self, request, spider):
        print(1)  # first on the way out
        return None

    def process_response(self, request, response, spider):
        print(2)  # last on the way back
        return response


class TestMiddleDemoDownloaderMiddleware_02:
    # Priority 200: closest to the downloader.

    def process_request(self, request, spider):
        print(3)  # second on the way out
        return None

    def process_response(self, request, response, spider):
        print(4)  # first on the way back
        return response
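
The 1, 3, 4, 2 order can be checked without running a crawl by using a small pure-Python model of the chain. This is a sketch under stated assumptions, not Scrapy's own code: run_chain, the calls list and the string used as a fake response are hypothetical, and the two classes only mimic the middleware above (without the spider argument).

```python
# Simplified model of the middleware chain ordering.
# run_chain and the fake response are stand-ins, not part of Scrapy.

calls = []


class Middleware01:
    def process_request(self, request):
        calls.append(1)
        return None

    def process_response(self, request, response):
        calls.append(2)
        return response


class Middleware02:
    def process_request(self, request):
        calls.append(3)
        return None

    def process_response(self, request, response):
        calls.append(4)
        return response


def run_chain(priorities, request):
    """Requests go out in increasing priority, responses come back reversed."""
    ordered = [mw for mw, _ in sorted(priorities.items(), key=lambda kv: kv[1])]
    for mw in ordered:
        mw.process_request(request)
    response = "response for " + request  # stand-in for the download step
    for mw in reversed(ordered):
        response = mw.process_response(request, response)
    return response


# The dict order is deliberately shuffled: only the numbers matter.
priorities = {Middleware02(): 200, Middleware01(): 100}
result = run_chain(priorities, "https://www.baidu.com/")
print(calls)
```

Swapping the two numbers in priorities reverses the roles of the two classes, which is a quick way to see that the position in the chain comes from the number and not from the order of the entries.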

Copyright notice
This article was written by [Keep a low profile]. Please include the original link when reposting. Thanks.
https://yzsam.com/2022/02/202202142146506424.html
