Scrapy crawler encounters redirection 301/302 problem solution
2022-08-02 04:00:00 【BIG_right】
Stopping redirects in Scrapy
When Scrapy crawls a page, it may encounter a 301/302 redirect. This matters most when crawling a download link: Scrapy follows the redirect and starts downloading the file, and only returns to the crawl once the download has finished. To get the link itself rather than the file, you need to stop the redirect at this point.
The examples below use 302; 301 can be substituted and is handled the same way.
Abort redirect
    yield Request(url, meta={'dont_redirect': True, 'handle_httpstatus_list': [302]}, callback=self.parse)

If the request is yielded from inside parse, dont_filter=True also needs to be added so that the duplicate filter does not drop the request. For details, see Scenario Two below.
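Since 301 and 302 are handled the same way, both codes can be listed together in handle_httpstatus_list. A minimal sketch, assuming a hypothetical helper named no_redirect_meta (it is not part of Scrapy); only the dont_redirect and handle_httpstatus_list keys come from Scrapy itself:

```python
def no_redirect_meta(*statuses):
    """Build a Request.meta dict that stops redirects.

    Hypothetical helper for illustration. With no arguments it
    covers both 301 and 302.
    """
    codes = list(statuses) if statuses else [301, 302]
    return {
        # Tell the redirect middleware not to follow the redirect
        'dont_redirect': True,
        # Let responses with these status codes reach the callback
        'handle_httpstatus_list': codes,
    }
```

In a spider this could be passed as yield Request(url, meta=no_redirect_meta(), callback=self.parse).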
Get the Location value in the response
The redirect target is placed in the Location header of the response. It can be read as follows:

    location = response.headers.get("Location")

Scenario One
If the URLs in start_urls are crawled in order, add the settings directly in the start_requests method:

    def start_requests(self):
        yield Request(url, meta={'dont_redirect': True, 'handle_httpstatus_list': [302]}, callback=self.parse)

Complete example
    import scrapy
    from scrapy import Request


    class xxSpider(scrapy.Spider):
        name = 'xx'
        allowed_domains = ['www.xxx.com']
        start_urls = ['http://www.xxx.com/download']

        def start_requests(self):
            # Abort the 302 redirect directly here
            yield Request(self.start_urls[0], meta={'dont_redirect': True, 'handle_httpstatus_list': [302]}, callback=self.parse)

        def parse(self, response):
            # Get the returned redirect value (bytes, or None if the header is absent)
            location = response.headers.get("Location")

Scenario Two
If the request is yielded from inside parse, add dont_filter=True so that Scrapy's duplicate filter does not drop it, for example when the same URL has already been requested:

    yield Request(url, meta={'dont_redirect': True, 'handle_httpstatus_list': [302]}, callback=self.parse, dont_filter=True)

Complete example
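    import scrapy
    from scrapy import Request


    class xxSpider(scrapy.Spider):
        name = 'xx'
        allowed_domains = ['www.xxx.com']
        start_urls = ['http://www.xxx.com/download']

        def parse(self, response):
            url = "xxxxxxxxxx"
            # dont_filter=True needs to be added here.
            # A separate callback (the name parse_redirect is chosen here) is used so that
            # parse is not re-entered and does not re-yield the same request endlessly.
            yield Request(url, meta={'dont_redirect': True, 'handle_httpstatus_list': [302]}, callback=self.parse_redirect, dont_filter=True)

        def parse_redirect(self, response):
            # Get the returned redirect value
            location = response.headers.get("Location")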
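The Location value read in the callbacks above is returned by Scrapy as bytes, and a server may send it as a relative path. A minimal sketch of turning it into an absolute URL string, assuming a hypothetical helper named resolve_location and header values that decode as UTF-8; it uses only the standard library:

```python
from urllib.parse import urljoin


def resolve_location(request_url, location):
    """Return the redirect target as an absolute URL string, or None.

    request_url: URL of the response that carried the header.
    location: raw header value (bytes or str), or None if absent.
    """
    if location is None:
        return None
    if isinstance(location, bytes):
        location = location.decode('utf-8')
    # urljoin keeps absolute URLs unchanged and resolves relative ones
    return urljoin(request_url, location)
```

Inside a callback this could be called as resolve_location(response.url, response.headers.get("Location")).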