Handling 301/302 redirects in a Scrapy crawler
2022-08-02 04:00:00 【BIG_right】
Stopping redirects in Scrapy
When Scrapy crawls a page, it may receive a 301/302 redirect. This matters most when crawling a download link: Scrapy follows the redirect, starts downloading the file, and only returns to the link you requested once the download has finished. In that case you need to stop the redirect.
The examples below use 302; 301 works the same way, so you can substitute it.
Abort redirect
yield Request(url, meta={'dont_redirect': True, 'handle_httpstatus_list': [302]}, callback=self.parse)
If the request is yielded from inside parse, you also need to add dont_filter=True. For details, see Scenario Two below.
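Since 301 and 302 are handled the same way, both can be listed at once. The sketch below builds the meta dictionary used above; the helper name no_redirect_meta is made up for illustration and is not part of Scrapy, while dont_redirect and handle_httpstatus_list are the same meta keys used in the request above.

```python
from typing import Any, Dict, Iterable


def no_redirect_meta(statuses: Iterable[int] = (301, 302)) -> Dict[str, Any]:
    """Build a Request.meta dict that stops redirects.

    Hypothetical helper: 'dont_redirect' tells Scrapy not to follow the
    redirect, and 'handle_httpstatus_list' lets the listed status codes
    reach the callback instead of being discarded.
    """
    return {
        "dont_redirect": True,
        "handle_httpstatus_list": list(statuses),
    }
```

It would be used as yield Request(url, meta=no_redirect_meta(), callback=self.parse), or with no_redirect_meta([302]) to match the original example exactly.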
Get the Location value in the response
The redirect target is placed in the Location header of the response. Here is how to read it:
location = response.headers.get("Location")
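Scrapy returns header values as bytes, and a Location header may hold a relative path, so the value usually needs decoding and joining with the request URL before it can be used. A minimal sketch using only the standard library; resolve_location is a hypothetical helper name, and decoding as latin-1 is an assumption about the header encoding.

```python
from typing import Optional
from urllib.parse import urljoin


def resolve_location(base_url: str, raw_location: Optional[bytes]) -> Optional[str]:
    """Turn a raw Location header value into an absolute URL.

    base_url is the URL that was requested; raw_location is the bytes
    value read from the response headers (None if the header is absent).
    """
    if raw_location is None:
        return None
    # Assumption: the header bytes are latin-1 encoded.
    location = raw_location.decode("latin-1")
    # urljoin leaves absolute URLs unchanged and resolves relative ones.
    return urljoin(base_url, location)
```

Inside parse this could be called as resolve_location(response.url, response.headers.get("Location")).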
Scenario One
If the URLs to crawl come from start_urls and are requested in order, add the option directly in the start_requests method:
def start_requests(self):
    yield Request(url, meta={'dont_redirect': True, 'handle_httpstatus_list': [302]}, callback=self.parse)
Complete example
import scrapy
from scrapy import Request


class xxSpider(scrapy.Spider):
    name = 'xx'
    allowed_domains = ['www.xxx.com']
    start_urls = ['http://www.xxx.com/download']

    def start_requests(self):
        # Abort the 302 redirect directly here
        yield Request(self.start_urls[0], meta={'dont_redirect': True, 'handle_httpstatus_list': [302]}, callback=self.parse)

    def parse(self, response):
        # Get the redirect target from the response headers
        location = response.headers.get("Location")
Scenario Two
If the request is yielded from inside parse, you need to add dont_filter=True so that the request is not filtered out:
yield Request(url, meta={'dont_redirect': True, 'handle_httpstatus_list': [302]}, callback=self.parse, dont_filter=True)
Complete example
import scrapy
from scrapy import Request


class xxSpider(scrapy.Spider):
    name = 'xx'
    allowed_domains = ['www.xxx.com']
    start_urls = ['http://www.xxx.com/download']

    def parse(self, response):
        url = "xxxxxxxxxx"
        # dont_filter=True needs to be added here
        yield Request(url, meta={'dont_redirect': True, 'handle_httpstatus_list': [302]}, callback=self.parse, dont_filter=True)
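One way to picture why dont_filter=True helps here: a request for a URL that has already been seen is normally dropped as a duplicate, and dont_filter lets it skip that check. The toy class below only illustrates this idea; SeenUrlFilter and should_schedule are invented names, and this is not Scrapy's actual implementation.

```python
from typing import Set


class SeenUrlFilter:
    """Toy stand-in for a duplicate filter (illustration only)."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()

    def should_schedule(self, url: str, dont_filter: bool = False) -> bool:
        # Requests flagged dont_filter skip the duplicate check entirely.
        if dont_filter:
            return True
        # A URL that was already seen is dropped.
        if url in self._seen:
            return False
        self._seen.add(url)
        return True


seen = SeenUrlFilter()
first = seen.should_schedule("http://www.xxx.com/download")
repeat = seen.should_schedule("http://www.xxx.com/download")
forced = seen.should_schedule("http://www.xxx.com/download", dont_filter=True)
```

Here first is accepted, repeat is dropped, and forced goes through despite the URL having been seen before.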