Scrapy returns no data for duplicate URLs: use dont_filter=True
2022-08-03 09:32:00 【The moon give me copy code】
Scenario: The spider runs without errors and the XPath expressions are confirmed to be correct, yet no data comes back.
Possible cause: You are using Scrapy to request a URL that has already been requested.
Scrapy has duplicate filtering built in, which is turned on by default.
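To make the idea concrete, here is a minimal sketch of duplicate filtering with a bypass flag. It is a toy model, not Scrapy's actual scheduler: the class TinyScheduler, its enqueue method, and the fingerprint built only from the method and URL are all assumptions made for illustration. Only the name dont_filter comes from Scrapy itself.

```python
import hashlib


class TinyScheduler:
    """Toy model of duplicate filtering (hypothetical, not Scrapy's code)."""

    def __init__(self):
        self.seen = set()   # fingerprints of requests that passed the filter
        self.queue = []     # URLs accepted for download

    @staticmethod
    def fingerprint(url, method="GET"):
        # Assumption: hash only method + URL; a real crawler normalizes the
        # URL and may include more of the request.
        return hashlib.sha1(f"{method} {url}".encode("utf-8")).hexdigest()

    def enqueue(self, url, dont_filter=False):
        """Return True if the request is accepted, False if it is dropped."""
        if not dont_filter:
            fp = self.fingerprint(url)
            if fp in self.seen:
                return False      # duplicate: silently dropped
            self.seen.add(fp)
        self.queue.append(url)
        return True


scheduler = TinyScheduler()
url = "https://www.baidu.com/"
first = scheduler.enqueue(url)                      # accepted
second = scheduler.enqueue(url)                     # dropped as a duplicate
forced = scheduler.enqueue(url, dont_filter=True)   # bypasses the filter
print(first, second, forced)
```

The point of the sketch is that a dropped request raises no exception, which matches the symptom in the scenario: nothing fails, the callback simply never runs.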
In the following example, parse2 is never called:
import scrapy


class ExampleSpider(scrapy.Spider):
    name = "test"
    # allowed_domains = ["https://www.baidu.com/"]
    start_urls = ["https://www.baidu.com/"]

    def parse(self, response):
        # Requests the same URL that was already fetched as the start URL
        yield scrapy.Request(self.start_urls[0], callback=self.parse2)

    def parse2(self, response):
        print(response.url)

When Scrapy starts, it requests start_urls[0] by default and passes the response to parse. When parse requests start_urls[0] again, Scrapy treats it as a duplicate URL, filters it out by default, and never processes the request. That is why parse2 is not called.
Workaround:
Pass dont_filter=True to scrapy.Request so that Scrapy does not filter out the duplicate request.
import scrapy


class ExampleSpider(scrapy.Spider):
    name = "test"
    # allowed_domains = ["https://www.baidu.com/"]
    start_urls = ["https://www.baidu.com/"]

    def parse(self, response):
        # dont_filter=True tells Scrapy to skip duplicate filtering for this request
        yield scrapy.Request(
            self.start_urls[0],
            callback=self.parse2,
            dont_filter=True,
        )

    def parse2(self, response):
        print(response.url)

With this change, parse2 is called as expected.
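If it is unclear whether duplicate filtering is really what drops a request, one way to check is Scrapy's DUPEFILTER_DEBUG setting. The fragment below is a sketch that assumes the project has a settings.py. The same key could instead go in a spider's custom_settings dict.

```python
# settings.py
# DUPEFILTER_DEBUG is a built-in Scrapy setting. With the default (False),
# only the first filtered duplicate is logged; with True, every filtered
# duplicate request is logged.
DUPEFILTER_DEBUG = True
```

The messages are written at DEBUG level, so the log level must allow DEBUG output for them to appear. The crawl statistics printed at the end of a run should also include a dupefilter/filtered counter when requests were dropped this way.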









