Notes on writing a scrapy-redis project
2022-07-24 11:42:00 【Fan zhidu】
Spider file:
from scrapy.spiders import Rule
from scrapy.linkextractors import LinkExtractor
from scrapy_redis.spiders import RedisCrawlSpider


class MyCrawler(RedisCrawlSpider):
    name = 'mycrawler_redis'
    # Redis key the spider reads its start URLs from
    redis_key = 'mycrawler:start_urls'

    # Crawl rules
    rules = (
        # Follow all links
        Rule(LinkExtractor(), callback='parse_page', follow=True),
    )

    # Key point: allowed_domains must be a real list, so wrap filter() in list()
    def __init__(self, *args, **kwargs):
        # Dynamically define the allowed domains list.
        domain = kwargs.pop('domain', '')
        self.allowed_domains = list(filter(None, domain.split(',')))
        super(MyCrawler, self).__init__(*args, **kwargs)

    def parse_page(self, response):
        return {
            'name': response.css('title::text').extract_first(),
            'url': response.url,
        }

Add the following to the settings.py file:
REDIS_URL = 'redis://root:@127.0.0.1:6379'
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"
DUPEFILTER_DEBUG = True
SCHEDULER = "scrapy_redis.scheduler.Scheduler"
SCHEDULER_PERSIST = True
SCHEDULER_QUEUE_CLASS = 'scrapy_redis.queue.SpiderPriorityQueue'
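
The notes do not show how the spider receives its `domain` argument or its first URLs, so here is a minimal sketch of both steps. `parse_domains` mirrors the `list(filter(None, ...))` expression from `__init__`. `seed_start_urls` is a hypothetical helper (not part of scrapy-redis) that pushes URLs onto the list named by `redis_key`; it assumes `client` is any object with an `lpush(key, value)` method, such as a redis-py `redis.Redis` instance.

```python
def parse_domains(domain):
    """Split a comma-separated domain string into a list, dropping empty entries.

    Same expression as in MyCrawler.__init__: filter(None, ...) removes the
    empty strings produced by '' or by consecutive commas.
    """
    return list(filter(None, domain.split(',')))


def seed_start_urls(client, urls, key='mycrawler:start_urls'):
    """Push each URL onto the Redis list that the spider reads from.

    Hypothetical helper for illustration. `client` is assumed to expose
    lpush(key, value), as redis-py's redis.Redis does. Returns the number
    of URLs pushed.
    """
    pushed = 0
    for url in urls:
        client.lpush(key, url)
        pushed += 1
    return pushed
```

Assuming a Redis server is reachable at the address in `REDIS_URL`, the spider could be started with `scrapy crawl mycrawler_redis -a domain=example.com` and then seeded by passing a redis-py client (for example one created with `redis.Redis.from_url`) to `seed_start_urls`. The spider waits idle until a URL appears under `mycrawler:start_urls`.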