Running Scrapy on a schedule
2022-07-28 06:50:00 【Phantom seven illusions】
1. Timed execution with the schedule library
# -*- coding: utf-8 -*-
import datetime
import subprocess
import time

import schedule


def crawl_work():
    print('-' * 100)
    # Launch the 'it' spider in a child process. The command is passed as a
    # list so that it also works without shell=True.
    subprocess.Popen(['scrapy', 'crawl', 'it'])
    # Alternative (needs multiprocessing.Process and scrapy.cmdline):
    # p = Process(target=cmdline.execute, args=(['scrapy', 'crawl', 'it'],))
    # p.start()
    # p.join()


if __name__ == '__main__':
    print('*' * 10 + ' Start scheduled crawler execution ' + '*' * 10)
    # Run crawl_work once every minute
    schedule.every(1).minutes.do(crawl_work)
    print('The current time is {}'.format(datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')))
    print('*' * 10 + ' Timed crawler starts running ' + '*' * 10)
    while True:
        # Execute any job that is due, then check again in 10 seconds
        schedule.run_pending()
        time.sleep(10)
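If crawl_work starts the spider with subprocess.Popen, the call returns immediately, so a crawl that takes longer than the one-minute interval would overlap with the next one. The following is a minimal sketch of a guard against that, assuming overlapping runs are unwanted. The class name NonOverlappingJob and its methods are hypothetical helpers, not part of schedule or Scrapy.

```python
import subprocess
import sys


class NonOverlappingJob:
    """Start a command, but skip the start while the previous run is alive."""

    def __init__(self, command):
        self.command = command  # argument list, e.g. ['scrapy', 'crawl', 'it']
        self._proc = None       # handle of the most recent child process

    def run(self):
        # poll() returns None while the child process is still running.
        if self._proc is not None and self._proc.poll() is None:
            return False  # previous crawl still in progress: skip this tick
        self._proc = subprocess.Popen(self.command)
        return True

    def wait(self):
        # Block until the most recent child exits; returns its exit code.
        if self._proc is None:
            return None
        return self._proc.wait()


if __name__ == '__main__':
    # Stand-in command so the sketch runs without a Scrapy project.
    job = NonOverlappingJob([sys.executable, '-c', 'import time; time.sleep(0.5)'])
    print(job.run())  # first call starts the child
    print(job.run())  # second call is skipped while the child is alive
    job.wait()
```

With the loop above, it could be registered as schedule.every(1).minutes.do(job.run), where job = NonOverlappingJob(['scrapy', 'crawl', 'it']).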
2. The crude way: loop and sleep
# -*- coding: utf-8 -*-
import logging
import time
from multiprocessing import Process

from scrapy import cmdline

# Configuration: spider name and the pause between runs, in seconds
confs = [
    {
        "spider_name": "it",
        "frequency": 2,
    },
]


def start_spider(spider_name, frequency):
    args = ["scrapy", "crawl", spider_name]
    while True:
        start = time.time()
        # Each crawl runs in its own process: cmdline.execute() exits the
        # process it runs in when the crawl ends.
        p = Process(target=cmdline.execute, args=(args,))
        p.start()
        p.join()
        logging.debug("### use time: %s" % (time.time() - start))
        # Wait before starting the next run
        time.sleep(frequency)


if __name__ == '__main__':
    # One long-lived worker process per configured spider
    for conf in confs:
        process = Process(target=start_spider, args=(conf["spider_name"], conf["frequency"]))
        process.start()
    time.sleep(86400)
3. On Ubuntu or Windows, the system's own scheduler can be used instead
Write a cron.sh script:
#! /bin/sh
export PATH=$PATH:/usr/local/bin
cd /home/zhangchao/CVS/testCron
nohup scrapy crawl example >> example.log 2>&1 &
Setting a total execution time in Scrapy
To stop a scheduled crawl after a fixed time:
Add a configuration item to Scrapy's settings:
CLOSESPIDER_TIMEOUT = 82800  # stop the crawler after 23 hours
Explanation:
CLOSESPIDER_TIMEOUT
Default: 0
An integer number of seconds. If a spider is still running after that many seconds, it is closed automatically with the reason closespider_timeout. If the value is 0 (or not set), spiders are not closed because of a timeout.
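The same limit can also be supplied per run through Scrapy's -s NAME=VALUE command-line option instead of editing the settings file. Below is a small sketch, assuming a helper named crawl_command (hypothetical, introduced here for illustration) that converts a timedelta to seconds and builds the argument list.

```python
from datetime import timedelta


def crawl_command(spider_name, max_runtime):
    """Build a 'scrapy crawl' argument list with a run-time limit.

    max_runtime is a timedelta; CLOSESPIDER_TIMEOUT expects whole seconds.
    """
    timeout_seconds = int(max_runtime.total_seconds())
    return [
        'scrapy', 'crawl', spider_name,
        '-s', 'CLOSESPIDER_TIMEOUT={}'.format(timeout_seconds),
    ]


if __name__ == '__main__':
    # 23 hours is the limit used in the setting above.
    print(crawl_command('example', timedelta(hours=23)))
```

The resulting list can be passed to subprocess.Popen or used as the args for cmdline.execute in the earlier scripts.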