The use of scrapy crawler framework
2022-08-01 09:26:00 【liyan_1013】
First, install the Scrapy package (for example: pip install scrapy).

Open a terminal and create a Scrapy project: scrapy startproject <project_name>

With the project created, we need to modify the settings.py file.
On line 20, set ROBOTSTXT_OBEY = False (the generated file sets it to True, which makes Scrapy obey the site's robots.txt rules).

Uncomment lines 65 to 67 (the ITEM_PIPELINES block, which is commented out by default).

Uncomment lines 40 to 43 (the DEFAULT_REQUEST_HEADERS block, also commented out by default) and add the request headers you need inside it.
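For reference, after the three settings changes above, the relevant parts of settings.py might look roughly like this. This is a sketch, not a copy of any real project: the project name myproject, the pipeline class MyprojectPipeline and the browser User-Agent string are placeholders (assumptions), and scrapy startproject generates names derived from your own project.

```python
# settings.py (excerpt): placeholder names, adapt them to your project

# Line 20 in the default template: do not fetch and obey robots.txt
ROBOTSTXT_OBEY = False

# Lines 40-43: headers added to every request unless the request sets its own.
# The User-Agent below is only an example browser string (placeholder).
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en",
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/103.0.0.0 Safari/537.36"
    ),
}

# Lines 65-67: enable the project's item pipeline (hypothetical class path).
# The number sets the run order, customarily 0-1000; lower values run first.
ITEM_PIPELINES = {
    "myproject.pipelines.MyprojectPipeline": 300,
}
```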


Change into the spiders directory and create a spider file:
scrapy genspider <spider_name> <domain>

We will crawl each course's name and its number of learners.

In the generated spider, set start_urls to the address of the page to be crawled.

(Select the full URL in the browser's address bar, copy it, and paste it into start_urls.)
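scrapy genspider also writes the domain you passed into the spider's allowed_domains list, and Scrapy's offsite filtering drops follow-up requests to hosts outside that list. So when you change start_urls, a quick sanity check is that the new address still belongs to one of those domains. The helper below is a minimal stdlib sketch of that check, not part of Scrapy; the domain and URL are hypothetical.

```python
from typing import Iterable
from urllib.parse import urlsplit


def in_allowed_domains(url: str, allowed_domains: Iterable[str]) -> bool:
    """Return True if the URL's host equals an allowed domain or is a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    for domain in allowed_domains:
        domain = domain.lower()
        # "www.example.com" matches "example.com"; "badexample.com" does not
        if host == domain or host.endswith("." + domain):
            return True
    return False


# Hypothetical values, as genspider might fill them in for "example.com"
allowed_domains = ["example.com"]
start_urls = ["https://www.example.com/courses"]

for url in start_urls:
    print(url, in_allowed_domains(url, allowed_domains))
```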

Find a parent tag that contains all the information to crawl and that is unique on the page. (To check, right-click and choose View Page Source, then press Ctrl+F and confirm that the tag's attribute appears only once.)
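The Ctrl+F check can also be done in code on a saved copy of the page. This sketch uses Python's built-in html.parser (not Scrapy) to count the tags whose attribute has exactly a given value; a count of 1 means the attribute identifies the parent tag uniquely. The HTML fragment and the class name course-list are hypothetical, and an exact-value match is an assumption (a tag with class="course-list clearfix" would not match "course-list").

```python
from html.parser import HTMLParser


class AttrCounter(HTMLParser):
    """Count start tags whose attribute `name` has exactly the value `value`."""

    def __init__(self, name: str, value: str) -> None:
        super().__init__()
        self.name = name
        self.value = value
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (attribute, value) pairs for this tag
        if (self.name, self.value) in attrs:
            self.count += 1


def count_attr(html: str, name: str, value: str) -> int:
    parser = AttrCounter(name, value)
    parser.feed(html)
    parser.close()
    return parser.count


# Hypothetical page fragment
page = """
<div class="header"><ul class="nav"><li>Home</li></ul></div>
<ul class="course-list">
  <li><h3>Python Basics</h3><span class="learners">1024</span></li>
  <li><h3>Web Scraping</h3><span class="learners">512</span></li>
</ul>
"""

print(count_attr(page, "class", "course-list"))  # 1: unique, usable as the parent tag
```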


Inspecting the page shows that the information we want sits in the li elements of a ul.

We use XPath to select all the li tags.

Then iterate over the li elements to locate and extract the content to be crawled.
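In the spider, these two steps normally go in the parse method: one response.xpath(...) call selects the li elements, and relative expressions starting with ./ are then applied to each li. So that it runs without Scrapy installed, the sketch below shows the same select-then-iterate pattern with the limited XPath subset of Python's built-in xml.etree.ElementTree on a small, well-formed hypothetical fragment. The markup, the course-list class and the field names are assumptions; real pages are usually not valid XML, which is one reason Scrapy's own selectors are used there.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment standing in for the downloaded page
page = """
<html><body>
  <ul class="course-list">
    <li><h3>Python Basics</h3><span class="learners">1024</span></li>
    <li><h3>Web Scraping</h3><span class="learners">512</span></li>
  </ul>
</body></html>
"""


def extract_courses(html: str) -> list:
    root = ET.fromstring(html)
    courses = []
    # Select every li under the unique parent ul
    for li in root.findall(".//ul[@class='course-list']/li"):
        # Relative lookups, scoped to the current li only
        name = (li.findtext("./h3") or "").strip()
        learners = (li.findtext("./span[@class='learners']") or "0").strip()
        courses.append({"name": name, "learners": int(learners)})
    return courses


for course in extract_courses(page):
    print(course)
```

In a Scrapy spider the loop would read roughly: for li in response.xpath("//ul[@class='course-list']/li"), then li.xpath("./h3/text()").get() for each field, yielding one dict per course.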





