Using the Scrapy crawler framework
2022-08-01 09:26:00 【liyan_1013】
First, install the scrapy package (for example, with pip install scrapy).

Open a terminal and create a Scrapy project: scrapy startproject <project name>

Once the project is created, modify its settings.py file.
On line 20, set ROBOTSTXT_OBEY = False so that Scrapy does not check robots.txt before requesting pages.

Uncomment lines 65 to 67 (they are commented out by default).

Uncomment lines 40 to 43 (also commented out by default) and add the request header there.
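As a rough sketch, the three settings.py changes above might end up looking like this. It assumes the default project template, where lines 40 to 43 hold DEFAULT_REQUEST_HEADERS and lines 65 to 67 hold ITEM_PIPELINES; line numbers differ between Scrapy versions, so check your own file. The User-Agent value and the myproject name are placeholders, not values from the original post.

```python
# settings.py (excerpt) -- a sketch, assuming the default Scrapy project template

# Do not consult robots.txt before sending requests.
ROBOTSTXT_OBEY = False

# Headers sent with every request. The User-Agent below is a placeholder:
# copy the real value from your browser's developer tools.
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en",
    "User-Agent": "Mozilla/5.0 (placeholder; replace with your browser's value)",
}

# Enable the item pipeline. "myproject" is a hypothetical project name;
# the template fills in the name you passed to scrapy startproject.
ITEM_PIPELINES = {
    "myproject.pipelines.MyprojectPipeline": 300,
}
```

The number 300 is the pipeline's priority; when several pipelines are enabled, lower numbers run first.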


Change into the spiders directory and create a crawler file:
scrapy genspider <crawler file name> <domain name>

We will crawl each course's name and its number of learners.

Change start_urls to the address of the page to crawl.

Select all, then copy and paste.

Find the parent tag that contains all the information to crawl, and make sure it is unique: right-click the page, view its source, and use Ctrl+F to check that the tag's attribute value appears only once.
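The same Ctrl+F check can be sketched in a few lines of Python by counting how often the attribute string occurs in the page source. The sample markup and the class name course-list are made up for illustration.

```python
def count_occurrences(page_source: str, needle: str) -> int:
    """Count non-overlapping occurrences of needle, like Ctrl+F in a browser."""
    return page_source.count(needle)


# Hypothetical page source; in practice, paste or download the real HTML.
SAMPLE = (
    '<div class="nav"><ul class="nav-list"></ul></div>'
    '<ul class="course-list"><li>A</li><li>B</li></ul>'
)

# The parent tag is a safe anchor only if its attribute appears exactly once.
is_unique = count_occurrences(SAMPLE, 'class="course-list"') == 1
```

If the count is greater than one, pick a different attribute or a more specific ancestor before writing the XPath.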


Analysis shows that the information we want is in the li elements of a ul.

We use XPath to select all the li tags.

Loop over the li tags and locate the content to crawl within each one.





