Using the Scrapy crawler framework
2022-08-01 09:26:00 【liyan_1013】
First, install the scrapy package (for example, with pip install scrapy).
In a terminal, create a Scrapy project: scrapy startproject <project name>
With the project created, modify the settings.py file:
Line 20: set ROBOTSTXT_OBEY = False.
Uncomment lines 65 to 67 (they are commented out by default).
Uncomment lines 40 to 43 (also commented out by default) and add the request header inside.
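After these changes, the relevant part of settings.py might look roughly like the sketch below. It assumes the line numbers above refer to the default template generated by scrapy startproject, where lines 40 to 43 hold DEFAULT_REQUEST_HEADERS and lines 65 to 67 hold ITEM_PIPELINES, and that the added request header is a User-Agent. The project name myproject, the pipeline class name, and the User-Agent string are placeholders, not values from the original project.

```python
# settings.py (excerpt): a sketch under the assumptions above, not the author's exact file

# Do not obey robots.txt rules (the generated template sets this to True)
ROBOTSTXT_OBEY = False

# Default request headers, uncommented, with a User-Agent added.
# The User-Agent value is a placeholder; copy the one your own browser sends.
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
}

# Item pipelines, uncommented. "myproject" is a hypothetical project name.
ITEM_PIPELINES = {
    "myproject.pipelines.MyprojectPipeline": 300,
}
```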
Change into the spiders directory and create a spider file:
scrapy genspider <spider file name> <domain name>
We will crawl the course names and the number of learners.
Set start_urls to the address to be crawled.
Select all, then copy and paste.
Find the parent tag that contains all the information to crawl, and make sure it is unique (right-click, view the page source, and use Ctrl+F to check that the tag's attribute occurs only once).
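The Ctrl+F check can also be approximated in code by counting how often the attribute text appears in the page source. The snippet below is a small sketch on a made-up HTML string; the class name course-list and the helper attribute_is_unique are hypothetical and only serve as illustration.

```python
# Hypothetical page source; in practice, paste the real source of the target page.
page_source = """
<div class="header"><ul class="nav"><li>Home</li></ul></div>
<ul class="course-list">
  <li><h3>Python basics</h3><span>1200</span></li>
  <li><h3>Scrapy intro</h3><span>860</span></li>
</ul>
"""


def attribute_is_unique(source: str, attribute: str) -> bool:
    """Return True if the attribute text occurs exactly once in the source."""
    return source.count(attribute) == 1


# Same idea as searching with Ctrl+F and seeing exactly one match
is_unique = attribute_is_unique(page_source, 'class="course-list"')
print(is_unique)
```

A plain substring count is only a quick check; it does not parse the HTML, so differently quoted or reordered attributes would not be matched.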
Analysis shows that the information we want to crawl is in the li elements of a ul.
We use XPath to get all the li tags.
Then we iterate over the li elements to locate the content to be crawled.