Using the Scrapy crawler framework
2022-08-01 09:26:00 【liyan_1013】
First, install the Scrapy package (for example, with pip install scrapy).
Then open a terminal and create a Scrapy project: scrapy startproject <project name>
Once the project is created, modify its settings.py file. The line numbers below refer to the generated file and may differ between Scrapy versions.
Line 20: set ROBOTSTXT_OBEY = False.
Lines 65 to 67: uncomment them (they are commented out by default).
Lines 40 to 43: uncomment them (also commented out by default) and add your request headers inside.
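As a rough illustration of what settings.py might contain after these edits, here is a minimal sketch. It assumes that lines 40 to 43 of the generated template hold the DEFAULT_REQUEST_HEADERS block and lines 65 to 67 hold the ITEM_PIPELINES block, which matches common Scrapy templates but is not confirmed by the original post. The project name myproject and the User-Agent value are placeholders.

```python
# settings.py (excerpt): a sketch, not the author's exact file.
# "myproject" and the User-Agent string are placeholder assumptions.

# Do not consult robots.txt before crawling (the generated template sets True).
ROBOTSTXT_OBEY = False

# Headers sent with every request. Adding a browser-like User-Agent is a
# common choice; the value below is only an example.
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
}

# Enable the project's item pipeline. The number is its order value:
# pipelines with lower values run first.
ITEM_PIPELINES = {
    "myproject.pipelines.MyprojectPipeline": 300,
}
```

Replace myproject with the name passed to scrapy startproject; the pipeline class itself lives in the generated pipelines.py.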
Change into the spiders directory and create a crawler (spider) file:
scrapy genspider <crawler file name> <domain name>
In this example we crawl each course's name and its number of learners.
Change start_urls to the address of the page to be crawled.
Select the whole address, copy it, and paste it in.
Find a parent tag that contains all the information to crawl, and make sure it is unique on the page (right-click, view the page source, and use Ctrl+F to check that the tag's attribute appears only once).
Analysis shows that the information we want is in the li elements of a ul.
We use XPath to get all the li tags.
Then iterate over the li tags and locate the content to crawl in each one.