Scrapy learning
2022-07-03 06:14:00 【Black~boy】
Scrapy introduction
1. A brief introduction to Scrapy
Scrapy is an asynchronous processing framework built on Twisted and a crawler framework implemented in pure Python. It lets you scrape data quickly with a small amount of code.
Scrapy is a fast, high-level screen-scraping and web-crawling framework for Python, used to crawl websites and extract structured data from their pages. It has a wide range of uses, including data mining, monitoring, and automated testing. Anyone can easily modify it to suit their needs. It also provides base classes for many types of spiders, such as BaseSpider and sitemap spiders, and the latest version adds support for Web 2.0 crawling.
2. Scrapy architecture and functions
2.1 Architecture diagram

2.2 Function of each component
| Name | Function |
|---|---|
| Scrapy Engine | The core of the framework. It handles communication among the Spiders, Item Pipeline, Downloader, and Scheduler, including passing signals and data. |
| Spiders | Process all Responses sent by the engine, extract data from them, extract URLs, and submit these to the engine. |
| Scheduler | Receives the Requests sent by the engine. |
| Downloader | Downloads all Requests sent by the Scrapy Engine and returns the resulting Responses to the engine, which hands them to the Spider for processing. |
| Item Pipeline | Handles the data sent by the engine and post-processes it (data analysis, data storage, etc.). |
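
The division of labour in the table can be sketched as a toy model in plain Python. This is only an illustration of the data flow, not Scrapy's actual implementation or API: the names `FAKE_WEB`, `Scheduler`, `downloader`, `spider_parse`, `pipeline`, and `engine` are all hypothetical, pages come from an in-memory dictionary instead of the network, and the toy scheduler is assumed to hand queued requests back to the engine on demand.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> page body. Not real data.
FAKE_WEB = {
    "https://example.com/": "links: https://example.com/a",
    "https://example.com/a": "links:",
}


class Scheduler:
    """Toy scheduler: queues the requests (plain URL strings) it receives."""

    def __init__(self):
        self.queue = deque()

    def enqueue(self, url):
        self.queue.append(url)

    def next_request(self):
        # Assumption of this sketch: the engine asks for the next request.
        return self.queue.popleft() if self.queue else None


def downloader(url):
    """Toy downloader: 'fetches' a page and returns a (url, body) response."""
    return url, FAKE_WEB.get(url, "")


def spider_parse(response):
    """Toy spider: extracts one item and the follow-up URLs from a response."""
    url, body = response
    item = {"url": url, "length": len(body)}
    links = body.split()[1:]  # drop the leading "links:" marker
    return [item], links


def pipeline(item, store):
    """Toy item pipeline: post-processing here is just storing the item."""
    store.append(item)


def engine(start_url):
    """Toy engine: passes data between the other components."""
    scheduler, store = Scheduler(), []
    scheduler.enqueue(start_url)
    url = scheduler.next_request()
    while url is not None:
        response = downloader(url)             # engine -> downloader
        items, links = spider_parse(response)  # engine -> spider
        for item in items:
            pipeline(item, store)              # engine -> item pipeline
        for link in links:
            scheduler.enqueue(link)            # engine -> scheduler
        url = scheduler.next_request()
    return store


crawled = engine("https://example.com/")
```

Every hand-off goes through `engine`, which mirrors the table's description of the engine as the component responsible for communication among the others.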
3. Installing Scrapy
3.1 Installation commands
On Windows:
pip install Scrapy

Check whether the installation succeeded by creating a project:
scrapy startproject <project_name>


If the project is created, Scrapy prints "You can start your first spider with:" followed by two steps:
Step 1: cd myspider
Step 2: scrapy genspider example example.com (example is the spider name; example.com is the website you want to crawl)

Here xxxx stands in for the website.
After writing the code, run the spider:
scrapy crawl <spider_name>