Scrapy learning
2022-07-03 06:14:00 【Black~boy】
Scrapy introduction
1. Scrapy brief introduction
Scrapy is an asynchronous processing framework built on Twisted, and a crawler framework implemented in pure Python. It lets you scrape data quickly with a small amount of code.
Scrapy is a fast, high-level screen-scraping and web-crawling framework for Python, used to crawl websites and extract structured data from their pages. It has a wide range of uses, including data mining, monitoring, and automated testing. Anyone can easily modify it to suit their needs. It also provides base classes for several types of spiders, such as BaseSpider (called Spider in current releases) and sitemap spiders, and the latest version adds support for crawling Web 2.0 sites.
2. Scrapy framework and functions
2.1 Architecture diagram
2.2 Function of each part
Name | Function |
---|---|
Scrapy Engine | The core of the framework. It handles communication among the Spiders, Item Pipeline, Downloader, and Scheduler, passing signals and data between them. |
Spiders | Process all Responses sent by the engine, extract data and follow-up URLs from them, and submit these back to the engine. |
Scheduler | Receives the Requests sent by the engine, queues them, and hands them back to the engine when it asks for the next one. |
Downloader | Downloads all Requests sent by the Scrapy Engine and returns the resulting Responses to the engine, which passes them to the Spider for processing. |
Item Pipeline | Receives the data (items) passed on by the engine and post-processes it (data analysis, data storage, etc.). |
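To make the division of labor in the table more concrete, here is a toy, in-memory sketch of the same data flow in plain Python. It does not use Scrapy's real API: the names `FAKE_WEB`, `Scheduler`, `download`, `parse`, `CollectPipeline`, and `run_engine` are all hypothetical, and the "web" is just a dictionary, so nothing is actually downloaded.

```python
from collections import deque

# Toy model of the data flow in the table above.
# NOT Scrapy's real API: every name here is made up for illustration.

FAKE_WEB = {
    "http://example.com/": {"title": "Home", "links": ["http://example.com/a"]},
    "http://example.com/a": {"title": "Page A", "links": []},
}


class Scheduler:
    """Queues requests from the engine and hands them back one at a time."""

    def __init__(self):
        self._queue = deque()
        self._seen = set()

    def enqueue(self, url):
        if url not in self._seen:  # skip URLs that were already queued
            self._seen.add(url)
            self._queue.append(url)

    def next_request(self):
        return self._queue.popleft() if self._queue else None


def download(url):
    """Downloader: turns a request into a response (here, a dict lookup)."""
    return {"url": url, **FAKE_WEB[url]}


def parse(response):
    """Spider: extracts one item and the follow-up URLs from a response."""
    item = {"url": response["url"], "title": response["title"]}
    return item, response["links"]


class CollectPipeline:
    """Item pipeline: post-processes each item and stores it in a list."""

    def __init__(self):
        self.items = []

    def process_item(self, item):
        item["title"] = item["title"].upper()
        self.items.append(item)


def run_engine(start_url):
    """Engine: moves requests, responses, and items between the parts."""
    scheduler, pipeline = Scheduler(), CollectPipeline()
    scheduler.enqueue(start_url)
    while (url := scheduler.next_request()) is not None:
        response = download(url)
        item, new_urls = parse(response)
        pipeline.process_item(item)
        for new_url in new_urls:
            scheduler.enqueue(new_url)
    return pipeline.items


items = run_engine("http://example.com/")
print(items)
```

The point of the sketch is only that the engine owns the loop, while the other parts never talk to each other directly. Real Scrapy runs this flow asynchronously on Twisted rather than in a simple `while` loop.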
3. Scrapy installation
3.1 Installation command
On Windows:
pip install Scrapy
Check whether the installation succeeded by creating a project:
scrapy startproject myspider (myspider is the project name)
You can start your first spider with:
Step 1: cd myspider
Step 2: scrapy genspider example example.com (example is the spider name; example.com is the website you want to crawl)
The actual website is written as xxxx here.
After writing the code, run the spider:
scrapy crawl <spider name>