Scrapy learning
2022-07-03 06:14:00 【Black~boy】
Scrapy introduction
1. A brief introduction to Scrapy
Scrapy is an asynchronous processing framework built on Twisted and a crawler framework implemented in pure Python. With a small amount of code you can quickly scrape data.
Scrapy is a fast, high-level screen-scraping and web-crawling framework for Python, used to crawl websites and extract structured data from their pages. It has a wide range of uses, such as data mining, monitoring, and automated testing, and anyone can easily adapt it to their own needs. It also provides base classes for many kinds of spiders, such as BaseSpider and the sitemap spider, and recent versions add support for crawling Web 2.0 sites.
2. Scrapy architecture and components
2.1 Architecture diagram
[Figure: Scrapy architecture diagram]
2.2 Role of each component
Component | Role |
---|---|
Scrapy Engine | The core of the framework. Handles communication, signals, and data transfer between the Spiders, Item Pipeline, Downloader, and Scheduler. |
Spiders | Process every Response the engine sends them, extract data and new URLs from it, and submit both back to the engine. |
Scheduler | Receives the Requests sent by the engine, queues them, and hands them back when the engine asks for the next one. |
Downloader | Downloads every Request sent by the Scrapy Engine and returns the resulting Responses to the engine, which passes them to the Spiders for processing. |
Item Pipeline | Receives the items the engine passes along from the Spiders and post-processes them (data analysis, storage, and so on). |
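The table describes the data flow in prose. The following is a synchronous toy model of that loop in plain Python, not Scrapy's actual code: Request, Response, FAKE_SITE, Scheduler, download, spider_parse, Pipeline and engine are all hypothetical names, and a dictionary stands in for the network. It only shows which component hands what to whom.

```python
from collections import deque
from dataclasses import dataclass


# Hypothetical stand-ins for Scrapy's Request and Response classes.
@dataclass
class Request:
    url: str


@dataclass
class Response:
    url: str
    body: str
    links: list


# Fake website (URL -> page text and outgoing links) used instead of the network.
FAKE_SITE = {
    "https://example.com/": ("home", ["https://example.com/a", "https://example.com/b"]),
    "https://example.com/a": ("page a", ["https://example.com/"]),
    "https://example.com/b": ("page b", []),
}


class Scheduler:
    """Receives requests from the engine, queues them, hands them back on demand."""

    def __init__(self):
        self.queue = deque()
        self.seen = set()  # skip URLs that were already queued once

    def enqueue(self, request):
        if request.url not in self.seen:
            self.seen.add(request.url)
            self.queue.append(request)

    def next_request(self):
        return self.queue.popleft() if self.queue else None


def download(request):
    """Downloader stand-in: looks the page up in FAKE_SITE instead of fetching it."""
    body, links = FAKE_SITE[request.url]
    return Response(request.url, body, links)


def spider_parse(response):
    """Spider stand-in: yields extracted items (dicts) and follow-up Requests."""
    yield {"url": response.url, "text": response.body}
    for link in response.links:
        yield Request(link)


class Pipeline:
    """Item pipeline stand-in: post-processes each item, then stores it."""

    def __init__(self):
        self.stored = []

    def process_item(self, item):
        item["text"] = item["text"].upper()  # example post-processing step
        self.stored.append(item)
        return item


def engine(start_urls):
    """Engine stand-in: routes requests, responses and items between the parts."""
    scheduler, pipeline = Scheduler(), Pipeline()
    for url in start_urls:
        scheduler.enqueue(Request(url))
    while (request := scheduler.next_request()) is not None:
        response = download(request)           # Scheduler -> Engine -> Downloader
        for result in spider_parse(response):  # Downloader -> Engine -> Spider
            if isinstance(result, Request):
                scheduler.enqueue(result)       # new URL -> Engine -> Scheduler
            else:
                pipeline.process_item(result)   # item -> Engine -> Item Pipeline
    return pipeline.stored


if __name__ == "__main__":
    for item in engine(["https://example.com/"]):
        print(item)
```

Starting from https://example.com/, the loop visits each of the three fake pages exactly once (the link from page a back to the home page is skipped as a duplicate), and the pipeline stores three upper-cased items. The real engine is asynchronous, built on Twisted, and keeps many requests in flight at once; this loop handles one request at a time only to keep the flow readable.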
3. Installing Scrapy
3.1 Installation command
On Windows:
pip install Scrapy
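pip installs into whichever Python environment that pip belongs to. A minimal sketch for confirming from inside a given Python interpreter that the package is present, using only the standard library's importlib.metadata (Python 3.8+); the function name installed_scrapy_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_scrapy_version():
    """Return the installed Scrapy version string, or None if Scrapy is not installed."""
    try:
        return version("Scrapy")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_scrapy_version()
    print(f"Scrapy {v} is installed" if v else "Scrapy is not installed in this environment")
```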
Check whether the installation succeeded by creating a project (scrapy version, which prints the installed version, also works):
scrapy startproject <project name>
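If it works, Scrapy prints the next steps (shown here for a project named myspider):
You can start your first spider with:
Step 1: cd myspider
Step 2: scrapy genspider example example.com   (example is the spider name, example.com is the website you want to crawl)
Replace example.com with the domain of the site you actually want to crawl.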
After writing the spider code, run the crawler from inside the project directory:
scrapy crawl <spider name>
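When scrapy crawl runs, every item a spider yields is passed to the item pipelines enabled in the project's settings.py. A minimal sketch of such a pipeline, assuming items are dicts (or dict-like Items): it is a plain class implementing the open_spider, process_item and close_spider hooks described in Scrapy's item-pipeline documentation, so it needs no import from Scrapy itself. The class name JsonLinesPipeline and the output file items.jl are hypothetical.

```python
import json


class JsonLinesPipeline:
    """Writes every item it receives to a JSON Lines file (one JSON object per line).

    Class name and default file name are hypothetical, for illustration only.
    """

    def __init__(self, path="items.jl"):
        self.path = path
        self.file = None

    def open_spider(self, spider):
        # Called once when the spider starts: open the output file.
        self.file = open(self.path, "w", encoding="utf-8")

    def close_spider(self, spider):
        # Called once when the spider finishes: flush and close the file.
        self.file.close()

    def process_item(self, item, spider):
        # Called for every item; returning it lets later pipelines see it too.
        self.file.write(json.dumps(dict(item), ensure_ascii=False) + "\n")
        return item
```

To enable it, assuming the class is placed in pipelines.py of the myspider project, add ITEM_PIPELINES = {"myspider.pipelines.JsonLinesPipeline": 300} to settings.py; the number (0 to 1000) sets the order in which pipelines run, lower values first. For this particular job Scrapy's built-in feed exports (scrapy crawl example -o items.jl) work without custom code; a pipeline is the place for cleaning, validation, or writing to a database.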