Using Scrapy to crawl data and save it to MySQL without duplicates
2022-08-02 09:21:00 【51CTO】
1. Environment setup
1. Install PHP, MySQL, and phpMyAdmin using XAMPP
2. Install python3 and pip
3. Install pymysql
4. (Windows steps omitted.) I'm on a Mac, so I installed brew and used brew to install scrapy
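2. Overall workflow
1. Create the database and table that will hold the data
2. Write the crawler's target URL and send the network request
3. Process the data the crawl returns to extract the specific fields
4. Save those fields to the database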
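2.1 Create the database
First create a database called scrapy, then create a table called article. Here we add a unique index on body so that the same data cannot be inserted twice.
[Screenshot: the finished article table]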
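The database and table could be created with statements along these lines. This is only a sketch: the post names the scrapy database, the article table, and the unique index on body, but the other columns, the VARCHAR(255) type (chosen so the whole column fits in a MySQL index), and the index name uniq_body are assumptions.

```python
# Hypothetical schema: only the database name, the table name and the unique
# index on `body` come from the post; every other detail is an assumption.
CREATE_DATABASE_SQL = (
    "CREATE DATABASE IF NOT EXISTS scrapy DEFAULT CHARACTER SET utf8mb4"
)

CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS article (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    author VARCHAR(255) NOT NULL DEFAULT '',
    body VARCHAR(255) NOT NULL,
    PRIMARY KEY (id),
    UNIQUE KEY uniq_body (body)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
"""


def create_schema(connection):
    """Run the statements on an open DB-API connection, e.g. from pymysql.connect()."""
    with connection.cursor() as cursor:
        cursor.execute(CREATE_DATABASE_SQL)
        cursor.execute("USE scrapy")
        cursor.execute(CREATE_TABLE_SQL)
    connection.commit()
```

The same statements can also be pasted into phpMyAdmin's SQL tab instead of being run from Python.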
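2.2 Let's first look at the structure of the whole crawler project
quotes_spider.py is the core. It handles the network requests and the returned content, then hands the cleaned-up content to pipelines, which does the actual processing and saves it to the database. This way, saving does not slow down the crawl.
For the other files, see the annotated figure.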
2.2 Write the crawler's target URL and send the network request
start_requests is where you list the specific URLs to crawl.
parse is the core: it processes the returned data, emits it in the form of an item, and then defines the next content to crawl.
2.3 items
2.4 pipelines
2.5 Configuration
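For Scrapy to call a pipeline, it has to be enabled in the project's settings.py. A sketch of the relevant lines, assuming the project package is named tutorial and the pipeline class is called MysqlPipeline (both names are hypothetical):

```python
# settings.py (fragment)
BOT_NAME = "tutorial"

# Maps the pipeline's import path to its order; lower numbers run first,
# and values are conventionally kept in the 0-1000 range.
ITEM_PIPELINES = {
    "tutorial.pipelines.MysqlPipeline": 300,
}
```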