Baidu Encyclopedia data crawling and content classification and recognition
2022-07-06 10:26:00 【CHQIUU】
Preface
I have recently been studying knowledge graphs and needed to crawl some structured data. This post shows how to crawl Baidu Encyclopedia data and gives the code for extracting the useful parts.
I. Analyze the page structure
The page can be divided into 5 regions, as shown in the illustration below (the page structure of the polypropylene entry).
https://baike.baidu.com/wikitag/taglist?tagId=76613
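Before extracting anything from the regions, the crawler needs the list of entry pages to visit. The following is a minimal sketch using only the standard library, not the author's actual code. It assumes that entry pages are linked through `href` values beginning with `/item/`, and that the links are present in the static HTML (if the tag list is rendered by JavaScript, the downloaded HTML will not contain them and a different data source is needed). The names `EntryLinkParser`, `extract_entry_links`, and `fetch_html` are hypothetical helpers introduced for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen

BASE_URL = "https://baike.baidu.com"
TAG_LIST_URL = "https://baike.baidu.com/wikitag/taglist?tagId=76613"  # URL from the post


class EntryLinkParser(HTMLParser):
    """Collect (title, absolute URL) pairs for anchors whose href starts with /item/."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (title, url) pairs
        self._href = None   # href of the entry link currently open, if any
        self._text = []     # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            # Assumption: entry pages live under /item/
            if href.startswith("/item/"):
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            self.links.append((title, urljoin(BASE_URL, self._href)))
            self._href = None


def extract_entry_links(html):
    """Return all entry links found in an HTML string."""
    parser = EntryLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


def fetch_html(url, timeout=10):
    """Download a page as text. The User-Agent value is an assumption;
    some sites reject the default urllib agent."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling `extract_entry_links(fetch_html(TAG_LIST_URL))` would then yield the candidate entry pages, provided the assumptions above hold for the live site. Keeping the parsing separate from the download makes it possible to check the extraction logic on a saved HTML file without sending requests.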
II. Usage steps
1. Import libraries
The code is as follows (example):
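import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
# Silence library warnings to keep the output readable
warnings.filterwarnings('ignore')
import ssl
# Disable HTTPS certificate verification for the whole process so that
# downloads do not fail on certificate errors (insecure; avoid in production)
ssl._create_default_https_context = ssl._create_unverified_context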
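Assigning to `ssl._create_default_https_context` changes the behavior of every HTTPS request made through `urllib` in the process, and it relies on a private attribute. Where only one host has certificate problems, a narrower option is to build an unverified context and pass it to that single request. The sketch below illustrates this alternative; it is not what the post's code does, and `make_unverified_context` and `fetch_bytes` are hypothetical helper names.

```python
import ssl
from urllib.request import urlopen


def make_unverified_context():
    """Build an SSL context that skips certificate checks.
    Insecure: use only for hosts you already trust."""
    context = ssl.create_default_context()
    # check_hostname must be disabled before verify_mode can be relaxed
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    return context


def fetch_bytes(url, context=None, timeout=10):
    """Download url and return the raw bytes.
    Pass a context only for hosts whose certificates fail verification;
    with context=None the default, verified behavior is used."""
    with urlopen(url, context=context, timeout=timeout) as response:
        return response.read()
```

With this approach, all other requests keep normal certificate verification, and the insecure exception is visible at the call site.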
2. Read in the data
The code is as follows (example):
data = pd.read_csv(
'https://labfile.oss.aliyuncs.com/courses/1283/adult.data.csv')
print(data.head())
The data here is requested over the network from a URL.