Scraping the novel Romance of the Three Kingdoms (三国演义)
2022-08-02 08:35:00 【赵颂@】
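import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup
from fake_useragent import UserAgent

# Scrape every chapter title and chapter body of Romance of the Three Kingdoms
# from https://www.shicimingju.com/book/sanguoyanyi.html
if __name__ == '__main__':
    headers = {
        "User-Agent": UserAgent().chrome
    }
    get_url = 'https://www.shicimingju.com/book/sanguoyanyi.html'
    # Send the request and keep the raw response bytes. This matches
    # .text.encode('ISO-8859-1') when requests falls back to ISO-8859-1,
    # but does not raise if the server declares a different charset.
    page_text = requests.get(url=get_url, headers=headers).content
    # Parse the chapter titles and chapter content, starting from the index page
    # 1. Instantiate a BeautifulSoup object and load the HTML into it
    soup = BeautifulSoup(page_text, 'lxml')
    # print(soup)
    # 2. Parse each chapter title and the URL of its detail page
    list_data = soup.select('.book-mulu > ul > li')
    # The with block closes the output file even if a request fails midway
    with open('./sanguo.text', 'w', encoding='utf-8') as fp:
        for i in list_data:
            title = i.a.text
            # urljoin resolves the href against the index URL without doubling slashes
            detail_url = urljoin(get_url, i.a['href'])
            # Request the detail page
            detail_text = requests.get(url=detail_url, headers=headers).content
            detail_soup = BeautifulSoup(detail_text, 'lxml')
            # Extract the chapter body
            content = detail_soup.find('div', class_='chapter_content').text
            # Persist to disk
            fp.write(title + ":" + content + "\n")
            print(title, 'downloaded')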
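Two details in the script are easy to get wrong: recovering the page's real bytes when the response text was decoded with the wrong charset, and building each chapter URL from its href. The sketch below illustrates both using only the standard library, so it runs without network access. It assumes the pages are UTF-8 and that requests fell back to ISO-8859-1 because the server sent no charset. The helper names `recover_bytes` and `build_detail_url` and the sample href `/book/sanguoyanyi/1.html` are hypothetical and chosen only for illustration.

```python
from urllib.parse import urljoin


def recover_bytes(mis_decoded: str) -> bytes:
    """Undo a wrong ISO-8859-1 decode by encoding with the same codec.

    ISO-8859-1 maps each of the 256 byte values to exactly one character,
    so the decode/encode round trip returns the original bytes unchanged.
    """
    return mis_decoded.encode("ISO-8859-1")


def build_detail_url(index_url: str, href: str) -> str:
    """Resolve a chapter href against the index page URL."""
    return urljoin(index_url, href)


# Simulate a UTF-8 page that was decoded as ISO-8859-1 (assumed fallback).
original = "第一回"
raw = original.encode("utf-8")
garbled = raw.decode("ISO-8859-1")  # mojibake, not the original text
restored = recover_bytes(garbled).decode("utf-8")
print(restored == original)

# Hypothetical href; the real site may use a different path layout.
index_url = "https://www.shicimingju.com/book/sanguoyanyi.html"
print(build_detail_url(index_url, "/book/sanguoyanyi/1.html"))
```

Reading the raw bytes directly from the response avoids the round trip altogether, and `urljoin` handles both root-relative and page-relative hrefs, which plain string concatenation does not.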