Chapter contents of the Romance of the Three Kingdoms
2022-07-29 08:00:00 【Zhao [email protected]】
import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup
from fake_useragent import UserAgent

# Crawl the titles and contents of all chapters of Romance of the Three Kingdoms:
# https://www.shicimingju.com/book/sanguoyanyi.html
if __name__ == '__main__':
    headers = {
        "User-Agent": UserAgent().chrome
    }
    get_url = 'https://www.shicimingju.com/book/sanguoyanyi.html'
    # Send the request and take the raw bytes (.content), so that
    # BeautifulSoup detects the page encoding itself
    page_text = requests.get(url=get_url, headers=headers).content
    # Parse the chapter titles and detail-page URLs from the index page
    # 1. Instantiate a BeautifulSoup object and load the HTML into it
    soup = BeautifulSoup(page_text, 'lxml')
    # print(soup)
    # 2. Select the list items that hold each chapter title and detail-page URL
    list_data = soup.select('.book-mulu > ul > li')
    # The with block closes the file even if a request fails part-way through
    with open('./sanguo.text', 'w', encoding='utf-8') as fp:
        for i in list_data:
            title = i.a.text
            # urljoin handles an href with or without a leading slash
            detail_url = urljoin('https://www.shicimingju.com/', i.a['href'])
            # Send a request to the detail-page URL
            detail_text = requests.get(url=detail_url, headers=headers).content
            detail_soup = BeautifulSoup(detail_text, 'lxml')
            # Get the chapter content
            content = detail_soup.find('div', class_='chapter_content').text
            # Persistent storage
            fp.write(title + ":" + content + "\n")
            print(title, 'Download complete')
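The response-decoding step is the least obvious part of the script. The following is a minimal offline sketch of what can go wrong there, assuming the site serves UTF-8 bytes without declaring a charset (in that case requests falls back to ISO-8859-1 when it builds `.text`). The sample title and the helper name `recover_utf8` are illustrative and not part of the original script.

```python
# Offline sketch: no network access; these bytes stand in for a response body.
sample_title = "第一回 宴桃园豪杰三结义"
raw = sample_title.encode("utf-8")  # assumed: the server sends UTF-8 bytes

# What .text would contain if the bytes were decoded as ISO-8859-1: mojibake.
wrong_text = raw.decode("ISO-8859-1")


def recover_utf8(text: str) -> str:
    """Undo an ISO-8859-1 mis-decode (hypothetical helper, for illustration)."""
    # ISO-8859-1 maps each byte 0-255 to exactly one code point, so encoding
    # back restores the original bytes, which can then be decoded as UTF-8.
    return text.encode("ISO-8859-1").decode("utf-8")


print(wrong_text == sample_title)                 # the mis-decoded text differs
print(recover_utf8(wrong_text) == sample_title)   # the round trip restores it
```

Handing the raw bytes (`response.content`) to BeautifulSoup should give the same result without the round trip, because the parser detects the encoding from the bytes itself.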
Copyright notice
This article was written by [Zhao [email protected]]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/210/202207290520357516.html