Chapter contents of Romance of the Three Kingdoms
2022-07-29 08:00:00 【Zhao [email protected]】
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup
from fake_useragent import UserAgent

# Crawl the titles and contents of all chapters of Romance of the Three Kingdoms:
# https://www.shicimingju.com/book/sanguoyanyi.html
if __name__ == '__main__':
    headers = {
        "User-Agent": UserAgent().chrome
    }
    get_url = 'https://www.shicimingju.com/book/sanguoyanyi.html'
    # Send the request and take the raw response bytes (.content), so that
    # BeautifulSoup detects the page encoding itself; .text can come back
    # garbled when requests guesses the wrong charset.
    page_text = requests.get(url=get_url, headers=headers).content
    # Parse the chapter titles and detail-page URLs from the index page.
    # 1. Instantiate a BeautifulSoup object and load the HTML into it
    soup = BeautifulSoup(page_text, 'lxml')
    # print(soup)
    # 2. Select one <li> per chapter; each holds the title and the detail-page link
    list_data = soup.select('.book-mulu > ul > li')
    # The with block closes the output file even if a request fails midway
    with open('./sanguo.text', 'w', encoding='utf-8') as fp:
        for i in list_data:
            title = i.a.text
            # urljoin avoids a doubled slash when href already starts with '/'
            detail_url = urljoin('https://www.shicimingju.com/', i.a['href'])
            # Request the detail page
            detail_text = requests.get(url=detail_url, headers=headers).content
            detail_soup = BeautifulSoup(detail_text, 'lxml')
            # Extract the chapter content
            content = detail_soup.find('div', class_='chapter_content').text
            # Persist the chapter to the file
            fp.write(title + ":" + content + "\n")
            print(title, 'download complete')
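The requests above hand raw bytes to BeautifulSoup rather than a decoded string. The sketch below illustrates why, assuming the server sends UTF-8 pages without declaring a charset, in which case requests decodes `.text` as ISO-8859-1. The sample string and variable names are illustrative and not taken from the site.

```python
# Illustrative only: simulates a UTF-8 page body decoded with the wrong codec.
raw_bytes = "三国演义".encode("utf-8")      # bytes as sent by the server (assumed UTF-8)
garbled = raw_bytes.decode("ISO-8859-1")    # result of a fallback ISO-8859-1 decode
recovered = garbled.encode("ISO-8859-1")    # re-encoding restores the original bytes

print(garbled == "三国演义")                 # False: the text is mojibake
print(recovered == raw_bytes)                # True: the round trip is lossless
print(recovered.decode("utf-8"))             # 三国演义
```

ISO-8859-1 maps every byte to exactly one character, so the round trip loses nothing. The `content` attribute of a requests response yields the same raw bytes directly, without the intermediate decode.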