Beginner crawler - Biquge novel crawler
2022-07-02 04:36:00 【weixin_ forty-three million four hundred and forty-six thousand】
import requests
from lxml import etree

# Index page of the novel, e.g. https://www.xbiquge.la/20/20671/
base_url = input("Please enter the novel URL: ")
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:101.0) Gecko/20100101 Firefox/101.0"
}
response = requests.get(base_url, headers=headers)
r = response.content.decode()
r1 = etree.HTML(r)  # Parse the HTML string into an element tree
r2 = r1.xpath('//*[@id="info"]/h1/text()')  # Novel title
r3 = r1.xpath('//*[@id="list"]/dl/dd/a/text()')  # Title of each chapter
r4 = r1.xpath('//*[@id="list"]/dl/dd/a/@href')  # Link to each chapter
r5 = r1.xpath('//*[@id="info"]/p[1]/text()')  # Author line
r6 = r1.xpath('//*[@id="intro"]/p[2]/text()')  # Synopsis
# Take the text after the colon as the author's name; the page may use a
# full-width colon, so normalize it before splitting.
r7 = ''.join(r5).replace('\uff1a', ':').split(':')[1].strip()

# Build the full URL of each chapter
chapter_list = []
for i in r4:
    url = "https://www.xbiquge.la" + i
    chapter_list.append(url)

# File name used when saving the novel as txt
title = '{}by{}.txt'.format(r2[0], r7)

with open(title, "a", encoding="utf-8") as f:
    f.writelines(r6)
    f.write('\n')
    for (x, y) in zip(chapter_list, r3):
        response2 = requests.get(x, headers=headers)
        res = response2.content.decode()
        res1 = etree.HTML(res)
        res3 = res1.xpath('//*[@id="content"]/text()')  # Chapter content
        f.write(y)  # Write the chapter title
        f.write('\n')
        f.writelines(res3)  # Write the chapter content
        f.write('\n')

print("{} download complete, {} chapters in total".format(title, len(chapter_list)))
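Two small steps in the script, building chapter URLs and pulling the author's name out of the info line, can be tried without touching the network. The following is a minimal sketch under stated assumptions: the helper names `build_chapter_urls` and `parse_author` and the sample values are hypothetical and not part of the original script. It swaps plain string concatenation for `urllib.parse.urljoin`, which resolves both relative and absolute links, and it assumes the author line may use either an ASCII or a full-width colon.

```python
from urllib.parse import urljoin


def build_chapter_urls(base_url, hrefs):
    """Resolve each chapter href against the novel's index URL."""
    return [urljoin(base_url, href) for href in hrefs]


def parse_author(info_line):
    """Return the text after the first colon (ASCII or full-width).

    Returns an empty string when the line contains no colon.
    """
    normalized = info_line.replace("\uff1a", ":")
    _, _, author = normalized.partition(":")
    return author.strip()


# Hypothetical sample values, shaped like the XPath results in the script.
sample_hrefs = ["/20/20671/1.html", "/20/20671/2.html"]
chapter_urls = build_chapter_urls("https://www.xbiquge.la/20/20671/", sample_hrefs)
print(chapter_urls)
print(parse_author("Author:Someone"))
```

Using `partition` instead of `split(...)[1]` means a line without a colon yields an empty name rather than an `IndexError`, which may be preferable when the page layout differs from what the XPath expressions expect.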