Beginner crawler: a Biquge novel crawler
2022-07-02 04:36:00 【weixin_43446000】
import requests
from lxml import etree

base_url = input("Please enter a novel url: ")  # e.g. https://www.xbiquge.la/20/20671/
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:101.0) Gecko/20100101 Firefox/101.0"
}
response = requests.get(base_url, headers=headers)
r = response.content.decode()  # Decode the response body (UTF-8 by default)
r1 = etree.HTML(r)  # Parse the HTML string into an element tree
r2 = r1.xpath('//*[@id="info"]/h1/text()')  # Get the name of the novel
r3 = r1.xpath('//*[@id="list"]/dl/dd/a/text()')  # Get the title of each chapter
r4 = r1.xpath('//*[@id="list"]/dl/dd/a/@href')  # Get the link to each chapter
r5 = r1.xpath('//*[@id="info"]/p[1]/text()')  # Get the line that holds the author's name
r6 = r1.xpath('//*[@id="intro"]/p[2]/text()')  # Get the synopsis
# Keep the text after the colon; accept both the ASCII and the full-width colon
r7 = ''.join(r5).replace('：', ':').split(':')[1].strip()

chapter_list = []
for i in r4:
    url = "https://www.xbiquge.la" + i  # Build the full URL of each chapter
    chapter_list.append(url)

for i in r2:
    title = '{}by{}.txt'.format(i, r7)  # File name used when saving the .txt

with open(title, "a", encoding="utf-8") as f:
    f.writelines(r6)
    f.write('\n')
    for (x, y) in zip(chapter_list, r3):
        response2 = requests.get(x, headers=headers)  # Send the same headers as the first request
        res = response2.content.decode()
        res1 = etree.HTML(res)
        res3 = res1.xpath('//*[@id="content"]/text()')  # Get the content of each chapter
        f.write(y)  # Write the title of each chapter
        f.write('\n')
        f.writelines(res3)  # Write the content of each chapter
        f.write('\n')

print("{} collection completed, {} chapters in total".format(title, len(chapter_list)))
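The script above builds chapter URLs by string concatenation, uses the raw novel name as a file name, and pulls the author out of the text after a colon. The following is a minimal sketch of how those three steps could be made a little more robust using only the standard library. The helper names (build_chapter_urls, safe_filename, parse_author) are hypothetical and not part of the original post, and the set of forbidden characters assumes Windows file-name rules.

```python
from urllib.parse import urljoin

# Characters that Windows does not allow in file names (assumption: the
# script may be run on Windows, as the User-Agent in the post suggests).
_FORBIDDEN = '\\/:*?"<>|'


def build_chapter_urls(base_url, hrefs):
    """Resolve each chapter href against the novel's index URL.

    urljoin handles both site-absolute hrefs ("/20/20671/1.html")
    and relative ones ("1.html"), so the host is not hard-coded.
    """
    return [urljoin(base_url, href) for href in hrefs]


def safe_filename(name, replacement="_"):
    """Replace characters that are invalid in Windows file names."""
    cleaned = "".join(replacement if ch in _FORBIDDEN else ch for ch in name)
    return cleaned.strip()


def parse_author(info_line):
    """Return the text after the first ASCII or full-width colon.

    Returns an empty string if the line contains no colon, instead of
    raising IndexError as split(':')[1] would.
    """
    normalized = info_line.replace("：", ":")
    _, _, author = normalized.partition(":")
    return author.strip()
```

In the script, these could stand in for the concatenation loop (chapter_list = build_chapter_urls(base_url, r4)), the file-name line (title = safe_filename('{}by{}.txt'.format(i, r7))), and the author line (r7 = parse_author(''.join(r5))).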