Beginner crawler - Biquge crawler
2022-07-02 04:36:00 【weixin_ forty-three million four hundred and forty-six thousand】
import requests
from lxml import etree
base_url = input("Please enter a novel url: ")  # e.g. the url for "Spring Feast" is https://www.xbiquge.la/20/20671/
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:101.0) Gecko/20100101 Firefox/101.0"
}
response = requests.get(base_url, headers=headers)
r = response.content.decode()
r1 = etree.HTML(r)  # Parse the HTML string into an element tree
r2 = r1.xpath('//*[@id="info"]/h1/text()')  # Novel title
r3 = r1.xpath('//*[@id="list"]/dl/dd/a/text()')  # Title of each chapter
r4 = r1.xpath('//*[@id="list"]/dl/dd/a/@href')  # Relative link to each chapter
r5 = r1.xpath('//*[@id="info"]/p[1]/text()')  # Line that holds the author's name
r6 = r1.xpath('//*[@id="intro"]/p[2]/text()')  # Blurb (synopsis)
# Keep the text after the colon; the page may use a full-width colon, so normalize it first
r7 = ''.join(r5).replace('\uff1a', ':').split(':')[1]  # Author's name
chapter_list = []
for i in r4:
    url = "https://www.xbiquge.la" + i
    chapter_list.append(url)  # Build the full url of each chapter
for i in r2:
    title = '{}by{}.txt'.format(i, r7)  # File name used when saving the txt
with open(title, "a", encoding="utf-8") as f:
    f.writelines(r6)
    f.write('\n')
    for (x, y) in zip(chapter_list, r3):
        response2 = requests.get(x, headers=headers)  # Send the same User-Agent as the index request
        res = response2.content.decode()
        res1 = etree.HTML(res)
        res3 = res1.xpath('//*[@id="content"]/text()')  # Content of each chapter
        f.write(y)  # Write the title of each chapter
        f.write('\n')
        f.writelines(res3)  # Write the content of each chapter
        f.write('\n')
print("{} download completed, {} chapters in total".format(title, len(chapter_list)))
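The script above builds chapter urls by string concatenation and takes the author's name by splitting on a colon. As a rough sketch of a slightly more defensive way to do those two steps, the helpers below use only the standard library. The function names `chapter_urls` and `author_from_info` and the sample values are hypothetical and not part of the original script; the inputs are assumed to be lists of strings shaped like the XPath results `r4` and `r5`.

```python
from urllib.parse import urljoin


def chapter_urls(index_url, hrefs):
    """Resolve each chapter href (absolute or relative path) against the index page url."""
    return [urljoin(index_url, href) for href in hrefs]


def author_from_info(text_nodes):
    """Return the text after the first colon (ASCII or full-width), or '' if there is none."""
    text = "".join(text_nodes).replace("\uff1a", ":")
    _, sep, author = text.partition(":")
    return author.strip() if sep else ""


# Hypothetical sample values, shaped like the XPath results in the script above.
index_url = "https://www.xbiquge.la/20/20671/"
hrefs = ["/20/20671/1.html", "2.html"]
print(chapter_urls(index_url, hrefs))
print(author_from_info(["Author:", " Example Name"]))
```

`urljoin` handles both root-relative and page-relative hrefs, and `partition` avoids the `IndexError` that `split(':')[1]` raises when the line contains no colon.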