Beginner Web Scraping: A Biquge (笔趣阁) Scraper
2022-07-02 04:35:00 【weixin_43446292】
import requests
from lxml import etree

# Example: the URL for 春日宴 is https://www.xbiquge.la/20/20671/
base_url = input("Enter the novel URL: ")
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:101.0) Gecko/20100101 Firefox/101.0"
}
response = requests.get(base_url, headers=headers)
r = response.content.decode()
r1 = etree.HTML(r)  # parse the HTML string into an element tree
r2 = r1.xpath('//*[@id="info"]/h1/text()')  # novel title
r3 = r1.xpath('//*[@id="list"]/dl/dd/a/text()')  # title of each chapter
r4 = r1.xpath('//*[@id="list"]/dl/dd/a/@href')  # link to each chapter
r5 = r1.xpath('//*[@id="info"]/p[1]/text()')  # line containing the author name
r6 = r1.xpath('//*[@id="intro"]/p[2]/text()')  # synopsis
# Author name: the text after the colon (full-width or ASCII)
r7 = ''.join(r5).replace('：', ':').split(':')[-1].strip()

chapter_list = []
for i in r4:
    url = "https://www.xbiquge.la" + i
    chapter_list.append(url)  # build the full URL of each chapter

title = '{}by{}.txt'.format(r2[0], r7)  # name of the output .txt file

with open(title, "a", encoding="utf-8") as f:
    f.writelines(r6)
    f.write('\n')
    for (x, y) in zip(chapter_list, r3):
        response2 = requests.get(x, headers=headers)
        res = response2.content.decode()
        res1 = etree.HTML(res)
        res3 = res1.xpath('//*[@id="content"]/text()')  # body text of the chapter
        f.write(y)  # write the chapter title
        f.write('\n')
        f.writelines(res3)  # write the chapter body
        f.write('\n')

print("{} finished, {} chapters in total".format(title, len(chapter_list)))
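
The script above builds each chapter URL by prepending a hard-coded site root to the href. A minimal sketch of an alternative, assuming the hrefs may be either root-relative or relative to the index page: resolve them against the novel URL with the standard library's urllib.parse.urljoin. The helper name build_chapter_urls and the sample chapter file names are hypothetical, chosen only for illustration.

```python
from urllib.parse import urljoin


def build_chapter_urls(base_url, hrefs):
    """Resolve each chapter href against the novel's index page URL."""
    return [urljoin(base_url, href) for href in hrefs]


# Hypothetical hrefs: one root-relative, one relative to the index page.
urls = build_chapter_urls(
    "https://www.xbiquge.la/20/20671/",
    ["/20/20671/1.html", "2.html"],
)
for u in urls:
    print(u)
```

With this approach the site root no longer has to be repeated in the code, so the same loop keeps working if the novel URL entered by the user points at a different host.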