Beginner crawler: a Biquge (xbiquge.la) novel crawler
2022-07-02 04:36:00 【weixin_ forty-three million four hundred and forty-six thousand】
import requests
from lxml import etree

base_url = input("Please enter a novel url: ")  # e.g. "Spring Feast": https://www.xbiquge.la/20/20671/
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:101.0) Gecko/20100101 Firefox/101.0"
}
response = requests.get(base_url, headers=headers)
r = response.content.decode()
r1 = etree.HTML(r)  # Parse the HTML string into an element tree
r2 = r1.xpath('//*[@id="info"]/h1/text()')  # Name of the novel
r3 = r1.xpath('//*[@id="list"]/dl/dd/a/text()')  # Title of each chapter
r4 = r1.xpath('//*[@id="list"]/dl/dd/a/@href')  # Link to each chapter
r5 = r1.xpath('//*[@id="info"]/p[1]/text()')  # Line containing the author's name
r6 = r1.xpath('//*[@id="intro"]/p[2]/text()')  # Synopsis of the novel
# The author's name follows the colon. Assumption: the page may use a
# full-width colon, so normalize it before splitting.
r7 = ''.join(r5).replace('：', ':').split(':')[1]

chapter_list = []
for i in r4:
    url = "https://www.xbiquge.la" + i
    chapter_list.append(url)  # Build the full url of each chapter

for i in r2:
    title = '{}by{}.txt'.format(i, r7)  # File name used when saving the txt

with open(title, "a", encoding="utf-8") as f:
    f.writelines(r6)
    f.write('\n')
    for (x, y) in zip(chapter_list, r3):
        # Send the same headers as for the index page
        response2 = requests.get(x, headers=headers)
        res = response2.content.decode()
        res1 = etree.HTML(res)
        res3 = res1.xpath('//*[@id="content"]/text()')  # Content of the chapter
        f.write(y)  # Write the title of the chapter
        f.write('\n')
        f.writelines(res3)  # Write the content of the chapter
        f.write('\n')

print("{} download complete, {} chapters in total".format(title, len(chapter_list)))
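Two steps in the script are easy to harden: joining each chapter href onto the site root, and building the output file name from the scraped novel name and author. The sketch below is one possible way to do both with the standard library only. The helper names `build_chapter_urls` and `safe_filename` and the sample values are hypothetical and not part of the original script. It assumes the hrefs are either root-relative or already absolute.

```python
import re
from urllib.parse import urljoin

SITE_ROOT = "https://www.xbiquge.la"  # site root used in the script above


def build_chapter_urls(hrefs, base=SITE_ROOT):
    """Resolve each scraped href against the site root.

    urljoin handles root-relative hrefs ("/20/20671/1.html") as well as
    hrefs that are already absolute, which plain concatenation does not.
    """
    return [urljoin(base, href) for href in hrefs]


def safe_filename(name, author):
    """Build '<name>by<author>.txt', replacing characters Windows forbids in file names."""
    raw = "{}by{}".format(name.strip(), author.strip())
    return re.sub(r'[\\/:*?"<>|]', "_", raw) + ".txt"


# Hypothetical sample values standing in for the XPath results r4, r2 and r7.
sample_hrefs = ["/20/20671/1.html", "https://www.xbiquge.la/20/20671/2.html"]
print(build_chapter_urls(sample_hrefs))
print(safe_filename("A: Novel?", "Some Author"))
```

In the script, `chapter_list = build_chapter_urls(r4)` could replace the concatenation loop, and `title = safe_filename(r2[0], r7)` could replace the `format` call, provided `r2` is not empty.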