Scraping the Top 500 Books into a TXT File with 50 Lines of Code
2022-06-26 18:07:00 【小狐狸梦想去童话镇】
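```python
import re        # regular expressions, used to pull fields out of the HTML
import json      # each book is saved as one JSON object per line
import requests

# A browser-like User-Agent and a timeout are additions to the original post;
# the site may refuse or stall requests that lack them.
HEADERS = {"User-Agent": "Mozilla/5.0"}

# One regex per <li> entry: rank, cover image, title, recommendation rate,
# author, five-star rating count, price. re.S lets '.' also match newlines.
# Raw strings avoid invalid-escape warnings for \d and \s.
PATTERN = re.compile(
    r'<li>.*?list_num.*?(\d+).</div>.*?<img src="(.*?)".*?class="name".*?title="(.*?)">'
    r'.*?class="star">.*?class="tuijian">(.*?)</span>.*?class="publisher_info">'
    r'.*?target="_blank">(.*?)</a>.*?class="biaosheng">.*?<span>(.*?)</span></div>'
    r'.*?<p><span\sclass="price_n">¥(.*?)</span>.*?</li>',
    re.S,
)


def main(page):
    # URL of one ranking page; the page number is the last part of the URL
    baseurl = "http://bang.dangdang.com/books/fivestars/01.00.00.00.00.00-recent30-0-0-1-" + str(page)
    # Download and parse the page
    datalist = getData(baseurl)
    # Append this page's books to the text file
    savepath = "Top500_book.txt"
    saveData(datalist, savepath)


# Fetch one page and return the parsed books (an empty list if the download failed)
def getData(baseurl):
    html = askURL(baseurl)
    if html is None:
        return []
    return list(parse_result(html))


# Parse the page source, yielding one dict per book
def parse_result(html):
    for item in PATTERN.findall(html):
        yield {
            'rank': item[0],       # position on the list
            'image': item[1],      # cover image URL
            'title': item[2],
            'recommend': item[3],  # recommendation rate
            'author': item[4],
            'times': item[5],      # five-star rating count, from the "biaosheng" block
            'price': item[6],
        }


# Download the page source; return None on a network error or a non-200 status
def askURL(url):
    try:
        response = requests.get(url, headers=HEADERS, timeout=10)
        if response.status_code == 200:
            return response.text
    except requests.RequestException:
        pass
    return None


# Append the books to a text file, one JSON object per line
def saveData(datalist, savepath):
    print("save....")
    # Open the file once per page; 'with' closes it, so no f.close() is needed
    with open(savepath, 'a', encoding='utf-8') as f:
        for item in datalist:
            f.write(json.dumps(item, ensure_ascii=False) + '\n')


if __name__ == '__main__':
    # Turn the pages: range(1, 26) covers pages 1-25, and 25 pages x 20 books = 500
    for i in range(1, 26):
        main(i)
```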
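The single regular expression does all of the parsing, which makes it the most fragile part of the scraper. It is therefore worth testing offline against a fixed input. The sketch below applies the same pattern to a hand-written HTML fragment. Several things here are assumptions:

- The fragment's markup and values are hypothetical. They are shaped only to match the class names the pattern expects, and the real Dangdang markup may differ.
- The helper name `parse_entries` is hypothetical.
- The `KEYS` field names are this sketch's choice.

```python
import re

# The same pattern the scraper uses, written as a raw string
PATTERN = re.compile(
    r'<li>.*?list_num.*?(\d+).</div>.*?<img src="(.*?)".*?class="name".*?title="(.*?)">'
    r'.*?class="star">.*?class="tuijian">(.*?)</span>.*?class="publisher_info">'
    r'.*?target="_blank">(.*?)</a>.*?class="biaosheng">.*?<span>(.*?)</span></div>'
    r'.*?<p><span\sclass="price_n">¥(.*?)</span>.*?</li>',
    re.S,
)

# Names for the seven capture groups, in order (chosen for this sketch)
KEYS = ("rank", "image", "title", "recommend", "author", "times", "price")


def parse_entries(html):
    """Yield one dict per <li> entry the pattern matches (hypothetical helper)."""
    for groups in PATTERN.findall(html):
        yield dict(zip(KEYS, groups))


# Hypothetical fragment shaped like one ranking entry; all values are made up
SAMPLE = """
<li>
  <div class="list_num red">1.</div>
  <div class="pic"><a href="#"><img src="http://img.example.com/cover.jpg" alt=""></a></div>
  <div class="name"><a href="#" title="Example Book">Example Book</a></div>
  <div class="star"><span class="level"></span><span class="tuijian">100% recommended</span></div>
  <div class="publisher_info"><a href="#" target="_blank">Some Author</a></div>
  <div class="biaosheng">Ratings: <span>12345</span></div>
  <div class="price"><p><span class="price_n">¥25.80</span></p></div>
</li>
"""

if __name__ == "__main__":
    for book in parse_entries(SAMPLE):
        print(book)
```

Every one of the seven fields is required. In addition, `.*?` can cross `</li>` boundaries under `re.S`. As a result, an entry that lacks one field (say, the price) is not simply skipped: the match can run on into the next `<li>` and merge the two books into a single record.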
【Run result】
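`saveData` writes each book as one JSON object per line (the JSON Lines layout) and opens the file in append mode. The output can therefore be read back with the standard `json` module. Append mode also means that running the script a second time adds another copy of every line, so this minimal sketch drops exact duplicate lines. The function names `parse_lines` and `load_books` are hypothetical helpers, not part of the original post.

```python
import json


def parse_lines(lines):
    """Turn JSON Lines text into a list of dicts, skipping blank and repeated lines."""
    seen = set()
    books = []
    for line in lines:
        line = line.strip()
        if not line or line in seen:
            continue  # blank line, or an exact copy left by an earlier run
        seen.add(line)
        books.append(json.loads(line))
    return books


def load_books(path="Top500_book.txt"):
    """Read the scraper's output file (UTF-8, one JSON object per line)."""
    with open(path, encoding="utf-8") as f:
        return parse_lines(f)
```

After a full run, `books = load_books()` returns a list of dicts, one per distinct line in `Top500_book.txt`. `parse_lines` accepts any iterable of strings, so it can also be checked without touching the file system.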