Web scraping study notes
2022-06-10 21:49:00 【泸州彭于晏】
Getting a tag's attribute values with BeautifulSoup

Getting a tag's text with BeautifulSoup
Use .string to read the content of a tag
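As a minimal sketch of the two accessors named above, the snippet below parses a small inline HTML fragment (made up for illustration, not taken from Douban), reads an attribute with subscript syntax, and reads the text with `.string`. Note that `.string` is `None` when a tag has more than one child; `get_text()` is the usual fallback in that case.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment used only for illustration.
html = '<div class="title"><a href="https://example.com/book/1">Sample Book</a></div>'
soup = BeautifulSoup(html, "html.parser")

link = soup.find("div", class_="title").find("a")

href = link["href"]        # attribute value via subscript; raises KeyError if missing
maybe_id = link.get("id")  # .get() returns None instead of raising
text = link.string         # the tag's single text child

# A tag with several children has no single string, so .string is None.
mixed = BeautifulSoup("<p>Hello <b>world</b></p>", "html.parser").p
mixed_string = mixed.string
mixed_text = mixed.get_text()  # concatenates all nested text

print(href, text, maybe_id, mixed_string, mixed_text)
```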
Example
File py1.py
import os
import requests
from bs4 import BeautifulSoup
import csv
import time

url = "https://book.douban.com/"
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.5005.63 Safari/537.36 Edg/102.0.1245.33'}
resp = requests.get(url, headers=headers)
result = resp.text
# print(result)
# body = BeautifulSoup(result, 'lxml')
body = BeautifulSoup(result, 'html.parser')
# print(body)
carousel = body.find('div', class_='carousel')
# print(carousel)
slide_list = carousel.find('div', class_='slide-list')
# print(slide_list)
uls = slide_list.find_all('ul')
# print(uls)
# print(len(uls))
# Holds the [title, url] pairs collected in the loop
books = []
# Make sure the output folder exists before writing images into it
os.makedirs("./images", exist_ok=True)
for ul in uls:
    lis = ul.find_all('li')
    # print(lis)
    # print(len(lis))
    for li in lis:
        a_s = li.find_all('a')  # find all <a> tags in this <li>
        # print(a_s[0])
        # The <a> inside <div class="title"> carries both the title and the link
        t_a = li.find_all('div', class_='title')[0].find('a')
        # print(t_a.string)   # text content of the <a> tag
        # print(t_a['href'])  # value of the href attribute
        # Append the record to the books list
        books.append([t_a.string, t_a['href']])
        # The first <a> wraps the cover image
        img = a_s[0].find('img')
        # Image URL
        img_url = img['src']
        # print(img_url)
        # The alt attribute is used as the file name
        name = img['alt']
        # Download the image as raw bytes
        img_resp = requests.get(img_url).content
        with open("./images/{}.jpg".format(name), "wb") as f:  # write the file
            f.write(img_resp)
        time.sleep(0.5)  # pause 0.5 s between downloads
        print("{} image downloaded successfully!".format(name))
print(books)
print(len(books))
# Save to a CSV file
with open('data.csv', 'w', encoding='utf-8', newline='') as csvfile:
    # Create the writer
    writer = csv.writer(csvfile)
    writer.writerow(['title', 'url'])
    for i in books:
        writer.writerow(i)
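The download step writes each image to `./images/<alt text>.jpg`, which fails if the directory is missing or if the alt text contains characters that are not valid in file names. One possible hardening is sketched below. The helper names `safe_filename` and `save_image` are hypothetical (they are not part of the original script or of any library), and the set of replaced characters is an assumption based on common Windows and Unix restrictions.

```python
import os
import re


def safe_filename(name, fallback="untitled"):
    """Replace characters that are invalid in file names on common systems."""
    cleaned = re.sub(r'[\\/:*?"<>|\s]+', "_", name).strip("_")
    return cleaned or fallback


def save_image(data, name, directory="./images"):
    """Write raw image bytes to directory/<safe name>.jpg and return the path."""
    os.makedirs(directory, exist_ok=True)  # create the folder if it is missing
    path = os.path.join(directory, safe_filename(name) + ".jpg")
    with open(path, "wb") as f:  # the with-block closes the file automatically
        f.write(data)
    return path
```

Inside the loop, the `open(...)` block could then be replaced by a call such as `save_image(img_resp, name)`.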
File py2.py
import csv

books = []
# Open the file and read the rows back
with open('data.csv', 'r', encoding='utf-8', newline='') as csvfile:
    # Create the reader
    reader = csv.reader(csvfile)
    # print(reader)
    for row in reader:
        # print(row)
        # Skip blank rows (they appear if the file was written without newline='')
        if row:
            books.append(row)
# Skip the header row and print the data rows
print(books[1:])
# print(len(books))
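Since `data.csv` starts with a `title,url` header row, `csv.DictReader` is one alternative way to read it back: it consumes the header itself and yields one dict per record. The sketch below is an assumption about how the file might be consumed, not part of the original script. It first writes a tiny sample file to a temporary directory so that it runs on its own, and the sample rows are made up.

```python
import csv
import os
import tempfile

# Made-up sample rows standing in for the scraped [title, url] pairs.
sample = [["Book A", "https://example.com/a"], ["Book B", "https://example.com/b"]]

path = os.path.join(tempfile.mkdtemp(), "data.csv")

# newline='' stops the csv module from emitting blank lines on Windows.
with open(path, "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["title", "url"])
    writer.writerows(sample)

with open(path, "r", encoding="utf-8", newline="") as f:
    records = list(csv.DictReader(f))  # header row becomes the dict keys

for rec in records:
    print(rec["title"], rec["url"])
```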
https://blog.csdn.net/m0_60964321/article/details/122269923?ops_request_misc=%257B%2522request%255Fid%2522%253A%2522165470154616780366517015%2522%252C%2522scm%2522%253A%252220140713.130102334…%2522%257D&request_id=165470154616780366517015&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2alltop_click~default-2-122269923-null-null.142v11pc_search_result_control_group,157v13control&utm_term=python%E7%88%AC%E8%99%AB%E4%BF%9D%E5%AD%98%E5%9B%BE%E7%89%87&spm=1018.2226.3001.4187