Selenium crawls Baidu pictures
2022-07-05 13:48:00 【Weichi Begonia】
# coding=utf-8
"""Download the first 20 Baidu image search results for a keyword."""
import time

import requests
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys


def download_img(kw):
    # Open the browser
    browser = webdriver.Chrome()
    time.sleep(3)
    # Open the Baidu Images home page
    url = r'https://image.baidu.com/'
    browser.get(url)
    time.sleep(3)
    # Type the keyword into the search box.
    # find_element(By..., ...) is the Selenium 4 locator API; the older
    # find_element_by_* helpers were removed in Selenium 4.3.
    keyword = browser.find_element(By.ID, 'kw')
    keyword.send_keys(kw)  # type the keyword
    keyword.send_keys(Keys.ENTER)  # press Enter to search
    time.sleep(3)
    # Set the image size filter
    size = browser.find_element(By.ID, 'sizeFilter')
    big_size = browser.find_element(By.XPATH, '/html/body/div[1]/div[4]/div[2]/div/div[2]/div/div[1]')
    ActionChains(browser).click(size).move_to_element(big_size).click().perform()
    # Click the first picture
    first_pic = browser.find_element(By.XPATH, '//*[@id="imgid"]/div/ul/li[1]/div/a/img')
    ActionChains(browser).click(first_pic).perform()  # perform() runs the queued actions in order
    time.sleep(5)
    # Switch to the newly opened window
    browser.switch_to.window(browser.window_handles[1])
    for i in range(20):
        # Get the current picture
        pic = browser.find_element(By.XPATH, '//*[@id="currentImg"]')
        src = pic.get_attribute('src')
        r = requests.get(src)
        title = browser.find_element(By.CLASS_NAME, 'pic-title')  # also print the picture title
        print(i, ' ', title.text)
        if r.status_code == 200:
            # Save the picture to a file
            file_name = r'D:\1. learning\python_web\pachong\img\{}.jpg'.format(i)
            with open(file_name, 'wb') as f:
                f.write(r.content)  # use r.text for text, r.content for binary data such as images
        # Switch to the next picture
        next_btn = browser.find_element(By.CLASS_NAME, 'img-next')
        next_btn.click()
        time.sleep(5)


if __name__ == '__main__':
    download_img('Mickey Mouse')
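
The script above writes every picture to a hard-coded Windows folder that must already exist, and always uses a .jpg extension. The following is a minimal sketch of a more portable save step. The helper name `save_image`, the `EXTENSIONS` table, and the `img` output folder are assumptions made for illustration; they are not part of the original script.

```python
from pathlib import Path

# Hypothetical mapping from common image MIME types to file extensions.
EXTENSIONS = {
    "image/jpeg": ".jpg",
    "image/png": ".png",
    "image/gif": ".gif",
    "image/webp": ".webp",
}


def save_image(content, out_dir, index, content_type=""):
    """Write image bytes to out_dir/<index><ext> and return the resulting Path."""
    # Drop any parameters after ';' and normalise case, e.g. "image/PNG; q=1".
    mime = content_type.split(";")[0].strip().lower()
    # Fall back to .jpg, as the original script does, when the type is unknown.
    ext = EXTENSIONS.get(mime, ".jpg")
    out_path = Path(out_dir)
    # Create the folder (and any missing parents) if it does not exist yet.
    out_path.mkdir(parents=True, exist_ok=True)
    target = out_path / f"{index}{ext}"
    target.write_bytes(content)
    return target
```

Inside the download loop, the `open(...)`/`f.write(...)` pair could then be replaced with something like `save_image(r.content, 'img', i, r.headers.get('Content-Type', ''))`, which stores the files in an `img` folder relative to the working directory.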