Crawling Baidu Images with Selenium
2022-07-05 13:48:00 【Weichi Begonia】
```python
# coding=utf-8
""" Download pictures from a Baidu Images search (20 by default). """
import os
import time

import requests
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

# Output folder used in the original post; change it for your machine
SAVE_DIR = r'D:\1. learning\python_web\pachong\img'


def download_img(kw, count=20):
    # Open the browser
    browser = webdriver.Chrome()
    try:
        # Visit the Baidu Images home page
        browser.get('https://image.baidu.com/')
        time.sleep(3)
        # Type the keyword into the search box and press Enter
        # (find_element_by_* was removed in Selenium 4.3; use find_element(By...., ...))
        keyword = browser.find_element(By.ID, 'kw')
        keyword.send_keys(kw)
        keyword.send_keys(Keys.ENTER)
        time.sleep(3)
        # Set the picture size: open the size filter, then click the "large" option.
        # The absolute XPath reflects the page layout at the time of writing and breaks if it changes.
        size = browser.find_element(By.ID, 'sizeFilter')
        big_size = browser.find_element(By.XPATH, '/html/body/div[1]/div[4]/div[2]/div/div[2]/div/div[1]')
        ActionChains(browser).click(size).move_to_element(big_size).click().perform()
        # Click the first picture; perform() runs the queued actions in order
        first_pic = browser.find_element(By.XPATH, '//*[@id="imgid"]/div/ul/li[1]/div/a/img')
        ActionChains(browser).click(first_pic).perform()
        time.sleep(5)
        # The picture opens in a new tab, so switch to it
        browser.switch_to.window(browser.window_handles[1])
        os.makedirs(SAVE_DIR, exist_ok=True)
        for i in range(count):
            # Get the current picture's URL and print its title
            pic = browser.find_element(By.ID, 'currentImg')
            src = pic.get_attribute('src')
            title = browser.find_element(By.CLASS_NAME, 'pic-title')
            print(i, title.text)
            r = requests.get(src, timeout=10)
            if r.status_code == 200:
                # Save the picture: text responses use r.text, binary data such as images uses r.content
                file_name = os.path.join(SAVE_DIR, '{}.jpg'.format(i))
                with open(file_name, 'wb') as f:
                    f.write(r.content)
            # Switch to the next picture
            browser.find_element(By.CLASS_NAME, 'img-next').click()
            time.sleep(5)
    finally:
        # Close the browser even if a step fails
        browser.quit()


if __name__ == '__main__':
    download_img('Mickey Mouse')
```
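The save step inside the loop always names the file `.jpg` and writes `r.content` in binary mode. As a minimal sketch, that step could be factored into a helper that also picks the extension from the response's `Content-Type` header. The name `save_image` and the `EXTENSIONS` table are hypothetical, not part of the original script, and the mapping is an assumption you may need to extend; unknown types fall back to `.jpg`, as in the original.

```python
import os

# Hypothetical Content-Type -> extension table (an assumption; extend as needed)
EXTENSIONS = {
    'image/jpeg': '.jpg',
    'image/png': '.png',
    'image/gif': '.gif',
    'image/webp': '.webp',
}


def save_image(content, content_type, folder, index):
    """Write image bytes to folder/<index><ext> and return the file path.

    content_type is the response's Content-Type value, e.g. r.headers.get('Content-Type').
    Missing or unknown types fall back to '.jpg'.
    """
    # Drop parameters such as "; charset=..." and normalise the case
    mime = (content_type or '').split(';')[0].strip().lower()
    ext = EXTENSIONS.get(mime, '.jpg')
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, '{}{}'.format(index, ext))
    # Images are binary: open in 'wb' mode and write bytes (r.content, not r.text)
    with open(path, 'wb') as f:
        f.write(content)
    return path
```

Inside the loop it would replace the `with open(...)` block, for example `save_image(r.content, r.headers.get('Content-Type'), folder, i)` with `folder` set to the image directory used above.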