Selenium crawl notes
2022-06-24 20:36:00 【Yu Xu】
Import the third-party library selenium.
import selenium
from selenium import webdriver
Download the driver that matches your browser:
Edge: https://developer.microsoft.com/en-us/microsoft-edge/tools/webdriver/
Chrome: https://code.google.com/p/chromedriver/downloads/list
Firefox: https://github.com/mozilla/geckodriver/releases/
IE: NuGet Gallery, package Selenium.WebDriver.IEDriver 4.0.0
The download is a compressed folder. Open it and you will find the file msedgedriver.exe. Copy this file to a drive other than C:, then add its location to the system environment variables of this computer.
To reach the setting, go to: This PC > right-click > Properties > About > Advanced system settings > Advanced > Environment Variables > System variables > Path.
Add the location of msedgedriver.exe (the folder that contains it) to Path, then click OK.
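Before launching the browser, it can help to check whether Python can actually see the driver through PATH. The following is a minimal sketch using only the standard library; the helper name find_driver is made up for illustration. One possible reason for a miss, though not the only one, is that the terminal or IDE was started before Path was edited, since a running process keeps the environment it started with.

```python
import shutil


def find_driver(name="msedgedriver"):
    """Return the full path of the driver executable if it is on PATH, else None."""
    # shutil.which searches the directories listed in the PATH environment
    # variable; on Windows it also tries the extensions in PATHEXT (e.g. .exe).
    return shutil.which(name)


# find_driver is a hypothetical helper, not part of Selenium.
location = find_driver()
if location is None:
    print("msedgedriver was not found on PATH")
else:
    print("msedgedriver found at", location)
```

If this prints that the driver was not found, webdriver.Edge() called without arguments is likely to fail in the same way.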
# Create a browser object. I use the Edge browser here; if you use Chrome or Firefox, replace Edge with Chrome or Firefox. The first letter must be capitalized!
driver = webdriver.Edge()
driver.get('https://www.taobao.com/?spm=a21bo.jianhua.201857.1.5af911d9NTiGPH')
# Maximize the browser window
driver.maximize_window()
Running the code at this point raises an error at driver = webdriver.Edge().
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the specified file .
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:\learn\ test .py", line 4, in <module>
driver = webdriver.Edge()
File "D:\ Study \pycharm practice \learn\lib\site-packages\selenium\webdriver\edge\webdriver.py", line 62, in __init__
super(WebDriver, self).__init__(DesiredCapabilities.EDGE['browserName'], "ms",
File "D:\ Study \pycharm practice \learn\lib\site-packages\selenium\webdriver\chromium\webdriver.py", line 90, in __init__
self.service.start()
File "D:\ Study \pycharm practice \learn\lib\site-packages\selenium\webdriver\common\service.py", line 81, in start
raise WebDriverException(
selenium.common.exceptions.WebDriverException: Message: 'msedgedriver' executable needs to be in PATH. Please download from https://developer.microsoft.com/en-us/microsoft-edge/tools/webdriver/
The message says the driver needs to be on PATH, but I believed I had already configured the path. I later found out that the path can instead be passed to webdriver.Edge() in the form of a Service object.
So the code has to be changed to the following.
# In the import below, replace edge with the name of your own browser, in lowercase
from selenium.webdriver.edge.service import Service
# Use Service() to store the driver path in the variable s. The r prefix marks a raw string (not a regular expression), so the backslashes are kept as-is
s = Service(r'D:\msedgedriver.exe')
# service is a parameter of Edge(). To see how it is used, hold Ctrl and left-click Edge in PyCharm to open the corresponding source file
driver = webdriver.Edge(service=s)
driver.get('https://www.taobao.com/?spm=a21bo.jianhua.201857.1.5af911d9NTiGPH')
# Maximize the browser window
driver.maximize_window()
Run the code and Taobao opens. One thing to note here: the site behaves differently for a person and for code:
1. When a person browses the site, they search, select items, and the login pop-up only appears once they finalize the purchase;
2. When code drives the browser, the login pop-up appears directly after the product is entered in the search bar.
Let's first write the code that enters the content to search for.
One more thing to note here:
Normally find_element_by_xpath() is used to get page elements, but my PyCharm showed the deprecation message below.
# A different method is needed here. First add: from selenium.webdriver.common.by import By
# find_element_by_xpath() is not recommended; use find_element() instead
find_element_by_* commands are deprecated. Please use find_element() instead
# That is, find_element_by_xpath('the element you are looking for') is replaced by find_element(By.XPATH, 'the element you are looking for')
Here XPath is used to get the page element of the search box, followed by a random delay of 1 to 3 seconds.
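import random
import time
from selenium.webdriver.common.by import By
# Locate the element by XPath and click it
driver.find_element(By.XPATH, '//*[@id="J_TSearchForm"]/div[1]/button').click()
time.sleep(random.randint(1, 3))
Then get the search button and likewise set a random delay of 1 to 3 seconds.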
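Since every step repeats the same locate, click, and pause pattern, it could be wrapped in a small helper. This is only a sketch under assumptions: the names random_pause and click_then_pause are made up for illustration and are not part of Selenium, and the helper only relies on the driver having a find_element(by, value) method, as Selenium 4 drivers do.

```python
import random
import time


def random_pause(low=1, high=3):
    """Sleep for a random whole number of seconds in [low, high] and return it."""
    delay = random.randint(low, high)  # randint includes both endpoints
    time.sleep(delay)
    return delay


def click_then_pause(driver, by, value, low=1, high=3):
    """Locate one element with driver.find_element(by, value), click it, then pause."""
    # Hypothetical helper: driver is any object exposing find_element(by, value).
    element = driver.find_element(by, value)
    element.click()
    return random_pause(low, high)
```

With the driver created earlier and By imported from selenium.webdriver.common.by, the click above could then be written as click_then_pause(driver, By.XPATH, '//*[@id="J_TSearchForm"]/div[1]/button').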