Web Page Automation Practice 4: Get the Name, Price, and Rating of Every Hotel and Write Them to a File
2022-06-21 16:09:00 【QingHan】
Contents
- I. What find_elements() does
- 1. Get the elements for all hotel names on the current page
- 2. Get the elements for all hotel prices on the current page
- 3. Get the elements for all hotel ratings on the current page
- II. Get each hotel's price, rating, and name, and write them to a file
- 1. Get each hotel's price, rating, and name
- 2. Write the collected data to a file
- III. Code
- IV. Summary and extension
- 1. Summary
- 2. Extension
I. What find_elements() does
1. Get the elements for all hotel names on the current page
Locating the hotel name by the element's class attribute matches 20 elements.
All 20 hotels share the same format. Each one sits in its own independent div, and each div holds one hotel's information.
All 20 hotel names have the same parent div structure.
1) find_element(By.XPATH, ...)
find_element finds a single element. The expression //span[@class="name"] may match one element or many; how many depends on the page.
find_element(By.XPATH, ...) returns only one of the matched elements: the first one that appears in the page.
The page is searched in document order, from the top of the HTML downward, so when several elements match, the first one is returned.
2) find_elements(By.XPATH, ...)
The goal is to get the text content of these 20 elements, which is the hotel name.
find_elements(By.XPATH, ...) returns all the elements that match the expression.
The elements are returned in the same order as they appear in the page's HTML.
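The difference between the two calls can be tried without a browser. The sketch below is an analogy, not Selenium: it uses Python's standard-library xml.etree.ElementTree, whose find() and findall() differ in the same way that find_element() and find_elements() do. The page fragment and hotel names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up page fragment: three hotel cards, each in its own div.
PAGE = """
<div id="list">
  <div class="hotel"><span class="name">Hotel A</span></div>
  <div class="hotel"><span class="name">Hotel B</span></div>
  <div class="hotel"><span class="name">Hotel C</span></div>
</div>
"""

root = ET.fromstring(PAGE)

# find() returns only the first match in document order (like find_element).
first = root.find(".//span[@class='name']")

# findall() returns every match as a list, in document order (like find_elements).
all_names = root.findall(".//span[@class='name']")

print(first.text)
print([span.text for span in all_names])
```

In Selenium the same pattern applies: take .text from the single element returned by find_element(), or loop over the list returned by find_elements().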
2. Get the elements for all hotel prices on the current page
This expression matches 20 elements.
3. Get the elements for all hotel ratings on the current page
This expression matches 20 elements.
II. Get each hotel's price, rating, and name, and write them to a file
Each of these 20 elements has a price, a rating, and a hotel name.
1. Get each hotel's price, rating, and name
These lines of code run repeatedly; this is a traversal. The code after the loop runs only once the last value has been taken. The lines are indented to show that they are what runs for each value taken.
The line break after each hotel in the output comes from print().
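The traversal described above can be sketched with plain lists standing in for the three element lists. The hotel data is made up, and zip() is a swapped-in alternative to indexing with range(20): it pairs the lists member by member and stops at the shortest one.

```python
# Made-up stand-ins for the .text values of the three element lists.
names = ["Hotel A", "Hotel B", "Hotel C"]
prices = ["120", "98", "150"]
ratings = ["4.6", "4.2", "4.8"]

rows = []
# The indented lines run once per hotel; the unindented line after the
# loop runs only after the last hotel has been handled.
for name, price, rating in zip(names, prices, ratings):
    print(name, price, rating)  # print() ends each hotel's line with a newline
    rows.append(f"{name} {price} {rating}")

print("done:", len(rows), "hotels")
```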
2. Write the collected data to a file
fs = open(" My hotel data .txt", "w",encoding='UTF-8'): UTF-8 supports both Chinese and English text.
Read mode: for example, when reading a local data table, the file must already exist locally; if it does not, it cannot be read.
w (write) mode: if the file does not exist, it is created and then written; if it exists, it is written to directly.
In w mode, opening the file discards its existing contents, so the new data replaces whatever was there.
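A minimal sketch of the behaviour described above, using a temporary directory so it can be run anywhere. The file name hotel_data.txt and the sample line are made up; the with statement is a swapped-in alternative to calling close() by hand, since it closes the file automatically.

```python
import os
import tempfile

# Hypothetical file name inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "hotel_data.txt")

# "w" creates the file because it does not exist yet.
with open(path, "w", encoding="utf-8") as fs:
    fs.write("first run\n")

# Opening the same file with "w" again discards the old contents.
with open(path, "w", encoding="utf-8") as fs:
    fs.write("酒店 Hotel A 120 4.6\n")  # UTF-8 holds Chinese and English text

with open(path, "r", encoding="utf-8") as fs:
    content = fs.read()

print(content)

# Read mode requires the file to exist already.
missing_raises = False
try:
    open(path + ".missing", "r", encoding="utf-8")
except FileNotFoundError:
    missing_raises = True
```

To keep the data from earlier runs instead of replacing it, open the file in "a" (append) mode.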
III. Code
from selenium.webdriver.common.by import By
from selenium import webdriver
import time
# Open the Chrome browser, which starts a session with it.
# The driver variable holds that session.
driver = webdriver.Chrome()
driver.get("https://www.elong.com/")  # get() waits for the page to load before the next line runs.
# Even after the page has loaded, rendering can lag slightly,
# so wait 1 second.
time.sleep(1)
# Find the element using the XPath locator strategy.
ele = driver.find_element(By.XPATH, '//input[@data-bindid="city"]')  # Locate the destination input box with the expression written earlier.
# ele is the element that was found.
# Click action: clicking the destination input box pops up the city picker.
ele.click()
time.sleep(2)  # Pause for 2 seconds before running the next line.
# The next element to operate on is dynamic: it is not there when the page first loads
# and only appears after an action triggers it.
# It takes time to render, so wait a moment before locating and operating on it.
# Input action: ele.send_keys("text to type")
# Get an attribute: ele.get_attribute("attribute name")
# Get the text content: ele.text
# Choose Guangzhou from the popular cities.
driver.find_element(By.XPATH, '//li[@data="0|15"]').click()
time.sleep(1)  # Add a wait, but keep it short; 7 or 8 seconds is too long.
# Without the wait, the script runs too fast and the date is not selected.
# Select the check-in date
ele = driver.find_element(By.XPATH, '//input[@data-bindid="checkIn"]')
ele.clear()  # Clear the input box before entering the date.
ele.send_keys("2022-05-27")
time.sleep(1)  # A wait is added between operations.
'''
After the date is entered, the date picker stays open and has to be dismissed.
Clicking any other element (here a fixed element on the page, the destination label)
makes the date picker disappear, so the next element can be operated on.
Otherwise the date picker covers other elements: the search button clicked next
would be hidden behind it, and the click would not work as intended.
So this is handled according to how the page behaves.
'''
# Close the date picker that popped up.
driver.find_element(By.XPATH, '//div[@id="domesticDiv"]//dt[text()=" Destination "]').click()
# Select check-out date
b = driver.find_element(By.XPATH, '//input[@data-bindid="checkOut"]')
b.clear()
b.send_keys("2022-05-30") # Enter the date
time.sleep(1)
driver.find_element(By.XPATH, '//div[@id="domesticDiv"]//dt[text()=" Destination "]').click()
time.sleep(1)
# a=driver.find_element(By.XPATH,'//input[@data-bindid="allInOne"]')
# a.clear()
# a.send_keys(" Joy Gate Hotel ( Guangzhou rongchuang Cultural Tourism City store )")
# time.sleep(1)
# driver.find_element(By.XPATH,'//div[@id="domesticDiv"]//dt[text()=" Destination "]').click()
# ======== 2. Click the search button ========
# time.sleep(0.5)
driver.find_element(By.XPATH, '//span[@data-bindid="search"]').click()
# ======== 3. The site jumps to a new page; wait for its content to load ========
time.sleep(7)  # The new content takes a while to load.
# ======== 4. Get the hotel name, price, and rating ========
# Get the information for the first hotel only.
# hotel_name=driver.find_element(By.XPATH,'//span[@class="name"]').text
# hotel_price=driver.find_element(By.XPATH,'//p[@class="loginToSee"]').text
# hotel_review=driver.find_element(By.XPATH,'//p[@class="score mb5"]').text
# print(" Hotel information :",hotel_name,hotel_review,hotel_price)
# ======== 5. Get the price, rating, and name of every hotel on the current page ========
# find_elements(By.XPATH, ...) returns all elements that match the expression, as a list of element objects.
# All hotel name elements
total_names = driver.find_elements(By.XPATH, '//span[@class="name"]')  # 20 elements of the same type.
time.sleep(1)
# Python can hold multiple values in a list, dictionary, tuple, or set.
# All hotel price elements
total_prices = driver.find_elements(By.XPATH, '//p[@class="loginToSee"]')
time.sleep(1)
# All hotel rating elements
total_previews = driver.find_elements(By.XPATH, '//p[@class="score mb5"]')
# Each value has to be taken out of the 3 lists.
# Think of 20 items of clothing in a shop: looking at each one, from the 1st to the 20th, is one visit per item.
# This is called traversal, or looping: every member of the collection is visited, from first to last.
# There are 20 hotels, and each one needs its name, price, and rating read, so the lists are traversed.
'''
for variable in list:  # Take each member of the list in turn and assign it to the variable.
    What to do with each member goes here.
    For each hotel, get its name, price, and rating.
The indexes traversed are: [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19]
'''
# File operations: My hotel data .txt
# Create the file, write the data, then close it.
# open() performs the file operation.
# When opening the file, specify the write mode and the utf-8 encoding.
fs = open(" My hotel data .txt", "w",encoding='UTF-8')  # "w" is write mode: the file is created if missing, and written to directly if it exists.
# Only a file name is given, with no path, so Python creates the file in the current working directory.
# "w" mode overwrites the file's existing contents.
# write() does not add a line break; add one explicitly with \n.
# for index in [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19]:
for index in range(20):
    print(total_names[index].text, total_prices[index].text, total_previews[index].text)  # The name, price, and rating of each hotel.
    fs.write(total_names[index].text + " ")  # Writes can continue until the file is closed.
    fs.write(total_prices[index].text + " ")
    # fs.write(total_prices[index].get_attribute(" The attribute name ")+" ")# Get attribute value
    # fs.write(total_prices[index].get_attribute('class') + " ")
    fs.write(total_previews[index].text + "\n")
# Close the file.
fs.close()
# The loop variable after "for" can have any name, and "in" can be followed by a list or by many other iterable types.
time.sleep(10)
# ======== 6. Going further: filter by price first, then read the ratings ========
driver.find_element(By.XPATH,'//li[@class="radio fl"]//span[text()="150 Yuan of the following "]').click()
try:
    pingfens = driver.find_elements(By.XPATH, '//p[@class="score mb5"]')
    fn = open("150 Score data below yuan .txt", "w",encoding='UTF-8')
    for score in range(20):
        print(pingfens[score].text)
        fn.write(pingfens[score].text+ "\n")
    fn.close()
except Exception:
    # Run the same steps a second time if the first attempt raised an exception.
    pingfens = driver.find_elements(By.XPATH, '//p[@class="score mb5"]')
    fn = open("150 Score data below yuan .txt", "w",encoding='UTF-8')
    for score in range(20):
        print(pingfens[score].text)
        fn.write(pingfens[score].text+ "\n")
    fn.close()
# Catching the exception works around the error currently encountered. Without it the code is fine, but it fails after being run several times.
# ======== 7. Close the browser and end this session ========
time.sleep(10)
driver.quit()  # Shut down the driver and close all windows.
The script ran successfully.
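The fixed time.sleep() calls in the script either wait longer than needed or not long enough. A common alternative is to poll until a condition holds; Selenium provides WebDriverWait in selenium.webdriver.support.ui for this. The sketch below shows only the polling idea, in plain Python so that it runs without a browser. wait_until and fake_find_elements are hypothetical names, not part of Selenium.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.2):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met in time")
        time.sleep(poll)


# Stand-in for driver.find_elements(...): empty at first, filled on the third call.
calls = {"count": 0}


def fake_find_elements():
    calls["count"] += 1
    return ["Hotel A", "Hotel B"] if calls["count"] >= 3 else []


found = wait_until(fake_find_elements, timeout=2.0, poll=0.05)
print(found, "after", calls["count"], "calls")
```

With a real driver, the condition could be a lambda that calls driver.find_elements(By.XPATH, '//span[@class="name"]'), because find_elements() returns an empty list until matching elements exist.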
IV. Summary and extension
1. Summary
1. find_elements() finds all matching elements and returns them as a list.
2. How to handle the list: traverse it to take each value, and create a file.
3. Traverse the list with a for loop.
4. Write the data to the file.
2. Extension
Traverse according to the length of the list: for that, learn how the range function is used. Reference link: Operating on lists
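range(20) assumes the page always returns exactly 20 hotels; if one of the lists is shorter, indexing past its end raises IndexError. A minimal sketch of traversing by list length instead, with made-up data in which one price is missing:

```python
names = ["Hotel A", "Hotel B", "Hotel C"]
prices = ["120", "98"]  # one price missing on purpose
ratings = ["4.6", "4.2", "4.8"]

# Use the shortest list's length so no index goes out of range.
count = min(len(names), len(prices), len(ratings))

lines = []
for index in range(count):  # range(2) yields 0 and 1
    lines.append(names[index] + " " + prices[index] + " " + ratings[index])

print(lines)
```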
Running fs.write(total_prices[index].get_attribute(" The attribute name ")+" ")# Get attribute value
and fs.write(total_prices[index].text + " ") gives the same results.
Result of the first approach
Result of the second approach
What was covered above has practical uses. For example, when a manager needs to see some data from a platform, or wants a report, you can use a script like this to access the company's system and pull the data. It is not useless knowledge; use it at the right time.
Crawlers sometimes use this bit of automation knowledge, but crawling is more than automation. To do web crawling well, you need to study it in depth.