
Web page automation practice 4: get the name, price, and rating of all hotels and write them to a file

2022-06-21 16:09:00 QingHan

Contents

  • I. The role of find_elements()
    • 1. Get the elements for all hotel names on the current page
    • 2. Get the elements for all hotel prices on the current page
    • 3. Get the elements for all hotel ratings on the current page
  • II. Get each hotel's price, rating, and name and write them to a file
    • 1. Get each hotel's price, rating, and name
    • 2. Write the obtained data to a file
  • III. Code
  • IV. Summary and extension
    • 1. Summary
    • 2. Extension

I. The role of find_elements()

1. Get the elements for all hotel names on the current page

Locating by the element's class attribute returns the hotel-name elements, and there are 20 of them.

These 20 hotels share the same layout. Each hotel's information sits in its own independent div.

All 20 hotel names share the same parent div.

1) find_element(By.XPATH, ...)

find_element looks for a single element. The expression //span[@class="name"] may match one element or several, depending on the page.

find_element(By.XPATH, ...) returns only one of the matches: the first one that appears in the page.

The page is processed in document order, from the top of the HTML downward. If several elements match, find_element returns the first. If none match, it raises NoSuchElementException.

2) find_elements(By.XPATH, ...)

To get the text of all 20 elements (each element's text is a hotel name), use find_elements.

find_elements(By.XPATH, ...) returns every element that matches the expression, as a list.

The Elements panel in the browser's developer tools lists the HTML elements in the same order they appear on the page. find_elements returns its matches in that same document order.
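Selenium needs a live browser, so the sketch below uses Python's standard-library xml.etree.ElementTree as a stand-in. It shows the same "first match" versus "all matches, in document order" behaviour on a small hypothetical fragment of hotel markup; the markup and hotel names are invented. ElementTree's find() plays the role of find_element() and findall() the role of find_elements().

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: one parent div holding one div per hotel.
PAGE = """
<div id="list">
  <div class="hotel"><span class="name">Hotel A</span></div>
  <div class="hotel"><span class="name">Hotel B</span></div>
  <div class="hotel"><span class="name">Hotel C</span></div>
</div>
"""

root = ET.fromstring(PAGE.strip())

# Like find_element(): only the first match in document order.
first = root.find(".//span[@class='name']")
print(first.text)

# Like find_elements(): every match, as a list, in document order.
all_names = [el.text for el in root.findall(".//span[@class='name']")]
print(all_names)

# ElementTree returns None when nothing matches; Selenium's find_element() would raise instead.
missing = root.find(".//span[@class='missing']")
print(missing)
```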

2. Get the elements for all hotel prices on the current page

The expression //p[@class="loginToSee"] also matches 20 elements.

3. Get the elements for all hotel ratings on the current page

The expression //p[@class="score mb5"] also matches 20 elements.
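Each hotel lives in its own div, so there is an alternative to three separate page-wide lists. You can find the hotel containers first and then search inside each one with a relative XPath that starts with `.//`. In Selenium that would be a call such as container.find_element(By.XPATH, './/span[@class="name"]') on a WebElement. The sketch below shows the idea with ElementTree on invented markup. The container class "hotel" is an assumption; the other class names mirror the script's XPaths.

```python
import xml.etree.ElementTree as ET

# Invented markup: the second hotel has no rating element.
PAGE = """
<div id="list">
  <div class="hotel"><span class="name">Hotel A</span><p class="loginToSee">300</p><p class="score mb5">4.7</p></div>
  <div class="hotel"><span class="name">Hotel B</span><p class="loginToSee">120</p></div>
  <div class="hotel"><span class="name">Hotel C</span><p class="loginToSee">260</p><p class="score mb5">4.5</p></div>
</div>
"""

root = ET.fromstring(PAGE.strip())


def text_or_blank(parent, xpath):
    """Return the text of the first match under parent, or '' if there is none."""
    el = parent.find(xpath)
    return el.text if el is not None else ""


# Page-wide list (like a find_elements call): only two ratings for three hotels,
# so pairing by index would give Hotel B the rating that belongs to Hotel C.
flat_scores = [el.text for el in root.findall(".//p[@class='score mb5']")]

# Per-hotel search: each field is looked up inside its own container div.
rows = []
for hotel in root.findall(".//div[@class='hotel']"):
    rows.append((
        text_or_blank(hotel, ".//span[@class='name']"),
        text_or_blank(hotel, ".//p[@class='loginToSee']"),
        text_or_blank(hotel, ".//p[@class='score mb5']"),
    ))

print(flat_scores)
print(rows)
```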

II. Get each hotel's price, rating, and name and write them to a file

Each of these 20 hotels has a price, a rating, and a name.

1. Get each hotel's price, rating, and name

The indented lines inside the for loop run once for every hotel. This is traversal (iteration). Only after the last value has been processed does execution continue with the code after the loop. The indentation marks what is done with each value taken from the list.

The line break after each hotel comes from print(), which ends every call with a newline.
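The script's loop indexes three lists with range(20). That assumes exactly 20 hotels were found and raises IndexError if any list is shorter. The sketch below uses plain strings as stand-ins for the elements' .text values (the sample data is invented). It shows zip() walking the three lists in step and stopping at the shortest one.

```python
# Stand-ins for [el.text for el in total_names], etc. (hypothetical sample data).
names = ["Hotel A", "Hotel B", "Hotel C"]
prices = ["300", "120", "260"]
scores = ["4.7", "4.2"]  # one fewer, e.g. a hotel that shows no rating

lines = []
for name, price, score in zip(names, prices, scores):
    line = name + "  " + price + "  " + score
    print(line)  # print() ends each call with a newline
    lines.append(line)
```

zip() avoids the IndexError, but it cannot tell which hotel is missing a value. Pairing by position is only correct when the three lists line up one-to-one.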

2. Write the obtained data to a file

fs = open("My hotel data.txt", "w", encoding='UTF-8'). UTF-8 handles both Chinese and English text.

r (read mode): for example, reading a local data file. The file must exist locally to be read; if it does not, open() raises FileNotFoundError.

w (write mode): if the file does not exist, it is created and then written; if it exists, it is written directly.

In w mode, opening the file empties it, so the previous contents are overwritten.

III. Code
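The mode rules above can be checked with a short standard-library sketch. It writes into a temporary folder so nothing real is touched; the rows and file names are invented. It uses a with block, which closes the file automatically. For comparison it also shows "a" (append), which adds to the end instead of overwriting.

```python
import os
import tempfile


def write_hotels(path, rows, mode="w"):
    """Write (name, price, score) rows, one hotel per line, two spaces between fields."""
    # The with statement closes the file automatically, even if an error occurs.
    with open(path, mode, encoding="utf-8") as fs:
        for name, price, score in rows:
            fs.write(name + "  " + price + "  " + score + "\n")


def read_text(path):
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


folder = tempfile.mkdtemp()
path = os.path.join(folder, "hotel_data.txt")

write_hotels(path, [("Hotel A", "300", "4.7")])
write_hotels(path, [("广州酒店", "120", "4.2")])  # "w" again: the old content is replaced
after_w = read_text(path)

write_hotels(path, [("Hotel C", "260", "4.5")], mode="a")  # "a" appends instead
after_a = read_text(path)

try:
    read_text(os.path.join(folder, "missing.txt"))  # "r" needs an existing file
    missing_raised = False
except FileNotFoundError:
    missing_raised = True
```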

from selenium.webdriver.common.by import By

from selenium import webdriver
import time

# Open Chrome and establish a session with the browser.
# The driver variable represents this session.
driver = webdriver.Chrome()
driver.get("https://www.elong.com/")  # The next line runs only after the page has (mostly) finished loading.
# get() waits until the page has loaded.
# Sometimes the page has loaded but rendering is still a little slow,
# so an extra 1-second wait helps.
time.sleep(1)

# Locate elements with XPath.
ele = driver.find_element(By.XPATH, '//input[@data-bindid="city"]')  # Locate the destination input box with the expression worked out earlier.
# ele = the element that was found
# Click: clicking the destination input box pops up the city selector.
ele.click()
time.sleep(2)  # Pause for 2 seconds before running the next line.
# The next element to operate on is dynamic: it is not there when the site first loads,
# but appears only after an action. Rendering it takes time, so wait before looking it up.


# Type text: ele.send_keys("text to type")
# Get an attribute: ele.get_attribute("attribute name")
# Get its text: ele.text


# Choose Guangzhou from the popular cities
driver.find_element(By.XPATH, '//li[@data="0|15"]').click()
time.sleep(1)  # Add a wait. Keep sleeps short; 7 or 8 seconds is too long here.
# Without the wait, the script runs too fast and the date is not selected.

# Enter the check-in date
ele = driver.find_element(By.XPATH, '//input[@data-bindid="checkIn"]')
ele.clear()  # Clear the input box before typing the date.
ele.send_keys("2022-05-27")

time.sleep(1)  # Add a wait after each operation.

'''
After a date is entered, the date picker stays open. To close it, click some other
element (pick a fixed element on the page, here the "Destination" label).
Only then move on to the next element; otherwise the date picker covers other elements.
For example, the search button clicked next is hidden behind the date picker, which breaks the click.
So this step is handled according to how this particular page behaves.
'''

# Close the date picker by clicking the "Destination" label.
# text() must match the label exactly as the page displays it (the site shows it in Chinese).
driver.find_element(By.XPATH, '//div[@id="domesticDiv"]//dt[text()="Destination"]').click()

# Enter the check-out date
b = driver.find_element(By.XPATH, '//input[@data-bindid="checkOut"]')
b.clear()
b.send_keys("2022-05-30")  # Type the date
time.sleep(1)
driver.find_element(By.XPATH, '//div[@id="domesticDiv"]//dt[text()="Destination"]').click()  # Close the date picker again.
time.sleep(1)

# a = driver.find_element(By.XPATH, '//input[@data-bindid="allInOne"]')
# a.clear()
# a.send_keys("Joy Gate Hotel (Guangzhou Rongchuang Cultural Tourism City store)")
# time.sleep(1)
# driver.find_element(By.XPATH, '//div[@id="domesticDiv"]//dt[text()="Destination"]').click()

# ======== 2. Click the search button ========
# time.sleep(0.5)
driver.find_element(By.XPATH, '//span[@data-bindid="search"]').click()

# ========== 3. A new page opens; wait for its content to load ==========
time.sleep(7)  # Loading the new content takes a while.

# ================ 4. Get the hotel name, price, and rating ================
# Get the information of the first hotel only
# hotel_name = driver.find_element(By.XPATH, '//span[@class="name"]').text
# hotel_price = driver.find_element(By.XPATH, '//p[@class="loginToSee"]').text
# hotel_review = driver.find_element(By.XPATH, '//p[@class="score mb5"]').text
# print("Hotel information:", hotel_name, hotel_review, hotel_price)


# ====================== 5. Get the names, prices, and ratings of all hotels on the current page
# find_elements(By.XPATH, ...) gets all elements matching the expression. The result is a list of element objects.
# All hotel name elements
total_names = driver.find_elements(By.XPATH, '//span[@class="name"]')  # 20 elements of the same kind.
time.sleep(1)

# Python stores multiple values in containers such as list, dict, tuple, and set.


# All hotel price elements
total_prices = driver.find_elements(By.XPATH, '//p[@class="loginToSee"]')
time.sleep(1)
# All hotel rating elements
total_previews = driver.find_elements(By.XPATH, '//p[@class="score mb5"]')

# Every value has to be taken out of the 3 lists.
# Think of 20 items of clothing in a shop: you look at each one, from the 1st to the 20th. Each look is a "visit".
# This is called traversal (a loop): every member of the collection is visited, from first to last. Here the collection has 20 members.
# 20 hotels: for each hotel, get the name, price, and rating (traversal).
'''
for variable in list:  # Take each member of the list in turn and assign it to the variable.
    What to do with each member:
    for each hotel, get its name, price, and rating.

The indices traversed here are: [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19]
'''


# File operations: My hotel data.txt
# Create a file, write data to it, then close it.
# open() performs file operations.
# When opening the file, specify the mode (write) and the encoding (utf-8).
fs = open("My hotel data.txt", "w", encoding='UTF-8')  # "w" is write mode: a missing file is created; an existing file is written to.
# Only a file name is given, with no path, so Python creates the file in the current working directory.
# "w" mode overwrites the existing contents of the file.
# write() does not add line breaks; write "\n" to start a new line.


# Instead of hard-coding range(20) (indices 0 to 19), loop over as many hotels as were actually found,
# so a page with fewer results does not raise IndexError.
count = min(len(total_names), len(total_prices), len(total_previews))
for index in range(count):
    print(total_names[index].text, total_prices[index].text, total_previews[index].text)  # Name, price, and rating of each hotel.
    fs.write(total_names[index].text + "  ")  # Until the file is closed, you can keep writing to it.
    fs.write(total_prices[index].text + "  ")
    # fs.write(total_prices[index].get_attribute("attribute name") + "    ")  # Get an attribute value
    # fs.write(total_prices[index].get_attribute('class') + "    ")
    fs.write(total_previews[index].text + "\n")

#  Close file 
fs.close()

# The variable after for can have any name; in can be followed by a list or by any other iterable (tuple, string, range, ...).
time.sleep(10)

#########6. More: filter by price first, then look at the ratings.###################
# click() returns None, so its result does not need to be stored in a variable.
# text() must match the label exactly as the page shows it; on the site it is the Chinese label for "Under 150 yuan".
driver.find_element(By.XPATH, '//li[@class="radio fl"]//span[text()="Under 150 yuan"]').click()
time.sleep(3)  # Give the filtered hotel list time to reload.


def save_scores(path):
    # Locate the rating elements again each time, because the list is re-rendered after filtering.
    pingfens = driver.find_elements(By.XPATH, '//p[@class="score mb5"]')
    with open(path, "w", encoding='UTF-8') as fn:
        for score in pingfens:  # Loop over the ratings actually found (there may be fewer than 20).
            print(score.text)
            fn.write(score.text + "\n")


try:
    save_scores("Ratings under 150 yuan.txt")
except Exception:
    # The original error was not recorded. One plausible cause is the list refreshing while it is being read
    # (e.g. a StaleElementReferenceException), so wait briefly and try once more.
    time.sleep(2)
    save_scores("Ratings under 150 yuan.txt")
# Catching the exception works around an error that appeared only after the script had been run several times.

# ======== 7. Close the browser and end the session ========
time.sleep(10)
driver.quit()  # Quit the driver and close all of its windows.

(Screenshot: the script runs successfully.)
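The script relies on fixed time.sleep() calls, which either waste time or turn out too short when the page is slow. Selenium also offers explicit waits, which poll a condition until it holds or a timeout expires. An example is WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.XPATH, '//span[@class="name"]'))), with EC imported from selenium.webdriver.support.expected_conditions. The standard-library sketch below shows only that polling idea, so it runs without a browser. wait_until is a hypothetical helper, not part of Selenium, and the "page" is simulated.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Call condition() until it returns a truthy value, then return that value.

    Raises TimeoutError if timeout seconds pass first. This is a hypothetical helper
    that mimics the idea behind Selenium's WebDriverWait.until.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(interval)


# Simulated page: the hotel list "appears" on the third check.
checks = {"count": 0}


def hotels_loaded():
    checks["count"] += 1
    return ["Hotel A", "Hotel B"] if checks["count"] >= 3 else []


hotels = wait_until(hotels_loaded, timeout=2, interval=0.01)
print(hotels)
```

The wait ends as soon as the condition is met. A slow page therefore gets more time, up to the timeout, and a fast page costs almost none.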

IV. Summary and extension

1. Summary

1. find_elements() finds all matching elements and returns them as a list.

2. Handling a list: iterate over it to take out each value, and create a file for the results.

3. Iterate over a list with a for loop.

4. Write the data to a file.

2. Extension
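If the output is headed for a spreadsheet or report, the standard-library csv module is a possible alternative to joining fields with spaces. It handles separators and quoting, so a hotel name that contains a comma stays in one column. The sketch below uses invented rows, and the file name and header labels are assumptions.

```python
import csv
import os
import tempfile

rows = [("Hotel A, Tianhe", "300", "4.7"), ("Hotel B", "120", "4.2")]  # hypothetical data

path = os.path.join(tempfile.mkdtemp(), "hotels.csv")
# The csv module documentation recommends newline="" when opening files for csv.writer.
with open(path, "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "price", "score"])
    writer.writerows(rows)

with open(path, encoding="utf-8", newline="") as f:
    read_back = list(csv.reader(f))
print(read_back)
```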

Iterate according to the length of the list: learn how the range function works. Reference: list operations.
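A small sketch of iterating by list length follows, using invented stand-in values. range(len(lst)) yields exactly the valid indices. enumerate() is the more idiomatic way to get both the index and the item.

```python
names = ["Hotel A", "Hotel B", "Hotel C"]  # hypothetical stand-ins for element texts

# range(20) yields 0..19, the indices listed in the script's comment; range(len(names)) adapts to the list.
indices = list(range(len(names)))

# Index-based loop sized by the list itself.
by_index = []
for index in range(len(names)):
    by_index.append(str(index + 1) + ". " + names[index])

# enumerate() yields (index, item) pairs; start=1 numbers from 1.
by_enumerate = [str(i) + ". " + name for i, name in enumerate(names, start=1)]

print(indices)
print(by_index == by_enumerate)
```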

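Running fs.write(total_prices[index].get_attribute("attribute name") + "  ") (get an attribute value)

and fs.write(total_prices[index].text + "  ") gave the same results here. In general, .text returns the element's rendered text, whereas get_attribute(name) returns the named property or attribute, so the two agree only when that attribute holds the same text.

(Screenshot: output of the first approach)

(Screenshot: output of the second approach)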

What is covered above has practical uses. For example, when a manager needs some data from a platform, you can use this script to access the company's system and pull the data; sometimes the manager wants a report. This is not useless knowledge; use it when the situation calls for it.

Web crawlers sometimes use this bit of automation knowledge, but crawling is not the same thing as automation. To get good at crawling, you really need to study it in depth.

Copyright notice
This article was written by [QingHan]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/172/202206211535340927.html