Dodging ice cream assassins? Scraping ice cream prices with a crawler
2022-07-30 17:44:00 【m0_54850825】
Requirements Analysis
The summer heat is so intense that nobody wants to move, and only an air-conditioned room brings a little comfort. And of course, ice cream is a must.
However, ice cream isn't cheap anymore. A certain Aido chocolate ice cream now retails for 5 yuan, when I remember it costing only 3 yuan not long ago. Prices have definitely crept up. Still, I didn't expect what a friend told me today: "I got fooled today. I grabbed a medium-sized ice cream at the convenience store, and they charged me 16 yuan! I was ambushed by an ice cream assassin!"
A medium-sized ice cream for 16 yuan? Oh my God. I asked him, "That's so expensive, why didn't you put it back? For that price you could buy more than three regular ones."
My friend was helpless: "I'd already picked it up, so I had to pay. I was too embarrassed to put it back..."
Alas, this is no laughing matter. What can be done? Is there a way to help my friends avoid overpriced ice cream next time? Plenty of seasoned experts have already shared tips, such as "imported ice cream costs more" or "ice cream with a certain kind of chocolate costs more", but these rules are too complicated and not direct enough. We should take a faster route: scrape the ice cream prices directly, so you can see at a glance which ones are expensive.
Implementation Analysis
The requirement itself isn't difficult: we just need to scrape ice cream prices, that is, find a store page and save each ice cream's name and price. I quickly found a target page and sent a request with requests.
Unsurprisingly, something surprising happened: the data wasn't in the response. It looked like this:

The number after the currency symbol should be the price, yet the field is empty. That's strange: where did the price go? The page clearly shows a price in the browser, so why is it missing from the requests response? What's going on here?
Well, the only option is to track it down. It isn't hard, and if I guessed correctly, I've found where the price comes from:

The price is in this request. The field p above it most likely stands for price. So what is this request? It's a jQuery file, which means the ice cream prices are written into the page by jQuery after the page loads. That's why they don't show up in the basic request sent by requests.
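If you did go after that jQuery response directly, its body could be parsed roughly as in the minimal sketch below. It assumes the response is a callback-wrapped (JSONP-style) array such as jQuery123([{"id": "J_1", "p": "5.00"}]); the callback name, the id field and the overall shape are assumptions, and only the p field for price comes from the request shown above. It also does nothing about obtaining the request URL in the first place.

```python
import json
import re

# Assumed wrapper shape: someCallback([...]);  (callback name is arbitrary)
_JSONP_RE = re.compile(r"^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$", re.S)


def parse_jsonp_prices(body):
    """Map each item's (assumed) "id" to its "p" (price) as a float."""
    match = _JSONP_RE.match(body)
    if match is None:
        raise ValueError("body is not wrapped in a JavaScript callback")
    items = json.loads(match.group(1))  # the JSON array inside the parentheses
    return {item["id"]: float(item["p"]) for item in items}


if __name__ == "__main__":
    # Hypothetical response body, not captured from the real site
    sample = 'jQuery123([{"id": "J_1", "p": "5.00"}, {"id": "J_2", "p": "16.00"}]);'
    print(parse_jsonp_prices(sample))  # {'J_1': 5.0, 'J_2': 16.0}
```

The regex only strips the callback wrapper; everything inside is handed to json.loads, so a change in the payload format fails loudly instead of returning wrong prices.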
Okay, that settles it, no need to overthink: it's time to use selenium again. Some readers may not quite follow: isn't it just a jQuery file? Why not crawl that file and parse it? Why do we have to use selenium?
That's a reasonable thought, but to determine which jQuery request belongs to a given page, you may have to get past an encrypted parameter along the way, which can take far too much time. Unless you have a special requirement, just use selenium directly. It's easy to use: open a browser, load the page, and read the rendered HTML through driver.page_source, which you can then parse just like the response of an ordinary requests call.
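Because driver.page_source is just an HTML string, the parsing step doesn't care whether the HTML came from selenium or from requests. The sketch below is a rough stand-in for the three XPath queries in the full code, written with Python's built-in html.parser instead of lxml so it runs without extra packages. It is looser than XPath (it only checks the innermost enclosing div, not a strict direct-child /a step), and the sample markup is hypothetical.

```python
from html.parser import HTMLParser


class ProductParser(HTMLParser):
    """Rough stdlib stand-in for the three XPath queries in the full code:
    //div[@class='detail']/a, //span[@class='price-rmb'], //div[@class='evaluate-detail']/a
    """

    def __init__(self):
        super().__init__()
        self.names, self.prices, self.comments = [], [], []
        self._divs = []          # class attribute of each currently open <div>
        self._target = None      # list that is collecting text right now
        self._target_tag = None  # tag whose end closes the current field

    def _start(self, bucket, tag):
        self._target, self._target_tag = bucket, tag
        bucket.append("")

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class") or ""
        if tag == "div":
            self._divs.append(cls)
        elif tag == "a" and self._divs:
            if self._divs[-1] == "detail":
                self._start(self.names, tag)
            elif self._divs[-1] == "evaluate-detail":
                self._start(self.comments, tag)
        elif tag == "span" and cls == "price-rmb":
            self._start(self.prices, tag)

    def handle_endtag(self, tag):
        if tag == "div" and self._divs:
            self._divs.pop()
        if tag == self._target_tag:
            self._target[-1] = self._target[-1].strip()  # tidy surrounding whitespace
            self._target = self._target_tag = None

    def handle_data(self, data):
        if self._target is not None:
            self._target[-1] += data


def parse_products(html_text):
    """Same call for requests' response.text or selenium's driver.page_source."""
    parser = ProductParser()
    parser.feed(html_text)
    parser.close()
    return list(zip(parser.names, parser.prices, parser.comments))


if __name__ == "__main__":
    # Hypothetical static markup: the price span is present but still empty
    static_html = (
        '<div class="detail"><a>Ice cream A</a></div>'
        '<span class="price-rmb"></span>'
        '<div class="evaluate-detail"><a>2000+</a></div>'
    )
    print(parse_products(static_html))  # [('Ice cream A', '', '2000+')]
```

Fed the static HTML that requests returns, the price column comes back empty, which is exactly the symptom described above; fed page_source from the browser after jQuery has run, the same function should pick up the filled-in prices.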
Full code demo
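from selenium import webdriver
from lxml import etree
from base64 import b64decode

# The target URL is base64-encoded in this post; decode it before use
url = b64decode("aHR0cHM6Ly93d3cuamQuY29tL3BoYi8xMjIxODU1MTY0MzIxMmY1MDE5NTkuaHRtbA==").decode()

driver = webdriver.Chrome()  # opens a real Chrome window
try:
    driver.get(url)
    # page_source is the DOM as the browser currently has it, including JS-inserted content
    html = etree.HTML(driver.page_source)
finally:
    driver.quit()  # always close the browser, even if loading fails

i_name = html.xpath("//div[@class='detail']/a/text()")
i_price = html.xpath("//span[@class='price-rmb']/text()")
i_comment = html.xpath("//div[@class='evaluate-detail']/a/text()")

text = ""
# zip stops at the shortest list, so a missing field cannot raise IndexError
for name, price, comment in zip(i_name, i_price, i_comment):
    text += "Name: " + name + "\n"
    text += "Price: " + price + " yuan\n"
    text += "Comments: " + comment + "\n\n"
print(text)

The output of the program looks like this: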

In general, when you run into a page that has to be rendered dynamically or needs to execute JavaScript, it is recommended to use a dynamic rendering tool such as selenium directly, unless you have special requirements such as fast execution, or are willing to invest a lot of effort in optimizing the program.
Also note that this program cannot directly compute the per-piece price of each ice cream. The chosen page is a general listing, and it is hard to extract how many pieces each product contains. To solve that, it's better to switch to a more detailed product page.
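If you still want a rough per-piece price from a listing page, one heuristic is to pull the pack size out of the product title with a regular expression. This is a sketch that is not part of the original program: the title patterns (for example *6, x6, 6支, 10个) are assumptions about how listings are written, and any title that matches none of them falls back to a count of 1.

```python
import re

# Hypothetical title patterns for pack size: "*6", "x6", "×6", "6支", "10个", "2杯".
_COUNT_RE = re.compile(r"[xX×*](\d+)|(\d+)\s*(?:支|个|杯)")


def pack_count(title):
    """Guess how many pieces a listing holds; fall back to 1 if no pattern matches."""
    match = _COUNT_RE.search(title)
    if match is None:
        return 1
    # guard against a 0 count so unit_price never divides by zero
    return max(int(match.group(1) or match.group(2)), 1)


def unit_price(title, price):
    """Per-piece price rounded to 2 decimals; price may be a string such as "30.00"."""
    return round(float(price) / pack_count(title), 2)


if __name__ == "__main__":
    # Hypothetical listing titles, not scraped data
    print(pack_count("Chocolate ice cream 75g*6支"))           # 6
    print(unit_price("Chocolate ice cream 75g*6支", "30.00"))  # 5.0
    print(pack_count("Single vanilla bar 75g"))                # 1
```

Falling back to 1 means an unrecognized title simply reports the full listing price, so the heuristic can underestimate how cheap a multipack is but never invents a count.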