Dodging ice cream assassins? Scraping ice cream prices with a crawler
2022-07-30 17:44:00 【m0_54850825】
Requirements Analysis
Summer is so hot that nobody wants to move, and only an air-conditioned room brings a little comfort. And of course, ice cream is a must.
Ice cream is not cheap these days, though. A certain Aido chocolate ice cream now retails for 5 yuan, while I remember it costing only 3 yuan, so the price already feels a bit high. What I did not expect was for a friend to tell me today: "I got fooled. I picked up a medium ice cream at the convenience store and they charged me 16 yuan! I was ambushed by an ice cream assassin!"
A medium ice cream for 16 yuan? Oh my God. I asked him, "If it was that expensive, why didn't you put it back? For that money you could buy more than three of the other kind."
My friend was helpless: "I had already taken it to the counter, so I had to pay. I was too embarrassed to put it back..."
This sounds like a matter of life and death. What can be done? Is there a way to help my friend avoid overpriced ice cream next time? Plenty of experts have already written cheat sheets, for example that imported ice cream costs more, or that ice cream with a certain kind of chocolate costs more. But these rules are complicated and indirect. A faster method is to scrape the prices directly, so that we can see at a glance which ice cream is expensive.
Implementation Analysis
The requirement is not hard: scrape the ice cream prices by picking a store and saving the name and price of each ice cream. I quickly found a target and sent a request with requests.
As usual, something unexpected happened: why is the data missing from the response?
The value after the currency symbol should be the price, but nothing is there. That is strange. The price is clearly shown on the page, so why is it missing from the response that requests receives?
The only option was to look for it, which was not difficult. If I guessed correctly, I have found the price.
The price is in a separate request. It contains a field named p, which is probably the price. So what is this request? It is a jQuery file, which means the price is written into the page by JavaScript through jQuery, and so it is not visible in the plain response that requests receives.
That settles it: it is time to use selenium again. Some readers may wonder why. If it is just a jQuery file, why not fetch that file and parse it instead of using selenium?
That reasoning is sound, but to work out which jQuery file belongs to a given page you may have to get past encrypted request parameters, and that takes a lot of time. Unless you have special requirements, it is simpler to use selenium directly. Usage is simple: open a browser, load the page, and read the rendered HTML from driver.page_source, which can then be parsed just like the response of a normal requests call.
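One way to confirm this diagnosis is to check whether the price elements in the downloaded HTML are empty. The sketch below uses only the standard library and assumes the price sits in a span with class price-rmb, as in the XPath used later in this post. The two sample strings are hypothetical stand-ins for the static response and the browser-rendered page, not real data from the site.

```python
from html.parser import HTMLParser


class PriceSpanCollector(HTMLParser):
    """Collect the text inside every <span class="price-rmb"> element."""

    def __init__(self):
        super().__init__()
        self._in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "price-rmb" in classes:
            self._in_price = True
            self.prices.append("")  # one entry per price span, even if empty

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_price = False

    def handle_data(self, data):
        if self._in_price:
            self.prices[-1] += data.strip()


def static_prices(html_text):
    """Return the text of each price span found in html_text."""
    collector = PriceSpanCollector()
    collector.feed(html_text)
    return collector.prices


# Hypothetical samples: the same markup before and after JavaScript runs.
STATIC_SAMPLE = '<div class="detail"><a>Ice cream A</a></div><span class="price-rmb"></span>'
RENDERED_SAMPLE = '<div class="detail"><a>Ice cream A</a></div><span class="price-rmb">16.00</span>'

print(static_prices(STATIC_SAMPLE))
print(static_prices(RENDERED_SAMPLE))
```

If every entry returned for the requests response is an empty string while the browser shows prices, the prices are most likely filled in by JavaScript after the page loads.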
Full code demo
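from base64 import b64decode

from lxml import etree
from selenium import webdriver

# The target URL is stored base64-encoded.
url = b64decode("aHR0cHM6Ly93d3cuamQuY29tL3BoYi8xMjIxODU1MTY0MzIxMmY1MDE5NTkuaHRtbA==").decode()

# Let a real browser render the page so the JavaScript-injected prices are present.
driver = webdriver.Chrome()
driver.get(url)
html = etree.HTML(driver.page_source)
driver.quit()

# Extract names, prices, and comment counts from the rendered HTML.
i_name = html.xpath("//div[@class='detail']/a/text()")
i_price = html.xpath("//span[@class='price-rmb']/text()")
i_comment = html.xpath("//div[@class='evaluate-detail']/a/text()")

text = ""
# zip() stops at the shortest list, so a missing field cannot raise IndexError.
for name, price, comment in zip(i_name, i_price, i_comment):
    text += "Name: " + name + "\n"
    text += "Price: " + price + " yuan\n"
    text += "Comments: " + comment + "\n"
print(text)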
The result of running the program is as follows
In general, when a page needs dynamic rendering or has to execute JavaScript, and you have no special requirements such as fast execution, and you are not willing to pay the high cost of reverse-engineering the requests, it is recommended to use a dynamic rendering tool such as selenium directly.
Note also that this program cannot calculate the unit price of each ice cream. The chosen page is a general listing page, and it is hard to extract the quantity per pack from it. To solve this, it would be better to switch to a page with more detailed product information.
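As a rough illustration of why the quantity is hard to extract, the sketch below tries to guess a pack count from the product name with two regular expressions and divides the price by it. The patterns, the function name unit_price, and the sample names are all hypothetical. Real titles vary widely, so this will miss many cases and can misread others (for example, any name containing "x" followed by a number).

```python
import re

# Hypothetical patterns for a pack count in a product title,
# e.g. "75g*6" (multiplier) or "4支" (count followed by a unit word).
COUNT_PATTERNS = [
    re.compile(r"[*xX×]\s*(\d+)"),
    re.compile(r"(\d+)\s*(?:支|个|杯|盒|pcs|pieces)"),
]


def unit_price(name, price_text):
    """Return the price per piece, or None when no pack count is found.

    Assumes price_text is a plain number such as "16.00".
    """
    for pattern in COUNT_PATTERNS:
        match = pattern.search(name)
        if match and int(match.group(1)) > 0:
            return round(float(price_text) / int(match.group(1)), 2)
    return None


print(unit_price("Chocolate ice cream 75g*6", "30.00"))
print(unit_price("Plain ice cream", "16"))
```

Returning None instead of guessing keeps listings without a recognizable count out of the unit price comparison.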