Dodging ice cream assassins? Crawling ice cream prices with a crawler
2022-07-30 17:44:00 【m0_54850825】
Requirements Analysis
Summer is so hot that nobody wants to move, and only an air-conditioned room brings a little comfort. Of course, ice cream is a must too.
However, ice cream is not cheap these days. For example, a certain Aido chocolate ice cream now retails for 5 yuan, and I remember it being only 3 yuan before. Either way, it is a little expensive. What I didn't expect was that a friend suddenly said to me today, "I was fooled today. I picked up a medium ice cream at the convenience store, and I didn't expect them to charge me 16 yuan! I was attacked by an ice cream assassin!"
Medium ice cream? 16 yuan? Oh my God. I asked him, "If it was that expensive, why didn't you put it back? You could buy more than three ordinary ones for that price."
My friend was helpless: "I had already taken it to the counter and had to pay. I was too embarrassed to put it back..."
Alas, this is a matter of life and death. What should I do? Is there any way to help my friend avoid high-priced ice cream next time? Of course, plenty of experts have already written up rules of thumb, such as imported ice cream being more expensive, or ice cream with a certain kind of chocolate being more expensive. But these rules are too complicated and not direct enough. We should take a faster route: crawl the ice cream prices directly, so that we can see which ice cream is more expensive.
Implementation Analysis
This requirement is not very difficult: crawl the ice cream prices by picking a store and saving the name and price of each ice cream. I easily found a target and sent a request to it with requests.
As luck would have it, something unexpected happened: the price data is missing from the response.
In the response, the spot after the currency symbol is where the price should be, but it is empty. This is really strange. The price is clearly shown on the page, so why is it missing from the response that requests received? What is going on here?
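A quick way to confirm this kind of problem is to parse the static HTML and look at the price elements. The sketch below uses only the standard library and a made-up sample response; the class name `price-rmb` comes from the full code later in this post, while `SAMPLE_STATIC_HTML`, `PriceSpanCollector`, and `static_prices` are hypothetical names for illustration.

```python
from html.parser import HTMLParser


class PriceSpanCollector(HTMLParser):
    """Collect the text inside every <span class="price-rmb"> element."""

    def __init__(self):
        super().__init__()
        self._in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "price-rmb" in classes:
            self._in_price = True
            self.prices.append("")  # one slot per price span, even if empty

    def handle_data(self, data):
        if self._in_price:
            self.prices[-1] += data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_price = False


def static_prices(html_text):
    """Return the text of each price span found in html_text."""
    collector = PriceSpanCollector()
    collector.feed(html_text)
    return collector.prices


# Hypothetical static response: the price spans exist but are empty,
# because a script fills them in after the page loads.
SAMPLE_STATIC_HTML = """
<div class="detail"><a>Chocolate ice cream</a></div>
<span class="price-rmb"></span>
<div class="detail"><a>Vanilla ice cream</a></div>
<span class="price-rmb"></span>
"""

prices = static_prices(SAMPLE_STATIC_HTML)
missing = [p for p in prices if not p]
print(f"{len(missing)} of {len(prices)} price spans are empty")
```

If every price span comes back empty while the browser shows real numbers, the prices are almost certainly filled in by a script after the page loads.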
So the only option is to go looking for it, which is not hard. If I guessed correctly, I have found the price.
The price is in a separate request. Each entry has a field named p, which presumably stands for price. So what is this request? It is a jquery file, which means the ice cream prices are written into the page through jquery. That is why they are not visible in a basic request sent by requests.
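If that file is a JSONP-style response (JSON wrapped in a jquery callback), reading the prices out of it could look roughly like the sketch below. This is only an assumption about the format: the callback name, the `id` field, and the sample values are invented for illustration, and only the `p` key is taken from what I saw in the request.

```python
import json
import re


def parse_jsonp(payload):
    """Strip a `callback(...)` wrapper and decode the JSON inside it."""
    match = re.fullmatch(r"\s*[\w$.]+\s*\((.*)\)\s*;?\s*", payload, flags=re.DOTALL)
    if match is None:
        raise ValueError("payload is not wrapped in a callback")
    return json.loads(match.group(1))


# Hypothetical payload; only the "p" key is taken from the article.
SAMPLE_PAYLOAD = 'jQuery123456([{"id": "J_1", "p": "16.00"}, {"id": "J_2", "p": "5.00"}]);'

# Map each (hypothetical) product id to its price as a number.
prices = {item["id"]: float(item["p"]) for item in parse_jsonp(SAMPLE_PAYLOAD)}
print(prices)
```

Parsing the payload is the easy part. The hard part is working out which request URL and parameters belong to a given page.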
Okay, that basically settles it. No need to think further: it is time to use selenium again. Some readers may wonder: it is just a jquery file, so why not crawl that file and parse it? Why do we have to use selenium?
That is a fair point. But to work out which jquery file belongs to a given page, you may have to get past an encrypted parameter along the way, and that clearly takes too much time. Unless there is a special requirement, it is simpler to use selenium directly. It is also very easy to use: open a browser, load the page, and read the rendered page source through driver.page_source, which can then be used like the response from an ordinary requests call.
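One detail worth handling is that the prices are written into the page after it loads, so reading driver.page_source immediately could in principle still return empty prices. A minimal sketch with an explicit wait follows, assuming Selenium 4 and a recent Chrome. The helper names `chrome_arguments` and `fetch_rendered_html`, the window size, and the `--headless=new` switch are my own choices for illustration, not part of the original program.

```python
def chrome_arguments(headless=True):
    """Return command-line switches for Chrome (illustrative defaults)."""
    args = ["--window-size=1280,900"]
    if headless:
        args.append("--headless=new")  # assumes a recent Chrome build
    return args


def fetch_rendered_html(url, wait_xpath, timeout=10):
    """Open url in Chrome, wait until wait_xpath has text, return the rendered HTML."""
    # Imported here so the helper above works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    for arg in chrome_arguments():
        options.add_argument(arg)

    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until at least one matching element contains non-empty text,
        # i.e. until the script has actually filled in a price.
        WebDriverWait(driver, timeout).until(
            lambda d: any(
                e.text.strip() for e in d.find_elements(By.XPATH, wait_xpath)
            )
        )
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even on timeout
```

A call would look like fetch_rendered_html(url, "//span[@class='price-rmb']"), and the returned string can be passed to etree.HTML just like a requests response body.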
Full code demo
from base64 import b64decode

from lxml import etree
from selenium import webdriver

# The target URL is stored base64-encoded; decode it before use.
url = b64decode("aHR0cHM6Ly93d3cuamQuY29tL3BoYi8xMjIxODU1MTY0MzIxMmY1MDE5NTkuaHRtbA==").decode()

# Let a real browser render the page so the script-filled prices are present.
driver = webdriver.Chrome()
try:
    driver.get(url)
    html = etree.HTML(driver.page_source)
finally:
    driver.quit()

i_name = html.xpath("//div[@class='detail']/a/text()")
i_price = html.xpath("//span[@class='price-rmb']/text()")
i_comment = html.xpath("//div[@class='evaluate-detail']/a/text()")

# zip stops at the shortest list, so a missing field cannot raise IndexError.
text = ""
for name, price, comment in zip(i_name, i_price, i_comment):
    text += "Name: " + name + "\n"
    text += "Price: " + price + " yuan\n"
    text += "Comments: " + comment + "\n"
print(text)

The result of running the program is as follows

In general, when a page needs dynamic rendering or needs to execute js, it is recommended to use a dynamic-rendering tool such as selenium directly, unless you have special requirements, such as fast execution, or you are willing to pay a high cost to upgrade the program.
It can also be seen that this program cannot directly calculate the unit price of the ice cream: the selected page is a general listing, so the quantity in each product is hard to extract. To solve this, it is better to switch to a more suitable product page.
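For pages where the product name does include a pack count, a rough unit price could be estimated with a regular expression. This is only a sketch under a strong assumption, namely that counts appear in forms such as "65g*4支" or "x6". The pattern, the function name `unit_price`, and the sample names are all hypothetical, and real product names will need more careful rules.

```python
import re

# Assumed forms of a pack count in a product name, e.g. "65g*4支" or "x6".
COUNT_PATTERN = re.compile(r"[*xX×](\d+)|(\d+)\s*(?:支|个|盒|杯)")


def unit_price(name, price):
    """Return the price per piece, or None when no count is found in the name."""
    match = COUNT_PATTERN.search(name)
    if match is None:
        return None
    count = int(match.group(1) or match.group(2))
    if count == 0:
        return None
    return round(price / count, 2)


# Hypothetical (name, price) pairs in the shape the crawler produces.
samples = [
    ("Chocolate ice cream 65g*4支", 40.0),
    ("Vanilla cone 6支", 30.0),
    ("Mystery ice cream", 16.0),
]
for name, price in samples:
    print(name, "->", unit_price(name, price))
```

Names without a recognizable count return None instead of a guess, so they can be listed separately and checked by hand.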