Dodging ice cream assassins? Scraping ice cream prices with a crawler
2022-07-30 17:44:00 【m0_54850825】
Requirements Analysis
Summer is so hot that nobody wants to move, and only an air-conditioned room brings any comfort. And of course, ice cream is a must.
However, ice cream is not cheap these days. A certain Aido chocolate ice cream, for example, now retails for 5 yuan, and I remember it costing only 3 yuan. That already felt a little expensive. What I didn't expect was for a friend to tell me today: "I got fooled today. I picked up a medium ice cream at the convenience store and they charged me 16 yuan! I was ambushed by an ice cream assassin!"
A medium ice cream? 16 yuan? Oh my God. I asked him, "If it was that expensive, why didn't you put it back? For that money you could buy more than three of the other one."
My friend was helpless: "I had already taken it, so I had to pay. I was too embarrassed to put it back..."
Alas, that is the price of saving face. What can be done? Is there a way to help my friend avoid picking up an overpriced ice cream next time? Plenty of experts have already published cheat sheets, such as "imported ice cream costs more" or "ice cream with a certain kind of chocolate costs more", but these rules are too complicated and not direct enough. A faster approach is to crawl the ice cream prices directly, so that it is obvious at a glance which ones are expensive.
Implementation Analysis
This requirement is not difficult: crawl the ice cream prices by picking a store and saving the name and price of each ice cream. I quickly found a target page and sent it a request with requests.
But as usual, something unexpected came up: why is the price data missing from the response?

The value after the currency symbol should be the price, but nothing is there. That is strange. Where did the price go? The page clearly shows a price, so why is it missing from the response that requests received? What is going on here?
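One quick way to confirm the diagnosis is to check whether the price element is empty in the raw HTML. The sketch below uses only the standard library and two made-up HTML fragments (`static_html` and `rendered_html` are illustrative, not the real page); the `price-rmb` class name is the one used in the full code of this post, and `PriceCollector` and `extract_prices` are hypothetical helper names.

```python
from html.parser import HTMLParser


class PriceCollector(HTMLParser):
    """Collects the text inside every <span class="price-rmb"> element."""

    def __init__(self):
        super().__init__()
        self._in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples
        if tag == "span" and ("class", "price-rmb") in attrs:
            self._in_price = True
            self.prices.append("")

    def handle_data(self, data):
        if self._in_price:
            self.prices[-1] += data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_price = False


def extract_prices(html):
    collector = PriceCollector()
    collector.feed(html)
    return collector.prices


# Made-up fragments: the first mimics the raw response, the second the rendered page.
static_html = '<em>¥</em><span class="price-rmb"></span>'
rendered_html = '<em>¥</em><span class="price-rmb">16.00</span>'

print(extract_prices(static_html))
print(extract_prices(rendered_html))
```

If every collected string is empty, the prices are most likely filled in later by JavaScript rather than delivered with the initial HTML.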
Well, I had no choice but to go looking for it. It was not hard, and if I guessed correctly, I found the price.

The price is in this request, in a field named p, which presumably stands for price. So what is this request? It is a jQuery file. In other words, the ice cream prices are written into the page by jQuery, which is why they are not visible in the basic response fetched with requests.
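For readers curious what parsing such a file would involve, here is a minimal sketch. It assumes the response is a JSONP-style payload, that is, JSON wrapped in a jQuery callback; the callback name, the `id` field, and the sample values are invented for illustration, and only the `p` field comes from the observation above. `parse_jsonp` is a hypothetical helper name.

```python
import json
import re


def parse_jsonp(payload):
    """Strip the callback wrapper from a JSONP-style payload and decode the JSON."""
    match = re.search(r"^[^(]*\((.*)\)\s*;?\s*$", payload, re.S)
    if match is None:
        raise ValueError("payload does not look like callback(...)")
    return json.loads(match.group(1))


# Invented sample payload; the real response may be shaped differently.
sample = 'jQuery123456([{"id": "J_1001", "p": "16.00"}, {"id": "J_1002", "p": "5.00"}]);'

prices = {item["id"]: float(item["p"]) for item in parse_jsonp(sample)}
print(prices)
```

Decoding the payload is the easy part. The hard part is working out which URL and parameters produce it for a given page.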
Okay, that settles it: it is time to use selenium again. Some readers may object: it is only a jQuery file, so why not crawl that file and parse it? Why does it have to be selenium?
That is a fair point, but working out which jQuery file belongs to a given page may mean reverse-engineering an encrypted parameter along the way, and that clearly takes too much time. Unless there is a special requirement, just use selenium. It is simple to use: open a browser, load the page, and read the page's code from driver.page_source, which can then be handled just like the response of a normal requests call.
Full code demo
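from base64 import b64decode

from lxml import etree
from selenium import webdriver

# The target URL is stored base64-encoded and decoded at runtime
url = b64decode("aHR0cHM6Ly93d3cuamQuY29tL3BoYi8xMjIxODU1MTY0MzIxMmY1MDE5NTkuaHRtbA==").decode()

# Open a browser, load the page, and parse the rendered source
driver = webdriver.Chrome()
driver.get(url)
html = etree.HTML(driver.page_source)
driver.quit()

# Extract the name, price, and comment text of each ice cream
i_name = html.xpath("//div[@class='detail']/a/text()")
i_price = html.xpath("//span[@class='price-rmb']/text()")
i_comment = html.xpath("//div[@class='evaluate-detail']/a/text()")

# zip stops at the shortest list, so a missing field cannot raise IndexError
text = ""
for name, price, comment in zip(i_name, i_price, i_comment):
    text += "Name: " + name + "\n"
    text += "Price: " + price + " yuan\n"
    text += "Comments: " + comment + "\n"
print(text)

Running the program prints the name, price, and comments field of each ice cream.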

In general, when a page is rendered dynamically or needs to execute JavaScript, use a dynamic-rendering tool such as selenium directly, unless you have a special requirement such as fast execution or are willing to pay the high cost of reworking the program.
Note also that this program cannot compute the unit price of each ice cream. The chosen page is a general listing, so the quantity in each pack is hard to extract. To solve this, switch to a better product page.
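As a rough illustration of why the quantity is hard to extract, the sketch below tries to guess a pack count from the product name with a regular expression. The pattern, the example names, and the `unit_price` helper are all assumptions made for illustration. Real listings use many more formats, so this is a best-effort heuristic and not a reliable solution.

```python
import re

# Heuristic pack-count patterns: "*6", "x6", "×6", or "6支" / "6只" / "6 pcs".
COUNT_PATTERN = re.compile(
    r"(?:[*x×]\s*(\d+))|(?:(\d+)\s*(?:支|只|pcs|pieces))",
    re.IGNORECASE,
)


def unit_price(name, price):
    """Return price per piece, or None when no pack count can be guessed."""
    match = COUNT_PATTERN.search(name)
    if match is None:
        return None
    count = int(match.group(1) or match.group(2))
    if count == 0:
        return None
    return round(float(price) / count, 2)


# Invented product names, not taken from the crawled page.
print(unit_price("Chocolate ice cream 70g*6", "30.00"))
print(unit_price("Vanilla cone 8支", "24.00"))
print(unit_price("Mystery ice cream", "16.00"))
```

Returning None when nothing matches keeps uncertain items visible, so they can be checked by hand and are not silently treated as single pieces.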