spark: Top 10 hot product categories (case study)
2022-08-02 08:28:00 【一个人的牛牛】
Introduction
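A category is a classification of products. Large e-commerce sites usually organize categories in several levels, typically three; in this project there is only one level.
Companies define "hot" differently. Here, hot categories are ranked by each category's click ----> order ----> payment counts: rank by click count first (more clicks ranks higher); if click counts are equal, compare order counts; if order counts are also equal, compare payment counts.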
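This ranking rule is exactly lexicographic order on a (clicks, orders, pays) tuple, so Scala's built-in tuple `Ordering` can express it directly. A minimal sketch; the category IDs and counts below are made up for illustration and are not taken from the dataset:

```scala
object RankingRuleDemo {
  // Hypothetical counts per category: (clicks, orders, pays)
  val counts: Seq[(String, (Int, Int, Int))] = Seq(
    ("A", (100, 20, 5)),
    ("B", (100, 25, 3)),
    ("C", (120, 10, 1)),
    ("D", (100, 25, 4))
  )

  // The Tuple3 ordering compares clicks first, then orders, then pays;
  // reversing it gives descending ("hottest first") order.
  val ranked: Seq[String] =
    counts.sortBy(_._2)(Ordering[(Int, Int, Int)].reverse).map(_._1)

  def main(args: Array[String]): Unit =
    println(ranked.mkString(", ")) // C, D, B, A
}
```

C wins on clicks; A, B and D tie on clicks, B and D tie on orders, and D beats B on payments.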
Data preparation
Download the data (free download):
140,000 user-behavior records: search, click, order, payment (spark document resource, CSDN Download)
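Data description:

Each line is one user action, with fields separated by `_`:
date_userID_sessionID_pageID_actionTime_search_click(categoryID, productID)_order(categoryIDs, productIDs)_pay(categoryIDs, productIDs)_cityID
Click, order and payment each occupy two fields (category ID and product ID), so after splitting on `_` the click category is at index 6, the ordered categories at index 8 and the paid categories at index 10.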
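A sketch of that field layout. The sample line is hypothetical and only mimics the format described above, using the conventions the implementation checks for: `-1` means "not a click", `null` means "no order/payment", and several category IDs are comma-separated:

```scala
object RecordLayoutDemo {
  // Hypothetical record: date, user ID, session ID, page ID, action time, search keyword,
  // click category/product, order categories/products, pay categories/products, city ID
  val sample = "2019-07-17_95_s1_37_2019-07-17 00:00:02_null_-1_-1_15,1,20_6,9_null_null_3"

  val fields: Array[String] = sample.split("_")

  val clickCategory: String = fields(6) // "-1": this action is not a click
  val orderCategories: Seq[String] =    // comma-separated IDs, or "null"
    if (fields(8) == "null") Seq.empty else fields(8).split(",").toSeq
  val payCategories: Seq[String] =
    if (fields(10) == "null") Seq.empty else fields(10).split(",").toSeq

  def main(args: Array[String]): Unit = {
    println(fields.length)                 // 13
    println(orderCategories.mkString(",")) // 15,1,20
  }
}
```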
Implementation
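First count, separately for each category, the number of clicks, orders and payments:
(category, total clicks) (category, total orders) (category, total payments)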
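Each of the three counts is the same pipeline: filter out records without that action, (flat)map to `(categoryId, 1)` pairs, then sum per key, which is what `reduceByKey(_ + _)` does in Spark. The same shape on an in-memory Scala collection, as a single-machine sketch; the `countPerCategory` helper and the sample field values are hypothetical:

```scala
object CountPerCategoryDemo {
  // Hypothetical helper: takes the values of one field (e.g. index 8, order categories)
  // across many records; "null" marks "no action", IDs may be comma-separated.
  def countPerCategory(fieldValues: Seq[String]): Map[String, Int] =
    fieldValues
      .filter(_ != "null")                            // same role as rdd.filter(...)
      .flatMap(_.split(",").toSeq.map(id => (id, 1))) // one (id, 1) pair per category
      .groupBy(_._1)                                  // group pairs by category ID
      .map { case (id, pairs) => id -> pairs.map(_._2).sum } // like reduceByKey(_ + _)

  def main(args: Array[String]): Unit = {
    val orderField = Seq("15,1,20", "null", "1", "20,1")
    println(countPerCategory(orderField).toSeq.sorted.mkString(" ")) // (1,3) (15,1) (20,2)
  }
}
```

In Spark, `reduceByKey` also pre-aggregates within each partition before shuffling, which a local `groupBy` has no need for.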
```scala
import org.apache.spark.{SparkConf, SparkContext}

object TopOne {
  def main(args: Array[String]): Unit = {
    // Create the Spark environment (local mode, all cores)
    val sparkConf = new SparkConf().setMaster("local[*]").setAppName("TOP")
    val sc = new SparkContext(sparkConf)

    // Top 10 hot categories
    // Read the log file; it is scanned three times below, so cache it
    val rdd = sc.textFile("datas/action.txt")
    rdd.cache()

    // 1. Click count per category
    // Keep only click records (field 6 is "-1" when the action is not a click)
    val clickRDD = rdd.filter(action => action.split("_")(6) != "-1")
    // (categoryId, 1) -> (categoryId, clickCount)
    val clickCountRDD = clickRDD.map { action =>
      val datas = action.split("_")
      (datas(6), 1)
    }.reduceByKey(_ + _)
    // Debug: clickCountRDD.collect().foreach(println)

    // 2. Order count per category
    // Keep only order records (field 8 is "null" when the action is not an order)
    val orderRDD = rdd.filter(action => action.split("_")(8) != "null")
    // One order can cover several categories (e.g. "15,1,20"), so emit one pair per category
    val orderCountRDD = orderRDD.flatMap { action =>
      val cids = action.split("_")(8).split(",")
      cids.map(id => (id, 1))
    }.reduceByKey(_ + _)
    // Debug: orderCountRDD.collect().foreach(println)

    // 3. Payment count per category (field 10, same format as field 8)
    val payRDD = rdd.filter(action => action.split("_")(10) != "null")
    val payCountRDD = payRDD.flatMap { action =>
      val cids = action.split("_")(10).split(",")
      cids.map(id => (id, 1))
    }.reduceByKey(_ + _)
    // Debug: payCountRDD.collect().foreach(println)

    // 4. Merge into (categoryId, (clicks, orders, pays))
    // cogroup gives one Iterable per RDD for every category ID;
    // each holds at most one count (already reduced), and an empty one means 0
    val cogroupRDD = clickCountRDD.cogroup(orderCountRDD, payCountRDD)
    val cogroupRDD2 = cogroupRDD.mapValues {
      case (clickIter, orderIter, payIter) =>
        (clickIter.headOption.getOrElse(0),
         orderIter.headOption.getOrElse(0),
         payIter.headOption.getOrElse(0))
    }

    // 5. Sort descending: the (Int, Int, Int) ordering compares clicks first,
    //    then orders, then pays, which matches the ranking rule
    val resultRDD = cogroupRDD2.sortBy(_._2, ascending = false).take(10)

    // Print the top 10 on the driver
    resultRDD.foreach(println)

    // Shut down the environment
    sc.stop()
  }
}
```
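Because every non-empty `cogroup` group holds exactly one already-reduced count and an empty group means zero, steps 4 and 5 amount to a full outer join that defaults missing counts to 0, followed by a descending sort and `take(10)`. A single-machine sketch of that merge with plain Scala maps; the `mergeAndRank` name and the sample maps are hypothetical:

```scala
object MergeAndRankDemo {
  // Hypothetical helper mirroring cogroup + mapValues + sortBy + take:
  // every category seen in any map gets (clicks, orders, pays), missing counts become 0.
  def mergeAndRank(
      clicks: Map[String, Int],
      orders: Map[String, Int],
      pays: Map[String, Int],
      n: Int
  ): Seq[(String, (Int, Int, Int))] = {
    val allIds = clicks.keySet ++ orders.keySet ++ pays.keySet // union of keys, like cogroup
    allIds.toSeq
      .map(id => (id, (clicks.getOrElse(id, 0), orders.getOrElse(id, 0), pays.getOrElse(id, 0))))
      .sortBy(_._2)(Ordering[(Int, Int, Int)].reverse) // descending: clicks, then orders, then pays
      .take(n)
  }

  def main(args: Array[String]): Unit = {
    val clicks = Map("1" -> 50, "2" -> 50, "3" -> 10)
    val orders = Map("1" -> 5, "2" -> 7, "4" -> 3)
    val pays   = Map("2" -> 2, "4" -> 1)
    mergeAndRank(clicks, orders, pays, n = 10).foreach(println)
  }
}
```

Category "4" has orders but no clicks, so it still appears with a click count of 0, just as it would after `cogroup`.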
This article is a study note.