spark filter
2022-07-31 09:02:00 【JAVA becomes a god】
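// Filter out the records you don't need
rdd.filter(x => {
  val arr = x.split(",")
  val jg = arr(3)   // price field
  // Drop the record when the price is empty. Compare by value here:
  // eq tests reference identity and never matches a string produced by split.
  jg.nonEmpty
})

map for grouping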
map(x => {                   // key each record by its type
  val arr = x.split(",")
  val tp = arr(2)            // product type
  val jg = arr(3).toFloat    // price
  (tp, jg)
})

Converting to a hash map: append .collectAsMap() after the RDD

.groupByKey().map(x => {
  val tp = x._1              // the group key (product type)
  var sum = 0.0              // running total
  for (y <- x._2) {
    sum += y                 // accumulate the prices of the same type
  }
  val avg = sum / x._2.size  // mean price for the type
  (tp, avg)
}).collectAsMap()            // collect the (type, mean) tuples as a Map; the value is the mean

arr(3) = map(tp).toString    // look up the corresponding value in the map (get alone would yield "Some(...)")
arr.mkString(",")            // join the array back into a comma-separated string and return it

Case: filling blank values with the product average
1,AAA,Beverage,3.5,Wuhan
2,BBB,drinks,2.5,Wuhan
3,CCC,Tobacco and Alcohol,3.5,Shanghai
11,OOO,Tobacco and Alcohol,5.75,Shanghai
4,DDD,drinks,12.5,Shanghai
5,EEE,drinks,22.5,Wuhan
6,FFF,Tobacco and Alcohol,7.5,Shanghai
11,OOO,drinks,8.5,Wuhan
7,GGG,drinks,4.5,Wenzhou
8,HHH,Tobacco and Alcohol,9.5,Guiyang
9,JJJ,Tobacco and Alcohol,2.5,Nanning
10,KKK,drinks,5.5,Nanning

package spark

import org.apache.spark.sql.SparkSession

object MeanFill {
  def main(args: Array[String]): Unit = {
    val sparkSession = SparkSession.builder().master("local").appName("Mean Fill").getOrCreate()
    val sc = sparkSession.sparkContext
    val rdd = sc.textFile("src/product.txt")

    // 1. Calculate the mean price of each category
    val map = rdd.filter(x => {
      val arr = x.split(",")
      val jg = arr(3)            // price field
      jg.nonEmpty                // skip records whose price is empty
    }).map(x => {                // key each record by its type
      val arr = x.split(",")
      val tp = arr(2)
      val jg = arr(3).toFloat
      (tp, jg)
    }).groupByKey().map(x => {
      val tp = x._1              // the group key (product type)
      var sum = 0.0              // running total
      for (y <- x._2) {
        sum += y                 // accumulate the prices of the same type
      }
      val avg = sum / x._2.size  // mean price for the type
      (tp, avg)
    }).collectAsMap()            // collect the (type, mean) tuples as a Map

    // 2. Fill in the empty values
    rdd.map(x => {
      val arr = x.split(",")
      val jg = arr(3)            // price field
      val tp = arr(2)            // product type
      if (jg.isEmpty) {
        arr(3) = map(tp).toString  // look up the mean for this type
      }
      arr.mkString(",")          // join the array back into a comma-separated string
    }).saveAsTextFile("data/out1")

    sparkSession.stop()
  }
}
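The same two steps (average per type, then fill the blanks) can be tried without a Spark cluster. Below is a minimal sketch using plain Scala collections instead of an RDD. The names `MeanFillSketch`, `rows`, `fields`, `categoryMeans` and `fillBlanks` are hypothetical, and the sample rows with a blank price are assumptions added for illustration, since the data shown above contains no empty price field. It also assumes `split(",", -1)` is wanted so that trailing empty fields are kept.

```scala
object MeanFillSketch {
  // Hypothetical sample rows; the two blank price fields are assumptions for illustration.
  val rows: Seq[String] = Seq(
    "1,AAA,drinks,3.5,Wuhan",
    "2,BBB,drinks,,Wuhan",
    "3,CCC,Tobacco and Alcohol,3.5,Shanghai",
    "4,DDD,drinks,12.5,Shanghai",
    "6,FFF,Tobacco and Alcohol,7.5,Shanghai",
    "8,HHH,Tobacco and Alcohol,,Guiyang"
  )

  // Split with limit -1 so that trailing empty fields are not dropped.
  def fields(line: String): Array[String] = line.split(",", -1)

  // Step 1: mean price per type, ignoring rows whose price is blank.
  def categoryMeans(lines: Seq[String]): Map[String, Double] =
    lines
      .map(fields)
      .filter(arr => arr(3).nonEmpty)
      .groupBy(arr => arr(2))
      .map { case (tp, group) =>
        val prices = group.map(arr => arr(3).toDouble)
        tp -> prices.sum / prices.size
      }

  // Step 2: replace each blank price with the mean of its type.
  def fillBlanks(lines: Seq[String]): Seq[String] = {
    val means = categoryMeans(lines)
    lines.map { line =>
      val arr = fields(line)
      if (arr(3).isEmpty) {
        // If a type has no known price at all, the field is left blank.
        means.get(arr(2)).foreach(m => arr(3) = m.toString)
      }
      arr.mkString(",")
    }
  }

  def main(args: Array[String]): Unit =
    fillBlanks(rows).foreach(println)
}
```

With these assumed rows, the blank price of BBB becomes 8.0 (the mean of 3.5 and 12.5) and that of HHH becomes 5.5 (the mean of 3.5 and 7.5). Unlike `map(tp)`, the `means.get(...).foreach` lookup does not throw when a type has no priced rows.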