spark filter
2022-07-31 09:02:00 【JAVA becomes a god】
Filter out the records you don't need

// Keep only records whose price field (index 3) is not empty.
// Compare by value: on strings, eq tests reference identity, not content.
rdd.filter(x => {
  val arr = x.split(",")
  val jg = arr(3)            // price
  if (jg == "") {            // empty price: drop the record
    false
  } else {
    true
  }
})

map for grouping

// Turn each record into a (category, price) pair so it can be grouped by category
.map(x => {
  val arr = x.split(",")
  val tp = arr(2)            // category
  val jg = arr(3).toFloat    // price
  (tp, jg)
})

Convert to a hash map by appending .collectAsMap() to the RDD chain

.groupByKey().map(x => {
  val tp = x._1              // grouping key (category)
  var sum = 0.0              // running total
  for (y <- x._2) {
    sum += y                 // accumulate the prices of the same category
  }
  val age = sum / x._2.size  // mean price of the category
  (tp, age)
}).collectAsMap()            // collect the (category, mean) tuples into a map

// Inside the fill step:
arr(3) = map(tp).toString    // look up the mean for this category in the map
arr.mkString(",")            // join the array with commas and return the string

Example: filling empty prices with the category mean
Each record has five comma-separated fields; the third is the category and the fourth is the price. The two rows with ID 11 appear to show the result after filling: 5.75 and 8.5 are exactly the means of the other prices in their categories.

1,AAA,drinks,3.5,Wuhan
2,BBB,drinks,2.5,Wuhan
3,CCC,Tobacco and Alcohol,3.5,Shanghai
11,OOO,Tobacco and Alcohol,5.75,Shanghai
4,DDD,drinks,12.5,Shanghai
5,EEE,drinks,22.5,Wuhan
6,FFF,Tobacco and Alcohol,7.5,Shanghai
11,OOO,drinks,8.5,Wuhan
7,GGG,drinks,4.5,Wenzhou
8,HHH,Tobacco and Alcohol,9.5,Guiyang
9,JJJ,Tobacco and Alcohol,2.5,Nanning
10,KKK,drinks,5.5,Nanning

package spark

import org.apache.spark.sql.SparkSession

object MeanFill {
  def main(args: Array[String]): Unit = {
    val sparkSession = SparkSession.builder().master("local").appName("Mean Fill").getOrCreate()
    val sc = sparkSession.sparkContext
    val rdd = sc.textFile("src/product.txt")

    // 1. Calculate the mean price of each category
    val map = rdd.filter(x => {
      val arr = x.split(",")
      val jg = arr(3)              // price
      !jg.equals("")               // drop records with an empty price
    }).map(x => {
      val arr = x.split(",")
      val tp = arr(2)              // category
      val jg = arr(3).toFloat      // price
      (tp, jg)
    }).groupByKey().map(x => {
      val tp = x._1                // grouping key (category)
      var sum = 0.0                // running total
      for (y <- x._2) {
        sum += y                   // accumulate the prices of the same category
      }
      val age = sum / x._2.size    // mean price of the category
      (tp, age)
    }).collectAsMap()              // collect the (category, mean) tuples into a map

    // 2. Fill the empty prices with the mean of their category
    rdd.map(x => {
      val arr = x.split(",")
      val age = arr(3)             // price, possibly empty
      val tp = arr(2)              // category
      if (age.equals("")) {
        arr(3) = map(tp).toString  // look up the mean for this category
      }
      arr.mkString(",")            // join the array with commas and return the string
    }).saveAsTextFile("data/out1")

    sparkSession.stop()
  }
}
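The same filter, group, average, and fill steps can be checked without a Spark cluster. The following is a minimal sketch on plain Scala collections, not the Spark job itself. The names `MeanFillSketch`, `categoryMeans`, and `fill` are hypothetical, and the input is an assumption: it uses the sample rows with the price left empty on the two rows with ID 11 and with the first row's category written as `drinks`.

```scala
object MeanFillSketch {
  // Assumed input: the sample rows, with an empty price on the two rows with ID 11
  val lines: Seq[String] = Seq(
    "1,AAA,drinks,3.5,Wuhan",
    "2,BBB,drinks,2.5,Wuhan",
    "3,CCC,Tobacco and Alcohol,3.5,Shanghai",
    "11,OOO,Tobacco and Alcohol,,Shanghai",
    "4,DDD,drinks,12.5,Shanghai",
    "5,EEE,drinks,22.5,Wuhan",
    "6,FFF,Tobacco and Alcohol,7.5,Shanghai",
    "11,OOO,drinks,,Wuhan",
    "7,GGG,drinks,4.5,Wenzhou",
    "8,HHH,Tobacco and Alcohol,9.5,Guiyang",
    "9,JJJ,Tobacco and Alcohol,2.5,Nanning",
    "10,KKK,drinks,5.5,Nanning"
  )

  // Mean price per category, ignoring rows whose price field is empty
  def categoryMeans(rows: Seq[String]): Map[String, Double] =
    rows
      .map(_.split(","))
      .filter(arr => arr(3).nonEmpty)
      .groupBy(arr => arr(2))
      .map { case (tp, group) =>
        val prices = group.map(arr => arr(3).toDouble)
        (tp, prices.sum / prices.size)
      }

  // Replace each empty price with the mean of its category
  def fill(rows: Seq[String]): Seq[String] = {
    val means = categoryMeans(rows)
    rows.map { x =>
      val arr = x.split(",")
      if (arr(3).isEmpty) arr(3) = means(arr(2)).toString
      arr.mkString(",")
    }
  }

  def main(args: Array[String]): Unit =
    fill(lines).foreach(println)
}
```

Under these assumptions the two empty prices come out as 5.75 and 8.5. As in the Spark version, a category in which every price is empty has no entry in the map, so the lookup would throw; a real job would need a fallback for that case.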