spark filter
2022-07-31 09:02:00 【JAVA becomes a god】
Filter out the records you don't need:

    rdd.filter(x => {
      val arr = x.split(",")
      val jg = arr(3)            // price field
      // Compare by value: eq tests reference identity, which is unreliable for strings
      if (jg == "") {
        false                    // drop records whose price is empty
      } else {
        true
      }
    })

Use map for grouping:

    .map(x => {
      // Key each record by its type
      val arr = x.split(",")
      val tp = arr(2)            // product type
      val jg = arr(3).toFloat    // price
      (tp, jg)
    })

To turn the result into a hash map, append .collectAsMap() after the RDD:

    .groupByKey().map(x => {
      val tp = x._1              // grouping key
      var sum = 0.0              // running total
      for (y <- x._2) {
        sum += y                 // accumulate prices of the same type
      }
      val age = sum / x._2.size  // mean price for this type
      (tp, age)
    }).collectAsMap()            // convert the tuples to a map of type -> mean price

Look up the value in the map and rebuild the line:

    arr(3) = map(tp).toString    // fetch the mean stored for this type
    arr.mkString(",")            // join the array back into a comma-separated string

Example: filling blank prices with the average for the product type
1,AAA,Beverage,3.5,Wuhan
2,BBB,drinks,2.5,Wuhan
3,CCC,Tobacco and Alcohol,3.5,Shanghai
11,OOO,Tobacco and Alcohol,5.75,Shanghai
4,DDD,drinks,12.5,Shanghai
5,EEE,drinks,22.5,Wuhan
6,FFF,Tobacco and Alcohol,7.5,Shanghai
11,OOO,drinks,8.5,Wuhan
7,GGG,drinks,4.5,Wenzhou
8,HHH,Tobacco and Alcohol,9.5,Guiyang
9,JJJ,Tobacco and Alcohol,2.5,Nanning
10,KKK,drinks,5.5,Nanning

    package spark

    import org.apache.spark.sql.SparkSession

    object MeanFill {
      def main(args: Array[String]): Unit = {
        val sparkSession = SparkSession.builder().master("local").appName("Mean Fill").getOrCreate()
        val sc = sparkSession.sparkContext
        val rdd = sc.textFile("src/product.txt")

        // 1. Calculate the mean price of each type
        val map = rdd.filter(x => {
          val arr = x.split(",")
          val jg = arr(3)            // price field
          if (jg.equals("")) {
            false                    // skip records whose price is empty
          } else {
            true
          }
        }).map(x => {
          // Key each record by its type
          val arr = x.split(",")
          val tp = arr(2)            // product type
          val jg = arr(3).toFloat    // price
          (tp, jg)
        }).groupByKey().map(x => {
          val tp = x._1              // grouping key
          var sum = 0.0              // running total
          for (y <- x._2) {
            sum += y                 // accumulate prices of the same type
          }
          val age = sum / x._2.size  // mean price for this type
          (tp, age)
        }).collectAsMap()            // convert the tuples to a map of type -> mean price

        // 2. Fill in the empty values
        rdd.map(x => {
          val arr = x.split(",")
          val age = arr(3)           // price field
          val tp = arr(2)            // product type
          if (age.equals("")) {
            // Fetch the mean for this type; throws if the type has no priced record
            arr(3) = map(tp).toString
          }
          arr.mkString(",")          // join the array back into a comma-separated string
        }).saveAsTextFile("data/out1")

        sparkSession.stop()
      }
    }
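The same filter, group, average, and fill steps can be tried on ordinary Scala collections without starting Spark. The following is only a minimal sketch under a few assumptions: the object name MeanFillSketch, the helper fillWithTypeMean, and the three sample rows are made up for illustration and are not part of the original program. It also assumes every line has at least four comma-separated fields.

```scala
object MeanFillSketch {
  // Hypothetical helper: fills an empty price (field 3) with the mean price
  // of the rows that share the same type (field 2).
  def fillWithTypeMean(lines: Seq[String]): Seq[String] = {
    // A limit of -1 keeps trailing empty fields, so the field count stays stable
    val rows = lines.map(_.split(",", -1))

    // Mean price per type, computed only from rows that actually have a price
    val means: Map[String, Double] = rows
      .filter(_(3).nonEmpty)
      .groupBy(_(2))
      .map { case (tp, group) =>
        val prices = group.map(_(3).toDouble)
        tp -> prices.sum / prices.size
      }

    rows.map { arr =>
      if (arr(3).isEmpty) {
        // If no priced row exists for this type, the field is left empty
        means.get(arr(2)).foreach(m => arr(3) = m.toString)
      }
      arr.mkString(",")
    }
  }

  def main(args: Array[String]): Unit = {
    // Illustrative rows only; the third one has a blank price
    val sample = Seq(
      "1,AAA,drinks,3.5,Wuhan",
      "2,BBB,drinks,2.5,Wuhan",
      "3,CCC,drinks,,Shanghai"
    )
    fillWithTypeMean(sample).foreach(println)
  }
}
```

Unlike map(tp) in the Spark version, means.get(...).foreach leaves a blank price untouched when its type has no priced rows, instead of throwing. Whether that is preferable depends on how the output is consumed downstream.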







