Spark tuning (1): moving from HQL to code
2022-07-05 10:58:00 【InfoQ】
1. Background
SELECT id,name,
max(score1),
sum(score2),
avg(score3)
FROM table
GROUP BY id,name
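Snappy-compressed, 500 GB of raw data
28 billion rows
The first stage's Shuffle Write was 800 GB
The remaining tasks were estimated to need 8 hours to finish
2. Starting the optimization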
--conf spark.storage.memoryFraction=0.7
--conf spark.executor.heartbeatInterval=240
--conf spark.locality.wait=60
-XX:+UseG1GC
dataset.repartition(20000)
3. Solving the problem
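Dataset<Row> ds = spark.sql(sql);
ds.javaRDD().mapPartitionsToPair(
    // convert the data types of each row
    // build the grouping columns into a Tuple2 to use as the key
    // here I cached some difference values needed for the later aggregation
).reduceByKey(
    // compare to determine the max and min
    // aggregate the sum directly from the cached difference values
    // a single pass is enough to output the final result
)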
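The single-pass idea behind the mapPartitionsToPair / reduceByKey outline above can be sketched in plain Java, without Spark, so that it runs anywhere. This is only an illustration under assumptions: the class names `SinglePassAgg`, `Row` and `Acc` are hypothetical, the three score columns are treated as doubles, and the average is carried as a running sum plus a count (the original's cached "difference values" are not reproduced here). The associative `merge` method is the kind of function one could pass to reduceByKey, while `aggregate` plays the role of the grouping step.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SinglePassAgg {
    /** One input record; field names mirror the SQL columns (hypothetical type). */
    static final class Row {
        final long id; final String name;
        final double score1, score2, score3;
        Row(long id, String name, double score1, double score2, double score3) {
            this.id = id; this.name = name;
            this.score1 = score1; this.score2 = score2; this.score3 = score3;
        }
    }

    /** Partial aggregate for one (id, name) group. */
    static final class Acc {
        final double maxScore1;  // running max(score1)
        final double sumScore2;  // running sum(score2)
        final double sumScore3;  // running sum(score3), used for avg
        final long count;        // number of rows merged so far
        Acc(double maxScore1, double sumScore2, double sumScore3, long count) {
            this.maxScore1 = maxScore1; this.sumScore2 = sumScore2;
            this.sumScore3 = sumScore3; this.count = count;
        }
        /** Associative, commutative merge of two partial aggregates. */
        Acc merge(Acc o) {
            return new Acc(Math.max(maxScore1, o.maxScore1),
                           sumScore2 + o.sumScore2,
                           sumScore3 + o.sumScore3,
                           count + o.count);
        }
        double avgScore3() { return sumScore3 / count; }
    }

    /** Groups rows by (id, name) and folds each group in a single pass. */
    static Map<List<Object>, Acc> aggregate(List<Row> rows) {
        Map<List<Object>, Acc> out = new HashMap<>();
        for (Row r : rows) {
            List<Object> key = Arrays.<Object>asList(r.id, r.name);
            Acc single = new Acc(r.score1, r.score2, r.score3, 1L);
            out.merge(key, single, Acc::merge);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
            new Row(1L, "a", 3.0, 10.0, 2.0),
            new Row(1L, "a", 5.0, 20.0, 4.0),
            new Row(2L, "b", 1.0, 1.0, 1.0));
        aggregate(rows).forEach((k, v) ->
            System.out.println(k + " max=" + v.maxScore1
                + " sum=" + v.sumScore2 + " avg=" + v.avgScore3()));
    }
}
```

Because `merge` only needs two partial aggregates at a time, each group's max, sum and average come out of one pass over the data, which is the property the outline above relies on.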
4. Summary
Closing remarks