Spark Tuning (1): Moving from HQL to Code
2022-07-05 10:58:00 【InfoQ】
1. Background
SELECT id,name,
max(score1),
sum(score2),
avg(score3)
FROM table
GROUP BY id,name
Snappy-compressed, 500 GB of raw data
28 billion rows
The first stage's Shuffle Write was 800 GB
The remaining tasks were estimated to need 8 hours to finish
2. Starting the optimization
--conf spark.storage.memoryFraction=0.7
--conf spark.executor.heartbeatInterval=240
--conf spark.locality.wait=60
-XX:+UseG1GC
dataset.repartition(20000)
3. Solving the problem
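Dataset<Row> ds = spark.sql(sql);
ds.javaRDD().mapPartitionsToPair(
    // convert each Row into the record to be aggregated
    // use the grouping columns as the key, wrapped in a Tuple2
    // here I also cache some difference values needed by the later aggregation
).reduceByKey(
    // compare to determine the max and min
    // the sum aggregation merges the cached difference values directly
    // a single pass produces the final result
);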
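The reduceByKey step only works if the per-key values can be merged pairwise in any order. The following is a minimal sketch of that idea in plain Java, without a Spark dependency, so it can be run on its own. The names ScoreRow, ScoreAgg and SinglePassAgg are hypothetical and do not come from the original job. Carrying avg(score3) as a running sum plus a row count is an assumption about what the cached intermediate values might look like.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical input row matching the columns of the SQL query.
final class ScoreRow {
    final long id; final String name;
    final double score1, score2, score3;
    ScoreRow(long id, String name, double score1, double score2, double score3) {
        this.id = id; this.name = name;
        this.score1 = score1; this.score2 = score2; this.score3 = score3;
    }
}

// Hypothetical accumulator: everything needed to finish max/sum/avg in one pass.
final class ScoreAgg {
    final double maxScore1;  // running max(score1)
    final double sumScore2;  // running sum(score2)
    final double sumScore3;  // running sum(score3), numerator of avg(score3)
    final long count;        // rows seen so far, denominator of avg(score3)

    ScoreAgg(double maxScore1, double sumScore2, double sumScore3, long count) {
        this.maxScore1 = maxScore1; this.sumScore2 = sumScore2;
        this.sumScore3 = sumScore3; this.count = count;
    }

    // Accumulator for a single row (what the map side would emit as the value).
    static ScoreAgg of(ScoreRow r) {
        return new ScoreAgg(r.score1, r.score2, r.score3, 1L);
    }

    // Associative and commutative merge: the shape of function reduceByKey expects.
    static ScoreAgg merge(ScoreAgg a, ScoreAgg b) {
        return new ScoreAgg(Math.max(a.maxScore1, b.maxScore1),
                a.sumScore2 + b.sumScore2,
                a.sumScore3 + b.sumScore3,
                a.count + b.count);
    }

    // avg is derived only at the end, from the merged sum and count.
    double avgScore3() { return sumScore3 / count; }
}

final class SinglePassAgg {
    // Local stand-in for mapPartitionsToPair + reduceByKey: one pass over the rows.
    static Map<List<Object>, ScoreAgg> aggregate(List<ScoreRow> rows) {
        Map<List<Object>, ScoreAgg> out = new HashMap<>();
        for (ScoreRow r : rows) {
            // Composite (id, name) key; in Spark this would be a Tuple2.
            List<Object> key = Arrays.<Object>asList(r.id, r.name);
            out.merge(key, ScoreAgg.of(r), ScoreAgg::merge);
        }
        return out;
    }

    public static void main(String[] args) {
        List<ScoreRow> rows = Arrays.asList(
                new ScoreRow(1L, "a", 3.0, 10.0, 4.0),
                new ScoreRow(1L, "a", 9.0, 5.0, 8.0),
                new ScoreRow(2L, "b", 1.0, 2.0, 6.0));
        aggregate(rows).forEach((k, v) -> System.out.println(
                k + " max=" + v.maxScore1 + " sum=" + v.sumScore2 + " avg=" + v.avgScore3()));
    }
}
```

In an actual Spark job, ScoreAgg.merge would be the function handed to reduceByKey, and ScoreAgg.of would be applied inside mapPartitionsToPair. Because the merge also runs on the map side before the shuffle, each partition ships at most one small accumulator per key.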
4. Summary
Closing remarks