Spark Tuning (1): Moving from HQL to Code
2022-07-05 10:58:00 【InfoQ】
1. Background
SELECT id,name,
max(score1),
sum(score2),
avg(score3)
FROM table
GROUP BY id,name
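Snappy-compressed, 500 GB of raw data
28 billion rows
Stage 1 Shuffle Write: 800 GB
The remaining tasks were estimated to take 8 hours to finish
2. Starting the optimization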
--conf spark.storage.memoryFraction=0.7
--conf spark.executor.heartbeatInterval=240
--conf spark.locality.wait=60
-XX:+UseG1GC
dataset.repartition(20000)
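As a rough sanity check on the partition count, the figures quoted earlier can be divided by 20000. This is only back-of-the-envelope arithmetic: it assumes the data is spread evenly across partitions (no skew), that the 800 GB shuffle volume is what gets split, and that 1 GB = 1024 MB. The class and method names are hypothetical.

```java
// Back-of-the-envelope sizing; names are illustrative, not from the original job.
final class PartitionSizing {

    // Average number of rows per partition, assuming an even spread.
    static long rowsPerPartition(long totalRows, int partitions) {
        return totalRows / partitions;
    }

    // Average megabytes per partition, assuming an even spread and 1 GB = 1024 MB.
    static double mbPerPartition(double totalGb, int partitions) {
        return totalGb * 1024.0 / partitions;
    }

    public static void main(String[] args) {
        long totalRows = 28_000_000_000L; // 28 billion rows
        int partitions = 20_000;          // dataset.repartition(20000)
        System.out.println(rowsPerPartition(totalRows, partitions)); // 1400000
        System.out.println(mbPerPartition(800.0, partitions));       // 40.96
    }
}
```

Under those assumptions each partition holds about 1.4 million rows and roughly 41 MB of shuffle data.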
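3. Solving the problem
Dataset<Row> ds = spark.sql(sql);
ds.javaRDD().mapPartitionsToPair(
    // convert the row data into the types needed
    // build a Tuple2 with the grouping columns as the key
    // here I cache some difference values that the later aggregation needs
).reduceByKey(
    // compare values to determine the max and min
    // the sum aggregation adds up the cached difference values directly
    // a single pass is enough to output the final result
)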
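The merge step in the outline above can be sketched in plain Java, without a Spark dependency. This is a minimal illustration, not the author's code: the classes `Row` and `Acc` and the method `aggregate` are hypothetical, `Map.merge` stands in for `reduceByKey`, and the average is carried as a running sum plus a count so that partial results can be merged in any order.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: class and field names are hypothetical.
final class ScoreAggSketch {

    // One input record: the grouping columns plus the three scores.
    static final class Row {
        final long id;
        final String name;
        final double score1, score2, score3;

        Row(long id, String name, double score1, double score2, double score3) {
            this.id = id;
            this.name = name;
            this.score1 = score1;
            this.score2 = score2;
            this.score3 = score3;
        }
    }

    // Partial aggregate for one key: max(score1), sum(score2), sum and count for avg(score3).
    static final class Acc {
        final double max1;
        final double sum2;
        final double sum3;
        final long count;

        Acc(double max1, double sum2, double sum3, long count) {
            this.max1 = max1;
            this.sum2 = sum2;
            this.sum3 = sum3;
            this.count = count;
        }

        // Accumulator for a single row (the value half of the key/value pair).
        static Acc ofRow(Row r) {
            return new Acc(r.score1, r.score2, r.score3, 1L);
        }

        // Associative and commutative merge, the property a reduceByKey function needs.
        Acc merge(Acc other) {
            return new Acc(Math.max(max1, other.max1),
                           sum2 + other.sum2,
                           sum3 + other.sum3,
                           count + other.count);
        }

        // The average is derived only at the end, from the merged sum and count.
        double avg3() {
            return sum3 / count;
        }
    }

    // Stand-in for mapPartitionsToPair + reduceByKey: one Acc per (id, name) key.
    static Map<List<Object>, Acc> aggregate(List<Row> rows) {
        Map<List<Object>, Acc> result = new HashMap<>();
        for (Row r : rows) {
            List<Object> key = Arrays.<Object>asList(r.id, r.name);
            result.merge(key, Acc.ofRow(r), Acc::merge);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row(1L, "a", 3.0, 1.0, 2.0),
                new Row(1L, "a", 5.0, 2.0, 4.0),
                new Row(2L, "b", 7.0, 10.0, 6.0));
        for (Map.Entry<List<Object>, Acc> e : aggregate(rows).entrySet()) {
            Acc a = e.getValue();
            System.out.println(e.getKey() + " max=" + a.max1 + " sum=" + a.sum2 + " avg=" + a.avg3());
        }
    }
}
```

Because `merge` only ever combines two partial aggregates, all three results for a key come out of the same pass over the data, which matches the single-pass idea described in the outline.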
4. Summary
Closing remarks