Spark Tuning (I): from HQL to code
2022-07-05 11:13:00 [InfoQ]
1. Background
SELECT id,name,
max(score1),
sum(score2),
avg(score3)
FROM table
GROUP BY id,name
Snappy compression, 500 GB of raw data
280 billion rows
Shuffle write of the first stage: 800 GB
The next stage was estimated to take 8 hours to run
2. Starting the optimization
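--conf spark.storage.memoryFraction=0.7
--conf spark.executor.heartbeatInterval=240
--conf spark.locality.wait=60
--conf spark.executor.extraJavaOptions=-XX:+UseG1GC
dataset.repartition(20000);
Note that Spark reads unitless values for spark.executor.heartbeatInterval and spark.locality.wait as milliseconds, so write 240s and 60s if seconds are meant; the heartbeat interval must also stay below spark.network.timeout (default 120s). spark.storage.memoryFraction only takes effect in the legacy memory mode (spark.memory.useLegacyMode), which was removed in Spark 3.0.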
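As a rough check on the repartition(20000) value, here is a back-of-the-envelope sketch. It assumes the 800 GB of shuffle write is spread evenly across partitions (real data is often skewed) and treats 1 GB as 1024 MB; the class and method names are illustrative only.

```java
// Back-of-the-envelope estimate of shuffle data per partition.
class PartitionSizeEstimate {
    // Average MB per partition, assuming the shuffle data is spread evenly.
    static double mbPerPartition(double shuffleGb, int partitions) {
        return shuffleGb * 1024.0 / partitions;
    }

    public static void main(String[] args) {
        // 800 GB of shuffle write spread over repartition(20000)
        System.out.printf("%.2f MB per partition%n", mbPerPartition(800, 20000));
    }
}
```

Under these assumptions each partition holds roughly 41 MB; skewed keys will make some partitions much larger than this average.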
3. Solving the problem
Dataset<Row> ds = spark.sql(sql);
ds.javaRDD().mapPartitionsToPair(
    // transform the data
    // turn the GROUP BY key into a Tuple2
    // here I cache some differences that need to be aggregated later
).reduceByKey(
    // compare values to get the maximum and minimum
    // sum is aggregated directly from the cached differences
    // the final result can be output in a single pass
)
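The pseudocode above leaves out the function bodies. Below is one possible reading of it as a minimal plain-Java sketch with no Spark dependency; it is not the author's code. `InputRow`, `Acc`, `combinePartition` and `reduce` are hypothetical names, and the three score columns are assumed to be numeric. The GROUP BY columns (id, name) become the key, each key carries one partial result, and `avg(score3)` is kept as a running sum plus a count so that partial results from different partitions can be merged. `combinePartition` plays the role of `mapPartitionsToPair`, and `reduce` plays the role of `reduceByKey`.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Plain-Java sketch (no Spark dependency) of a one-pass max/sum/avg
// aggregation keyed by the GROUP BY columns (id, name).
class OnePassAggregate {
    // Hypothetical input row; the score columns are assumed to be numeric.
    record InputRow(String id, String name, double score1, double score2, double score3) {}

    // Partial result for one key. avg(score3) is carried as a running sum plus
    // a count, because partial averages cannot be merged correctly.
    record Acc(double max1, double sum2, double sum3, long count) {
        static Acc of(InputRow r) {
            return new Acc(r.score1(), r.score2(), r.score3(), 1);
        }

        // Associative and commutative, so the merge order does not matter.
        Acc merge(Acc o) {
            return new Acc(Math.max(max1, o.max1), sum2 + o.sum2, sum3 + o.sum3, count + o.count);
        }

        double avg3() {
            return sum3 / count;
        }
    }

    // Map side (the role of mapPartitionsToPair): combine one partition locally.
    static Map<List<String>, Acc> combinePartition(Iterator<InputRow> rows) {
        Map<List<String>, Acc> local = new HashMap<>();
        while (rows.hasNext()) {
            InputRow r = rows.next();
            local.merge(List.of(r.id(), r.name()), Acc.of(r), Acc::merge);
        }
        return local;
    }

    // Reduce side (the role of reduceByKey): merge the per-partition partials.
    static Map<List<String>, Acc> reduce(List<Map<List<String>, Acc>> partials) {
        Map<List<String>, Acc> out = new HashMap<>();
        for (Map<List<String>, Acc> p : partials) {
            p.forEach((k, v) -> out.merge(k, v, Acc::merge));
        }
        return out;
    }

    public static void main(String[] args) {
        List<InputRow> part1 = List.of(
                new InputRow("1", "a", 3, 10, 4), new InputRow("1", "a", 7, 5, 8));
        List<InputRow> part2 = List.of(
                new InputRow("1", "a", 5, 1, 6), new InputRow("2", "b", 2, 2, 2));
        Map<List<String>, Acc> result = reduce(List.of(
                combinePartition(part1.iterator()), combinePartition(part2.iterator())));
        result.forEach((k, a) -> System.out.println(
                k + " max=" + a.max1() + " sum=" + a.sum2() + " avg=" + a.avg3()));
    }
}
```

Because `merge` is associative and commutative, the same function could be passed to `reduceByKey`, which also combines values on the map side before the shuffle. Carrying a single accumulator per key is what lets max, sum and avg come out of one `reduceByKey` call.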
4. Summary
Conclusion