Spark Tuning (I): from HQL to code
2022-07-05 11:13:00 【InfoQ】
1. Cause
SELECT id,name,
max(score1),
sum(score2),
avg(score3)
FROM table
GROUP BY id,name
Snappy-compressed; raw data size is 500 GB
280 billion records
The first stage's Shuffle Write is 800 GB
The next stage is estimated to need 8 hours to run
2. Starting the optimization
--conf spark.storage.memoryFraction=0.7
--conf spark.executor.heartbeatInterval=240
--conf spark.locality.wait=60
-XX:+UseG1GC
dataset.repartition(20000)
3. Problem solving
Dataset<Row> ds = spark.sql(sql);
ds.javaRDD().mapPartitionsToPair(
    // Transform the data.
    // Build the grouping key as a Tuple2.
    // Cache here the differences that need to be aggregated later.
).reduceByKey(
    // Compare values to determine the maximum and minimum.
    // The sum aggregation adds up the differences directly.
    // The final result can be output in a single pass.
)
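The outline above leaves the merge logic unstated. The following is a minimal sketch in plain Java (no Spark dependency) of the kind of merge function that could be passed to reduceByKey for the query in section 1. The class ScoreAgg, its fields, and the aggregate helper are hypothetical names introduced for illustration; they are not the author's code. The sketch assumes avg is derived from a running sum and count, because averages cannot be merged directly.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical partial aggregate for one (id, name) key.
public class ScoreAgg {
    final double max;   // running max of score1
    final double sum2;  // running sum of score2
    final double sum3;  // running sum of score3, kept so avg can be derived at the end
    final long count;   // number of rows merged so far

    ScoreAgg(double max, double sum2, double sum3, long count) {
        this.max = max;
        this.sum2 = sum2;
        this.sum3 = sum3;
        this.count = count;
    }

    // Wrap a single input row as a partial aggregate.
    static ScoreAgg of(double score1, double score2, double score3) {
        return new ScoreAgg(score1, score2, score3, 1L);
    }

    // Associative and commutative merge: the shape a reduceByKey function needs.
    static ScoreAgg merge(ScoreAgg a, ScoreAgg b) {
        return new ScoreAgg(
                Math.max(a.max, b.max),
                a.sum2 + b.sum2,
                a.sum3 + b.sum3,
                a.count + b.count);
    }

    // avg(score3), computed only once all partials are merged.
    double avg3() {
        return sum3 / count;
    }

    // Local stand-in for reduceByKey: merges rows that share the same key.
    // Each scores[i] holds {score1, score2, score3} for keys[i].
    static Map<String, ScoreAgg> aggregate(String[] keys, double[][] scores) {
        Map<String, ScoreAgg> out = new HashMap<>();
        for (int i = 0; i < keys.length; i++) {
            ScoreAgg row = of(scores[i][0], scores[i][1], scores[i][2]);
            out.merge(keys[i], row, ScoreAgg::merge);
        }
        return out;
    }
}
```

In an actual Spark job the key would be a Tuple2 of id and name, and ScoreAgg::merge would play the role of the function given to reduceByKey. Because each partial carries everything needed for max, sum, and avg, all three results come out of one shuffle.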
4. Summary
Conclusion