Spark Tuning (I): from HQL to code
2022-07-05 11:13:00 【InfoQ】
1. Background
The job runs this HQL aggregation:
SELECT id,name,
max(score1),
sum(score2),
avg(score3)
FROM table
GROUP BY id,name
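The query uses three aggregates. max and sum can be merged piece by piece across partitions, but avg cannot. Averaging per-partition averages gives the wrong answer whenever partitions hold different numbers of rows, so a distributed version has to carry (sum, count) and divide once at the end. Below is a minimal plain-Java illustration using made-up numbers. The class and method names are hypothetical.

```java
import java.util.Arrays;

public class AvgMerge {
    // Naive "merge": average of the per-partition averages (each partition weighted equally).
    public static double avgOfPartialAvgs(double[]... partitions) {
        return Arrays.stream(partitions)
                .mapToDouble(p -> Arrays.stream(p).average().getAsDouble())
                .average().getAsDouble();
    }

    // Correct merge: add up (sum, count) across partitions, divide once at the end.
    public static double mergedAvg(double[]... partitions) {
        double sum = 0;
        long count = 0;
        for (double[] p : partitions) {
            sum += Arrays.stream(p).sum();
            count += p.length;
        }
        return sum / count;
    }

    public static void main(String[] args) {
        // Two hypothetical partitions holding score3 values for the same (id, name) key
        double[] p1 = {1, 2, 3};
        double[] p2 = {10};
        System.out.println(avgOfPartialAvgs(p1, p2)); // 6.0
        System.out.println(mergedAvg(p1, p2));        // 4.0
    }
}
```

Only the (sum, count) form gives the true average of all four values.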
Raw data: 500 GB, Snappy-compressed
280 billion rows
Shuffle write of the first stage: 800 GB
The following stage was estimated to need 8 hours to finish
2. First round of tuning
The first attempts adjusted the submit parameters and raised the number of partitions:
--conf spark.storage.memoryFraction=0.7
--conf spark.executor.heartbeatInterval=240
--conf spark.locality.wait=60
-XX:+UseG1GC
dataset.repartition(20000)
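Here is a rough check on what repartition(20000) means for this job. Assume the 800 GB of shuffle write and the 280 billion rows spread evenly over the partitions (key skew would break this). Each partition then gets about 40 MB and 14 million rows. The small sketch below works through that arithmetic in decimal units. The helper names are hypothetical.

```java
public class PartitionSize {
    // Average shuffle megabytes per partition, assuming an even split (decimal units: 1 GB = 1000 MB).
    public static double mbPerPartition(double shuffleGb, int partitions) {
        return shuffleGb * 1000 / partitions;
    }

    // Average rows per partition, assuming an even split.
    public static long rowsPerPartition(long rows, int partitions) {
        return rows / partitions;
    }

    public static void main(String[] args) {
        // Figures from the job: 800 GB shuffle write, 280 billion rows, repartition(20000)
        System.out.println(mbPerPartition(800, 20000) + " MB per partition");                  // 40.0 MB per partition
        System.out.println(rowsPerPartition(280_000_000_000L, 20000) + " rows per partition"); // 14000000 rows per partition
    }
}
```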
3. Solution: rewriting the aggregation in code
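// Assumed: sql selects the raw columns id, name, score1, score2, score3 (no GROUP BY),
// with string keys and double scores; needs JavaPairRDD, scala.Tuple2 and java.util.* imports
Dataset<Row> ds = spark.sql(sql);
JavaPairRDD<Tuple2<String, String>, double[]> agg = ds.javaRDD().mapPartitionsToPair(rows -> {
    // Transform the data: key each row by (id, name) as a Tuple2 and cache the values
    // that will be aggregated later as {score1, score2, score3, count}
    List<Tuple2<Tuple2<String, String>, double[]>> out = new ArrayList<>();
    while (rows.hasNext()) {
        Row r = rows.next();
        out.add(new Tuple2<>(new Tuple2<>(r.getString(0), r.getString(1)),
                new double[]{r.getDouble(2), r.getDouble(3), r.getDouble(4), 1}));
    }
    return out.iterator();
}).reduceByKey((a, b) -> new double[]{
    Math.max(a[0], b[0]),                  // keep the maximum of score1
    a[1] + b[1], a[2] + b[2], a[3] + b[3]  // sums and the row count add up directly
});
// One pass gives the final result per key: max(score1) = v[0], sum(score2) = v[1],
// avg(score3) = v[2] / v[3]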
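The reduce step finishes in one pass because every aggregate is associative and commutative. max keeps the larger value, and the sums and the row count simply add up. Partial results can therefore be merged in any order, which is what allows reduceByKey to combine values on the map side before the shuffle. The plain-Java sketch below imitates that flow on in-memory lists, with no Spark involved. Each "partition" is pre-aggregated into a HashMap, and then the partial maps are merged by key. The Rec/Agg classes, the "id|name" string key, and the sample rows are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LocalReduceByKey {
    // One input row (hypothetical schema matching the query: id, name, score1..score3).
    static final class Rec {
        final String id, name;
        final double s1, s2, s3;
        Rec(String id, String name, double s1, double s2, double s3) {
            this.id = id; this.name = name; this.s1 = s1; this.s2 = s2; this.s3 = s3;
        }
    }

    // Partial aggregate for one (id, name) key: max(score1), sum(score2), sum(score3), row count.
    static final class Agg {
        final double max1, sum2, sum3;
        final long count;
        Agg(double max1, double sum2, double sum3, long count) {
            this.max1 = max1; this.sum2 = sum2; this.sum3 = sum3; this.count = count;
        }
        // Associative, commutative merge: the kind of function reduceByKey needs.
        Agg merge(Agg o) {
            return new Agg(Math.max(max1, o.max1), sum2 + o.sum2, sum3 + o.sum3, count + o.count);
        }
        // avg(score3) is derived only once, after all partials are merged.
        double avg3() { return sum3 / count; }
    }

    // Pre-aggregates each partition locally (like a map-side combine), then merges the partial maps by key.
    static Map<String, Agg> aggregate(List<List<Rec>> partitions) {
        Map<String, Agg> result = new HashMap<>();
        for (List<Rec> partition : partitions) {
            Map<String, Agg> partial = new HashMap<>();
            for (Rec r : partition) {
                String key = r.id + "|" + r.name; // stands in for the Tuple2(id, name) key
                partial.merge(key, new Agg(r.s1, r.s2, r.s3, 1), Agg::merge);
            }
            partial.forEach((k, v) -> result.merge(k, v, Agg::merge));
        }
        return result;
    }

    public static void main(String[] args) {
        // Two hypothetical partitions; key 1|a appears in both
        List<List<Rec>> partitions = List.of(
                List.of(new Rec("1", "a", 70, 1, 1), new Rec("1", "a", 90, 2, 2), new Rec("2", "b", 50, 5, 3)),
                List.of(new Rec("1", "a", 80, 3, 3), new Rec("1", "a", 60, 4, 10)));
        Agg a = aggregate(partitions).get("1|a");
        System.out.println("1|a max=" + a.max1 + " sum=" + a.sum2 + " avg=" + a.avg3());
        // 1|a max=90.0 sum=10.0 avg=4.0
    }
}
```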
4. Summary
Conclusion