Notes on a Spark error: Failed to allocate a page (67108864 bytes), try again
2022-07-25 15:15:00 【The south wind knows what I mean】
Project scenario:
The business side needed a join between two tables: a small table (48 million rows) and a large table (2.6 billion rows). This is the classic small-table/large-table join, so the first thing that came to mind was to make the most of a Broadcast Join.
Problem description:
1. Write the query with a broadcast hint:
-- sc is the small table
select /*+ BROADCASTJOIN(sc) */
sc.courseid,
csc.courseid
from sale_course sc join course_shopping_cart csc
on sc.courseid=csc.courseid
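The same query can also be written with the DataFrame API. The sketch below assumes Spark 3.x with Scala and that `sale_course` and `course_shopping_cart` are already loaded as DataFrames. `org.apache.spark.sql.functions.broadcast` is Spark's own function for marking the small side; `BroadcastJoinExample` and `joinCourses` are hypothetical names. Calling `explain()` on the result should show a `BroadcastHashJoin` node if the hint was honored.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{broadcast, col}

// Hypothetical helper: DataFrame equivalent of the hinted SQL query above.
object BroadcastJoinExample {
  def joinCourses(saleCourse: DataFrame, courseShoppingCart: DataFrame): DataFrame = {
    // Small side: marked for broadcast, so every executor receives a full copy.
    val sc = broadcast(saleCourse.alias("sc"))
    // Large side: stays in its existing partitions; it is not shuffled.
    val csc = courseShoppingCart.alias("csc")
    csc
      .join(sc, col("sc.courseid") === col("csc.courseid"))
      .select(col("sc.courseid"), col("csc.courseid"))
  }
}
```

Using `broadcast()` in code has the same effect as the `/*+ BROADCASTJOIN(sc) */` comment in SQL, but it is checked by the compiler rather than silently ignored if the alias is misspelled.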
2. Package the job and run it on the cluster. The errors start:
2022-06-22 19:36:56 WARN memory.TaskMemoryManager: Failed to allocate a page (67108864 bytes), try again.
2022-06-22 19:36:57 WARN memory.TaskMemoryManager: Failed to allocate a page (67108864 bytes), try again.
2022-06-22 19:36:59 WARN memory.TaskMemoryManager: Failed to allocate a page (67108864 bytes), try again.
2022-06-22 19:37:00 WARN memory.TaskMemoryManager: Failed to allocate a page (67108864 bytes), try again.
2022-06-22 19:37:00 WARN memory.TaskMemoryManager: Failed to allocate a page (67108864 bytes), try again.
2022-06-22 19:37:01 WARN memory.TaskMemoryManager: Failed to allocate a page (67108864 bytes), try again.
2022-06-22 19:37:01 WARN memory.TaskMemoryManager: Failed to allocate a page (67108864 bytes), try again.
2022-06-22 19:37:01 WARN memory.TaskMemoryManager: Failed to allocate a page (67108864 bytes), try again.
2022-06-22 19:37:03 WARN memory.TaskMemoryManager: Failed to allocate a page (67108864 bytes), try again.
2022-06-22 19:37:03 WARN memory.TaskMemoryManager: Failed to allocate a page (67108864 bytes), try again.
2022-06-22 19:37:04 WARN memory.TaskMemoryManager: Failed to allocate a page (67108864 bytes), try again.
2022-06-22 19:37:05 WARN memory.TaskMemoryManager: Failed to allocate a page (67108864 bytes), try again.
2022-06-22 19:37:05 WARN memory.TaskMemoryManager: Failed to allocate a page (67108864 bytes), try again.
2022-06-22 19:37:05 WARN spark.HeartbeatReceiver: Removing executor 2 with no recent heartbeats: 139818 ms exceeds timeout 120000 ms
2022-06-22 19:37:05 WARN spark.HeartbeatReceiver: Removing executor 5 with no recent heartbeats: 178273 ms exceeds timeout 120000 ms
2022-06-22 19:37:05 WARN spark.HeartbeatReceiver: Removing executor 7 with no recent heartbeats: 162256 ms exceeds timeout 120000 ms
2022-06-22 19:37:05 WARN spark.HeartbeatReceiver: Removing executor 3 with no recent heartbeats: 154289 ms exceeds timeout 120000 ms
2022-06-22 19:37:05 INFO cluster.YarnClusterSchedulerBackend: Requesting to kill executor(s) 2
3. This looked like insufficient memory, so I enabled GC logging and checked the GC log:
2022-06-22T19:32:04.731+0800: [GC (Allocation Failure) [PSYoungGen: 994157K->47291K(1377280K)] 1061069K->240591K(4076032K), 0.2125657 secs] [Times: user=4.51 sys=0.35, real=0.21 secs]
2022-06-22T19:32:12.667+0800: [GC (Allocation Failure) [PSYoungGen: 1298524K->69107K(1380352K)] 1491823K->776885K(4079104K), 0.4118997 secs] [Times: user=12.93 sys=1.20, real=0.41 secs]
2022-06-22T19:32:30.661+0800: [GC (Allocation Failure) [PSYoungGen: 1363073K->305779K(1643520K)] 2070852K->1248436K(4342272K), 0.2067380 secs] [Times: user=6.53 sys=0.68, real=0.21 secs]
2022-06-22T19:32:49.327+0800: [GC (Allocation Failure) [PSYoungGen: 1583420K->380843K(1685504K)] 2526077K->1558689K(4384256K), 0.2134726 secs] [Times: user=6.50 sys=1.14, real=0.21 secs]
2022-06-22T19:32:57.628+0800: [GC (Allocation Failure) [PSYoungGen: 1677943K->386985K(1469440K)] 2855790K->1938110K(4168192K), 0.1938505 secs] [Times: user=6.17 sys=0.87, real=0.19 secs]
2022-06-22T19:33:10.943+0800: [GC (Allocation Failure) [PSYoungGen: 1424669K->489773K(1547776K)] 2975793K->2158027K(4246528K), 0.1824065 secs] [Times: user=6.34 sys=0.27, real=0.19 secs]
2022-06-22T19:33:18.556+0800: [GC (Allocation Failure) [PSYoungGen: 1523628K->501866K(1313280K)] 4240457K->3578994K(5061120K), 0.1838270 secs] [Times: user=5.74 sys=0.84, real=0.18 secs]
2022-06-22T19:33:19.956+0800: [GC (Allocation Failure) [PSYoungGen: 1214502K->632842K(1397248K)] 4291630K->3972122K(5145088K), 0.2161871 secs] [Times: user=7.20 sys=0.64, real=0.21 secs]
2022-06-22T19:33:20.172+0800: [Full GC (Ergonomics) [PSYoungGen: 632842K->0K(1397248K)] [ParOldGen: 3339280K->3514303K(4194304K)] 3972122K->3514303K(5591552K), [Metaspace: 136487K->136476K(1177600K)], 0.6284626 secs] [Times: user=6.74 sys=3.98, real=0.63 secs]
2022-06-22T19:33:22.153+0800: [GC (Allocation Failure) [PSYoungGen: 726892K->459232K(1398272K)] 4241195K->3973535K(5592576K), 0.0348947 secs] [Times: user=0.96 sys=0.00, real=0.04 secs]
2022-06-22T19:33:23.347+0800: [GC (Allocation Failure) [PSYoungGen: 1158624K->656153K(1398272K)] 4672927K->4367065K(5592576K), 0.1967581 secs] [Times: user=6.70 sys=0.44, real=0.19 secs]
2022-06-22T19:33:23.544+0800: [Full GC (Ergonomics) [PSYoungGen: 656153K->131072K(1398272K)] [ParOldGen: 3710911K->4169346K(4194304K)] 4367065K->4300418K(5592576K), [Metaspace: 136485K->136485K(1177600K)], 1.7445365 secs] [Times: user=46.91 sys=10.81, real=1.75 secs]
2022-06-22T19:33:26.442+0800: [Full GC (Ergonomics) [PSYoungGen: 830464K->524355K(1398272K)] [ParOldGen: 4169346K->4169283K(4194304K)] 4999810K->4693638K(5592576K), [Metaspace: 136485K->136485K(1177600K)], 0.5643075 secs] [Times: user=14.75 sys=0.14, real=0.57 secs]
2022-06-22T19:33:27.323+0800: [Full GC (Ergonomics) [PSYoungGen: 664059K->589892K(1398272K)] [ParOldGen: 4169283K->4169282K(4194304K)] 4833342K->4759175K(5592576K), [Metaspace: 136485K->136485K(1177600K)], 0.3743719 secs] [Times: user=10.16 sys=0.05, real=0.38 secs]
2022-06-22T19:33:27.909+0800: [Full GC (Ergonomics) [PSYoungGen: 699392K->655430K(1398272K)] [ParOldGen: 4169282K->4169282K(4194304K)] 4868674K->4824713K(5592576K), [Metaspace: 136485K->136485K(1177600K)], 0.4272478 secs] [Times: user=11.16 sys=0.05, real=0.43 secs]
2022-06-22T19:33:28.382+0800: [Full GC (Ergonomics) [PSYoungGen: 668779K->655430K(1398272K)] [ParOldGen: 4169282K->4169282K(4194304K)] 4838062K->4824713K(5592576K), [Metaspace: 136486K->136486K(1177600K)], 0.2751700 secs] [Times: user=6.67 sys=0.03, real=0.28 secs]
2022-06-22T19:33:28.657+0800: [Full GC (Allocation Failure) [PSYoungGen: 655430K->655430K(1398272K)] [ParOldGen: 4169282K->4162677K(4194304K)] 4824713K->4818107K(5592576K), [Metaspace: 136486K->135746K(1177600K)], 0.6008903 secs] [Times: user=17.76 sys=0.08, real=0.60 secs]
2022-06-22T19:33:29.260+0800: [Full GC (Ergonomics) [PSYoungGen: 659800K->655438K(1398272K)] [ParOldGen: 4162677K->4162674K(4194304K)] 4822477K->4818112K(5592576K), [Metaspace: 135746K->135746K(1177600K)], 1.4037111 secs] [Times: user=46.99 sys=0.27, real=1.40 secs]
2022-06-22T19:33:30.664+0800: [Full GC (Allocation Failure) [PSYoungGen: 655438K->655431K(1398272K)] [ParOldGen: 4162674K->4162674K(4194304K)] 4818112K->4818105K(5592576K), [Metaspace: 135746K->135746K(1177600K)], 0.1268273 secs] [Times: user=1.35 sys=0.02, real=0.13 secs]
2022-06-22T19:33:30.792+0800: [Full GC (Ergonomics) [PSYoungGen: 658317K->655447K(1398272K)] [ParOldGen: 4162674K->4162674K(4194304K)] 4820992K->4818121K(5592576K), [Metaspace: 135746K->135746K(1177600K)], 1.2769239 secs] [Times: user=42.48 sys=0.27, real=1.28 secs]
2022-06-22T19:33:32.069+0800: [Full GC (Allocation Failure) [PSYoungGen: 655447K->655440K(1398272K)] [ParOldGen: 4162674K->4162674K(4194304K)] 4818121K->4818114K(5592576K), [Metaspace: 135746K->135746K(1177600K)], 0.2098295 secs] [Times: user=2.81 sys=0.02, real=0.21 secs]
2022-06-22T19:33:32.282+0800: [Full GC (Ergonomics) [PSYoungGen: 657391K->655457K(1398272K)] [ParOldGen: 4162674K->4162673
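A GC log this long is easier to judge with a small script. The sketch below is a hypothetical helper (plain Scala, no Spark needed) that extracts the ParOldGen figures from Parallel GC lines like the ones above and reports how full the old generation is after each collection and how much it reclaimed. It assumes the exact `ParOldGen: beforeK->afterK(capacityK)` format shown in this log.

```scala
import scala.util.matching.Regex

// Hypothetical helper for reading Parallel GC logs like the one above.
object OldGenOccupancy {
  /** Old-generation figures from one GC line, in KB. */
  final case class OldGenSample(beforeKb: Long, afterKb: Long, capacityKb: Long) {
    // Negative when a Full GC promotes young objects into the old generation.
    def reclaimedKb: Long = beforeKb - afterKb
    // Fraction of the old generation still in use after the collection.
    def occupancy: Double = afterKb.toDouble / capacityKb
  }

  // Matches e.g. "[ParOldGen: 4162674K->4162674K(4194304K)]".
  private val ParOldGen: Regex = """ParOldGen: (\d+)K->(\d+)K\((\d+)K\)""".r

  /** Returns the ParOldGen sample in a log line, or None for young-only collections. */
  def parse(line: String): Option[OldGenSample] =
    ParOldGen.findFirstMatchIn(line).map { m =>
      OldGenSample(m.group(1).toLong, m.group(2).toLong, m.group(3).toLong)
    }

  def main(args: Array[String]): Unit = {
    // Excerpts from the first and the last complete Full GC lines of the log above.
    val lines = Seq(
      "[Full GC (Ergonomics) [PSYoungGen: 632842K->0K(1397248K)] [ParOldGen: 3339280K->3514303K(4194304K)]",
      "[Full GC (Allocation Failure) [PSYoungGen: 655447K->655440K(1398272K)] [ParOldGen: 4162674K->4162674K(4194304K)]"
    )
    for (s <- lines.flatMap(parse))
      println(f"old gen after GC: ${s.occupancy * 100}%.1f%% (reclaimed ${s.reclaimedKb}K)")
  }
}
```

For these two lines, `main` reports 83.8% after the first Full GC (the old generation actually grew by 175023K as young objects were promoted) and 99.2% with 0K reclaimed for the last one. Repeated Full GCs that free nothing are the signature of a heap that is too small for what the task is holding.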
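Cause analysis:
At this point the cause is clear: the JVM is out of memory. In the later Full GCs, ParOldGen stays above 99% of its 4,194,304K (4 GiB) capacity, and back-to-back Full GCs reclaim almost nothing. Long pauses like these are the likely reason executors stopped sending heartbeats and were removed. Raising the executor and driver memory usually solves this.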
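If you go the route of raising memory, the executor settings can be given when the session is built. This is a minimal sketch with illustrative values only (`8g` and `2g` are assumptions, not recommendations, and the app name is hypothetical). `spark.executor.memory` and `spark.executor.memoryOverhead` are standard Spark properties (the latter since Spark 2.3), and they only take effect if no SparkContext is running yet. `spark.driver.memory` cannot be set this way because the driver JVM has already started by the time this code runs; pass it at submit time with `spark-submit --driver-memory` instead.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative values only; tune them to your cluster.
object MemoryConfigExample {
  def build(): SparkSession =
    SparkSession.builder()
      .appName("sale-course-join")                    // hypothetical app name
      .config("spark.executor.memory", "8g")          // executor JVM heap
      .config("spark.executor.memoryOverhead", "2g")  // extra non-heap memory per executor container
      // spark.driver.memory is deliberately absent: the driver JVM is already running here.
      // Set it at submit time instead, e.g. spark-submit --driver-memory 8g ...
      .getOrCreate()
}
```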
Still, it is worth reviewing how broadcast join works.
1. How broadcast join works
Among Spark's join strategies, when one table is small enough to be held in memory, Spark can use a Broadcast Hash Join. The small table is first collected to the driver and then broadcast to every executor, so each partition of the large table can be joined locally against the in-memory copy of the small table. This avoids shuffling the large table.
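The mechanism can be illustrated without Spark at all. The toy model below is plain Scala; `BroadcastHashJoinSketch`, the `(courseid, payload)` row shape, and the sample data are all hypothetical. It builds a hash map from the small table once, standing in for the broadcast copy, and then joins each partition of the large table against that map independently. No large-table row moves between partitions, which is exactly the shuffle a broadcast hash join avoids. It sketches the idea only, not Spark's actual implementation.

```scala
// Toy, single-process model of a broadcast hash join (not Spark's implementation).
object BroadcastHashJoinSketch {
  // Hypothetical row shape: (courseid, payload).
  type Row = (Long, String)

  def broadcastHashJoin(small: Seq[Row], largePartitions: Seq[Seq[Row]]): Seq[Seq[(Long, String, String)]] = {
    // Build side: hash the small table by join key once; this map plays the broadcast copy.
    val hashed: Map[Long, Seq[String]] =
      small.groupBy(_._1).map { case (key, rows) => key -> rows.map(_._2) }

    // Probe side: every partition is joined locally against the same map,
    // so no large-table row is sent to another partition (no shuffle).
    largePartitions.map { partition =>
      for {
        (key, largeValue) <- partition
        smallValue        <- hashed.getOrElse(key, Seq.empty[String])
      } yield (key, smallValue, largeValue)
    }
  }
}
```

For example, joining `Seq((1L, "courseA"), (2L, "courseB"))` against the partitions `Seq((1L, "cart1"), (3L, "cart2"))` and `Seq((2L, "cart3"), (1L, "cart4"))` yields `Seq((1L, "courseA", "cart1"))` for the first partition and `Seq((2L, "courseB", "cart3"), (1L, "courseA", "cart4"))` for the second. The catch is visible too: the whole small table must fit in memory wherever the map is held.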
(1) Automatic broadcast, controlled by a parameter
Spark broadcasts a table automatically when its estimated size is below spark.sql.autoBroadcastJoinThreshold, which defaults to 10MB.
new SparkConf().set("spark.sql.autoBroadcastJoinThreshold", "10m") // 10 MB (the default): smaller tables are broadcast automatically
new SparkConf().set("spark.sql.autoBroadcastJoinThreshold", "-1") // -1 disables automatic broadcast
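Because automatic broadcast compares Spark's estimated size of a table with this threshold, it can help to print that estimate and to change the threshold on a running session. A minimal sketch assuming Spark 3.x: `queryExecution.optimizedPlan.stats.sizeInBytes` exposes the optimizer's estimate (an internal, non-stable API), and `spark.conf.set` changes the setting at runtime. `BroadcastThresholdCheck` and its methods are hypothetical names, and `underThreshold` only approximates Spark's own size check.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helpers for inspecting the automatic-broadcast decision.
object BroadcastThresholdCheck {
  // Runtime equivalent of the SparkConf settings above, applied to an existing session.
  def setThreshold(spark: SparkSession, value: String): Unit =
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", value) // e.g. "10m", or "-1" to disable

  // The optimizer's size estimate for a DataFrame (internal API, Spark 3.x).
  def estimatedSizeInBytes(df: DataFrame): BigInt =
    df.queryExecution.optimizedPlan.stats.sizeInBytes

  // Roughly mirrors Spark's size check: a non-negative threshold and an estimate at or below it.
  def underThreshold(df: DataFrame, thresholdBytes: Long): Boolean =
    thresholdBytes >= 0 && estimatedSizeInBytes(df) <= BigInt(thresholdBytes)
}
```

Note that this estimate is often far from the table's real in-memory size, which is one reason a table can be broadcast and still exhaust memory.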
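(2) Forcing a broadcast join with a SQL hint
The hinted table (sc here) must be the small side of the join. /*+ BROADCASTJOIN(sc) */ and /*+ MAPJOIN(sc) */ are aliases that can be used in place of /*+ BROADCAST(sc) */:
select /*+ BROADCAST(sc) */ sc.courseid, csc.courseid
from sale_course sc join course_shopping_cart csc
on sc.courseid = csc.courseid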
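With the DataFrame API, the same hint can be attached through `Dataset.hint`, which takes the hint name as a string (hint names are matched case-insensitively). A minimal sketch assuming the two tables from the query are available in the catalog or as temporary views; `HintExample` and `joinWithHint` are hypothetical names.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper: the SQL hint above, expressed with Dataset.hint.
object HintExample {
  def joinWithHint(spark: SparkSession): DataFrame = {
    // "broadcast", "broadcastjoin" and "mapjoin" name the same hint.
    val sc  = spark.table("sale_course").hint("broadcast")
    val csc = spark.table("course_shopping_cart")
    sc.join(csc, sc("courseid") === csc("courseid"))
      .select(sc("courseid"), csc("courseid"))
  }
}
```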
2. Back to my problem
As described above, a broadcast join first pulls the small table's data to the driver, so the driver must have enough memory; if it gets too little, the job fails. Each executor then also keeps its own full copy of the broadcast table.
However, increasing the driver memory did not solve the problem.
My "small" table simply has too much data (48 million rows), and the cluster cannot give the job that much memory.
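To see why a 48-million-row table is not "small", a rough estimate helps. The sketch below is plain arithmetic: the row count comes from this post, but the bytes per row and the executor count are hypothetical values you must supply, and it ignores hash-table and serialization overhead, so the result is a lower bound. It counts one copy on the driver plus one per executor. The 8 GiB constant reflects the hard limit on a single broadcast table in recent Spark versions.

```scala
// Back-of-the-envelope lower bound for the memory a broadcast table needs.
// All inputs are assumptions supplied by the caller; nothing here is measured.
object BroadcastSizeEstimate {
  // Hard cap on a single broadcast table in recent Spark versions: 8 GiB.
  val MaxBroadcastBytes: Long = 8L << 30

  /** Raw size of one copy of the table, ignoring hash-table and serialization overhead. */
  def tableBytes(rows: Long, bytesPerRow: Long): Long = rows * bytesPerRow

  /** One copy collected on the driver plus one copy kept by each executor. */
  def totalBytes(rows: Long, bytesPerRow: Long, executors: Int): Long =
    tableBytes(rows, bytesPerRow) * (1L + executors)

  /** Whether one copy stays under the hard limit (necessary, not sufficient). */
  def underHardLimit(rows: Long, bytesPerRow: Long): Boolean =
    tableBytes(rows, bytesPerRow) < MaxBroadcastBytes

  def main(args: Array[String]): Unit = {
    val rows        = 48000000L // row count of the small table, from this post
    val bytesPerRow = 32L       // hypothetical average row size
    val executors   = 4         // hypothetical executor count
    println(s"one copy: ${tableBytes(rows, bytesPerRow)} bytes")
    println(s"driver + executors: ${totalBytes(rows, bytesPerRow, executors)} bytes")
  }
}
```

With the assumed 32 bytes per row, one copy is 1,536,000,000 bytes (about 1.4 GiB) and four executors plus the driver need 7,680,000,000 bytes before any overhead. Such a table passes the 8 GiB check yet can still crowd out a 4 GiB old generation like the one in the GC log.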
Solution:
So what can be done?
Give up on the broadcast join and use an ordinary join instead. It is slower, but the hardware resources are what they are, so there was no other way.
In the end, joining the two tables took two hours. QAQ
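Falling back to an ordinary join means removing the broadcast hint from the query. Disabling `spark.sql.autoBroadcastJoinThreshold` alone is not enough, because an explicit broadcast hint is honored regardless of the threshold. The sketch below assumes Spark 3.0+ (for the `MERGE` hint) and that both tables are available in the catalog or as temporary views. It sets the threshold to -1 so the optimizer does not broadcast on its own, and it requests a shuffle sort-merge join explicitly. `ShuffleJoinFallback` is a hypothetical name.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper: the original query without broadcasting.
object ShuffleJoinFallback {
  def joinWithoutBroadcast(spark: SparkSession): DataFrame = {
    // Stop automatic broadcasting for this session. This does not override an explicit
    // BROADCAST/BROADCASTJOIN/MAPJOIN hint, so the query below no longer carries one.
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")
    // MERGE (Spark 3.0+) explicitly requests a shuffle sort-merge join.
    spark.sql(
      """select /*+ MERGE(sc) */ sc.courseid, csc.courseid
        |from sale_course sc join course_shopping_cart csc
        |on sc.courseid = csc.courseid""".stripMargin)
  }
}
```

A sort-merge join shuffles both tables by `courseid` and spills to disk when needed, so neither side has to fit in memory. That trade-off is why it is slower here but does not fail.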