Spark Unified Memory Partitioning
2022-07-26 16:53:00 【InfoQ】
1. Executor memory logical architecture

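- Heap memory: allocated and reclaimed by the JVM, with its size controlled by spark.executor.memory. Serialized objects in the JVM are byte streams, so their memory footprint can be computed directly. For non-serialized objects the footprint is only approximated by periodic sampling, and object instances that Spark has marked as released may not yet have been reclaimed by the JVM. Spark therefore cannot track the actually available heap memory precisely, and so cannot fully avoid out-of-memory errors.
- Off-heap memory: not managed by the JVM, and made up of two parts. The first is usually configured in YARN mode through spark.executor.memoryOverhead and covers the overhead of the virtual machine itself (interned strings, NIO, and other native overhead). The second is configured through spark.memory.offHeap.enabled together with spark.memory.offHeap.size and is used directly by Spark as storage memory and task (execution) memory. Since 2.0 it no longer depends on the third-party memory system Tachyon; off-heap memory management is instead built on the Unsafe API that ships with the JDK. Off-heap memory can be requested and released precisely, which reduces unnecessary overhead.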
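- System memory (systemMemory): the maximum memory available to the JVM, which can be read from Runtime.getRuntime.maxMemory. System memory is not equal to the allocated heap memory: young-generation GC uses a copying algorithm, so one survivor space has to be kept in reserve, that is
systemMemory = heap memory - survivor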
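- Usable memory (usableMemory): the part of memory that user code can directly affect.
usable memory = system memory - Reserved, where Reserved is a fixed 300 MB of reserved memory for Spark's internal use.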
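- Application memory: mainly holds the data objects produced by user code; these objects live in the application memory space before they are cached.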
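- Storage memory and execution memory: storage memory caches data, while execution memory serves the memory needs of computations such as shuffle, join, sort, and aggregation. spark.memory.storageFraction controls the ratio between the two, which defaults to an even split (0.5). The two regions can also borrow from each other dynamically:
  - When execution memory space has been occupied by storage, execution can make storage spill the occupied part to disk and then return the borrowed space.
  - When storage memory space has been occupied by execution, storage cannot make execution return it, because many factors in the shuffle process make that impractical to implement.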
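The asymmetric borrowing rule described above can be illustrated with a small toy model. This is a simplified sketch, not Spark's actual UnifiedMemoryManager: the class name `UnifiedPool` and its methods are hypothetical, sizes are plain integers, and "spilling to disk" is modelled simply as shrinking the storage counter.

```python
from dataclasses import dataclass


@dataclass
class UnifiedPool:
    """Toy model of one unified region shared by storage and execution.

    Hypothetical illustration only; names and logic are simplified
    assumptions, not Spark's implementation.
    """
    total: int                      # size of the unified region
    storage_fraction: float = 0.5   # mirrors spark.memory.storageFraction
    storage_used: int = 0
    execution_used: int = 0

    @property
    def storage_region(self) -> int:
        # Nominal storage share of the unified region.
        return int(self.total * self.storage_fraction)

    def free(self) -> int:
        return self.total - self.storage_used - self.execution_used

    def acquire_storage(self, n: int) -> bool:
        # Storage may use any free space, but it cannot force
        # execution to give memory back.
        if n > self.free():
            return False
        self.storage_used += n
        return True

    def acquire_execution(self, n: int) -> int:
        # Execution may reclaim what storage borrowed beyond its nominal
        # region (modelled here as spilling cached data to disk).
        if n > self.free():
            borrowed = max(0, self.storage_used - self.storage_region)
            evict = min(borrowed, n - self.free())
            self.storage_used -= evict
        granted = min(n, self.free())
        self.execution_used += granted
        return granted
```

In this sketch, a pool of size 1000 that has cached 800 units can still grant an execution request of 500, because the 300 units storage borrowed beyond its nominal 500 are evicted. In the opposite situation, a storage request that does not fit into free space simply fails.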
2. Memory calculation for the Executor UI
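Displayed total memory = (on-heap storage memory + on-heap execution memory) + (off-heap storage memory + off-heap execution memory)
= (usable memory - application memory) + offHeap
= usable memory * spark.memory.fraction + offHeap
= (system memory - 300M) * 0.6 + offHeap
= (spark.executor.memory - survivor - 300M) * 0.6 + offHeap
= (Runtime.getRuntime.maxMemory - 300M) * 0.6 + offHeap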
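The chain of equalities above can be checked with a short calculation. The sketch below is a plain arithmetic helper, not a Spark API: the function names `displayed_memory` and `split_unified` are made up for illustration, and the 900 MiB system memory in the usage note is an assumed example value, not a measurement.

```python
MIB = 1024 * 1024
RESERVED_BYTES = 300 * MIB   # fixed 300 MB reserved memory from the text


def displayed_memory(system_memory: int,
                     memory_fraction: float = 0.6,
                     off_heap_size: int = 0) -> int:
    """Storage + execution memory (on-heap plus off-heap), in bytes.

    system_memory stands for Runtime.getRuntime.maxMemory,
    memory_fraction for spark.memory.fraction, and
    off_heap_size for spark.memory.offHeap.size.
    """
    usable = system_memory - RESERVED_BYTES
    return int(usable * memory_fraction) + off_heap_size


def split_unified(unified: int, storage_fraction: float = 0.5):
    """Split the unified region into (storage, execution) nominal shares."""
    storage = int(unified * storage_fraction)
    return storage, unified - storage


if __name__ == "__main__":
    total = displayed_memory(900 * MIB, off_heap_size=100 * MIB)
    print(f"displayed memory: {total / MIB:.1f} MiB")
```

With an assumed system memory of 900 MiB and no off-heap memory, the formula gives (900 - 300) * 0.6, roughly 360 MiB; enabling 100 MiB of off-heap memory raises the displayed total to roughly 460 MiB.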
3. Understanding UnrollMemory
acquireStorageMemory
acquireUnrollMemory
acquireExecutionMemory
UnifiedMemoryManager.acquireUnrollMemory
4. References