In-Depth Analysis of the Spark Storage System's Underlying Architecture - Spark in Commercial Environments
2022-06-29 19:58:00 【全栈程序员站长】
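This blog series distills cases from real commercial environments and offers Spark source-code walkthroughs and practical production guidance; please keep following the series. Copyright notice: this Spark source-code and commercial-practice series belongs to its author, 秦凯新 (Qin Kaixin). Reproduction is prohibited, but you are welcome to learn from it.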
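Spark in Commercial Environments and Advanced Tuning series
- Spark in Commercial Environments - Spark's built-in RPC communication mechanism and the RpcEnv infrastructure
- Spark in Commercial Environments - Analysis of the Spark event listener bus flow
- Spark in Commercial Environments - In-depth analysis of the Spark storage system's underlying architecture
- Spark in Commercial Environments - Execution flow of Spark's multiple underlying MessageLoop threads
- Spark in Commercial Environments - Spark's two-level scheduling system: the Stage-division algorithm and the details of optimal task scheduling
- Spark in Commercial Environments - Spark delay scheduling and the scheduling Pool architecture
- Spark in Commercial Environments - A detailed look at AppendOnlyMap, the task-level cache, aggregation, and sort structure
- Spark in Commercial Environments - Design of the ExternalSorter in the Spark shuffle process
- Spark in Commercial Environments - The StreamingContext startup flow and the DStream template source code
- Spark in Commercial Environments - Data reception via ReceiverTracker and BlockGenerator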
1. How the components of the Spark storage system relate to each other
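BlockInfoManager mainly provides read/write lock control and sits just below BlockManager in the hierarchy. A Spark read or write normally goes to BlockManager first, which consults BlockInfoManager to check for lock contention; only then does it call DiskStore or MemoryStore. DiskStore in turn calls DiskBlockManager to map data to its on-disk location, while MemoryStore calls MemoryManager to request memory and to work with the soft boundary between the memory pools.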
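The layering above can be sketched in a few lines of Scala. This is a minimal illustration under stated assumptions, not Spark's code: ToyBlockId, ToyBlockInfoManager, ToyMemoryStore, ToyDiskStore, and ToyBlockManager are hypothetical stand-ins, the "disk" store is just an in-memory map, and the real BlockManager API (storage levels, missing-block handling, replication) is much richer. The only point is the order of calls: lock first, then try memory, then disk.

```scala
import scala.collection.mutable

// Hypothetical stand-ins for Spark's storage layers (NOT Spark's real API).
final case class ToyBlockId(name: String)

// Lock layer, playing the role of BlockInfoManager: counts read locks per block.
class ToyBlockInfoManager {
  private val readers = mutable.HashMap.empty[ToyBlockId, Int]
  def lockForReading(id: ToyBlockId): Unit = synchronized {
    readers(id) = readers.getOrElse(id, 0) + 1
  }
  def unlock(id: ToyBlockId): Unit = synchronized {
    readers(id) = readers.getOrElse(id, 0) - 1
  }
  def readerCount(id: ToyBlockId): Int = synchronized { readers.getOrElse(id, 0) }
}

// Storage layer: both stores are plain maps here; a real DiskStore writes local files.
class ToyMemoryStore {
  val entries = mutable.HashMap.empty[ToyBlockId, Array[Byte]]
  def get(id: ToyBlockId): Option[Array[Byte]] = entries.get(id)
}
class ToyDiskStore {
  val files = mutable.HashMap.empty[ToyBlockId, Array[Byte]]
  def get(id: ToyBlockId): Option[Array[Byte]] = files.get(id)
}

// Top layer, playing the role of BlockManager: lock, then memory, then disk, then unlock.
class ToyBlockManager(
    val lockManager: ToyBlockInfoManager,
    val memoryStore: ToyMemoryStore,
    val diskStore: ToyDiskStore) {
  val trace = mutable.ArrayBuffer.empty[String] // records which layers were visited

  def getLocalBytes(id: ToyBlockId): Option[Array[Byte]] = {
    lockManager.lockForReading(id)
    trace += "lock"
    try {
      val fromMemory = memoryStore.get(id).map { bytes => trace += "memory"; bytes }
      fromMemory.orElse(diskStore.get(id).map { bytes => trace += "disk"; bytes })
    } finally {
      lockManager.unlock(id)
      trace += "unlock"
    }
  }
}
```

The try/finally makes sure the read lock is always released, even if a store throws.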
1.1 Relationship between Driver, Executor, SparkEnv, and BlockManager:
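The Driver and every Executor each have their own SparkEnv runtime environment, and each SparkEnv contains a BlockManager that provides storage services. BlockManager is a high-level abstraction, and BlockManagers communicate with one another through RpcEnv, ShuffleClient, and BlockTransferService.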
1.2 Shared-lock and exclusive-lock read/write control between BlockInfoManager and BlockInfo:
BlockInfo carries read/write lock flags, from which one can tell whether the block is currently locked for writing, and by whom:
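```scala
// Simplified excerpt of Spark's BlockInfo (constructor parameters omitted)
object BlockInfo {
  val NO_WRITER: Long = -1          // no task holds the write lock
  val NON_TASK_WRITER: Long = -1024 // the write lock is held by non-task code
}
class BlockInfo {
  private[this] var _readerCount: Int = 0
  private[this] var _writerTask: Long = BlockInfo.NO_WRITER
  /**
   * The task attempt id of the task which currently holds the write lock for this block, or
   * [[BlockInfo.NON_TASK_WRITER]] if the write lock is held by non-task code, or
   * [[BlockInfo.NO_WRITER]] if this block is not locked for writing.
   */
  def writerTask: Long = _writerTask
  def writerTask_=(t: Long): Unit = { _writerTask = t; checkInvariants() }
  // Invariant: a block is locked for reading or for writing, never both at the same time.
  private def checkInvariants(): Unit = assert(_readerCount >= 0 && (_readerCount == 0 || _writerTask == BlockInfo.NO_WRITER))
}
```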
BlockInfoManager maintains the mapping from BlockId to BlockInfo, plus lock mappings from each task attempt ID to the BlockIds that task has locked:
```scala
import scala.collection.mutable
import com.google.common.collect.ConcurrentHashMultiset // Guava
class BlockInfoManager { // excerpt: fields only; BlockId and BlockInfo are Spark storage types
  private type TaskAttemptId = Long
  private[this] val infos = new mutable.HashMap[BlockId, BlockInfo]
  /** Tracks the set of blocks that each task has locked for writing. */
  private[this] val writeLocksByTask =
    new mutable.HashMap[TaskAttemptId, mutable.Set[BlockId]]
      with mutable.MultiMap[TaskAttemptId, BlockId]
  /**
   * Tracks the set of blocks that each task has locked for reading, along with the number of times
   * that a block has been locked (since our read locks are re-entrant).
   */
  private[this] val readLocksByTask =
    new mutable.HashMap[TaskAttemptId, ConcurrentHashMultiset[BlockId]]
}
```
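To make the shared/exclusive semantics concrete, here is a minimal, non-blocking sketch of the rules these fields support: any number of read locks (re-entrant per task) while no writer holds the block, and a write lock only when there are no readers and no writer. ToyLockManager and ToyLockInfo are hypothetical; the real BlockInfoManager.lockForReading and lockForWriting can block and wait until the lock becomes available, and they key blocks by BlockId rather than plain strings.

```scala
import scala.collection.mutable

// Hypothetical, non-blocking simplification of BlockInfoManager's locking rules.
object ToyLockInfo { val NO_WRITER: Long = -1L }
final class ToyLockInfo {
  var readerCount: Int = 0                     // read locks currently held on the block
  var writerTask: Long = ToyLockInfo.NO_WRITER // task holding the write lock, if any
}

class ToyLockManager {
  type TaskAttemptId = Long
  private val infos = mutable.HashMap.empty[String, ToyLockInfo]
  private val writeLocksByTask = mutable.HashMap.empty[TaskAttemptId, mutable.Set[String]]
  // Per task and block: how many times the task holds the (re-entrant) read lock.
  private val readLocksByTask = mutable.HashMap.empty[TaskAttemptId, mutable.Map[String, Int]]

  def register(blockId: String): Unit = synchronized {
    infos.getOrElseUpdate(blockId, new ToyLockInfo)
    ()
  }

  /** Shared lock: granted whenever no task holds the write lock. */
  def tryLockForReading(task: TaskAttemptId, blockId: String): Boolean = synchronized {
    infos.get(blockId) match {
      case Some(info) if info.writerTask == ToyLockInfo.NO_WRITER =>
        info.readerCount += 1
        val counts = readLocksByTask.getOrElseUpdate(task, mutable.Map.empty[String, Int])
        counts(blockId) = counts.getOrElse(blockId, 0) + 1
        true
      case _ => false
    }
  }

  /** Exclusive lock: granted only when there are no readers and no writer. */
  def tryLockForWriting(task: TaskAttemptId, blockId: String): Boolean = synchronized {
    infos.get(blockId) match {
      case Some(info) if info.writerTask == ToyLockInfo.NO_WRITER && info.readerCount == 0 =>
        info.writerTask = task
        writeLocksByTask.getOrElseUpdate(task, mutable.Set.empty[String]) += blockId
        true
      case _ => false
    }
  }

  /** Releases one lock that `task` holds on `blockId`; assumes the task actually holds one. */
  def unlock(task: TaskAttemptId, blockId: String): Unit = synchronized {
    val info = infos(blockId)
    if (info.writerTask == task) {
      info.writerTask = ToyLockInfo.NO_WRITER
      writeLocksByTask.get(task).foreach(_ -= blockId)
    } else {
      val counts = readLocksByTask(task)
      counts(blockId) -= 1
      if (counts(blockId) == 0) counts -= blockId
      info.readerCount -= 1
    }
  }

  def readerCount(blockId: String): Int = synchronized { infos(blockId).readerCount }
}
```

Keeping a per-task count for read locks is what makes them re-entrant: the same task can read-lock a block twice and must unlock it twice.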
1.3 Relationship between DiskBlockManager and DiskStore:
Internally, DiskStore calls DiskBlockManager to determine where each block is read from and written to.
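A minimal sketch of how such a location can be derived, modeled on the hashing scheme in Spark's DiskBlockManager.getFile: the block file name is hashed, the hash picks one of the local directories, and the quotient picks one of that directory's sub-directories (Spark's default is 64, configurable through spark.diskStore.subDirectories), named as two hex digits. ToyDiskBlockManager is a hypothetical class; the real method also creates each sub-directory on first use and caches it, which this sketch omits.

```scala
import java.io.File

// Hypothetical sketch modeled on Spark's DiskBlockManager.getFile; not the real class.
class ToyDiskBlockManager(localDirs: Array[File], subDirsPerLocalDir: Int = 64) {
  require(localDirs.nonEmpty, "at least one local directory is required")

  // Same idea as Spark's Utils.nonNegativeHash: abs(hashCode), mapping Int.MinValue to 0.
  private def nonNegativeHash(s: String): Int = {
    val h = s.hashCode
    if (h != Int.MinValue) math.abs(h) else 0
  }

  /** Maps a block file name to localDirs(dirId)/<two-hex-digit subDirId>/<fileName>. */
  def getFile(fileName: String): File = {
    val hash = nonNegativeHash(fileName)
    val dirId = hash % localDirs.length
    val subDirId = (hash / localDirs.length) % subDirsPerLocalDir
    val subDir = new File(localDirs(dirId), "%02x".format(subDirId))
    new File(subDir, fileName)
  }
}
```

Because the mapping is a pure function of the file name, any later read of the same block computes the same path without a lookup table, and spreading files over sub-directories keeps any single directory from growing too large.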
1.4 Relationship among MemoryManager, MemoryStore, and MemoryPool:
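It is worth stressing that Hadoop, the first-generation big data framework, uses memory only as a computing resource, whereas Spark not only uses memory for computation but also brings part of it into the storage system: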
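- Memory pool model: memory is logically divided into on-heap and off-heap memory, and each of these is further divided into a StorageMemoryPool and an ExecutionMemoryPool.
- MemoryManager is abstract: it defines the interface that memory managers follow, which makes it easy to extend, as with the older StaticMemoryManager and the newer UnifiedMemoryManager.
- MemoryStore relies on UnifiedMemoryManager to acquire memory, to shift the soft boundary between the storage and execution pools, and to release memory.
- MemoryStore also holds the actual objects: its member variable entries maps each BlockId to a MemoryEntry (the in-memory form of a block).
- MemoryStore also performs a kind of "seat reservation": while a block is still being unrolled, it reserves unroll memory in advance, tracked per task attempt in the internal variables offHeapUnrollMemoryMap and onHeapUnrollMemoryMap.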
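The "soft boundary" can be illustrated with a small, single-threaded model of the borrowing rules in Spark's UnifiedMemoryManager: storage may borrow only free execution memory, while execution may reclaim free storage memory and may evict cached blocks, but only down to the protected storage region (maxMemory * spark.memory.storageFraction, 0.5 by default). ToyUnifiedMemory is hypothetical and assumption-heavy: it models a single on-heap pair of pools, "evicts" by simply decrementing a counter, and grants execution memory without Spark's per-task fair sharing.

```scala
// Hypothetical, single-threaded model of UnifiedMemoryManager's soft boundary (on-heap only).
class ToyUnifiedMemory(val maxMemory: Long, storageFraction: Double = 0.5) {
  // Storage cannot be evicted by execution below this size.
  val storageRegionSize: Long = (maxMemory * storageFraction).toLong
  var storagePoolSize: Long = storageRegionSize
  var executionPoolSize: Long = maxMemory - storageRegionSize
  var storageUsed: Long = 0L
  var executionUsed: Long = 0L

  def storageFree: Long = storagePoolSize - storageUsed
  def executionFree: Long = executionPoolSize - executionUsed

  /** Storage may borrow *free* execution memory, but never evicts execution. */
  def acquireStorage(numBytes: Long): Boolean =
    if (numBytes > maxMemory - executionUsed) false // cannot fit even after borrowing
    else {
      if (numBytes > storageFree) {
        val borrowed = math.min(executionFree, numBytes - storageFree)
        executionPoolSize -= borrowed
        storagePoolSize += borrowed
      }
      // Spark could evict other cached blocks here; this sketch simply fails instead.
      if (numBytes <= storageFree) { storageUsed += numBytes; true } else false
    }

  /** Execution may reclaim free storage memory and evict cached data above storageRegionSize. */
  def acquireExecution(numBytes: Long): Long = {
    val extraNeeded = numBytes - executionFree
    if (extraNeeded > 0) {
      val reclaimable = math.max(storageFree, storagePoolSize - storageRegionSize)
      val toReclaim = math.min(extraNeeded, reclaimable)
      val toEvict = math.max(0L, toReclaim - storageFree) // simulated eviction of cached bytes
      storageUsed -= toEvict
      storagePoolSize -= toReclaim
      executionPoolSize += toReclaim
    }
    val granted = math.min(numBytes, executionFree)
    executionUsed += granted
    granted
  }
}
```

With maxMemory = 1000, caching 700 bytes borrows 200 from execution; a later 400-byte execution request takes back 100 by evicting cached data, and subsequent execution requests cannot push storage below its 500-byte region. The asymmetry (execution can evict storage, storage cannot evict execution) reflects that cached blocks can be recomputed, while execution memory holds in-flight intermediate data.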
1.5 Relationship between BlockManagerMaster and BlockManager:
- BlockManagerMaster manages the BlockManagers on the Driver and Executors in a unified way. It is essentially a proxy: it holds an RpcEndpointRef to BlockManagerMasterEndpoint and communicates with BlockManagerMasterEndpoint through it.
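The proxy relationship can be sketched as three pieces: an endpoint that owns the metadata, a reference used to send it messages, and a thin master facade that only forwards calls. All names here (ToyMasterEndpoint, ToyEndpointRef, ToyBlockManagerMaster, and the message classes) are hypothetical, and although the facade's method names loosely echo BlockManagerMaster's, their signatures are simplified. In Spark, the reference is an RpcEndpointRef whose askSync sends the message through RpcEnv, possibly to another process; this sketch calls the endpoint directly in-process.

```scala
import scala.collection.mutable

// Hypothetical messages; Spark's real BlockManagerMessages carry different fields.
sealed trait ToyMessage
final case class RegisterBM(executorId: String) extends ToyMessage
final case class ReportBlock(executorId: String, blockId: String) extends ToyMessage
final case class LocateBlock(blockId: String) extends ToyMessage

// Plays the role of BlockManagerMasterEndpoint: the single owner of block metadata.
class ToyMasterEndpoint {
  private val executors = mutable.Set.empty[String]
  private val locations = mutable.HashMap.empty[String, mutable.Set[String]]

  def handle(msg: ToyMessage): Any = synchronized {
    msg match {
      case RegisterBM(execId) =>
        executors += execId
        true
      case ReportBlock(execId, blockId) if executors.contains(execId) =>
        locations.getOrElseUpdate(blockId, mutable.Set.empty[String]) += execId
        true
      case ReportBlock(_, _) => false // report from an unregistered executor
      case LocateBlock(blockId) =>
        locations.get(blockId).map(_.toList.sorted).getOrElse(Nil)
    }
  }
}

// Plays the role of RpcEndpointRef: here a direct call instead of a network round trip.
class ToyEndpointRef(endpoint: ToyMasterEndpoint) {
  def askSync[T](msg: ToyMessage): T = endpoint.handle(msg).asInstanceOf[T]
}

// Plays the role of BlockManagerMaster: holds only the reference and forwards every call.
class ToyBlockManagerMaster(driverEndpoint: ToyEndpointRef) {
  def registerBlockManager(executorId: String): Boolean =
    driverEndpoint.askSync[Boolean](RegisterBM(executorId))
  def updateBlockInfo(executorId: String, blockId: String): Boolean =
    driverEndpoint.askSync[Boolean](ReportBlock(executorId, blockId))
  def getLocations(blockId: String): List[String] =
    driverEndpoint.askSync[List[String]](LocateBlock(blockId))
}
```

Because the facade keeps no state of its own, every BlockManager (on the Driver or on an Executor) can hold its own BlockManagerMaster instance while all of them see the same metadata held by the one endpoint.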
2. BlockTransferService, the transfer service of the Spark storage system
To be continued.
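3. Summary
The storage system is the foundation of Spark. I try to dissect every small piece of it and, unlike most blogs, to use the plainest language I can; after all, technology is like a paper window: once you poke through it, it is simple.
秦凯新 (Qin Kaixin), early morning of 2018-10-31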
Publisher: 全栈程序员栈长. When reposting, please cite the source: https://javaforall.cn/101341.html Original link: https://javaforall.cn