Analysis of the Underlying Architecture of the Spark Storage System - Spark Business Environment Practice
This series of blog posts draws its cases from real business environments and shares the lessons learned, together with Spark source code interpretation and business practice guidance. Please keep an eye on this blog. Copyright notice: this set of Spark source code interpretation and commercial practice posts belongs to the author (Qin Kaixin); reproduction is prohibited; you are welcome to learn from it.
Spark Advanced Series: Business Environment Practice and Optimization
- Spark Business Environment Practice: Spark's built-in RPC framework and the RpcEnv infrastructure
- Spark Business Environment Practice: Spark event listener bus process analysis
- Spark Business Environment Practice: analysis of the underlying architecture of the Spark storage system
- Spark Business Environment Practice: analysis of the multiple MessageLoop threads at the bottom of Spark
- Spark Business Environment Practice: Spark's two-level scheduling system, the Stage partitioning algorithm, and optimal task scheduling details
- Spark Business Environment Practice: Spark task delay scheduling and scheduling pool (Pool) architecture analysis
- Spark Business Environment Practice: detailed analysis of AppendOnlyMap, the task-granularity cache aggregate-sort structure
- Spark Business Environment Practice: the design ideas behind ExternalSorter in the Spark Shuffle process
- Spark Business Environment Practice: StreamingContext startup process and DStream template source code analysis
- Spark Business Environment Practice: ReceiverTracker and BlockGenerator data stream receiving process analysis
1. Spark Storage System Component Relationships
BlockInfoManager mainly provides read/write lock control. In the hierarchy it sits directly below BlockManager: a Spark read or write operation typically goes through BlockManager first, which consults BlockInfoManager to check for lock contention, and then calls DiskStore or MemoryStore. Those in turn call DiskBlockManager to determine the mapping between data and its on-disk location, or MemoryManager to determine the soft boundary of the memory pools and to request memory.
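To make that layering concrete, here is a minimal, self-contained sketch. It is illustrative only: class names such as SimpleBlockManager and SimpleBlockInfoManager are invented for this post and are not Spark's API.

```scala
import scala.collection.mutable

// Illustrative sketch only (not Spark's real code). It mirrors the layering
// described above: the caller talks to the BlockManager facade, which first
// consults the lock layer, then dispatches to the memory or disk store.
object StorageLayeringSketch {

  class SimpleBlockInfoManager {
    private val readers = mutable.Map.empty[String, Int].withDefaultValue(0)
    private val writers = mutable.Set.empty[String]

    def tryReadLock(id: String): Boolean =
      synchronized { if (writers(id)) false else { readers(id) += 1; true } }

    def releaseReadLock(id: String): Unit =
      synchronized { readers(id) -= 1 }
  }

  class SimpleBlockManager(
      locks: SimpleBlockInfoManager,
      memStore: mutable.Map[String, Array[Byte]],
      diskStore: mutable.Map[String, Array[Byte]]) {

    // Read path: acquire a shared lock first, then try memory, then disk.
    def get(blockId: String): Option[Array[Byte]] = {
      if (!locks.tryReadLock(blockId)) return None // lock contention
      try memStore.get(blockId).orElse(diskStore.get(blockId))
      finally locks.releaseReadLock(blockId)
    }
  }
}
```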
1.1 Driver, Executor, SparkEnv, and BlockManager component relationships:
The Driver and each Executor have their own SparkEnv execution environment, and every SparkEnv contains one BlockManager responsible for storage services. As a high-level abstraction, BlockManagers need to communicate with each other through RpcEnv, ShuffleClient, and BlockTransferService.
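For orientation, each JVM in a running Spark application exposes its SparkEnv, and hence its local BlockManager, via SparkEnv.get. A minimal probe, assuming this snippet runs inside an already-started Spark 2.x driver or executor:

```scala
import org.apache.spark.SparkEnv

// Every driver/executor JVM has exactly one SparkEnv, and that SparkEnv
// owns the local BlockManager providing storage services for this process.
val env = SparkEnv.get
val bm = env.blockManager
println(s"Local BlockManagerId: ${bm.blockManagerId}")
```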
1.2 BlockInfoManager and BlockInfo: read/write control with shared and exclusive locks:
BlockInfo carries a read/write-lock flag, which determines whether write control applies. From Spark's BlockInfo:
```scala
val NO_WRITER: Long = -1
val NON_TASK_WRITER: Long = -1024

/**
 * The task attempt id of the task which currently holds the write lock for this block, or
 * [[BlockInfo.NON_TASK_WRITER]] if the write lock is held by non-task code, or
 * [[BlockInfo.NO_WRITER]] if this block is not locked for writing.
 */
def writerTask: Long = _writerTask
def writerTask_=(t: Long): Unit = {
  _writerTask = t
  checkInvariants()
}
```
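The two sentinel values let the single writerTask field encode three distinct states. A tiny hypothetical helper (not part of Spark) spells them out:

```scala
// Hypothetical helper (not in Spark) that decodes the three states the
// single writerTask field encodes via its sentinel values.
def describeWriteLock(writerTask: Long): String = writerTask match {
  case -1L           => "not locked for writing (NO_WRITER)"
  case -1024L        => "write lock held by non-task code (NON_TASK_WRITER)"
  case taskAttemptId => s"write lock held by task attempt $taskAttemptId"
}
```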
BlockInfoManager also maintains the mapping from BlockId to BlockInfo, as well as per-task lock maps from task attempt id to BlockId:

```scala
private[this] val infos = new mutable.HashMap[BlockId, BlockInfo]

/**
 * Tracks the set of blocks that each task has locked for writing.
 */
private[this] val writeLocksByTask =
  new mutable.HashMap[TaskAttemptId, mutable.Set[BlockId]]
    with mutable.MultiMap[TaskAttemptId, BlockId]

/**
 * Tracks the set of blocks that each task has locked for reading, along with the number of times
 * that a block has been locked (since our read locks are re-entrant).
 */
private[this] val readLocksByTask =
  new mutable.HashMap[TaskAttemptId, ConcurrentHashMultiset[BlockId]]
```
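Write locks are exclusive per block, while read locks are shared and re-entrant, which is why readLocksByTask counts acquisitions with Guava's ConcurrentHashMultiset. A self-contained sketch of that locking pattern (illustrative only, not Spark's implementation):

```scala
import scala.collection.mutable

// Illustrative sketch of the bookkeeping above: write locks are exclusive
// per block; read locks are shared and re-entrant, so acquisitions are
// counted per (task, block) pair -- the role Spark delegates to Guava's
// ConcurrentHashMultiset.
class TinyLockRegistry {
  private val writeLockOwner = mutable.Map.empty[String, Long]         // blockId -> taskId
  private val readLockCounts = mutable.Map.empty[(Long, String), Int]  // (taskId, blockId) -> count

  def lockForWriting(taskId: Long, blockId: String): Boolean = synchronized {
    val free = !writeLockOwner.contains(blockId) &&
      !readLockCounts.keys.exists { case (_, b) => b == blockId }
    if (free) writeLockOwner(blockId) = taskId
    free
  }

  def lockForReading(taskId: Long, blockId: String): Boolean = synchronized {
    if (writeLockOwner.contains(blockId)) false
    else {
      val key = (taskId, blockId)
      readLockCounts(key) = readLockCounts.getOrElse(key, 0) + 1 // re-entrant
      true
    }
  }

  def unlock(taskId: Long, blockId: String): Unit = synchronized {
    if (writeLockOwner.get(blockId).contains(taskId)) writeLockOwner -= blockId
    else {
      val key = (taskId, blockId)
      readLockCounts.get(key).foreach { n =>
        if (n > 1) readLockCounts(key) = n - 1 else readLockCounts -= key
      }
    }
  }
}
```

Counting read acquisitions per (task, block) pair is what makes the read lock re-entrant: a task may lock the same block several times and must unlock it the same number of times.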
1.3 DiskBlockManager and DiskStore component relationships:
As you can see, DiskStore internally calls DiskBlockManager to determine the read and write location of a Block:
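The heart of that mapping is a hash of the block's file name into a two-level directory layout under the configured local directories. A simplified sketch of the scheme (the real DiskBlockManager.getFile also creates missing sub-directories and uses Spark's own non-negative hash utility):

```scala
import java.io.File

// Simplified sketch of DiskBlockManager's directory scheme: hash the
// block's file name, pick a top-level local dir, then one of
// `subDirsPerLocalDir` (default 64) sub-directories beneath it.
def getFile(localDirs: Array[File], subDirsPerLocalDir: Int, filename: String): File = {
  val hash = filename.hashCode & Int.MaxValue // non-negative hash (approximation)
  val dirId = hash % localDirs.length
  val subDirId = (hash / localDirs.length) % subDirsPerLocalDir
  new File(new File(localDirs(dirId), "%02x".format(subDirId)), filename)
}
```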
1.4 MemoryManager, MemoryStore, and MemoryPool component relationships:
One point worth emphasizing: the first-generation big data framework, Hadoop, uses memory only as a compute resource, whereas Spark not only uses memory as a compute resource but also folds part of it into the storage system:
- Memory pool model: memory is logically divided into on-heap and off-heap memory, and each of them is further divided into a StorageMemoryPool and an ExecutionMemoryPool (a toy model of the soft boundary follows this list).
- MemoryManager is abstract; it defines the interface specification for memory managers so that it is easy to extend later, e.g. the original StaticMemoryManager and the newer UnifiedMemoryManager.
- MemoryStore relies on UnifiedMemoryManager to request memory (moving the soft boundary) or to release it.
- MemoryStore is also responsible for holding the actual objects: its internal member variable entries maintains the in-memory mapping between BlockId and MemoryEntry (the in-memory form of a Block).
- MemoryStore also exhibits "seat occupying" (reservation) behavior through its internal variables offHeapUnrollMemoryMap and onHeapUnrollMemoryMap, which reserve unroll memory before a block is fully materialized.
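As promised above, here is a toy model of the soft boundary between the storage and execution pools (illustrative only; the real UnifiedMemoryManager also evicts cached blocks, tracks per-task usage, and handles off-heap memory separately):

```scala
// Toy model of the storage/execution soft boundary (illustrative only).
// Each side gets a nominal share of `maxMemory`; execution may borrow free
// storage memory by shrinking the storage pool, which is what makes the
// boundary "soft".
class ToyUnifiedMemory(maxMemory: Long, storageFraction: Double = 0.5) {
  private var storagePoolSize = (maxMemory * storageFraction).toLong
  private var executionPoolSize = maxMemory - storagePoolSize
  private var storageUsed = 0L
  private var executionUsed = 0L

  def acquireExecutionMemory(numBytes: Long): Boolean = synchronized {
    val freeInExecution = executionPoolSize - executionUsed
    if (numBytes > freeInExecution) {
      // Borrow unused storage memory: move the boundary toward execution.
      val borrowable =
        math.min(numBytes - freeInExecution, storagePoolSize - storageUsed)
      storagePoolSize -= borrowable
      executionPoolSize += borrowable
    }
    if (executionPoolSize - executionUsed >= numBytes) {
      executionUsed += numBytes; true
    } else false
  }

  def acquireStorageMemory(numBytes: Long): Boolean = synchronized {
    if (storagePoolSize - storageUsed >= numBytes) {
      storageUsed += numBytes; true
    } else false // the real manager would try evicting cached blocks here
  }
}
```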
1.5 BlockManagerMaster and BlockManager component relationships:
- The function of BlockManagerMaster is to provide unified management of the BlockManagers living on the Driver and Executors. It is essentially a proxy: it holds a BlockManagerMasterEndpointRef, through which it communicates with the BlockManagerMasterEndpoint.
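That proxy relationship boils down to "hold an endpoint reference and forward messages". A minimal sketch of the pattern (invented types; the real code sends typed messages such as GetLocations through an RpcEndpointRef with askSync):

```scala
// Minimal sketch of the proxy pattern (not Spark's real classes).
// The master proxy holds only a reference to the master endpoint and
// turns each method call into a message sent over RPC.
trait EndpointRef {
  def askSync[T](message: Any): T
}

sealed trait MasterMessage
case class RegisterBlockManager(executorId: String) extends MasterMessage
case class GetLocations(blockId: String) extends MasterMessage

class TinyBlockManagerMaster(driverEndpoint: EndpointRef) {
  def registerBlockManager(executorId: String): Unit =
    driverEndpoint.askSync[Boolean](RegisterBlockManager(executorId))

  def getLocations(blockId: String): Seq[String] =
    driverEndpoint.askSync[Seq[String]](GetLocations(blockId))
}
```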
2. Spark Storage System Component: the BlockTransferService Transport Service
To be continued
3. Summary
The storage system is the cornerstone of Spark. I have tried to dissect every tiny piece of knowledge in it, and unlike most blogs, I try to use the plainest possible language. After all, technology is just a thin layer of window paper waiting to be pierced.
Qin Kaixin, morning of 2018-10-31