Analysis of the Underlying Architecture of the Spark Storage System - Spark Business Environment Practice
This series of blog posts draws its cases from real business environments and shares the lessons learned, together with Spark source code interpretation and business practice guidance. Please keep an eye on this blog. Copyright notice: this set of Spark source code interpretation and commercial practice posts belongs to the author (Qin Kaixin); reproduction is prohibited; you are welcome to learn from it.
Spark Advanced Series: Business Environment Practice and Optimization
- Spark Business Environment Practice: Spark's built-in RPC framework and the RpcEnv infrastructure
- Spark Business Environment Practice: Spark event listener bus process analysis
- Spark Business Environment Practice: analysis of the underlying architecture of the Spark storage system
- Spark Business Environment Practice: analysis of the multiple MessageLoop threads at the bottom of Spark
- Spark Business Environment Practice: Spark's two-level scheduling system, the Stage partitioning algorithm, and optimal task scheduling details
- Spark Business Environment Practice: Spark task delay scheduling and scheduling pool (Pool) architecture analysis
- Spark Business Environment Practice: detailed analysis of AppendOnlyMap, the task-granularity cache aggregate-sort structure
- Spark Business Environment Practice: the design ideas behind ExternalSorter in the Spark Shuffle process
- Spark Business Environment Practice: StreamingContext startup process and DStream template source code analysis
- Spark Business Environment Practice: ReceiverTracker and BlockGenerator data stream receiving process analysis
1. Spark Storage System Component Relationships
BlockInfoManager mainly provides read/write lock control. In the hierarchy it sits directly below BlockManager: a Spark read or write operation typically goes through BlockManager first, which consults BlockInfoManager to check for lock contention, and then calls DiskStore or MemoryStore. Those in turn call DiskBlockManager to determine the mapping between data and its on-disk location, or MemoryManager to determine the soft boundary of the memory pools and to request memory.
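To make that layering concrete, here is a minimal, self-contained sketch. It is illustrative only: class names such as SimpleBlockManager and SimpleBlockInfoManager are invented for this post and are not Spark's API.

```scala
import scala.collection.mutable

// Illustrative sketch only (not Spark's real code). It mirrors the layering
// described above: the caller talks to the BlockManager facade, which first
// consults the lock layer, then dispatches to the memory or disk store.
object StorageLayeringSketch {

  class SimpleBlockInfoManager {
    private val readers = mutable.Map.empty[String, Int].withDefaultValue(0)
    private val writers = mutable.Set.empty[String]

    def tryReadLock(id: String): Boolean =
      synchronized { if (writers(id)) false else { readers(id) += 1; true } }

    def releaseReadLock(id: String): Unit =
      synchronized { readers(id) -= 1 }
  }

  class SimpleBlockManager(
      locks: SimpleBlockInfoManager,
      memStore: mutable.Map[String, Array[Byte]],
      diskStore: mutable.Map[String, Array[Byte]]) {

    // Read path: acquire a shared lock first, then try memory, then disk.
    def get(blockId: String): Option[Array[Byte]] = {
      if (!locks.tryReadLock(blockId)) return None // lock contention
      try memStore.get(blockId).orElse(diskStore.get(blockId))
      finally locks.releaseReadLock(blockId)
    }
  }
}
```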
1.1 Driver, Executor, SparkEnv, and BlockManager component relationships:
The Driver and each Executor have their own SparkEnv execution environment, and every SparkEnv contains one BlockManager responsible for storage services. As a high-level abstraction, BlockManagers need to communicate with each other through RpcEnv, ShuffleClient, and BlockTransferService.
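For orientation, each JVM in a running Spark application exposes its SparkEnv, and hence its local BlockManager, via SparkEnv.get. A minimal probe, assuming this snippet runs inside an already-started Spark 2.x driver or executor:

```scala
import org.apache.spark.SparkEnv

// Every driver/executor JVM has exactly one SparkEnv, and that SparkEnv
// owns the local BlockManager providing storage services for this process.
val env = SparkEnv.get
val bm = env.blockManager
println(s"Local BlockManagerId: ${bm.blockManagerId}")
```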
1.2 BlockInfoManager and BlockInfo: read/write control with shared and exclusive locks:
BlockInfo carries a read/write-lock flag, which determines whether write control applies. From Spark's BlockInfo:
```scala
val NO_WRITER: Long = -1
val NON_TASK_WRITER: Long = -1024

/**
 * The task attempt id of the task which currently holds the write lock for this block, or
 * [[BlockInfo.NON_TASK_WRITER]] if the write lock is held by non-task code, or
 * [[BlockInfo.NO_WRITER]] if this block is not locked for writing.
 */
def writerTask: Long = _writerTask
def writerTask_=(t: Long): Unit = {
  _writerTask = t
  checkInvariants()
}
```
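The two sentinel values let the single writerTask field encode three distinct states. A tiny hypothetical helper (not part of Spark) spells them out:

```scala
// Hypothetical helper (not in Spark) that decodes the three states the
// single writerTask field encodes via its sentinel values.
def describeWriteLock(writerTask: Long): String = writerTask match {
  case -1L           => "not locked for writing (NO_WRITER)"
  case -1024L        => "write lock held by non-task code (NON_TASK_WRITER)"
  case taskAttemptId => s"write lock held by task attempt $taskAttemptId"
}
```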
BlockInfoManager also maintains the mapping from BlockId to BlockInfo, as well as per-task lock maps from task attempt id to BlockId:

```scala
private[this] val infos = new mutable.HashMap[BlockId, BlockInfo]

/**
 * Tracks the set of blocks that each task has locked for writing.
 */
private[this] val writeLocksByTask =
  new mutable.HashMap[TaskAttemptId, mutable.Set[BlockId]]
    with mutable.MultiMap[TaskAttemptId, BlockId]

/**
 * Tracks the set of blocks that each task has locked for reading, along with the number of times
 * that a block has been locked (since our read locks are re-entrant).
 */
private[this] val readLocksByTask =
  new mutable.HashMap[TaskAttemptId, ConcurrentHashMultiset[BlockId]]
```
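Write locks are exclusive per block, while read locks are shared and re-entrant, which is why readLocksByTask counts acquisitions with Guava's ConcurrentHashMultiset. A self-contained sketch of that locking pattern (illustrative only, not Spark's implementation):

```scala
import scala.collection.mutable

// Illustrative sketch of the bookkeeping above: write locks are exclusive
// per block; read locks are shared and re-entrant, so acquisitions are
// counted per (task, block) pair -- the role Spark delegates to Guava's
// ConcurrentHashMultiset.
class TinyLockRegistry {
  private val writeLockOwner = mutable.Map.empty[String, Long]         // blockId -> taskId
  private val readLockCounts = mutable.Map.empty[(Long, String), Int]  // (taskId, blockId) -> count

  def lockForWriting(taskId: Long, blockId: String): Boolean = synchronized {
    val free = !writeLockOwner.contains(blockId) &&
      !readLockCounts.keys.exists { case (_, b) => b == blockId }
    if (free) writeLockOwner(blockId) = taskId
    free
  }

  def lockForReading(taskId: Long, blockId: String): Boolean = synchronized {
    if (writeLockOwner.contains(blockId)) false
    else {
      val key = (taskId, blockId)
      readLockCounts(key) = readLockCounts.getOrElse(key, 0) + 1 // re-entrant
      true
    }
  }

  def unlock(taskId: Long, blockId: String): Unit = synchronized {
    if (writeLockOwner.get(blockId).contains(taskId)) writeLockOwner -= blockId
    else {
      val key = (taskId, blockId)
      readLockCounts.get(key).foreach { n =>
        if (n > 1) readLockCounts(key) = n - 1 else readLockCounts -= key
      }
    }
  }
}
```

Counting read acquisitions per (task, block) pair is what makes the read lock re-entrant: a task may lock the same block several times and must unlock it the same number of times.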
1.3 DiskBlockManager and DiskStore component relationships:
As you can see, DiskStore internally calls DiskBlockManager to determine the read and write location of a Block:
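The heart of that mapping is a hash of the block's file name into a two-level directory layout under the configured local directories. A simplified sketch of the scheme (the real DiskBlockManager.getFile also creates missing sub-directories and uses Spark's own non-negative hash utility):

```scala
import java.io.File

// Simplified sketch of DiskBlockManager's directory scheme: hash the
// block's file name, pick a top-level local dir, then one of
// `subDirsPerLocalDir` (default 64) sub-directories beneath it.
def getFile(localDirs: Array[File], subDirsPerLocalDir: Int, filename: String): File = {
  val hash = filename.hashCode & Int.MaxValue // non-negative hash (approximation)
  val dirId = hash % localDirs.length
  val subDirId = (hash / localDirs.length) % subDirsPerLocalDir
  new File(new File(localDirs(dirId), "%02x".format(subDirId)), filename)
}
```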
1.4 MemoryManager, MemoryStore, and MemoryPool component relationships:
One point worth emphasizing: the first-generation big data framework, Hadoop, uses memory only as a compute resource, whereas Spark not only uses memory as a compute resource but also folds part of it into the storage system:
- Memory pool model: memory is logically divided into on-heap and off-heap memory, and each of them is further divided into a StorageMemoryPool and an ExecutionMemoryPool (a toy model of the soft boundary follows this list).
- MemoryManager is abstract; it defines the interface specification for memory managers so that it is easy to extend later, e.g. the original StaticMemoryManager and the newer UnifiedMemoryManager.
- MemoryStore relies on UnifiedMemoryManager to request memory (moving the soft boundary) or to release it.
- MemoryStore is also responsible for holding the actual objects: its internal member variable entries maintains the in-memory mapping between BlockId and MemoryEntry (the in-memory form of a Block).
- MemoryStore also exhibits "seat occupying" (reservation) behavior through its internal variables offHeapUnrollMemoryMap and onHeapUnrollMemoryMap, which reserve unroll memory before a block is fully materialized.
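As promised above, here is a toy model of the soft boundary between the storage and execution pools (illustrative only; the real UnifiedMemoryManager also evicts cached blocks, tracks per-task usage, and handles off-heap memory separately):

```scala
// Toy model of the storage/execution soft boundary (illustrative only).
// Each side gets a nominal share of `maxMemory`; execution may borrow free
// storage memory by shrinking the storage pool, which is what makes the
// boundary "soft".
class ToyUnifiedMemory(maxMemory: Long, storageFraction: Double = 0.5) {
  private var storagePoolSize = (maxMemory * storageFraction).toLong
  private var executionPoolSize = maxMemory - storagePoolSize
  private var storageUsed = 0L
  private var executionUsed = 0L

  def acquireExecutionMemory(numBytes: Long): Boolean = synchronized {
    val freeInExecution = executionPoolSize - executionUsed
    if (numBytes > freeInExecution) {
      // Borrow unused storage memory: move the boundary toward execution.
      val borrowable =
        math.min(numBytes - freeInExecution, storagePoolSize - storageUsed)
      storagePoolSize -= borrowable
      executionPoolSize += borrowable
    }
    if (executionPoolSize - executionUsed >= numBytes) {
      executionUsed += numBytes; true
    } else false
  }

  def acquireStorageMemory(numBytes: Long): Boolean = synchronized {
    if (storagePoolSize - storageUsed >= numBytes) {
      storageUsed += numBytes; true
    } else false // the real manager would try evicting cached blocks here
  }
}
```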
1.5 BlockManagerMaster and BlockManager component relationships:
- The function of BlockManagerMaster is to provide unified management of the BlockManagers living on the Driver and Executors. It is essentially a proxy: it holds a BlockManagerMasterEndpointRef, through which it communicates with the BlockManagerMasterEndpoint.
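That proxy relationship boils down to "hold an endpoint reference and forward messages". A minimal sketch of the pattern (invented types; the real code sends typed messages such as GetLocations through an RpcEndpointRef with askSync):

```scala
// Minimal sketch of the proxy pattern (not Spark's real classes).
// The master proxy holds only a reference to the master endpoint and
// turns each method call into a message sent over RPC.
trait EndpointRef {
  def askSync[T](message: Any): T
}

sealed trait MasterMessage
case class RegisterBlockManager(executorId: String) extends MasterMessage
case class GetLocations(blockId: String) extends MasterMessage

class TinyBlockManagerMaster(driverEndpoint: EndpointRef) {
  def registerBlockManager(executorId: String): Unit =
    driverEndpoint.askSync[Boolean](RegisterBlockManager(executorId))

  def getLocations(blockId: String): Seq[String] =
    driverEndpoint.askSync[Seq[String]](GetLocations(blockId))
}
```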
2. Spark Storage System Component: the BlockTransferService Transport Service
To be continued
3. Summary
The storage system is the cornerstone of Spark. I have tried to dissect every tiny piece of knowledge in it, and unlike most blogs, I try to use the plainest possible language. After all, technology is just a thin layer of window paper waiting to be pierced.
Qin Kaixin, morning of 2018-10-31