Understanding Spark Run Modes in One Article
2022-06-13 10:29:00 【TiAmo zhang】
Spark Performance tuning and principle analysis
01. Spark Run Modes
To run a Spark application, only two roles are actually needed: the Driver and the Executor. The Driver divides the user's application into Jobs, splits each Job into Tasks, and submits the Tasks to Executors to run. Executors run these Tasks and return the results to the Driver program. Neither the Driver nor the Executor really cares where it runs: as long as a Java process can be started somewhere to host each of them, and the Driver and the Executors can communicate with each other, the application can run. Deployment modes are therefore distinguished by where the Driver and the Executors run. Running Executors in different environments is handled by different implementation classes of the SchedulerBackend interface; each SchedulerBackend interacts with a particular cluster manager (Cluster Manager) to schedule resources in that kind of cluster. The architecture is shown in Figure 1.
▍ Figure 1 Spark deployment modes
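The division of labour described above — a Driver that splits a job into tasks and Executors that run them and return results — can be sketched in a few lines of Python. This is a toy model, not Spark's API; all names here are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

def run_task(partition):
    # An "executor" runs one task over one data partition and returns its result.
    return sum(partition)

def driver(data, num_partitions=4):
    # The "driver" divides the job into one task per partition ...
    partitions = [data[i::num_partitions] for i in range(num_partitions)]
    # ... hands the tasks to executors, then collects and combines the results.
    with ThreadPoolExecutor(max_workers=num_partitions) as executors:
        results = executors.map(run_task, partitions)
        return sum(results)

print(driver(list(range(100))))  # 4950
```

The key point the sketch captures is that the driver never computes partitions itself; it only plans, dispatches, and merges.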
Spark tasks run in one of two broad modes: local and cluster. Local mode is usually used during development and testing: a single local JVM process runs the driver and one executor at the same time, so the Spark task runs entirely on one machine. For cluster execution, Spark can currently run on Spark Standalone, YARN, Mesos, and Kubernetes clusters. In essence, cluster execution comes down to two problems: how to schedule Spark's Driver process and Executor processes in the cluster, and how to let the Driver and the Executors communicate. Once these two problems are solved, most of the problems of running Spark tasks in a cluster are solved. Every Spark application has one Driver and one or more Executors. When running in a cluster, the Executors necessarily run inside the cluster, but the Driver program can run either inside the cluster or outside it, i.e. on the machine from which the Spark task is submitted. When the Driver program runs inside the cluster, this is called cluster mode; when the Driver program runs outside the cluster, it is called client mode. Spark's cluster run modes are shown in Figure 2.
▍ Figure 2 Spark cluster run modes
When submitting a Spark task with the spark-submit script, the --master parameter specifies the cluster's resource manager, and the --deploy-mode parameter specifies whether to run in client mode or cluster mode. The master can also be hard-coded in the application. The common master values Spark supports are listed below:

- local — run locally with one worker thread
- local[N] — run locally with N worker threads
- local[*] — run locally with as many worker threads as there are CPU cores
- spark://HOST:PORT — connect to a Spark Standalone master
- yarn — connect to a YARN cluster
- mesos://HOST:PORT — connect to a Mesos cluster
- k8s://HOST:PORT — connect to a Kubernetes cluster
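For example, a submission to a Standalone cluster might look like the following. The master address, main class, and JAR path are placeholders to be replaced with your own:

```shell
# Client mode: the driver runs on the machine where spark-submit is executed.
spark-submit \
  --master spark://master-host:7077 \
  --deploy-mode client \
  --class com.example.MyApp \
  /path/to/my-app.jar

# Cluster mode: the driver is launched on a node inside the cluster.
spark-submit \
  --master spark://master-host:7077 \
  --deploy-mode cluster \
  --class com.example.MyApp \
  /path/to/my-app.jar
```

The two invocations differ only in --deploy-mode; the difference in behaviour is explained in the client-mode and cluster-mode sections below.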
1. Local mode
In Local mode, the Driver and the Executor run in the same JVM on the same node, and only one Executor is started. Depending on the master URL, that Executor starts a different number of worker threads to execute Tasks. The relationship between the Driver and the Executor in Local mode is shown in Figure 3.
▍ Figure 3 Local mode
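The number of worker threads the single Executor starts is encoded in the master URL itself. A small sketch of that mapping — illustrative only, not Spark's actual parser:

```python
import os
import re

def local_thread_count(master):
    """Map a local master URL to the number of executor worker threads."""
    if master == "local":
        return 1                                   # a single worker thread
    m = re.fullmatch(r"local\[(\*|\d+)\]", master)
    if m is None:
        raise ValueError(f"not a local master URL: {master!r}")
    if m.group(1) == "*":
        return os.cpu_count() or 1                 # one thread per CPU core
    return int(m.group(1))

print(local_thread_count("local[4]"))  # 4
```

Because every thread lives in the one JVM, a Task failure in Local mode surfaces directly in the driver process, which is what makes this mode convenient for debugging.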
2. Spark Standalone
Besides the computing framework for Spark applications, Spark also provides a simple resource manager of its own, consisting of a Master and Workers. The Master manages the running state of all Workers, such as each Worker's available CPU and memory, and it also handles the registration of Spark applications: when a new Spark application is submitted to the Spark cluster, the Master carves out the resources the application needs and notifies Workers to start the Driver or Executors. A Worker is responsible for starting and stopping the Driver or Executors on its own node, and for sending heartbeat messages to the Master. The operation of a Spark Standalone cluster is shown in Figure 4.
▍ Figure 4 Spark Standalone cluster operation
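The Master's bookkeeping described above can be modelled as a greedy placement over the Workers' reported free CPU and memory. This is a hypothetical sketch; the class and field names are invented for illustration and do not mirror Spark's internals:

```python
from dataclasses import dataclass

@dataclass
class Worker:
    host: str
    free_cores: int
    free_mem_mb: int

def place_executors(workers, num_executors, cores, mem_mb):
    """Pick a host for each requested executor, debiting the Worker's resources."""
    placements = []
    for _ in range(num_executors):
        for w in workers:
            if w.free_cores >= cores and w.free_mem_mb >= mem_mb:
                w.free_cores -= cores       # the Master records the allocation ...
                w.free_mem_mb -= mem_mb
                placements.append(w.host)   # ... and will notify this Worker
                break
        else:
            raise RuntimeError("cluster cannot satisfy the resource request")
    return placements

print(place_executors([Worker("w1", 4, 4096), Worker("w2", 4, 4096)], 3, 2, 1024))
# ['w1', 'w1', 'w2']
```

Real schedulers spread executors across Workers for locality and fault tolerance; the sketch only shows the accounting the heartbeat data makes possible.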
When a Spark task runs on a Spark cluster in client mode, the user-written main function runs directly on the node where spark-submit is executed. The main function initializes a SparkContext. During SparkContext initialization, the process sends a message to the Spark cluster's Master node to register a Spark application. After receiving the message, the Master, according to the application's resource requirements, notifies the Workers to start the corresponding Executors. Once started, each Executor reverse-registers with the Driver process. At this point the Driver knows all the available Executors, and when executing Tasks it submits them to the registered Executors. The flow is shown in Figure 5.
▍ Figure 5 Client-mode submission flow in a Spark cluster
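The client-mode handshake — the Driver registers the application with the Master, Workers launch Executors, and Executors reverse-register with the Driver — can be traced with a toy simulation. The classes and method names here are invented for illustration; real Spark exchanges these messages over RPC endpoints:

```python
class Driver:
    def __init__(self):
        self.executors = []

    def register_executor(self, executor_id):
        # Step 4: a started Executor reverse-registers with the Driver.
        self.executors.append(executor_id)

class Worker:
    def launch_executor(self, driver, executor_id):
        # Step 3: the Worker starts an Executor, which reports back to the Driver.
        driver.register_executor(executor_id)

class Master:
    def __init__(self, workers):
        self.workers = workers

    def register_application(self, driver, num_executors):
        # Steps 1-2: the Driver registers an application; the Master notifies
        # Workers (round-robin here) to start the requested Executors.
        for i in range(num_executors):
            self.workers[i % len(self.workers)].launch_executor(driver, i)

master = Master([Worker(), Worker()])
driver = Driver()
master.register_application(driver, num_executors=4)
print(driver.executors)  # [0, 1, 2, 3]
```

After the loop finishes, the driver's executor list is exactly the set of Executors it may submit Tasks to — the state Figure 5 ends in.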
In cluster mode, the user-written main function, i.e. the Driver process, does not run on the node where spark-submit is executed. Instead, spark-submit temporarily starts a process on that node which sends a notification to the Master; the Master then starts the Driver process on a Worker node to run the user's program. Once the Driver is running in the cluster, the process started on the spark-submit node exits automatically. The subsequent application-registration flow is exactly the same as in client mode. The cluster-mode submission flow is shown in Figure 6.
▍ Figure 6 Cluster-mode submission flow in a Spark cluster
3. Other resource-management clusters
As stated repeatedly above, the Driver and Executor processes do not really care where they run: as long as CPU and memory are available and the Driver and Executor processes can communicate normally, they run the same everywhere. Both the client and cluster modes reflect this well — first get the Driver process running somewhere, and the rest of the flow is identical. The same property lets Spark applications run on different resource-management clusters: the Driver and Executor processes can be scheduled by different resource managers such as YARN, Mesos, and Kubernetes. Their scheduling flow is similar to that of a Spark Standalone cluster and is not detailed here.