Understanding Spark run modes in one article
2022-06-13 10:29:00 【TiAmo zhang】

Spark Performance Tuning and Principle Analysis
01. Spark run modes
Running a Spark application requires only two roles: the Driver and the Executor. The Driver divides the user's application into multiple Jobs, divides each Job into multiple Tasks, and submits the Tasks to Executors to run. Executors run these Tasks and return the results to the Driver program. Neither the Driver nor the Executor really cares where it runs: as long as a Java process can be started for each of them and the two can communicate, the application can run. Spark's deployment modes are therefore distinguished by where the Driver and the Executors run. Executors in different environments are launched through different implementation classes of the SchedulerBackend interface. Each SchedulerBackend implementation interacts with a particular cluster manager (Cluster Manager) to schedule resources in that kind of cluster. The architecture is shown in Figure 1.

▍ Figure 1: Spark deployment modes
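The division of labor between the Driver and the Executor can be illustrated with a toy model in plain Python. This is only a sketch and does not use Spark: the function names `split_into_tasks`, `run_task`, and `run_job` are hypothetical, and a thread pool stands in for an Executor.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_tasks(data, num_tasks):
    """Driver side: cut the input into roughly equal slices, one per task."""
    size, extra = divmod(len(data), num_tasks)
    tasks, start = [], 0
    for i in range(num_tasks):
        # The first `extra` slices take one additional element each.
        end = start + size + (1 if i < extra else 0)
        tasks.append(data[start:end])
        start = end
    return tasks


def run_task(partition):
    """Executor side: compute a partial result (sum of squares) for one slice."""
    return sum(x * x for x in partition)


def run_job(data, num_tasks=4, executor_threads=2):
    """Driver side: submit every task, then combine the returned results."""
    tasks = split_into_tasks(data, num_tasks)
    # Assumption: the thread pool plays the role of an Executor here.
    with ThreadPoolExecutor(max_workers=executor_threads) as executor:
        partial_results = list(executor.map(run_task, tasks))
    return sum(partial_results)
```

Calling `run_job(list(range(10)))` splits ten numbers into four tasks, runs them on two worker threads, and sums the partial results on the "driver" side.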
Spark tasks can run in two broad categories of mode: local and cluster. Local mode is typically used for development and testing; it runs the driver and one executor together inside a single local JVM process, so the Spark task runs entirely on the local machine. In cluster mode, Spark can currently run on Spark Standalone, YARN, Mesos, and Kubernetes clusters. The essence of the implementation is deciding how to schedule Spark's Driver process and Executor processes in the cluster and how to let the Driver and the Executors communicate. Solving these two problems solves most of the problems of running Spark tasks in a cluster. Every Spark Application has one Driver and one or more Executors. When running in a cluster, the Executors always run inside the cluster. The Driver program, however, can run either inside the cluster or outside it, that is, on the machine that submits the Spark task. When the Driver runs inside the cluster, the mode is called cluster mode; when it runs outside the cluster, it is called client mode. Spark's run modes in a cluster are shown in Figure 2.

▍ Figure 2: Spark cluster run modes
When submitting a Spark task with the spark-submit script, the --master parameter specifies the cluster's resource manager, and the --deploy-mode parameter specifies whether to run in client mode or cluster mode. The Master can also be hard-coded in the application. Common Master values supported by Spark are shown in the following table.
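As a rough sketch of how these options fit together, the Python snippet below lists the master URL forms commonly given in the Spark documentation and assembles a spark-submit command line. The helper `build_spark_submit_args` and the `host` and `port` placeholders are hypothetical; only the `--master`, `--deploy-mode`, and `--class` flags belong to spark-submit itself.

```python
# Master URL forms as commonly documented; host and port are placeholders.
COMMON_MASTER_URLS = {
    "local": "run locally with one worker thread",
    "local[4]": "run locally with 4 worker threads",
    "local[*]": "run locally with one worker thread per logical core",
    "spark://host:7077": "connect to a Spark Standalone master",
    "yarn": "connect to a YARN cluster located via the Hadoop configuration",
    "mesos://host:5050": "connect to a Mesos master",
    "k8s://https://host:port": "connect to a Kubernetes API server",
}


def build_spark_submit_args(app, master, deploy_mode="client", main_class=None):
    """Assemble a spark-submit argument list (hypothetical helper)."""
    if deploy_mode not in ("client", "cluster"):
        raise ValueError(f"unknown deploy mode: {deploy_mode!r}")
    args = ["spark-submit", "--master", master, "--deploy-mode", deploy_mode]
    if main_class is not None:
        # --class names the entry point of a Java or Scala application.
        args += ["--class", main_class]
    args.append(app)
    return args
```

The resulting list could be passed to `subprocess.run` on a machine where Spark is installed. Hard-coding the Master instead corresponds, in PySpark, to a call such as `SparkConf().setMaster("local[2]")`.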

1. Local mode
In Local mode, the Driver and the Executor run in the same JVM on a single node, and only one Executor is started. Depending on the Master URL, the Executor starts a different number of worker threads to execute Tasks. The relationship between the Driver and the Executor in Local mode is shown in Figure 3.

▍ Figure 3: Local mode
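How the Master URL determines the number of worker threads in Local mode can be sketched as follows. This is a simplified illustration, not Spark's own parsing code: the function `local_worker_threads` is hypothetical and handles only the `local`, `local[N]`, and `local[*]` forms (Spark also accepts a `local[N,F]` form with a failure limit, which is omitted here).

```python
import os
import re

# Matches local[N] or local[*]; a bare "local" is handled separately.
_LOCAL_N = re.compile(r"^local\[(\d+|\*)\]$")


def local_worker_threads(master):
    """Return how many worker threads a local master URL asks for."""
    if master == "local":
        return 1  # a single worker thread
    match = _LOCAL_N.match(master)
    if match is None:
        raise ValueError(f"not a local master URL: {master!r}")
    count = match.group(1)
    if count == "*":
        # One thread per logical core; fall back to 1 if the count is unknown.
        return os.cpu_count() or 1
    threads = int(count)
    if threads < 1:
        raise ValueError("thread count must be at least 1")
    return threads
```

For example, `local_worker_threads("local[4]")` returns 4, while `local_worker_threads("yarn")` raises `ValueError` because `yarn` is not a local master URL.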
2. Spark Standalone
Besides the computing framework for Spark applications, Spark also provides a simple resource manager of its own. This resource manager consists of a Master and Workers. The Master manages the running state of all Workers, such as each Worker's available CPU and available memory. The Master also handles Spark application registration: when a new Spark application is submitted to the Spark cluster, the Master allocates the resources the application requires and notifies Workers to start the Driver or Executors. Each Worker starts and stops the Drivers and Executors on its node, sends heartbeat messages to the Master, and so on. The operation of a Spark Standalone cluster is shown in Figure 4.

▍ Figure 4: Spark Standalone cluster operation
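The bookkeeping role of the Master can be pictured with a small toy model. Everything here is an assumption made for illustration: the `Worker` and `Master` classes are hypothetical, there are no heartbeats or network calls, and the placement policy (fill each Worker before moving to the next) is chosen for simplicity rather than taken from Spark.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    worker_id: str
    free_cores: int
    free_memory_mb: int
    executors: list = field(default_factory=list)


class Master:
    """Tracks Worker resources and places Executors for new applications."""

    def __init__(self):
        self.workers = {}

    def register_worker(self, worker):
        self.workers[worker.worker_id] = worker

    def register_application(self, app_id, num_executors, cores, memory_mb):
        """Return a list of (worker_id, executor_id) placements."""
        placements = []
        for worker in self.workers.values():
            # Keep placing on this Worker while it has room and more are needed.
            while (len(placements) < num_executors
                   and worker.free_cores >= cores
                   and worker.free_memory_mb >= memory_mb):
                executor_id = f"{app_id}-exec-{len(placements)}"
                worker.free_cores -= cores
                worker.free_memory_mb -= memory_mb
                worker.executors.append(executor_id)
                placements.append((worker.worker_id, executor_id))
        return placements
```

If the cluster lacks resources, the returned list is simply shorter than `num_executors`; a real resource manager would keep the request pending instead.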
When a Spark task runs on a Spark cluster in client mode, executing the spark-submit script runs the user-written main function directly on the submitting node. The main function initializes SparkContext. During SparkContext initialization, the process sends a message to the Master node of the Spark cluster to register a Spark application. After receiving the message, the Master notifies Workers to start the corresponding Executors according to the application's requirements. Once started, each Executor registers itself back with the Driver process. The Driver then knows all available Executors and, when executing Tasks, submits them to the registered Executors. The flow is shown in Figure 5.

▍ Figure 5: Submission flow in client mode on a Spark cluster
In cluster mode, the user-written main function, that is, the Driver process, does not run on the node where spark-submit is executed. Instead, spark-submit temporarily starts a process on that node, which sends a request to the Master node. The Master starts the Driver process on a Worker node to run the user-written program. Once the Driver process is running in the cluster, the process started on the spark-submit node exits automatically. The subsequent Application registration process is exactly the same as in client mode. The flow for submitting a Spark application in cluster mode is shown in Figure 6.

▍ Figure 6: Submission flow in cluster mode on a Spark cluster
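The two submission flows differ only in how the Driver gets started. The sketch below writes both flows out as ordered step lists so that the shared tail is explicit. The step wording paraphrases the description above, and the names `COMMON_STEPS` and `submission_steps` are hypothetical.

```python
# Steps shared by both modes once the Driver is running.
COMMON_STEPS = [
    "Driver initializes SparkContext",
    "Driver registers the application with the Master",
    "Master notifies Workers to start Executors",
    "Executors register back with the Driver",
    "Driver submits Tasks to the registered Executors",
]


def submission_steps(deploy_mode):
    """Return the ordered steps for a Standalone submission in the given mode."""
    if deploy_mode == "client":
        head = ["spark-submit runs the user's main function on the submitting node"]
    elif deploy_mode == "cluster":
        head = [
            "spark-submit starts a temporary process on the submitting node",
            "Temporary process asks the Master to launch the Driver",
            "Master starts the Driver on a Worker node",
            "Temporary process exits",
        ]
    else:
        raise ValueError(f"unknown deploy mode: {deploy_mode!r}")
    return head + COMMON_STEPS
```

Comparing `submission_steps("client")` with `submission_steps("cluster")` shows that the last five steps are identical; only the steps that bring up the Driver differ.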
3. Other resource management clusters
As stated repeatedly above, the Driver process and the Executor processes do not really care where they run. As long as CPU and memory are available and the Driver and Executor processes can communicate normally, they run the same way everywhere. The client and cluster modes reflect this well: once the Driver process is running, the subsequent flow is the same. The ability of Spark applications to run on different resource management clusters reflects it too. The Driver and Executor processes of a Spark application can be scheduled by different resource managers such as YARN, Mesos, and Kubernetes. The scheduling process is similar to that of a Spark Standalone cluster and is not repeated here.
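Which resource manager an application talks to follows from the form of its Master URL. The mapping below is a minimal sketch based on the documented URL prefixes. The function `cluster_manager_for` is hypothetical and is not how Spark selects its SchedulerBackend internally.

```python
def cluster_manager_for(master):
    """Map a master URL to the kind of cluster manager it refers to."""
    if master == "local" or master.startswith("local["):
        return "local"
    if master.startswith("spark://"):
        return "standalone"
    if master == "yarn":
        return "yarn"
    if master.startswith("mesos://"):
        return "mesos"
    if master.startswith("k8s://"):
        return "kubernetes"
    raise ValueError(f"unrecognized master URL: {master!r}")
```

For instance, `cluster_manager_for("spark://host:7077")` returns `"standalone"` and `cluster_manager_for("k8s://https://host:443")` returns `"kubernetes"`; the host names are placeholders.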