
Understanding Spark Run Modes

2022-06-13 10:29:00 TiAmo zhang

Spark Performance Tuning and Principle Analysis

01. Spark Run Modes

To run a Spark application, only two roles are actually needed: the Driver and the Executors. The Driver is responsible for dividing the user's application into multiple Jobs, splitting each Job into multiple Tasks, and submitting the Tasks to Executors to run. The Executors run these Tasks and return the results to the Driver program. Neither the Driver nor the Executors really care where they run: as long as Java processes can be started there, so that the Driver and Executors come up and can communicate with each other, everything works. Deployment modes are therefore classified according to where the Driver and Executors run. Executors running in different environments are all managed through the SchedulerBackend interface and its different implementation classes; SchedulerBackend interacts with the different cluster managers (Cluster Manager) to perform resource scheduling on the different clusters. The architecture is shown in Figure 1.

▍ Figure 1 Spark deployment modes

Spark tasks can run in two broad categories of modes: local and cluster. Local mode is usually used for development and testing; it runs the driver and a single executor inside one local JVM process, so the Spark task runs entirely on the local machine. For cluster execution, Spark can currently run on a Spark Standalone cluster, a YARN cluster, a Mesos cluster, or a Kubernetes cluster. In essence, cluster execution comes down to two problems: how to schedule the Driver process and the Executor processes onto the cluster, and how to let the Driver and the Executors communicate. Once these two problems are solved, most of the problems of running Spark tasks on a cluster are solved. Every Spark Application has one Driver and one or more Executors. When running on a cluster, the Executors must run inside the cluster, while the Driver program can run either inside the cluster or outside it, i.e., on the machine from which the Spark task is submitted. When the Driver runs inside the cluster, this is called cluster mode; when the Driver runs outside the cluster, it is called client mode. Spark's run modes on a cluster are shown in Figure 2.

▍ Figure 2 Spark cluster run modes

When submitting a Spark task, the spark-submit script's --master parameter specifies the cluster's resource manager, and its --deploy-mode parameter specifies whether to run in client mode or cluster mode. The Master can also be hard-coded in the application itself. The common Master URL values supported by Spark are shown in the following table.

Master URL           Meaning
local                Run Spark locally with a single worker thread
local[K]             Run Spark locally with K worker threads
local[*]             Run Spark locally with as many worker threads as logical cores
spark://HOST:PORT    Connect to a Spark Standalone master (port 7077 by default)
yarn                 Connect to a YARN cluster
mesos://HOST:PORT    Connect to a Mesos cluster (port 5050 by default)
k8s://HOST:PORT      Connect to a Kubernetes cluster through its API server
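As a minimal sketch of such a submission (the class name, jar path, and master URL are placeholders for illustration; the client and cluster variants appear in the sections below):

    # Run locally with 4 worker threads; --master could equally point at a
    # Standalone, YARN, Mesos, or Kubernetes cluster from the table above.
    spark-submit --master "local[4]" --class com.example.MyApp my-app.jar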

1. Local Mode

In Local run mode, the Driver and the Executor run in the same JVM on the same node, and only one Executor is started. Depending on the Master URL (local, local[K], or local[*]), the Executor starts a different number of worker threads to execute Tasks. The relationship between the Driver and the Executor in Local mode is shown in Figure 3.
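A minimal Scala sketch of Local mode with the Master hard-coded in the application (the app name, object name, and thread count are placeholders):

    import org.apache.spark.{SparkConf, SparkContext}

    object LocalModeDemo {
      def main(args: Array[String]): Unit = {
        // "local[2]" gives the single in-process Executor 2 worker threads;
        // "local" would use 1 thread and "local[*]" one per logical core.
        val conf = new SparkConf().setAppName("local-mode-demo").setMaster("local[2]")
        val sc = new SparkContext(conf)
        val sum = sc.parallelize(1 to 100).reduce(_ + _) // Tasks run on the worker threads
        println(s"sum = $sum")
        sc.stop()
      }
    }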

▍ Figure 3 Local mode

2. Spark Standalone

Besides the computing framework for Spark applications, Spark also provides a simple resource manager of its own. This resource manager consists of a Master and Workers. The Master manages the running state of all Workers, such as each Worker's available CPU and available memory, and is also responsible for registering Spark applications: when a new Spark application is submitted to the Spark cluster, the Master allocates the resources the application requires and notifies Workers to start the Driver or Executors. A Worker is responsible for starting and stopping the Driver or Executors on its own node and for sending heartbeat messages to the Master. The operation of a Spark Standalone cluster is shown in Figure 4.
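As a hedged sketch, the Master and Workers of a Standalone cluster are started with the scripts shipped under Spark's sbin directory (the master host name is a placeholder; in older Spark releases start-worker.sh was named start-slave.sh):

    ./sbin/start-master.sh                            # start the Master (port 7077 by default)
    ./sbin/start-worker.sh spark://master-host:7077   # start a Worker and register it with the Master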

▍ Figure 4 Spark Standalone cluster operation

When a Spark task runs on a Spark cluster in client mode, the user-written main function runs directly on the submitting node after the user executes the spark-submit script. The user-written main function initializes a SparkContext. During SparkContext initialization, the process sends a message to the Master node of the Spark cluster to register a Spark application. After receiving the message, the Master, according to the application's resource requirements, notifies Workers to start the corresponding Executors. Once started, the Executors reverse-register themselves with the Driver process. At that point the Driver knows all available Executors and, when executing Tasks, submits them to the registered Executors. The flow is shown in Figure 5.
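In code, this registration is triggered simply by constructing the SparkContext in the user's main function. A minimal Scala sketch, with all names being placeholders (the master is not hard-coded here, so it is taken from the spark-submit command line):

    import org.apache.spark.{SparkConf, SparkContext}

    object MyApp {
      def main(args: Array[String]): Unit = {
        // Constructing the SparkContext registers this application with
        // the Master; Executors then reverse-register with this Driver.
        val sc = new SparkContext(new SparkConf().setAppName("my-app"))
        sc.parallelize(1 to 10).count() // Tasks go to the registered Executors
        sc.stop()
      }
    }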

▍ Figure 5 Client-mode submission flow on a Spark cluster

In cluster mode, the user-written main function, i.e., the Driver process, does not run on the node where spark-submit is executed. Instead, spark-submit temporarily starts a process on the submitting node, and this process sends a request to the Master node; the Master then starts the Driver process on a Worker node to run the user-written program. Once the Driver process is running inside the cluster, the process started on the spark-submit node exits automatically. The subsequent steps for registering the Application are exactly the same as in client mode. The flow of submitting a Spark application in cluster mode is shown in Figure 6.
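As a hedged sketch, the two modes differ only in the --deploy-mode flag of the submission command (host, class, and jar names are placeholders):

    # client: the Driver runs inside this spark-submit process and stays up.
    spark-submit --master spark://master-host:7077 --deploy-mode client \
      --class com.example.MyApp my-app.jar

    # cluster: the Master starts the Driver on a Worker; the local process
    # exits once the Driver is running.
    spark-submit --master spark://master-host:7077 --deploy-mode cluster \
      --class com.example.MyApp my-app.jar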


▍ Figure 6 Cluster-mode submission flow on a Spark cluster

3. Other Resource Management Clusters

As stated repeatedly above, the Driver process and the Executor processes do not really care where they run: as long as CPU and memory are available and the Driver and Executor processes can communicate normally, they run the same way everywhere. The client and cluster run modes both reflect this property well; the only real difference is where the Driver process is started, and everything after that is the same. The same property also shows in Spark's ability to run on different resource management clusters: the Driver and Executor processes of a Spark application can be scheduled by different resource managers such as YARN, Mesos, and Kubernetes. The scheduling flow is similar to that of a Spark Standalone cluster and is not detailed here.
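As a hedged illustration, moving the same application to another resource manager is largely a matter of changing the master URL (the addresses below are placeholders; for Kubernetes a container image must also be configured, e.g. via spark.kubernetes.container.image):

    # On YARN (cluster addresses are read from the Hadoop configuration):
    spark-submit --master yarn --deploy-mode cluster \
      --class com.example.MyApp my-app.jar

    # On Kubernetes, against the cluster's API server:
    spark-submit --master k8s://https://k8s-apiserver:6443 --deploy-mode cluster \
      --conf spark.kubernetes.container.image=example/spark-app:latest \
      --class com.example.MyApp my-app.jar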


Copyright notice
This article was created by [TiAmo zhang]. When reposting, please include a link to the original. Thank you.
https://yzsam.com/2022/164/202206130929309229.html