Spark Kernel (Execution Principles): Environment Preparation and the Spark Job Submission Process
2022-06-13 03:34:00 【TRX1024】
The path of a Spark job from submission to running has two stages:
The first stage runs outside the YARN cluster and mainly covers job submission, up to the point where the job is handed over to the YARN cluster.
The second stage runs inside the YARN cluster and involves resource allocation and components such as the AM, Driver, and Executor.

I. Stage One: Job Submission
1. The submission script
A Spark job is generally submitted with spark-submit:
# Run on a Spark standalone cluster
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://207.184.161.138:7077 \
--executor-memory 20G \
--total-executor-cores 100 \
/path/to/examples.jar \
1000
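The example above targets a standalone cluster. Since this article walks through the YARN flow, the equivalent YARN cluster-mode submission looks like the following (the --master yarn and --deploy-mode cluster flags are standard spark-submit options; the class and paths are placeholders):

# Run on a YARN cluster in cluster deploy mode
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode cluster \
--executor-memory 20G \
--num-executors 50 \
/path/to/examples.jar \
1000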
2. spark-submit is itself a shell script; its core content is:
#!/usr/bin/env bash
# (license header comments omitted)
if [ -z "${SPARK_HOME}" ]; then
source "$(dirname "$0")"/find-spark-home
fi
# disable randomized hash for string in Python 3.3+
export PYTHONHASHSEED=0
exec "${SPARK_HOME}"/bin/spark-class org.apache.spark.deploy.SparkSubmit "[email protected]"3. In fact, it's called org.apache.spark.deploy.SparkSubmit, This is the starting point for submitting the job
3. The script therefore hands everything to org.apache.spark.deploy.SparkSubmit, the entry point for job submission. Its doSubmit() method:

def doSubmit(args: Array[String]): Unit = {
// Initialize logging if it hasn't been done yet. Keep track of whether logging needs to
// be reset before the application starts.
val uninitLog = initializeLogIfNecessary(true, silent = true)
val appArgs = parseArguments(args)
if (appArgs.verbose) {
logInfo(appArgs.toString)
}
appArgs.action match {
case SparkSubmitAction.SUBMIT => submit(appArgs, uninitLog)
case SparkSubmitAction.KILL => kill(appArgs)
case SparkSubmitAction.REQUEST_STATUS => requestStatus(appArgs)
case SparkSubmitAction.PRINT_VERSION => printVersion()
}
}

doSubmit() does two things:
- parses the command-line arguments
- calls submit()
4. submit() submits the job to YARN
// Prepare the launch environment by setting up the appropriate classpath,
// system properties, and application arguments for running the child main
// class based on the cluster manager and the deploy mode.
val (childArgs, childClasspath, sparkConf, childMainClass) = prepareSubmitEnvironment(args)
This method prepares the Spark submission, i.e. it readies the launch environment:
- prepareSubmitEnvironment() resolves childMainClass; in yarn-cluster mode this is org.apache.spark.deploy.yarn.YarnClusterApplication
- YarnClusterApplication then creates a Client and calls submitApplication() to submit the job to YARN
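To connect the dots: after prepareSubmitEnvironment() resolves childMainClass, submit() eventually reaches runMain(), which loads and starts that class. A condensed sketch of SparkSubmit.runMain() (Spark 3.x; error handling, classloader setup, and proxy-user logic omitted):

// childMainClass is org.apache.spark.deploy.yarn.YarnClusterApplication
// in yarn-cluster mode
val klass = Utils.classForName(childMainClass)
val app: SparkApplication =
  if (classOf[SparkApplication].isAssignableFrom(klass)) {
    klass.getConstructor().newInstance().asInstanceOf[SparkApplication]
  } else {
    new JavaMainApplication(klass)
  }
app.start(childArgs.toArray, sparkConf)   // starts YarnClusterApplication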
5. The job has now been submitted to YARN's ResourceManager; this ends the first stage.
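For reference, the heart of what Client.submitApplication() does can be sketched with the public Hadoop YarnClient API. This is a minimal illustration only; the real Spark Client additionally builds the AM's ContainerLaunchContext (the java command that starts org.apache.spark.deploy.yarn.ApplicationMaster), resource requests, queue settings, and security tokens:

import org.apache.hadoop.yarn.client.api.YarnClient
import org.apache.hadoop.yarn.conf.YarnConfiguration

val conf = new YarnConfiguration()
val yarnClient = YarnClient.createYarnClient()
yarnClient.init(conf)
yarnClient.start()

// Ask the ResourceManager for a new application id
val newApp = yarnClient.createApplication()
val appContext = newApp.getApplicationSubmissionContext
appContext.setApplicationName("spark-demo")   // illustrative name

// ... set the AM ContainerLaunchContext, memory/cpu resources, queue ...

// Hand the application over to the ResourceManager
val appId = yarnClient.submitApplication(appContext)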
II. Stage Two: Inside the YARN Cluster
Here is the flow chart for this stage:
[Flow chart: Spark job execution flow inside the YARN cluster]
Legend:
【RM】 means the step runs on the ResourceManager
【AM】 means the step runs in the ApplicationMaster
【Executor】 means the step runs in the Executor process on a NodeManager
Steps:
1.【RM】The ResourceManager starts the ApplicationMaster (AM) process
The client requests a container from the RM in which to start the AM process.
Inside the ApplicationMaster:
- a yarnRMClient is used to communicate with the RM
- an nmClient is used to communicate with the NodeManagers on other nodes
2.【AM】The AM starts the Driver thread according to the submission parameters and initializes the SparkContext (see the first sketch after this step list)
3.【AM】Through the yarnRMClient, the AM connects to the RM, registers itself, and applies for resources
4.【RM】The RM returns the list of available resources
Container resources are allocated according to preferred locations and other related parameters.
5.【AM】The AM processes the allocated containers
The assigned container information is now available; the AM runs the allocated containers, i.e. asks the specified NM to start each container.
6.【AM】Using a thread pool, the AM starts an Executor process in each allocated, running container
7.【Executor】Inside the Executor process, the important members are:
- a driver reference, used to communicate with the Driver in the AM
- ExecutorEnv, the Executor's environment
- the MessageLoop with its inbox and outbox
8.【Executor】Through the driver reference, the backend registers the Executor with the Driver (reverse registration; see the second sketch after this list)
9.【AM】The Driver replies that the Executor registered successfully
10.【Executor】The Executor computation object is created and started
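First sketch (for step 2): in cluster mode the AM does not start a separate Driver JVM; it runs the user class's main() in a thread named "Driver" inside its own process. A simplified illustration modeled on ApplicationMaster.startUserApplication(); userClassName and userArgs stand in for the --class value and the application arguments:

// Run the user's main() in a thread named "Driver" (simplified).
val mainMethod = Class.forName(userClassName)
  .getMethod("main", classOf[Array[String]])

val userClassThread = new Thread {
  override def run(): Unit = mainMethod.invoke(null, userArgs) // userArgs: Array[String]
}
userClassThread.setName("Driver")
userClassThread.start()
// The AM waits for the SparkContext created inside this thread before
// registering with the RM and requesting containers (steps 3-5).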
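Second sketch (for steps 8-10): the reverse registration lives in CoarseGrainedExecutorBackend. Condensed and simplified below; parameter lists differ across Spark versions, so treat this purely as a sketch:

// On startup, ask the Driver's RPC endpoint to register this executor (step 8)
override def onStart(): Unit = {
  driver.ask[Boolean](RegisterExecutor(executorId, self, hostname, cores, ...))
}

// Only after the Driver confirms (step 9) is the real computation object
// created (step 10)
override def receive: PartialFunction[Any, Unit] = {
  case RegisteredExecutor =>
    executor = new Executor(executorId, hostname, env, ...)
}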
III. The General Spark Running Process
[Figure: the general Spark running flow]
The figure above shows the general Spark running flow, which reflects the basic submission process of a Spark application under any deployment mode. The process follows these core steps:
1) After the job is submitted, the Driver program starts first;
2) The Driver then registers the application with the cluster manager;
3) The cluster manager allocates Executors and starts them;
4) The Driver begins executing the main function. Spark computation is lazy: only when an Action operator executes is computation triggered, traced backwards from the Action. Stages are split at wide dependencies; each Stage corresponds to one TaskSet, and however many Tasks the TaskSet contains are scheduled onto available Executor resources;
5) Following the locality principle, each Task is dispatched to a designated Executor for execution. While tasks run, the Executor keeps communicating with the Driver to report task status.
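As a concrete illustration of step 4, a minimal, self-contained word count (the input path is a placeholder): nothing executes until the collect() Action, and the reduceByKey shuffle (a wide dependency) splits the job into two Stages, each becoming a TaskSet:

import org.apache.spark.sql.SparkSession

object WordCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("WordCount").getOrCreate()
    val sc = spark.sparkContext

    // Transformations only: lazily recorded, nothing runs yet
    val counts = sc.textFile("/path/to/input.txt")
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)   // wide dependency => Stage boundary

    // The Action triggers the job: the DAG is traced backwards from here,
    // split into two Stages at the shuffle, each Stage becoming a TaskSet
    counts.collect().foreach(println)

    spark.stop()
  }
}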