
Spark core concepts: Master, Worker, Driver Program, Executor, RDD

2022-06-13 03:29:00 【TRX1024】

1. Master

The Master is the leader of Spark's own resource scheduling system. It manages the resource information of the whole cluster (in Standalone mode), similar to the ResourceManager in a YARN cluster.

Main functions:

Monitor the Workers and check whether each Worker is running properly;
Manage Workers and Applications: accept Worker registrations and manage all Workers; accept Applications submitted by clients, schedule waiting Applications, and submit them to Workers.

2. Worker

The Worker is the slave (worker) role in Spark's own resource scheduling system. A cluster has multiple Workers (in Standalone mode), and each Worker manages the resource information of its own node, similar to the NodeManager in YARN.

Main functions:

Register with the Master through a RegisterWorker message;
Send heartbeats to the Master;
Configure the process environment according to the Application sent by the Master, and start an ExecutorBackend (the temporary process needed to execute Tasks).
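The registration and heartbeat interaction between Master and Worker can be sketched as a small toy model. This is not Spark's implementation: the class names `ToyMaster` and `WorkerInfo`, the method names, and the 60-second timeout are all assumptions made for illustration, and the sketch passes timestamps explicitly so the behavior is easy to follow.

```python
import time
from dataclasses import dataclass, field


@dataclass
class WorkerInfo:
    """What the toy master remembers about one registered worker."""
    worker_id: str
    cores: int
    memory_mb: int
    last_heartbeat: float


@dataclass
class ToyMaster:
    """Toy stand-in for a standalone Master: tracks workers and their liveness."""
    timeout_s: float = 60.0  # hypothetical value chosen for this sketch
    workers: dict = field(default_factory=dict)

    def register_worker(self, worker_id, cores, memory_mb, now=None):
        # Registration: the worker announces itself and its resources.
        now = time.time() if now is None else now
        self.workers[worker_id] = WorkerInfo(worker_id, cores, memory_mb, now)

    def heartbeat(self, worker_id, now=None):
        # Heartbeat: refresh the timestamp of an already registered worker.
        now = time.time() if now is None else now
        if worker_id not in self.workers:
            return False  # unknown worker; it would have to register first
        self.workers[worker_id].last_heartbeat = now
        return True

    def alive_workers(self, now=None):
        # Monitoring: a worker counts as alive if it reported recently enough.
        now = time.time() if now is None else now
        return sorted(
            w.worker_id
            for w in self.workers.values()
            if now - w.last_heartbeat <= self.timeout_s
        )


master = ToyMaster(timeout_s=60.0)
master.register_worker("worker-1", cores=4, memory_mb=8192, now=0.0)
master.register_worker("worker-2", cores=8, memory_mb=16384, now=0.0)
master.heartbeat("worker-1", now=50.0)  # worker-2 stays silent
alive_at_90 = master.alive_workers(now=90.0)
print(alive_at_90)  # ['worker-1']
```

In this sketch a worker that stops sending heartbeats simply drops out of the alive list once the timeout passes, which is the monitoring role described for the Master.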

3. Driver Program

Every Spark application contains a driver program, which is responsible for launching parallel operations on the cluster. The driver program contains the application's main() function. In the WordCount example, spark-shell is the driver program, so whatever we type into it is what it launches on the cluster. The driver program accesses Spark through a SparkContext object, which represents a connection to the Spark cluster.

4. Executor

Once the SparkContext object has connected to the cluster manager, it acquires executors on the nodes of the cluster. An executor is a process (process name: ExecutorBackend, running on a Worker node) that performs computations and stores intermediate data for the application.

Spark then sends the application code (for example, a jar package) to each executor. Finally, the SparkContext object sends tasks to the executors to run.
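The division of labor between driver and executors can be illustrated with a toy word count. This is only a sketch under simplifying assumptions: a thread pool stands in for the executors (real executors are separate processes on Worker nodes), and the function names `count_words`, `merge_counts`, and `run_driver` are hypothetical, not part of Spark.

```python
from concurrent.futures import ThreadPoolExecutor


def count_words(lines):
    """The 'application code' each task runs: count words in one partition."""
    counts = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


def merge_counts(partials):
    """Driver-side merge of the partial results returned by the tasks."""
    total = {}
    for partial in partials:
        for word, n in partial.items():
            total[word] = total.get(word, 0) + n
    return total


def run_driver(lines, num_partitions=2, num_executors=2):
    # The driver splits the input into partitions, one task per partition.
    partitions = [lines[i::num_partitions] for i in range(num_partitions)]
    # A thread pool stands in for the executors in this sketch.
    with ThreadPoolExecutor(max_workers=num_executors) as executors:
        partials = list(executors.map(count_words, partitions))
    return merge_counts(partials)


word_counts = run_driver(["spark driver", "spark executor", "driver program"])
print(word_counts)
```

The driver only plans the work and combines results; the per-partition counting happens in the pool, mirroring how tasks run on executors.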

[Figure: the relationship between Master, Worker, Driver Program, and Executor]

5. RDD (Resilient Distributed Dataset)

An RDD is Spark's core data abstraction; it is essentially a distributed collection of elements. In Spark, all work on data comes down to creating RDDs, transforming existing RDDs, and calling actions on RDDs to compute a result. Behind the scenes, Spark automatically distributes the data in an RDD across the cluster and parallelizes the operations performed on it.

An RDD in Spark is an immutable distributed collection of objects, so each transformation of an RDD produces a new RDD rather than modifying the existing one.
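The immutability described above can be shown with a small stand-in class. `ToyRDD` is a hypothetical name and this is not Spark's RDD API: it keeps its partitions in local tuples and evaluates transformations eagerly, whereas Spark distributes partitions across the cluster and defers the work until an action is called. The point of the sketch is only that transformations return new objects and leave the original untouched.

```python
class ToyRDD:
    """Toy immutable dataset: transformations return new objects, actions compute."""

    def __init__(self, partitions):
        # Store partitions as tuples so they cannot be modified in place.
        self._partitions = tuple(tuple(p) for p in partitions)

    def map(self, fn):
        # Transformation: build a new ToyRDD; self is left untouched.
        return ToyRDD([[fn(x) for x in p] for p in self._partitions])

    def filter(self, pred):
        # Transformation: keep only the elements that satisfy pred.
        return ToyRDD([[x for x in p if pred(x)] for p in self._partitions])

    def collect(self):
        # Action: gather all elements from all partitions into one list.
        return [x for p in self._partitions for x in p]

    def count(self):
        # Action: total number of elements across partitions.
        return sum(len(p) for p in self._partitions)


numbers = ToyRDD([[1, 2, 3], [4, 5, 6]])
squares = numbers.map(lambda x: x * x)
even_squares = squares.filter(lambda x: x % 2 == 0)

print(numbers.collect())       # [1, 2, 3, 4, 5, 6]
print(even_squares.collect())  # [4, 16, 36]
```

After both transformations, `numbers` still holds its original elements, and each step produced a distinct object.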

6. Glossary of technical terms

Term	Meaning
Application	A user program built on Spark, consisting of a driver program and multiple executors on the cluster.
Application jar	A jar package containing the user's Spark application.
Driver program	The process that runs the application's main() function and creates the SparkContext.
Cluster manager	An external service for acquiring cluster resources, such as the standalone manager, Mesos, or YARN.
Deploy mode	Distinguishes where the driver process runs. In "cluster" mode, the framework launches the driver inside the cluster. In "client" mode, the submitter launches the driver outside the cluster.
Worker node	Any node in the cluster that can run application code.
Executor	A process launched for an application on a worker node; it runs tasks and keeps data in memory or on disk. Each application has its own executors.
Task	A unit of work sent to one executor.
Job	A parallel computation made up of multiple tasks, spawned in response to a Spark action (such as save or collect).
Stage	Each job is divided into smaller sets of tasks called stages, which depend on each other (similar to the map and reduce stages in MapReduce). During execution, each shuffle marks a stage boundary.
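The Job and Stage entries can be made concrete with a rough sketch that cuts a linear chain of operations into stages at each shuffle. This is a deliberate simplification and not how Spark's scheduler is written: the function `split_into_stages` and the `SHUFFLE_OPS` set are hypothetical, the job is modeled as a flat list of operation names rather than a dependency graph, and the shuffle operation is simply placed at the start of the following stage.

```python
# Operations treated as shuffle boundaries in this sketch (an illustrative subset).
SHUFFLE_OPS = {"reduceByKey", "groupByKey", "join", "repartition"}


def split_into_stages(operations):
    """Group a linear chain of operation names into stages, cutting at each shuffle."""
    stages = [[]]
    for op in operations:
        if op in SHUFFLE_OPS:
            # A shuffle closes the current stage; the shuffle op opens the next one.
            stages.append([op])
        else:
            stages[-1].append(op)
    # Drop the empty leading stage that appears when the chain starts with a shuffle.
    return [s for s in stages if s]


# A word-count style job that ends with the collect action.
job = ["textFile", "flatMap", "map", "reduceByKey", "map", "collect"]
stages = split_into_stages(job)
print(stages)
# [['textFile', 'flatMap', 'map'], ['reduceByKey', 'map', 'collect']]
```

With one shuffle in the chain, the job falls into two stages, matching the map-side and reduce-side split that the glossary compares to MapReduce.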