Spark core concepts: Master, Worker, Driver Program, Executor, RDDs
2022-06-13 03:29:00 【TRX1024】
1. Master
The Master is the leader of Spark's own resource scheduling system. It manages the resource information of the whole cluster (in Standalone mode), similar to the ResourceManager in a YARN cluster.
Main functions:
- Monitor the Workers and check whether each Worker is functioning properly;
- Manage Workers and Applications: accept Worker registrations and manage all Workers; accept Applications submitted by Clients, schedule the waiting Applications, and submit them to the Workers.
2. Worker
A Worker is a slave in Spark's own resource scheduling system. A cluster has multiple Workers (in Standalone mode), and each Worker manages the resource information of the node it runs on, similar to the NodeManager in YARN.
Main functions:
- Register with the Master through RegisterWorker;
- Send heartbeats to the Master;
- Configure the process environment according to the Application sent by the Master, and start the ExecutorBackend (the temporary process needed to execute Tasks).
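The registration and heartbeat relationship between Worker and Master can be sketched as a toy model in plain Python. This is not Spark code: the class `ToyMaster`, its methods, and the 60-second timeout are all hypothetical names and values chosen for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMaster:
    """Toy stand-in for a Standalone Master: tracks workers and their last heartbeat."""
    timeout: float = 60.0  # hypothetical timeout in seconds, not a Spark setting
    workers: dict = field(default_factory=dict)  # worker_id -> {"cores", "last_seen"}

    def register_worker(self, worker_id: str, cores: int, now: float) -> None:
        # A Worker announces itself and its resources to the Master.
        self.workers[worker_id] = {"cores": cores, "last_seen": now}

    def heartbeat(self, worker_id: str, now: float) -> None:
        # A registered Worker reports that it is still alive.
        self.workers[worker_id]["last_seen"] = now

    def alive_workers(self, now: float) -> list:
        # Workers whose last heartbeat is within the timeout count as healthy.
        return sorted(
            wid for wid, info in self.workers.items()
            if now - info["last_seen"] <= self.timeout
        )


master = ToyMaster(timeout=60.0)
master.register_worker("worker-1", cores=4, now=0.0)
master.register_worker("worker-2", cores=8, now=0.0)
master.heartbeat("worker-1", now=50.0)  # worker-2 stays silent
alive = master.alive_workers(now=90.0)
print(alive)
```

In this sketch only `worker-1` is still considered healthy at time 90, because `worker-2` has not sent a heartbeat within the assumed timeout.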
3. Driver Program
Every Spark application contains a driver program, which is responsible for dispatching parallel operations to the cluster. The driver program contains the main() function of the Spark application. In the WordCount example, spark-shell is our driver program, so we can type in whatever operations we want and it takes care of dispatching them. The driver program accesses Spark through a SparkContext object, which represents a connection to the Spark cluster.
4. Executor
Once the SparkContext object has connected to the cluster manager, it can acquire executors on the nodes of the cluster. An executor is a process (process name: ExecutorBackend, running on a Worker node) that performs computations and stores intermediate data for the application.
Spark sends the application code (for example, a jar package) to each executor. Finally, the SparkContext object sends tasks to the executors to start running the program.
The relationship between Master, Worker, Driver Program, and Executor:

5. RDD (Resilient Distributed Dataset)
The RDD is Spark's core data abstraction. It is essentially a distributed collection of elements. In Spark, every operation on data comes down to one of three things: creating an RDD, transforming an existing RDD, or calling an operation on an RDD to compute a result. Behind the scenes, Spark automatically distributes the data in an RDD across the cluster and parallelizes the operations.
An RDD in Spark is an immutable collection of distributed objects, so every transformation of an RDD produces a new RDD.
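The immutability described above can be illustrated with a small single-machine sketch in plain Python. `ToyRDD` is a hypothetical class, not Spark's API: it only borrows the method names map, filter, and collect, and unlike a real RDD it is neither distributed nor lazily evaluated.

```python
class ToyRDD:
    """Toy, single-machine stand-in for an RDD: an immutable collection."""

    def __init__(self, items):
        self._items = tuple(items)  # a tuple cannot be modified in place

    def map(self, fn):
        # Transformation: returns a new ToyRDD, the original is untouched.
        return ToyRDD(fn(x) for x in self._items)

    def filter(self, pred):
        # Transformation: also returns a new ToyRDD.
        return ToyRDD(x for x in self._items if pred(x))

    def collect(self):
        # Action: hand the elements back to the caller as a list.
        return list(self._items)


numbers = ToyRDD([1, 2, 3, 4])
doubled = numbers.map(lambda x: x * 2)
large = doubled.filter(lambda x: x > 4)

print(numbers.collect())
print(doubled.collect())
print(large.collect())
```

Each call returns a new object, so `numbers` still holds its original elements after `doubled` and `large` have been derived from it.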
6. Glossary
| Term | Meaning |
|---|---|
| Application | A user program built on Spark, consisting of a driver program and multiple executors on the cluster. |
| Application jar | A jar containing the user's Spark application. |
| Driver program | The process that runs the application's main() function and creates the SparkContext. |
| Cluster manager | An external service for acquiring cluster resources, such as the standalone manager, Mesos, or YARN. |
| Deploy mode | Distinguishes where the driver process runs. In "cluster" mode, the framework launches the driver inside the cluster. In "client" mode, the submitter launches the driver outside the cluster. |
| Worker node | Any node in the cluster that can run application code. |
| Executor | A process launched for an application on a worker node. It runs tasks and keeps data in memory or on disk. Each application has its own executors. |
| Task | A unit of work sent to one executor. |
| Job | A parallel computation made up of multiple tasks, triggered by a Spark action (for example, save or collect). |
| Stage | Each job is divided into multiple stages. Each stage is a set of tasks, and the stages depend on one another (similar to the map and reduce stages in MapReduce). During execution, every shuffle marks a stage boundary. |
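
The Stage entry above says a job is cut into stages wherever a shuffle occurs. A minimal sketch of that idea in plain Python follows. It is a simplification under stated assumptions: the job is treated as a linear list of operation names, whereas Spark works on a dependency graph, and the `SHUFFLE_OPS` set and the `split_into_stages` function are hypothetical helpers, not part of Spark.

```python
# Hypothetical set of operations that require a shuffle in this toy model.
SHUFFLE_OPS = {"reduceByKey", "groupByKey", "join"}


def split_into_stages(ops):
    """Split a linear list of operation names into stages at shuffle boundaries."""
    stages, current = [], []
    for op in ops:
        if op in SHUFFLE_OPS and current:
            # The shuffle closes the current stage; the shuffling op opens the next one.
            stages.append(current)
            current = []
        current.append(op)
    if current:
        stages.append(current)
    return stages


job = ["textFile", "flatMap", "map", "reduceByKey", "map", "collect"]
stages = split_into_stages(job)
for number, stage in enumerate(stages, start=1):
    print(number, stage)
```

For this WordCount-style list of operations the sketch yields two stages, split at `reduceByKey`.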