Spark core concepts: Master, Worker, Driver Program, Executor, RDD
2022-06-13 03:29:00 【TRX1024】
1. Master
The Master is the leader of Spark's purpose-built resource scheduling system. It holds the resource information for the whole cluster (Standalone mode), similar to the ResourceManager in a YARN cluster.
Main functions:
- Monitor the Workers and check whether each Worker is working properly;
- Manage Workers and Applications (accept Worker registrations and manage all Workers; accept Applications submitted by Clients, schedule waiting Applications, and submit them to Workers).
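
The scheduling role described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Spark's actual Master: the names `ToyMaster`, `WorkerInfo`, `AppRequest`, `register_worker`, `submit` and `schedule` are hypothetical, resources are reduced to CPU cores, and waiting Applications are served in simple FIFO order.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class WorkerInfo:
    worker_id: str
    free_cores: int


@dataclass
class AppRequest:
    app_id: str
    cores_needed: int


@dataclass
class ToyMaster:
    """Hypothetical model of a Master: tracks Workers and queues Applications."""
    workers: dict = field(default_factory=dict)   # worker_id -> WorkerInfo
    waiting: deque = field(default_factory=deque)  # Applications not yet scheduled

    def register_worker(self, worker_id, cores):
        self.workers[worker_id] = WorkerInfo(worker_id, cores)

    def submit(self, app_id, cores_needed):
        self.waiting.append(AppRequest(app_id, cores_needed))

    def schedule(self):
        """Assign waiting Applications to Workers in FIFO order.

        Returns {app_id: {worker_id: cores}}. Stops at the first Application
        that does not fit, so later Applications never jump the queue.
        """
        assignments = {}
        while self.waiting:
            app = self.waiting[0]
            total_free = sum(w.free_cores for w in self.workers.values())
            if total_free < app.cores_needed:
                break  # head of the queue has to wait for free cores
            self.waiting.popleft()
            remaining = app.cores_needed
            grant = {}
            for w in self.workers.values():
                if remaining == 0:
                    break
                take = min(w.free_cores, remaining)
                if take > 0:
                    w.free_cores -= take
                    remaining -= take
                    grant[w.worker_id] = take
            assignments[app.app_id] = grant
        return assignments


master = ToyMaster()
master.register_worker("worker-1", cores=4)
master.register_worker("worker-2", cores=4)
master.submit("app-A", cores_needed=6)
master.submit("app-B", cores_needed=4)
result = master.schedule()
print(result)
```

In this run app-A is spread over both Workers, while app-B stays in the waiting queue because only two cores remain free.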
2. Worker
The Worker is the slave of Spark's purpose-built resource scheduling system. A cluster has multiple slaves (Standalone mode), and each one holds the resource information of its own node, similar to the NodeManager in the YARN framework.
Main functions:
- Register with the Master via RegisterWorker;
- Send heartbeats to the Master;
- Configure the process environment according to the Application sent by the Master, and start the ExecutorBackend (the temporary process required to execute Tasks).
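
The registration and heartbeat steps above can be sketched as a toy model. Everything here is a hypothetical illustration rather than Spark's implementation: the class names `ToyHeartbeatMaster` and `ToyWorker`, the method names, and the 60-second timeout are assumptions, and timestamps are passed in explicitly so the example runs without waiting.

```python
class ToyHeartbeatMaster:
    """Hypothetical Master side: records registrations and heartbeats."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self.last_seen = {}  # worker_id -> time of the last message received

    def register_worker(self, worker_id, now):
        self.last_seen[worker_id] = now

    def heartbeat(self, worker_id, now):
        if worker_id not in self.last_seen:
            raise KeyError(f"unregistered worker: {worker_id}")
        self.last_seen[worker_id] = now

    def dead_workers(self, now):
        """Return the Workers whose last heartbeat is older than the timeout."""
        return sorted(
            worker_id
            for worker_id, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )


class ToyWorker:
    """Hypothetical Worker side: registers once, then sends heartbeats."""

    def __init__(self, worker_id, master):
        self.worker_id = worker_id
        self.master = master

    def start(self, now):
        self.master.register_worker(self.worker_id, now)

    def send_heartbeat(self, now):
        self.master.heartbeat(self.worker_id, now)


master = ToyHeartbeatMaster(timeout_seconds=60)  # timeout value is an assumption
w1 = ToyWorker("worker-1", master)
w2 = ToyWorker("worker-2", master)
w1.start(now=0)
w2.start(now=0)
w1.send_heartbeat(now=50)  # worker-2 stays silent
dead = master.dead_workers(now=100)
print(dead)
```

Only worker-2 is reported, since its last message is more than 60 seconds old at time 100.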
3. Driver Program
Every Spark application contains a driver program, which is responsible for launching parallel operations on the cluster. The driver program contains the application's main() function. In the WordCount example, spark-shell is our driver program, so we can type in whatever operations we want and it takes care of launching them. The driver program accesses Spark through a SparkContext object, which represents a connection to the Spark cluster.
4. Executor
Once the SparkContext object has successfully connected to the cluster manager, it can acquire executors on the nodes in the cluster. An executor is a process (process name: ExecutorBackend, running on a Worker node) that performs computations and stores intermediate data for the application.
Spark sends the application code (for example, a jar package) to each executor. Finally, the SparkContext object sends tasks to the executors to run.
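
The driver/executor split can be illustrated with a pure-Python word count. This is a sketch under stated assumptions, not Spark code: a thread pool stands in for the executors (real executors are separate processes on Worker nodes), and the helper names `count_words`, `split_into_partitions` and `main` are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(lines):
    """Task body: count the words in one partition of lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def split_into_partitions(items, num_partitions):
    """Split items round-robin into num_partitions lists."""
    return [items[i::num_partitions] for i in range(num_partitions)]


def main():
    """Driver side: split the input, send one task per partition, merge results."""
    lines = ["spark master worker", "driver executor spark", "spark rdd"]
    partitions = split_into_partitions(lines, num_partitions=2)
    # Assumption of this sketch: the thread pool plays the role of the executors.
    with ThreadPoolExecutor(max_workers=2) as executors:
        partial_counts = list(executors.map(count_words, partitions))
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


word_counts = main()
print(word_counts["spark"])
```

The driver never counts words itself; it only splits the input, hands out tasks, and merges the partial results.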
[Figure: the relationship between Master, Worker, Driver Program, and Executor]
5. RDD (Resilient Distributed Dataset)
An RDD is Spark's core data abstraction; it is essentially a distributed collection of elements. In Spark, all work with data comes down to creating RDDs, transforming existing RDDs, and calling actions on RDDs to compute a result. Behind the scenes, Spark automatically distributes the data in an RDD across the cluster and parallelizes the operations on it.
An RDD in Spark is an immutable distributed collection of objects, so every transformation of an RDD produces a new RDD.
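
The immutability described above can be illustrated with a toy class. `ToyRDD` is a hypothetical stand-in written for this example, not Spark's RDD: it runs on a single machine, supports only `map`, `filter`, `collect` and `count`, and models partitions as plain tuples.

```python
class ToyRDD:
    """Hypothetical stand-in for an RDD: immutable and lazily evaluated."""

    def __init__(self, partitions, ops=()):
        self._partitions = tuple(tuple(p) for p in partitions)
        self._ops = tuple(ops)  # recorded transformations, not yet executed

    def map(self, fn):
        # Transformation: returns a new ToyRDD and leaves self unchanged.
        return ToyRDD(self._partitions, self._ops + (("map", fn),))

    def filter(self, pred):
        # Transformation: returns a new ToyRDD and leaves self unchanged.
        return ToyRDD(self._partitions, self._ops + (("filter", pred),))

    def _compute(self, partition):
        items = list(partition)
        for kind, fn in self._ops:
            if kind == "map":
                items = [fn(x) for x in items]
            else:
                items = [x for x in items if fn(x)]
        return items

    def collect(self):
        # Action: runs the recorded transformations on every partition.
        return [x for p in self._partitions for x in self._compute(p)]

    def count(self):
        # Action: number of elements after all transformations.
        return len(self.collect())


numbers = ToyRDD([[1, 2, 3], [4, 5, 6]])
squares = numbers.map(lambda x: x * x)
even_squares = squares.filter(lambda x: x % 2 == 0)
print(even_squares.collect())
print(numbers.collect())
```

Each transformation returns a separate object, so `numbers` still yields its original elements after `squares` and `even_squares` have been derived from it.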
6. Glossary of technical terms
Term | Meaning |
---|---|
Application | A user program built on Spark, consisting of a driver program and multiple executors on the cluster. |
Application jar | A jar package containing the user's Spark application. |
Driver program | The process that runs the application's main() function and creates the SparkContext. |
Cluster manager | An external service for acquiring cluster resources, such as the standalone manager, Mesos, or YARN. |
Deploy mode | Distinguishes where the driver process runs. In "cluster" mode, the framework launches the driver inside the cluster. In "client" mode, the submitter launches the driver outside the cluster. |
Worker node | Any node in the cluster that can run application code. |
Executor | A process launched for an application on a worker node. It runs tasks and keeps data in memory or on disk. Each application has its own executors. |
Task | A unit of work sent to one executor. |
Job | A parallel computation consisting of multiple tasks, spawned by a Spark action (for example, save or collect). |
Stage | Each job is divided into multiple stages, and each stage is a set of tasks. The stages depend on one another (similar to the map and reduce stages in MapReduce). During execution, each shuffle marks the boundary of a new stage. |
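
The Job and Stage entries can be illustrated with a small sketch that cuts a linear chain of operations into stages at shuffle boundaries. It is a simplification under stated assumptions: the operation names are borrowed from Spark, but the `SHUFFLE_OPS` set and the `split_into_stages` helper are hypothetical, and the shuffle operation is simply placed at the start of the next stage.

```python
# Assumption of this sketch: which operations shuffle is hard-coded here.
SHUFFLE_OPS = {"reduceByKey", "groupByKey", "sortByKey", "repartition"}


def split_into_stages(operations):
    """Split a linear chain of operations into stages at shuffle boundaries.

    A shuffle operation opens a new stage, because it needs the complete
    output of the previous stage before it can proceed.
    """
    stages = [[]]
    for op in operations:
        if op in SHUFFLE_OPS and stages[-1]:
            stages.append([])
        stages[-1].append(op)
    return stages


job = ["textFile", "flatMap", "map", "reduceByKey", "map", "sortByKey", "collect"]
stages = split_into_stages(job)
for number, stage in enumerate(stages):
    print(number, stage)
```

In this example the job triggered by `collect` has two shuffles and therefore three stages.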