
Flink Deployment Modes and Runtime Architecture (Session Mode, Per-Job Mode, Application Mode, JobManager, TaskManager, YARN Deployment and Runtime Architecture)

2022-06-11 12:10:00 But don't ask about your future


1. Deployment Modes (Abstract Concepts)

1.1 Session Mode

  In session mode you first start a cluster and keep a session open; jobs are then submitted to this session through the client. All resources are determined when the cluster starts, so every submitted job competes for the resources of the cluster.
  The life cycle of the cluster outlives the jobs: when a job finishes its resources are released, but the cluster keeps running. Because resources are shared, submitting a new job fails if there are not enough resources left. In addition, many jobs may be running on the same TaskManager, so if one of them brings the TaskManager down, all jobs on it are affected.
  Session mode is suitable for large numbers of small jobs with short execution times.
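
A minimal sketch of the session-mode workflow on a standalone installation (the job classes and jar paths are illustrative, not from this article):

# Start a long-running session cluster
./bin/start-cluster.sh

# Submit several jobs to the same cluster; they share its task slots
./bin/flink run -c com.example.JobA jobs/job-a.jar
./bin/flink run -c com.example.JobB jobs/job-b.jar

# The cluster keeps running after the jobs finish; stop it explicitly
./bin/stop-cluster.sh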


1.2 Per-Job Mode

  Per-job mode is strictly one job per cluster: the cluster exists only for that job. The client runs the application, which starts a cluster; the job is submitted to the JobManager and then distributed to the TaskManagers for execution. When the job finishes, the cluster is shut down and all resources are released. Each job is managed by its own JobManager and uses dedicated resources, so even if a failure brings down a TaskManager, other jobs are not affected.
  Per-job mode is more stable and is therefore the preferred mode in production.

Note:
  Flink cannot run per-job mode on its own; an external resource management framework such as YARN is needed to start the cluster (a brief example follows; see section 4.3 for details).
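
For reference, a per-job submission on YARN looks roughly like this (the main class and jar are placeholders):

./bin/flink run -d -t yarn-per-job -c com.example.MyJob jobs/my-job.jar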


1.3 Application Mode

  In session and per-job modes, the application code executes on the client, which then submits the job to the JobManager. The client has to download dependencies and send binary data to the JobManager, which takes a lot of network bandwidth, and if the same client submits many jobs, the resource consumption on the client node grows.
  The solution is to submit the application directly to the JobManager and run it there. A separate JobManager is launched for each submitted application, i.e. a cluster is created just for it. This JobManager exists only to execute this one application and is shut down when the application finishes.
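
On YARN this corresponds to the yarn-application target, roughly as follows (class and jar are placeholders; the full example is in section 4.4):

./bin/flink run-application -t yarn-application -c com.example.MyJob jobs/my-job.jar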


1.4 Summary

  In session mode, the life cycle of the cluster is independent of the life cycle of any job running on it, and all submitted jobs share the cluster's resources. Per-job mode creates a cluster for every submitted job, which gives better resource isolation; the cluster's life cycle is then bound to the job's. Finally, application mode creates a cluster per application and calls the application's main() method directly on the JobManager.


2. System Architecture

2.1 Overall Architecture

  The two most important components in Flink's runtime architecture are the JobManager and the TaskManager. For a submitted job, the JobManager is the actual "manager" (Master): it is responsible for management and scheduling, so without considering high availability there is only one. The TaskManagers are the "workers" (Worker/Slave): they execute tasks and process the data, so there can be one or more of them.

Note:

attached mode (default): The yarn-session.sh client submits the Flink cluster to YARN, but the client keeps running, tracking the state of the cluster. If the cluster fails, the client will show the error. If the client gets terminated, it will signal the cluster to shut down as well.

detached mode (-d or --detached): The yarn-session.sh client submits the Flink cluster to YARN, then the client returns. Another invocation of the client, or YARN tools is needed to stop the Flink cluster.

  The client is not part of the processing system; it is only responsible for submitting jobs. Concretely, it calls the program's main() method, converts the code into a dataflow graph (Dataflow Graph), finally generates the JobGraph, and sends it to the JobManager. After submission, execution of the job no longer depends on the client: you can either disconnect from the JobManager or stay connected. Adding the -d parameter when submitting a job selects detached mode, i.e. the client disconnects.
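
As an illustration (class and jar are placeholders), attached and detached submission differ only in the -d flag:

# Attached: the client stays connected and tracks the job until it finishes
./bin/flink run -c com.example.MyJob jobs/my-job.jar

# Detached: the client returns right after submission
./bin/flink run -d -c com.example.MyJob jobs/my-job.jar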


2.1.1 JobManager

  The JobManager contains three different components:

1. JobMaster
  The JobMaster is the core component of the JobManager and is responsible for handling a single job (Job). A JobMaster therefore corresponds one-to-one to a specific Job; multiple Jobs can run in one Flink cluster at the same time, each with its own JobMaster. When a job is submitted, the JobMaster first receives the application to be executed, including the jar package, the dataflow graph (dataflow graph) and the JobGraph. The JobMaster converts the JobGraph into a physical-level dataflow graph called the "execution graph" (ExecutionGraph), which contains all tasks that can be executed concurrently. The JobMaster then asks the ResourceManager for the resources needed to execute the job. Once it has obtained enough resources, it distributes the execution graph to the TaskManagers that actually run it. While the job is running, the JobMaster is also responsible for everything that needs central coordination, such as coordinating checkpoints (checkpoints).

2. ResourceManager
  The ResourceManager is mainly responsible for resource allocation and management; there is only one per Flink cluster. The "resources" here are primarily the task slots (task slots) of the TaskManagers. A task slot is the unit of resource scheduling in a Flink cluster and bundles a share of the CPU and memory a machine uses for computation. Each task (Task) must be assigned to a slot to be executed. Note that Flink's built-in ResourceManager must not be confused with the ResourceManager of other resource management platforms such as YARN; Flink's ResourceManager has different concrete implementations for different environments and platforms. In a standalone deployment the TaskManagers are started separately (there is no per-job mode), so the ResourceManager can only hand out the slots of existing TaskManagers and cannot start new TaskManagers by itself. When a resource management platform is present, this restriction does not apply. When a new job requests resources, the ResourceManager assigns a TaskManager with free slots to the JobMaster. If the ResourceManager does not have enough task slots, it can also ask the resource provider platform to start containers with new TaskManager processes. In addition, the ResourceManager stops idle TaskManagers to free computing resources.

3. Dispatcher
  The Dispatcher mainly provides a REST interface for submitting applications and starts a new JobMaster component for each newly submitted job. The Dispatcher also starts a Web UI for conveniently displaying and monitoring job execution. The Dispatcher is not required by the architecture and may be bypassed in some deployment modes.


2.1.2 TaskManager

  TaskManagers are Flink's worker processes; the actual computation on the data streams is done by them, which is why they are also called "Workers". A Flink cluster must contain at least one TaskManager; in practice, because computation is distributed, there are usually several, and each TaskManager contains a certain number of task slots (task slots). A slot is the smallest unit of resource scheduling, and the number of slots limits how many tasks a TaskManager can process in parallel. After starting, a TaskManager registers its slots with the ResourceManager; when instructed by the ResourceManager, it offers one or more slots to the JobMaster, which can then assign tasks to them. During execution a TaskManager can buffer data and exchange data with other TaskManagers running the same application.
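
The number of slots per TaskManager is set with the taskmanager.numberOfTaskSlots option in conf/flink-conf.yaml; in a standalone cluster, workers can also be added or removed at runtime with the scripts below (a sketch, assuming a standalone installation):

# Add one more TaskManager process to a running standalone cluster
./bin/taskmanager.sh start

# Stop a TaskManager when it is no longer needed
./bin/taskmanager.sh stop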

2.2 High-Level View of Job Submission


(1) In general, the client (App) submits the job to the JobManager through the REST interface provided by the Dispatcher.
(2) The Dispatcher starts a JobMaster and hands the job (including the JobGraph) over to it.
(3) The JobMaster parses the JobGraph into an executable ExecutionGraph, determines the amount of resources needed, and requests the resources (slots) from the ResourceManager.
(4) The ResourceManager checks whether enough resources are available; if not, it starts new TaskManagers.
(5) After starting, the TaskManagers register their available task slots (slots) with the ResourceManager.
(6) The ResourceManager tells the TaskManagers to offer slots for the new job.
(7) The TaskManagers connect to the corresponding JobMaster and offer their slots.
(8) The JobMaster distributes the tasks to be executed to the TaskManagers.
(9) The TaskManagers execute the tasks and can exchange data with each other.


3. Standalone Mode

3.1 Concept

  Standalone mode is the most basic and simplest way to deploy Flink: all required Flink components are just JVM processes running directly on the operating system.
  Standalone mode does not depend on any external resource management platform. If resources are insufficient or something goes wrong, there is no automatic scaling or resource reallocation; everything has to be handled manually. Standalone mode is therefore generally used only for development, testing, or scenarios with very few jobs.

3.2 Session mode deployment, per-job mode deployment (not supported), application mode deployment

Flink 1.15 standalone deployment
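
A brief sketch of the two supported standalone variants (the class name is a placeholder): session mode uses the regular cluster scripts, while application mode runs the job's main class inside a dedicated JobManager process (one common way is to put the application jar into the lib/ directory first so it is on the classpath):

# Session mode: long-running cluster, jobs submitted afterwards with bin/flink run
./bin/start-cluster.sh

# Application mode: a JobManager dedicated to one application, plus a TaskManager
./bin/standalone-job.sh start --job-classname com.example.MyJob
./bin/taskmanager.sh start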


4. YARN Deployment and Runtime Architecture

4.1 Concept

  Deployment on YARN works as follows: the client submits the Flink application to YARN's ResourceManager, and YARN's ResourceManager requests containers from YARN's NodeManagers. In these containers Flink deploys JobManager and TaskManager instances, thereby starting the cluster. Flink dynamically allocates TaskManager resources according to the number of slots required by the jobs running on the JobManager.
YARN deployment modes

  Check the Hadoop environment:

echo $HADOOP_CLASSPATH
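
If the variable is empty, it is commonly set from the Hadoop installation before starting Flink on YARN (adjust to your environment):

export HADOOP_CLASSPATH=`hadoop classpath`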

4.2 Session Mode

4.2.1 Deployment

Start the cluster:
(1) Start the Hadoop cluster (HDFS, YARN).
(2) Run the script command to request resources from the YARN cluster and start a YARN session, which starts the Flink cluster:

bin/yarn-session.sh -d -nm cz -qu hello

Available parameters:

-d: detached mode; the Flink YARN client runs in the background
-jm (--jobManagerMemory): memory for the JobManager, in MB by default
-nm (--name): the application name shown in the YARN UI
-qu (--queue): the YARN queue name
-tm (--taskManager): memory for each TaskManager

Note:
  Since Flink 1.11.0, the -n and -s parameters are no longer used to specify the number of TaskManagers and the number of slots; YARN allocates TaskManagers and slots dynamically on demand.

After the YARN session starts, a Web UI address and a YARN application ID are printed.

For the YARN-mode Flink cluster started above:
Web UI: http://hadoop104:42947
Stop the cluster: echo "stop" | ./bin/yarn-session.sh -id application_1654053140138_0001
Force stop: yarn application -kill application_1654053140138_0001

Submit a job:
(1) Submit the job through the Web UI

(2) Submit the job through the command line

 bin/flink run [OPTIONS] <jar-file> <arguments>
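
For example (the class and jar are placeholders): while the YARN session started above is running, bin/flink run picks it up and submits the job to that session:

bin/flink run -c com.example.StreamWordCount jobs/my-job.jar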

4.2.2 Runtime Architecture


(1) The client submits the job to the Dispatcher through the REST interface.
(2) The Dispatcher starts a JobMaster and hands the job (including the JobGraph) over to it.
(3) The JobMaster requests resources (slots) from the ResourceManager.
(4) The ResourceManager requests container resources from YARN's resource manager.
(5) YARN starts new TaskManager containers.
(6) After starting, the TaskManagers register their available task slots with Flink's ResourceManager.
(7) The ResourceManager tells the TaskManagers to offer slots for the new job.
(8) The TaskManagers connect to the corresponding JobMaster and offer their slots.
(9) The JobMaster distributes the tasks to be executed to the TaskManagers, which execute them.


4.3 Per-Job Mode

4.3.1 Deployment

  In a YARN environment an external platform does the resource scheduling, so a single job can be submitted directly to YARN, which starts a Flink cluster just for it.
(1) Run the command to submit the job:

bin/flink run -d -t yarn-per-job -c review.part2.StreamWordCount libexec/FlinkTutorial-1.0-SNAPSHOT-jar-with-dependencies.jar

(2) Check the execution status in the YARN ResourceManager web UI

Click on Tracking URL
(3) View or cancel jobs from the command line:

bin/flink list -t yarn-per-job -Dyarn.application.id=application_XXXX_YY


bin/flink cancel -t yarn-per-job -Dyarn.application.id=application_XXXX_YY   <jobId>


4.3.2 Runtime Architecture


(1) The client submits the job to YARN's resource manager; in this step Flink's jars and configuration are also uploaded to HDFS so that the containers for the Flink components can be started later.
(2) YARN's resource manager allocates container resources, starts the Flink JobManager, and submits the job to the JobMaster (the Dispatcher is omitted here).
(3) The JobMaster requests resources (slots) from the ResourceManager.
(4) The ResourceManager requests container resources from YARN's resource manager.
(5) YARN starts new TaskManager containers.
(6) After starting, the TaskManagers register their available task slots with Flink's ResourceManager.
(7) The ResourceManager tells the TaskManagers to offer slots for the new job.
(8) The TaskManagers connect to the corresponding JobMaster and offer their slots.
(9) The JobMaster distributes the tasks to be executed to the TaskManagers, which execute them.


4.4 Application Mode

4.4.1 Deployment

(1) Run the command to submit the job:

bin/flink run-application -t yarn-application -c review.part2.StreamWordCount libexec/FlinkTutorial-1.0-SNAPSHOT-jar-with-dependencies.jar

(2) View or cancel jobs from the command line:

bin/flink list -t yarn-application -Dyarn.application.id=application_XXXX_YY


bin/flink cancel -t yarn-application -Dyarn.application.id=application_XXXX_YY <jobId>


(3) You can also specify a remote location with the yarn.provided.lib.dirs configuration option and upload the jar there in advance.
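
A sketch of what such a submission might look like (the HDFS paths are illustrative):

bin/flink run-application -t yarn-application \
  -Dyarn.provided.lib.dirs="hdfs://myhdfs/flink-dist" \
  -c com.example.MyJob \
  hdfs://myhdfs/jobs/my-job.jar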

4.4.2 Runtime Architecture

  The submission process in application mode is very similar to per-job mode; the difference is that what is initially submitted to YARN's resource manager is no longer a single job but the whole application. An application can contain several jobs, and each of them starts a corresponding JobMaster in the Flink cluster.


4.5 High availability

  In standalone mode, several JobManagers are started at the same time; one becomes the leader and the others are standbys, and when the leader fails one of the others takes over as leader. With YARN, high availability starts only one JobManager: when it fails, YARN starts another one, so high availability is in effect achieved through YARN's application retry count.

(1) Configure in yarn-site.xml:

<property>
 <name>yarn.resourcemanager.am.max-attempts</name>
 <value>4</value>
 <description>
 The maximum number of application master execution attempts.
 </description>
</property>

(2) Configure in flink-conf.yaml:

yarn.application-attempts: 3
high-availability: zookeeper
high-availability.storageDir: <HDFS directory>
high-availability.zookeeper.quorum: hadoop102:2181,hadoop103:2181,hadoop104:2181
high-availability.zookeeper.path.root: /flink-yarn

(3) Start a yarn-session.
(4) Kill the JobManager and check that it is restarted.
Note:
The value configured in yarn-site.xml is the maximum number of JobManager (ApplicationMaster) restart attempts; the value in flink-conf.yaml should be smaller than it (3 < 4 in the example above).


