当前位置:网站首页>Flink Yarn Per Job - RM启动SlotManager
Flink Yarn Per Job - RM启动SlotManager
2022-08-05 11:05:00 【hyunbar】

ResourceManager
public final void onStart() throws Exception {
try {
startResourceManagerServices();
} catch (Throwable t) {
final ResourceManagerException exception = new ResourceManagerException(String.format("Could not start the ResourceManager %s", getAddress()), t);
onFatalError(exception);
throw exception;
}
}
private void startResourceManagerServices() throws Exception {
try {
leaderElectionService = highAvailabilityServices.getResourceManagerLeaderElectionService();
// 创建了Yarn的RM和NM的客户端,初始化并启动
initialize();
// 通过选举服务,启动ResourceManager
leaderElectionService.start(this);
jobLeaderIdService.start(new JobLeaderIdActionsImpl());
registerTaskExecutorMetrics();
} catch (Exception e) {
handleStartResourceManagerServicesException(e);
}
}
创建了Yarn的RM和NM的客户端,初始化并启动
通过选举服务,启动ResourceManager
创建了Yarn的RM和NM的客户端
ActiveResourceManager
@Override
protected void initialize() throws ResourceManagerException {
try {
resourceManagerDriver.initialize(
this,
new GatewayMainThreadExecutor(),
ioExecutor);
} catch (Exception e) {
throw new ResourceManagerException("Cannot initialize resource provider.", e);
}
}
AbstractResourceManagerDriver
@Override
public final void initialize(
ResourceEventHandler<WorkerType> resourceEventHandler,
ScheduledExecutor mainThreadExecutor,
Executor ioExecutor) throws Exception {
this.resourceEventHandler = Preconditions.checkNotNull(resourceEventHandler);
this.mainThreadExecutor = Preconditions.checkNotNull(mainThreadExecutor);
this.ioExecutor = Preconditions.checkNotNull(ioExecutor);
// 下追
initializeInternal();
}
protected abstract void initializeInternal() throws Exception;
YarnResourceManagerDriver
@Override
protected void initializeInternal() throws Exception {
final YarnContainerEventHandler yarnContainerEventHandler = new YarnContainerEventHandler();
try {
// 创建Yarn的ResourceManager的客户端,并且初始化和启动
resourceManagerClient = yarnResourceManagerClientFactory.createResourceManagerClient(
yarnHeartbeatIntervalMillis,
yarnContainerEventHandler);
resourceManagerClient.init(yarnConfig);
resourceManagerClient.start();
final RegisterApplicationMasterResponse registerApplicationMasterResponse = registerApplicationMaster();
getContainersFromPreviousAttempts(registerApplicationMasterResponse);
taskExecutorProcessSpecContainerResourcePriorityAdapter =
new TaskExecutorProcessSpecContainerResourcePriorityAdapter(
registerApplicationMasterResponse.getMaximumResourceCapability(),
ExternalResourceUtils.getExternalResources(flinkConfig, YarnConfigOptions.EXTERNAL_RESOURCE_YARN_CONFIG_KEY_SUFFIX));
} catch (Exception e) {
throw new ResourceManagerException("Could not start resource manager client.", e);
}
// 创建yarn的 NodeManager的客户端,并且初始化和启动
nodeManagerClient = yarnNodeManagerClientFactory.createNodeManagerClient(yarnContainerEventHandler);
nodeManagerClient.init(yarnConfig);
nodeManagerClient.start();
}
创建Yarn的ResourceManager的客户端,并且初始化和启动
创建yarn的 NodeManager的客户端,并且初始化和启动
启动SlotManager
StandaloneLeaderElectionService
@Override
public void start(LeaderContender newContender) throws Exception {
if (contender != null) {
// Service was already started
throw new IllegalArgumentException("Leader election service cannot be started multiple times.");
}
contender = Preconditions.checkNotNull(newContender);
// directly grant leadership to the given contender
contender.grantLeadership(HighAvailabilityServices.DEFAULT_LEADER_ID);
}
ResourceManager
public void grantLeadership(final UUID newLeaderSessionID) {
final CompletableFuture<Boolean> acceptLeadershipFuture = clearStateFuture
// 下追
.thenComposeAsync((ignored) -> tryAcceptLeadership(newLeaderSessionID), getUnfencedMainThreadExecutor());
... ...
}
private CompletableFuture<Boolean> tryAcceptLeadership(final UUID newLeaderSessionID) {
if (leaderElectionService.hasLeadership(newLeaderSessionID)) {
... ...
startServicesOnLeadership();
return prepareLeadershipAsync().thenApply(ignored -> true);
} else {
return CompletableFuture.completedFuture(false);
}
}
private void startServicesOnLeadership() { // 启动心跳服务:TaskManager、JobMaster startHeartbeatServices(); // 启动slotManager slotManager.start(getFencingToken(), getMainThreadExecutor(), new ResourceActionsImpl()); onLeadership(); } |
启动心跳服务:TaskManager、JobMaster
启动slotManager
private void startHeartbeatServices() {
taskManagerHeartbeatManager = heartbeatServices.createHeartbeatManagerSender(
resourceId,
new TaskManagerHeartbeatListener(),
getMainThreadExecutor(),
log);
jobManagerHeartbeatManager = heartbeatServices.createHeartbeatManagerSender(
resourceId,
new JobManagerHeartbeatListener(),
getMainThreadExecutor(),
log);
}
作为资源的老大,肯定要跟task小弟和job去通信
SlotManagerImpl
public void start(ResourceManagerId newResourceManagerId, Executor newMainThreadExecutor, ResourceActions newResourceActions) {
LOG.info("Starting the SlotManager.");
this.resourceManagerId = Preconditions.checkNotNull(newResourceManagerId);
mainThreadExecutor = Preconditions.checkNotNull(newMainThreadExecutor);
resourceActions = Preconditions.checkNotNull(newResourceActions);
started = true;
taskManagerTimeoutsAndRedundancyCheck = scheduledExecutor.scheduleWithFixedDelay(
() -> mainThreadExecutor.execute(
// 检查超时和多余的TaskManager
() -> checkTaskManagerTimeoutsAndRedundancy()),
0L,
taskManagerTimeout.toMilliseconds(),
TimeUnit.MILLISECONDS);
slotRequestTimeoutCheck = scheduledExecutor.scheduleWithFixedDelay(
() -> mainThreadExecutor.execute(
() -> checkSlotRequestTimeouts()),
0L,
slotRequestTimeout.toMilliseconds(),
TimeUnit.MILLISECONDS);
registerSlotManagerMetrics();
}
检查超时和多余的TaskManager
void checkTaskManagerTimeoutsAndRedundancy() {
if (!taskManagerRegistrations.isEmpty()) {
long currentTime = System.currentTimeMillis();
ArrayList<TaskManagerRegistration> timedOutTaskManagers = new ArrayList<>(taskManagerRegistrations.size());
// first retrieve the timed out TaskManagers
for (TaskManagerRegistration taskManagerRegistration : taskManagerRegistrations.values()) {
if (currentTime - taskManagerRegistration.getIdleSince() >= taskManagerTimeout.toMilliseconds()) {
// we collect the instance ids first in order to avoid concurrent modifications by the
// ResourceActions.releaseResource call
timedOutTaskManagers.add(taskManagerRegistration);
}
}
int slotsDiff = redundantTaskManagerNum * numSlotsPerWorker - freeSlots.size();
if (freeSlots.size() == slots.size()) {
// No need to keep redundant taskManagers if no job is running.
// 如果没有job在运行,释放taskmanager
releaseTaskExecutors(timedOutTaskManagers, timedOutTaskManagers.size());
} else if (slotsDiff > 0) {
// Keep enough redundant taskManagers from time to time.
// 保证随时有足够额taskmanager
int requiredTaskManagers = MathUtils.divideRoundUp(slotsDiff, numSlotsPerWorker);
allocateRedundantTaskManagers(requiredTaskManagers);
} else {
// second we trigger the release resource callback which can decide upon the resource release
int maxReleaseNum = (-slotsDiff) / numSlotsPerWorker;
releaseTaskExecutors(timedOutTaskManagers, Math.min(maxReleaseNum, timedOutTaskManagers.size()));
}
}
}

边栏推荐
- .NET深入解析LINQ框架(六:LINQ执行表达式)
- 硅谷来信:快速行动,Facebook、Quora等成功的“神器”!
- 机器学习——集成学习
- Latex如何控制表格的宽度和高度
- shell编程流程控制练习
- 苹果Meta都在冲的Pancake技术,中国VR团队YVR竟抢先交出产品答卷
- PCB layout must know: teach you to correctly lay out the circuit board of the op amp
- Header file search rules when compiling with GCC
- 用KUSTO查询语句(KQL)在Azure Data Explorer Database上查询LOG实战
- hdu2097 nyoj414 sky数 (进制转换)
猜你喜欢

Login function and logout function (St. Regis Takeaway)

PostgreSQL 2022 Report: Rising popularity, open source, reliability and scaling key

硅谷来信:快速行动,Facebook、Quora等成功的“神器”!

Opencv图像缩放和平移

Support Vector Machine SVM

CenOS MySQL入门及安装

Mathcad 15.0软件安装包下载及安装教程

今天告诉你界面控件DevExpress WinForms为何弃用经典视觉样式

软件测试之集成测试

手把手教你定位线上MySQL慢查询问题,包教包会
随机推荐
如何修改管理工具client_encoding
大佬们 我是新手,我根据文档用flinksql 写个简单的用户访问量的count 但是执行一次就结束
软件测试之集成测试
poj2935 Basic Wall Maze (2016xynu暑期集训检测 -----D题)
The fuse: OAuth 2.0 four authorized login methods must read
【深度学习】mmclassification mmcls 实战多标签分类任务教程,分类任务
PostgreSQL 2022 报告:流行度上涨,开源、可靠性和扩展是关键
L2-042 老板的作息表
MySQL 中 auto_increment 自动插入主键值
金融业“限薪令”出台/ 软银出售过半阿里持仓/ DeepMind新实验室成立... 今日更多新鲜事在此...
学生信息管理系统(第一次.....)
hdu4545 Magic String
反射修改jsessionid实现Session共享
hdu2097 nyoj414 sky数 (进制转换)
拓朴排序例题
API 网关简述
ECCV 2022 | 视听分割:全新任务,助力视听场景像素级精细化理解
GPU-CUDA-图形渲染分析
STM32 entry development: write XPT2046 resistive touch screen driver (analog SPI)
Go学习笔记(篇二)初识Go