Restoring application records after a Yarn restart
2022-07-01 13:00:00 [fanxl12]
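Edit the yarn-site.xml configuration file
ResourceManager restart recovery
In yarn-site.xml, set yarn.resourcemanager.recovery.enabled to true (the default is false):
<property>
  <name>yarn.resourcemanager.recovery.enabled</name>
  <value>true</value>
</property>
Next, set yarn.resourcemanager.store.class. This parameter selects the storage medium on which the RM persists its state so that the state can be reloaded after a restart. Three stores are currently available:
org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore
The default. State is kept on a file system (local storage or HDFS). Use yarn.resourcemanager.fs.state-store.uri to choose the storage path, and set it explicitly when you pick this store; if it is left unset, the RM falls back to a path under ${hadoop.tmp.dir}.
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore
ZooKeeper-based storage. This is the store to use when RM high availability is enabled: in an HA setup two RMs may both believe they are the active one, which leads to split-brain, and the ZK-based store is the only one that can fence the state data to prevent it. The related parameters are hadoop.zk.address (the list of ZK node addresses) and yarn.resourcemanager.zk-state-store.parent-path (the root znode path for the state data).
org.apache.hadoop.yarn.server.resourcemanager.recovery.LeveldbRMStateStore
LevelDB-based storage. It is lighter weight than the other two, uses far less storage space and I/O, and supports better atomic operations. Choose it when performance is the overriding concern. Use yarn.resourcemanager.leveldb-state-store.path to set the storage path.
This walkthrough uses the file-system store:
<property>
  <name>yarn.resourcemanager.store.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore</value>
</property>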
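If the RM runs in high-availability mode, the ZooKeeper-based store described above would be selected instead. The fragment below is a minimal sketch of what that might look like in yarn-site.xml. The ZooKeeper hostnames (zk1, zk2, zk3), the port 2181, and the /rmstore parent path are placeholders chosen for illustration and do not come from the original setup. Some older Hadoop releases name the address property yarn.resourcemanager.zk-address rather than hadoop.zk.address, so check the documentation for your version.

```xml
<!-- Sketch only: select the ZooKeeper-based state store instead of the file-system store -->
<property>
  <name>yarn.resourcemanager.store.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<!-- Placeholder host:port list; replace with your own ZooKeeper ensemble -->
<property>
  <name>hadoop.zk.address</name>
  <value>zk1:2181,zk2:2181,zk3:2181</value>
</property>
<!-- Root znode under which the RM state is stored (assumed value) -->
<property>
  <name>yarn.resourcemanager.zk-state-store.parent-path</name>
  <value>/rmstore</value>
</property>
```

With this store, the yarn.resourcemanager.fs.state-store.uri setting is not used.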
Configure yarn.resourcemanager.fs.state-store.uri. This setting applies when yarn.resourcemanager.store.class is org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore. Here the state is stored on HDFS:
<property>
  <name>yarn.resourcemanager.fs.state-store.uri</name>
  <value>hdfs://hadoop-master:9010/rmstore</value>
</property>
Finally, configure yarn.resourcemanager.work-preserving-recovery.scheduling-wait-ms. It is the time the RM waits after a restart for the NMs to resynchronize their Container information; only after this interval does the RM allocate new Containers. The default is 10000 (10 seconds) and normally does not need to be changed.
<property>
  <name>yarn.resourcemanager.work-preserving-recovery.scheduling-wait-ms</name>
  <value>10000</value>
</property>
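Configuring automatic recovery on NodeManager restart
In yarn-site.xml, set yarn.nodemanager.recovery.enabled to true (the default is false):
<property>
  <name>yarn.nodemanager.recovery.enabled</name>
  <value>true</value>
</property>
Set yarn.nodemanager.recovery.dir, the local directory to which the NM writes Container state so that it can be read back after a restart. The default is ${hadoop.tmp.dir}/yarn-nm-recovery.
<property>
  <name>yarn.nodemanager.recovery.dir</name>
  <value>/opt/topology/db_data/hadoop-data/yarn-nm-recovery</value>
</property>
Set yarn.nodemanager.address, the RPC address of the NM. The default is ${yarn.nodemanager.hostname}:0, which means a random ephemeral port. It must be pinned to a fixed port (8041, for example; the configuration below uses 45454). Otherwise the NM comes back on a different port after a restart and the Container state cannot be recovered.
<property>
  <name>yarn.nodemanager.address</name>
  <value>hadoop-master:45454</value>
</property>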