Handling a crashed Proxmox cluster node
2022-06-29 19:57:00 【全栈程序员站长】
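Problem description
A physical node was added to an existing cluster, and a Ceph monitor and several OSDs were then created on it. Running ceph osd tree on the host showed that the newly created OSDs were all healthy (up), and the Proxmox management interface showed the same.
Then, for no apparent reason, the newly added node suddenly dropped out of the cluster.
Checking the OSD status on the host again showed that the OSDs had gone from up to down by themselves. The new node held no data yet, so I tried rebooting it to see whether it would recover. After the reboot the network was reachable, but SSH connections failed and the web management interface could not be accessed. The next step is therefore to remove the failed node from the cluster first, and to rejoin it once it has been restored.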
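Removing the failed node from the cluster
The work is done in two steps, in this order: remove the failed Ceph components from the cluster, then remove the physical node from the cluster.
Removing the failed Ceph components from the cluster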
1. Log in to any healthy physical node in the cluster and run the following command to check the Ceph OSD status:
root@<node>:~# ceph osd tree
ID  CLASS  WEIGHT    TYPE NAME        STATUS  REWEIGHT  PRI-AFF
-1         18.00357  root default
-3          4.91006      host pve48
 0  hdd     1.63669          osd.0    up      1.00000   1.00000
 1  hdd     1.63669          osd.1    up      1.00000   1.00000
 2  hdd     1.63669          osd.2    up      1.00000   1.00000
-5          4.91006      host pve49
 3  hdd     1.63669          osd.3    up      1.00000   1.00000
 4  hdd     1.63669          osd.4    up      1.00000   1.00000
 5  hdd     1.63669          osd.5    up      1.00000   1.00000
-7          4.91006      host pve50
 6  hdd     1.63669          osd.6    up      1.00000   1.00000
 7  hdd     1.63669          osd.7    up      1.00000   1.00000
 8  hdd     1.63669          osd.8    up      1.00000   1.00000
-9          3.27338      host pve51
 9  hdd     1.63669          osd.9    down          0   1.00000
10  hdd     1.63669          osd.10   down          0   1.00000
The output shows that the two OSDs on physical node pve51 (osd.9 and osd.10) are faulty and need to be removed.
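With more OSDs than in this small cluster, picking the down entries out by eye is error-prone. The following is a minimal sketch of a filter for them, assuming the column layout shown in the output above (name in the fourth field, status in the fifth). The function name list_down_osds is hypothetical, and an OSD without a device class would shift the columns and be missed.

```shell
# Print the names of all OSDs whose STATUS column is "down".
# Reads `ceph osd tree` text on stdin. Assumes OSD rows look like
# "ID CLASS WEIGHT NAME STATUS ...", i.e. name = field 4, status = field 5.
list_down_osds() {
    awk '$4 ~ /^osd\.[0-9]+$/ && $5 == "down" { print $4 }'
}

# Usage on a healthy node (not executed here):
#   ceph osd tree | list_down_osds
```

Fed the tree from step 1, this would print osd.9 and osd.10, one per line.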
2. Mark the faulty Ceph OSDs out:
root@<node>:~# ceph osd out osd.9
osd.9 is already out.
root@<node>:~# ceph osd out osd.10
osd.10 is already out.
Work carefully here, and do not mark a healthy OSD out by mistake.
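One way to guard against marking a healthy OSD out is to check its status before running the command. This is a sketch only: the helper name osd_is_down is hypothetical, and it assumes the same ceph osd tree column layout as in step 1.

```shell
# Succeeds (exit 0) only if the named OSD appears in the tree text on stdin
# with STATUS "down"; fails for "up" OSDs and for names that are not listed.
osd_is_down() {
    awk -v n="$1" '$4 == n && $5 == "down" { found = 1 } END { exit !found }'
}

# Usage on a healthy node (not executed here):
#   ceph osd tree | osd_is_down osd.9 && ceph osd out osd.9
```

Because of the &&, the ceph osd out command is skipped whenever the check fails.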
3. Delete the authentication entries of the OSDs that have been marked out:
root@<node>:~# ceph auth del osd.9
updated
root@<node>:~# ceph auth del osd.10
updated
4. Remove the faulty OSDs completely:
root@<node>:~# ceph osd rm 9
removed osd.9
root@<node>:~# ceph osd rm 10
removed osd.10
Note: unlike the previous commands, the last argument of this command is the bare numeric ID (9, 10), not the osd.N form.
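Steps 2 to 4 repeat the same three commands for each OSD, switching from the osd.N name to the numeric ID in the last one. The sketch below only prints the commands so they can be reviewed before anything is run. The function name osd_removal_commands is hypothetical.

```shell
# Print (do not execute) the commands from steps 2-4 for one OSD name such as
# "osd.9". "${1#osd.}" strips the "osd." prefix to get the numeric ID that
# `ceph osd rm` is given in step 4.
osd_removal_commands() {
    echo "ceph osd out $1"
    echo "ceph auth del $1"
    echo "ceph osd rm ${1#osd.}"
}

# Review the commands for both faulty OSDs.
for osd in osd.9 osd.10; do
    osd_removal_commands "$osd"
done
```

Printing first and running by hand keeps a person in the loop for a destructive operation. On Ceph Luminous and later, ceph osd purge <id> --yes-i-really-mean-it is documented as combining the CRUSH removal, the auth deletion and the OSD removal in one command. This article performs those steps separately.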
5. Check the OSD status of the cluster:
root@<node>:~# ceph osd tree
ID  CLASS  WEIGHT    TYPE NAME        STATUS  REWEIGHT  PRI-AFF
-1         18.00357  root default
-3          4.91006      host pve48
 0  hdd     1.63669          osd.0    up      1.00000   1.00000
 1  hdd     1.63669          osd.1    up      1.00000   1.00000
 2  hdd     1.63669          osd.2    up      1.00000   1.00000
-5          4.91006      host pve49
 3  hdd     1.63669          osd.3    up      1.00000   1.00000
 4  hdd     1.63669          osd.4    up      1.00000   1.00000
 5  hdd     1.63669          osd.5    up      1.00000   1.00000
-7          4.91006      host pve50
 6  hdd     1.63669          osd.6    up      1.00000   1.00000
 7  hdd     1.63669          osd.7    up      1.00000   1.00000
 8  hdd     1.63669          osd.8    up      1.00000   1.00000
-9          3.27338      host pve51
 9  hdd     1.63669          osd.9    DNE           0
10  hdd     1.63669          osd.10   DNE           0
After this operation, the status of the failed node's OSDs has changed from down to DNE (does not exist).
6. Remove the failed node's Ceph disks (OSDs) from the CRUSH map:
root@<node>:~# ceph osd crush rm osd.9
removed item id 9 name 'osd.9' from crush map
root@<node>:~# ceph osd crush rm osd.10
removed item id 10 name 'osd.10' from crush map
7. Remove the physical node from the Ceph cluster:
root@<node>:~# ceph osd crush rm pve51
removed item id -9 name 'pve51' from crush map
8. Run ceph osd tree to check the status and confirm that the failed node has been cleaned out of the Ceph cluster.
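The check in step 8 can also be scripted. A minimal sketch, assuming host rows in the ceph osd tree output have the form "ID WEIGHT host NAME" as in the listings above. The function name host_in_tree is hypothetical.

```shell
# Succeeds (exit 0) if a CRUSH host bucket with the given name is present in
# the `ceph osd tree` text read from stdin. Host rows are assumed to look like
# "-9 3.27338 host pve51", i.e. type = field 3, name = field 4.
host_in_tree() {
    awk -v h="$1" '$3 == "host" && $4 == h { found = 1 } END { exit !found }'
}

# Usage on a healthy node (not executed here):
#   if ceph osd tree | host_in_tree pve51; then
#       echo "pve51 is still in the CRUSH map"
#   else
#       echo "pve51 has been removed from the CRUSH map"
#   fi
```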
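Removing the failed physical node from the cluster
Operations on the cluster
Log in to any healthy node in the cluster and run the following command to evict the failed node:
root@<node>:~# pvecm delnode pve51
Killing node 4
Recovering the failed machine
It is best to wipe the machine completely, reinstall the system, and join the cluster again with a new IP address.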
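Before the reinstalled machine is joined again, it may be worth confirming that the old node name no longer appears in the cluster membership. The sketch below reads the output of pvecm nodes and assumes its usual "Nodeid Votes Name" layout, with the node name in the third field. The function name node_in_membership is hypothetical.

```shell
# Succeeds (exit 0) if the given node name appears in the Name column of
# `pvecm nodes` output read from stdin. Assumes rows such as
# "         1          1 pve48 (local)", i.e. name = field 3.
node_in_membership() {
    awk -v n="$1" '$3 == n { found = 1 } END { exit !found }'
}

# Usage on a healthy node (not executed here):
#   if pvecm nodes | node_in_membership pve51; then
#       echo "pve51 is still a cluster member"
#   fi
```

The reinstalled machine is then joined from its own shell with pvecm add followed by the address of an existing cluster node.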
Published by 全栈程序员栈长. Please credit the source when reposting: https://javaforall.cn/101292.html
Original link: https://javaforall.cn