ETCD Single-Node Fault Emergency Recovery
2022-08-11 07:04:00 【!Nine thought & & gentleman!】
Articles in this series
ETCD Containerized Cluster Setup
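Preface
In production, an etcd cluster may suffer either a single-node failure or a failure of the whole cluster, and each case needs its own recovery procedure. This article is an emergency recovery manual for the single-node failure case.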
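1. Overall Recovery Process
Because etcd uses the Raft consensus protocol, a cluster of n members can tolerate the failure of at most (n-1)/2 members, rounded down (a 3-node cluster tolerates 1). A single-node failure therefore leaves the cluster available, and business reads and writes are not affected.
The overall recovery process is: remove the faulty member from the cluster, stop the faulty container and delete its data, add the node back as a new member, start it with --initial-cluster-state existing, and verify that it has caught up with the rest of the cluster.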
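The tolerance rule above follows from Raft's majority requirement: a write commits only when a quorum of floor(n/2) + 1 members acknowledges it, so the cluster survives n - quorum = floor((n-1)/2) failures. The small bash sketch below just evaluates that arithmetic; the function names are illustrative and not part of etcd.

```bash
#!/usr/bin/env bash
# Raft commits a write only after a majority (quorum) of members accept it.
# quorum = floor(n/2) + 1, tolerated failures = n - quorum = floor((n-1)/2)

quorum() {
  local n=$1
  echo $(( n / 2 + 1 ))
}

fault_tolerance() {
  local n=$1
  echo $(( (n - 1) / 2 ))
}

for n in 1 3 4 5; do
  echo "members=$n quorum=$(quorum "$n") tolerated_failures=$(fault_tolerance "$n")"
done
```

Note that a 4-member cluster tolerates no more failures than a 3-member one, which is why odd cluster sizes are the usual choice.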
2. Detailed Recovery Steps
2.1 Environment
Three virtual machines were created with a local VMware Workstation:
| Node | IP | Resources | OS | etcd version | Docker version |
|---|---|---|---|---|---|
| etcd1 | 192.168.92.128 | 1 vCPU / 1 GB RAM / 20 GB disk | CentOS 7.4 | v3.5 | 1.13.1 |
| etcd2 | 192.168.92.129 | 1 vCPU / 1 GB RAM / 20 GB disk | CentOS 7.4 | v3.5 | 1.13.1 |
| etcd3 | 192.168.92.130 | 1 vCPU / 1 GB RAM / 20 GB disk | CentOS 7.4 | v3.5 | 1.13.1 |
Assume that node etcd2 has failed and its local data has been corrupted.
2.2 Remove the Faulty Member from the Cluster
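Use the member remove command to remove the faulty member. The cluster is then left with 2 members. Since the current leader is one of the healthy nodes, this does not trigger a new leader election, and the cluster keeps running normally.
Check the current cluster state (the etcd2 endpoint will report an error while the node is down), then remove etcd2 by the member ID shown in the ID column of member list:
```bash
export ETCDCTL_API=3
export ETCD_ENDPOINTS=192.168.92.128:2379,192.168.92.129:2379,192.168.92.130:2379
etcdctl --endpoints=$ETCD_ENDPOINTS --write-out=table member list
etcdctl --endpoints=$ETCD_ENDPOINTS --write-out=table endpoint status
# <MEMBER_ID> is etcd2's ID taken from the member list output above
etcdctl --endpoints=$ETCD_ENDPOINTS member remove <MEMBER_ID>
```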
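Copying the member ID by hand is error-prone. A minimal sketch, assuming etcdctl's default (simple) member list output, where each line is "ID, status, name, peer URLs, client URLs, is-learner" separated by ", ". The helper name member_id_by_name is hypothetical, and the sample IDs are made up for illustration, not taken from a real cluster.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the ID of the member whose name equals $1,
# reading `etcdctl member list` simple-format output on stdin.
member_id_by_name() {
  awk -F', ' -v name="$1" '$3 == name { print $1 }'
}

# Illustrative input only: these member IDs are invented.
sample='1a2b3c4d5e6f7081, started, etcd1, http://192.168.92.128:2380, http://192.168.92.128:2379, false
2b3c4d5e6f708192, started, etcd2, http://192.168.92.129:2380, http://192.168.92.129:2379, false
3c4d5e6f708192a3, started, etcd3, http://192.168.92.130:2380, http://192.168.92.130:2379, false'

printf '%s\n' "$sample" | member_id_by_name etcd2   # prints 2b3c4d5e6f708192

# Against a live cluster this would become:
# MEMBER_ID=$(etcdctl --endpoints="$ETCD_ENDPOINTS" member list | member_id_by_name etcd2)
# etcdctl --endpoints="$ETCD_ENDPOINTS" member remove "$MEMBER_ID"
```

Matching on the name column rather than the IP keeps the lookup stable even if the node is later re-added at a different address.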

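2.3 Delete the Faulty Node's Data
2.3.1 Stop and remove the faulty container
```bash
docker stop etcd2
# Remove the stopped container so the name etcd2 can be reused by docker run later
docker rm etcd2
```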
2.3.2 Delete the data
Because the data directory is mounted with -v /data/etcd:/data/etcd, deleting the contents of the host directory wipes the etcd data:
```bash
rm -rf /data/etcd/*
```
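If you would rather keep the corrupted data around for later inspection, one cautious alternative to the rm -rf above is to move the directory's contents into a timestamped backup next to it. This is a sketch, not the article's procedure: the function name clear_etcd_data_dir and the ".bak.<timestamp>" naming are assumptions, and it relies on GNU find and mv (as shipped on CentOS). Run it only after the container has been stopped.

```bash
#!/usr/bin/env bash
# Hypothetical helper: empty an etcd data dir by moving its contents
# into <dir>.bak.<timestamp> instead of deleting them. Prints the backup path.
clear_etcd_data_dir() {
  local dir=$1
  # Refuse empty, root, or non-existent targets.
  if [ -z "$dir" ] || [ "$dir" = "/" ] || [ ! -d "$dir" ]; then
    echo "refusing to clear '$dir'" >&2
    return 1
  fi
  local backup
  backup="${dir%/}.bak.$(date +%Y%m%d%H%M%S)"
  mkdir -p "$backup"
  # Move every entry (including dotfiles) out of the data dir (GNU mv -t).
  find "$dir" -mindepth 1 -maxdepth 1 -exec mv -t "$backup" {} +
  echo "$backup"
}

# Demo on a throwaway directory instead of the real /data/etcd.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/etcd/member/snap" "$demo/etcd/member/wal"
backup=$(clear_etcd_data_dir "$demo/etcd")
ls -A "$demo/etcd"   # prints nothing: the data dir is now empty
ls "$backup"         # prints: member
```

On the real node the call would be clear_etcd_data_dir /data/etcd, leaving the old data in /data/etcd.bak.<timestamp>.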
2.4 Add the Node Back to the Cluster
Run the following command to add the node back as a new member. Once the node starts, data synchronization and leader election are handled automatically by the cluster.
```bash
export ETCDCTL_API=3
export ETCD_ENDPOINTS=192.168.92.128:2379,192.168.92.129:2379,192.168.92.130:2379
etcdctl --endpoints=$ETCD_ENDPOINTS member add etcd2 --peer-urls=http://192.168.92.129:2380
```
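member add also prints ETCD_* variables (name, initial cluster, advertised peer URLs, cluster state) describing how the new member should be started, and comparing them with the startup script is a cheap sanity check. The sketch below extracts one value from that output; the helper name member_add_value is hypothetical, and the sample text only approximates the etcdctl v3.5 output with made-up member and cluster IDs.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the value of KEY from `etcdctl member add`
# output on stdin, with the surrounding double quotes removed.
member_add_value() {
  awk -v k="$1" 'index($0, k "=") == 1 {
    v = substr($0, length(k) + 2)   # everything after "KEY="
    gsub(/"/, "", v)                # drop the double quotes
    print v
  }'
}

# Illustrative output only; the IDs are invented.
sample='Member 2be1eb8f84b7f63e added to cluster ef37ad9dc622a7c4

ETCD_NAME="etcd2"
ETCD_INITIAL_CLUSTER="etcd1=http://192.168.92.128:2380,etcd2=http://192.168.92.129:2380,etcd3=http://192.168.92.130:2380"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.92.129:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"'

printf '%s\n' "$sample" | member_add_value ETCD_INITIAL_CLUSTER_STATE   # prints existing
printf '%s\n' "$sample" | member_add_value ETCD_INITIAL_CLUSTER
```

Anchoring the match at the start of the line keeps ETCD_INITIAL_CLUSTER from also matching ETCD_INITIAL_CLUSTER_STATE.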

2.5 Start the Node
2.5.1 The complete startup script
```bash
#!/bin/sh
# start_etcd.sh: run on etcd2 (192.168.92.129)
name="etcd2"
host="192.168.92.129"
cluster="etcd1=http://192.168.92.128:2380,etcd2=http://192.168.92.129:2380,etcd3=http://192.168.92.130:2380"
# With --net=host the container uses the host network, so -p port mappings would be ignored
docker run -d --privileged=true -v /data/etcd:/data/etcd --name "$name" --net=host \
  quay.io/coreos/etcd:v3.5.0 /usr/local/bin/etcd --name "$name" --data-dir /data/etcd \
  --listen-client-urls "http://$host:2379" --advertise-client-urls "http://$host:2379" \
  --listen-peer-urls "http://$host:2380" --initial-advertise-peer-urls "http://$host:2380" \
  --initial-cluster "$cluster" --initial-cluster-token tkn --initial-cluster-state existing \
  --log-level info --logger zap --log-outputs stderr
```
Note: because etcd2's data has been deleted, the node must fetch its data from the other members when it starts. The --initial-cluster-state parameter therefore has to be changed from new to existing:
```bash
--initial-cluster-state existing
```
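The cluster variable in the script is the one place where a typo silently breaks the join, because it must list every member, including etcd2, as name=peer-URL pairs. A minimal sketch that builds the string from name=IP pairs; the function name is hypothetical and it assumes every member uses plain HTTP on peer port 2380, as in this environment.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build an --initial-cluster value from name=ip pairs,
# assuming http:// and peer port 2380 for every member.
build_initial_cluster() {
  local out="" pair name ip
  for pair in "$@"; do
    name=${pair%%=*}   # text before the first "="
    ip=${pair#*=}      # text after the first "="
    out+="${out:+,}${name}=http://${ip}:2380"
  done
  printf '%s\n' "$out"
}

build_initial_cluster etcd1=192.168.92.128 etcd2=192.168.92.129 etcd3=192.168.92.130
# prints etcd1=http://192.168.92.128:2380,etcd2=http://192.168.92.129:2380,etcd3=http://192.168.92.130:2380
```

In start_etcd.sh this could replace the hard-coded line, e.g. cluster=$(build_initial_cluster etcd1=192.168.92.128 etcd2=192.168.92.129 etcd3=192.168.92.130), although that line would need bash rather than /bin/sh because of the += operator.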
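2.5.2 Check the logs
```bash
# 8bf31834f8ce is the container ID printed by docker run; the container name (etcd2) also works
docker logs 8bf31834f8ce
```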
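With --logger zap, etcd writes one JSON object per line, and each object carries a "level" field. One way to skim a long startup log is to keep only warn-or-higher lines. The helper name problem_lines is hypothetical, and the sample lines are illustrative placeholders, not real etcd messages.

```bash
#!/usr/bin/env bash
# Hypothetical helper: keep only zap JSON log lines at warn level or above.
problem_lines() {
  grep -E '"level":"(warn|error|dpanic|panic|fatal)"'
}

# Illustrative input only; the messages are placeholders.
sample='{"level":"info","ts":"2022-08-11T07:04:00.000Z","msg":"illustrative info line"}
{"level":"warn","ts":"2022-08-11T07:04:01.000Z","msg":"illustrative warn line"}'

printf '%s\n' "$sample" | problem_lines

# Real usage: the container logs to stderr, so merge it into stdout first.
# docker logs etcd2 2>&1 | problem_lines
```

Some warnings right after startup can be expected while the member is still catching up, so treat the output as a starting point for reading the log, not a verdict.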
2.6 Wait for Data Synchronization and Verify Recovery
Check the cluster's member information again:
```bash
export ETCDCTL_API=3
export ETCD_ENDPOINTS=192.168.92.128:2379,192.168.92.129:2379,192.168.92.130:2379
etcdctl --endpoints=$ETCD_ENDPOINTS --write-out=table member list
etcdctl --endpoints=$ETCD_ENDPOINTS --write-out=table endpoint status
```
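Rather than re-running the status commands by hand, you can poll until the restarted node answers. A minimal sketch of a generic retry loop; the function name retry, the attempt count, and the delay are assumptions, and etcdctl endpoint health is used only as the example probe.

```bash
#!/usr/bin/env bash
# Hypothetical helper: retry <attempts> <delay-seconds> <command...>
# Runs the command until it succeeds or the attempts are used up.
retry() {
  local attempts=$1 delay=$2 i
  shift 2
  for (( i = 1; i <= attempts; i++ )); do
    if "$@"; then
      return 0
    fi
    if (( i < attempts )); then
      echo "attempt $i/$attempts failed; retrying in ${delay}s" >&2
      sleep "$delay"
    fi
  done
  return 1
}

# Example: wait up to about a minute for etcd2 to report healthy.
# retry 30 2 etcdctl --endpoints=192.168.92.129:2379 endpoint health
```

Once the probe succeeds, the endpoint status table should show etcd2 with a raft index close to the other members, which indicates it has caught up.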

Summary
Because the cluster keeps multiple replicas of the data, a single abnormal node does not make the whole cluster unavailable. Recovery only requires bringing the node back as a member, starting it normally, and letting it synchronize data from the other members.