Runtime reconfiguration of etcd
2022-06-11 11:25:00 【Cotton wool】
Original text: Etcd runtime reconfiguration
Runtime reconfiguration
etcd is designed to withstand machine failures. An etcd cluster automatically recovers from temporary failures (for example, machine reboots) and, for a cluster of N members, tolerates up to (N-1)/2 permanent failures (rounded down). When a member fails permanently, whether from hardware failure or disk corruption, it loses access to the cluster. If the cluster permanently loses more than (N-1)/2 members, it fails catastrophically and irrevocably loses quorum. Once quorum is lost, the cluster cannot reach consensus and therefore cannot accept further updates.
etcd has built-in support for incremental runtime reconfiguration, which allows users to update the cluster membership at run time.
Reconfiguration requests can only be processed while a majority of the cluster members are functioning. It is strongly recommended to always keep the cluster size greater than two in production. Removing a member from a two-member cluster is unsafe: the majority of a two-member cluster is also two, so if anything fails during the removal, the cluster may be unable to make progress and will need to be restarted from majority failure.
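The quorum arithmetic described above can be made concrete with a minimal sketch. The function names quorum and fault_tolerance are illustrative and are not part of etcd; the only assumptions are the majority rule and the (N-1)/2 tolerance stated in the text.

```python
def quorum(n: int) -> int:
    """Smallest number of members that forms a majority of an n-member cluster."""
    if n < 1:
        raise ValueError("cluster size must be at least 1")
    return n // 2 + 1


def fault_tolerance(n: int) -> int:
    """Permanent failures an n-member cluster survives; equals (n-1)/2 rounded down."""
    return n - quorum(n)


if __name__ == "__main__":
    # Print the quorum and tolerated failures for small cluster sizes.
    for size in range(1, 6):
        print(f"members={size} quorum={quorum(size)} "
              f"tolerated_failures={fault_tolerance(size)}")
```

For a two-member cluster this gives a quorum of two and zero tolerated failures, which is why removing a member from such a cluster is risky.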
Reconfiguration use cases
Let's walk through some common reasons for reconfiguring a cluster. Most of them simply involve some combination of adding and removing members.
Cycle or upgrade multiple machines
If multiple cluster members need to move because of planned maintenance (hardware upgrades, network downtime, and so on), it is recommended to modify the members one at a time.
Removing the leader is safe, but there is a brief period of downtime while the election takes place. If the cluster holds more than 50MB of data, it is recommended to migrate the member's data directory.
Change the cluster size
Increasing the cluster size can improve failure tolerance and provide better read performance. Because clients can read from any member, increasing the number of members increases the overall read throughput.
Decreasing the cluster size can improve the cluster's write performance at the cost of reduced resilience. A write must be replicated to a majority of the cluster members before it is considered committed. Decreasing the cluster size lowers the majority, so each write is committed more quickly.
Replace a failed machine
If a machine fails because of a hardware fault, data directory corruption, or some other fatal condition, it should be replaced as soon as possible. A machine that has failed but has not been removed counts against the quorum and reduces the tolerance for additional failures.
To replace the machine, follow the instructions for removing the member from the cluster, and then add a new member in its place. If the cluster holds more than 50MB of data and the failed member's data directory is still accessible, it is recommended to migrate that data directory.
Restart the cluster from majority failure
If the majority of the cluster has been lost, or all of the nodes have changed IP addresses, manual action is needed to recover safely.
The basic steps of the recovery process are: create a new cluster from the old data, force a single member to act as the leader, and finally use runtime configuration to add new members to this new cluster one at a time.
Cluster reconfiguration operations
Before any change is made, a simple majority (quorum) of etcd members must be available. This is essentially the same requirement as for any other write to etcd.
All changes to the cluster are made one at a time:
- To update a single member's peerURLs, perform an update operation
- To replace a single member, perform an add and then a remove operation
- To grow the cluster from 3 to 5 members, perform two add operations
- To shrink the cluster from 5 to 3 members, perform two remove operations
All of these examples use the etcdctl command line tool that ships with etcd.
To change the membership without etcdctl, use the v2 HTTP members API or the v3 gRPC members API.
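The one-at-a-time rule for resizing can be sketched as a small planning helper. This is only an illustration: plan_resize is a hypothetical name, and it produces a list of steps to perform manually rather than calling etcd.

```python
from typing import List, Tuple


def plan_resize(current: int, target: int) -> List[Tuple[str, int]]:
    """Return one (operation, size_after) pair per single-member change."""
    if current < 1 or target < 1:
        raise ValueError("cluster sizes must be at least 1")
    step = 1 if target > current else -1
    operation = "add" if step == 1 else "remove"
    plan = []
    size = current
    while size != target:
        size += step
        plan.append((operation, size))
    return plan


if __name__ == "__main__":
    # Growing from 3 to 5 members takes two separate add operations.
    print(plan_resize(3, 5))
    # Shrinking from 5 to 3 members takes two separate remove operations.
    print(plan_resize(5, 3))
```

Each step in the plan would be carried out and verified before the next one starts.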
Test environment
| Name | IP | Status |
|---|---|---|
| etcd1 | 192.168.4.10 | Original member |
| etcd2 | 192.168.4.20 | Original member |
| etcd3 | 192.168.4.30 | Added, updated, or removed |
Example systemd unit file on etcd1, /usr/lib/systemd/system/etcd.service:
[Unit]
Description=etcd key-value store
Documentation=https://github.com/etcd-io/etcd
After=network.target
[Service]
EnvironmentFile=/etc/etcd/etcd.conf
ExecStart=/usr/bin/etcd
Restart=always
[Install]
WantedBy=multi-user.target
Example configuration file on etcd1, /etc/etcd/etcd.conf:
ETCD_NAME=etcd1
ETCD_DATA_DIR=/etc/etcd/data
ETCD_LISTEN_CLIENT_URLS=http://192.168.4.10:2379
ETCD_LISTEN_PEER_URLS=http://192.168.4.10:2380
ETCD_ADVERTISE_CLIENT_URLS=http://192.168.4.10:2379
ETCD_INITIAL_ADVERTISE_PEER_URLS=http://192.168.4.10:2380
ETCD_INITIAL_CLUSTER_STATE=new
ETCD_INITIAL_CLUSTER_TOKEN=etcd-cluster
ETCD_INITIAL_CLUSTER=etcd1=http://192.168.4.10:2380,etcd2=http://192.168.4.20:2380
ETCD_ENABLE_V2=true
Check the member information:
$ etcdctl --endpoints="http://192.168.4.10:2379" member list
39d7dd629f95330e, started, etcd2, http://192.168.4.20:2380, http://192.168.4.20:2379, false
f9b6e5803038fabb, started, etcd1, http://192.168.4.10:2380, http://192.168.4.10:2379, false
View the existing test data on etcd1 and etcd2:
$ ETCDCTL_API=2 etcdctl --endpoints="http://192.168.4.10:2379" get /docker-flannel/network/config
{
"Network": "10.0.0.0/16",
"SubnetLen": 24,
"Backend": {
"Type": "vxlan"
}
}
$ ETCDCTL_API=2 etcdctl --endpoints="http://192.168.4.20:2379" get /docker-flannel/network/config
{
"Network": "10.0.0.0/16",
"SubnetLen": 24,
"Backend": {
"Type": "vxlan"
}
}
Add a new member
Adding a member takes two steps:
- Add the new member to the cluster through the HTTP members API, the gRPC members API, or the etcdctl member add command.
- Start the new member with the new cluster configuration, including the updated member list (the existing members plus the new one).
Use etcdctl with the member name and advertised peer URLs to add the new member etcd3 (192.168.4.30) to the cluster. The command can be run on any etcd node:
$ etcdctl --endpoints="http://192.168.4.10:2379" member add etcd3 --peer-urls=http://192.168.4.30:2380
Member 9825b911c2558475 added to cluster b9b6bab8c2110fd
ETCD_NAME="etcd3"
ETCD_INITIAL_CLUSTER="etcd2=http://192.168.4.20:2380,etcd3=http://192.168.4.30:2380,etcd1=http://192.168.4.10:2380"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.4.30:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"
At this point, listing the cluster members again shows that http://192.168.4.30:2380 is in the unstarted state:
$ etcdctl --endpoints="http://192.168.4.10:2379" member list
39d7dd629f95330e, started, etcd2, http://192.168.4.20:2380, http://192.168.4.20:2379, false
9825b911c2558475, unstarted, , http://192.168.4.30:2380, , false
f9b6e5803038fabb, started, etcd1, http://192.168.4.10:2380, http://192.168.4.10:2379, false
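The member list output above can be checked programmatically. The sketch below parses the comma-separated lines exactly as they appear in this article and reports members that have not started yet. The field names are my own labels, and the parser assumes one peer URL and one client URL per member, as in the example.

```python
from typing import Dict, List

# Labels chosen for this sketch; the columns follow the output shown above.
FIELDS = ["id", "status", "name", "peer_urls", "client_urls", "is_learner"]


def parse_member_list(output: str) -> List[Dict[str, str]]:
    """Parse the simple (comma-separated) member list output into dictionaries."""
    members = []
    for line in output.strip().splitlines():
        parts = [part.strip() for part in line.split(",")]
        if len(parts) != len(FIELDS):
            continue  # skip lines that do not match the expected layout
        members.append(dict(zip(FIELDS, parts)))
    return members


def unstarted_ids(members: List[Dict[str, str]]) -> List[str]:
    """Return the IDs of members whose status is 'unstarted'."""
    return [m["id"] for m in members if m["status"] == "unstarted"]


if __name__ == "__main__":
    sample = """\
39d7dd629f95330e, started, etcd2, http://192.168.4.20:2380, http://192.168.4.20:2379, false
9825b911c2558475, unstarted, , http://192.168.4.30:2380, , false
f9b6e5803038fabb, started, etcd1, http://192.168.4.10:2380, http://192.168.4.10:2379, false
"""
    print(unstarted_ids(parse_member_list(sample)))
```

An unstarted member has an empty name and no client URL until it joins, which is why the parser keeps empty fields instead of dropping them.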
etcdctl has informed the cluster about the new member and printed the environment variables needed to start it successfully. Use them to complete the /etc/etcd/etcd.conf file on the etcd3 machine:
ETCD_NAME="etcd3"
ETCD_DATA_DIR=/etc/etcd/data
ETCD_LISTEN_CLIENT_URLS=http://192.168.4.30:2379
ETCD_LISTEN_PEER_URLS=http://192.168.4.30:2380
ETCD_ADVERTISE_CLIENT_URLS=http://192.168.4.30:2379
ETCD_INITIAL_ADVERTISE_PEER_URLS=http://192.168.4.30:2380
ETCD_INITIAL_CLUSTER="etcd2=http://192.168.4.20:2380,etcd3=http://192.168.4.30:2380,etcd1=http://192.168.4.10:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"
ETCD_INITIAL_CLUSTER_TOKEN=etcd-cluster
ETCD_ENABLE_V2=true
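Copying the printed variables by hand is error-prone, so a small check can help. The sketch below parses the ETCD_* lines printed by etcdctl member add (as shown earlier) and splits ETCD_INITIAL_CLUSTER into name and peer URL pairs. The helper names are hypothetical, and the parser assumes the KEY="value" layout seen in this article.

```python
from typing import Dict


def parse_env_lines(output: str) -> Dict[str, str]:
    """Collect ETCD_* variables from 'member add' output, dropping surrounding quotes."""
    env = {}
    for line in output.splitlines():
        line = line.strip()
        if not line.startswith("ETCD_") or "=" not in line:
            continue  # skip the "Member ... added to cluster ..." line and blanks
        key, _, value = line.partition("=")
        env[key] = value.strip('"')
    return env


def initial_cluster_members(env: Dict[str, str]) -> Dict[str, str]:
    """Split ETCD_INITIAL_CLUSTER into a mapping of member name to peer URL."""
    members = {}
    for entry in env["ETCD_INITIAL_CLUSTER"].split(","):
        name, _, url = entry.partition("=")
        members[name] = url
    return members


if __name__ == "__main__":
    sample = '''Member 9825b911c2558475 added to cluster b9b6bab8c2110fd
ETCD_NAME="etcd3"
ETCD_INITIAL_CLUSTER="etcd2=http://192.168.4.20:2380,etcd3=http://192.168.4.30:2380,etcd1=http://192.168.4.10:2380"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.4.30:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"
'''
    env = parse_env_lines(sample)
    members = initial_cluster_members(env)
    # The new member's own name must appear in the initial cluster list.
    print(env["ETCD_NAME"] in members, env["ETCD_INITIAL_CLUSTER_STATE"])
```

A useful sanity check before starting the new member is that its ETCD_NAME appears in the parsed list and that its peer URL matches ETCD_INITIAL_ADVERTISE_PEER_URLS.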
After completing the file, start the service with systemctl start etcd.service.
The new member will run as part of the cluster and immediately begin catching up with the other members.
When adding multiple members, the best practice is to configure one member at a time and verify that it starts correctly before adding more.
If a new member is added to a single-node cluster, the cluster cannot make progress until the new member starts, because it needs two members as a majority to reach consensus. This only happens between the time etcdctl member add informs the cluster about the new member and the time the new member successfully establishes a connection to the existing one.
Check the health of the three etcd members:
$ etcdctl --endpoints=http://192.168.4.10:2379,http://192.168.4.20:2379,http://192.168.4.30:2379 endpoint health
http://192.168.4.10:2379 is healthy: successfully committed proposal: took = 16.900029ms
http://192.168.4.30:2379 is healthy: successfully committed proposal: took = 17.184419ms
http://192.168.4.20:2379 is healthy: successfully committed proposal: took = 10.413913ms
$ etcdctl --endpoints="http://192.168.4.10:2379" member list
39d7dd629f95330e, started, etcd2, http://192.168.4.20:2380, http://192.168.4.20:2379, false
9825b911c2558475, started, etcd3, http://192.168.4.30:2380, http://192.168.4.30:2379, false
f9b6e5803038fabb, started, etcd1, http://192.168.4.10:2380, http://192.168.4.10:2379, false
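The endpoint health output shown above can be verified against the list of endpoints that are expected to be up. This is a minimal sketch that relies only on the "is healthy" line format seen in this article; the function name is hypothetical, and any expected endpoint without such a line is reported as a problem.

```python
import re
from typing import Iterable, List

# Matches lines such as "http://192.168.4.10:2379 is healthy: ..." as shown above.
HEALTHY_LINE = re.compile(r"^(\S+) is healthy\b")


def endpoints_not_healthy(output: str, expected: Iterable[str]) -> List[str]:
    """Return expected endpoints that have no 'is healthy' line in the output."""
    healthy = set()
    for line in output.splitlines():
        match = HEALTHY_LINE.match(line.strip())
        if match:
            healthy.add(match.group(1))
    return [endpoint for endpoint in expected if endpoint not in healthy]


if __name__ == "__main__":
    sample = """\
http://192.168.4.10:2379 is healthy: successfully committed proposal: took = 16.900029ms
http://192.168.4.30:2379 is healthy: successfully committed proposal: took = 17.184419ms
http://192.168.4.20:2379 is healthy: successfully committed proposal: took = 10.413913ms
"""
    expected = [
        "http://192.168.4.10:2379",
        "http://192.168.4.20:2379",
        "http://192.168.4.30:2379",
    ]
    # An empty list means every expected endpoint reported healthy.
    print(endpoints_not_healthy(sample, expected))
```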
Check whether the data has been synchronized to etcd3:
$ ETCDCTL_API=2 etcdctl --endpoints="http://192.168.4.30:2379" get /docker-flannel/network/config
{
"Network": "10.0.0.0/16",
"SubnetLen": 24,
"Backend": {
"Type": "vxlan"
}
}
Update a member
Update advertise client URLs
To update a member's advertise client URLs, simply restart the member with the updated client URL flag (--advertise-client-urls) or environment variable (ETCD_ADVERTISE_CLIENT_URLS). The restarted member will publish the updated URLs. A wrongly updated client URL will not affect the health of the etcd cluster.
Update advertise peer URLs
To update a member's advertise peer URLs, first update them explicitly with the member command and then restart the member. The extra step is required because updating peer URLs changes the cluster-wide configuration and can affect the health of the etcd cluster.
To update the peer URLs, first find the target member's ID. List all members with etcdctl:
$ etcdctl --endpoints="http://192.168.4.10:2379" member list
39d7dd629f95330e, started, etcd2, http://192.168.4.20:2380, http://192.168.4.20:2379, false
9825b911c2558475, started, etcd3, http://192.168.4.30:2380, http://192.168.4.30:2379, false
f9b6e5803038fabb, started, etcd1, http://192.168.4.10:2380, http://192.168.4.10:2379, false
In this example, update the member with ID 9825b911c2558475 (etcd3), changing its peerURLs value to http://192.168.4.30:23800:
$ etcdctl --endpoints="http://192.168.4.10:2379" member update 9825b911c2558475 --peer-urls=http://192.168.4.30:23800
Member 9825b911c2558475 updated in cluster b9b6bab8c2110fd
View the member list again :
$ etcdctl --endpoints="http://192.168.4.10:2379" member list
39d7dd629f95330e, started, etcd2, http://192.168.4.20:2380, http://192.168.4.20:2379, false
9825b911c2558475, started, etcd3, http://192.168.4.30:23800, http://192.168.4.30:2379, false
f9b6e5803038fabb, started, etcd1, http://192.168.4.10:2380, http://192.168.4.10:2379, false
Remove a member
Suppose the member to remove has the ID 9825b911c2558475 (etcd3). Use the remove command to remove it:
$ etcdctl --endpoints="http://192.168.4.10:2379" member remove 9825b911c2558475
Member 9825b911c2558475 removed from cluster b9b6bab8c2110fd
At this point the target member stops itself and logs the removal, and its etcd service stops:
etcd: the member has been permanently removed from the cluster
It is safe to remove the leader; however, the cluster will be inactive while a new leader is being elected. This period normally lasts for the election timeout plus the voting process.
To see which node is the leader:
$ etcdctl --endpoints="http://192.168.4.10:2379" endpoint status --cluster -w table
+--------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+--------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| http://192.168.4.30:2379 | f131e3371e51a36 | 3.4.13 | 25 kB | false | false | 62 | 198958 | 198958 | |
| http://192.168.4.20:2379 | 441a3fcaee433945 | 3.4.13 | 25 kB | true | false | 62 | 198958 | 198958 | |
| http://192.168.4.10:2379 | f9b6e5803038fabb | 3.4.13 | 25 kB | false | false | 62 | 198958 | 198958 | |
+--------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
The command above uses --cluster, which is more convenient because there is no need to list every endpoint in the --endpoints flag. Alternatively, specify each endpoint explicitly:
$ etcdctl --endpoints="http://192.168.4.10:2379,http://192.168.4.20:2379,http://192.168.4.30:2379" endpoint status -w table
+--------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+--------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| http://192.168.4.10:2379 | f9b6e5803038fabb | 3.4.13 | 25 kB | false | false | 62 | 198963 | 198963 | |
| http://192.168.4.20:2379 | 441a3fcaee433945 | 3.4.13 | 25 kB | true | false | 62 | 198963 | 198963 | |
| http://192.168.4.30:2379 | f131e3371e51a36 | 3.4.13 | 25 kB | false | false | 62 | 198963 | 198963 | |
+--------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
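To pick out the leader from the table output shown above without reading it by eye, the rows can be parsed as plain text. The sketch below assumes the table layout printed in this article (a header row followed by data rows, with +---+ border lines); the function names are hypothetical.

```python
from typing import Dict, List


def parse_status_table(table: str) -> List[Dict[str, str]]:
    """Parse the 'endpoint status -w table' layout shown above into row dictionaries."""
    header = None
    rows = []
    for line in table.splitlines():
        line = line.strip()
        if not (line.startswith("|") and line.endswith("|")):
            continue  # skip +---+ border lines and anything else
        cells = [cell.strip() for cell in line[1:-1].split("|")]
        if header is None:
            header = cells  # the first pipe-delimited line is the header
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def leader_endpoints(rows: List[Dict[str, str]]) -> List[str]:
    """Return the endpoints whose IS LEADER column is 'true'."""
    return [row["ENDPOINT"] for row in rows if row.get("IS LEADER") == "true"]


if __name__ == "__main__":
    sample = """\
+----------+----+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------+----+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| http://192.168.4.10:2379 | f9b6e5803038fabb | 3.4.13 | 25 kB | false | false | 62 | 198963 | 198963 | |
| http://192.168.4.20:2379 | 441a3fcaee433945 | 3.4.13 | 25 kB | true | false | 62 | 198963 | 198963 | |
| http://192.168.4.30:2379 | f131e3371e51a36 | 3.4.13 | 25 kB | false | false | 62 | 198963 | 198963 | |
+----------+----+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
"""
    print(leader_endpoints(parse_status_table(sample)))
```

Knowing the leader in advance is useful before a planned removal, since removing the leader causes the brief election downtime described above.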
Strict reconfiguration check mode (--strict-reconfig-check)
As mentioned above, the best practice for adding new members is to configure one member at a time and verify that it starts correctly before adding more. This step-by-step approach is very important, because if a newly added member is not configured correctly (for example, its peer URLs are wrong), the cluster can lose quorum. Quorum is lost because the newly added member is counted in the quorum even when the existing members cannot reach it. Quorum can be lost in the same way when there are connectivity or operational problems.
To avoid this problem, etcd provides the --strict-reconfig-check option. When this option is passed to etcd, etcd rejects a reconfiguration request if the number of started members would be less than the quorum of the reconfigured cluster.
It is recommended to enable this option. The documentation translated here describes it as disabled by default for compatibility; recent releases, including the 3.4 series used in this test environment, enable it by default. The corresponding environment variable is ETCD_STRICT_RECONFIG_CHECK.
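The rule stated above can be modeled in a few lines. This is a simplified model of that one sentence, not etcd's actual implementation: the function names are hypothetical, and etcd itself may treat some cases (such as single-member clusters) specially.

```python
def quorum(total: int) -> int:
    """Majority size for a cluster of `total` members."""
    return total // 2 + 1


def accepts_add(started: int, total: int) -> bool:
    """Model: accept an add only if started members still form a quorum of total + 1."""
    return started >= quorum(total + 1)


def accepts_remove(started: int, total: int, target_started: bool) -> bool:
    """Model: accept a remove only if the remaining started members form a quorum of total - 1."""
    remaining_started = started - 1 if target_started else started
    return remaining_started >= quorum(total - 1)


if __name__ == "__main__":
    # Healthy 3-member cluster: the enlarged cluster needs 3 of 4, and 3 are started.
    print(accepts_add(started=3, total=3))
    # 3-member cluster with one member down: 2 started is below the quorum of 4.
    print(accepts_add(started=2, total=3))
    # Removing a started member from a 3-member cluster with one member already down.
    print(accepts_remove(started=2, total=3, target_started=True))
```

In this model, a cluster that is already missing a member refuses further additions until the missing member is back or removed, which matches the one-member-at-a-time practice described above.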
Reference: https://doczhcn.gitbook.io/etcd/index/index-1/clustering/runtime-configuration#yan-ge-zhong-pei-zhi-jian-cha-mo-shi-strictreconfigcheck