Project practice: Redis Cluster technology learning (12)
2022-07-02 10:05:00 [User 1289394]
6.2 Fault recovery
After a failed node is marked objectively offline, if that node is a master holding slots, one of its slave nodes must be selected to replace it so that the cluster remains highly available. Every slave node of the offline master is responsible for failure recovery: when a slave node's internal scheduled task finds that the master it replicates has entered the objectively offline state, it triggers the recovery process.
1. Qualification check
Each slave node checks how long it has been disconnected from its master to determine whether it is qualified to replace the failed master. If the disconnection time exceeds cluster-node-timeout * cluster-slave-validity-factor, the slave node is not eligible for failover. The parameter cluster-slave-validity-factor is the slave validity factor and defaults to 10.
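A minimal sketch of the qualification check, assuming the disconnection time and both parameters are already known in milliseconds. The function name `replica_can_failover` is hypothetical, and it follows only the formula given above.

```c
#include <assert.h>
#include <stdbool.h>

typedef long long mstime_t; /* millisecond time type, as in Redis */

/* Hypothetical helper: may a slave node that has been disconnected from its
 * master for disconnect_ms milliseconds still take part in failover?
 *   node_timeout_ms  ~ cluster-node-timeout
 *   validity_factor  ~ cluster-slave-validity-factor (default 10) */
static bool replica_can_failover(mstime_t disconnect_ms,
                                 mstime_t node_timeout_ms,
                                 int validity_factor) {
    assert(node_timeout_ms > 0 && validity_factor > 0);
    mstime_t max_disconnect_ms = node_timeout_ms * validity_factor;
    /* Exceeding the limit disqualifies the slave node. */
    return disconnect_ms <= max_disconnect_ms;
}
```

For example, with cluster-node-timeout set to 15000 ms and the default factor of 10, a slave node stays eligible for up to 150000 ms (150 s) of disconnection. Redis's own check differs in detail (for example, it also adds the replication ping period), so treat this as an illustration of the rule rather than the exact implementation.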
2. Prepare the election time
Once a slave node is eligible for failover, it updates the time at which the failover election should be triggered, and it carries out the rest of the process only after that time has been reached. The fields related to the failover election time are:
```c
typedef long long mstime_t; /* millisecond time type, as in Redis */

struct clusterState {
    /* ... other fields omitted ... */
    mstime_t failover_auth_time; /* Time of the previous or next failover election */
    int failover_auth_rank;      /* This slave node's current rank */
};
```

A delayed trigger is used here mainly so that multiple slave nodes can be given different election delays, which is how priority among them is supported. (The detailed pseudocode is documented separately.)
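Because the pseudocode is documented elsewhere, the following is only a sketch of how a rank-based delay can be computed. The constants (a fixed 500 ms, up to 499 ms of random jitter, and 1000 ms per rank, with rank determined by replication offset) are assumptions modelled on Redis's cluster.c; the function names are hypothetical.

```c
#include <assert.h>

typedef long long mstime_t; /* millisecond time type, as in Redis */

/* Rank = number of sibling slave nodes (same failed master) whose
 * replication offset is larger than ours; the most up-to-date slave
 * gets rank 0. */
static int replica_rank(long long my_offset,
                        const long long *sibling_offsets, int n) {
    int rank = 0;
    for (int i = 0; i < n; i++)
        if (sibling_offsets[i] > my_offset)
            rank++;
    return rank;
}

/* Value stored in failover_auth_time: now + 500 ms fixed delay
 * + 0..499 ms jitter + 1000 ms per rank. The caller supplies the
 * jitter (e.g. rand() % 500) so this function stays deterministic. */
static mstime_t compute_failover_auth_time(mstime_t now_ms, int rank,
                                           int jitter_ms) {
    assert(rank >= 0 && jitter_ms >= 0 && jitter_ms < 500);
    return now_ms + 500 + jitter_ms + (mstime_t)rank * 1000;
}
```

The slave node with the largest replication offset gets rank 0 and starts its election roughly a second before the next one, so the most up-to-date slave usually wins. Passing the jitter in as a parameter keeps the function easy to test.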
3. Launch an election
When a slave node's scheduled task detects that the failover election time (failover_auth_time) has arrived, it initiates the election as follows:
(1) Update the configuration epoch
The configuration epoch is an integer that only ever increases. Each master node maintains its own configuration epoch (clusterNode.configEpoch), which indicates the version of that master; no two masters share the same configuration epoch. A slave node copies the configuration epoch of its master.
The configuration epoch is used in the following scenarios:
· Adding new nodes.
· Detecting slot-to-node mapping conflicts.
· Detecting conflicts in slave-node election voting.
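The epoch rules above can be illustrated with two small helpers. Both are hypothetical: `next_election_epoch` assumes the slave node knows the largest epoch it has seen in the cluster (Redis calls this value currentEpoch), and `master_epochs_unique` simply checks the invariant that no two masters share a configuration epoch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: the epoch a slave node uses for its election is one
 * greater than the largest epoch it has seen, so it is strictly newer
 * than every existing configuration. */
static uint64_t next_election_epoch(const uint64_t *known_epochs, int n) {
    assert(n > 0);
    uint64_t max_epoch = 0;
    for (int i = 0; i < n; i++)
        if (known_epochs[i] > max_epoch)
            max_epoch = known_epochs[i];
    return max_epoch + 1;
}

/* Check the invariant that no two masters share a configEpoch;
 * a duplicate indicates a conflict that must be resolved. */
static bool master_epochs_unique(const uint64_t *config_epochs, int n) {
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if (config_epochs[i] == config_epochs[j])
                return false;
    return true;
}
```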
(2) Broadcast the election message
The slave node broadcasts an election message (FAILOVER_AUTH_REQUEST) to the cluster and records that the message has been sent, which guarantees that a slave node initiates at most one election per configuration epoch. The message has the same content as a ping message, except that its type field is set to FAILOVER_AUTH_REQUEST.
4. Election voting
Only master nodes holding slots process the failover election message (FAILOVER_AUTH_REQUEST), because each slot-holding master has exactly one vote in each configuration epoch.
The voting process is effectively a leader election. If there are N masters holding slots, there are N votes. Since each slot-holding master can vote for only one slave node per configuration epoch, only one slave node can collect N/2+1 votes, which guarantees that a unique slave node is chosen.
For example, suppose there are 5 masters holding slots. After master b fails, 4 remain; once a slave node collects 3 votes, it has enough to replace the failed master. Note that the failed master still counts toward N. Now suppose the cluster has 3 masters and 3 slaves, and 2 of the masters are deployed on the same machine. When that machine goes down, only one master survives, so the slave node cannot collect the 3/2+1 = 2 master votes it needs and failover fails. The same problem also affects the failure detection stage.
Therefore, when deploying a cluster, the masters must be spread across at least 3 physical machines to avoid a single point of failure.
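The arithmetic in the example, and the deployment rule that follows from it, can be checked with a short sketch. `votes_needed` computes N/2+1 with integer division, and `survives_single_machine_loss` is a hypothetical helper that tests whether losing any one machine still leaves enough live slot-holding masters to reach that count. It considers master votes only.

```c
#include <assert.h>
#include <stdbool.h>

/* Votes needed to win: a strict majority of the N slot-holding masters.
 * N counts every slot-holding master, including the failed one.
 * Because 2 * (N/2 + 1) > N, two slave nodes cannot both reach this
 * count in the same epoch. */
static int votes_needed(int n_masters) {
    assert(n_masters > 0);
    return n_masters / 2 + 1; /* integer division: 5 -> 3, 3 -> 2 */
}

/* Hypothetical deployment check: machine_of[i] is the physical machine
 * (numbered 0..n_machines-1) hosting slot-holding master i. Returns true
 * if losing any single machine still leaves enough live masters to
 * reach votes_needed(n). */
static bool survives_single_machine_loss(const int *machine_of, int n,
                                         int n_machines) {
    assert(n > 0 && n_machines > 0);
    int needed = votes_needed(n);
    for (int machine = 0; machine < n_machines; machine++) {
        int lost = 0;
        for (int i = 0; i < n; i++)
            if (machine_of[i] == machine)
                lost++;
        if (n - lost < needed)
            return false; /* this machine's failure blocks failover */
    }
    return true;
}
```

With masters placed on machines {0, 0, 1}, losing machine 0 leaves one master against the two votes required, matching the 3-master, 3-slave example; spreading the same masters over three machines passes.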