Project practice: Redis cluster technology learning (12)
2022-07-02 10:05:00 【User 1289394】
Redis.6.2 Fault recovery
After a failed node is marked objectively offline, if it is a master node holding slots, one of its slave nodes must be chosen to replace it so that the cluster stays highly available. All slave nodes of the offline master share responsibility for fault recovery: when a slave node's internal scheduled task finds that the master it replicates has entered the objectively offline state, it triggers the recovery process.
1. Qualification check
Each slave node checks how long it has been disconnected from the master node to determine whether it is qualified to replace the failed master. If the disconnection time between the slave and the master exceeds cluster-node-timeout*cluster-slave-validity-factor, the slave is not eligible for failover. The parameter cluster-slave-validity-factor is the slave validity factor; its default is 10.
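The qualification rule above can be sketched as a small predicate. This is a minimal illustration of the rule as stated in the text, not the actual Redis implementation; the function name `replica_is_eligible` and its parameters are hypothetical, and all times are assumed to be in milliseconds.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: returns true when a slave may stand for election.
 * disconnect_ms   - time since the slave lost contact with its master
 * node_timeout_ms - value of cluster-node-timeout
 * validity_factor - value of cluster-slave-validity-factor (default 10) */
bool replica_is_eligible(long long disconnect_ms,
                         long long node_timeout_ms,
                         int validity_factor)
{
    /* This sketch only models positive settings. */
    assert(disconnect_ms >= 0 && node_timeout_ms > 0 && validity_factor > 0);

    /* Not eligible once the disconnection time exceeds timeout * factor. */
    return disconnect_ms <= node_timeout_ms * validity_factor;
}
```

With an example timeout of 15000 ms and the default factor of 10, a slave that has been disconnected for up to 150000 ms would still qualify under this rule.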
2. Preparing the election time
When a slave node is eligible for failover, it updates the time at which the fault election will be triggered. The subsequent steps run only after that time has been reached. The fields related to the fault election time are as follows:
struct clusterState {
    ...
    mstime_t failover_auth_time; /* Time of the previous or next election */
    int failover_auth_rank;      /* Rank of this slave in the current election */
};
A delayed trigger is used here so that different slave nodes get different election delays, which is how priority among the slaves is supported. (The specific pseudocode is documented elsewhere.)
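As a rough illustration of how a per-slave delay can encode priority, the sketch below computes an election time from the slave's rank. The constants (a fixed 500 ms, a random component below 500 ms, and 1000 ms per rank) follow the delay used in Redis's cluster.c as far as I know, but treat them as illustrative; the function name `election_time_ms` is hypothetical, and the random component is passed in as a parameter so the example stays deterministic.

```c
#include <assert.h>

/* Hypothetical helper: when should a slave start its election?
 * now_ms    - current time in milliseconds
 * jitter_ms - random component, expected in the range [0, 500)
 * rank      - value of failover_auth_rank; 0 is the best-placed slave */
long long election_time_ms(long long now_ms, int jitter_ms, int rank)
{
    assert(jitter_ms >= 0 && jitter_ms < 500 && rank >= 0);

    /* Fixed delay + random jitter + one extra second per rank position. */
    return now_ms + 500 + jitter_ms + (long long)rank * 1000;
}
```

Because each rank adds a full second while the jitter stays below 500 ms, a slave with rank 0 always starts before a slave with rank 1, whatever the random component is.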
3. Initiating an election
When the slave node's scheduled task detects that the fault election time (failover_auth_time) has been reached, it initiates an election as follows:
(1) Update the configuration epoch
The configuration epoch is an integer that only increases and never decreases. Each master node maintains its own configuration epoch (clusterNode.configEpoch), which indicates the version of that master. The configuration epochs of all master nodes differ from one another, and a slave node copies the configuration epoch of its master.
The configuration epoch is used in these scenarios:
· A new node joins.
· Slot-to-node mapping conflicts are detected.
· Voting conflicts among slave nodes are detected.
(2) Broadcast the election message
The slave node broadcasts an election message (FAILOVER_AUTH_REQUEST) in the cluster and records that the message has been sent, which ensures that a slave can initiate only one election per configuration epoch. The message content is the same as a ping message, except that the type is changed to FAILOVER_AUTH_REQUEST.
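The two sub-steps of initiating an election (bump the epoch, then broadcast once) can be sketched as follows. This is a simplified model under stated assumptions, not Redis's code: the struct `replica_election`, its fields, and `start_election` are hypothetical names, and the actual broadcast is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a slave's election state. */
struct replica_election {
    uint64_t current_epoch; /* latest configuration epoch known to the slave */
    uint64_t request_epoch; /* epoch of the last FAILOVER_AUTH_REQUEST sent */
    bool     request_sent;  /* true once a request was sent for request_epoch */
};

/* Returns true when the caller should broadcast FAILOVER_AUTH_REQUEST. */
bool start_election(struct replica_election *r)
{
    assert(r != NULL);

    /* Only one election per epoch: do nothing if we already asked. */
    if (r->request_sent && r->request_epoch == r->current_epoch)
        return false;

    r->current_epoch += 1;               /* (1) update the epoch */
    r->request_epoch = r->current_epoch; /* (2) remember what we sent */
    r->request_sent = true;
    return true;
}
```

Calling `start_election` twice in a row returns true the first time and false the second, which mirrors the "one election per epoch" rule described above.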
4. Election voting
Only master nodes holding slots process the election message (FAILOVER_AUTH_REQUEST), and each slot-holding node has only one vote per configuration epoch.
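A master's side of the vote can be sketched as a check against the last epoch in which it voted. The struct `master_voter` and the function `grant_vote` are hypothetical names for illustration, and the sketch covers only the two rules stated above (slot ownership and one vote per epoch); real Redis applies further checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical voter state kept by each master. */
struct master_voter {
    uint64_t last_vote_epoch; /* epoch of the last vote cast; 0 = never voted */
};

/* Returns true when the master grants its vote for request_epoch. */
bool grant_vote(struct master_voter *m, bool holds_slots, uint64_t request_epoch)
{
    assert(m != NULL);

    if (!holds_slots)
        return false; /* only slot-holding masters take part */
    if (request_epoch <= m->last_vote_epoch)
        return false; /* the vote for this epoch is already spent */

    m->last_vote_epoch = request_epoch;
    return true;
}
```

A second request carrying the same epoch is refused, so two slaves competing in one epoch cannot both receive this master's vote.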
The voting process is in effect a leader election. If there are N master nodes holding slots, there are N votes. Because each slot-holding master can vote for only one slave node per configuration epoch, only one slave can obtain N/2+1 votes, which guarantees that a unique slave is selected.
For example, suppose there are 5 master nodes holding slots. After master b fails, 4 remain; when one of b's slave nodes collects 3 votes, it has enough to replace the master. The failed master is still counted in the total number of votes. Now suppose the cluster has 3 masters and 3 slaves, and 2 of the masters are deployed on one machine. When that machine goes down, the slave cannot collect 3/2+1 master votes, so the failover fails. The same problem applies to failure detection.
Therefore, when deploying a cluster, the master nodes must be spread across at least 3 physical machines to avoid a single point of failure.
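The vote arithmetic in the examples above can be checked with a short sketch. The function names `votes_needed` and `failover_can_succeed` are hypothetical; the only rule modelled is the N/2+1 majority (integer division) over all slot-holding masters, including failed ones.

```c
#include <assert.h>
#include <stdbool.h>

/* Votes required to win: a majority of all slot-holding masters. */
int votes_needed(int slot_masters)
{
    assert(slot_masters > 0);
    return slot_masters / 2 + 1; /* integer division, e.g. 5 -> 3, 3 -> 2 */
}

/* Can a slave still win if only reachable_masters are able to vote? */
bool failover_can_succeed(int slot_masters, int reachable_masters)
{
    assert(reachable_masters >= 0 && reachable_masters <= slot_masters);
    return reachable_masters >= votes_needed(slot_masters);
}
```

With 5 masters and 4 still reachable, 3 votes are needed and the failover can succeed. With 3 masters of which 2 were lost on one machine, only 1 master can vote while 2 votes are needed, so the failover cannot complete.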