Project practice: Redis cluster technology learning (12)

2022-07-02 10:05:00 【User 1289394】

Redis.6.2 Fault recovery

After a failed node is marked objectively offline, if it is a master node holding slots, one of its slave nodes must be selected to replace it so that the cluster stays highly available. All slave nodes of the offline master share the responsibility for fault recovery: when a slave node discovers, through its internal scheduled task, that the master it replicates has entered the objective offline state, it triggers the fault recovery process.

1. Qualification check

Each slave node checks the time of its last disconnection from the master node to determine whether it is qualified to replace the failed master node. If the time since the slave node disconnected from the master exceeds cluster-node-timeout * cluster-slave-validity-factor, the slave node is not eligible for failover. The parameter cluster-slave-validity-factor is the slave node validity factor; its default is 10.
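The qualification rule can be pictured with a minimal sketch. The function name and the millisecond units are hypothetical, chosen only for illustration, and the check is the simplified rule stated above rather than the exact Redis source logic.

```python
def is_eligible_for_failover(disconnect_ms, node_timeout_ms, validity_factor=10):
    """Simplified qualification check (hypothetical helper, not a Redis API).

    disconnect_ms   -- time since the slave last talked to its master
    node_timeout_ms -- value of cluster-node-timeout
    validity_factor -- value of cluster-slave-validity-factor (default 10)
    """
    max_disconnect_ms = node_timeout_ms * validity_factor
    # The slave is disqualified only when the limit is exceeded.
    return disconnect_ms <= max_disconnect_ms
```

For example, with a node timeout of 15000 ms and the default factor of 10, a slave that has been disconnected for more than 150000 ms would be excluded from the election.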

2. Preparing the election time

When a slave node is eligible for failover, it updates the time at which the failover election will be triggered. The subsequent steps run only after that time has been reached. The fields related to the failover election time are as follows:

struct clusterState {
    ...
    mstime_t failover_auth_time; /* Time of the previous or next failover election */
    int failover_auth_rank;      /* Current rank of this slave node */
};

A delayed trigger is used here mainly so that different slave nodes get different election delays, which is how priority among the slave nodes is supported. (The specific pseudocode is documented elsewhere.)
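Since the pseudocode is not reproduced here, the following is only a rough sketch of how a rank-based delay could work. The constants, the helper names, and the idea of ranking slaves by replication offset are assumptions for illustration, not values taken from this article.

```python
import random

# Assumed constants, for illustration only.
BASE_DELAY_MS = 500        # fixed delay before any election
MAX_RANDOM_DELAY_MS = 500  # random jitter added on top
RANK_DELAY_MS = 1000       # extra delay per rank position


def slave_rank(my_offset, other_offsets):
    """Rank 0 is best: count the sibling slaves with a larger offset."""
    return sum(1 for offset in other_offsets if offset > my_offset)


def failover_auth_time(now_ms, rank, rng=random):
    """Compute when this slave may start its election (hypothetical helper)."""
    jitter = rng.randrange(MAX_RANDOM_DELAY_MS)
    return now_ms + BASE_DELAY_MS + jitter + rank * RANK_DELAY_MS
```

Under these assumptions a slave with rank 0 always starts earlier than a slave with rank 1, so the better-ranked slave gets the first chance to collect votes.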

3. Initiating the election

When the slave node's scheduled task detects that the failover election time (failover_auth_time) has arrived, it initiates the election as follows:

(1) Update the configuration epoch

The configuration epoch is an integer that only increases and never decreases. Each master node maintains its own configuration epoch (clusterNode.configEpoch), which indicates the version of that master node. The configuration epochs of all master nodes are distinct, and a slave node copies the configuration epoch of its master node.

The configuration epoch is used in the following scenarios:

· A new node joins.

· Slot-to-node mapping conflict detection.

· Slave node voting conflict detection.

(2) Broadcast the election message

The slave node broadcasts the election message (FAILOVER_AUTH_REQUEST) in the cluster and records that the message has been sent, which ensures that a slave node can initiate only one election within a configuration epoch. The message content is the same as a ping message, except that the type is changed to FAILOVER_AUTH_REQUEST.
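The two steps of initiating an election can be sketched as follows. The class, its field names, and the dictionary used as a message are hypothetical stand-ins, and how the sent flag would be cleared for a later retry is not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class SlaveElectionState:
    """Toy model of a slave's election state (names are assumptions)."""
    current_epoch: int
    failover_auth_sent: bool = False
    outbox: list = field(default_factory=list)  # stands in for the cluster bus

    def start_election(self):
        """Bump the epoch and broadcast the request at most once."""
        if self.failover_auth_sent:
            return False  # already asked for votes, do not ask again
        self.current_epoch += 1  # (1) update the configuration epoch
        self.outbox.append(      # (2) broadcast the election message
            {"type": "FAILOVER_AUTH_REQUEST", "epoch": self.current_epoch}
        )
        self.failover_auth_sent = True
        return True
```

Calling start_election() a second time returns False and sends nothing, which mirrors the rule that a slave node asks for votes only once per epoch.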

4. Election voting

Only master nodes holding slots process the failover election message (FAILOVER_AUTH_REQUEST), because each slot-holding node has exactly one vote per configuration epoch.

The voting process is in effect a leader election. If there are N master nodes holding slots, there are N votes. Because each slot-holding master node can vote for only one slave node within a configuration epoch, only one slave node can collect N/2+1 votes, which guarantees that a single slave node is chosen.
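The one-vote-per-epoch rule and the N/2+1 threshold can be illustrated with a small simulation. The Master class and the helper functions are hypothetical, and message passing is reduced to direct method calls.

```python
class Master:
    """Toy slot-holding master that grants one vote per epoch."""

    def __init__(self, name):
        self.name = name
        self.last_vote_epoch = 0

    def handle_auth_request(self, epoch):
        # Refuse if a vote was already cast in this (or a later) epoch.
        if epoch <= self.last_vote_epoch:
            return False
        self.last_vote_epoch = epoch
        return True


def votes_needed(total_masters):
    """Majority threshold N/2+1, using integer division."""
    return total_masters // 2 + 1


def collect_votes(reachable_masters, epoch):
    """Count how many reachable masters grant their vote for this epoch."""
    return sum(1 for m in reachable_masters if m.handle_auth_request(epoch))
```

If two slave nodes ask the same masters for votes in the same epoch, the first request uses up every vote and the second one collects none, so at most one of them can reach the threshold.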

For example, suppose the cluster has 5 master nodes holding slots. After master node b fails, 4 remain. When one of b's slave nodes collects 3 votes, it has enough votes to replace the master node.

The failed master node is still counted in the total number of votes. Suppose the cluster has 3 masters and 3 slaves, and 2 of the master nodes are deployed on the same machine. When that machine goes down, a slave node cannot collect 3/2+1 = 2 master votes (integer division), because only 1 master is still alive, so the failover fails. The same problem applies to the fault discovery phase. Therefore, when deploying a cluster, the master nodes should be spread across at least 3 physical machines to avoid a single point of failure.
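The deployment advice can be checked with simple arithmetic. The sketch below uses a hypothetical helper and made-up host names; it takes a mapping from machine to the number of masters it hosts and reports which single machine failures would leave too few masters to reach N/2+1 votes.

```python
def machines_whose_loss_blocks_failover(layout):
    """Return machines whose failure leaves fewer than N/2+1 live masters.

    layout -- dict mapping a machine name to the number of masters on it
    """
    total = sum(layout.values())
    needed = total // 2 + 1  # failed masters still count toward N
    return [
        machine
        for machine, count in layout.items()
        if total - count < needed
    ]
```

With a layout such as {"host1": 2, "host2": 1}, losing host1 leaves 1 live master while 2 votes are needed, so host1 is reported. With one master on each of three machines, no single machine is reported.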

Copyright notice
This article was created by [User 1289394]. Please include the original link when reposting. Thank you.
https://yzsam.com/2022/02/202202151657425445.html
