
Project practice, redis cluster technology learning (13)

2022-07-02 10:05:00 【User 1289394】

5. Replace master

Once the slave node has collected enough votes, it triggers the replace-master operation:

1） The current slave node stops replicating and becomes a master node.

2） It runs clusterDelSlot to revoke the slots that the failed master node was responsible for, then runs clusterAddSlot to assign those slots to itself.

3） It broadcasts its own pong message to the cluster, notifying all nodes that it has changed from a slave node to a master node and has taken over the slot information of the failed master node.
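The slot handover in step 2 can be pictured with a small sketch. This is a hypothetical in-memory model, not Redis's actual C implementation: the `slot_owner` dictionary and the `take_over_slots` function are names invented here for illustration, and nodes are identified by port for brevity.

```python
# Hypothetical model of slot ownership: slot number -> owning node.
# Redis itself does this in C via clusterDelSlot / clusterAddSlot.
def take_over_slots(slot_owner, failed_master, new_master):
    """Reassign every slot owned by failed_master to new_master."""
    moved = []
    for slot, owner in slot_owner.items():
        if owner == failed_master:
            # Conceptually: delete the old assignment, then add our own.
            slot_owner[slot] = new_master
            moved.append(slot)
    return sorted(moved)


slot_owner = {0: "6385", 1: "6385", 2: "6379", 3: "6380"}
moved = take_over_slots(slot_owner, failed_master="6385", new_master="6386")
print(moved)       # slots that changed hands
print(slot_owner)  # ownership after the takeover
```

Slots owned by healthy masters are left alone; only the failed master's slots move to the promoted node.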

10.6.3 Failover time

Having covered the fault detection and recovery process, we can now estimate the failover time:

1） Subjective offline (pfail) detection time = cluster-node-timeout.

2） Subjective offline status propagation time <= cluster-node-timeout/2. The messaging mechanism sends a ping message to any node it has not communicated with for more than cluster-node-timeout/2, and when choosing which nodes to include in the message body it gives priority to nodes in an offline state. Within this period it can therefore usually collect pfail reports from more than half of the master nodes and complete fault detection.

3） Slave node transfer time <= 1000 milliseconds. Because of the delayed election mechanism, the slave node with the largest offset waits at most 1 second before starting the election. The first election usually succeeds, so the slave node's transfer time is within 1 second.

Based on the above analysis, the failover time can be estimated as follows:

failover-time (milliseconds) ≤ cluster-node-timeout + cluster-node-timeout/2 + 1000

Failover time is therefore closely tied to the cluster-node-timeout parameter, which defaults to 15 seconds. It can be adjusted during configuration according to what the business can tolerate, but smaller is not always better, as the bandwidth consumption discussion in the next section explains.
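As a quick worked example of the estimate above, the upper bound can be computed directly. The function name `max_failover_time_ms` is made up for this sketch, and all values are in milliseconds.

```python
def max_failover_time_ms(cluster_node_timeout_ms):
    """Upper bound on failover time, following the estimate above."""
    pfail_detection = cluster_node_timeout_ms        # subjective offline detection
    pfail_propagation = cluster_node_timeout_ms // 2  # pfail report propagation
    election_delay = 1000                            # slave election delay bound
    return pfail_detection + pfail_propagation + election_delay


# With the default cluster-node-timeout of 15 seconds:
print(max_failover_time_ms(15000))  # 23500
```

With the default setting the bound comes to about 23.5 seconds, which shows how directly cluster-node-timeout drives the result.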

10.6.4 Failover drill

So far the main details of failover have been covered. Next, using the cluster built earlier, we simulate a master node failure and analyze the failover behavior. Use kill -9 to forcibly shut down the 6385 master node process, as shown in the figure.

Confirm cluster status ：

127.0.0.1:6379> cluster nodes

1a205dd8b2819a00dd1e8b6be40a8e2abe77b756 127.0.0.1:6385 master - 0 1471877563600 16 connected 0-1365 5462-6826 10923-12287 15018-16383

40622f9e7adc8ebd77fca0de9edfe691cb8a74fb 127.0.0.1:6382 slave cfb28ef1deee4e0fa78da

……
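To read this output programmatically, a line can be split into its fields. The sketch below assumes the field order of the older `cluster nodes` format shown above (id, address, flags, master id, ping-sent, pong-recv, config epoch, link state, then slot ranges). The helper names `parse_cluster_node_line` and `slot_count` are hypothetical, and the sample line is the 6385 entry from the output above joined onto one line.

```python
def parse_cluster_node_line(line):
    """Parse one line of `cluster nodes` output into a dict."""
    fields = line.split()
    node = {
        "id": fields[0],
        "addr": fields[1],
        "flags": fields[2].split(","),
        "master_id": fields[3],          # "-" for a master
        "config_epoch": int(fields[6]),
        "link_state": fields[7],
        "slots": [],
    }
    for token in fields[8:]:
        if token.startswith("["):
            continue  # skip migrating/importing slot markers
        start, _, end = token.partition("-")
        node["slots"].append((int(start), int(end or start)))
    return node


def slot_count(node):
    """Total number of slots covered by the node's ranges."""
    return sum(end - start + 1 for start, end in node["slots"])


line = ("1a205dd8b2819a00dd1e8b6be40a8e2abe77b756 127.0.0.1:6385 master - 0 "
        "1471877563600 16 connected 0-1365 5462-6826 10923-12287 15018-16383")
node = parse_cluster_node_line(line)
print(node["addr"], node["flags"], node["config_epoch"], slot_count(node))
```

Newer Redis versions append a cluster bus port to the address field, so the address would need extra handling there.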

Shut down the 6385 process:

# ps -ef | grep redis-server | grep 6385

501 1362 1 0 10:50 0:11.65 redis-server *:6385 [cluster]

# kill -9 1362

Log analysis is as follows ：

· Replication between slave node 6386 and master node 6385 is interrupted. The log is as follows:

==> redis-6386.log <==

# Connection with master lost.

* Caching the disconnected master state.

* Connecting to MASTER 127.0.0.1:6385

* MASTER <-> SLAVE sync started

# Error condition on socket for SYNC: Connection refused

· Master nodes 6379 and 6380 both mark 6385 as subjectively offline. Since that is more than half of the master nodes, 6385 is marked as objectively offline, and the following log is printed:

==> redis-6380.log <==

* Marking node 1a205dd8b2819a00dd1e8b6be40a8e2abe77b756 as failing (quorum reached).

==> redis-6379.log <==

* Marking node 1a205dd8b2819a00dd1e8b6be40a8e2abe77b756 as failing (quorum reached).
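The "more than half" rule behind these log lines can be sketched as a simple majority check. This is an illustration only: `is_objectively_offline` is a hypothetical name, and the sketch assumes the drill cluster has three master nodes (6379, 6380 and 6385), which the text implies but does not state outright.

```python
def is_objectively_offline(pfail_reporters, master_count):
    """True when more than half of the masters have reported pfail."""
    return len(set(pfail_reporters)) > master_count // 2


# One report out of three masters is not enough.
print(is_objectively_offline({"6379"}, 3))          # False
# Reports from 6379 and 6380 form a majority of three masters.
print(is_objectively_offline({"6379", "6380"}, 3))  # True
```

The failed master cannot report on itself, so in a three-master cluster both surviving masters must agree before the node is marked as failing.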

· After the slave node recognizes that the master node it is replicating has gone objectively offline, it schedules the election time. The log prints an election delay of 964 milliseconds, along with the slave node's current replication offset.

==> redis-6386.log <==

# Start of election delayed for 964 milliseconds (rank #0, offset 1822).
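The 964 millisecond figure is consistent with the delay formula given in the Redis Cluster specification: a fixed 500 ms, plus a random 0 to 500 ms, plus 1000 ms per rank. The sketch below is only an illustration of that formula; `election_delay_ms` and its `jitter_ms` parameter are names invented here so that the random part can be pinned for the example.

```python
import random


def election_delay_ms(rank, jitter_ms=None):
    """Election delay: 500 ms fixed + 0-500 ms random + rank * 1000 ms."""
    if jitter_ms is None:
        jitter_ms = random.randint(0, 500)  # random component, inclusive bounds
    return 500 + jitter_ms + rank * 1000


# Rank 0 (largest offset) with a jitter of 464 ms reproduces the logged value.
print(election_delay_ms(rank=0, jitter_ms=464))  # 964
```

For rank 0 the result always falls between 500 and 1000 ms, which matches the statement that the slave with the largest offset waits at most 1 second.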

· When the delayed election time arrives, the slave node updates the configuration epoch and starts the failover election.

==> redis-6386.log <==

1364:S 22 Aug 23:12:25.064 # Starting a failover election for epoch 17.

· Master nodes 6379 and 6380 vote for slave node 6386. The log is as follows:

==> redis-6380.log <==

# Failover auth granted to 475528b1bcf8e74d227104a6cf1bf70f00c24aae for epoch 17

==> redis-6379.log <==

# Failover auth granted to 475528b1bcf8e74d227104a6cf1bf70f00c24aae for epoch 17

· After the slave node obtains votes from 2 master nodes, which is more than half, it performs the replace-master operation to complete the failover:

==> redis-6386.log <==

# Failover election won: I'm the new master.

# configEpoch set to 17 after successful failover

After the failover succeeds, we recover the failed node 6385 and check whether the node states are correct:

1） Restart the failed node 6385.

2） After starting, node 6385 finds that its slots have been assigned to another node, so it defers to the existing cluster configuration and becomes a slave node of the new master node 6386. The key logs are as follows:

# I have keys for slot 4096, but the slot is assigned to another node. Setting it to importing state.

# Configuration change detected. Reconfiguring myself as a replica of 475528b1bcf8e74d227104a6cf1bf70f00c24aae

3） Other nodes in the cluster receive ping messages from 6385 and clear its objective offline state:

==> redis-6379.log <==

* Clear FAIL state for node 1a205dd8b2819a00dd1e8b6be40a8e2abe77b756: master without slots is reachable again.

==> redis-6380.log <==

* Clear FAIL state for node 1a205dd8b2819a00dd1e8b6be40a8e2abe77b756: master without slots is reachable again.

……

4） Node 6385 becomes a slave node and starts the replication process against master node 6386:

==> redis-6385.log <==

* MASTER <-> SLAVE sync: Flushing old data

* MASTER <-> SLAVE sync: Loading DB in memory

* MASTER <-> SLAVE sync: Finished with success

5） The final cluster state is shown in the figure.

Copyright notice
This article was written by [User 1289394]. Please include the original link when reposting.
https://yzsam.com/2022/02/202202151657425404.html
