Project practice, redis cluster technology learning (13)
2022-07-02 10:05:00 【User 1289394】
5. Replace master
Once the slave node has collected enough votes, it triggers the replace-master operation:
1) The slave node stops replicating from its master and becomes a master node.
2) It runs clusterDelSlot to revoke the slots that the failed master was responsible for, then runs clusterAddSlot to assign those slots to itself.
3) It broadcasts its own pong message to the cluster, notifying all nodes that it has been promoted from slave to master and has taken over the failed master's slot information.
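The slot handover in step 2 can be pictured with a toy model. This is only a sketch under stated assumptions: the `slot_owner` dictionary and the `promote_replica` function are hypothetical names invented for illustration and are not Redis's internal data structures.

```python
# Hypothetical in-memory model of slot ownership (not Redis internals).
def promote_replica(slot_owner, failed_master, replica):
    """Move every slot owned by failed_master to replica; return the moved slots."""
    moved = [slot for slot, owner in slot_owner.items() if owner == failed_master]
    for slot in moved:
        # Corresponds to clusterDelSlot followed by clusterAddSlot in step 2.
        slot_owner[slot] = replica
    return moved


# Tiny example with four slots instead of the real 16384.
slot_owner = {0: "6385", 1: "6385", 2: "6379", 3: "6380"}
moved = promote_replica(slot_owner, "6385", "6386")
print(moved)
print(slot_owner)
```

Slots owned by other masters are left untouched, which is why only the promoted node needs to announce a change in step 3.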
10.6.3 Failover time
Having covered the fault discovery and recovery process, we can now estimate the failover time:
1) Subjective offline (pfail) detection time = cluster-node-timeout.
2) Propagation time of the subjective-offline state <= cluster-node-timeout/2. The messaging mechanism sends a ping to any node it has not communicated with for more than cluster-node-timeout/2, and when choosing which nodes to include in the message body it gives priority to nodes in the offline state. Within this period, pfail reports from more than half of the master nodes can therefore usually be collected, which completes fault discovery.
3) Slave promotion time <= 1000 milliseconds. Because the election is started with a delay, the slave node with the largest offset waits at most 1 second before starting the election. The first election usually succeeds, so the slave's promotion time is within 1 second.
Based on the above analysis, the failover time can be estimated as follows:
failover-time (milliseconds) ≤ cluster-node-timeout + cluster-node-timeout/2 + 1000
The failover time is therefore closely tied to the cluster-node-timeout parameter, which defaults to 15 seconds. It can be tuned during configuration according to how much downtime the business can tolerate, but smaller is not always better, as the next section on bandwidth consumption explains.
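As a quick worked example of the estimate above, the upper bound can be computed directly. The function name `max_failover_time_ms` is a hypothetical helper introduced here for illustration.

```python
def max_failover_time_ms(cluster_node_timeout_ms, election_delay_ms=1000):
    """Upper bound on failover time, following the three terms of the estimate."""
    pfail_detection = cluster_node_timeout_ms        # 1) subjective offline detection
    propagation = cluster_node_timeout_ms // 2       # 2) pfail state propagation
    return pfail_detection + propagation + election_delay_ms  # 3) election delay


# With the default cluster-node-timeout of 15 seconds:
print(max_failover_time_ms(15000))  # 23500
```

With the default setting, the bound works out to 23.5 seconds.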
10.6.4 Failover drill
So far the main details of failover have been covered. Next, we use the cluster built earlier to simulate a master node failure and analyze the failover behavior. Use kill -9 to forcibly shut down the 6385 process of the master node, as shown in the figure.
Confirm the cluster status:
127.0.0.1:6379> cluster nodes
1a205dd8b2819a00dd1e8b6be40a8e2abe77b756 127.0.0.1:6385 master - 0 1471877563600 16 connected 0-1365 5462-6826 10923-12287 15018-16383
40622f9e7adc8ebd77fca0de9edfe691cb8a74fb 127.0.0.1:6382 slave cfb28ef1deee4e0fa78da
……
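To read the cluster nodes output programmatically, each line can be split on whitespace. The sketch below assumes the field order shown in the output above (node ID, address, flags, master ID, ping-sent, pong-received, config epoch, link state, then slot ranges); `NodeInfo` and `parse_cluster_node` are hypothetical names, and truncated lines such as the second one above would not parse.

```python
from typing import List, NamedTuple


class NodeInfo(NamedTuple):
    node_id: str
    addr: str
    flags: List[str]
    master_id: str      # "-" when the node is itself a master
    config_epoch: int
    link_state: str
    slots: List[str]    # slot ranges as printed, e.g. "0-1365"


def parse_cluster_node(line):
    """Parse one complete line of `cluster nodes` output into a NodeInfo."""
    parts = line.split()
    return NodeInfo(
        node_id=parts[0],
        addr=parts[1],
        flags=parts[2].split(","),
        master_id=parts[3],
        config_epoch=int(parts[6]),
        link_state=parts[7],
        slots=parts[8:],
    )


line = ("1a205dd8b2819a00dd1e8b6be40a8e2abe77b756 127.0.0.1:6385 master - 0 "
        "1471877563600 16 connected 0-1365 5462-6826 10923-12287 15018-16383")
node = parse_cluster_node(line)
print(node.addr, node.flags, node.config_epoch, node.slots)
```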
Kill the 6385 process:
# ps -ef | grep redis-server | grep 6385
501 1362 1 0 10:50 0:11.65 redis-server *:6385 [cluster]
# kill -9 1362
The log analysis is as follows:
· Replication between slave node 6386 and master node 6385 is interrupted. The log is as follows:
==> redis-6386.log <==
# Connection with master lost.
* Caching the disconnected master state.
* Connecting to MASTER 127.0.0.1:6385
* MASTER <-> SLAVE sync started
# Error condition on socket for SYNC: Connection refused
· Master nodes 6379 and 6380 both mark 6385 as subjectively offline. Since that is more than half of the master nodes, 6385 is marked as objectively offline, and the following logs are printed:
==> redis-6380.log <==
* Marking node 1a205dd8b2819a00dd1e8b6be40a8e2abe77b756 as failing (quorum reached).
==> redis-6379.log <==
* Marking node 1a205dd8b2819a00dd1e8b6be40a8e2abe77b756 as failing (quorum reached).
· After the slave node recognizes that the master it is replicating has gone objectively offline, it schedules an election. The log prints the election delay of 964 milliseconds along with the slave's current replication offset.
==> redis-6386.log <==
# Start of election delayed for 964 milliseconds (rank #0, offset 1822).
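The 964 millisecond delay for rank #0 is consistent with how the delay is commonly described for Redis Cluster: a fixed 500 ms, plus a random 0 to 499 ms, plus 1000 ms per rank. The sketch below rests on that assumption; the constants do not come from this article, and `election_delay_ms` is a hypothetical helper name.

```python
import random


def election_delay_ms(rank, rng=random):
    """Assumed election delay: 500 ms fixed + random 0..499 ms + 1000 ms per rank."""
    return 500 + rng.randrange(500) + rank * 1000


# Rank 0 (largest offset) always lands in [500, 1000) under this assumption,
# which matches the "at most 1 second" estimate for the best-placed slave.
delays_rank0 = [election_delay_ms(0) for _ in range(1000)]
delays_rank1 = [election_delay_ms(1) for _ in range(1000)]
print(min(delays_rank0), max(delays_rank0))
```

Slaves with a smaller offset get a higher rank and therefore wait longer, which gives the most up-to-date slave the first chance to win.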
· When the delayed election time arrives, the slave node updates its configuration epoch and starts the failover election.
==> redis-6386.log <==
1364:S 22 Aug 23:12:25.064 # Starting a failover election for epoch 17.
· Master nodes 6379 and 6380 vote for slave node 6386. The log is as follows:
==> redis-6380.log <==
# Failover auth granted to 475528b1bcf8e74d227104a6cf1bf70f00c24aae for epoch 17
==> redis-6379.log <==
# Failover auth granted to 475528b1bcf8e74d227104a6cf1bf70f00c24aae for epoch 17
· Having received votes from 2 master nodes, which is more than half, the slave node performs the replace-master operation and completes the failover:
==> redis-6386.log <==
# Failover election won: I'm the new master.
# configEpoch set to 17 after successful failover
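The "more than half" condition can be checked with simple arithmetic. This sketch assumes the drill cluster has three slot-holding masters (6379, 6380 and the failed 6385), as the logs suggest; `votes_needed` and `election_won` are hypothetical helper names.

```python
def votes_needed(master_count):
    """Smallest number of votes that is strictly more than half of the masters."""
    return master_count // 2 + 1


def election_won(votes, master_count):
    """True when the collected votes reach the majority threshold."""
    return votes >= votes_needed(master_count)


# Three masters in total; 6379 and 6380 both voted for 6386.
print(votes_needed(3))
print(election_won(2, 3))
```

Note that the failed master still counts toward the total, so a cluster with only two masters could not elect a replacement after losing one of them.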
After the failover succeeds, we bring the failed node 6385 back and check whether its node status is correct:
1) Restart the failed node 6385.
2) After starting, node 6385 finds that its slots have been assigned to another node. It defers to the existing cluster configuration and becomes a slave of the new master node 6386. The key logs are as follows:
# I have keys for slot 4096, but the slot is assigned to another node. Setting it to importing state.
# Configuration change detected. Reconfiguring myself as a replica of 475528b1bcf8e74d227104a6cf1bf70f00c24aae
3) The other nodes in the cluster receive ping messages from 6385 and clear its objective-offline state:
==> redis-6379.log <==
* Clear FAIL state for node 1a205dd8b2819a00dd1e8b6be40a8e2abe77b756: master without slots is reachable again.
==> redis-6380.log <==
* Clear FAIL state for node 1a205dd8b2819a00dd1e8b6be40a8e2abe77b756: master without slots is reachable again.
……
4) Node 6385, now a slave node, starts the replication process against master node 6386:
==> redis-6385.log <==
* MASTER <-> SLAVE sync: Flushing old data
* MASTER <-> SLAVE sync: Loading DB in memory
* MASTER <-> SLAVE sync: Finished with success
5) The final cluster status is shown in the figure .