Project practice: Redis cluster technology learning (13)
2022-07-02 10:05:00 【User 1289394】
5. Replacing the master
Once the replica has collected enough votes, it triggers the replace-master operation:
1) The current replica cancels replication from its master and becomes a master.
2) It executes clusterDelSlot to revoke the slots the failed master was responsible for, then executes clusterAddSlot to assign those slots to itself.
3) It broadcasts its own pong message to the cluster, notifying every node that it has changed from a replica to a master and has taken over the failed master's slots.
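The three steps above can be pictured with a toy slot table. The Python sketch below is a hypothetical in-memory model, not Redis source code: `replace_master`, the `slot_owner` dict, and the `masters` set are names invented for illustration. The failed master is deliberately left in `masters` as a slot-less master, which matches the "master without slots" wording that appears later in the drill logs.

```python
def replace_master(slot_owner, failed_master, replica, masters):
    """Hypothetical model of the replace-master step.

    slot_owner: dict mapping slot number -> node name (illustrative only)
    masters: set of node names currently acting as masters
    Returns the slots the promoted replica took over.
    """
    # Step 1: the replica stops replicating and becomes a master.
    masters.add(replica)
    # Collect the failed master's slots before mutating the dict.
    taken = [s for s, owner in slot_owner.items() if owner == failed_master]
    # Step 2: revoke each slot (like clusterDelSlot), then assign it
    # to the promoted replica (like clusterAddSlot).
    for s in taken:
        del slot_owner[s]
        slot_owner[s] = replica
    # Step 3 (broadcasting a pong with the new slot map) is represented
    # here by returning the taken-over slots for the caller to announce.
    return taken


slots = {0: "6385", 1: "6385", 2: "6379"}
masters = {"6379", "6380", "6385"}
print(replace_master(slots, "6385", "6386", masters))  # [0, 1]
print(sorted(slots.items()))  # [(0, '6386'), (1, '6386'), (2, '6379')]
```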
10.6.3 Failover time
Having covered how faults are detected and recovered, we can now estimate the failover time:
1) Subjective offline (pfail) detection time = cluster-node-timeout.
2) Propagation time of the subjective-offline status message <= cluster-node-timeout/2. The messaging mechanism sends ping messages to nodes it has not communicated with for more than cluster-node-timeout/2, and when choosing which nodes to include in the message body it prefers nodes in offline status, so pfail reports from more than half of the masters can usually be collected within this window, completing fault detection.
3) Replica failover time <= 1000 milliseconds. Because of the delayed election mechanism, the replica with the largest replication offset waits at most 1 second before starting the election. The first election usually succeeds, so the replica completes the failover within 1 second.
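Item 3 rests on the delayed election. The sketch below models it under an assumption: we take the delay to be a fixed 500 ms, plus a random 0-499 ms, plus 1000 ms per rank, as we understand Redis's cluster code of this era, where rank is the number of sibling replicas with a strictly larger replication offset. The constants may differ between versions, and the function names are hypothetical. Under this assumption a rank-0 replica always starts within 1 second, which is consistent with the 964 ms delay seen in the drill below.

```python
import random


def replica_rank(my_offset, other_offsets):
    """Rank = number of sibling replicas with a strictly larger offset.

    Rank 0 means this replica has the most up-to-date data (assumed rule).
    """
    return sum(1 for off in other_offsets if off > my_offset)


def election_delay_ms(rank, rng=random):
    """Assumed election delay: 500 ms fixed + 0..499 ms random + 1000 ms per rank."""
    if rank < 0:
        raise ValueError("rank must be >= 0")
    return 500 + rng.randrange(500) + rank * 1000


rng = random.Random(42)
rank = replica_rank(1822, [1500])  # this replica is ahead of its sibling
delay = election_delay_ms(rank, rng)
print(rank, 500 <= delay < 1000)  # 0 True
```

Ranking by offset means the replica with the least data loss normally wins the race, while the random part reduces the chance that two replicas of the same rank start elections at the same moment.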
Based on the analysis above, the failover time can be estimated as follows:
failover-time (ms) ≤ cluster-node-timeout + cluster-node-timeout/2 + 1000
Therefore, failover time is closely tied to the cluster-node-timeout parameter, which defaults to 15 seconds. It can be tuned to match how much downtime the business can tolerate, but smaller is not always better; the bandwidth consumption discussion in the next section explains why.
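Plugging numbers into the bound makes the trade-off concrete. The Python sketch below only evaluates the three terms from the text (pfail detection, report propagation, replica election). The function name is hypothetical, and the 1000 ms term is the text's upper bound for the election delay, not a measured value.

```python
def failover_time_upper_bound_ms(cluster_node_timeout_ms=15000):
    """Upper bound from the text, in milliseconds.

    detection   = cluster-node-timeout       (pfail identification)
    propagation = cluster-node-timeout / 2   (pfail report gossip)
    election    = 1000                       (delayed replica election)
    """
    detection = cluster_node_timeout_ms
    propagation = cluster_node_timeout_ms / 2
    election = 1000
    return detection + propagation + election


for timeout in (5000, 15000, 30000):
    print(timeout, failover_time_upper_bound_ms(timeout))
# 5000 8500.0
# 15000 23500.0
# 30000 46000.0
```

With the default 15-second timeout the bound is about 23.5 seconds, and the timeout term dominates: halving cluster-node-timeout roughly halves the worst case.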
10.6.4 Failover drill
This covers the main details of failover. Next, using the cluster built earlier, we simulate a master failure and analyze the failover behavior. Use kill -9 to forcibly terminate the process of master node 6385, as shown in the figure.
Confirm the cluster status:
127.0.0.1:6379> cluster nodes
1a205dd8b2819a00dd1e8b6be40a8e2abe77b756 127.0.0.1:6385 master - 0 1471877563600 16 connected 0-1365 5462-6826 10923-12287 15018-16383
40622f9e7adc8ebd77fca0de9edfe691cb8a74fb 127.0.0.1:6382 slave cfb28ef1deee4e0fa78da
……
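Each line of the cluster nodes output above packs one node into space-separated fields: node ID, address, flags, master ID (or "-" for a master), ping-sent, pong-received, config epoch, link state, then slot ranges. The Python sketch below parses such a line; the field order follows our reading of the CLUSTER NODES documentation, and newer Redis versions append "@cport" (and optionally a hostname) to the address, which the sketch strips. Bracketed migrating/importing slot markers are skipped for simplicity. The function name is hypothetical.

```python
def parse_cluster_nodes_line(line):
    """Parse one CLUSTER NODES line into a dict (illustrative sketch)."""
    fields = line.split()
    node_id, addr, flags, master, _ping_sent, _pong_recv, epoch, link = fields[:8]
    slots = []
    for item in fields[8:]:
        if item.startswith("["):  # migrating/importing marker: ignored here
            continue
        if "-" in item:
            start, end = item.split("-")
            slots.append((int(start), int(end)))
        else:
            slots.append((int(item), int(item)))
    return {
        "id": node_id,
        "addr": addr.split("@")[0],  # drop "@cport[,hostname]" if present
        "flags": flags.split(","),
        "master": None if master == "-" else master,
        "config_epoch": int(epoch),
        "link_state": link,
        "slots": slots,
        "slot_count": sum(end - start + 1 for start, end in slots),
    }


line = ("1a205dd8b2819a00dd1e8b6be40a8e2abe77b756 127.0.0.1:6385 master - 0 "
        "1471877563600 16 connected 0-1365 5462-6826 10923-12287 15018-16383")
node = parse_cluster_nodes_line(line)
print(node["addr"], node["config_epoch"], node["slot_count"])  # 127.0.0.1:6385 16 5462
```

For node 6385 this gives 5462 slots, roughly one third of the 16384 slots, and config epoch 16; the failover election later in the drill runs in epoch 17.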
Kill the 6385 process:
# ps -ef | grep redis-server | grep 6385
501 1362 1 0 10:50 0:11.65 redis-server *:6385 [cluster]
# kill -9 1362
The logs are analyzed below:
· Replication between replica 6386 and master 6385 is interrupted; the log is as follows:
==> redis-6386.log <==
# Connection with master lost.
* Caching the disconnected master state.
* Connecting to MASTER 127.0.0.1:6385
* MASTER <-> SLAVE sync started
# Error condition on socket for SYNC: Connection refused
· Masters 6379 and 6380 both mark 6385 as subjectively offline (pfail). Since that is more than half of the masters, 6385 is marked as objectively offline, and the following logs are printed:
==> redis-6380.log <==
* Marking node 1a205dd8b2819a00dd1e8b6be40a8e2abe77b756 as failing (quorum reached).
==> redis-6379.log <==
* Marking node 1a205dd8b2819a00dd1e8b6be40a8e2abe77b756 as failing (quorum reached).
· Once the replica sees that the master it replicates has become objectively offline, it schedules the election after a delay; the log shows the election delayed by 964 milliseconds and prints the replica's current replication offset.
==> redis-6386.log <==
# Start of election delayed for 964 milliseconds (rank #0, offset 1822).
· When the election delay expires, the replica updates its configuration epoch and starts the failover election.
==> redis-6386.log <==
1364:S 22 Aug 23:12:25.064 # Starting a failover election for epoch 17.
· Masters 6379 and 6380 vote for replica 6386; the logs are as follows:
==> redis-6380.log <==
# Failover auth granted to 475528b1bcf8e74d227104a6cf1bf70f00c24aae for epoch 17
==> redis-6379.log <==
# Failover auth granted to 475528b1bcf8e74d227104a6cf1bf70f00c24aae for epoch 17
· After receiving votes from 2 masters, which is more than half, the replica performs the replace-master operation and completes the failover:
==> redis-6386.log <==
# Failover election won: I'm the new master.
# configEpoch set to 17 after successful failover
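Both majority decisions in this drill (marking 6385 as failing, and electing 6386) need more than half of the slot-serving masters. The Python sketch below models that rule as size // 2 + 1, plus a simplified "one vote per epoch" rule for each voting master. Both rules are our reading of Redis's behavior, simplified (real masters run further checks before granting a vote), and the names `needed_quorum`, `VotingMaster`, and so on are hypothetical. We also assume three slot-serving masters (6379, 6380, 6385), as the "2 votes is more than half" wording implies.

```python
def needed_quorum(masters_with_slots):
    """Majority of masters that serve at least one slot (assumed rule)."""
    return masters_with_slots // 2 + 1


def is_objectively_failed(pfail_reporters, masters_with_slots):
    """pfail -> fail once enough distinct masters report the node."""
    return len(set(pfail_reporters)) >= needed_quorum(masters_with_slots)


def election_won(voters, masters_with_slots):
    """A replica wins once a majority of masters grant it a vote."""
    return len(set(voters)) >= needed_quorum(masters_with_slots)


class VotingMaster:
    """Simplified voter: grants at most one vote per epoch."""

    def __init__(self, name):
        self.name = name
        self.last_vote_epoch = 0

    def grant_vote(self, epoch):
        if epoch <= self.last_vote_epoch:
            return False  # already voted in this (or a newer) epoch
        self.last_vote_epoch = epoch
        return True


m6379, m6380 = VotingMaster("6379"), VotingMaster("6380")
print(is_objectively_failed({"6379", "6380"}, 3))  # True
votes = [m.name for m in (m6379, m6380) if m.grant_vote(17)]
print(election_won(votes, 3))  # True
print(m6379.grant_vote(17))  # False: no second vote in epoch 17
```

The one-vote-per-epoch rule is what keeps two replicas from both winning: in a given epoch, at most one candidate can collect a majority.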
After the failover succeeds, we bring the failed node 6385 back and check whether the node states are correct:
1) Restart the failed node 6385.
2) After 6385 starts, it finds that its slots have been assigned to another node, so it defers to the existing cluster configuration and becomes a replica of the new master 6386. The key logs are as follows:
# I have keys for slot 4096, but the slot is assigned to another node. Setting it to importing state.
# Configuration change detected. Reconfiguring myself as a replica of 475528b1bcf8e74d227104a6cf1bf70f00c24aae
3) The other nodes in the cluster receive ping messages from 6385 and clear its objective offline status:
==> redis-6379.log <==
* Clear FAIL state for node 1a205dd8b2819a00dd1e8b6be40a8e2abe77b756: master without slots is reachable again.
==> redis-6380.log <==
* Clear FAIL state for node 1a205dd8b2819a00dd1e8b6be40a8e2abe77b756: master without slots is reachable again.
……
4) Node 6385 becomes a replica and starts replicating from master 6386:
==> redis-6385.log <==
* MASTER <-> SLAVE sync: Flushing old data
* MASTER <-> SLAVE sync: Loading DB in memory
* MASTER <-> SLAVE sync: Finished with success
5) The final cluster status is shown in the figure .
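Step 2 of the recovery is essentially a config-epoch comparison. 6385 restarts believing it owns its old slots at config epoch 16, then hears 6386 claim them at epoch 17. The Python sketch below models that decision under a simplified assumption: a slot claim with a higher config epoch wins, and a master left with no slots becomes a replica of the claimant. It only covers slots the restarted node believes it owns, and the function name and arguments are hypothetical, not Redis internals.

```python
def apply_slot_claim(me, my_epoch, slot_owner, sender, sender_epoch, claimed_slots):
    """Hypothetical model of a restarted master reacting to a slot claim.

    Returns the node `me` should now replicate, or None if nothing changes.
    """
    lost = []
    for s in claimed_slots:
        # A claim backed by a higher config epoch overrides our stale view.
        if slot_owner.get(s) == me and sender_epoch > my_epoch:
            slot_owner[s] = sender
            lost.append(s)
    still_owns_slots = any(owner == me for owner in slot_owner.values())
    if lost and not still_owns_slots:
        return sender  # lost every slot: reconfigure as a replica of the sender
    return None


# 6385 (epoch 16) learns that 6386 now serves its slots at epoch 17.
view = {0: "6385", 1: "6385", 2: "6379"}
print(apply_slot_claim("6385", 16, view, "6386", 17, [0, 1]))  # 6386
```

A claim with a lower epoch than the node's own would be ignored. This ordering by config epoch is why the freshly restarted 6385 yields to the cluster's newer configuration instead of reclaiming its old slots.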