Redis Core Technology and Practice - Learning Notes (VIII): Sentinel Cluster, When a Sentinel Goes Down
2022-07-03 17:51:00 【Tom Kong】
Once multiple instances form a sentinel cluster, the other sentinels can keep cooperating to complete the master-slave switch even if one sentinel instance fails and goes down. That work includes judging whether the master is offline, selecting a new master, and notifying the slaves and clients.
When configuring a sentinel, you only need the following configuration item, which sets the IP and port of the master. No connection information for the other sentinels is configured.
sentinel monitor <master-name> <ip> <redis-port> <quorum>
Forming a sentinel cluster based on the pub/sub mechanism
Sentinel instances can discover each other thanks to the pub/sub mechanism provided by Redis, that is, the publish/subscribe mechanism.
Once a sentinel has established a connection to the master, it can publish messages on the master to announce its own connection information (IP and port). It can also subscribe to messages on the master to obtain the connection information published by other sentinels. When multiple sentinel instances publish and subscribe on the same master, they learn each other's IP addresses and ports.
Redis organizes messages by category in the form of channels, similar to topics in a message queue.
Only applications that subscribe to the same channel can exchange information through published messages.
The "__sentinel__:hello" channel: different sentinels use it to discover and communicate with each other.
Example: sentinel 1 publishes its IP (172.16.19.3) and port (26579) to the "__sentinel__:hello" channel, and sentinels 2 and 3 subscribe to that channel. Sentinels 2 and 3 can then obtain sentinel 1's IP address and port number directly from the channel.
Sentinels 2 and 3 can then establish network connections with sentinel 1. Similarly, sentinels 2 and 3 can establish a network connection with each other. In this way the sentinel cluster is formed. The sentinels communicate over these connections to judge and negotiate whether the master is offline.
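The discovery step above can be sketched with a small in-memory simulation. This is only an illustration, not a Redis client: `FakeMaster` and `Sentinel` are hypothetical classes, the addresses of sentinels 2 and 3 are made up, and the message payload is reduced to `ip,port` (the real hello message carries additional fields).

```python
from collections import defaultdict

HELLO_CHANNEL = "__sentinel__:hello"


class FakeMaster:
    """In-memory stand-in for the master's pub/sub (hypothetical, not a Redis client)."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # channel -> list of callbacks

    def subscribe(self, channel, callback):
        self.subscribers[channel].append(callback)

    def publish(self, channel, message):
        # Deliver the message to every subscriber of the channel.
        for callback in list(self.subscribers[channel]):
            callback(message)


class Sentinel:
    def __init__(self, name, ip, port, master):
        self.name, self.ip, self.port = name, ip, port
        self.master = master
        self.peers = set()  # (ip, port) of other sentinels discovered so far
        master.subscribe(HELLO_CHANNEL, self.on_hello)

    def announce(self):
        # Simplified payload (assumption): only "ip,port".
        self.master.publish(HELLO_CHANNEL, f"{self.ip},{self.port}")

    def on_hello(self, message):
        ip, port = message.split(",")
        addr = (ip, int(port))
        if addr != (self.ip, self.port):  # ignore our own announcement
            self.peers.add(addr)


master = FakeMaster()
s1 = Sentinel("s1", "172.16.19.3", 26579, master)
s2 = Sentinel("s2", "172.16.19.4", 26579, master)  # hypothetical address
s3 = Sentinel("s3", "172.16.19.5", 26579, master)  # hypothetical address
for s in (s1, s2, s3):
    s.announce()

print(sorted(s2.peers))
# prints [('172.16.19.3', 26579), ('172.16.19.5', 26579)]
```

Each sentinel only needs the master's address to find every other sentinel, which is why the configuration item does not list the other sentinels.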
Besides connecting to each other to form a cluster, the sentinels also need to establish connections with the slaves. A sentinel's monitoring task requires it to check the heartbeats of the master and the slaves, select a new master, and send notifications.
How does a sentinel learn the IP addresses and ports of the slaves?
It does so by sending the INFO command to the master. For example, sentinel 2 sends INFO to the master; after accepting the command, the master returns the list of slaves to the sentinel. Using the connection information in that list, the sentinel establishes a connection to each slave and continuously monitors the slave over that connection. Sentinels 1 and 3 establish connections with the slaves in the same way.
Through the pub/sub mechanism, sentinels can form a cluster; through the INFO command, they obtain the slaves' connection information, connect to the slaves, and monitor them.
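As a rough illustration of the INFO step, the sketch below extracts slave addresses from the replication section of an INFO reply. The `slaveN:ip=...,port=...` line layout matches what recent Redis versions return; the function name `parse_slaves` and the sample addresses are assumptions made for this example.

```python
def parse_slaves(info_text):
    """Extract (ip, port) pairs from the slaveN lines of INFO replication output."""
    slaves = []
    for line in info_text.splitlines():
        key, sep, value = line.strip().partition(":")
        # Keep only keys of the form slave0, slave1, ...
        if not sep or not key.startswith("slave") or not key[5:].isdigit():
            continue
        fields = dict(item.split("=", 1) for item in value.split(","))
        slaves.append((fields["ip"], int(fields["port"])))
    return slaves


# Sample reply with made-up addresses.
sample = """# Replication
role:master
connected_slaves:2
slave0:ip=172.16.19.10,port=6379,state=online,offset=1024,lag=0
slave1:ip=172.16.19.11,port=6379,state=online,offset=1024,lag=1
"""
print(parse_slaves(sample))
# prints [('172.16.19.10', 6379), ('172.16.19.11', 6379)]
```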
Client event notification based on the pub/sub mechanism
A sentinel is a Redis instance running in a special mode. It does not serve read and write requests; it only performs the monitoring, master selection, and notification tasks. So every sentinel instance also provides the pub/sub mechanism, and clients can subscribe to messages from a sentinel. The sentinel offers many subscription channels, and different channels carry different key events in the master-slave switch process.
Event | Related channel |
Master offline events | +sdown (the instance enters the subjectively offline state) |
Master offline events | -sdown (the instance exits the subjectively offline state) |
Master offline events | +odown (the instance enters the objectively offline state) |
Master offline events | -odown (the instance exits the objectively offline state) |
Slave reconfiguration events | +slave-reconf-sent (the sentinel sent the slaveof command to reconfigure the slave) |
Slave reconfiguration events | +slave-reconf-inprog (the slave has been configured with the new master but has not yet synchronized) |
Slave reconfiguration events | +slave-reconf-done (the slave has been configured with the new master and has finished synchronizing with it) |
New master switch | +switch-master (the master address has changed) |
The client reads the sentinel's configuration file to get the sentinel's address and port, and establishes a network connection to the sentinel. The client can then run subscription commands to receive different event messages.
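For instance, a client that has subscribed to the +switch-master channel (with SUBSCRIBE, or with PSUBSCRIBE and a pattern) receives a payload of the form `<master name> <oldip> <oldport> <newip> <newport>`. The sketch below only parses such a payload; `parse_switch_master`, the master name `mymaster`, and the addresses are illustrative assumptions, and the network subscription itself is left out.

```python
from typing import NamedTuple


class SwitchMaster(NamedTuple):
    master_name: str
    old_ip: str
    old_port: int
    new_ip: str
    new_port: int


def parse_switch_master(payload):
    """Split a +switch-master payload into its five space-separated fields."""
    name, old_ip, old_port, new_ip, new_port = payload.split()
    return SwitchMaster(name, old_ip, int(old_port), new_ip, int(new_port))


# Hypothetical payload as a client might receive it after a failover.
event = parse_switch_master("mymaster 172.16.19.1 6379 172.16.19.10 6379")
print(event.new_ip, event.new_port)
# prints 172.16.19.10 6379
```

With the new address in hand, the client can reconnect to the new master without reading any configuration again.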
Which sentinel performs the master-slave switch?
Deciding which sentinel performs the master-slave switch is similar to judging the master "objectively offline": it is also a voting and arbitration process.
Judging "objectively offline": most sentinels consider the master to be "subjectively offline".
As soon as any sentinel judges the master "subjectively offline", it sends the is-master-down-by-addr command to the other sentinel instances. Each of them then responds Y or N according to the state of its own connection to the master: Y is a vote in favor, N is a vote against.
Once a sentinel has obtained the number of Y votes required for arbitration, it can mark the master as "objectively offline". The required number of Y votes is set by the quorum configuration item.
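The arbitration rule can be written down as a short check. This is a minimal sketch under the assumption that the asking sentinel counts its own judgment as one Y vote (consistent with the quorum=2 example in the after-class question); `is_objectively_down` is a hypothetical helper, not part of Redis.

```python
def is_objectively_down(own_judgment, peer_replies, quorum):
    """Return True if enough sentinels agree that the master is down.

    own_judgment: True if this sentinel already judged the master subjectively offline.
    peer_replies: "Y"/"N" replies from the other sentinels to is-master-down-by-addr.
    quorum: the value from the `sentinel monitor` configuration item.
    """
    if not own_judgment:
        return False  # a sentinel only asks others after its own judgment
    yes_votes = 1 + sum(1 for reply in peer_replies if reply == "Y")
    return yes_votes >= quorum


print(is_objectively_down(True, ["Y", "N"], quorum=2))  # prints True
print(is_objectively_down(True, ["N", "N"], quorum=2))  # prints False
```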
This sentinel then sends a command to the other sentinels, indicating that it wants to perform the master-slave switch itself, and asks all the other sentinels to vote. This voting process is called "Leader election", because the sentinel that finally performs the master-slave switch is called the Leader and the vote determines who that is.
Leader election: a sentinel that wants to become Leader must meet two conditions:
- It obtains more than half of the votes
- The number of votes it obtains is also greater than or equal to the quorum value
Take 3 sentinels as an example and suppose quorum is set to 2. Then any sentinel that wants to become Leader only needs to obtain 2 Y votes.
Time | Sentinel 1 (s1) | Sentinel 2 (s2) | Sentinel 3 (s3) |
T1 | Votes Y for itself; sends s2 and s3 a request to vote for it as Leader | | |
T2 | | | Votes Y for itself; sends s1 and s2 a request to vote for it as Leader |
T3 | Receives s3's request, replies N | Receives s3's request, replies Y | |
T4 | | Receives s1's request, replies N | |
T5 | 1 Y vote, 1 N vote | | 2 Y votes, becomes Leader |
- At T1, s1 judges the master "objectively offline" and wants to become Leader. It votes Y for itself and sends s2 and s3 a request to vote for it as Leader.
- At T2, s3 judges the master "objectively offline" and wants to become Leader. It votes Y for itself and sends s1 and s2 a request to vote for it as Leader.
- At T3, s1 receives s3's Leader vote request. Because s1 has already voted Y for itself, it cannot vote for another sentinel, so s1 replies N. Meanwhile, s2 receives the Leader vote request that s3 sent at T2. Because s2 has not voted before, it replies Y to the first sentinel that sends it a vote request and N to any sentinel whose request arrives later. So at T3, s2 replies Y to s3, agreeing that s3 become Leader.
- At T4, s2 finally receives the vote request that s1 sent at T1. Because s2 already agreed to s3's request at T3, s2 replies N to s1, disagreeing that s1 become Leader.
- This happens because s3's request arrived first; the network between s1 and s2 was probably congested, which delayed s1's vote request.
- At T5, s1 has 1 Y vote and 1 N vote. s3 has 2 Y votes, which is more than half and also reaches the preset quorum value (quorum is 2), so s3 becomes Leader.
- If this round of voting does not produce a Leader, the sentinel cluster waits for a while (twice the sentinel failover timeout) and then holds a new election.
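The timeline above can be replayed with a small simulation. It is a sketch under simplifying assumptions: every candidate votes Y for itself before any request reaches it, each sentinel grants Y only to the first request it receives, and `run_election` and the `arrivals` list are hypothetical names for this example rather than anything Redis exposes.

```python
def run_election(sentinels, quorum, arrivals):
    """Simulate one election round.

    sentinels: names of all sentinels in the cluster.
    arrivals: (candidate, voter) pairs in the order the vote requests arrive.
    Returns the Leader's name, or None if nobody wins this round.
    """
    voted_for = {}  # voter -> the candidate it answered Y to
    # Assumption: each candidate votes Y for itself before receiving any request.
    for candidate in dict.fromkeys(c for c, _ in arrivals):
        voted_for[candidate] = candidate
    # The first request a sentinel receives gets Y; later requests get N.
    for candidate, voter in arrivals:
        voted_for.setdefault(voter, candidate)

    tally = {}
    for candidate in voted_for.values():
        tally[candidate] = tally.get(candidate, 0) + 1

    majority = len(sentinels) // 2 + 1  # "more than half"
    for candidate, votes in tally.items():
        if votes >= majority and votes >= quorum:
            return candidate
    return None


# T3: s3's requests reach s1 and s2; T4: s1's delayed request reaches s2.
arrivals = [("s3", "s1"), ("s3", "s2"), ("s1", "s2"), ("s1", "s3")]
print(run_election(["s1", "s2", "s3"], quorum=2, arrivals=arrivals))
# prints s3
```

If s2 were unreachable, each candidate would hold only its own vote, `run_election` would return None, and a new round would be needed.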
Whether voting in a sentinel cluster succeeds depends largely on election messages being delivered normally over the network. If the network is under heavy load or briefly congested, it may happen that no sentinel obtains more than half of the votes. Once the congestion eases, the probability of a successful election rises. If the sentinel cluster has only 2 instances, a sentinel that wants to become Leader must obtain 2 votes, not 1. So if 1 of the sentinels goes down, the master-slave switch cannot be completed.
To achieve master-slave switching, we introduced the sentinel; to avoid being unable to switch after a single sentinel fails, and to reduce misjudgment, we introduced the sentinel cluster; and the sentinel cluster in turn needs some mechanisms to support its normal operation.
Make sure the configuration of all sentinel instances is consistent, especially the subjective-offline threshold down-after-milliseconds. We once fell into this "pit". In our project, this value was configured inconsistently on different sentinel instances, so the sentinel cluster could not reach a consensus on the failed master and did not switch the master in time, which ended up making the cluster service unstable. So do not overlook this seemingly simple piece of experience.
After-class question
Suppose a Redis deployment has "one master and four slaves" together with a cluster of 5 sentinel instances, and quorum is 2. If 3 sentinel instances fail during operation and the Redis master then fails, can the master still be correctly judged "objectively offline"? Can the master-slave switch still be performed? Is it true that the more sentinel instances, the better? If down-after-milliseconds is increased, does that reduce misjudgment?
- The master can still be judged objectively offline. Because quorum=2, when one sentinel judges the master "subjectively offline" it asks the other sentinel; when both sentinels judge "subjectively offline", the quorum value is met and the master is marked "objectively offline".
- The master-slave switch cannot be completed. Electing a Leader requires more than half of the votes (at least 3 of 5), and only 2 sentinels are still running.
- Sentinels need to communicate and exchange information with other nodes both when judging "objectively offline" and when electing the sentinel Leader. The more sentinel instances there are, the more rounds of communication are needed. And because the sentinels are deployed on different machines, more nodes also mean a greater risk of machine failure. Both factors affect the sentinels' communication and elections, and when something goes wrong, the election takes longer and so does the master-slave switch.
- Increasing down-after-milliseconds moderately reduces the probability of misjudgment when the network between a sentinel and the master fluctuates briefly. But a larger down-after-milliseconds also means the master-slave switch takes longer to start, and the business is affected for longer. The value has to be weighed against the actual scenario to set a reasonable threshold.
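The arithmetic behind the first two answers can be checked with a few lines. This is a best-case sketch that assumes every surviving sentinel votes Y and that the majority is counted over all configured sentinels, including failed ones; `failover_capability` is a hypothetical helper.

```python
def failover_capability(total, alive, quorum):
    """Return (can_mark_objectively_down, can_elect_leader) in the best case."""
    majority = total // 2 + 1               # more than half of all configured sentinels
    can_odown = alive >= quorum             # enough Y votes for "objectively offline"
    can_elect = alive >= majority and alive >= quorum
    return can_odown, can_elect


# 5 sentinels, 3 failed, quorum 2: objectively offline yes, switch no.
print(failover_capability(total=5, alive=2, quorum=2))  # prints (True, False)
# All 5 sentinels running: both steps are possible.
print(failover_capability(total=5, alive=5, quorum=2))  # prints (True, True)
```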