
Meituan second-round interview: why does Redis have sentinels?

2022-07-06 09:26:00 Java domain



Enough talk, let's get started!

[Figure: outline]

Why do we need the sentinel mechanism?
In Redis's master-slave architecture, reads and writes are separated, so if the master node goes down, there is no master node left to serve clients' write requests, and no master node left to synchronize data to the slave nodes.

[Figure: the master node goes down]

To restore service at this point, someone has to step in manually: pick a slave node and promote it to master node, point the other slave nodes at the new master node, and notify the upstream clients that connect to the Redis master node so that they update the configured master IP address to the new master node's IP address.

That is not very "smart". If some node could monitor the state of the master node and, on finding that the master node is down, automatically promote a slave node to master node, it would save us a lot of work!

Since version 2.8, Redis has provided the Sentinel mechanism, whose job is to fail over between master and slave nodes. It monitors whether the master node is alive. If it finds that the master node is down, it elects a slave node to become the new master node and notifies the slave nodes and the clients about the new master node.

How does the sentinel mechanism work?
A sentinel is really just a Redis process running in a special mode, so it is also a node. As the name "sentinel" suggests, it acts as an observer node, and what it observes is the master and slave nodes.

Of course, it does more than observe. When it sees something abnormal, it takes action to repair the abnormal state.

A sentinel node is mainly responsible for three things: monitoring, master selection, and notification.

[Figure: the duties of a sentinel]

So we should focus on these three questions:

How does a sentinel node monitor the nodes, and how does it judge whether the master node has really failed?

By what rules is a slave node chosen to be promoted to master node?

How are the slave nodes and clients told about the new master node?

How to judge whether the master node has really failed?
Each sentinel periodically sends a PING command to all master and slave nodes. When a node receives the PING command, it sends a reply back to the sentinel, which is how the sentinel judges whether the node is running normally.

[Figure: sentinels monitoring the master and slave nodes]

If a master or slave node does not reply to a sentinel's PING command within the specified time, the sentinel marks it as "subjectively offline". This specified time is set by the down-after-milliseconds configuration item, in milliseconds.
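The timeout check can be sketched in a few lines of Python. This is only an illustration of the rule described above, not Sentinel's actual implementation: `MonitoredNode`, `last_pong_at`, and `is_subjectively_offline` are hypothetical names, and the 30-second timeout is just an example value.

```python
from dataclasses import dataclass


@dataclass
class MonitoredNode:
    """A master or slave node as seen by ONE sentinel (hypothetical model)."""
    name: str
    last_pong_at: float  # time of the last PING reply, in seconds


def is_subjectively_offline(node: MonitoredNode, now: float,
                            down_after_milliseconds: int) -> bool:
    """True if the node has been silent for longer than the configured timeout."""
    silent_ms = (now - node.last_pong_at) * 1000
    return silent_ms > down_after_milliseconds


master = MonitoredNode(name="master", last_pong_at=100.0)

# 20 s of silence with a 30 s timeout: still considered alive.
print(is_subjectively_offline(master, now=120.0, down_after_milliseconds=30_000))  # False
# 31 s of silence: this one sentinel marks the master subjectively offline.
print(is_subjectively_offline(master, now=131.0, down_after_milliseconds=30_000))  # True
```

Note that the result is the opinion of a single sentinel, which is exactly why it is called "subjective".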

Subjectively offline? Is there an objectively offline state too?

Yes, there is, and the objectively offline state applies only to the master node.

The master node has two states, "subjectively offline" and "objectively offline", because the master node may not actually have failed. It may simply be under heavy load, or the network may be congested, so that it fails to reply to a sentinel's PING command within the specified time.

So, to reduce misjudgments, sentinels are not deployed as a single node. Multiple nodes are deployed as a sentinel cluster (at least three machines are needed), and the judgment is made by several sentinel nodes together. This avoids the case where a single sentinel with a poor network of its own wrongly concludes that the master node is offline. The probability that several sentinels all have unstable networks at the same time is also small, so deciding together lowers the misjudgment rate.

So how exactly is the master node determined to be "objectively offline"?

When a sentinel judges the master node to be "subjectively offline", it sends a command to the other sentinels. On receiving this command, each of the other sentinels responds with a vote in favor or a refusal, based on the state of its own network connection to the master node.

When the number of votes in favor collected by this sentinel reaches the value of the quorum configuration item in the sentinel configuration file, the master node is marked as "objectively offline".

For example, with 3 sentinels and quorum set to 2, a sentinel needs 2 votes in favor to mark the master node as "objectively offline". These 2 votes can include the sentinel's own vote as well as votes from the other two sentinels.

PS: quorum is generally set to half the number of sentinels plus 1 (rounding the half down), for example 2 for 3 sentinels.
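As a rough sketch of the vote counting described above (the function names are hypothetical, and real sentinels exchange these votes over the network rather than through a list of booleans):

```python
from typing import List


def recommended_quorum(sentinel_count: int) -> int:
    """Half the number of sentinels (rounded down) plus 1."""
    return sentinel_count // 2 + 1


def is_objectively_offline(own_view_down: bool, peer_replies: List[bool],
                           quorum: int) -> bool:
    """Count votes in favor; the asking sentinel's own judgment counts as one vote."""
    if not own_view_down:
        return False  # only a sentinel that sees the master as down asks the others
    votes_in_favor = 1 + sum(peer_replies)
    return votes_in_favor >= quorum


quorum = recommended_quorum(3)
print(quorum)                                                # 2
# One of the two peers agrees: 2 votes in favor, quorum reached.
print(is_objectively_offline(True, [True, False], quorum))   # True
# Neither peer agrees: only the sentinel's own vote, quorum not reached.
print(is_objectively_offline(True, [False, False], quorum))  # False
```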

Once the sentinels have judged the master node to be objectively offline, a sentinel starts choosing one of the slave nodes to become the new master node.

How is the new master node selected?
With so many slave nodes, which one should become the new master node?

How about choosing at random? That is easy to implement, but if a slave node with a poor network happens to be chosen as the new master node, another master-slave failover may be needed before long.

So the slave nodes with poor network status have to be filtered out first. First filter out the slave nodes that are already offline, then filter out the slave nodes whose network connection has been poor in the past.

How do we judge whether a slave node's past network connection was poor?

The check is based on the down-after-milliseconds configuration item, using the threshold down-after-milliseconds * 10. Here down-after-milliseconds is the maximum timeout for a broken connection between master and slave nodes: if a slave node has had no network contact with the master node within down-after-milliseconds milliseconds, the two are considered disconnected. If a slave node has been disconnected from the master node for longer than 10 times down-after-milliseconds (the Redis documentation adds to this the time the master node has already been seen as down), its network is considered poor and it is not suitable as the new master node.

With the slave nodes that have poor network status filtered out, the remaining slave nodes go through three rounds of comparison: priority, replication progress, and ID number. As soon as one slave node wins a round, it is selected as the new master node.

Round one: the sentinel first sorts the slave nodes by priority. The smaller the priority value, the higher the rank.

Round two: if the priorities are equal, compare replication offsets. The slave node that has received more replicated data from the master node ranks higher.

Round three: if both priority and offset are equal, choose the slave node with the smaller ID number.

Round one: the slave node with the highest priority wins
Redis has a configuration item called slave-priority that sets the priority of a slave node.

The servers hosting the slave nodes are not necessarily configured identically, so we can set slave node priorities according to server performance.

For example, if slave node A has the most physical memory of all the slave nodes, we can give slave node A the highest priority (that is, the smallest slave-priority value). Then, in the first round, slave node A wins because it has the highest priority, and it becomes the new master node.

Round two: the slave node with the most replication progress wins
If the first round finds two slave nodes that share the highest priority, a second round compares the replication progress of the two.

What is replication progress? In the master-slave architecture, the master node propagates write operations to the slave nodes. In the process, the master node uses master_repl_offset to record the position of its latest write operation in repl_backlog_buffer, and each slave node uses slave_repl_offset to record how far it has replicated.

The slave node whose slave_repl_offset is closest to master_repl_offset has replicated the most, so it can be selected as the new master node.

Round three: the slave node with the smaller ID number wins
If the second round finds two slave nodes with the same priority and the same replication progress, a third round compares their ID numbers, and the slave node with the smaller ID number wins.

What is the ID number? Every slave node has a number, its ID number, which uniquely identifies that slave node.

That finally completes the selection. To summarize briefly:

Filter out the slave nodes that are already offline;

Filter out the slave nodes with a history of poor network connections;

Put the remaining slave nodes through three rounds of comparison: priority, replication progress, and ID number. In each round, if one slave node wins, it becomes the new master node.
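The filtering and the three rounds can be expressed as one filter plus one sort key. The sketch below is a simplified model under stated assumptions, not Sentinel's source code: `SlaveNode` and its fields are hypothetical, the network-history check is reduced to a boolean flag, and the special meaning that real Redis gives to a priority of 0 (never promote) is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SlaveNode:
    node_id: str                     # ID number; smaller wins in round three
    priority: int                    # slave-priority value; smaller ranks first
    repl_offset: int                 # slave_repl_offset; larger means more data replicated
    online: bool = True
    poor_link_history: bool = False  # outcome of the network-history check


def select_new_master(slaves: List[SlaveNode]) -> Optional[SlaveNode]:
    """Filter unsuitable slaves, then compare priority, progress, and ID in that order."""
    candidates = [s for s in slaves if s.online and not s.poor_link_history]
    if not candidates:
        return None
    # Tuples compare element by element, which gives the three rounds for free.
    return min(candidates, key=lambda s: (s.priority, -s.repl_offset, s.node_id))


slaves = [
    SlaveNode("c3", priority=100, repl_offset=900),
    SlaveNode("a1", priority=100, repl_offset=1000),
    SlaveNode("b2", priority=100, repl_offset=1000),
    SlaveNode("d4", priority=50, repl_offset=5000, online=False),  # filtered out
]
print(select_new_master(slaves).node_id)  # a1
```

Here `d4` would have won on priority but is offline, `c3` loses on replication progress, and `a1` beats `b2` only in the third round because its ID is smaller.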

Which sentinel carries out the master-slave failover?
As mentioned earlier, to judge more "objectively" that the master node has failed, the decision is generally not based on the detection result of a single sentinel. Several sentinels judge together, which lowers the probability of misjudgment, and that is why sentinels exist as a sentinel cluster.

Once the slave node to be promoted has been chosen, which node in the sentinel cluster performs the master-slave failover?

At this point a Leader has to be elected within the sentinel cluster, and the Leader performs the master-slave switch.

Electing a Leader is a voting process, and before voting starts there has to be a candidate.

Who becomes the candidate?

Whichever sentinel node determines that the master node is "objectively offline" becomes the candidate. A candidate is a sentinel that wants to become the Leader.

For example, suppose there are three sentinels. When sentinel A is the first to judge the master node "subjectively offline", it sends the is-master-down-by-addr command to the other sentinels. The other sentinels then respond with a vote in favor or a refusal, based on their own network connection to the master node.

When the number of votes in favor received by sentinel A reaches the value of the quorum configuration item in the sentinel configuration file, the master node is marked as "objectively offline", and sentinel A becomes a Leader candidate.

How does a candidate get elected Leader?

The candidate sends a command to the other sentinels, stating that it wants to be the Leader that performs the master-slave switch, and asks all the other sentinels to vote on it.

Each sentinel has only one vote, and once it is used the sentinel cannot vote again. A sentinel can vote for itself or for another sentinel, but only a candidate can vote for itself.

So during the vote, any candidate has to meet two conditions to become Leader:

First, it gets more than half of the votes;

Second, the number of votes it gets is greater than or equal to the quorum value.

For example, with 3 sentinel nodes and quorum set to 2, any sentinel that wants to be Leader only needs 2 votes in favor to be elected. If the conditions are not met, a new election is held.

At this point you may ask: if two sentinel nodes happen to judge the master node objectively offline at the same moment, there are two candidates. How is the Leader decided?

Each candidate first votes for itself and then asks the other sentinels for their votes. If a voter receives candidate A's request first, it votes for candidate A. Having used up its vote, it refuses when candidate B's request arrives. Candidate A therefore meets the two conditions above first, and candidate A is elected Leader.
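The "one vote each, first request wins" rule and the two winning conditions can be modeled as follows. This is a toy, single-process sketch: `elect_leader` and the `(voter, candidate)` request list are hypothetical stand-ins for the messages that real sentinels exchange.

```python
from typing import Dict, List, Optional, Tuple


def elect_leader(requests: List[Tuple[str, str]], total_sentinels: int,
                 quorum: int) -> Optional[str]:
    """requests holds (voter, candidate) pairs in the order the voter received them."""
    ballots: Dict[str, str] = {}
    for voter, candidate in requests:
        # A sentinel has a single vote: its first request is granted, later ones refused.
        ballots.setdefault(voter, candidate)

    tally: Dict[str, int] = {}
    for candidate in ballots.values():
        tally[candidate] = tally.get(candidate, 0) + 1

    majority = total_sentinels // 2 + 1  # "more than half" of all sentinels
    for candidate, votes in tally.items():
        if votes >= majority and votes >= quorum:
            return candidate
    return None  # nobody met both conditions, so a new election is needed


# A and B are both candidates and vote for themselves; C hears from A first.
requests = [("A", "A"), ("B", "B"), ("C", "A"), ("C", "B")]
print(elect_leader(requests, total_sentinels=3, quorum=2))  # A

# If C never votes, neither candidate reaches 2 votes.
print(elect_leader([("A", "A"), ("B", "B")], total_sentinels=3, quorum=2))  # None
```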

Why should there be at least 3 sentinel nodes?

If there are only 2 sentinel nodes, a sentinel that wants to become Leader must get 2 votes, not 1.

So if one sentinel in the cluster goes down, only one sentinel is left. If it wants to become Leader it cannot reach 2 votes, so it cannot be elected, and the master-slave switch cannot take place.

That is why we usually configure at least 3 sentinel nodes. Then, if one sentinel in the cluster goes down, two sentinels remain. If one of them wants to become Leader it can still reach 2 votes, so the election can succeed and the master-slave switch does not fail.

Of course, you may ask: what if 2 of the 3 sentinel nodes go down? Then manual intervention is needed, or you can deploy more sentinel nodes.

One more question: Redis is deployed with 1 master and 4 slaves, 5 sentinels, and quorum set to 3. If 2 sentinels fail and then the master node goes down, can the sentinels judge the master node "objectively offline"? Can the master-slave switch happen automatically?

The sentinel cluster can judge the master node "objectively offline". 3 sentinels remain, so after one sentinel judges the master node "subjectively offline" and asks the other 2, it can collect 3 votes in favor, which reaches the quorum value. The sentinel cluster can therefore determine that the master node is "objectively offline".

The sentinel cluster can also complete the master-slave switch. After a sentinel marks the master node "objectively offline", the Leader election starts. Since 3 sentinels remain, a candidate can still get more than half of the votes (5/2+1=3, with integer division), which also reaches the quorum value. Both conditions for electing a Leader are met, so the election can succeed and the sentinel cluster can complete the master-slave switch.

If quorum is set to 2 and 3 sentinels fail, the sentinel cluster can still determine that the master node is "objectively offline", but it cannot complete the master-slave switch. You can work through the reasoning yourself.

The recommended quorum value is half the number of sentinels plus 1, for example 2 for 3 sentinels and 3 for 5 sentinels, and the number of sentinel nodes should be odd.
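The scenarios above come down to two comparisons, which a small helper makes easy to check. The function name is hypothetical, and it assumes the best case in which every surviving sentinel agrees that the master node is down and votes for the same candidate.

```python
from typing import Tuple


def failover_capability(total: int, quorum: int, alive: int) -> Tuple[bool, bool]:
    """Return (can_mark_objectively_offline, can_elect_leader) in the best case."""
    majority = total // 2 + 1  # more than half of ALL configured sentinels
    can_mark = alive >= quorum
    can_elect = alive >= majority and alive >= quorum
    return can_mark, can_elect


# 5 sentinels, quorum 3, 2 failed (3 alive): both steps succeed.
print(failover_capability(total=5, quorum=3, alive=3))  # (True, True)
# 5 sentinels, quorum 2, 3 failed (2 alive): objectively offline, but no Leader.
print(failover_capability(total=5, quorum=2, alive=2))  # (True, False)
# 3 sentinels, quorum 2, 1 failed (2 alive): both steps succeed.
print(failover_capability(total=3, quorum=2, alive=2))  # (True, True)
# 3 sentinels, quorum 2, 2 failed (1 alive): manual intervention is needed.
print(failover_capability(total=3, quorum=2, alive=1))  # (False, False)
```

The second case is the exercise left to the reader: 2 votes reach the quorum of 2, but a Leader needs more than half of 5, that is 3 votes.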

How is the client notified of the new master node?
After the steps above, the sentinel cluster has finally completed the master-slave failover. So how are clients told about the new master node?

This is done mainly through Redis's publish/subscribe mechanism. Each sentinel node provides publish/subscribe, and clients can subscribe to messages from the sentinels.

For example, a client subscribes to the master-slave switch event. When the sentinels have switched to the new master node, a sentinel publishes the new master node's IP address and port. The client receives this message and from then on communicates using the new master node's IP address and port.
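The text above does not name the event. In the Redis Sentinel documentation, the switch is announced on the `+switch-master` channel with a payload of the form `<master name> <old ip> <old port> <new ip> <new port>`. The sketch below only parses such a payload; subscribing to the channel would be done with whatever Redis client library you use, and the master name and addresses here are made up for illustration.

```python
from typing import NamedTuple


class MasterSwitch(NamedTuple):
    master_name: str
    old_ip: str
    old_port: int
    new_ip: str
    new_port: int


def parse_switch_master(payload: str) -> MasterSwitch:
    """Parse '<master name> <old ip> <old port> <new ip> <new port>'."""
    name, old_ip, old_port, new_ip, new_port = payload.split()
    return MasterSwitch(name, old_ip, int(old_port), new_ip, int(new_port))


# Hypothetical message as a client might receive it after a failover.
event = parse_switch_master("mymaster 10.0.0.1 6379 10.0.0.2 6379")
print(event.new_ip, event.new_port)  # 10.0.0.2 6379
```

A client would then reconnect to `event.new_ip` and `event.new_port` for its writes.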

How is a sentinel cluster formed?
Having mentioned Redis's publish/subscribe mechanism, I have to talk about how sentinel clusters are formed, because that uses the same technique.

The first time I set up a sentinel cluster, I was surprised. When configuring a sentinel, you only need to fill in the following parameters: the name of the master node, the master node's IP address and port number, and the quorum value.

sentinel monitor <master-name> <ip> <redis-port> <quorum> 


There is no need to fill in any information about the other sentinel nodes. So how do they become aware of each other, and how do they form a sentinel cluster?

I learned later that sentinel nodes discover each other through Redis's publish/subscribe mechanism.

In a master-slave cluster, the master node has a channel named __sentinel__:hello, and it is through this channel that the sentinels discover and communicate with one another.

For example, sentinel A publishes its own IP address and port to the __sentinel__:hello channel, and sentinels B and C subscribe to that channel. Sentinels B and C can then read sentinel A's IP address and port number directly from the channel, and each can establish a network connection with sentinel A.

In the same way, sentinels B and C can establish a network connection with each other, and the sentinel cluster is formed.
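To make the discovery idea concrete, here is a toy in-memory simulation. It is not how Sentinel is implemented: `HelloChannel` merely stands in for the `__sentinel__:hello` channel on the master node, the class and method names are hypothetical, and a real hello message carries more than an address.

```python
from typing import List, Set, Tuple

Address = Tuple[str, int]


class HelloChannel:
    """In-memory stand-in for the __sentinel__:hello channel on the master node."""

    def __init__(self) -> None:
        self.subscribers: List["Sentinel"] = []

    def publish(self, sender: "Sentinel") -> None:
        # Every other subscriber learns the sender's address from the message.
        for sentinel in self.subscribers:
            if sentinel is not sender:
                sentinel.known_peers.add(sender.address)


class Sentinel:
    def __init__(self, name: str, address: Address, channel: HelloChannel) -> None:
        self.name = name
        self.address = address
        self.known_peers: Set[Address] = set()
        self.channel = channel
        channel.subscribers.append(self)  # subscribe to the hello channel

    def announce(self) -> None:
        self.channel.publish(self)  # publish own IP address and port


channel = HelloChannel()
a = Sentinel("A", ("10.0.0.11", 26379), channel)
b = Sentinel("B", ("10.0.0.12", 26379), channel)
c = Sentinel("C", ("10.0.0.13", 26379), channel)

for sentinel in (a, b, c):
    sentinel.announce()

print(sorted(b.known_peers))  # [('10.0.0.11', 26379), ('10.0.0.13', 26379)]
```

No sentinel was configured with another sentinel's address, yet after one round of announcements each one knows the other two.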

The sentinel cluster also monitors the running state of the slave nodes. How do the sentinels get information about the slave nodes?

The master node knows about all of its slave nodes, so a sentinel sends the INFO command to the master node to get information about all the slave nodes.

For example, sentinel B sends the INFO command to the master node. On receiving it, the master node returns the list of slave nodes to the sentinel. The sentinel can then use the connection information in that list to establish a connection with each slave node and keep monitoring the slave node over that connection. Sentinels A and C can connect to the slave nodes in the same way.
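In the replication section of a master's INFO reply, each slave node appears on a line such as `slave0:ip=...,port=...,state=...,offset=...,lag=...`. The sketch below extracts those lines from a sample reply. The sample text and addresses are invented for illustration, and `parse_slaves` is a hypothetical helper, not part of any Redis client.

```python
import re
from typing import Dict, List

SLAVE_LINE = re.compile(r"^slave\d+:(.*)$")


def parse_slaves(info_text: str) -> List[Dict[str, str]]:
    """Collect the key=value fields of every 'slaveN:' line in an INFO reply."""
    slaves = []
    for line in info_text.splitlines():
        match = SLAVE_LINE.match(line.strip())
        if match:
            fields = dict(pair.split("=", 1) for pair in match.group(1).split(","))
            slaves.append(fields)
    return slaves


# Made-up excerpt of what a master might return for INFO replication.
sample_info = """\
# Replication
role:master
connected_slaves:2
slave0:ip=10.0.0.2,port=6379,state=online,offset=1000,lag=0
slave1:ip=10.0.0.3,port=6379,state=online,offset=990,lag=1
master_repl_offset:1000
"""

for slave in parse_slaves(sample_info):
    print(slave["ip"], slave["port"], slave["offset"])
```

With the IP address and port of each slave node in hand, a sentinel has everything it needs to open its own monitoring connections.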

It is through Redis's publish/subscribe mechanism that the sentinels become aware of each other and form a cluster. The sentinels also use the INFO command to get the connection information of all slave nodes from the master node, so they can connect to the slave nodes and monitor them.

References:

《Redis Core Technology and Practice》

《Redis Design and Implementation》

Summary
Since version 2.8, Redis has provided the Sentinel mechanism, whose job is automatic master-slave failover. It monitors whether the master node is alive. If it finds that the master node is down, it elects a slave node to become the new master node and notifies the slave nodes and the clients about the new master node.

Sentinels are usually deployed as a cluster of at least 3 sentinel nodes. The sentinel cluster is mainly responsible for three things: monitoring, master selection, and notification.

Sentinel nodes become aware of each other through Redis's publish/subscribe mechanism, connect to one another, and form a sentinel cluster. They also use the INFO command to get the connection information of all slave nodes from the master node, so they can connect to the slave nodes and monitor them.

The sentinel cluster decides by voting whether the master node is "objectively offline". If it is, one of the slave nodes is selected as the new master node, according to the following steps:

Filter out the slave nodes that are already offline;

Filter out the slave nodes with a history of poor network connections;

Put the remaining slave nodes through three rounds of comparison: priority, replication progress, and ID number. In each round, if one slave node wins, it becomes the new master node.

After the slave node has been selected, a Leader has to be elected from the sentinel cluster to perform the master-slave switch. Electing the Leader is also a voting process, and any sentinel node that wants to be Leader has to meet two conditions:

First, it gets more than half of the votes;

Second, the number of votes it gets is greater than or equal to the quorum value.

Once the Leader sentinel node has been elected, it performs the master-slave switch. After the switch is complete, clients are notified of the new master node's IP address and port through Redis's publish/subscribe mechanism.


Copyright notice
This article was written by [Java domain]. Please include the original link when reposting. Thank you.
https://yzsam.com/2022/187/202207060900000054.html