Redis Sentinel: a brief look at the trade-off between high availability and consistency in distributed systems

redis-sentinel

Redis Sentinel is a distributed system and the high-availability solution provided by Redis. It can be understood as a monitoring system built on top of replication that performs failover promptly when the master fails.

Gossip protocol (gossip protocols)

A Sentinel cluster uses a gossip protocol to decide whether the master is offline.

Every Sentinel process sends a PING command once per second to the masters, slaves, and other Sentinel instances it knows about.
If the time since an instance's last valid reply to PING exceeds the value of the down-after-milliseconds option, that instance is marked as subjectively down (SDOWN). A valid reply is one of +PONG, -LOADING, or -MASTERDOWN.
If a master is marked as subjectively down, every Sentinel monitoring that master confirms once per second whether the master is indeed subjectively down.
If enough Sentinels (at least the number specified in the configuration file) confirm that the master is subjectively down, the master is marked as objectively down (ODOWN).
If not enough Sentinels agree that the master is down, its objectively-down status is removed. When the master again returns a valid reply to a Sentinel's PING command, its subjectively-down status is removed.
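
The detection rules above can be sketched in a few lines of Python. This is only an illustrative model, not Sentinel's actual implementation: the InstanceView class and all function names are hypothetical, and timestamps are plain integers in milliseconds.

```python
from dataclasses import dataclass

# Replies that count as valid answers to PING, as listed above.
VALID_REPLIES = {"+PONG", "-LOADING", "-MASTERDOWN"}


@dataclass
class InstanceView:
    """One Sentinel's view of a monitored instance (hypothetical structure)."""
    last_valid_reply_ms: int


def record_reply(view, reply, now_ms):
    """Remember the time of the reply, but only if it is a valid one."""
    if reply in VALID_REPLIES:
        view.last_valid_reply_ms = now_ms


def is_subjectively_down(view, now_ms, down_after_ms):
    """SDOWN: no valid reply for longer than down-after-milliseconds."""
    return now_ms - view.last_valid_reply_ms > down_after_ms


def is_objectively_down(sdown_opinions, quorum):
    """ODOWN: at least `quorum` Sentinels report the master as SDOWN.

    sdown_opinions holds one boolean per Sentinel, including the asker.
    """
    return sum(1 for opinion in sdown_opinions if opinion) >= quorum
```

With a quorum of 2, for example, is_objectively_down([True, True, False], 2) is true, while a single subjective opinion is not enough.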

Voting protocol (agreement protocols)

Subjectively down: a single Sentinel considers an instance to be offline (for example because it received no reply, or because the network between them is blocked). Objectively down: when the subjectively-down node is a master, the Sentinel that noticed it (say, Sentinel 3) uses the SENTINEL is-master-down-by-addr command to ask the other Sentinels for their judgment of the master. If the other Sentinels also consider the master down and the number of subjectively-down votes reaches the quorum, the Sentinel concludes that the master really has a problem and marks it objectively down. In other words, objectively down means that enough Sentinels agree the master is offline.

Once the master is judged objectively down, one Sentinel must be elected to carry out the failover. The leader election works as follows:

a) Every online Sentinel can become the leader. When a Sentinel (say, Sentinel 3) confirms that the master is offline, it sends the SENTINEL is-master-down-by-addr command to the other Sentinels, asking for their judgment and requesting to be made leader. The leader is the one that handles the failover.

b) A Sentinel that receives a SENTINEL is-master-down-by-addr command grants its vote only to the first Sentinel that asks. Requests from any other Sentinel are rejected.

c) If Sentinel 3 finds that it has received at least num(sentinels)/2 + 1 votes, it becomes the leader. Otherwise the election continues.
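
Steps a) to c) can be sketched roughly as follows, assuming a single election round. The SentinelVoter class and wins_election function are hypothetical names for illustration; a real Sentinel exchanges these requests over the network and retries when an election fails.

```python
class SentinelVoter:
    """Vote state of one Sentinel for a single election round (hypothetical)."""

    def __init__(self):
        self.voted_for = None  # no vote granted yet

    def request_vote(self, candidate_id):
        """Grant the vote to the first candidate that asks (step b).

        Later requests from other candidates are rejected; a repeated
        request from the same candidate gets the same positive answer.
        """
        if self.voted_for is None:
            self.voted_for = candidate_id
        return self.voted_for == candidate_id


def wins_election(votes_received, num_sentinels):
    """Step c: the candidate needs at least num(sentinels)/2 + 1 votes."""
    return votes_received >= num_sentinels // 2 + 1


# Sentinel "s3" notices the failure first and asks everyone, itself included.
voters = {name: SentinelVoter() for name in ("s1", "s2", "s3")}
votes_for_s3 = sum(voter.request_vote("s3") for voter in voters.values())
s3_is_leader = wins_election(votes_for_s3, len(voters))
```

Because each voter grants only one vote per round, at most one candidate can collect a majority, which is what keeps two Sentinels from running the same failover.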

Selecting the new master node from the slave nodes

The Sentinel state data structure holds information about every slave of each master. The leader Sentinel selects the new master from that list of slaves according to the following rules:

Filter out nodes that are subjectively down.
Select the node with the highest priority according to slave-priority (in Redis a lower slave-priority value means a higher priority). If there is a single such node, return it; otherwise continue.
Select the node with the largest replication offset, because a larger offset means the node has copied more complete data. If there is a single such node, return it; otherwise continue.
Select the node with the smallest run_id.
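
The selection rules above amount to a filter followed by a three-level tie-break. The sketch below uses a hypothetical SlaveInfo record and is not Sentinel's real data structure. It also assumes the Redis convention that a lower slave-priority value is preferred and that a value of 0 means the slave is never promoted, a detail the list above does not mention.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SlaveInfo:
    """What the leader Sentinel knows about one slave (hypothetical record)."""
    run_id: str
    priority: int        # slave-priority: lower value is preferred
    repl_offset: int     # replication offset: larger means more data copied
    subjectively_down: bool = False


def select_new_master(slaves) -> Optional[SlaveInfo]:
    """Pick the promotion candidate, or None if no slave is eligible."""
    candidates = [
        s for s in slaves
        # Assumption: priority 0 marks a slave that must never be promoted.
        if not s.subjectively_down and s.priority != 0
    ]
    if not candidates:
        return None
    # Tuple ordering applies the rules in sequence: best priority first,
    # then largest offset (negated for min), then smallest run_id.
    return min(candidates, key=lambda s: (s.priority, -s.repl_offset, s.run_id))
```

Sorting on a single tuple key keeps the "if tied, continue to the next rule" logic in one place instead of three nested loops.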

Updating the master and slave state

The slaveof no one command turns the selected slave into the master, and the slaveof command then makes the other nodes its slaves.

The offline master is set as a slave of the new master. When it comes back, it replicates from the new master and so becomes one of its slaves.
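
As a rough illustration of the order of operations, the helper below only builds the list of commands the leader would issue and sends nothing over the network. The function name plan_role_switch and the example addresses are made up for this sketch.

```python
def plan_role_switch(new_master, other_nodes):
    """Return (target, command) pairs for the role switch.

    new_master and every entry of other_nodes are (host, port) tuples.
    other_nodes should include the old master, so that it becomes a slave
    of the new master once it is reachable again.
    """
    host, port = new_master
    # First promote the selected slave.
    plan = [(new_master, "SLAVEOF NO ONE")]
    # Then point every other node at the new master.
    for node in other_nodes:
        plan.append((node, f"SLAVEOF {host} {port}"))
    return plan


plan = plan_role_switch(
    ("10.0.0.2", 6379),
    [("10.0.0.3", 6379), ("10.0.0.1", 6379)],  # remaining slave, old master
)
```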

Configuration propagation

Once a Sentinel has successfully failed over a master, it notifies the other Sentinels of the master's latest configuration, and they update their own configuration for that master accordingly.

For a failover to count as successful, the Sentinel must be able to send the SLAVEOF NO ONE command to the selected slave and then observe, through the INFO command, that the slave has switched to the master role.

Once a slave has been elected master and has been sent SLAVEOF NO ONE, the failover is considered successful even if the other slaves have not yet been reconfigured to follow the new master. From then on all Sentinels publish the new configuration.

The way a new configuration spreads through the cluster is the reason a Sentinel must be granted a version number before it performs a failover.

Every Sentinel continuously broadcasts its version of each master's configuration using publish/subscribe. The Pub/Sub channel used for this is __sentinel__:hello.

Because every configuration carries a version number, the configuration with the largest version number wins.

For example, suppose there is a master named mymaster at 192.168.1.50:6379. Initially every Sentinel in the cluster knows this address, so the configuration of mymaster carries version number 1. After a while mymaster dies, and one Sentinel is authorized to perform the failover with version number 2. If the failover succeeds, suppose the address changes to 192.168.1.50:9000 and the configuration version becomes 2. The Sentinel that performed the failover broadcasts the new configuration to the other Sentinels. They still hold version 1, so when they see that the new configuration has version 2, the larger number tells them the configuration has been updated, and they adopt the version 2 configuration.

This means a Sentinel cluster guarantees a second liveness property: a set of Sentinels that can communicate with each other will eventually all converge on the same configuration, the one with the highest version number.
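
The "largest version number wins" rule from the mymaster example can be sketched as a small merge function. MasterConfig and merge_config are hypothetical names; a real Sentinel receives these updates over the __sentinel__:hello channel rather than as Python objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MasterConfig:
    """A Sentinel's configuration for one master (hypothetical structure)."""
    name: str
    host: str
    port: int
    version: int


def merge_config(local, received):
    """Adopt the received configuration only if its version is larger."""
    if received.name == local.name and received.version > local.version:
        return received
    return local


# The example from the text: version 1 before the failover, version 2 after.
before = MasterConfig("mymaster", "192.168.1.50", 6379, 1)
after = MasterConfig("mymaster", "192.168.1.50", 9000, 2)
adopted = merge_config(before, after)
```

Applying merge_config in any order and any number of times leaves every Sentinel holding the version 2 configuration, which is the convergence property described above.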

Summary

In essence, Redis Sentinel provides failover and automatic recovery; that is, it improves system availability. Its consistency model is eventual consistency: configuration information is obtained through broadcast, and every Sentinel in the cluster eventually adopts the configuration with the highest version.