Meituan second-round interview: why does Redis have Sentinel?
2022-07-06 09:26:00 【Java domain】
Enough talk, let's get started!
Outline
Why is the sentinel mechanism needed?
In Redis's master-slave architecture, reads and writes are separated. If the master node goes down, no node is left to serve clients' write requests, and no node is left to synchronize data to the slave nodes.
The master node goes down
To restore service at this point, someone has to intervene manually: pick a slave node and promote it to master, point the other slave nodes at the new master, and notify the upstream clients that connect to the Redis master so that they update the master IP address in their configuration to that of the new master.
That is not very "intelligent". If some node could monitor the state of the master and, on finding that the master is down, automatically promote a slave node to master, it would save us a lot of work.
Redis has provided the Sentinel mechanism since version 2.8. Its job is to fail over between master and slave nodes: it monitors whether the master node is alive and, if it finds that the master is down, elects a slave node, promotes it to master, and notifies the slave nodes and the clients of the new master.
How does the sentinel mechanism work?
A sentinel is actually a Redis process running in a special mode, so it is also a node. As the name suggests, it acts as an "observer node", and what it observes are the master and slave nodes.
Of course, it does not only observe. When it sees an abnormal condition, it takes action to repair it.
A sentinel node is mainly responsible for three things: monitoring, master election, and notification.
The duties of a sentinel
So we need to answer three questions:
How does a sentinel node monitor the nodes, and how does it judge whether the master node has really failed?
By what rules is a slave node chosen to become the new master?
How are the slave nodes and the clients told about the new master?
How to judge whether the master node has really failed?
The sentinel periodically sends a PING command to every master and slave node. A node that receives PING sends a reply back to the sentinel, which is how the sentinel judges whether the node is running normally.
Sentinels monitor the master and slave nodes
If a master or slave node does not reply to the sentinel's PING within the specified time, the sentinel marks it as "subjectively offline". The "specified time" is set by the down-after-milliseconds configuration item, in milliseconds.
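The timeout rule above can be sketched in a few lines of Python. This is only an illustration of the comparison, not Sentinel's actual code; the function name and the timestamps are hypothetical, and only down-after-milliseconds comes from the text.

```python
def is_subjectively_down(last_reply_ms: int, now_ms: int, down_after_ms: int) -> bool:
    """Return True when no PING reply has arrived within down_after_ms.

    last_reply_ms and now_ms are hypothetical millisecond timestamps;
    down_after_ms plays the role of down-after-milliseconds.
    """
    return (now_ms - last_reply_ms) > down_after_ms


# Made-up timeline: last reply at t=1000 ms, threshold of 30000 ms.
print(is_subjectively_down(1_000, 20_000, 30_000))  # False: 19 s of silence
print(is_subjectively_down(1_000, 40_000, 30_000))  # True: 39 s of silence
```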
Subjectively offline? Is there also an objectively offline state?
Yes, there is, and the objectively offline state applies only to the master node.
The master node has two states, "subjectively offline" and "objectively offline", because the master may not actually have failed. It may simply be under heavy load, or the network may be congested, so that it fails to reply to the sentinel's PING within the specified time.
To reduce misjudgment, sentinels are therefore not deployed as a single node but as a sentinel cluster of several nodes (a sentinel cluster needs at least three machines). With several sentinels judging together, one sentinel with a poor network connection cannot wrongly declare the master offline on its own. The probability that several sentinels all have unstable networks at the same moment is small, so a joint decision lowers the misjudgment rate.
So how exactly is the master node judged to be "objectively offline"?
When a sentinel judges the master to be "subjectively offline", it sends a command to the other sentinels. Each sentinel that receives the command responds with a vote in favor or a refusal, based on its own network connection to the master.
Once the number of votes in favor reaches the value of the quorum configuration item in the sentinel configuration file, that sentinel marks the master as "objectively offline".
For example, with 3 sentinels and quorum set to 2, a sentinel needs 2 votes in favor to mark the master as "objectively offline". Those 2 votes are the sentinel's own vote plus one vote from either of the other two sentinels.
PS: quorum is generally set to half the number of sentinels plus 1 (using integer division), for example 2 for 3 sentinels.
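The quorum rule and the recommended quorum value can be written out as a small Python sketch. Both function names are hypothetical helpers for illustration; the arithmetic follows the text above.

```python
def recommended_quorum(sentinel_count: int) -> int:
    """Half the number of sentinels plus 1, using integer division."""
    return sentinel_count // 2 + 1


def is_objectively_down(yes_votes: int, quorum: int) -> bool:
    """The master is marked objectively offline once yes votes reach quorum.

    yes_votes includes the asking sentinel's own vote.
    """
    return yes_votes >= quorum


print(recommended_quorum(3))       # 2
print(recommended_quorum(5))       # 3
print(is_objectively_down(2, 2))   # True: own vote plus one other sentinel
print(is_objectively_down(1, 2))   # False: only the sentinel's own vote
```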
Once a sentinel has judged the master node objectively offline, it has to pick one of the slave nodes to become the new master.
How is the new master node selected?
With so many slave nodes, which one should become the new master?
What about picking one at random? That is easy to implement, but if a slave node with a poor network connection is picked, another master-slave failover may follow soon after.
So we first have to filter out slave nodes with a poor network state: first the slave nodes that are already offline, then the slave nodes whose network connection has been poor in the past.
How do we judge whether a slave node's past network connection was poor?
Redis has a configuration item named down-after-milliseconds, and this check is based on down-after-milliseconds * 10. down-after-milliseconds is the maximum connection timeout between master and slave. If master and slave have had no contact over the network within down-after-milliseconds milliseconds, they are considered disconnected. If such disconnections have happened more than 10 times, the slave node's network is considered poor and the node is not suitable as the new master.
With the poorly connected slave nodes filtered out, the remaining slave nodes go through three rounds of comparison: priority, replication progress, and ID. As soon as one slave node wins a round, it is selected as the new master.
First round: the sentinel sorts the slave nodes by priority; the smaller the priority value, the higher the rank.
Second round: if the priorities are equal, it compares replication offsets; the slave node that has received more replicated data from the master ranks higher.
Third round: if priority and replication offset are both equal, the slave node with the smaller ID is selected.
First round: the node with the highest priority wins
Redis has a configuration item named slave-priority that sets the priority of a slave node.
The servers hosting the slave nodes are not necessarily configured identically, so we can set each slave node's priority according to its server's performance.
For example, if slave node A has the most physical memory of all the slave nodes, we can give A the highest priority, which means the smallest slave-priority value. In the sentinel's first round, slave node A then wins and becomes the new master.
Second round: the node with the most replication progress wins
If the first round finds two slave nodes sharing the highest priority, a second round compares their replication progress.
What is replication progress? In the master-slave architecture, the master synchronizes write operations to the slave nodes. In doing so, the master uses master_repl_offset to record the position of the latest write in repl_backlog_buffer, and each slave node uses slave_repl_offset to record how far it has replicated.
The slave node whose slave_repl_offset is closest to master_repl_offset has replicated the most and can be selected as the new master.
Third round: the slave node with the smaller ID wins
If the second round finds two slave nodes with the same priority and the same replication progress, a third round compares their IDs, and the slave node with the smaller ID wins.
What is the ID? Each slave node has an ID that uniquely identifies it.
That finally completes the selection. To summarize briefly:
Filter out the slave nodes that are already offline;
Filter out the slave nodes with a history of poor network connections;
Put the remaining slave nodes through three rounds of comparison: priority, replication progress, and ID. As soon as one slave node wins a round, it becomes the new master.
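The filtering and the three rounds summarized above can be expressed as one sort key. The sketch below is a simplified model under stated assumptions, not Sentinel's implementation: the Replica class, its field names, the disconnect_count counter, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    node_id: str           # unique ID of the slave node
    priority: int          # slave-priority value; smaller ranks higher
    repl_offset: int       # slave_repl_offset; larger means more data replicated
    online: bool           # False if the slave node is already offline
    disconnect_count: int  # hypothetical counter of past disconnections


MAX_DISCONNECTS = 10  # threshold described in the text


def select_new_master(replicas: List[Replica]) -> Optional[Replica]:
    """Filter unsuitable slave nodes, then apply the three rounds in order."""
    candidates = [
        r for r in replicas
        if r.online and r.disconnect_count <= MAX_DISCONNECTS
    ]
    if not candidates:
        return None
    # Round 1: smallest priority value. Round 2: largest offset. Round 3: smallest ID.
    return min(candidates, key=lambda r: (r.priority, -r.repl_offset, r.node_id))


replicas = [
    Replica("c3", 100, 5000, True, 0),
    Replica("a1", 100, 5000, True, 2),
    Replica("b2", 100, 4000, True, 0),
    Replica("d4", 50, 9000, False, 0),   # filtered out: offline
    Replica("e5", 50, 9000, True, 11),   # filtered out: too many disconnections
]
print(select_new_master(replicas).node_id)  # a1
```

Here a1 and c3 tie on priority and replication offset, so the third round decides in favor of the smaller ID.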
Which sentinel performs the master-slave failover?
As mentioned earlier, to judge a master failure more "objectively", the result of a single sentinel is generally not enough. Several sentinels judge together to lower the probability of misjudgment, which is why sentinels exist as a cluster.
So which node in the sentinel cluster actually performs the master-slave failover?
The sentinel cluster has to elect a Leader, and the Leader performs the master-slave switch.
Electing the Leader is a voting process, and before the voting starts there have to be "candidates".
Who becomes a candidate?
Whichever sentinel node judges the master to be "objectively offline" becomes a candidate, that is, a sentinel that wants to become the Leader.
For example, suppose there are three sentinels. When sentinel A is the first to judge the master "subjectively offline", it sends the is-master-down-by-addr command to the other sentinels. The other sentinels then respond with a vote in favor or a refusal, based on their own network connection to the master.
Once the votes in favor that sentinel A has received reach the value of the quorum configuration item in the sentinel configuration file, the master is marked "objectively offline", and sentinel A becomes a Leader candidate.
How does a candidate get elected Leader?
The candidate sends a command to the other sentinels saying that it wants to be the Leader and perform the master-slave switch, and asks all of them to vote.
Each sentinel has only one vote; once it is used, the sentinel cannot vote again. A vote can go to the sentinel itself or to another sentinel, but only a candidate can vote for itself.
In the voting, a candidate has to meet two conditions to become Leader:
First, it gets more than half of the votes;
Second, the number of votes it gets is greater than or equal to the quorum value.
For example, with 3 sentinel nodes and quorum set to 2, any sentinel that wants to be Leader only needs 2 votes in favor to win. If the conditions are not met, a new election is held.
Some readers will ask: if two sentinel nodes happen to judge the master objectively offline at the same moment, there are two candidates. How is the Leader decided?
Each candidate first votes for itself and then asks the other sentinels for their votes. A voter that receives candidate A's request first gives its vote to A. Having used its only vote, it refuses when candidate B's request arrives later. Candidate A therefore meets the two conditions first and is elected Leader.
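The two-candidate scenario can be simulated with a short Python sketch. It is a toy model that assumes every sentinel is alive and that each non-candidate gives its single vote to whichever request reaches it first; elect_leader, its parameters, and the sentinel names are hypothetical.

```python
from typing import Dict, List, Optional


def elect_leader(
    sentinels: List[str],
    candidates: List[str],
    arrival_order: Dict[str, List[str]],
    quorum: int,
) -> Optional[str]:
    """Count one vote per sentinel and apply the two Leader conditions.

    arrival_order maps a non-candidate sentinel to the candidates whose
    vote requests reached it, in order of arrival.
    """
    votes = {c: 0 for c in candidates}
    for s in sentinels:
        if s in votes:
            votes[s] += 1                  # a candidate votes for itself first
        else:
            requests = arrival_order.get(s, [])
            if requests:
                votes[requests[0]] += 1    # the only vote goes to the first request
    majority = len(sentinels) // 2 + 1     # "more than half"
    for c in candidates:
        if votes[c] >= majority and votes[c] >= quorum:
            return c
    return None                            # no winner: a new election is needed


# Sentinel C receives A's request before B's, so A collects 2 of 3 votes.
print(elect_leader(["A", "B", "C"], ["A", "B"], {"C": ["A", "B"]}, quorum=2))  # A
```

If C's vote reached neither candidate, A and B would each hold only their own vote, the function would return None, and the election would have to be repeated.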
Why should there be at least 3 sentinel nodes?
With only 2 sentinel nodes, a sentinel that wants to become Leader must get 2 votes, not 1, because more than half of 2 is 2.
So if one sentinel in the cluster goes down, only one sentinel is left. It cannot reach 2 votes, so it cannot become Leader, and the master-slave switch cannot happen.
That is why we usually configure at least 3 sentinel nodes. If one of them goes down, two sentinels remain, and either of them still has a chance of getting 2 votes. The election can succeed and the master-slave switch is not blocked.
Of course, you may ask: what if 2 of the 3 sentinel nodes go down? Then manual intervention is needed, or you should deploy more sentinel nodes.
One more question: Redis has 1 master and 4 slaves with 5 sentinels, and quorum is set to 3. If 2 sentinels fail and the master then goes down, can the sentinels judge the master "objectively offline"? Can the switch happen automatically?
The sentinel cluster can judge the master "objectively offline". 3 sentinels remain, so when one of them judges the master "subjectively offline" and asks the other 2, it can collect 3 votes in favor, which reaches the quorum value.
The sentinel cluster can also complete the master-slave switch. After a sentinel marks the master "objectively offline", the Leader election starts. With 3 sentinels remaining, a candidate can still get more than half of the votes (5/2+1=3, using integer division), which also reaches quorum. Both Leader conditions are met, so the election can succeed and the switch can be completed.
If quorum is set to 2 and 3 sentinels fail, the sentinel cluster can still judge the master "objectively offline", but it cannot complete the master-slave switch. You can work through the reasoning yourself.
It is recommended to set quorum to half the number of sentinels plus 1, for example 2 for 3 sentinels and 3 for 5 sentinels, and to use an odd number of sentinel nodes.
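The cases discussed above come down to two comparisons, which the following Python sketch works through. It assumes the best case, in which every surviving sentinel votes in favor; the function names are hypothetical.

```python
def can_mark_objectively_down(alive: int, quorum: int) -> bool:
    """Best case: every surviving sentinel agrees that the master is down."""
    return alive >= quorum


def can_elect_leader(total: int, alive: int, quorum: int) -> bool:
    """A Leader needs more than half of all sentinels and at least quorum votes."""
    majority = total // 2 + 1
    return alive >= majority and alive >= quorum


# (total sentinels, failed sentinels, quorum)
for total, failed, quorum in [(5, 2, 3), (5, 3, 2), (3, 1, 2)]:
    alive = total - failed
    print(
        total, failed, quorum,
        can_mark_objectively_down(alive, quorum),
        can_elect_leader(total, alive, quorum),
    )
# 5 2 3 True True
# 5 3 2 True False
# 3 1 2 True True
```

In the second case the two surviving sentinels reach quorum 2, but a Leader needs 3 of the 5 votes, so no Leader can be elected and the switch cannot proceed.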
How is the client notified of the new master node?
After the steps above, the sentinel cluster has finally completed the master-slave failover. How is the client told about the new master?
This is done mainly through Redis's publish/subscribe mechanism. Each sentinel node provides publish/subscribe, and clients can subscribe to messages from the sentinels.
For example, a client subscribes to the master-slave switch event. When the sentinels have switched to the new master, they publish its IP address and port. The client receives this message and then communicates with the new master at that IP address and port.
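On the client side, handling such a message mostly means parsing the new address out of it. The sketch below assumes the event arrives on Sentinel's +switch-master channel with a payload of the form "<master-name> <old-ip> <old-port> <new-ip> <new-port>"; the article does not name the channel, so treat that as an assumption to check against your Redis version. parse_switch_master and the sample addresses are hypothetical.

```python
from typing import NamedTuple, Tuple


class MasterSwitch(NamedTuple):
    master_name: str
    old_addr: Tuple[str, int]
    new_addr: Tuple[str, int]


def parse_switch_master(payload: str) -> MasterSwitch:
    """Parse '<master-name> <old-ip> <old-port> <new-ip> <new-port>' (assumed format)."""
    name, old_ip, old_port, new_ip, new_port = payload.split()
    return MasterSwitch(name, (old_ip, int(old_port)), (new_ip, int(new_port)))


event = parse_switch_master("mymaster 10.0.0.1 6379 10.0.0.2 6380")
print(event.new_addr)  # ('10.0.0.2', 6380)
```

A client would then reconnect to event.new_addr for its write requests.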
How is a sentinel cluster formed?
Having mentioned Redis's publish/subscribe mechanism, I have to mention how sentinel clusters are formed, because that relies on the same technique.
When I first set up a sentinel cluster, I was surprised: configuring a sentinel only requires the following parameters, namely the master's name, the master's IP address and port, and the quorum value.
sentinel monitor <master-name> <ip> <redis-port> <quorum>
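As a concrete illustration of this directive, the Python sketch below parses a filled-in line. The values mymaster, 127.0.0.1, 6379 and 2 are made-up examples, and parse_sentinel_monitor is a hypothetical helper, not part of Redis.

```python
def parse_sentinel_monitor(line: str) -> dict:
    """Split a 'sentinel monitor <master-name> <ip> <redis-port> <quorum>' line."""
    parts = line.split()
    if len(parts) != 6 or parts[:2] != ["sentinel", "monitor"]:
        raise ValueError("not a sentinel monitor directive: %r" % line)
    _, _, name, ip, port, quorum = parts
    return {"master_name": name, "ip": ip, "port": int(port), "quorum": int(quorum)}


config = parse_sentinel_monitor("sentinel monitor mymaster 127.0.0.1 6379 2")
print(config["master_name"], config["quorum"])  # mymaster 2
```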
There is no need to fill in any information about the other sentinel nodes, so I wondered how they become aware of each other and form a cluster.
I later learned that sentinel nodes discover each other through Redis's publish/subscribe mechanism.
In a master-slave cluster, the master node has a channel named __sentinel__:hello, and it is through this channel that sentinels discover and communicate with each other.
For example, sentinel A publishes its own IP address and port to the __sentinel__:hello channel, and sentinels B and C subscribe to the channel. B and C can then read A's IP address and port directly from the channel and set up network connections to sentinel A.
Sentinels B and C can set up a connection with each other in the same way, and the sentinel cluster is formed.
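The discovery step can be imitated with an in-memory stand-in for the master's publish/subscribe channel. This is a toy simulation, not a Redis client: FakeMaster, Sentinel, and the addresses are hypothetical, and the message carries only an IP address and port, which is a simplification.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple

Address = Tuple[str, int]
HELLO_CHANNEL = "__sentinel__:hello"


class FakeMaster:
    """In-memory stand-in for the master node's publish/subscribe channels."""

    def __init__(self) -> None:
        self.subscribers: Dict[str, List[Callable[[Address], None]]] = defaultdict(list)

    def subscribe(self, channel: str, callback: Callable[[Address], None]) -> None:
        self.subscribers[channel].append(callback)

    def publish(self, channel: str, message: Address) -> None:
        for callback in self.subscribers[channel]:
            callback(message)


class Sentinel:
    def __init__(self, name: str, addr: Address, master: FakeMaster) -> None:
        self.name = name
        self.addr = addr
        self.master = master
        self.known_peers: Set[Address] = set()
        master.subscribe(HELLO_CHANNEL, self.on_hello)

    def announce(self) -> None:
        """Publish this sentinel's own IP address and port to the hello channel."""
        self.master.publish(HELLO_CHANNEL, self.addr)

    def on_hello(self, addr: Address) -> None:
        if addr != self.addr:          # ignore our own announcement
            self.known_peers.add(addr)


master = FakeMaster()
sentinels = [
    Sentinel(name, ("10.0.0.%d" % i, 26379), master)
    for i, name in enumerate("ABC", start=1)
]
for s in sentinels:
    s.announce()

for s in sentinels:
    print(s.name, sorted(s.known_peers))
```

After every sentinel has announced itself once, each one knows the addresses of the other two, even though none of them was configured with its peers.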
The sentinel cluster also monitors the running state of the slave nodes. How do the sentinels learn about the slave nodes?
The master node knows about all of its slave nodes, so a sentinel sends the INFO command to the master to get that information.
For example, sentinel B sends INFO to the master, and the master replies with its list of slave nodes. The sentinel then uses the connection information in that list to connect to each slave node and keeps monitoring it over that connection. Sentinels A and C connect to the slave nodes in the same way.
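To make the slave list concrete, the Python sketch below extracts slave node addresses from the replication part of an INFO reply. The sample text is hand-written in the shape such a reply typically has (lines like slave0:ip=...,port=...,state=...,offset=...,lag=...); the exact fields can vary between Redis versions, and parse_replicas is a hypothetical helper.

```python
from typing import Dict, List

SAMPLE_INFO = """\
# Replication
role:master
connected_slaves:2
slave0:ip=10.0.0.11,port=6379,state=online,offset=1200,lag=0
slave1:ip=10.0.0.12,port=6379,state=online,offset=1180,lag=1
master_repl_offset:1200
"""


def parse_replicas(info_text: str) -> List[Dict[str, object]]:
    """Collect the slaveN entries from INFO output (assumed line format)."""
    replicas = []
    for line in info_text.splitlines():
        key, sep, value = line.strip().partition(":")
        # Keep only keys of the form slave0, slave1, ...
        if not sep or not key.startswith("slave") or not key[5:].isdigit():
            continue
        fields = dict(item.split("=", 1) for item in value.split(","))
        replicas.append({
            "ip": fields["ip"],
            "port": int(fields["port"]),
            "state": fields["state"],
            "offset": int(fields["offset"]),
        })
    return replicas


for replica in parse_replicas(SAMPLE_INFO):
    print(replica["ip"], replica["port"], replica["state"])
# 10.0.0.11 6379 online
# 10.0.0.12 6379 online
```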
It is through Redis's publish/subscribe mechanism that sentinels become aware of each other and form a cluster. Through the INFO command, the sentinels also get the connection information of all slave nodes from the master, so they can connect to the slave nodes and monitor them.
Summary
Redis has provided the Sentinel mechanism since version 2.8. Its job is to perform automatic master-slave failover: it monitors whether the master node is alive and, if it finds that the master is down, elects a slave node, promotes it to master, and notifies the slave nodes and the clients of the new master.
Sentinels are usually deployed as a cluster of at least 3 sentinel nodes. The sentinel cluster is mainly responsible for three things: monitoring, master election, and notification.
Through Redis's publish/subscribe mechanism, sentinel nodes become aware of each other, connect to each other, and form a sentinel cluster. Through the INFO command, the sentinels also get the connection information of all slave nodes from the master, so they can connect to the slave nodes and monitor them.
The sentinel cluster decides by voting whether the master node is "objectively offline". If it is, the cluster selects one of the slave nodes as the new master, in the following steps:
Filter out the slave nodes that are already offline;
Filter out the slave nodes with a history of poor network connections;
Put the remaining slave nodes through three rounds of comparison: priority, replication progress, and ID. As soon as one slave node wins a round, it becomes the new master.
After the slave node has been selected, the sentinel cluster has to elect a Leader to perform the master-slave switch. Electing the Leader is also a voting process, and any sentinel node that wants to be Leader has to meet two conditions:
First, it gets more than half of the votes;
Second, the number of votes it gets is greater than or equal to the quorum value.
Once the Leader sentinel has been elected, it performs the master-slave switch. After the switch is complete, the clients are notified of the new master's IP address and port through Redis's publish/subscribe mechanism.