Redis High Availability: the Sentinel Mechanism
2022-07-05 12:18:00 【Xujunsheng】
If the master goes down, how can the service keep running without interruption?
In the previous article, we analyzed Redis master-slave replication. In this mode, if a slave fails, clients can keep sending requests to the master or to the other slaves. But if the master fails, replication stops, because the slaves no longer have a master to replicate data from.
If every request sent by clients is a read, the slaves can keep serving them, which may be acceptable in a read-only workload.
As soon as a write request arrives, however, the read/write separation of the master-slave mode requires the master to perform the write. With the master down, no instance can serve the clients' write requests, as shown in the figure below:
Therefore, if the master goes down, we need to bring up a new master, for example by promoting one of the slaves to master.
This raises three questions:
- Is the master really down?
- Which slave should be chosen as the new master?
- How are the slaves and clients informed about the new master?
This is where the Sentinel mechanism comes in.
In a Redis master-slave cluster, Sentinel is the key mechanism for automatic master-slave failover, and it addresses all three of these failover questions.
Sentinel mechanism
Sentinel is Redis's high-availability solution. It is simply a Redis process running in a special mode, and it runs alongside the master and slave instances.
A Sentinel system made up of one or more Sentinel instances can monitor any number of master servers, together with all of the slave servers under them. When a monitored master goes offline, the system automatically promotes one of that master's slaves to be the new master, which then takes over processing command requests in place of the offline master. (from Redis Design and Implementation)
Obtaining master information
Sentinel periodically sends the INFO command to the master (every 10 seconds by default) to obtain the master's current state as well as information about all of its slaves, as shown in the figure below:
Once Sentinel discovers the slaves, it creates a command connection and a subscription connection to each of slave0, slave1 and slave2.
After the command connection is created, Sentinel sends INFO to each slave and obtains the following information:
- the slave's run ID;
- the slave's role;
- the master's IP address and port;
- the status of the slave's connection to the master;
- the slave's priority;
- the slave's replication offset.
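The INFO reply is plain text made of `# Section` headers and `key:value` lines, so extracting the fields listed above amounts to simple line parsing. The sketch below assumes that format; the field names (`run_id`, `role`, `master_host`, `master_port`, `master_link_status`, `slave_priority`, `slave_repl_offset`) are those printed by common Redis versions, while the sample reply and its values are made up and heavily abridged.

```python
def parse_info(text: str) -> dict[str, str]:
    """Parse the 'key:value' lines of an INFO reply, skipping '# Section' headers and blanks."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first ':' only, so values containing ':' stay intact.
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


# Hypothetical, abridged INFO reply from a slave (values invented for illustration).
SAMPLE = """# Server
run_id:9c8f2a0e4b7d1c3f5a6b8d0e2f4a6c8e0b2d4f6a
# Replication
role:slave
master_host:127.0.0.1
master_port:6379
master_link_status:up
slave_repl_offset:990
slave_priority:100
"""

info = parse_info(SAMPLE)
print(info["role"], info["master_host"], int(info["master_port"]),
      info["master_link_status"], int(info["slave_priority"]), int(info["slave_repl_offset"]))
# prints: slave 127.0.0.1 6379 up 100 990
```

Sentinel itself does this in C inside its own process; the Python version only illustrates which pieces of the reply feed the slave's instance structure.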
Based on this information, Sentinel updates the instance structure it keeps for each slave.
Now that we understand how Sentinel gets information about the master and slaves, let's look at the basic workflow of the Sentinel mechanism.
Basic workflow of the Sentinel mechanism
Sentinel is mainly responsible for three tasks: monitoring, master election and notification.
- Monitoring: uses the PING command to check the master and slaves;
- Master election: when the master is down, selects a new master from among the slaves according to a set of rules;
- Notification: informs the other slaves and the clients about the new master.
Let's look at the surveillance first .
monitor
By default, Sentinel sends a PING command to every master and slave once per second to check whether they are still online.
The down-after-milliseconds option in the Sentinel configuration file (set per master as sentinel down-after-milliseconds <master-name> <milliseconds>) specifies how long Sentinel waits before judging that an instance has entered the subjective offline state.
If an instance keeps returning invalid replies to Sentinel for down-after-milliseconds milliseconds, Sentinel considers that instance to have entered the subjective offline state.
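The subjective offline rule boils down to comparing the time since the last valid PING reply against down-after-milliseconds. The sketch below assumes a hypothetical `MonitoredInstance` record that stores that timestamp; it is an illustration of the rule, not Sentinel's actual data structure.

```python
from dataclasses import dataclass


@dataclass
class MonitoredInstance:
    """Hypothetical per-instance state a Sentinel-like monitor might keep."""
    name: str
    last_valid_reply_ms: int  # time of the last valid PING reply, in milliseconds


def is_subjectively_down(inst: MonitoredInstance, now_ms: int, down_after_ms: int) -> bool:
    """Subjectively offline once no valid reply has arrived for longer than down-after-milliseconds."""
    return now_ms - inst.last_valid_reply_ms > down_after_ms


DOWN_AFTER_MS = 30_000  # e.g. "sentinel down-after-milliseconds mymaster 30000"
master = MonitoredInstance("master", last_valid_reply_ms=1_000)
print(is_subjectively_down(master, now_ms=20_000, down_after_ms=DOWN_AFTER_MS))  # False: silent for 19 s
print(is_subjectively_down(master, now_ms=40_000, down_after_ms=DOWN_AFTER_MS))  # True: silent for 39 s
```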
Attentive readers will have noticed that we said "subjective offline" here. Sentinel distinguishes two kinds of offline judgment for the master: subjective offline and objective offline.
So why are there two kinds of judgment, and how do they differ and relate to each other?
Subjective offline and objective offline
Subjective offline: the Sentinel process uses the PING command to check its network connection to the master and slaves and thereby determine the state of each instance.
If Sentinel finds that the master or a slave has timed out responding to PING, it marks that instance as "subjectively offline".
If the instance is a slave, Sentinel simply marks it as "subjectively offline" and stops there, because a slave going offline has little impact and the cluster's external service is not interrupted.
If it is the master, however, Sentinel cannot simply mark it as "subjectively offline" and start a master-slave failover, because Sentinel may have misjudged: the master may not actually have failed.
Moreover, once failover begins, the subsequent election and notification steps incur extra computation and communication overhead. To avoid this unnecessary cost, misjudgments deserve special attention.
Misjudgment
A misjudgment means the master is not actually offline, but Sentinel wrongly concludes that it is.
Misjudgments usually happen when the cluster network is under heavy load or congested, or when the master itself is under heavy load.
Once Sentinel decides the master is offline, it starts selecting a new master and has the slaves synchronize data with it, and that process has its own cost.
For example, Sentinel needs time to select the new master, and the slaves need time to synchronize with it.
In a misjudgment, the master never needed to be replaced, so all of that cost is wasted.
That is why we need to detect misjudgments and reduce them.
How can misjudgments be reduced?
In everyday life, when we have to judge something important, we often talk it over with family or friends before deciding.
Sentinel works similarly: it is usually deployed as a cluster of multiple instances, known as a Sentinel cluster.
Having several Sentinel instances judge together prevents a single Sentinel from wrongly declaring the master offline just because its own network connection is poor.
At the same time, the probability that the networks of several Sentinels are unstable at the same moment is small, so deciding together also lowers the misjudgment rate.
When a Sentinel judges the master to be subjectively offline, it asks the other Sentinels about the master to confirm that it has not misjudged. Only when enough Sentinel instances (at least the configured quorum) judge the master to be "subjectively offline" is the master marked "objectively offline"; the name reflects that the master being offline has become an objective fact.
This threshold is set by the quorum parameter in the Sentinel configuration. For example, with the following Sentinel configuration:
sentinel monitor master 127.0.0.1 6379 2
(where master is the name given to the monitored master, 127.0.0.1 6379 is its address and 2 is the quorum), the master is judged "objectively offline" as soon as two Sentinels consider it offline.
Generally speaking, with N Sentinel instances, it is best to require N/2 + 1 of them (rounded down, i.e. a majority) to judge the master "subjectively offline" before marking it "objectively offline".
This reduces the probability of misjudgment and avoids unnecessary master-slave failovers caused by misjudgments.
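The objective offline decision is a vote count against the quorum. In the sketch below, the dictionary of per-Sentinel opinions is a hypothetical stand-in for the answers a Sentinel collects from its peers (real Sentinels exchange them with the internal SENTINEL is-master-down-by-addr command); the N // 2 + 1 rule is the majority recommended above.

```python
def recommended_quorum(num_sentinels: int) -> int:
    """Majority of N sentinels: N // 2 + 1."""
    return num_sentinels // 2 + 1


def is_objectively_down(sdown_votes: dict[str, bool], quorum: int) -> bool:
    """Objectively offline once at least `quorum` sentinels (including this one) report subjective offline."""
    return sum(sdown_votes.values()) >= quorum


# Hypothetical opinions gathered by sentinel-a about the master.
votes = {"sentinel-a": True, "sentinel-b": True, "sentinel-c": False}
q = recommended_quorum(len(votes))  # 3 // 2 + 1 == 2
print(q, is_objectively_down(votes, q))  # 2 True
```

With three Sentinels, one Sentinel on a bad network cannot trigger failover alone, yet the master is still declared objectively offline if any one other Sentinel agrees.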
With multiple Sentinel instances deciding together, we can determine more accurately whether the master is offline. If it is, Sentinel moves on to the next decision: choosing one of the slaves to become the new master.
This is Sentinel's second task: master election.
How is the new master selected?
After the master goes down, Sentinel needs to pick, from all the slaves of the offline master, one that is healthy and has complete data, and make it the new master.
Once this step is done, the cluster has a new master.
Generally speaking, the process by which Sentinel selects a new master is described as "filtering + scoring".
In short, we first remove the unqualified slaves according to certain filtering conditions.
Then we score the remaining slaves one by one according to certain rules and choose the one with the highest score as the new master.
Let's first look at the screening conditions .
filter
Sentinel All slave libraries under the offline master database will be saved to a list . Then follow the following rules , Screening :
1) First of all, the selected slave must still be online, so any slaves in the list that are offline or disconnected are removed.
However, being online at election time only shows that the slave is healthy right now; it does not mean it is the best candidate for master.
Imagine that during an election a slave is working normally, we choose it as the new master and start using it, but soon afterwards its network fails. We would then have to hold another election, which is clearly not what we want.
Therefore, during the election we need to check not only the slave's current online status but also its past network connection status.
2) Remove from the list every slave that has not replied to Sentinel's INFO command within the last 5 seconds. This ensures that every remaining slave has communicated successfully with Sentinel recently.
3) If a slave has been disconnected from the master for a long time, we have reason to believe its network condition is poor, and it can be filtered out.
How is this judged in practice?
The threshold is derived from a configuration option: down-after-milliseconds * 10.
Here, down-after-milliseconds is the maximum connection timeout after which we consider the master and slave disconnected.
If the master and slave have not communicated over the network within down-after-milliseconds milliseconds, we consider them disconnected. If a slave's link to the master has been down for longer than 10 times down-after-milliseconds, its network condition is poor and it is not suitable as the new master. (In the Redis source, the time the master has already spent in the subjective offline state is also added to this threshold.)
With that, the slaves unsuitable to become master have been filtered out, and filtering is complete.
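The three filtering rules can be expressed as a single list comprehension. The `SlaveInfo` record below is hypothetical, and the sketch uses the simplified threshold down-after-milliseconds * 10 from the text (it does not add the time the master has spent subjectively offline).

```python
from dataclasses import dataclass


@dataclass
class SlaveInfo:
    """Hypothetical snapshot of what Sentinel knows about one slave."""
    run_id: str
    online: bool               # currently connected and not marked down
    ms_since_info_reply: int   # time since the slave last answered INFO
    master_link_down_ms: int   # how long its link to the old master has been down
    priority: int = 100
    repl_offset: int = 0


def filter_candidates(slaves: list[SlaveInfo], down_after_ms: int,
                      info_window_ms: int = 5_000) -> list[SlaveInfo]:
    """Apply the three screening rules described above."""
    max_link_down_ms = down_after_ms * 10
    return [
        s for s in slaves
        if s.online                                     # rule 1: still online
        and s.ms_since_info_reply <= info_window_ms     # rule 2: answered INFO in the last 5 s
        and s.master_link_down_ms <= max_link_down_ms   # rule 3: link to master not down too long
    ]


slaves = [
    SlaveInfo("aaa", True, 1_000, 0),
    SlaveInfo("bbb", False, 1_000, 0),        # offline
    SlaveInfo("ccc", True, 8_000, 0),         # INFO reply too old
    SlaveInfo("ddd", True, 1_000, 400_000),   # link down > 30 s * 10
]
print([s.run_id for s in filter_candidates(slaves, down_after_ms=30_000)])  # ['aaa']
```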
Scoring
The next step is to score the remaining slaves. Scoring proceeds in three rounds according to three rules: slave priority, slave replication progress and slave run ID.
If a single slave scores highest in a round, it becomes the new master and the selection ends. If several slaves tie for the highest score, scoring continues to the next round.
Round 1: the slave with the highest priority scores highest.
The slave-priority configuration option lets us assign different priorities to different slaves, for example:
slave-priority 100
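For example, if you have two slaves with different amounts of memory, you can manually give the instance with more memory a higher priority. Note that a lower slave-priority value means a higher priority, and a value of 0 means the slave is never promoted. Since Redis 5 the option is also available as replica-priority.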
During the election, Sentinel gives high-priority slaves high scores; if one slave has the highest priority, it becomes the new master.
If the slaves have the same priority, Sentinel starts the second round of scoring.
Round 2: the slave with the largest replication offset scores highest.
This rule sorts all slaves that share the same highest priority and selects the one with the largest offset, that is, the one whose data is closest to the old master's, as the new master.
How do we judge how far a slave's synchronization has progressed relative to the old master?
As mentioned in the previous article, master-slave synchronization includes a command propagation phase.
During this phase, the master uses master_repl_offset to record the position of its latest write operation in repl_backlog_buffer, and each slave uses slave_repl_offset to record its current replication progress.
What we are looking for is the slave whose slave_repl_offset is closest to master_repl_offset.
If, among all slaves, one has the slave_repl_offset closest to master_repl_offset, it scores highest and can become the new master.
As shown in the figure below, the old master's master_repl_offset is 1000, and the slave_repl_offset values of slaves 1, 2 and 3 are 950, 990 and 900 respectively, so slave 2 should be selected as the new master.
If two slaves have the same slave_repl_offset (for example, slave 1 and slave 2 both have 990), they must be scored in a third round.
Round 3: the slave with the smallest run ID scores highest.
Every instance has a run ID, which plays the role of a slave number here. Redis currently applies a default rule when selecting the master:
when priority and replication progress are equal, the slave with the smallest run ID scores highest and is selected as the new master.
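The three scoring rounds collapse into one sort key: priority number ascending (lower value means higher priority), replication offset descending, run ID ascending. The `Candidate` record and the shortened run IDs are hypothetical; real run IDs are 40-character hex strings, and Redis compares them case-insensitively, which `.lower()` approximates here.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for a slave that survived filtering."""
    run_id: str
    priority: int     # slave-priority: lower number = preferred
    repl_offset: int  # slave_repl_offset


def pick_new_master(candidates: list[Candidate]) -> Candidate:
    """Round 1: lowest priority number; round 2: largest offset; round 3: smallest run ID."""
    return min(
        candidates,
        key=lambda c: (c.priority, -c.repl_offset, c.run_id.lower()),
    )


cands = [
    Candidate("b1c2", priority=100, repl_offset=990),
    Candidate("a9f0", priority=100, repl_offset=990),  # ties with b1c2 on rounds 1 and 2
    Candidate("0000", priority=100, repl_offset=950),
]
print(pick_new_master(cands).run_id)  # a9f0
```

Encoding the rounds as a tuple key means a later round is consulted only when every earlier round ties, which is exactly the round-by-round behavior described above.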
At this point the new master has been selected and master election is complete.
Let's review the process:
- First, Sentinel filters out the slaves that do not meet the requirements based on their online status and network state;
- Then it scores the remaining slaves by priority, replication progress and run ID, in that order; as soon as one slave scores highest, it is chosen as the new master.
Notification
Once the new master exists, Sentinel performs its last task: notification, that is, having all the remaining slaves replicate from the new master.
It does this by having each slave execute the slaveof command, which connects it to the new master and starts data replication.
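A sketch of the reconfiguration plan, assuming the standard Redis commands: SLAVEOF NO ONE turns the chosen slave into a master, and SLAVEOF <host> <port> points every other instance (including the old master once it is reachable again) at it. The function and the addresses are hypothetical; real Sentinel also wraps these commands with extra steps such as rewriting the instance's configuration.

```python
Addr = tuple[str, int]


def reconfiguration_commands(new_master: Addr, instances: list[Addr]) -> list[tuple[Addr, str]]:
    """Build (target, command) pairs for a failover (hypothetical helper)."""
    host, port = new_master
    cmds: list[tuple[Addr, str]] = [(new_master, "SLAVEOF NO ONE")]  # promote the chosen slave
    for addr in instances:
        if addr != new_master:
            cmds.append((addr, f"SLAVEOF {host} {port}"))  # replicate from the new master
    return cmds


plan = reconfiguration_commands(
    ("10.0.0.2", 6379),
    [("10.0.0.2", 6379), ("10.0.0.3", 6379), ("10.0.0.1", 6379)],  # 10.0.0.1 = old master
)
for target, cmd in plan:
    print(target, cmd)
```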
At the same time, Sentinel notifies clients of the new master's connection information so that they send their requests to the new master. The old, offline master is reconfigured as a slave of the new master once it comes back online.
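One way clients learn about the new master is by subscribing to Sentinel's +switch-master channel, whose message has the form `<master name> <old ip> <old port> <new ip> <new port>` (clients can also ask directly with SENTINEL get-master-addr-by-name). The sketch below only parses such a payload; the subscription itself and the sample addresses are assumptions left out or made up for illustration.

```python
from typing import NamedTuple


class SwitchMaster(NamedTuple):
    master_name: str
    old_addr: tuple[str, int]
    new_addr: tuple[str, int]


def parse_switch_master(payload: str) -> SwitchMaster:
    """Parse a '+switch-master' payload: '<name> <oldip> <oldport> <newip> <newport>'."""
    name, old_ip, old_port, new_ip, new_port = payload.split()
    return SwitchMaster(name, (old_ip, int(old_port)), (new_ip, int(new_port)))


event = parse_switch_master("mymaster 127.0.0.1 6379 127.0.0.1 6380")
print(event.new_addr)  # ('127.0.0.1', 6380)
```

A client would then close its connections to `old_addr` and send subsequent requests to `new_addr`.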
That concludes our analysis of the basic workflow of the Sentinel mechanism.