当前位置:网站首页>8 pits of redis distributed lock
8 pits of redis distributed lock
2022-07-01 03:01:00 【Lingxiyun】
List of articles
In distributed systems , because
Redis Distributed lock
Compared with simpler and more efficient , Became the first of the distributed locks , It is used in many actual business scenarios . But it doesn't mean it's used Redis Distributed lock , You can have a good sleep , If you don't use it well or use it right , It will also lead to some unexpected problems .
Let's talk today Redis Some pits of distributed locks , Give a reference to friends in need .
Non atomic operation
Use Redis Distributed locks for , The first thing that comes to mind may be setNx command
.
if (jedis.setnx(lockKey, val) == 1) {
jedis.expire(lockKey, timeout);
}
Easy to , You can write the code by dividing three by five .
This code can indeed lock successfully , but
Have you found any problems ?
The lock operation is separated from the subsequent setting timeout , It's not an atomic operation .
If the lock is successful , But set up
Timeout failed , The lockKey It will never fail
. IfIn high concurrency scenarios , A large number of lockKey Locked successfully , But it won't fail , May directly lead to redis Memory space is not
foot .
that , Is there a lock command that guarantees atomicity ?
The answer is : Yes , Please see the following .
Forgot to release the lock
It's about using setNx Command lock
Operation and setting timeout are separated , It's not an atomic operation .
And in the Redis There is also set command
, This command can specify multiple parameters .
String result = jedis.set(lockKey, requestId, "NX", "PX", expireTime);
if ("OK".equals(result)) {
return true;
}
return false;
- lockKey: Identification of lock
- requestId: request id
- NX: Only if the bond doesn't exist , To set the key .
- PX: Set the expiration time of the key to millisecond millisecond .
- expireTime: Expiration time
set Command is atomic operation
, Lock and set timeout , One command can be easily done .
Use set Command lock
, There seems to be no problem on the surface . But if you think about it , After lock up , Each time, the timeout period must be reached before releasing the lock , Is it a little unreasonable ?
After the lock , If the lock is not released in time , There will be many problems .
A more reasonable use of distributed locks is :
- Lock by hand
- Business operations
- Release the lock manually
- If the manual release of the lock fails , The timeout period is reached ,redis Will automatically release the lock .
So here comes the question , How to release the lock ?
The pseudocode is as follows :
try{
String result = jedis.set(lockKey, requestId, "NX", "PX", expireTime);
if ("OK".equals(result)) {
return true;
}
return false;
} finally {
unlock(lockKey);
}
You need to catch exceptions of business code , And then in finally Middle release lock . In other words : Whether the code execution succeeds or fails , All need to release the lock .
here , Some friends may ask : If you happen to release the lock , The system was restarted , Or the network is disconnected , Or the computer room is broken , Failure to release the lock will also result ?
This is a good question , Because this small probability problem does exist .
But remember setting the timeout for the lock before ? Even if an exception occurs, the lock release fails , But it's time to set the timeout , The lock will still be Redis Automatic release .
but Only in finally Middle release lock , Is that enough? ?
Released someone else's lock
Be kind to others , Answer the above questions first : Only in finally Middle release lock , Of course, it's not enough , Because the position of releasing the lock , It's still not right .
What's wrong ?
answer :
In a multithreaded scenario , It may happen that someone else's lock is released .
Some friends may retort : Suppose in a multithreaded scenario , Threads A Got the lock , But if the thread A No release lock , here , Threads B You can't get the lock , Why did you release the lock of others ?
answer :
If a thread A And thread B, All use lockKey Lock . Threads A Locked successfully , But because business functions take a long time , The set timeout has been exceeded . Now ,Redis It will be released automatically lockKey lock . here , Threads B Can give lockKey Locked successfully , Next, perform its business operations . Just at this time , Threads A After performing business functions , Next , stay finally Method to release the lock lockKey. This is not a problem , Threads B Lock of , Threaded A The release of the .
This is the time , Threads B Must be crying in the toilet , And there's a good word in your mouth .
that , How to solve this problem ?
I wonder if I noticed ? In the use of set Command lock , Besides using lockKey Lock logo , One more parameter is set :requestId, Why do you need to record requestId Well ?
answer :requestId It's used when releasing the lock .
The pseudocode is as follows :
if (jedis.get(lockKey).equals(requestId)) {
jedis.del(lockKey);
return true;
}
return false;
When releasing the lock , Get the value of the lock first ( The previously set value is requestId), then Judge whether the value is the same as that set before , If it is the same, it is allowed to delete the lock , Return to success . If different , Then directly return to failure .
In other words : I can only release the lock I added , It is not allowed to release the lock added by others .
Why do I use requestId, use userId Not line? ?
answer : If you use userId Words , Not unique to requests , Multiple different requests , May use the same userId. and requestId It's the only thing in the world , There is no confusion of locking and releasing locks .
Besides , Use lua Script , It can also solve the problem of releasing other people's locks :
if redis.call('get', KEYS[1]) == ARGV[1] then
return redis.call('del', KEYS[1])
else
return 0
end
lua The script can ensure that querying whether the lock exists and deleting the lock are atomic operations , It's better to use it to release the lock .
Speaking of lua Script , In fact, locking operation is also recommended lua Script :
if (redis.call('exists', KEYS[1]) == 0) then
redis.call('hset', KEYS[1], ARGV[2], 1);
redis.call('pexpire', KEYS[1], ARGV[1]);
return nil;
end
if (redis.call('hexists', KEYS[1], ARGV[2]) == 1)
redis.call('hincrby', KEYS[1], ARGV[2], 1);
redis.call('pexpire', KEYS[1], ARGV[1]);
return nil;
end;
return redis.call('pttl', KEYS[1]);
This is a Redisson The locking code of the frame , Well written , You can learn from .
Interesting , What other interesting things are there ?
A large number of failed requests
The above locking method seems to be ok , But if you think about it , If there is 1 Wan's request to compete for the lock at the same time , Maybe only one request is successful , The rest 9999 Every request will fail .
In the second kill scene , What's the problem ?
answer : Every time 1 Million requests , Yes 1 A success . Again 1 Million requests , Yes 1 A success . Go on like this , Until inventory is low . This becomes a uniformly distributed second kill , It's different from what you think .
How to solve this problem ?
Besides , There's another scenario :
such as , There are two threads uploading files to sftp, Create a directory before uploading files . Suppose that the directory name to be created by both threads is the date of the day , such as :20210920, If you don't do any control , Create directories directly and concurrently , The second thread is bound to fail .
At this time, some friends may say : It's not easy , Add one more Redis Distributed locks can solve the problem , Besides, judge again , If the directory already exists, do not create , You need to create a directory only if it does not exist .
The pseudocode is as follows :
try {
String result = jedis.set(lockKey, requestId, "NX", "PX", expireTime);
if ("OK".equals(result)) {
if(!exists(path)) {
mkdir(path);
}
return true;
}
} finally{
unlock(lockKey,requestId);
}
return false;
Everything seems beautiful , But it can't stand careful deliberation .
A question from the soul : If the second request fails to lock , Next , It's a return failure , Or return to success ?
Obviously, the second request , It must not return failure , If the return fails , This problem is still not solved .
If the file has not been uploaded successfully , Returning directly to success will have a bigger problem . Have a headache , How to solve it ?
answer : Use
spinlocks
.
try {
Long start = System.currentTimeMillis();
while(true) {
String result = jedis.set(lockKey, requestId, "NX", "PX", expireTime);
if ("OK".equals(result)) {
if(!exists(path)) {
mkdir(path);
}
return true;
}
long time = System.currentTimeMillis() - start;
if (time>=timeout) {
return false;
}
try {
Thread.sleep(50);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
} finally{
unlock(lockKey,requestId);
}
return false;
At the appointed time , such as 500 In milliseconds , Spin keeps trying to lock ( To put it bluntly , It's in a dead circle , Keep trying to lock ), If successful, go straight back to . If you fail , Then sleep 50 millisecond , Launch a new round of attempts . If the timeout expires , Not locked yet , Then directly return to failure .
ok , Learned a move , anything else ?
Lock reentry problem
We all know Redis Distributed locks are mutually exclusive
. If for a key Locked , If it's time to key The corresponding lock hasn't failed yet , Use the same key Lock it up , Probably fail .
you 're right , Most scenes are OK .
Why most scenes ?
Because there are such scenes :
Suppose in a request , You need to get a menu tree or classification tree that meets the conditions . With
menu
For example , This requires starting from the root node in the interface , Recursively traverse all the child nodes that meet the conditions , Then assemble it into a menu tree .It should be noted that the menu is not static , Operating students in the background system can dynamically add 、 Modify and delete menus . To ensure that in the case of concurrency , It is possible to obtain the latest data every time , You can add
Redis Distributed lock
.Add
Redis Distributed lock
The idea is right . But then came the question ,Recursively traverses multiple times in a recursive method , The same lock is added every time . Of course, the first layer of recursion can lock successfully , But the second layer of recursion 、 The third level ... The first N layer , No, it will fail to lock ?
The pseudo code of locking in the recursive method is as follows :
private int expireTime = 1000;
public void fun(int level,String lockKey,String requestId){
try{
String result = jedis.set(lockKey, requestId, "NX", "PX", expireTime);
if ("OK".equals(result)) {
if(level<=10){
this.fun(++level,lockKey,requestId);
} else {
return;
}
}
return;
} finally {
unlock(lockKey,requestId);
}
}
If you use it directly , Looks like there's no problem . But after the final execution of the program, it was found that , There is only one result waiting for you : Something unusual happened
.
because Start at the root node , The first layer of recursive locking succeeded , The lock hasn't been released yet , It goes directly to the second level of recursion . Because the lock is called lockKey, And the value of requestId The lock already exists , Therefore, the second level recursion probability will fail to lock , Then return to the first layer . The first layer then releases the lock normally , Then the whole recursive method directly returns .
Now , You know what's wrong ?
you 're right , The recursive method actually only performs the first level of recursion and returns , Other layer recursion failed due to locking , There's no way to execute .
So how to solve this problem ?
answer : Use
Reentrant lock
.
With Redisson Framework for example , Its interior realizes the function of reentrant lock
.
thus it can be seen ,Redisson stay Redis The Jianghu status of distributed lock is very high .
The pseudocode is as follows :
private int expireTime = 1000;
public void run(String lockKey) {
RLock lock = redisson.getLock(lockKey);
this.fun(lock,1);
}
public void fun(RLock lock,int level){
try{
lock.lock(5, TimeUnit.SECONDS);
if(level<=10){
this.fun(lock,++level);
} else {
return;
}
} finally {
lock.unlock();
}
}
The code above may not be perfect , Here is just a general idea , If you have such needs , The above code is for reference only .
Next , Chat Redisson Implementation principle of reentrant lock .
Locking is mainly realized through the following script :
if (redis.call('exists', KEYS[1]) == 0)
then
redis.call('hset', KEYS[1], ARGV[2], 1); redis.call('pexpire', KEYS[1], ARGV[1]);
return nil;
end;
if (redis.call('hexists', KEYS[1], ARGV[2]) == 1)
then
redis.call('hincrby', KEYS[1], ARGV[2], 1);
redis.call('pexpire', KEYS[1], ARGV[1]);
return nil;
end;
return redis.call('pttl', KEYS[1]);
- KEYS[1]: Lock name
- ARGV[1]: Expiration time
- ARGV[2]:uuid + “:” + threadId, May be considered as requestId
- First judge if the lock name does not exist , Lock it .
- Next , Judge if the lock name and requestId Values exist , Then use hincrby The command gives the lock name and requestId The value of count , Every time 1. Pay attention to the , Here is the key to re-enter the lock , If the lock is re entered once, the value will be increased 1.
- If the lock name exists , But it's not worth it requestId, The expiration time is returned .
Releasing the lock is mainly realized through the following script :
if (redis.call('hexists', KEYS[1], ARGV[3]) == 0)
then
return nil
end
local counter = redis.call('hincrby', KEYS[1], ARGV[3], -1);
if (counter > 0)
then
redis.call('pexpire', KEYS[1], ARGV[2]);
return 0;
else
redis.call('del', KEYS[1]);
redis.call('publish', KEYS[2], ARGV[1]);
return 1;
end;
return nil
- First judge if the lock name and requestId Value does not exist , Then return directly .
- If the lock name and requestId The value is , Then re-enter the lock reduction 1.
- If less 1 after , Reentry lock value The value is also greater than 0, Instructions and references , Then retry setting the expiration time .
- If less 1 after , Reentry lock value The value is also equal to 0, You can delete the lock , Then send a message to inform the waiting thread to grab the lock .
Again , If the system can tolerate temporary data inconsistency , Some scenes can be unlocked , Here's just an example , This section does not apply to all scenarios .
Lock competition problem
If there are a large number of business scenarios that need to write data , Use ordinary redis There is no problem with distributed locks .
But if Some business scenarios , Fewer write operations , Instead, there are a lot of read operations . In this way, you can directly use ordinary Redis Distributed lock , Will it be a bit of a waste of performance ?
We all know , The coarser the granularity of the lock , The more intense the competition when multiple threads grab locks , The longer it takes to cause multiple thread locks to wait , The worse the performance .
therefore , promote Redis The first step in distributed lock performance , Is to reduce the granularity of the lock
.
Read-write lock
as everyone knows , The purpose of locking is to ensure , Security of reading and writing data in concurrent environment , That is, there will be no data errors or inconsistencies .
But in most actual business scenarios , Generally, the frequency of reading data is much higher than that of writing data . Concurrent read operations between threads do not involve concurrency security , We don't need to add mutexes to read operations , Just read and write 、 Write concurrent operations are mutually exclusive , This can improve the performance of the system .
With Redisson Framework for example , It has realized the function of read-write lock .
Read the lock
The pseudo-code is as follows :
RReadWriteLock readWriteLock = redisson.getReadWriteLock("readWriteLock");
RLock rLock = readWriteLock.readLock();
try {
rLock.lock();
// Business operations
} catch (Exception e) {
log.error(e);
} finally {
rLock.unlock();
}
Write lock
The pseudo-code is as follows :
RReadWriteLock readWriteLock = redisson.getReadWriteLock("readWriteLock");
RLock rLock = readWriteLock.writeLock();
try {
rLock.lock();
// Business operations
} catch (InterruptedException e) {
log.error(e);
} finally {
rLock.unlock();
}
Separate the read lock from the write lock , The biggest benefit is to improve the performance of read operations , Because reading and reading are shared , There is no mutex
. But in the actual business scenario , Most data operations are read operations . therefore , If you improve the performance of read operations , It will also improve the performance of the whole lock .
The following summarizes the characteristics of a read-write lock :
- Reading and reading are shared , Not mutually exclusive
- Read and write are mutually exclusive
- Write and write are mutually exclusive
Lock segment
Besides , In order to reduce the granularity of locks , The more common way is to lock the big lock : piecewise
.
stay java in ConcurrentHashMap
, Is to divide the data into 16 paragraph
, Each section has a separate lock , And the data in different lock segments do not interfere with each other , In order to improve the performance of the lock .
Put it in the actual business scenario , You can do this :
For example, in the scenario of second kill and inventory deduction , There are... In stock now 2000 A commodity , Users can second kill . To prevent oversold , Usually , You can lock the inventory . If there is 1W Users compete for the same lock , Obviously, the system throughput will be very low .
In order to improve system performance , Sure
Segment inventory
, such as :It is divided into 100 paragraph , So each paragraph has 20 A commodity can participate in the second kill
.In the process of second kill ,
First put the user id obtain hash value , Then divide by 100 modulus . Die for 1 User access to page 1 Segment inventory , Die for 2 User access to page 2 Segment inventory , Die for 3 User access to page 3 Segment inventory , And so on , In the end, the module is 100 User access to page 100 Segment inventory
.
In this way , In a multithreaded environment , It can greatly reduce lock conflicts . Previously, multiple threads could only compete at the same time 1 The lock , Especially in the second kill scene , The competition is so fierce , It can be described as terrible , The consequence is that most threads are waiting for the lock . Now multiple threads compete at the same time 100 The lock , Fewer waiting threads , Thus, the system throughput is improved .
notes : Segmenting the lock can improve the performance of the system , But it will also increase the complexity of the system a lot . Because it requires the introduction of additional
Routing algorithm , Cross section statistics
And so on . In a real business scenario , It needs to be considered comprehensively , Not that the lock must be segmented .
Lock timeout problem
As mentioned earlier , If the thread A Locked successfully , But because business functions take a long time , The set timeout has been exceeded , Now Redis Will automatically release the thread A Lock added .
Some friends may say : It's time out , When the lock is released, it is released , It doesn't affect the function .
answer : wrong , wrong , wrong . It actually has an impact on the function .
Usually, the purpose of locking is : To prevent access to critical resources , Abnormal data occurs
. such as : Threads A Modifying data C Value , Threads B Also modifying data C Value , If you don't control , In the case of concurrency , data C The value of will go wrong
.
In order to ensure a certain method , Or the mutuality of the code , That is, if the thread A Executed a piece of code , Other threads are not allowed to execute at the same time at a certain time , It can be used synchronized
Keyword lock .
But this kind of lock has great limitations , Only the mutual exclusion of a single node can be guaranteed . If you need to maintain mutual exclusion in multiple nodes , It needs to be used. Redis Distributed lock
.
So much bedding has been done , Now back to the point .
Assuming that thread A Add Redis Distributed lock
Code for , Include code 1 And code 2 Two pieces of code .
Because the business operations to be performed by this thread are very time-consuming , The program is executing the code 1 When the , The set timeout has expired ,Redis The lock is automatically released . The code 2 It hasn't been executed yet .
here , Code 2 Equivalent to running naked , Mutual exclusion cannot be guaranteed . If it accesses critical resources , And other threads also access the resource , Data exceptions may occur .( Access critical resources , It doesn't just mean reading , It also includes writing
)
that , How to solve this problem ?
answer : If the timeout is reached , But the business code has not been executed yet , Need to renew the lock automatically .
have access to TimerTask
class , To realize the function of automatic renewal :
Timer timer = new Timer();
timer.schedule(new TimerTask() {
@Override
public void run(Timeout timeout) throws Exception {
// Automatic renewal logic
}
}, 10000, TimeUnit.MILLISECONDS);
After getting the lock , Automatically start a scheduled task , every other 10 Second , Automatically refresh one expiration time . This mechanism is in Redisson frame
in , There is a more domineering name :watch dog
, The legendary watchdog .
Of course, the automatic renewal function , It is preferred to use lua Script
Realization , such as :
if (redis.call('hexists', KEYS[1], ARGV[2]) == 1) then
redis.call('pexpire', KEYS[1], ARGV[1]);
return 1;
end;
return 0;
What should be noted is that : When implementing the automatic renewal function , You also need to set a total expiration time , Can follow Redisson bring into correspondence with , Set to 30 second . If the business code reaches the total expiration time , It's not finished , No automatic renewal .
The function of automatic renewal is to open a scheduled task after obtaining the lock , every other 10 Seconds to determine whether the lock exists , If there is , Then refresh the expiration time . If you renew 3 Time , That is to say 30 Seconds later , The business method is still not finished , It won't be renewed .
The problem of master-slave replication
The content that has spent so much time on the introduction , For single Redis There is no problem with examples .
but, If Redis There are multiple instances . such as : I've been a master and a slave , Or sentinel mode , be based on Redis The function of distributed lock , There will be problems
.
What's the problem ?
hypothesis redis Now use the master-slave mode ,1 individual master node ,3 individual slave node .master The node is responsible for writing data ,slave The node is responsible for reading data .
It was supposed to be harmonious coexistence , It's all right .Redis Lock operation , All in master on , After locking successfully , Then asynchronously synchronize to all slave.
And then one day ,master Nodes for some irreversible reason , Hang up
.
such I need to find one slave Upgrade to new master node
, If slave1 Was elected .
If There is a lock A It's sad , Just locked successfully master I hung up. , I haven't had time to sync to slave1
.
This will lead to new master Locks in nodes A lost . Back , If there is a new thread , Using locks A Lock , You can still succeed , Distributed lock failed .
that , How to solve this problem ?
answer :Redisson Framework to solve this problem , Provides a specialized class :RedissonRedLock, Used Redlock Algorithm .
RedissonRedLock The idea to solve the problem is as follows :
- We need to build several sets of independent Redis Environmental Science , If we build here 5 set .
- Each environment has a
Redisson node
node .- Multiple
Redisson node
Nodes make upRedissonRedLock
.- Environment contains : stand-alone 、 Master-slave 、 Sentinel and cluster mode , It can be one or more mixtures .
Take master-slave as an example , The architecture is as follows :
RedissonRedLock
The locking process is as follows :
- Get all
Redisson node
Node information , Loop to allRedisson node
Node locking , hypothesisThe number of nodes is N
, In the exampleN be equal to 5
.- If in N Among the nodes , Yes
N/2 + 1
Node locking succeeded , So the wholeRedissonRedLock Lock
yessuccess
Of .- If in N Among the nodes ,
Less than N/2 + 1
Nodes locked successfully , So the wholeRedissonRedLock Lock
yesFailure
Of .- If the total time of locking each node is found in the middle , Greater than or equal to the set maximum waiting time , Then directly return to failure .
As can be seen from the above , Use Redlock Algorithm
, It can really solve the problem of , If master The node is hung , Problems that cause distributed locks to fail .
But it also raises some new problems , such as :
- Need to build more than one environment , Apply for more resources , We need to evaluate the cost and cost performance .
- If there is N individual redisson node node , It needs to be locked N Time , At least it needs to be locked N/2+1 Time , not have understood until then redlock Whether the locking is successful . obviously , Added additional time cost , It's a bit of a loss .
thus it can be seen , In the actual business scenario , Especially in highly concurrent business ,RedissonRedLock
In fact, it is not used much .
In a distributed environment ,CAP It's impossible to get around .
CAP In a distributed system :
- Uniformity (Consistency)
- Usability (Availability)
- Partition tolerance (Partition tolerance)
These three elements can only achieve two points at most , It's impossible to combine the three .
If the actual business scenario , What is more needed is to ensure data Uniformity
. Then please Use CP Type of distributed lock
, such as :zookeeper
, It is disk based , The performance may not be that good , But data is generally not lost .
If your actual business scenario , What is more needed is to ensure high data Usability
. Then please Use AP Type of distributed lock
, such as :Redis
, It's memory based , Good performance , But there's a risk of losing data .
Actually , In most of our distributed business scenarios , Use Redis Distributed locks are enough , Really, don't be too serious . Because the data is inconsistent , It can be solved through the final consistency solution . But if the system is not available , For users, it is critical hit with 10000 damage .
边栏推荐
- Scale SVG to container without mask / crop
- Dart training and sphygmomanometer inflation pump power control DPC
- 最新接口自动化面试题
- [applet project development -- Jingdong Mall] classified navigation area of uni app
- Mnasnet learning notes
- ssh配置免密登录时报错:/usr/bin/ssh-copy-id: ERROR: No identities found 解决方法
- 股票开户安全吗?上海股票开户步骤。
- So easy 将程序部署到服务器
- Servlet [first introduction]
- 鼠标悬停效果一
猜你喜欢
产业互联网中,「小」程序有「大」作为
[small program project development -- Jingdong Mall] the home page commodity floor of uni app
Record a service deployment failure troubleshooting
Restcloud ETL WebService data synchronization to local
So easy 将程序部署到服务器
Share Creators萌芽人才培養計劃來了!
Youmeng (a good helper for real-time monitoring of software exceptions: crash) access tutorial (the easiest tutorial for Xiaobai with some foundation)
第03章_用戶與權限管理
【机器学习】向量化计算 -- 机器学习路上必经路
彻底解决Lost connection to MySQL server at ‘reading initial communication packet
随机推荐
Evaluation of the entry-level models of 5 mainstream smart speakers: apple, Xiaomi, Huawei, tmall, Xiaodu, who is better?
Introduction to core functions of webrtc -- an article on understanding SDP PlanB unifiedplan (migrating from PlanB to unifiedplan)
Mouse over effect 8
Scale SVG to container without mask / crop
JS to find duplicate elements in two arrays
Multithreaded printing
Od modify DLL and exe pop-up contents [OllyDbg]
通信协议——分类及其特征介绍
Résumé des styles de développement d'applets Wechat
MySQL index --01--- design principle of index
Detailed explanation of pointer array and array pointer (comprehensive knowledge points)
servlet【初识】
STM32——一线协议之DS18B20温度采样
Ipmitool download address and possible problems during compilation and installation
PTA 1016
Mouse over effect IV
鼠标悬停效果五
Restcloud ETL WebService data synchronization to local
Xception learning notes
【PR #5 A】双向奔赴(状压DP)