当前位置：网站首页>8 pits of redis distributed lock

8 pits of redis distributed lock

2022-07-01 03:01:00 【Lingxiyun】

List of articles

Non atomic operation
Forgot to release the lock
Released someone else's lock
A large number of failed requests
Lock reentry problem
Lock competition problem
- Read-write lock
- Lock segment
Lock timeout problem
The problem of master-slave replication

In distributed systems , because Redis Distributed lock Compared with simpler and more efficient , Became the first of the distributed locks , It is used in many actual business scenarios .

But it doesn't mean it's used Redis Distributed lock , You can have a good sleep , If you don't use it well or use it right , It will also lead to some unexpected problems .

Let's talk today Redis Some pits of distributed locks , Give a reference to friends in need .

Non atomic operation

Use Redis Distributed locks for , The first thing that comes to mind may be setNx command .

if (jedis.setnx(lockKey, val) == 1) {
    
   jedis.expire(lockKey, timeout);
}

Easy to , You can write the code by dividing three by five .

This code can indeed lock successfully , but Have you found any problems ？
The lock operation is separated from the subsequent setting timeout , It's not an atomic operation .
If the lock is successful , But set up Timeout failed , The lockKey It will never fail . If In high concurrency scenarios , A large number of lockKey Locked successfully , But it won't fail , May directly lead to redis Memory space is not foot .

that , Is there a lock command that guarantees atomicity ？

The answer is ： Yes , Please see the following .

Forgot to release the lock

It's about using setNx Command lock Operation and setting timeout are separated , It's not an atomic operation .

And in the Redis There is also set command , This command can specify multiple parameters .

String result = jedis.set(lockKey, requestId, "NX", "PX", expireTime);
if ("OK".equals(result)) {
    
    return true;
}
return false;

lockKey： Identification of lock
requestId： request id
NX： Only if the bond doesn't exist , To set the key .
PX： Set the expiration time of the key to millisecond millisecond .
expireTime： Expiration time

set Command is atomic operation , Lock and set timeout , One command can be easily done .

Use set Command lock , There seems to be no problem on the surface . But if you think about it , After lock up , Each time, the timeout period must be reached before releasing the lock , Is it a little unreasonable ？ After the lock , If the lock is not released in time , There will be many problems .

A more reasonable use of distributed locks is ：

Lock by hand
Business operations
Release the lock manually
If the manual release of the lock fails , The timeout period is reached ,redis Will automatically release the lock .

So here comes the question , How to release the lock ？

The pseudocode is as follows ：

try{
    
  String result = jedis.set(lockKey, requestId, "NX", "PX", expireTime);
  if ("OK".equals(result)) {
    
      return true;
  }
  return false;
} finally {
    
    unlock(lockKey);
}

You need to catch exceptions of business code , And then in finally Middle release lock . In other words ： Whether the code execution succeeds or fails , All need to release the lock .

here , Some friends may ask ： If you happen to release the lock , The system was restarted , Or the network is disconnected , Or the computer room is broken , Failure to release the lock will also result ？

This is a good question , Because this small probability problem does exist .

But remember setting the timeout for the lock before ？ Even if an exception occurs, the lock release fails , But it's time to set the timeout , The lock will still be Redis Automatic release .

but Only in finally Middle release lock , Is that enough? ？

Released someone else's lock

Be kind to others , Answer the above questions first ： Only in finally Middle release lock , Of course, it's not enough , Because the position of releasing the lock , It's still not right .

What's wrong ？

answer ： In a multithreaded scenario , It may happen that someone else's lock is released .

Some friends may retort ： Suppose in a multithreaded scenario , Threads A Got the lock , But if the thread A No release lock , here , Threads B You can't get the lock , Why did you release the lock of others ？

answer ： If a thread A And thread B, All use lockKey Lock . Threads A Locked successfully , But because business functions take a long time , The set timeout has been exceeded . Now ,Redis It will be released automatically lockKey lock . here , Threads B Can give lockKey Locked successfully , Next, perform its business operations . Just at this time , Threads A After performing business functions , Next , stay finally Method to release the lock lockKey. This is not a problem , Threads B Lock of , Threaded A The release of the .

This is the time , Threads B Must be crying in the toilet , And there's a good word in your mouth .

that , How to solve this problem ？

I wonder if I noticed ？ In the use of set Command lock , Besides using lockKey Lock logo , One more parameter is set ：requestId, Why do you need to record requestId Well ？

answer ：requestId It's used when releasing the lock .

The pseudocode is as follows ：

if (jedis.get(lockKey).equals(requestId)) {
    
    jedis.del(lockKey);
    return true;
}
return false;

When releasing the lock , Get the value of the lock first （ The previously set value is requestId）, then Judge whether the value is the same as that set before , If it is the same, it is allowed to delete the lock , Return to success . If different , Then directly return to failure .

In other words ： I can only release the lock I added , It is not allowed to release the lock added by others .

Why do I use requestId, use userId Not line? ？

answer ： If you use userId Words , Not unique to requests , Multiple different requests , May use the same userId. and requestId It's the only thing in the world , There is no confusion of locking and releasing locks .

Besides , Use lua Script , It can also solve the problem of releasing other people's locks ：

if redis.call('get', KEYS[1]) == ARGV[1] then 
 return redis.call('del', KEYS[1]) 
else 
  return 0 
end

lua The script can ensure that querying whether the lock exists and deleting the lock are atomic operations , It's better to use it to release the lock .

Speaking of lua Script , In fact, locking operation is also recommended lua Script ：

if (redis.call('exists', KEYS[1]) == 0) then
    redis.call('hset', KEYS[1], ARGV[2], 1); 
    redis.call('pexpire', KEYS[1], ARGV[1]); 
 return nil; 
end
if (redis.call('hexists', KEYS[1], ARGV[2]) == 1)
   redis.call('hincrby', KEYS[1], ARGV[2], 1); 
   redis.call('pexpire', KEYS[1], ARGV[1]); 
  return nil; 
end; 
return redis.call('pttl', KEYS[1]);

This is a Redisson The locking code of the frame , Well written , You can learn from .

Interesting , What other interesting things are there ？

A large number of failed requests

The above locking method seems to be ok , But if you think about it , If there is 1 Wan's request to compete for the lock at the same time , Maybe only one request is successful , The rest 9999 Every request will fail .

In the second kill scene , What's the problem ？

answer ： Every time 1 Million requests , Yes 1 A success . Again 1 Million requests , Yes 1 A success . Go on like this , Until inventory is low . This becomes a uniformly distributed second kill , It's different from what you think .

How to solve this problem ？

Besides , There's another scenario ：

such as , There are two threads uploading files to sftp, Create a directory before uploading files . Suppose that the directory name to be created by both threads is the date of the day , such as ：20210920, If you don't do any control , Create directories directly and concurrently , The second thread is bound to fail .

At this time, some friends may say ： It's not easy , Add one more Redis Distributed locks can solve the problem , Besides, judge again , If the directory already exists, do not create , You need to create a directory only if it does not exist .

The pseudocode is as follows ：

try {
    
  String result = jedis.set(lockKey, requestId, "NX", "PX", expireTime);
  if ("OK".equals(result)) {
    
    if(!exists(path)) {
    
       mkdir(path);
    }
    return true;
  }
} finally{
    
    unlock(lockKey,requestId);
}  
return false;

Everything seems beautiful , But it can't stand careful deliberation .

A question from the soul ： If the second request fails to lock , Next , It's a return failure , Or return to success ？

Obviously, the second request , It must not return failure , If the return fails , This problem is still not solved . If the file has not been uploaded successfully , Returning directly to success will have a bigger problem . Have a headache , How to solve it ？

answer ： Use spinlocks .

try {
    
  Long start = System.currentTimeMillis();
  while(true) {
    
     String result = jedis.set(lockKey, requestId, "NX", "PX", expireTime);
     if ("OK".equals(result)) {
    
        if(!exists(path)) {
    
           mkdir(path);
        }
        return true;
     }
     
     long time = System.currentTimeMillis() - start;
      if (time>=timeout) {
    
          return false;
      }
      try {
    
          Thread.sleep(50);
      } catch (InterruptedException e) {
    
          e.printStackTrace();
      }
  }
} finally{
    
    unlock(lockKey,requestId);
}  
return false;

At the appointed time , such as 500 In milliseconds , Spin keeps trying to lock （ To put it bluntly , It's in a dead circle , Keep trying to lock ）, If successful, go straight back to . If you fail , Then sleep 50 millisecond , Launch a new round of attempts . If the timeout expires , Not locked yet , Then directly return to failure .

ok , Learned a move , anything else ？

Lock reentry problem

We all know Redis Distributed locks are mutually exclusive . If for a key Locked , If it's time to key The corresponding lock hasn't failed yet , Use the same key Lock it up , Probably fail .

you 're right , Most scenes are OK .

Why most scenes ？

Because there are such scenes ：

Suppose in a request , You need to get a menu tree or classification tree that meets the conditions . With menu For example , This requires starting from the root node in the interface , Recursively traverse all the child nodes that meet the conditions , Then assemble it into a menu tree .
It should be noted that the menu is not static , Operating students in the background system can dynamically add 、 Modify and delete menus . To ensure that in the case of concurrency , It is possible to obtain the latest data every time , You can add Redis Distributed lock .
Add Redis Distributed lock The idea is right . But then came the question , Recursively traverses multiple times in a recursive method , The same lock is added every time . Of course, the first layer of recursion can lock successfully , But the second layer of recursion 、 The third level ... The first N layer , No, it will fail to lock ？

The pseudo code of locking in the recursive method is as follows ：

private int expireTime = 1000;

public void fun(int level,String lockKey,String requestId){
    
  try{
    
     String result = jedis.set(lockKey, requestId, "NX", "PX", expireTime);
     if ("OK".equals(result)) {
    
        if(level<=10){
    
           this.fun(++level,lockKey,requestId);
        } else {
    
           return;
        }
     }
     return;
  } finally {
    
     unlock(lockKey,requestId);
  }
}

If you use it directly , Looks like there's no problem . But after the final execution of the program, it was found that , There is only one result waiting for you ： Something unusual happened .

because Start at the root node , The first layer of recursive locking succeeded , The lock hasn't been released yet , It goes directly to the second level of recursion . Because the lock is called lockKey, And the value of requestId The lock already exists , Therefore, the second level recursion probability will fail to lock , Then return to the first layer . The first layer then releases the lock normally , Then the whole recursive method directly returns .

Now , You know what's wrong ？

you 're right , The recursive method actually only performs the first level of recursion and returns , Other layer recursion failed due to locking , There's no way to execute .

So how to solve this problem ？

answer ： Use Reentrant lock .

With Redisson Framework for example , Its interior realizes the function of reentrant lock .

thus it can be seen ,Redisson stay Redis The Jianghu status of distributed lock is very high .

The pseudocode is as follows ：

private int expireTime = 1000;

public void run(String lockKey) {
    
  RLock lock = redisson.getLock(lockKey);
  this.fun(lock,1);
}

public void fun(RLock lock,int level){
    
  try{
    
      lock.lock(5, TimeUnit.SECONDS);
      if(level<=10){
    
         this.fun(lock,++level);
      } else {
    
         return;
      }
  } finally {
    
     lock.unlock();
  }
}

The code above may not be perfect , Here is just a general idea , If you have such needs , The above code is for reference only .

Next , Chat Redisson Implementation principle of reentrant lock .

Locking is mainly realized through the following script ：

if (redis.call('exists', KEYS[1]) == 0) 
then  
   redis.call('hset', KEYS[1], ARGV[2], 1);        redis.call('pexpire', KEYS[1], ARGV[1]); 
   return nil; 
end;
if (redis.call('hexists', KEYS[1], ARGV[2]) == 1) 
then  
  redis.call('hincrby', KEYS[1], ARGV[2], 1); 
  redis.call('pexpire', KEYS[1], ARGV[1]); 
  return nil; 
end;
return redis.call('pttl', KEYS[1]);

KEYS[1]： Lock name
ARGV[1]： Expiration time
ARGV[2]：uuid + “:” + threadId, May be considered as requestId

First judge if the lock name does not exist , Lock it .
Next , Judge if the lock name and requestId Values exist , Then use hincrby The command gives the lock name and requestId The value of count , Every time 1. Pay attention to the , Here is the key to re-enter the lock , If the lock is re entered once, the value will be increased 1.
If the lock name exists , But it's not worth it requestId, The expiration time is returned .

Releasing the lock is mainly realized through the following script ：

if (redis.call('hexists', KEYS[1], ARGV[3]) == 0) 
then 
  return nil
end
local counter = redis.call('hincrby', KEYS[1], ARGV[3], -1);
if (counter > 0) 
then 
    redis.call('pexpire', KEYS[1], ARGV[2]); 
    return 0; 
 else 
   redis.call('del', KEYS[1]); 
   redis.call('publish', KEYS[2], ARGV[1]); 
   return 1; 
end; 
return nil

First judge if the lock name and requestId Value does not exist , Then return directly .
If the lock name and requestId The value is , Then re-enter the lock reduction 1.
If less 1 after , Reentry lock value The value is also greater than 0, Instructions and references , Then retry setting the expiration time .
If less 1 after , Reentry lock value The value is also equal to 0, You can delete the lock , Then send a message to inform the waiting thread to grab the lock .

Again , If the system can tolerate temporary data inconsistency , Some scenes can be unlocked , Here's just an example , This section does not apply to all scenarios .

Lock competition problem

If there are a large number of business scenarios that need to write data , Use ordinary redis There is no problem with distributed locks .

But if Some business scenarios , Fewer write operations , Instead, there are a lot of read operations . In this way, you can directly use ordinary Redis Distributed lock , Will it be a bit of a waste of performance ？

We all know , The coarser the granularity of the lock , The more intense the competition when multiple threads grab locks , The longer it takes to cause multiple thread locks to wait , The worse the performance .

therefore , promote Redis The first step in distributed lock performance , Is to reduce the granularity of the lock .

Read-write lock

as everyone knows , The purpose of locking is to ensure , Security of reading and writing data in concurrent environment , That is, there will be no data errors or inconsistencies .

But in most actual business scenarios , Generally, the frequency of reading data is much higher than that of writing data . Concurrent read operations between threads do not involve concurrency security , We don't need to add mutexes to read operations , Just read and write 、 Write concurrent operations are mutually exclusive , This can improve the performance of the system .

With Redisson Framework for example , It has realized the function of read-write lock .

Read the lock The pseudo-code is as follows ：

RReadWriteLock readWriteLock = redisson.getReadWriteLock("readWriteLock");
RLock rLock = readWriteLock.readLock();
try {
    
    rLock.lock();
    // Business operations 
} catch (Exception e) {
    
    log.error(e);
} finally {
    
    rLock.unlock();
}

Write lock The pseudo-code is as follows ：

RReadWriteLock readWriteLock = redisson.getReadWriteLock("readWriteLock");
RLock rLock = readWriteLock.writeLock();
try {
    
    rLock.lock();
    // Business operations 
} catch (InterruptedException e) {
    
   log.error(e);
} finally {
    
    rLock.unlock();
}

Separate the read lock from the write lock , The biggest benefit is to improve the performance of read operations , Because reading and reading are shared , There is no mutex . But in the actual business scenario , Most data operations are read operations . therefore , If you improve the performance of read operations , It will also improve the performance of the whole lock .

The following summarizes the characteristics of a read-write lock ：

Reading and reading are shared , Not mutually exclusive
Read and write are mutually exclusive
Write and write are mutually exclusive

Lock segment

Besides , In order to reduce the granularity of locks , The more common way is to lock the big lock ： piecewise .

stay java in ConcurrentHashMap, Is to divide the data into 16 paragraph , Each section has a separate lock , And the data in different lock segments do not interfere with each other , In order to improve the performance of the lock .

Put it in the actual business scenario , You can do this ：

For example, in the scenario of second kill and inventory deduction , There are... In stock now 2000 A commodity , Users can second kill . To prevent oversold , Usually , You can lock the inventory . If there is 1W Users compete for the same lock , Obviously, the system throughput will be very low .
In order to improve system performance , Sure Segment inventory , such as ： It is divided into 100 paragraph , So each paragraph has 20 A commodity can participate in the second kill .
In the process of second kill , First put the user id obtain hash value , Then divide by 100 modulus . Die for 1 User access to page 1 Segment inventory , Die for 2 User access to page 2 Segment inventory , Die for 3 User access to page 3 Segment inventory , And so on , In the end, the module is 100 User access to page 100 Segment inventory .

In this way , In a multithreaded environment , It can greatly reduce lock conflicts . Previously, multiple threads could only compete at the same time 1 The lock , Especially in the second kill scene , The competition is so fierce , It can be described as terrible , The consequence is that most threads are waiting for the lock . Now multiple threads compete at the same time 100 The lock , Fewer waiting threads , Thus, the system throughput is improved .

notes ： Segmenting the lock can improve the performance of the system , But it will also increase the complexity of the system a lot . Because it requires the introduction of additional Routing algorithm , Cross section statistics And so on . In a real business scenario , It needs to be considered comprehensively , Not that the lock must be segmented .

Lock timeout problem

As mentioned earlier , If the thread A Locked successfully , But because business functions take a long time , The set timeout has been exceeded , Now Redis Will automatically release the thread A Lock added .

Some friends may say ： It's time out , When the lock is released, it is released , It doesn't affect the function .
answer ： wrong , wrong , wrong . It actually has an impact on the function .

Usually, the purpose of locking is ： To prevent access to critical resources , Abnormal data occurs . such as ： Threads A Modifying data C Value , Threads B Also modifying data C Value , If you don't control , In the case of concurrency , data C The value of will go wrong .

In order to ensure a certain method , Or the mutuality of the code , That is, if the thread A Executed a piece of code , Other threads are not allowed to execute at the same time at a certain time , It can be used synchronized Keyword lock .

But this kind of lock has great limitations , Only the mutual exclusion of a single node can be guaranteed . If you need to maintain mutual exclusion in multiple nodes , It needs to be used. Redis Distributed lock .

So much bedding has been done , Now back to the point .

Assuming that thread A Add Redis Distributed lock Code for , Include code 1 And code 2 Two pieces of code .

Because the business operations to be performed by this thread are very time-consuming , The program is executing the code 1 When the , The set timeout has expired ,Redis The lock is automatically released . The code 2 It hasn't been executed yet .

here , Code 2 Equivalent to running naked , Mutual exclusion cannot be guaranteed . If it accesses critical resources , And other threads also access the resource , Data exceptions may occur .（ Access critical resources , It doesn't just mean reading , It also includes writing ）

that , How to solve this problem ？

answer ： If the timeout is reached , But the business code has not been executed yet , Need to renew the lock automatically .

have access to TimerTask class , To realize the function of automatic renewal ：

Timer timer = new Timer(); 
timer.schedule(new TimerTask() {
    
    @Override
    public void run(Timeout timeout) throws Exception {
    
      // Automatic renewal logic 
    }
}, 10000, TimeUnit.MILLISECONDS);

After getting the lock , Automatically start a scheduled task , every other 10 Second , Automatically refresh one expiration time . This mechanism is in Redisson frame in , There is a more domineering name ：watch dog, The legendary watchdog .

Of course, the automatic renewal function , It is preferred to use lua Script Realization , such as ：

if (redis.call('hexists', KEYS[1], ARGV[2]) == 1) then 
   redis.call('pexpire', KEYS[1], ARGV[1]);
  return 1; 
end;
return 0;

What should be noted is that ： When implementing the automatic renewal function , You also need to set a total expiration time , Can follow Redisson bring into correspondence with , Set to 30 second . If the business code reaches the total expiration time , It's not finished , No automatic renewal .

The function of automatic renewal is to open a scheduled task after obtaining the lock , every other 10 Seconds to determine whether the lock exists , If there is , Then refresh the expiration time . If you renew 3 Time , That is to say 30 Seconds later , The business method is still not finished , It won't be renewed .

The problem of master-slave replication

The content that has spent so much time on the introduction , For single Redis There is no problem with examples .

but, If Redis There are multiple instances . such as ： I've been a master and a slave , Or sentinel mode , be based on Redis The function of distributed lock , There will be problems .

What's the problem ？

hypothesis redis Now use the master-slave mode ,1 individual master node ,3 individual slave node .master The node is responsible for writing data ,slave The node is responsible for reading data .

It was supposed to be harmonious coexistence , It's all right .Redis Lock operation , All in master on , After locking successfully , Then asynchronously synchronize to all slave.

And then one day ,master Nodes for some irreversible reason , Hang up .

such I need to find one slave Upgrade to new master node , If slave1 Was elected .

If There is a lock A It's sad , Just locked successfully master I hung up. , I haven't had time to sync to slave1.

This will lead to new master Locks in nodes A lost . Back , If there is a new thread , Using locks A Lock , You can still succeed , Distributed lock failed .

that , How to solve this problem ？

answer ：Redisson Framework to solve this problem , Provides a specialized class ：RedissonRedLock, Used Redlock Algorithm .

RedissonRedLock The idea to solve the problem is as follows ：

We need to build several sets of independent Redis Environmental Science , If we build here 5 set .
Each environment has a Redisson node node .
Multiple Redisson node Nodes make up RedissonRedLock.
Environment contains ： stand-alone 、 Master-slave 、 Sentinel and cluster mode , It can be one or more mixtures .

Take master-slave as an example , The architecture is as follows ：

RedissonRedLock The locking process is as follows ：

Get all Redisson node Node information , Loop to all Redisson node Node locking , hypothesis The number of nodes is N, In the example N be equal to 5.
If in N Among the nodes , Yes N/2 + 1 Node locking succeeded , So the whole RedissonRedLock Lock yes success Of .
If in N Among the nodes , Less than N/2 + 1 Nodes locked successfully , So the whole RedissonRedLock Lock yes Failure Of .
If the total time of locking each node is found in the middle , Greater than or equal to the set maximum waiting time , Then directly return to failure .

As can be seen from the above , Use Redlock Algorithm , It can really solve the problem of , If master The node is hung , Problems that cause distributed locks to fail .

But it also raises some new problems , such as ：

Need to build more than one environment , Apply for more resources , We need to evaluate the cost and cost performance .
If there is N individual redisson node node , It needs to be locked N Time , At least it needs to be locked N/2+1 Time , not have understood until then redlock Whether the locking is successful . obviously , Added additional time cost , It's a bit of a loss .

thus it can be seen , In the actual business scenario , Especially in highly concurrent business ,RedissonRedLock In fact, it is not used much .

In a distributed environment ,CAP It's impossible to get around .

CAP In a distributed system ：
Uniformity （Consistency）
Usability （Availability）
Partition tolerance （Partition tolerance）

These three elements can only achieve two points at most , It's impossible to combine the three .

If the actual business scenario , What is more needed is to ensure data Uniformity . Then please Use CP Type of distributed lock , such as ：zookeeper, It is disk based , The performance may not be that good , But data is generally not lost .

If your actual business scenario , What is more needed is to ensure high data Usability . Then please Use AP Type of distributed lock , such as ：Redis, It's memory based , Good performance , But there's a risk of losing data .

Actually , In most of our distributed business scenarios , Use Redis Distributed locks are enough , Really, don't be too serious . Because the data is inconsistent , It can be solved through the final consistency solution . But if the system is not available , For users, it is critical hit with 10000 damage .

原网站

版权声明
本文为[Lingxiyun]所创，转载请带上原文链接，感谢
https://yzsam.com/2022/182/202207010256573799.html