Scenario interview: ten questions and ten answers about distributed locks

2022-07-05 21:33:00 【InfoQ】

Interviewer: In one product iteration, the product manager raises a new requirement: send a greeting SMS at 10 a.m. on each user's birthday. How would you implement this feature?

Candidate: This requirement is a typical scheduled-task scenario. A scheduled task scans for the qualifying users at the specified time and calls the SMS-sending interface for each of them in a loop.

Interviewer: Good. The scheduled-task service will be deployed as at least two instances to avoid a single point of failure. How do you prevent duplicate messages caused by two instances sending to the same user at the same time?

Candidate: In essence, only one instance may run the scheduled task at any given time, which is a typical distributed-lock scenario.

Interviewer: So why do we need distributed locks?

Candidate: A distributed lock is really an extension of the single-machine lock to distributed scenarios. Before explaining why distributed locks are needed, let me briefly introduce the concept of a lock. A lock is a basic operating-system primitive used for concurrency control: in a multi-CPU, multithreaded environment, it ensures that at any point in time only one thread can enter the critical-section code, which keeps the data operated on in the critical section consistent. When the usage scenario is extended to a distributed environment, that is, across machines and processes, a distributed lock essentially solves the problem of controlling access to critical-section code between processes. In the scheduled task above, the code that sends the SMS is the critical-section code.
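As a minimal single-machine illustration of this idea, the Python sketch below guards a critical section with threading.Lock. The counter and the function names are made up for the example; a distributed lock plays the same role when the competing workers are processes on different machines.

```python
import threading

counter = 0                      # shared data touched in the critical section
counter_lock = threading.Lock()  # single-machine lock


def add_many(times: int) -> None:
    """Increment the shared counter `times` times, one thread at a time."""
    global counter
    for _ in range(times):
        with counter_lock:       # only one thread may be inside at once
            counter += 1         # critical-section code


def run_workers(workers: int = 4, times: int = 10_000) -> int:
    """Start `workers` threads and return the final counter value."""
    global counter
    counter = 0
    threads = [
        threading.Thread(target=add_many, args=(times,)) for _ in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

Because every increment happens inside the lock, the final value equals workers multiplied by times.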

Interviewer: What characteristics should a reasonably complete distributed lock have?

Candidate: To implement a distributed lock, first decide where the lock is stored. For a single-machine lock, different values of an integer in memory can represent the locked and unlocked states. A distributed lock must be accessible to processes on different machines, so it is generally kept in shared storage, such as a relational database or a distributed cache. Once the storage location is settled, the next step is to consider the core features of a distributed lock, which are mainly:

Mutual exclusion: Even under highly concurrent distributed conditions, only one thread can hold the lock at a time.

Timeout mechanism: The lock service and the services requesting the lock sit on different machines and communicate over the network. A timeout is needed so that if the node holding the lock fails or hits a network exception and cannot release it, the lock still expires and no deadlock occurs.

Reentrancy: If a thread on a node has already acquired the lock, that same thread can successfully acquire it again while still holding it.

Fairness: Depending on the implementation, locks are either fair or unfair. Suppose three threads compete for the same lock: thread A acquires it, while threads B and C fail and block waiting for A to release it, with B having started waiting before C. If, after A releases the lock, it goes to B, the thread that has waited longest, on a first-come, first-served basis, the lock is fair; otherwise it is unfair.

Complete lock interface: The lock's interface definition should offer both a blocking lock operation and a non-blocking tryLock operation for acquiring the lock, and a release operation for unlocking.
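The interface and the first two features in this list can be sketched in Python as follows. InMemoryLockStore is a hypothetical stand-in for the shared storage, so it only works inside one process; the method names mirror lock, tryLock and release from the text, and the injectable clock is an assumption added purely to make expiry easy to demonstrate.

```python
import threading
import time
from typing import Callable, Dict, Tuple


class InMemoryLockStore:
    """Hypothetical stand-in for shared storage (a database or cache)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._guard = threading.Lock()                  # protects the dict below
        self._locks: Dict[str, Tuple[str, float]] = {}  # name -> (owner, expires_at)

    def try_lock(self, name: str, owner: str, ttl: float) -> bool:
        """Non-blocking acquire: succeed only if the lock is free or expired."""
        with self._guard:
            now = self._clock()
            held = self._locks.get(name)
            if held is not None and held[1] > now:
                return False                            # mutual exclusion
            self._locks[name] = (owner, now + ttl)      # timeout mechanism
            return True

    def lock(self, name: str, owner: str, ttl: float, poll: float = 0.01) -> None:
        """Blocking acquire: poll until try_lock succeeds."""
        while not self.try_lock(name, owner, ttl):
            time.sleep(poll)

    def release(self, name: str, owner: str) -> bool:
        """Release only if `owner` is the recorded holder."""
        with self._guard:
            held = self._locks.get(name)
            if held is None or held[0] != owner:
                return False
            del self._locks[name]
            return True
```

Polling in lock keeps the sketch short but makes it an unfair lock; a fair variant would need a queue of waiters.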

Interviewer: Regarding the timeout mechanism mentioned above: if the node holding the lock takes a long time in the critical section, longer than the lock's timeout, the lock is released before the critical-section code finishes. Other nodes can then acquire the lock and execute the critical-section code, so mutual exclusion fails. How do you solve this?

Candidate: This can be solved by lock renewal: a separate thread keeps extending the lock's timeout through a heartbeat mechanism while the holder is still working.
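A minimal Python sketch of such a renewal thread is shown below. Lease and Renewer are hypothetical names, the lease lives in local memory rather than in shared storage, and renewing every third of the TTL is an arbitrary choice for the example.

```python
import threading
import time


class Lease:
    """Hypothetical lock lease whose expiry can be pushed forward."""

    def __init__(self, ttl: float) -> None:
        self.ttl = ttl
        self._guard = threading.Lock()
        self._expires_at = time.monotonic() + ttl

    def extend(self) -> None:
        """Reset the expiry to `ttl` seconds from now (one heartbeat)."""
        with self._guard:
            self._expires_at = time.monotonic() + self.ttl

    def expired(self) -> bool:
        with self._guard:
            return time.monotonic() >= self._expires_at


class Renewer:
    """Background thread that renews a lease until stopped."""

    def __init__(self, lease: Lease, interval: float) -> None:
        self._lease = lease
        self._interval = interval      # should be well below lease.ttl
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        # wait() returns False on timeout and True once stop() has been called
        while not self._stop.wait(self._interval):
            self._lease.extend()

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()


def run_long_task(ttl: float, work_seconds: float) -> bool:
    """Hold a lease across work longer than its TTL; True if it never lapsed."""
    lease = Lease(ttl)
    renewer = Renewer(lease, interval=ttl / 3)
    renewer.start()
    try:
        time.sleep(work_seconds)       # stands in for the critical section
        return not lease.expired()
    finally:
        renewer.stop()
```

Because the renewer runs inside the holder's process, it stops when that process dies, so the timeout still frees the lock after a crash.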

Interviewer: Good. How do you make the lock reentrant?

Candidate: Reentrancy means the same thread can acquire the lock repeatedly. So after the lock is acquired, record the node ID plus thread ID of the holder and bind that combination to the lock as a unique identifier. Then add a check before the locking logic: if the requesting node ID plus thread ID matches the current holder, return success directly; otherwise run the normal locking logic.
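The check described above might look like the following Python sketch. The class name and the in-memory state are hypothetical, and the hold counter is an assumption added here so that the lock is only freed when every acquire has been matched by a release.

```python
import threading
from typing import Optional


class ReentrantLockState:
    """Hypothetical lock record: who holds the lock and how many times."""

    def __init__(self) -> None:
        self._guard = threading.Lock()
        self._owner: Optional[str] = None
        self._holds = 0

    def try_lock(self, node_id: str, thread_id: int) -> bool:
        owner = f"{node_id}:{thread_id}"       # node ID + thread ID
        with self._guard:
            if self._owner == owner:           # same holder: re-enter
                self._holds += 1
                return True
            if self._owner is None:            # free: normal locking logic
                self._owner = owner
                self._holds = 1
                return True
            return False                       # held by someone else

    def release(self, node_id: str, thread_id: int) -> bool:
        owner = f"{node_id}:{thread_id}"
        with self._guard:
            if self._owner != owner:
                return False
            self._holds -= 1
            if self._holds == 0:               # last release frees the lock
                self._owner = None
            return True
```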

Interviewer: What are the ways to implement a distributed lock?

Candidate: There are three mainstream ways to implement a distributed lock:

Relational database (for example MySQL): Create a table that records shared-resource information, with a uniqueness constraint on the resource. Inserting a record locks the resource; deleting the record releases the lock.

Distributed cache (Redis): Call the Redis commands SETNX and EXPIRE. Because these are two separate commands, a Lua script can be used to make setting the lock and setting its expiration atomic. Since Redis 2.6.12, the SET command accepts the NX and EX/PX options, which set the key and its expiration in a single command, so that form can be used directly. SETNX returns 1 when the lock represented by the key is acquired and 0 when acquisition fails.

Distributed coordination service (ZooKeeper): Under a persistent node such as shared_lock, each process creates an ephemeral sequential node and then checks which process's node has the smallest sequence number. The smallest number means that node was created first, so its owner gets the lock; the other processes wait for the lowest-numbered node to release the lock.
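Of the three approaches, the database one is the easiest to try locally. The sketch below is a minimal Python version that swaps in SQLite from the standard library in place of MySQL; the table name resource_lock, its columns and the function names are assumptions made for the example.

```python
import sqlite3


def create_lock_table(conn: sqlite3.Connection) -> None:
    """Create the lock table; the primary key enforces one row per resource."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS resource_lock ("
        " resource TEXT PRIMARY KEY,"
        " owner TEXT NOT NULL)"
    )
    conn.commit()


def try_lock(conn: sqlite3.Connection, resource: str, owner: str) -> bool:
    """Lock by inserting a row; a uniqueness violation means it is already held."""
    try:
        conn.execute(
            "INSERT INTO resource_lock (resource, owner) VALUES (?, ?)",
            (resource, owner),
        )
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        conn.rollback()
        return False


def release(conn: sqlite3.Connection, resource: str, owner: str) -> bool:
    """Unlock by deleting the row, but only if `owner` holds it."""
    cur = conn.execute(
        "DELETE FROM resource_lock WHERE resource = ? AND owner = ?",
        (resource, owner),
    )
    conn.commit()
    return cur.rowcount == 1
```

The row has no expiry, so a holder that crashes before calling release leaves the lock in place; adding a timestamp column and treating old rows as expired is one way to apply the timeout mechanism here.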

Interviewer: What are the advantages and disadvantages of these three approaches, and which scenarios does each suit?

Candidate: The database approach has the advantage of simplicity; its drawbacks are that it is prone to single points of failure and deadlocks, and its performance and reliability are low. The Redis approach offers high performance and can be deployed as a cluster with no single point of failure; its drawbacks are that the lock's expiration time is hard to control reliably and that it is less reliable than the ZooKeeper approach. The ZooKeeper approach has no single point of failure or deadlock problem and is highly reliable; its drawback is that its performance is lower than that of Redis. In terms of usage, the database approach suits systems with low concurrency and modest performance requirements, the Redis approach suits scenarios with high concurrency and high performance requirements, and the ZooKeeper approach suits most scenarios, except those that require extremely high performance.
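For the Redis approach, a common shape is a single SET with NX and PX to acquire, plus a small Lua script to release. The Python sketch below assumes a client object exposing redis-py-style set(name, value, nx=..., px=...) and eval(script, numkeys, *keys_and_args) methods. The random token, the key names and FakeRedis are illustrative additions; FakeRedis is an in-memory stand-in that only models the two calls used here, so the example runs without a server.

```python
import time
import uuid
from typing import Dict, Optional, Tuple

# Compare-and-delete: remove the key only if it still holds our token.
RELEASE_SCRIPT = """
if redis.call('get', KEYS[1]) == ARGV[1] then
    return redis.call('del', KEYS[1])
else
    return 0
end
"""


def acquire(client, key: str, ttl_ms: int) -> Optional[str]:
    """SET key token NX PX ttl_ms; return the token on success, else None."""
    token = uuid.uuid4().hex
    ok = client.set(key, token, nx=True, px=ttl_ms)
    return token if ok else None


def release(client, key: str, token: str) -> bool:
    """Run the Lua script so the check and the delete happen atomically."""
    return client.eval(RELEASE_SCRIPT, 1, key, token) == 1


class FakeRedis:
    """In-memory stand-in that models only what the two functions above call."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[str, float]] = {}  # key -> (value, expires_at)

    def _live(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None or entry[1] <= time.monotonic():
            self._data.pop(key, None)       # drop missing or expired entries
            return None
        return entry[0]

    def set(self, name, value, nx=False, px=None):
        if nx and self._live(name) is not None:
            return None                     # key exists: not set, falsy result
        expires = time.monotonic() + px / 1000 if px is not None else float("inf")
        self._data[name] = (value, expires)
        return True

    def eval(self, script, numkeys, *keys_and_args):
        key, token = keys_and_args          # models RELEASE_SCRIPT only
        if self._live(key) == token:
            del self._data[key]
            return 1
        return 0
```

Storing a per-holder token means a client whose lock has already expired cannot delete a lock that another client has since acquired.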

Interviewer: In a Redis cluster, data is replicated to the other nodes asynchronously. If a client acquires the lock on the master and the master crashes before the lock has been replicated, another client can still acquire the same lock on the newly elected master. Multiple application instances then hold the lock at the same time and mutual exclusion fails. How do you solve this?

Candidate: That problem does exist, which is why the RedLock algorithm is generally recommended for distributed locks built on a Redis cluster; the open-source Redisson library implements it. The client tries to acquire the lock on each of the nodes, with a timeout on every attempt; if a request times out, that Redis node is treated as unavailable. The lock is acquired successfully when the application has obtained it on more than half of the Redis nodes (N/2+1, where N is the number of nodes) and the total time spent acquiring it does not exceed the lock's expiration time. Once the lock is acquired, its remaining validity time is recalculated as the lock's expiration time minus the time spent acquiring it. If acquisition fails, the client still releases the lock on every node where it did succeed.
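The success rule and the validity calculation can be expressed as a small Python function. This is only a sketch of the decision step: the function names are hypothetical, no network calls are made, and the clock-drift allowance that the published algorithm also subtracts is left out for brevity.

```python
from typing import List, Optional


def quorum(n: int) -> int:
    """More than half of n nodes: N/2 + 1 with integer division."""
    return n // 2 + 1


def redlock_validity_ms(
    node_results: List[bool], ttl_ms: int, elapsed_ms: int
) -> Optional[int]:
    """Return the remaining validity time in ms, or None if acquisition failed.

    node_results[i] is True when the lock was obtained on node i
    (a timed-out request counts as False).
    """
    acquired = sum(node_results)
    validity = ttl_ms - elapsed_ms          # expiration minus time spent acquiring
    if acquired >= quorum(len(node_results)) and validity > 0:
        return validity
    return None                             # caller should release on all nodes
```

For example, with five nodes, a 10,000 ms expiration and 200 ms spent acquiring, three successes leave 9,800 ms of validity, while two successes count as a failure.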

Interviewer: Besides the scheduled tasks discussed above, what other common usage scenarios are there for distributed locks?

Candidate: In flash sales (seckill), a distributed lock can be used to prevent inventory from being oversold.
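Overselling is a check-then-decrement race, so the stock check and the decrement must sit inside the same lock. The Python sketch below simulates buyers with threads and uses threading.Lock as a stand-in for the distributed lock; the Inventory class and the numbers are made up for the example.

```python
import threading
from typing import List


class Inventory:
    """Hypothetical stock counter guarded by a lock."""

    def __init__(self, stock: int) -> None:
        self.stock = stock
        self._lock = threading.Lock()   # stand-in for a distributed lock

    def purchase(self) -> bool:
        """Check and decrement inside the lock so stock never drops below zero."""
        with self._lock:
            if self.stock <= 0:
                return False            # sold out
            self.stock -= 1
            return True


def flash_sale(stock: int, buyers: int) -> int:
    """Let `buyers` threads each try to buy one item; return how many succeeded."""
    inventory = Inventory(stock)
    results: List[bool] = []            # list.append is thread-safe in CPython

    def buy() -> None:
        results.append(inventory.purchase())

    threads = [threading.Thread(target=buy) for _ in range(buyers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)
```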

Aside: There is a good open-source distributed-lock implementation, lock4j. It provides declarative and programmatic distributed locks based on Spring AOP, supports RedisTemplate, Redisson, ZooKeeper and others, and can be extended with other distributed-lock implementations.