Record a failure caused by a custom redis distributed lock
2022-06-27 20:25:00 【Ten years of training experience】
Background
The WeCom (Enterprise WeChat) alert group kept sending error alerts from the production environment. The core of the error was:
redis setNX error java.lang.NumberFormatException: For input string: "null"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:589)
at java.lang.Long.parseLong(Long.java:631)
......
Tracing the exception showed that the error came from the project's custom Redis distributed lock. The exception appeared suddenly after a recent requirement went live, and it was accompanied by inconsistencies in part of the business data that the requirement touched.
Problem analysis
As usual, here is the code involved.
// Aspect
public class RedisLockAspect{
    public Object around(ProceedingJoinPoint pjp) throws Throwable {
        String key = "...";
        try {
            // Block until the lock is acquired
            while (!JedisUtil.lock(key, timeOut)) {
                Thread.sleep(10);
            }
            // Execute business logic
            return pjp.proceed();
        }finally {
            JedisUtil.unLock(key);
        }
    }
}
The above is the aspect for the custom Redis distributed lock. Details aside, the overall logic looks reasonable, so let's look at the actual locking method.
public class JedisUtil{
    public static boolean lock(String key, long timeOut){
        long currentTimeMillis = System.currentTimeMillis();
        long newExpireTime = currentTimeMillis + timeOut;
        RedisConnection connection = null;
        try {
            connection = getRedisTemplate().getConnectionFactory().getConnection();
            Boolean setNxResult = connection.setNX(key.getBytes(StandardCharsets.UTF_8), String.valueOf(newExpireTime).getBytes(StandardCharsets.UTF_8));
            // Location 1
            if(setNxResult){
                expire(key,timeOut, TimeUnit.MILLISECONDS);
                return true;
            }
            // Location 2
            Object objVal = getRedisTemplate().opsForValue().get(key);
            String currentValue = String.valueOf(objVal);
            // Location 3: the exception is thrown by Long.parseLong(currentValue) in the if condition, because currentValue is the string "null"
            if (currentValue != null && Long.parseLong(currentValue) < currentTimeMillis) {
                String oldExpireTime = (String) getAndSet(key, String.valueOf(newExpireTime));
                if (oldExpireTime != null && oldExpireTime.equals(currentValue)) {
                    return true;
                }
            }
        }
        return false;
    }
    public static void unLock(String key){
        getRedisTemplate().delete(key);
    }
}
An experienced engineer who sees this code will probably be tempted to swear, but let's set that aside and look at where the error occurs first.
The exception message shows that currentValue is the string "null". In other words, objVal in String.valueOf(objVal) was null, which means the key had no value in Redis. Since setNX had just failed, the key existed a moment earlier, so it can only have disappeared in one of two ways:
- the key was actively deleted;
- the key expired.
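The conversion that turns a missing key into the string "null" can be reproduced with the JDK alone. The following is a minimal sketch; the class NullStringDemo and its methods are hypothetical names used only for illustration, and parseExpiry is one possible null-safe alternative, not the project's actual fix.

```java
// Hypothetical demo class; only JDK calls are used.
class NullStringDemo {
    // Mirrors the original conversion: a null Object becomes the 4-character string "null".
    static String stringify(Object value) {
        return String.valueOf(value);
    }

    // Null-safe variant: report a missing key as null instead of trying to parse "null".
    static Long parseExpiry(Object value) {
        if (value == null) {
            return null;
        }
        return Long.parseLong(value.toString());
    }

    public static void main(String[] args) {
        Object missing = null; // what get(key) returns when the key is gone
        String currentValue = stringify(missing);
        // The null check in the original if condition passes, because this is a real String.
        System.out.println(currentValue != null);
        try {
            Long.parseLong(currentValue);
        } catch (NumberFormatException e) {
            System.out.println("parse failed: " + e.getMessage());
        }
        System.out.println(parseExpiry(missing));
    }
}
```

This is why the `currentValue != null` guard at Location 3 never protects the parse: the null has already been turned into a non-null string.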
Following the code further up, we find the setNX command, whose result setNxResult indicates whether the lock was acquired. Normally, when setNxResult is false, locking has failed and the code should not continue. In this code, however, it carries on. I asked the colleagues involved, and they said it was meant to implement a reentrant lock... (A mild complaint: that is not how a reentrant lock works.)
This analysis already reveals the cause of the failure: as noted above, the key was either actively deleted or had expired. Suppose two threads, A and B, try to lock the same key. The two cases play out as follows:
Key actively deleted: thread A holds the lock, and thread B's setNX fails at Location 1. Before B reaches the get at Location 2, A finishes its business logic and calls unLock (see the finally block of RedisLockAspect above), which deletes the key. B then reads null.
Key expired: thread A takes the lock and sets the expiration time, but its business code runs longer than that expiration time and the lock is not renewed before it expires. The key disappears between B's failed setNX and B's get, so B again reads null.
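The interleaving can be replayed deterministically without a Redis server. The sketch below assumes a tiny in-memory stand-in, FakeRedis, that models only setNX, get and delete; both class names and the key "order:lock" are hypothetical. It walks through the steps of threads A and B in the order described above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the few Redis commands the lock uses.
class FakeRedis {
    private final Map<String, String> data = new HashMap<>();

    // SETNX semantics: store only if the key is absent; report whether it was stored.
    boolean setNX(String key, String value) {
        return data.putIfAbsent(key, value) == null;
    }

    // GET semantics: null when the key does not exist.
    String get(String key) {
        return data.get(key);
    }

    // DEL semantics; this also models the key expiring on its own.
    void delete(String key) {
        data.remove(key);
    }
}

class LockRaceReplay {
    // Replays the interleaving step by step and returns what thread B parses at Location 3.
    static String replay() {
        FakeRedis redis = new FakeRedis();
        String key = "order:lock";
        boolean aLocked = redis.setNX(key, "1000"); // thread A acquires the lock
        boolean bLocked = redis.setNX(key, "2000"); // thread B fails at Location 1
        redis.delete(key);                          // A calls unLock, or the key expires
        Object objVal = redis.get(key);             // B reads at Location 2 and gets null
        if (!aLocked || bLocked) {
            throw new IllegalStateException("unexpected interleaving");
        }
        return String.valueOf(objVal);              // B's currentValue at Location 3
    }

    public static void main(String[] args) {
        System.out.println(replay());
    }
}
```

The value returned by replay() is exactly the input that Long.parseLong receives in the stack trace.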
Solution
As the code above shows, this is no longer simply a Long.parseLong("null") problem. The entire Redis distributed lock implementation is flawed, and the lock is used widely throughout the project, so the problem is serious. Fixing only Long.parseLong("null") would be like scratching an itch through a boot: it would accomplish nothing.
In general, custom Redis distributed locks are prone to the following problems:
- the lock is not released after setNX succeeds;
- setNX and expire are not atomic;
- the lock expires before the work is done;
- one thread releases another thread's lock;
- reentrancy;
- spinning when a large number of lock attempts fail;
- lock data synchronization under a master-slave architecture.
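The "one thread releases another thread's lock" problem is usually addressed by storing an owner token as the lock value and deleting the key only when the token still matches. Below is a minimal in-memory sketch of that idea; OwnedLockStore and its methods are hypothetical names, and a ConcurrentHashMap stands in for Redis. In Redis itself, the equivalent is typically a single `SET key token NX PX ttl` command for locking, plus a short Lua script that compares the token before calling DEL.

```java
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory model of an owner-checked lock.
class OwnedLockStore {
    private final ConcurrentHashMap<String, String> locks = new ConcurrentHashMap<>();

    // Returns a fresh owner token on success, or null if someone else holds the lock.
    String tryLock(String key) {
        String token = UUID.randomUUID().toString();
        return locks.putIfAbsent(key, token) == null ? token : null;
    }

    // Compare-and-delete: removes the key only if it still carries the caller's token.
    boolean unlock(String key, String token) {
        return token != null && locks.remove(key, token);
    }

    boolean isLocked(String key) {
        return locks.containsKey(key);
    }

    public static void main(String[] args) {
        OwnedLockStore store = new OwnedLockStore();
        String a = store.tryLock("k");                       // thread A's token
        String b = store.tryLock("k");                       // null, because A holds the lock
        System.out.println(b == null);
        System.out.println(store.unlock("k", "not-the-owner")); // refused
        System.out.println(store.unlock("k", a));               // released by the owner
    }
}
```

With this shape, an unconditional delete in a finally block can no longer remove a lock that a different thread acquired in the meantime.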
Measured against the faulty code above, the implementation takes almost none of these problems into account. The main problems and their solutions are:
- Atomicity of setNX and expire: use a Lua script that executes both setNX and expire in a single script call, which guarantees atomicity.
- Lock expiration: to prevent the lock from expiring while it is still in use, periodically renew its expiration time before it expires.
- Reentrancy: reentrancy must be tracked at thread granularity, which can be done by adding a unique thread id to the lock.
- Lock spinning: following the AQS design in the JDK, enforce a maximum waiting time when acquiring the lock.
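The reentrancy and bounded-wait points above can be sketched together. The following is an in-memory illustration only, not a Redis implementation: ReentrantLeaseLock, Hold and all method names are hypothetical, the clock is injected so that lease expiry can be exercised without sleeping, and periodic renewal is left out for brevity. The owner argument is assumed to be unique per thread, for example a per-process UUID combined with the thread id.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical in-memory sketch: reentrancy by owner id, lease expiry, bounded waiting.
class ReentrantLeaseLock {
    private static final class Hold {
        final String owner;
        final int count;
        final long expiresAt;

        Hold(String owner, int count, long expiresAt) {
            this.owner = owner;
            this.count = count;
            this.expiresAt = expiresAt;
        }
    }

    private final ConcurrentHashMap<String, Hold> holds = new ConcurrentHashMap<>();
    private final LongSupplier clock; // current time in milliseconds

    ReentrantLeaseLock(LongSupplier clock) {
        this.clock = clock;
    }

    // One atomic attempt: take a free or expired lock, or re-enter one we already own.
    boolean tryAcquire(String key, String owner, long leaseMillis) {
        long now = clock.getAsLong();
        Hold result = holds.compute(key, (k, cur) -> {
            if (cur == null || cur.expiresAt <= now) {
                return new Hold(owner, 1, now + leaseMillis);
            }
            if (cur.owner.equals(owner)) {
                return new Hold(owner, cur.count + 1, now + leaseMillis);
            }
            return cur; // held by someone else: leave it untouched
        });
        return result.owner.equals(owner);
    }

    // Bounded wait: retry until maxWaitMillis has elapsed, then give up instead of spinning forever.
    boolean tryLock(String key, String owner, long leaseMillis, long maxWaitMillis) {
        long deadline = System.nanoTime() + maxWaitMillis * 1_000_000L;
        while (!tryAcquire(key, owner, leaseMillis)) {
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
        return true;
    }

    // Decrement the hold count; remove the key when the outermost hold is released.
    boolean unlock(String key, String owner) {
        boolean[] released = {false};
        holds.computeIfPresent(key, (k, cur) -> {
            if (!cur.owner.equals(owner)) {
                return cur; // never release another owner's lock
            }
            released[0] = true;
            return cur.count > 1 ? new Hold(owner, cur.count - 1, cur.expiresAt) : null;
        });
        return released[0];
    }
}
```

A caller would construct it with `new ReentrantLeaseLock(System::currentTimeMillis)` and pair every successful acquisition with exactly one unlock. A second acquisition by the same owner succeeds immediately, while a different owner gets false once the maximum wait has passed.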
A quick search turns up plenty of references for the problems in the project and for implementing each fix, so I won't go into them here. At present, the more mature and comprehensive solution is to use the Redisson client. The following is a simple pseudocode demo:
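import java.util.concurrent.TimeUnit;
import org.aspectj.lang.ProceedingJoinPoint;
import org.redisson.api.RLock;
import org.redisson.api.RedissonClient;
import org.springframework.beans.factory.annotation.Autowired;

public class RedisLockAspect{
    @Autowired
    private RedissonClient redisson;

    public Object around(ProceedingJoinPoint pjp) throws Throwable {
        String key = "...";
        long waitTime = 3000L;
        // Get the lock handle
        RLock lock = redisson.getLock(key);
        boolean lockSuccess = false;
        try {
            // Wait at most waitTime ms to prevent infinite spinning. No lease time is given,
            // so the watchdog (automatic lock renewal) stays enabled
            lockSuccess = lock.tryLock(waitTime, TimeUnit.MILLISECONDS);
            if (!lockSuccess) {
                // Do not run the business logic without the lock
                throw new IllegalStateException("Failed to acquire lock: " + key);
            }
            // Execute business logic
            return pjp.proceed();
        } finally {
            // Unlock only if this thread holds the lock, so another thread's lock is never released
            if (lockSuccess && lock.isHeldByCurrentThread()) {
                lock.unlock();
            }
        }
    }
}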
Using Redisson quickly solves the problems with the Redis distributed lock in the current project. In addition, for the lock problem caused by data synchronization in a Redis master-slave architecture, Redisson also provides an implementation of the corresponding solution, RedLock.
See the official documentation for more information: github.com/liulongbiao…
Summary
Redis is far from the only way to implement a distributed lock; there are also implementations based on Zookeeper, Etcd and others. They all serve the same purpose, though. The point is how to use these solutions safely and correctly so that the business keeps running normally.
For the R&D team, problems like this mean that engineers need training and must keep improving their technical skills, and that more attention must be paid to code review so that risks are identified in time and serious losses from failures are avoided (repairing the dirty data caused by this failure took more than a week).
Respect technology; stay loyal to the business.