OOM Caused by Improper Use of Multithreading
2022-07-26 13:20:00 【Sharp surge】
Description of the accident
Starting at 6:32, a small number of users saw errors when accessing the app's home page. By 7:20 the home page service was unavailable on a large scale, and the problem was resolved at 7:36.
Timeline
6:58 Alerts fired, and at the same time users in the group chat reported that the home page showed "network busy". Since the store list service had been launched a few nights earlier, we decided to roll back the code as an emergency measure.
7:07 Contacted XXX to investigate and fix the problem.
7:36 The code rollback finished and the service returned to normal.
The root cause of the accident
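A simulation of the offending code:
```java
import java.util.concurrent.*;

public static void test() throws InterruptedException, ExecutionException {
    // Note: this pool is also never shut down, so its worker thread outlives the call
    Executor executor = Executors.newFixedThreadPool(3);
    CompletionService<String> service = new ExecutorCompletionService<>(executor);
    service.submit(new Callable<String>() {
        @Override
        public String call() throws Exception {
            return "HelloWorld--" + Thread.currentThread().getName();
        }
    });
    // BUG: the completed Future is never removed with take() or poll()
}
```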
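The root cause: the code never called ExecutorCompletionService's take() or poll() method, so completed results were never removed from its completion queue.
The correct version:
```java
import java.util.concurrent.*;

public static void test() throws InterruptedException, ExecutionException {
    ExecutorService executor = Executors.newFixedThreadPool(3);
    CompletionService<String> service = new ExecutorCompletionService<>(executor);
    try {
        service.submit(new Callable<String>() {
            @Override
            public String call() throws Exception {
                return "HelloWorld--" + Thread.currentThread().getName();
            }
        });
        // take() blocks until a task finishes and removes its Future from the completion queue;
        // get() on that already-completed Future returns immediately
        service.take().get();
    } finally {
        // Release the pool's threads once the result has been collected
        executor.shutdown();
    }
}
```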
A single missing line of code caused the outage, and it is not easy to spot. An OOM builds up through slow memory growth, so it is easy to overlook. If the code path is called only rarely, the problem may not blow up for days or even months.
Rolling back or restarting the servers is indeed the fastest fix. But if you don't quickly track down the code responsible for the OOM afterwards, and the version you roll back to happens to contain the same code, you are out of luck. As mentioned above, when traffic is low, rolling back or restarting frees the memory; when traffic is high, nothing short of rolling back to a version without the bug will help.
Digging into the root cause
To better understand the "tricks" of ExecutorCompletionService, let's compare it with ExecutorService. This makes it clearer in which scenarios ExecutorCompletionService is the right tool.
First, the ExecutorService version (I recommend running it yourself):
```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public static void test1() throws Exception {
    ExecutorService executorService = Executors.newCachedThreadPool();
    List<Future<String>> futureList = new ArrayList<>();
    System.out.println("The company asked you to tell everyone about the dinner and drive to pick them up");
    Future<String> future10 = executorService.submit(() -> {
        System.out.println("President: I'm on the toilet at home and it's slow going lately, it'll take an hour. Pick me up later");
        TimeUnit.SECONDS.sleep(10);
        System.out.println("President: After an hour, I'm done. Come pick me up");
        return "The president is done";
    });
    futureList.add(future10);
    Future<String> future3 = executorService.submit(() -> {
        System.out.println("R&D: I'm on the toilet at home, I'm quick, 3 minutes. Pick me up later");
        TimeUnit.SECONDS.sleep(3);
        System.out.println("R&D: After 3 minutes, I'm done. Come pick me up");
        return "R&D is done";
    });
    futureList.add(future3);
    Future<String> future6 = executorService.submit(() -> {
        System.out.println("Middle management: I'm on the toilet at home, 10 minutes. Pick me up later");
        TimeUnit.SECONDS.sleep(6);
        System.out.println("Middle management: After 10 minutes, I'm done. Come pick me up");
        return "Middle management is done";
    });
    futureList.add(future6);
    TimeUnit.SECONDS.sleep(1);
    System.out.println("Everyone has been notified, waiting to pick them up.");
    try {
        // get() blocks on each Future in submission order, so the 10 s task holds up the others
        for (Future<String> future : futureList) {
            String returnStr = future.get();
            System.out.println(returnStr + ", go pick them up");
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        // Let the program exit once all results have been printed
        executorService.shutdown();
    }
}
```
There are three tasks, taking 10 s, 3 s and 6 s respectively. All three are submitted as Callable tasks through the JDK thread pool's submit method.
Step 1: The main thread submits the three tasks to the thread pool, stores each returned Future in a List, and then prints a status line saying everyone has been notified.
Step 2: It loops over the list and calls future.get() on each Future, blocking until that task completes.
The output is as follows:

You notified the president first, so you also pick up the president first, which means waiting a full hour. Only after collecting the president do you go for R&D and middle management; even though they finished long ago, they have to wait until the president is out of the bathroom.
The most time-consuming task (10 s) was the first one added to the list. So when the loop fetches its result, get() blocks until that 10 s task completes. Even though the 3 s and 6 s tasks finished long before, their results also have to wait until the 10 s task is done.
Anyone working on gateway services will probably recognize this. A gateway RPC typically calls N downstream interfaces, as shown in the figure below:

If all of them are handled the ExecutorService way, and the first few tasks happen to call slow interfaces, everything else blocks waiting on them, which is even worse. This is the situation ExecutorCompletionService was designed for. It coordinates task results sensibly and lives up to its reputation as a "task planner".
The same scenario using ExecutorCompletionService:
```java
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public static void test2() throws Exception {
    ExecutorService executorService = Executors.newCachedThreadPool();
    ExecutorCompletionService<String> completionService = new ExecutorCompletionService<>(executorService);
    System.out.println("The company asked you to tell everyone about the dinner and drive to pick them up");
    completionService.submit(() -> {
        System.out.println("President: I'm on the toilet at home and it's slow going lately, it'll take an hour. Pick me up later");
        TimeUnit.SECONDS.sleep(10);
        System.out.println("President: After an hour, I'm done. Come pick me up");
        return "The president is done";
    });
    completionService.submit(() -> {
        System.out.println("R&D: I'm on the toilet at home, I'm quick, 3 minutes. Pick me up later");
        TimeUnit.SECONDS.sleep(3);
        System.out.println("R&D: After 3 minutes, I'm done. Come pick me up");
        return "R&D is done";
    });
    completionService.submit(() -> {
        System.out.println("Middle management: I'm on the toilet at home, 10 minutes. Pick me up later");
        TimeUnit.SECONDS.sleep(6);
        System.out.println("Middle management: After 10 minutes, I'm done. Come pick me up");
        return "Middle management is done";
    });
    TimeUnit.SECONDS.sleep(1);
    System.out.println("Everyone has been notified, waiting to pick them up.");
    // 3 tasks were submitted, so take exactly 3 results; take() hands them back in completion order
    for (int i = 0; i < 3; i++) {
        String returnStr = completionService.take().get();
        System.out.println(returnStr + ", go pick them up");
    }
    // Let the program exit once all results have been collected
    executorService.shutdown();
}
```
The output:

This time it is much more efficient. Although the president was notified first, people are picked up in the order they finish: whoever is done first gets picked up first, and you don't have to wait for the slowest one, the president. (In real life, stick with the first approach; as for the consequences of not waiting for the president... ha ha.)
Comparing the two outputs side by side:

The two programs differ very little. When fetching results, the ExecutorCompletionService version uses:
```java
completionService.take().get();
```
Why call take() and then get()?
Let's look at the source code.
The CompletionService interface and its implementation
1. ExecutorCompletionService is an implementation of the CompletionService interface.

2. Next, look at the ExecutorCompletionService constructors.
Both take a thread pool (an Executor). By default the completion queue is a LinkedBlockingQueue, but a second constructor lets you supply the queue yourself. The two figures below show the two constructors; first, the default one using LinkedBlockingQueue:

The constructor that accepts a custom queue:

3. There are two submit methods, and both return a value (a Future). Our example uses the first one, which takes a Callable.
4. Comparing the submit methods of ExecutorService and ExecutorCompletionService reveals the difference.

5. The difference lies in QueueingFuture.
What does it do? Let's keep going:
QueueingFuture extends FutureTask and, at the spot marked with a red line, overrides the done() method to put the task into the completionQueue. So when a task completes, it is placed into the queue, which means that at any moment every task in completionQueue has already run done(), i.e. it is a finished task. These tasks are the futures we retrieve one by one. If you call take() while completionQueue is empty, the call blocks until a task finishes. Whatever we get back is guaranteed to be a completed future, so calling .get() on it returns the result immediately.
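To make the mechanism concrete, here is a simplified re-creation of the idea. It is a sketch, not the JDK source: the class name MiniCompletionService and its demo method are hypothetical. A FutureTask subclass pushes itself onto a LinkedBlockingQueue from its done() hook, and take() simply takes from that queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified version of the QueueingFuture idea (not the JDK implementation).
class MiniCompletionService<V> {
    private final Executor executor;
    private final BlockingQueue<Future<V>> completionQueue = new LinkedBlockingQueue<>();

    MiniCompletionService(Executor executor) {
        this.executor = executor;
    }

    // A FutureTask whose done() hook enqueues the finished task.
    private final class QueueingTask extends FutureTask<V> {
        QueueingTask(Callable<V> callable) {
            super(callable);
        }

        @Override
        protected void done() {
            // Called once the task completes (normally, exceptionally or by cancellation)
            completionQueue.add(this);
        }
    }

    Future<V> submit(Callable<V> task) {
        QueueingTask f = new QueueingTask(task);
        executor.execute(f);
        return f;
    }

    // Blocks until some task has finished; the returned Future is already done.
    Future<V> take() throws InterruptedException {
        return completionQueue.take();
    }

    // Demo: tasks sleeping 600, 200 and 400 ms come back in completion order.
    static List<String> demo() throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            MiniCompletionService<String> cs = new MiniCompletionService<>(pool);
            long[] delays = {600, 200, 400};
            for (long d : delays) {
                cs.submit(() -> {
                    TimeUnit.MILLISECONDS.sleep(d);
                    return d + "ms";
                });
            }
            List<String> order = new ArrayList<>();
            for (int i = 0; i < delays.length; i++) {
                order.add(cs.take().get()); // get() does not block: the task is finished
            }
            return order;
        } finally {
            pool.shutdown();
        }
    }
}
```

The sketch also shows why untaken results stay in memory: the queue holds a reference to every finished task until someone removes it.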

By now you probably get the idea:
With ExecutorService, after submitting tasks you have to keep track of each task's future yourself. CompletionService tracks these futures for you and, by overriding done(), guarantees that whatever you take from completionQueue is a finished task. For a gateway RPC layer, this means one slow interface does not have to drag down the whole request; CompletionService fits business scenarios where you want to handle whichever response arrives first.
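A sketch of that gateway pattern, assuming a completion service created per request over a shared pool and a fixed time budget. The class GatewayFanOut, the downstream helper and the budgetMs parameter are hypothetical. Results are consumed in completion order with poll(timeout, unit), and whatever has not finished by the deadline is cancelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class GatewayFanOut {
    // Hypothetical downstream call: sleeps to simulate latency, then returns its name.
    static Callable<String> downstream(String name, long latencyMs) {
        return () -> {
            TimeUnit.MILLISECONDS.sleep(latencyMs);
            return name;
        };
    }

    // Collects results in completion order until all arrive or the budget runs out.
    static List<String> collect(ExecutorService pool, List<Callable<String>> calls, long budgetMs)
            throws InterruptedException {
        CompletionService<String> cs = new ExecutorCompletionService<>(pool);
        List<Future<String>> futures = new ArrayList<>();
        for (Callable<String> c : calls) {
            futures.add(cs.submit(c));
        }
        List<String> results = new ArrayList<>();
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(budgetMs);
        for (int i = 0; i < calls.size(); i++) {
            long remaining = deadline - System.nanoTime();
            // Removes a finished Future from the completion queue, or returns null on timeout
            Future<String> f = cs.poll(remaining, TimeUnit.NANOSECONDS);
            if (f == null) {
                break; // budget exhausted
            }
            try {
                results.add(f.get());
            } catch (ExecutionException e) {
                // One failing downstream call should not sink the whole request
            }
        }
        // Cancel stragglers so they stop occupying pool threads
        for (Future<String> f : futures) {
            f.cancel(true);
        }
        return results;
    }
}
```

With calls taking 50 ms, 150 ms and 2000 ms and a 500 ms budget, collect returns the first two results and cancels the third. Cancelled tasks also fire done() and land in the completion queue, but since this completion service is local to one request, the whole queue becomes garbage once collect returns.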
But note: this is also the core problem behind this incident.
Only when you call one of the three ExecutorCompletionService methods below (take(), poll(), or poll(long timeout, TimeUnit unit)) is a task's result removed from the blocking queue, freeing the heap memory it occupies.
Because the business logic did not need the tasks' return values, it never called take() or poll(), so that heap memory was never freed. Heap usage kept growing as the number of calls increased.
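The effect can be reproduced by passing your own queue to the two-argument constructor, so that its size can be observed. This sketch assumes one long-lived completion service shared across "requests"; CompletionQueueLeakDemo is a hypothetical name. Every finished but untaken Future stays referenced by the queue until poll() or take() removes it.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class CompletionQueueLeakDemo {
    // Returns {queue size after the un-taken tasks finish, queue size after draining}.
    static int[] run(int calls) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        // Supplying our own queue makes its size observable from outside
        BlockingQueue<Future<String>> completionQueue = new LinkedBlockingQueue<>();
        // Assumption: one completion service shared by every "request"
        ExecutorCompletionService<String> service =
                new ExecutorCompletionService<>(pool, completionQueue);
        for (int i = 0; i < calls; i++) {
            final int id = i;
            service.submit(() -> "result-" + id); // result is never taken: the bug
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS); // wait until every task has finished
        int retained = completionQueue.size();       // all finished Futures are still referenced
        while (service.poll() != null) {
            // poll() removes one finished Future per call; null means the queue is empty
        }
        int afterDrain = completionQueue.size();
        return new int[] {retained, afterDrain};
    }
}
```

Calling run(1000) should report 1000 retained Futures before draining and 0 afterwards.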

So if your business scenario doesn't need the tasks' return values, don't use CompletionService for no reason. If you do use it, remember to remove the task results from the blocking queue to avoid an OOM!
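When the results really are not needed, one minimal alternative is to hand Runnables straight to the pool with execute(), so no completion queue exists at all. This is a sketch; the class FireAndForget and its counter are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class FireAndForget {
    // Runs the given number of tasks without keeping any Future; returns how many ran.
    static int run(int tasks) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            // execute() takes a Runnable and returns nothing, so no result is retained anywhere
            pool.execute(done::incrementAndGet);
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return done.get();
    }
}
```

One trade-off: with execute(), an exception thrown by a task goes to the worker thread's uncaught-exception handler instead of being captured in a Future, so log inside the task if failures matter.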
Summary
Now that we know the cause of the incident, let's distill some lessons. As Confucius said: reflect often on your own mistakes, and work at improving yourself!
Before release
Make strict code review a habit and always have someone else look at your code; it is hard to see the problems in your own code, as every programmer knows;
Keep a release record: note the last package version you can roll back to (leave yourself a way out);
Before release, confirm the rollback plan and whether the feature can be degraded. If it cannot be degraded, extend the monitoring period for this launch.
After release
Keep watching memory growth (this is easy to overlook, since people pay less attention to memory than to CPU usage);
Keep watching CPU usage growth, GC behavior, whether the thread count is rising, whether Full GCs are frequent, and so on;
Watch service performance alerts: whether TP99, TP999 or MAX latency rises significantly.
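The memory, thread and GC signals above can also be sampled from inside the JVM with the standard java.lang.management API. This is a minimal sketch (JvmHealthSnapshot is a hypothetical name), not a replacement for a real monitoring system; logging it periodically would make slow heap or thread growth visible.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

class JvmHealthSnapshot {
    // Builds a one-line summary of heap usage, live thread count and GC activity.
    static String snapshot() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        StringBuilder sb = new StringBuilder();
        sb.append("heapUsedMB=").append(heap.getUsed() / (1024 * 1024));
        sb.append(" threads=").append(ManagementFactory.getThreadMXBean().getThreadCount());
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Collector names depend on the JVM and the GC in use
            sb.append(" [").append(gc.getName())
              .append(" count=").append(gc.getCollectionCount())
              .append(" timeMs=").append(gc.getCollectionTime()).append(']');
        }
        return sb.toString();
    }
}
```

A steadily rising heapUsedMB after each Full GC, or a thread count that only ever grows, is the kind of slow trend that preceded this incident.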
Source: juejin.cn/post/7064376361334358046