[JVM Series 5] JVM tuning examples

2022-06-13 03:08:00 【Louzai】

This is a reprinted article that explains common JVM tuning strategies and walks through some JVM tuning examples encountered at work.

This article is not original. I found the original text very practical, so I am reposting it here for later reference. Original address: https://juejin.cn/post/6949806402743304206#heading-21

If you need to tune a JVM, or have run into JVM problems and do not know how to solve them, this article is well worth reading.

Preface

JVM tuning sounds impressive, but keep in mind that it should be the last resort in Java performance optimization.

I agree with Liao Xuefeng's view that JVM tuning is not a routine tool. For performance problems, the first choice is usually to optimize the program, and JVM tuning is the final option.

Common tuning strategies

It bears repeating: even when JVM tuning is clearly called for, do not fall into the "knowledge barrier" of seeing only that one option. If the analysis shows that performance can be improved by optimizing the program, optimizing the program is still preferred.

Choose the right garbage collector

Single-core CPU: the Serial garbage collector is the only sensible choice.
Multi-core CPU, throughput-focused: choose the PS+PO combination (Parallel Scavenge + Parallel Old).
Multi-core CPU, pause-time-focused, JDK 1.6 or 1.7: choose CMS.
Multi-core CPU, pause-time-focused, JDK 1.8 or later, with more than 6 GB of memory available to the JVM: choose G1.

Parameter configuration:

 // Serial garbage collector (young generation)
 Enable: -XX:+UseSerialGC
 
 // PS+PO: the young generation uses Parallel Scavenge and the old generation uses Parallel Old
 Enable: -XX:+UseParallelOldGC
 
 // CMS garbage collector (old generation)
 Enable: -XX:+UseConcMarkSweepGC
 
 // G1 garbage collector
 Enable: -XX:+UseG1GC
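
After setting one of these flags, it can help to confirm which collectors the running JVM actually registered. The following is a minimal sketch using the standard java.lang.management API; the class name ActiveCollectors is a hypothetical helper, not something from the original article.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; not part of the original article.
final class ActiveCollectors {

    /** Returns the names of the garbage collectors registered by the running JVM. */
    static List<String> names() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // The exact names depend on the JVM vendor, version, and collector flag.
        names().forEach(System.out::println);
    }
}
```

Running it once with each flag shows how the reported names change, which is a quick sanity check that a startup script really passes the option through.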

Adjust the memory size

Symptom: garbage collection is very frequent.

Cause: if memory is too small, the JVM must collect garbage frequently to free enough space for new objects, so increasing the heap size has a very noticeable effect.

Note: if garbage collection is very frequent but each collection reclaims very few objects, the problem is not that memory is too small. A memory leak is probably preventing objects from being reclaimed, which causes the frequent GC.

Parameter configuration:

 // Set the initial heap size
 Option 1: -Xms2g
 Option 2: -XX:InitialHeapSize=2048m
 
 // Set the maximum heap size
 Option 1: -Xmx2g
 Option 2: -XX:MaxHeapSize=2048m
 
 // Configure the young generation size
 Option 1: -Xmn512m
 Option 2: -XX:MaxNewSize=512m
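
To check that the heap flags took effect, the process can report its own limits. This is a small sketch using java.lang.Runtime; the class name HeapSettings is hypothetical. The reported maximum may be slightly below the -Xmx value for some collectors, so treat the numbers as approximate.

```java
// Hypothetical helper class; not part of the original article.
final class HeapSettings {

    /** Maximum heap the JVM will attempt to use, in bytes (roughly -Xmx). */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /** Heap currently committed by the JVM, in bytes (starts near -Xms). */
    static long committedHeapBytes() {
        return Runtime.getRuntime().totalMemory();
    }

    /** Converts a byte count to whole mebibytes. */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("max heap (MiB): " + toMiB(maxHeapBytes()));
        System.out.println("committed heap (MiB): " + toMiB(committedHeapBytes()));
    }
}
```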

Set the expected pause time

Symptom: the program stalls intermittently.

Cause: if no explicit pause-time goal is set, the garbage collector optimizes mainly for throughput, so the duration of each collection is unpredictable.

Note: do not set an unrealistic pause time. The shorter each pause, the more collections are needed to reclaim the same amount of garbage.

Parameter configuration:

 // Target GC pause time in milliseconds; the garbage collector will do its best to meet it
 -XX:MaxGCPauseMillis

Adjust the size ratio of memory regions

Symptom: GC is frequent in one region while everything else is normal.

Cause: if the region does not have enough space, frequent GC is needed to free space. When the JVM heap cannot be enlarged, adjust the size ratio of the affected region instead.

Note: the cause may not be a lack of space. A memory leak may be preventing memory from being reclaimed, which also leads to frequent GC.

Parameter configuration:

 // Size ratio between the Survivor areas and Eden
 -XX:SurvivorRatio=6  // each Survivor area : Eden = 1:6, so the two Survivor areas : Eden = 2:6
 
 // Ratio between the young generation and the old generation
 -XX:NewRatio=4  // young generation : old generation = 1:4, so the old generation takes 4/5 of the heap; the default value is 2
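
The two ratios can be turned into concrete sizes with a little arithmetic. The sketch below assumes fixed sizing (no adaptive resizing by the collector) and uses a hypothetical 2560 MB heap chosen so the numbers divide evenly; the class name GenerationSizes is also hypothetical.

```java
// Hypothetical helper class; sizes are in MB and assume no adaptive resizing.
final class GenerationSizes {

    /** NewRatio = old / young, so young = heap / (NewRatio + 1). */
    static long youngSize(long heap, int newRatio) {
        return heap / (newRatio + 1);
    }

    /** SurvivorRatio = Eden / one Survivor, and young = Eden + 2 Survivors. */
    static long survivorSize(long young, int survivorRatio) {
        return young / (survivorRatio + 2);
    }

    /** Eden is whatever remains of the young generation after both Survivor areas. */
    static long edenSize(long young, int survivorRatio) {
        return young - 2 * survivorSize(young, survivorRatio);
    }

    public static void main(String[] args) {
        long young = youngSize(2560, 4);            // 512 MB young, 2048 MB old
        System.out.println("young=" + young
                + " survivor=" + survivorSize(young, 6)   // 64 MB each
                + " eden=" + edenSize(young, 6));         // 384 MB
    }
}
```

With -XX:SurvivorRatio=6, each Survivor area is therefore 1/8 of the young generation and Eden is 6/8.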

Adjust the object promotion age

Symptom: old-generation GC is frequent, and each collection reclaims many objects.

Cause: if the promotion age is low, young-generation objects move into the old generation quickly, so the old generation fills with objects that could in fact have been reclaimed after a short time. Raising the promotion age keeps such objects from entering the old generation so easily and relieves the frequent GC caused by a shortage of old-generation space.

Note: after the age is raised, these objects stay in the young generation longer, which may increase the young-generation GC frequency. Because they are copied repeatedly, each young-generation GC may also take longer.

Configuration parameters:

 // Initial age threshold at which a young-generation object is promoted to the old generation; the default value is 7
 -XX:InitialTenuringThreshold=7

Adjust the large-object threshold

Symptom: old-generation GC is frequent, each collection reclaims many objects, and individual objects are relatively large.

Cause: if many large objects are allocated directly in the old generation, it fills up easily and triggers frequent GC. You can adjust the size threshold above which objects go straight to the old generation.

Note: once these large objects are allocated in the young generation instead, young-generation GC frequency and duration may increase.

Configuration parameters:

 // Largest object size, in bytes, that the young generation will accept; larger objects are allocated directly in the old generation. 0 means no limit.
  -XX:PretenureSizeThreshold=1000000

Adjust the GC trigger threshold

Symptom: CMS or G1 frequently falls back to Full GC, and the program stalls badly.

Cause: some GC phases of G1 and CMS are concurrent, meaning business threads and garbage collection threads run at the same time. Business threads therefore keep creating objects while a collection is in progress, so some memory must be reserved during GC to hold them. If there is not enough space for the new objects, the JVM abandons the concurrent collection and suspends all business threads (STW) so that garbage collection can complete. In this case, adjust the GC trigger threshold (for example, start GC when the old generation is 60% full) so that enough space is reserved for objects created by business threads.

Note: triggering GC earlier increases the frequency of old-generation GC.

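Configuration parameters:

 // Old-generation occupancy at which a CMS collection starts. The default is 68% in early JDK versions (later releases compute an effective default of about 92%); if stalls caused by falling back to Serial Old are frequent, lower it
 -XX:CMSInitiatingOccupancyFraction
 
 // G1: occupancy threshold for an old region to be included in a mixed garbage collection cycle. The default occupancy is 65%
 // (the point at which G1 starts a concurrent marking cycle is controlled separately by -XX:InitiatingHeapOccupancyPercent, default 45)
 -XX:G1MixedGCLiveThresholdPercent=65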
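
The reasoning behind the trigger threshold can be made concrete with a rough headroom estimate. The sketch below is a simplified model, not how the JVM itself decides: it assumes a constant promotion rate and a known concurrent cycle duration, both of which would have to be read from GC logs. The class name, method name, and sample numbers are hypothetical.

```java
// Hypothetical helper class illustrating a simplified headroom model.
final class InitiatingOccupancy {

    /**
     * Highest old-generation occupancy (in percent) at which a concurrent cycle
     * can start and still leave room for objects promoted while it runs.
     * Returns 0 when the required headroom is at least the whole old generation.
     */
    static int maxInitiatingPercent(long oldGenMb, long promotionMbPerSec, long cycleSeconds) {
        long headroomMb = promotionMbPerSec * cycleSeconds;
        if (headroomMb >= oldGenMb) {
            return 0;
        }
        return (int) ((oldGenMb - headroomMb) * 100 / oldGenMb);
    }

    public static void main(String[] args) {
        // Assumed figures: 2048 MB old generation, 50 MB/s promoted, 8 s concurrent cycle.
        // Headroom needed is 400 MB, so the cycle should start by about 80% occupancy.
        System.out.println(maxInitiatingPercent(2048, 50, 8));
    }
}
```

In practice a safety margin below the computed value is sensible, since promotion rates are rarely constant.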

Adjust the JVM native memory size

Symptom: the number of GCs, the GC time, and the objects reclaimed are all normal, and there is plenty of heap memory, yet the JVM reports OOM.

Cause: besides the heap, the JVM also uses off-heap memory, also called native memory. Running short of this memory does not trigger GC by itself; native memory is only reclaimed when a GC of the heap is triggered. Once a native memory allocation cannot be satisfied, an OOM error is reported directly.

Note: when native memory is the problem, in addition to the symptom above, the exception message may be OutOfMemoryError: Direct buffer memory. The fix is to adjust the native memory size; you can also catch this error when it occurs and trigger GC manually (System.gc()).

Configuration parameters:

 -XX:MaxDirectMemorySize
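
Before changing the limit, it is useful to see how much direct buffer memory the process is really using. The following sketch reads the "direct" buffer pool through the standard BufferPoolMXBean; the class name DirectMemoryUsage is hypothetical, and the 1 MiB allocation exists only to make the number move.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

// Hypothetical helper class; not part of the original article.
final class DirectMemoryUsage {

    /** Bytes currently used by direct ByteBuffers, or -1 if the pool is not exposed. */
    static long directBytesUsed() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Allocate 1 MiB off-heap; this counts against -XX:MaxDirectMemorySize.
        ByteBuffer buffer = ByteBuffer.allocateDirect(1024 * 1024);
        System.out.println("direct bytes used: " + directBytesUsed());
        System.out.println("buffer capacity: " + buffer.capacity());
    }
}
```

Logging this value periodically makes it easier to tell a genuine off-heap leak from a limit that is simply set too low.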

JVM tuning examples

Here are some JVM tuning examples collected from the Internet:

Website pages respond slowly after a sharp increase in traffic

1. Hypothesis: the site was fast in the test environment but slowed down in production, so the suspicion was that garbage collection was stalling the business threads.

2. Locating the problem: to confirm the hypothesis, jstat -gc was run in production. It showed that the JVM was collecting very frequently and that GC took a very long time. The working conclusion was that the high GC frequency kept pausing the business threads, which made pages respond slowly.

3. Solution: with so many page views, objects were created very quickly, so the heap filled up easily and GC ran frequently. The problem was that the young generation was too small, so the fix was to give the JVM more memory. As a first step it was raised from 2 GB to 16 GB.

4. Second problem: after the increase, ordinary requests were faster, but a new problem appeared. The site stalled at irregular intervals, and each stall lasted much longer than before.

5. Hypothesis: since the earlier optimization had increased memory, the larger heap was probably making each individual GC take longer, which caused the intermittent stalls.

6. Locating the problem: jstat -gc showed that the number of Full GCs (FGC) was not high, but the time spent in FGC was. The GC log showed single FGCs that took tens of seconds.

7. Solution: the JVM default is the PS+PO combination, whose marking and collection phases are both STW, so the larger the memory, the longer a collection takes. To avoid long individual GCs, a concurrent collector was needed. The JDK version was 1.7, so CMS was chosen, with an expected pause time set based on the earlier collection behavior. After the change went live, the stalls disappeared.

OOM caused by exporting data in the back office

**Problem description:** the company's back-office system occasionally threw an OOM error (heap memory overflow).

1. Because it was occasional, the first assumption was simply that the heap was too small, so the heap was increased from 4 GB to 8 GB.

2. The problem persisted, so the next step was to look at the heap itself. The -XX:+HeapDumpOnOutOfMemoryError parameter was enabled to obtain a heap dump file.

3. The heap dump was analyzed with VisualVM, which showed that String objects took up the most memory. The plan was to trace the references to those String objects, but the dump file was so large that VisualVM froze while tracing. Since it is normal for String objects to take up a lot of memory, this did not look like the problem at first, so the search moved on to the thread information.

4. Thread analysis turned up several running business threads. Following each one into the code revealed a method that stood out: exporting order information.

5. The order export may involve tens of thousands of rows. It first queries the order information from the database and then generates an Excel file from it, a process that creates a large number of String objects.

6. To test the hypothesis, I logged in to the back office and found that the export button did not gray itself out after being clicked, so it could be clicked repeatedly. Because exporting order data is slow, users who saw no response for a long time kept clicking, which sent a large number of requests to the back end. The heap filled with order objects and Excel objects, and because the method ran so slowly, none of them could be reclaimed in the meantime, which eventually caused the memory overflow.

7. Once the cause was known, the fix was easy, and no JVM parameters were changed in the end. The front-end export button was put into a grayed-out state after a click and re-enabled only after the back end responded, and unnecessary fields were dropped from the order query to shrink the generated objects. That solved the problem.
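
The fix in this case was made on the front end. As a complementary safeguard, which is not what the original team did, the back end can also refuse to start more than a fixed number of exports at once. The sketch below shows that idea with a Semaphore; the class name ExportGuard and its methods are hypothetical.

```java
import java.util.concurrent.Semaphore;

// Hypothetical server-side guard; the article's actual fix was on the front end.
final class ExportGuard {

    private final Semaphore permits;

    ExportGuard(int maxConcurrentExports) {
        this.permits = new Semaphore(maxConcurrentExports);
    }

    /** Returns true if an export may start now; false if the limit is already reached. */
    boolean tryStart() {
        return permits.tryAcquire();
    }

    /** Must be called (typically in a finally block) when an export that started has ended. */
    void finish() {
        permits.release();
    }
}
```

A request handler would call tryStart() before querying the database, reject the request immediately when it returns false, and call finish() in a finally block so a failed export does not hold its permit forever.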

CPU spike caused by an oversized single cache entry

1. After a release, CPU usage climbed all the way to 600%. The first step was to find which application was using the CPU; top showed a single Java application at 600%.

2. When an application's CPU usage spikes, the cause is usually either lock contention or frequent GC.

3. The plan was to check GC first and, if GC looked normal, to investigate threads next. jstat -gc PID printed the GC statistics, which were clearly abnormal: the application had been running for only a few minutes, yet GC had already consumed 482 seconds. Frequent GC was obviously driving the CPU spike.

4. With GC identified as the problem, the next step was to find out why it was so frequent. There are two possibilities: either some code creates objects at a high rate, or a memory leak prevents memory from being reclaimed.

5. Following this line of thought, the heap was dumped for inspection with jmap -dump (use this command with care when the heap is large, because it affects the running application; our heap was only 2 GB, so this was not a concern).

6. The dump was analyzed offline with VisualVM, starting from the objects that used the most memory. The third entry was a business VO that took up about 10% of the space, which clearly pointed to a problem with that object.

7. The business object led to the corresponding business code, and code analysis found something suspicious. The object is created when news items are viewed. To speed up queries, the news items had been stored in a Redis cache, and every call to the news interface read them from the cache.

8. Caching the news in Redis was not a problem in itself. The problem was that more than 50,000 news items were stored under a single key, so every call to the news query interface pulled all 50,000-plus items out of Redis and then filtered them down to the 10 returned to the front end. More than 50,000 items means more than 50,000 objects at roughly 280 bytes each, or about 13.3 MB in total. Viewing the news once therefore produced at least 13.3 MB of objects, and once concurrency reached just 10 requests, 133 MB of objects were produced every second. Large objects like this are allocated directly in the old generation, so even a 2 GB old generation fills up within seconds and triggers GC.

9. Once the cause was known, the fix was easy. The single cache entry was too large, so the cache had to be made smaller. Caching at page granularity was enough: each key holds the 10 items that make up one page returned to the front end, so each news query reads only 10 items from the cache, which avoids the problem.
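
The arithmetic in step 8 and the page-granularity keys in step 9 can be sketched as follows. The key format "news:page:N" and the class name NewsCachePaging are hypothetical, chosen only for illustration; the article does not say how the keys were named.

```java
// Hypothetical helper class; the key naming scheme is an assumption.
final class NewsCachePaging {

    /** Bytes of objects materialized by one request that loads the given number of items. */
    static long bytesPerRequest(long items, long bytesPerItem) {
        return items * bytesPerItem;
    }

    /** 1-based page number that contains the item at the given 0-based index. */
    static int pageOf(int itemIndex, int pageSize) {
        return itemIndex / pageSize + 1;
    }

    /** Cache key for one page of news (hypothetical format). */
    static String pageKey(int pageNumber) {
        return "news:page:" + pageNumber;
    }

    public static void main(String[] args) {
        // One key holding everything: 50,000 items * 280 bytes = 14,000,000 bytes (about 13.3 MiB).
        System.out.println(bytesPerRequest(50_000, 280));
        // One key per page: 10 items * 280 bytes = 2,800 bytes per request.
        System.out.println(bytesPerRequest(10, 280));
        System.out.println(pageKey(pageOf(25, 10)));
    }
}
```

The per-request allocation drops by a factor of 5,000, which is why no JVM flag needed to change.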

Locating the cause of CPU usage that frequently hits 100%

Problem analysis: high CPU usage means that some program is occupying CPU resources for long periods.

1. First find out which process is using the most CPU.

top   Lists the resource usage of each process in the system.

2. Then find out which thread in that process is using the most CPU.

top -Hp PID   Lists the resources used by each thread in the given process.

3. After finding the thread ID, print the stack information of that thread.

printf "%x\n" TID   Converts the thread ID to hexadecimal.
jstack PID   Prints all thread information for the process; look for the entry that matches the hexadecimal thread ID from the previous command.

4. Finally, use the thread's stack information to locate the specific business method and find the problem in the code logic.

 Check whether any threads have been in the WAITING or BLOCKED state for a long time.
 If a thread is in the WAITING state, look at "waiting on xxxxxx", which indicates that the thread is waiting for a lock; then use the address of the lock to find the thread that holds it.
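
The hexadecimal conversion in step 3 is needed because jstack prints the native thread ID in hex, in a field written as nid=0x..., while top shows it in decimal. A minimal sketch of the same conversion, with a hypothetical class name ThreadIdHex:

```java
// Hypothetical helper class mirroring the printf "%x" step.
final class ThreadIdHex {

    /** Formats a decimal OS thread ID the way it appears after "nid=" in jstack output. */
    static String toNid(long osThreadId) {
        return "0x" + Long.toHexString(osThreadId);
    }

    public static void main(String[] args) {
        // A thread shown by top -Hp as 12345 appears in jstack as nid=0x3039.
        System.out.println(toNid(12345));
    }
}
```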

Locating the cause of high memory usage

Analysis: in a Java process, high memory usage is usually caused by creating a large number of objects. A sustained climb means that garbage collection cannot keep up with object creation, or that a memory leak is preventing objects from being reclaimed.

1. First, look at garbage collection.

jstat -gc PID 1000   Shows GC frequency, time, and other information, printed once per second.
jmap -histo PID | head -20   Shows the top 20 object types by memory footprint in the heap, giving a first indication of which objects are using the memory.

If GC is frequent and the amount of memory reclaimed each time is normal, objects are being created so quickly that memory usage stays high. If each GC reclaims very little memory, a memory leak is probably preventing memory from being reclaimed.

2. Export a heap snapshot to a file.

jmap -dump:live,format=b,file=/home/myheapdump.hprof PID   Dumps the heap to a file.

3. Use VisualVM to analyze the dump file offline, find the objects with high memory consumption, then find the business code that creates them, and identify the specific problem from the code and the business scenario.
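
The GC check in step 1 can also be approximated from inside the process, which is handy when shell access to production is limited. The sketch below sums the accumulated collection time from the standard GarbageCollectorMXBeans and compares it with uptime. The class name GcOverhead is hypothetical, and the 10-minute uptime in the comment is an assumed figure used only to illustrate the 482-second case mentioned earlier.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper class; not part of the original article.
final class GcOverhead {

    /** Approximate total time spent in GC so far, in milliseconds. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** GC time as a fraction of uptime; 0 when uptime is not positive. */
    static double fraction(long gcMillis, long uptimeMillis) {
        return uptimeMillis <= 0 ? 0.0 : (double) gcMillis / uptimeMillis;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.printf("GC share of uptime: %.1f%%%n", 100 * fraction(totalGcMillis(), uptime));
        // For comparison: 482 s of GC over an assumed 10 minutes of uptime is about 80%.
    }
}
```

A share that stays in the double digits is a strong hint to move on to the heap dump in step 2.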

Frequent Full GC in a data analysis platform

The platform mainly runs scheduled analysis and statistics on user behavior in the App and supports report export. It uses the CMS GC algorithm.

Data analysts found that pages often stalled during use. The jstat command showed that after each Young GC, about 10% of the surviving objects entered the old generation.

The cause was that the Survivor space was too small: the objects that survived each Young GC did not fit in the Survivor area and were promoted to the old generation early.

The Survivor area was enlarged so that it could hold the objects surviving a Young GC. Objects then went through several Young GCs in the Survivor area and entered the old generation only after reaching the age threshold.

After the adjustment, only a few hundred KB of surviving objects entered the old generation after each Young GC, the system ran stably, and the Full GC frequency dropped sharply.

OOM in a service integration gateway

The gateway mainly consumes Kafka data, processes and computes on it, and forwards it to another Kafka queue. The system hit OOM after running for a few hours, and hit OOM again a few hours after being restarted.

The heap was exported with jmap and analyzed with the Eclipse MAT tool, which revealed the cause: the code printed the data of one business Kafka topic asynchronously. That topic carried a large volume of data, so a large number of objects piled up in memory waiting to be printed, which led to the OOM.
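
The article does not say how this was fixed. One common way to keep such a backlog from growing without limit is to put a bounded queue between the producer and the asynchronous printer and to drop (or count) messages when it is full. The sketch below illustrates that idea only; the class name BoundedPrintQueue and its methods are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical bounded buffer for asynchronous printing; not the article's actual fix.
final class BoundedPrintQueue {

    private final BlockingQueue<String> queue;
    private final AtomicLong dropped = new AtomicLong();

    BoundedPrintQueue(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Enqueues a message without blocking; returns false and counts a drop when full. */
    boolean submit(String message) {
        if (queue.offer(message)) {
            return true;
        }
        dropped.incrementAndGet();
        return false;
    }

    /** Takes the next message for the printer thread, or null if none is waiting. */
    String poll() {
        return queue.poll();
    }

    /** Number of messages rejected because the queue was full. */
    long droppedCount() {
        return dropped.get();
    }
}
```

The capacity caps how many pending messages can sit on the heap, so a slow printer costs some log lines rather than the whole process.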

Long, frequent Full GC in an authentication system

The system provides various account authentication services. It was often found to be unavailable during use. Monitoring through the Zabbix platform showed that the system frequently ran long Full GCs, and that the old generation was usually not full when they were triggered. It turned out that the business code was calling System.gc().
