当前位置:网站首页>JVM garbage collection mechanism and common garbage collectors

JVM garbage collection mechanism and common garbage collectors

2022-06-11 00:16:00 Xujunsheng

The opening

In the previous post , We introduced JVM And garbage collection algorithm . Today we are going to talk about JVM The most important heap memory is how to use the garbage collector for garbage collection , And how to use commands to configure and use these garbage collectors .

Before that , Let's do a brief review of the garbage collection mechanism .

Garbage collection

According to the semantic meaning , Garbage collection , First of all, we need to find the garbage , Then recycle it . however GC The process is just the opposite , It is to find the active object first , Then identify other inactive objects as garbage , Then delete .

So garbage collection Only with active objects , It has nothing to do with the size of the heap .

The implementation details of various garbage collectors are different , But overall , Garbage collectors focus on two things :

  1. Find all living objects
  2. Discard the rest , Dead object , Objects that are no longer in use

Mark reachable objects (Marking Reachable Objects)

modern JVM All of the GC Algorithm , The first step is to find all the surviving objects . As shown in the figure below :

 Insert picture description here

  1. First , Some specific objects are specified as Garbage Collection Roots(GC Root element ).
  2. secondly ,GC Traverse (traverses) Overall object relationship diagram in memory (object graph), from GC The root element starts scanning , Direct reference to , And other objects ( Through the property field of the object ). all GC All accessed objects are marked as living objects .
  3. After the marking phase is complete , All living objects are marked ( Living objects are represented in blue in the above figure ). Other objects ( The gray data structure in the figure above ) It's from GC The root element is not reachable , That is, programs can no longer use these unreachable objects (unreachable object). Such objects are considered garbage ,GC They will be cleared in the next phase .

There is one thing to pay attention to during the marking phase : In the marking phase , Need to pause all application threads , To traverse the reference relationships of all objects . Because you can't keep track of changing reference diagrams without pausing . This scenario is called Stop The World( abbreviation STW, The whole line stopped ), The point at which a thread can be safely suspended is called safer (safe point), then ,JVM You can concentrate on cleaning up . The safety point may be triggered by many factors , At present ,GC Is the most common reason for triggering a safety point .

The time this phase is suspended , And heap memory size , The total number of objects is not directly related , It is By the surviving object (alive objects) To determine the number of . therefore Increasing the size of heap memory does not directly affect the time taken by the marking phase .

After the marking phase is complete ,GC Proceed to the next step , Delete unreachable objects .

Delete unreachable objects (Removing Unused Objects)

Various GC The algorithm is slightly different when deleting unreachable objects , But in general, it can be divided into three categories : eliminate (sweeping)、 Arrangement (compacting) And copy (copying).

eliminate (Sweeping)

Mark and Sweep( Mark — eliminate ) The concept of the algorithm is very simple : Just ignore all the garbage . That is, after the marking phase , Memory space occupied by all unreachable objects , Are considered idle , So it can be used to allocate new objects .

This algorithm requires the use of free tables (free-list), To record all free areas , And the size of each area .

Maintaining free tables increases the overhead of object allocation . There is another weakness —— That's it Debris problem .

For example, I applied 1k、2k、3k、4k、5k Of memory .

 Insert picture description here
For some reason 2k and 4k Of memory , No longer use , You need to hand it over to the garbage collector for recycling .

 Insert picture description here
This is the time , I should have enough 6k Of free space . Next , I am going to apply for a 5k Space , As a result, the system told me that I was out of memory .

And the longer the system runs , The more pieces there are .

Arrangement (Compacting)

Mark — eliminate — Sorting algorithm (Mark-Sweep-Compact), All marked objects ( The living object ), Migrate to the beginning of memory space , Eliminated 「 Mark — Clear algorithm 」 The shortcomings of .

The corresponding disadvantage is GC The pause time will increase , Because you need to copy all the objects to another place , Then modify the references to these objects .

The advantages of this algorithm are also obvious , After defragmentation , Assigning new objects is simple , Just go through 「 Pointer collision 」(pointer bumping) that will do . Use this algorithm , The amount of memory space remaining is always clear , No more memory fragmentation problems .

hypothesis Java The memory in the heap is absolutely regular , All used memory is put in one edge , Free memory is put on the other side , There is a pointer in the middle as an indicator of the dividing point , Then the allocated memory is just that A pointer moves a distance equal to the size of the object in the direction of free space , This distribution is called “ Pointer collision ”(Bump The Pointer).

Copy (Copying)

Mark — Copy algorithm (Mark and Copy) and 「 Mark — Sorting algorithm 」(Mark and Compact) Very similar : Both move all living objects .

The difference lies in ,「 Mark — Copy algorithm 」 Is to move memory to another space : Survival zone .

「 Mark — Replication method 」 The advantage is : Tagging and copying can be done at the same time . The drawback is that you need an extra memory section , To store all living objects .

The generational Hypothesis

We briefly introduce some common memory recycling algorithms , at present ,JVM The garbage collector , It is the development of several simple algorithms . Take a brief look at their characteristics :

  • Mark - eliminate (Mark-Sweep): The efficiency of general , The disadvantage is that it will cause memory fragmentation .
  • Copy algorithm (Copy): Replication algorithm is the most efficient of all algorithms , The disadvantage is that it will cause a certain waste of space .
  • Mark - Arrangement (Mark-Compact): The efficiency is worse than the first two , But no space is wasted , It also eliminates the memory fragmentation problem .

therefore , There is no optimal algorithm , Only the most appropriate algorithm .

JVM It's computing nodes , Not storage nodes . The ideal situation , After the object is used up , Its life cycle ends immediately . And those resources that are accessed frequently , We want it to reside in memory .

Studies have shown that , Most of the objects , It can be divided into two categories :

  • Most objects have a short life cycle
  • Other objects are likely to survive for a long time

Most die fast . We call this hypothesis the weak generational hypothesis (weak generational hypothesis).

Corresponding to this is the strong generational hypothesis (Strong Generational Hypothesis): The more times an object survives the garbage collection process, the more difficult it is for an object to die .

Now the garbage collector , Both physically and logically , Distinguish between these two types of objects :

  • We take the area occupied by the fast dead objects , called The younger generation (Young generation)
  • Take the area occupied by other living long objects , called Old age (Old generation), The old age is also called... In some places Tenured Generation.

The younger generation

The garbage collection algorithm used by young generation is replication algorithm . Because the younger generation GC after , Very few objects will survive , It's very efficient to copy this part of the object .

We also learned that the replication algorithm will cause a certain waste of space , So there will be many regions among the young generation .

 Insert picture description here
As shown in the figure , The young generation is divided into :Eden District , Two Survivor District .

In the early Qing Dynasty Eden Distinguish when the match is full , It will trigger the younger generation to GC(Minor GC). The specific process is as follows :

  • stay Eden The zone executed the first time GC after , The surviving objects will enter one of them through the replication algorithm Survivor Partition ( hereinafter referred to as from);
  • Eden District again GC, take Eden and from Clean up the area together , Live objects are copied to to District ; Next , Just empty from It's OK .

In the process , There is always one Survivor The partition is empty .Eden、from、to The default scale for is 8:1:1, So it will only cause 10% Waste of space . The proportion of , By parameters -XX:SurvivorRatio Configured ( The default is 8).

In general , We just need to understand this level OK 了 . But in an ordinary interview , Another point will often be mentioned , Although the frequency is not too high , It is TLAB, Here we also briefly introduce .

TLAB

TLAB The full name is Thread Local Allocation Buffer,JVM By default, each thread is opened with one buffer Area , Used to speed up object allocation . This buffer Just put it in Eden In the area .

stay Java Many of the objects in the program are small objects and can be lost after use , They don't exist for thread sharing and are suitable for being fast GC, So for small objects, usually JVM Priority will be given to TLAB On , also TLAB The allocation on is thread private , So there is no lock overhead . Therefore, in practice, it is usually more efficient to allocate multiple small objects than to allocate one large object .

This truth and Java In language ThreadLocal similar , Avoid the operation of public areas , And some lock competition .

 Insert picture description here

The allocation of objects takes precedence in TLAB The distribution of , but TLAB It's usually very small , So when the object is relatively large , Will be in Eden The shared area of the zone is allocated .

TLAB It's an optimization technique , Similar optimizations include allocation of objects on the stack ( This leads to the topic of escape analysis , Default on ).

Old age

In the old days 「 Mark - eliminate 」、「 Mark - Arrangement 」 Algorithm , Because the survival rate of the old generation is generally relatively high , The space is relatively large , It's not worth copying , It's better to take the way of local collection .

that , How did the object enter the old age ? There are many ways .

(1) promote (Promotion)

If the object is old enough , Will pass 「 promote 」 Into the old age .

About whether the object is old or not , Through its age (age) To determine the . Whenever it happens Minor GC, The age of the surviving objects will increase 1. Until a certain threshold is reached , They're going to put these “ Obstinate ” To the old age .

If these objects become unreachable , Until the old days GC When , Will be cleaned up .

This threshold , You can use the parameter ‐XX:+MaxTenuringThreshold To configure , The maximum is 15, Because it uses 4bit Stored .

(2) Distribution guarantee

Look at the picture of the young generation , Every time you live , Will be placed in one of the surviving areas , The default scale for this area is 10%.

But we can't guarantee that the number of objects that survive each time is less than 10%, When Survivor Space is not enough , It depends on other memory ( The old age ) Distribution guarantee . This is the time , Objects are also allocated directly to the older generation .

(3) Big objects are distributed directly to the elderly

Objects beyond a certain size will be allocated directly in the older generation . This value is determined by the parameter -XX:PretenureSizeThreshold Configured . The default is 0, It means all of them Eden District Distribution .

(4) Dynamic object age determination

Some garbage collection algorithms , There is no demand for age Must be achieved 15 To rise to the old age , It uses some dynamic calculation methods . such as , If the sum of the sizes of objects of the same age in the surviving area , More than half of the survival zone , Greater than or equal to age We're going to go straight to the older generation .

Card mark (card marking)

We can see , Object references are a huge mesh . Some objects may be in Eden District , Some may be in the older generation , Then this kind of How cross generational references are handled What about ?

because Minor GC It happened alone , If an old age object references it , How to ensure that young people can survive ?

For yes 、 No judgment , We usually use Bitmap( Bitmap ) And bloom filter to speed up the search .

JVM In a similar way . Actually , The old age is divided into many card pages (card page) Of ( The general quantity is 2 Power of power ).

Card table (Card Table) Is a collection used to mark the status of a card page , Each card entry corresponds to a card page .

If the younger generation has a target assignment , And there was an object in the old age that pointed to this new object , Then the memory card page corresponding to the old age object , It will be marked as dirty, The card table only needs very small storage space to keep these states .

Garbage collection , You can read this card and watch first , Make a quick judgment .

HotSpot Garbage collector

So let's start with HotSpot Several garbage collectors , Each type of recycler has its own characteristics . We're in the normal GC To optimize the , Be sure to find out what kind of garbage collector you are using now .

Before that , We put the above generation garbage collection into a big picture , When introducing the following collectors , You can map their positions .
 Insert picture description here

Here we briefly introduce some concepts :

  • Minor GC: It happened in the younger generation GC.
  • Major GC: It happened in the old days GC.
  • Full GC: Recycle the whole pile of garbage . such as Metaspace Area causes recycling of young and old generations .

The younger generation of garbage collectors

Serial Garbage collector

Serial( Serial ) Is the most basic garbage collector , Using the replication algorithm , Used to be JDK1.3.1 The only garbage collector of the previous generation .

Serial It's a single threaded collector , It will not only use one CPU Or a thread to complete garbage collection , And suspend all user threads during garbage collection , Until the end of garbage collection . This phenomenon is called STW(Stop-The-World).

STW It is automatically initiated and completed by the virtual machine in the background , Stop all the threads that the user works normally when the user is not visible , This is unacceptable for many applications .

Serial Although the garbage collector needs to pause all other working threads during garbage collection , But it's simple and efficient , For a limited single CPU In terms of environment , There is no overhead of thread interaction , Can get the highest single thread garbage collection efficiency , therefore Serial The garbage collector is still Java The virtual machine is running in Client Default younger generation garbage collector in mode .

ParNew Garbage collector

ParNew yes Serial Multithreaded version of . By multiple GC Threads do garbage cleaning in parallel . The cleanup process still needs to stop the user thread .

ParNew Pursuit “ Low pause time ”, And Serial The only difference is the use of multithreading for garbage collection , In a multiple CPU Performance ratio in environment Serial There will be a certain degree of improvement ; But thread switching requires extra overhead , So in the single CPU Not as good as in the environment Serial.

ParNew Collectors are many that run in Server Virtual machine in mode , In especial JDK 7 The preferred new generation collector in previous legacy systems , There is one with the function 、 Performance independent but important reasons are , except Serial Outside the collector , At present, only ParNew Can and CMS The collector works with .

Parallel Scavenge Garbage collector

Parallel Scavenge The collector is a new generation collector , It's also a collector that uses replication algorithms , Is a parallel multithreaded collector ,Java1.8 The default new generation garbage collector .

It focuses on the program achieving a Controllable throughput (Thoughput,CPU Time to run user code /CPU Total time consumed , namely throughput = Run user code time /( Run user code time + Garbage collection time )), Throughput first .

High throughput can be used most efficiently CPU Time , Finish the operation task of the program as soon as possible , It is mainly applicable to tasks that operate in the background without too much interaction with users .

It is associated with ParNew The main difference is :

  • Parallel Scavenge: Pursuit CPU throughput , Be able to complete the assigned task in a short time , Suitable for background computing without interaction . Weak interaction and strong computing .
  • ParNew: Seeking to reduce user pause time , Suitable for interactive applications . Strong interaction and weak computation .

Garbage collectors in the old days

Serial Old Garbage collector

With the younger generation Serial The garbage collector corresponds to , All single threaded versions , It is also suitable for clients .

Young generation Serial, Using the replication algorithm .

Old age Serial Old , Use the tag - Sorting algorithm .
 Insert picture description here

Parallel Old

Parallel Scavenge The old version , Support multi thread concurrent collection , Based on tags - Finishing algorithm implementation .

In the case of paying attention to throughput or processor resource scarcity , Priority can be given to Parallel Scavenge Add Parallel Old Collector combination .

 Insert picture description here

CMS Garbage collector

CMS(Concurrent Mark Sweep) The collector is to get the shortest GC Collector with pause time as target , It makes user threads and GC Threads can execute concurrently , So in the process of garbage collection, users will not feel obvious stuck .

You know from the name ,CMS It's based on markers - Clear algorithm implementation . Its operation process is more complicated than the previous collectors , The whole process is divided into seven steps , Include :

  1. Initial marker (Initial Mark)
  2. Concurrent Tags (Concurrent Mark)
  3. Concurrent pre cleanup (Concurrent Preclean)
  4. Concurrent cancelable pre cleanup (Concurrent Abortable Preclean)
  5. Final marker (Final Remark)
  6. Concurrent elimination (Concurrent Sweep)
  7. Concurrent reset (Concurrent Reset)

In the long term ,CMS Garbage collector , Is to be G1 When the garbage collector is replaced . stay Java8 after , Using it will throw a warning .

Java HotSpot 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.

G1

G1 The full name is Garbage­First GC,G1 Our goal is to kill CMS Of , It is also a soft real-time garbage collector . comparison CMS,G1 The use of is more humanized . such as ,CMS The relevant parameters of the garbage collector are 72 individual , and G1 The only parameters for are 26 individual .

G1 stay JDK1.7 Version officially launched , yes JDK 9 Future default garbage collector , To replace the CMS Recyclers .

Other recyclers , It's an overall collection of a certain era , Collection time is naturally difficult to control .G1 Cut the pile into many pieces , Think of each as a small goal , Some goals are easy to achieve .

There's another interview question :G1 Is there a distinction between the younger generation and the older generation ?

 Insert picture description here

As shown in the figure ,G1 There is a Eden Area and Survivor The concept of zone , But they are not continuous in memory , It's made up of small portions .

The size of this small area is fixed , It's called a small pile area (Region). The small pile area can be Eden District , It can also be Survivor District , It can also be Old District . therefore G1 The concepts of the younger generation and the older generation are logical .

Every piece Region, The size is the same , Its value is in 1M To 32M One... Between bytes 2 The power of .

But if my object is too big , One Region What if I can't let it go ? Notice that there is a large yellow area in the picture , Its name is Humongous Region, Larger than Region 50% The object of , Will be assigned here .

Region Size , It can be set through parameters :-XX:G1HeapRegionSize=32M.

that , When it's recycled , Which small heap areas are recycled ? Is it random ?

Of course not . in fact , The small dump area with the most garbage , Will be collected first . This is it. G1 The origin of the name .

Configuration parameters

adopt -XX:+PrintCommandLineFlags Parameters , You can view the current Java The version uses the default garbage collector . such as , In my system Java8 The default collectors are as follows :

java -XX:+PrintCommandLineFlags -version
-XX:InitialHeapSize=265277248 -XX:MaxHeapSize=4244435968 -XX:+PrintCommandLineFlags -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:-UseLargePagesIndividualAllocation -XX:+UseParallelGC
java version "1.8.0_321"
Java(TM) SE Runtime Environment (build 1.8.0_321-b07)
Java HotSpot(TM) 64-Bit Server VM (build 25.321-b07, mixed mode)

Here are some configuration parameters :

  • -XX:+UseSerialGC Parameter specifies that both the new generation and the old generation use a serial garbage collector , namely Serial + Serial Old
  • -XX:+UseParNewGC The new generation uses ParNew, Used in the old days Serial Old
  • -XX:+UseParallelGC The new generation uses Parallel Scavenge, In the old days, it was used by default Serial Old
  • -XX:+UseParallelOldGC The new generation uses Parallel Scavenge, Used in the old days Parallel Old
  • -XX:+UseConcMarkSweepGC, Indicates that the younger generation uses ParNew, In the old days CMS
  • -XX:+UseG1GC Use G1 Garbage collector
  • -XX:+UseZGC Use ZGC Garbage collector

In order to make a better impression , Please look at the chart below. . Their relationship is still complex . Pay special attention to -XX:+UseParNewGC This parameter , Already in Java9 Was abandoned in .

 Insert picture description here

There are so many garbage collectors and parameters , Then what exactly do we use ? Where to optimize ?

at present , although Java It's a higher version of , But the most used is Java8. from Java8 Upgrade to a higher version of Java system , There is a certain cost , therefore CMS The garbage collector will last for some time .

The most used garbage collector online , There is CMS and G1, as well as Java8 default Parallel Scavenge.

  • CMS Setting parameters of :-XX:+UseConcMarkSweepGC.
  • Java 8 Default parameters :-XX:+UseParallelGC.
  • Java 13 Default parameters :-XX:+UseG1GC.

STW

We mentioned above STW, If during garbage collection ( Whether it's marking, sorting, copying ), What if new objects enter ?

In order to ensure that the program will not be disorderly , The best way is to pause all threads of the user . In this period of time , You can't new Object's , Can only wait for . Performance in JVM It's a short Caton , Nothing can be done . This headache , It's called Stop the world. abbreviation STW.

Here's an interview question : GC In the process , Can you still create new objects ?
The answer is No .

Marking stage , Most are to STW Of . If you don't pause the user process , When marking objects , It is possible that other user threads will generate some new objects and references , Cause confusion .

Now the garbage collector , Will try to reduce this process . But even the most advanced ZGC, There will also be a short STW The process . What we have to do is build on the existing infrastructure , Try to reduce GC pause .

Stop the World I believe you have a very deep understanding through the last article

You may be right STW There is no concept of the impact of , Let me give an example to illustrate .

The peak traffic of a highly concurrent service is 10 Ten thousand times / second , In the back 10 A load balancing machine , Then, on average, each machine needs 1w/s. If something happens to a machine during this time STW, lasted 1 second , So it was necessary 10ms You can return 1 Million requests , Need to wait at least 1 Second .

Performance in users , It's the system that gets stuck . If our GC Very often , This kind of Caton will be particularly obvious , Seriously affecting the user experience .

Although I say Java It provides us with a great automatic memory management mechanism , But it can't be abused , Because it has STW Hard wound .

in the final analysis , All kinds of garbage collectors are designed to solve the headache STW problem , Give Way GC Less time , Pause less , More throughput .

Reference material

  • Zhi-ming zhou 《 In depth understanding of Java virtual machine 》
  • Largo Education 《 Explain profound theories in simple language Java virtual machine 》
原网站

版权声明
本文为[Xujunsheng]所创,转载请带上原文链接,感谢
https://yzsam.com/2022/162/202206102254372115.html