Go language GC implementation principle and source code analysis
2022-06-24 17:23:00 【luozhiyun】
Please credit the source when reprinting. This article was first published on luozhiyun's blog: https://www.luozhiyun.com/archives/475
This article is based on the Go 1.15.7 source code.
Introduction
Tricolor marking
The tricolor marking algorithm divides objects into three colors: black, grey, and white.
- Black: the object has been marked, and every object it references has also been marked (objects the program still needs);
- Grey: the object has been marked, but the objects it references have not all been marked yet (GC still has to scan outward from it);
- White: the object has not been marked (potential garbage);
When the garbage collector starts working, it traverses from the GC roots. The traversal can be broken into the following steps:
- Shade the GC root objects grey;
- Take an object from the grey set, mark it black, and shade the objects it references grey;
- Repeat step 2 until there are no grey objects left to mark;
- When traversal ends, the remaining white objects are unreachable from the GC roots and can be reclaimed.
The process is illustrated below:
Now let's look at the problems with tricolor marking.
Problems with tricolor marking
Over-marking: the floating garbage problem
Suppose E has already been marked (it is grey) when D drops its reference to E. In principle the objects E/F/G should now be reclaimed, but because E is already grey, it is still traversed as a live object. The result is that these objects remain marked as live, and their memory is not reclaimed in the current GC cycle.
Memory that should have been reclaimed but was not is called "floating garbage". The process is shown in the figure below:
Under-marking: the dangling pointer problem
Besides the problem above, there is the opposite one: missed marks. Suppose the GC thread has already turned E grey and D black when the mutator makes grey E drop its reference to white G, and black D acquire a reference to white G. When the GC thread resumes, E no longer references G, so it never puts G into the grey set; and although D now references G, D is already black and will not be traversed again.
The end result: G stays in the white set and is swept as garbage even though it is still reachable. This directly breaks the correctness of the application, which is unacceptable, and it is exactly the problem Go's GC has to solve.
Memory barriers
To solve the dangling pointer problem above, we need to introduce barrier techniques to guarantee data consistency.
A memory barrier is a type of barrier instruction that causes a central processing unit (CPU) or compiler to enforce an ordering constraint on memory operations issued before and after the barrier instruction. This typically means that operations issued prior to the barrier are guaranteed to be performed before operations issued after the barrier.
In other words, a memory barrier is a barrier instruction that makes the CPU or the compiler enforce an ordering constraint on the memory operations issued before and after it: operations issued before the barrier are guaranteed to execute before operations issued after it.
To keep the marking algorithm correct, one of the following invariants must hold:
- Strong tricolor invariant (strong tri-color invariant): black objects never point to white objects, only to grey or black objects;
- Weak tricolor invariant (weak tri-color invariant): a black object may point to a white object, as long as that white object can still be reached from some grey object;
Depending on the type of operation intercepted, memory barriers can be divided into read barriers and write barriers. Go uses only write barriers; the reason is given in 《Uniprocessor Garbage Collection Techniques》:
If a non copying collector is used the use of a read barrier is an unnecessary expense. There is no need to protect the mutator from seeing an invalid version of a pointer. Write barrier techniques are cheaper, because heap writes are several times less common than heap reads
In other words, for a garbage collector that does not copy objects, a read barrier is an unnecessary expense, because there is no need to protect the mutator from seeing a stale version of a pointer. Write barriers are also cheaper, because heap writes are several times less common than heap reads.
Let's look at how write barriers work.
Dijkstra Write barrier
Go 1.7 and earlier used the Dijkstra write barrier, implemented along the lines of the following pseudocode:
writePointer(slot, ptr):
    shade(ptr)
    *slot = ptr

If the object ptr points to is white, shade(ptr) marks it grey. This preserves the strong tricolor invariant: it guarantees that the object ptr points to is no longer white by the time it is assigned into *slot.
As shown below, the object D that the root points to is marked black, and the object E that D references is marked grey. If D drops its reference to E and references the object B instead, the write barrier is triggered and marks B grey.
Although the Dijkstra write barrier is simple to implement and preserves the strong tricolor invariant, 《Proposal: Eliminate STW stack re-scanning》 points out its drawbacks:
In particular, it presents a trade-off for pointers on stacks: either writes to pointers on the stack must have write barriers, which is prohibitively expensive, or stacks must be permagrey.
Because objects on the stack are also treated as root objects during garbage collection, either writes to pointers on the stack must also go through the write barrier, which adds a prohibitive overhead to pointer writes, or stacks must be treated as permanently grey (permagrey).
Go 1.7 chose to mark stacks as permanently grey, but that requires re-scanning (re-scan) these stacks under STW during the mark termination phase. The reason is described below:
without stack write barriers, we can't ensure that the stack won't later contain a reference to a white object, so a scanned stack is only black until its goroutine executes again, at which point it conservatively reverts to grey. Thus, at the end of the cycle, the garbage collector must re-scan grey stacks to blacken them and finish marking any remaining heap pointers.
Yuasa Write barrier
The Yuasa write barrier is a deletion barrier (deletion barrier) technique proposed by Yuasa in 《Real-time garbage collection on general-purpose machines》. The idea is that when the mutator deletes a white pointer from a grey or white object, the write barrier notifies the concurrent collector of this change.
The algorithm uses the following write barrier to keep the program correct under incremental or concurrent garbage collection; the pseudocode is:
writePointer(slot, ptr):
    shade(*slot)
    *slot = ptr

To avoid losing the path from grey objects to white objects, the barrier has to assume that *slot may later turn black. To make sure the object previously stored in *slot does not get lost before ptr is assigned, shade(*slot) marks that old object grey. Every write then establishes a path from a grey object to a grey or white object, so the deletion barrier preserves the weak tricolor invariant: the downstream objects that an old object referenced are now protected by a grey object.
Hybrid write barrier
As mentioned above, Go 1.7 and earlier used the Dijkstra write barrier to maintain the tricolor invariant. While re-scanning, Go must guarantee that object references do not change, so it pauses the program (STW), marks all stack objects grey and re-scans them, which typically costs 10 to 100 milliseconds.
From 《Proposal: Eliminate STW stack re-scanning》 (https://go.googlesource.com/proposal/+/master/design/17503-eliminate-rescan.md) we can see that, to eliminate the performance cost of re-scanning, Go 1.8 introduced the hybrid write barrier, which combines the Yuasa write barrier and the Dijkstra write barrier. The pseudocode is:
writePointer(slot, ptr):
    shade(*slot)
    if current stack is grey:
        shade(ptr)
    *slot = ptr

This both simplifies the GC process and removes the re-scanning cost in the mark termination phase. The basic idea of the hybrid write barrier is:
the write barrier shades the object whose reference is being overwritten, and, if the current goroutine's stack has not yet been scanned, also shades the reference being installed.
Translated: shade the object whose reference is being overwritten, and, if the current goroutine's stack has not yet been scanned, also shade the reference being installed.
At the same time, all objects newly allocated during GC are immediately marked black. This can be seen in the mallocgc function in go/src/runtime/malloc.go, which handles memory allocation:
func mallocgc(size uintptr, typ *_type, needzero bool) unsafe.Pointer {
	...
	dataSize := size
	// Get the mcache, used for tiny and small object allocation
	c := gomcache()
	var x unsafe.Pointer
	// noscan is true if the object contains no pointers
	noscan := typ == nil || typ.ptrdata == 0
	// maxSmallSize = 32768 (32 KB)
	if size <= maxSmallSize {
		// maxTinySize = 16 bytes
		if noscan && size < maxTinySize {
			...
		} else {
			...
		}
	// Allocations larger than 32 KB go through the mheap
	} else {
		...
	}
	...
	// Objects newly allocated during GC are marked black
	if gcphase != _GCoff {
		gcmarknewobject(span, uintptr(x), size, scanSize)
	}
	...
	return x
}

During the mark phase, newly allocated objects are marked black, so that objects in freshly allocated stack and heap memory are not incorrectly reclaimed.
Analysis
GC phases
The GC code lives in runtime/mgc.go. From the comments there we know that GC is divided into 4 phases:
- sweep termination
  - Triggers STW; all Ps (processors) enter a safe-point;
  - Sweeps any spans that have not yet been swept (readers unfamiliar with spans can see my post: Detailed explanation of Go memory allocation source code https://www.luozhiyun.com/archives/434);
- the mark phase
  - Changes the GC state from _GCoff to _GCmark, enables the write barrier and mutator assists, and enqueues the root objects;
  - Resumes program execution; mark workers and mutator assists start marking objects in memory concurrently. Any pointer write and any newly created pointer value is covered by the write barrier, and all newly created objects are marked black immediately;
  - GC performs root marking, which includes scanning all stacks, global objects, and runtime data structures outside the heap. Scanning a goroutine's stack stops that goroutine, shades all pointers found on its stack grey, and then resumes the goroutine;
  - While draining the grey object queue, GC blackens each grey object and shades the objects it points to grey;
  - GC uses a distributed termination algorithm (distributed termination algorithm) to detect when there is no more root marking work and no more grey objects; at that point GC transitions to mark termination;
- mark termination
  - STW, then change the GC state to _GCmarktermination, and stop the GC worker threads and mutator assists;
  - Perform housekeeping, such as flushing the mcaches;
- the sweep phase
  - Change the GC state to _GCoff, initialize the sweep state and disable the write barrier;
  - Resume program execution; from this point on, newly created objects are white again;
  - Sweep all memory management units concurrently in the background;
Note the mutator assists mentioned above. They exist because of the following situation:
during the collection that the Goroutine dedicated to GC will not finish the Marking work before the heap memory in-use reaches its limit
GC dedicates 25% of the CPU to marking work, so it is possible for the GC mark workers to fall behind the application's allocation rate, in which case marking would never finish. When that happens, the application's own goroutines are recruited to help with the marking:
If the collector determines that it needs to slow down allocations, it will recruit the application Goroutines to assist with the Marking work. This is called a Mutator Assist. One positive side effect of Mutator Assist is that it helps to finish the collection faster.
When the next GC happens
The timing of the next GC can be controlled with the GOGC environment variable. It defaults to 100, meaning the next GC is triggered only after the heap has grown by 100%.
This value represents a ratio of how much new heap memory can be allocated before the next collection has to start.
The official explanation: if 4 MB of heap is currently in use, the next GC fires when heap usage reaches 8 MB. Let's look at a concrete example:
package main
import (
"fmt"
)
func main() {
fmt.Println("start.")
container := make([]int, 8)
fmt.Println("> loop.")
for i := 0; i < 32*1000*1000; i++ {
container = append(container, i)
}
fmt.Println("< loop.")
}

Note: it's recommended to run this experiment on Linux. If you don't have a Linux environment you can, as I did, run a Linux VM under Windows 10 and connect to it with VS Code's remote mode.
After building, gctrace can be used to trace GC activity:
$ go build main.go
$ GODEBUG=gctrace=1 ./main
start.
> loop.
gc 1 @0.004s 4%: 0.22+1.4+0.021 ms clock, 1.7+0.009/0.40/0.073+0.16 ms cpu, 4->5->1 MB, 5 MB goal, 8 P
gc 2 @0.006s 4%: 0.10+1.6+0.020 ms clock, 0.83+0.12/0/0+0.16 ms cpu, 4->6->1 MB, 5 MB goal, 8 P
gc 3 @0.009s 16%: 0.035+5.5+2.2 ms clock, 0.28+0/0.47/0.007+18 ms cpu, 4->15->15 MB, 5 MB goal, 8 P
...
< loop.
This shows 3 GC runs. Let's decode what the fields mean:
gc 1 @0.004s 4%: 0.22+1.4+0.021 ms clock, 1.7+0.009/0.40/0.073+0.16 ms cpu, 4->5->1 MB, 5 MB goal, 8 P

- gc 1: the 1st GC since the program started
- @0.004s: time since the program started
- 4%: percentage of total CPU time spent on GC marking work so far

Wall-clock time:
- 0.22 ms: STW at the start of the mark phase
- 1.4 ms: concurrent marking
- 0.021 ms: STW at mark termination

CPU time spent on garbage collection:
- 1.7 ms: STW at the start of the mark phase
- 0.009 ms: time taken by mutator assists
- 0.40 ms: time taken by mark workers
- 0.073 ms: time taken by idle mark workers
- 0.16 ms: STW at mark termination

Memory:
- 4 MB: heap size before marking started
- 5 MB: heap size after marking finished
- 1 MB: live heap size after marking finished
- 5 MB goal: goal size of the in-use heap after marking finishes
- 8 P: number of processors used
From the GC memory fields above we can see that the heap was 4 MB before marking started. Because marking runs concurrently, 5 MB was in use by the time marking finished, which means 1 MB was allocated during GC. After marking, only 1 MB of the heap was live, which also means the heap can grow to about 2 MB before the next GC round starts.
We can also see that the goal of 5 MB equals the in-use size after marking here. In more complex situations they are not equal, because the goal is computed from the current memory usage:
the goal is calculated based on the current amount of the heap memory in-use, the amount of heap memory marked as live, and timing calculations about the additional allocations that will occur while the collection is running.
Conditions for triggering GC
Whether GC should be triggered is checked by gcTrigger.test. Let's see how it decides whether garbage collection is needed:
func (t gcTrigger) test() bool {
	if !memstats.enablegc || panicking != 0 || gcphase != _GCoff {
		return false
	}
	switch t.kind {
	case gcTriggerHeap:
		// Heap allocation has reached the trigger size computed by the pacer
		return memstats.heap_live >= memstats.gc_trigger
	case gcTriggerTime:
		if gcpercent < 0 {
			return false
		}
		lastgc := int64(atomic.Load64(&memstats.last_gc_nanotime))
		// Force a new cycle if none has been triggered for a while
		return lastgc != 0 && t.now-lastgc > forcegcperiod
	case gcTriggerCycle:
		// A new GC cycle was requested; skip if one has already started
		return int32(t.n-work.cycles) > 0
	}
	return true
}

The gcTriggerTime interval is determined by forcegcperiod, which defaults to 2 minutes. Next, let's look at triggering GC by heap size.
For gcTriggerHeap, GC is triggered when heap allocation reaches the trigger size computed by the pacer. heap_live is updated as memory is allocated; gc_trigger is computed in the runtime.gcSetTriggerRatio function.
func gcSetTriggerRatio(triggerRatio float64) {
	// gcpercent is determined by the GOGC environment variable
	if gcpercent >= 0 {
		// 1 by default (GOGC=100)
		scalingFactor := float64(gcpercent) / 100
		// The maximum trigger ratio is 0.95 * scalingFactor
		maxTriggerRatio := 0.95 * scalingFactor
		if triggerRatio > maxTriggerRatio {
			triggerRatio = maxTriggerRatio
		}
		// The minimum trigger ratio is 0.6 * scalingFactor
		minTriggerRatio := 0.6 * scalingFactor
		if triggerRatio < minTriggerRatio {
			triggerRatio = minTriggerRatio
		}
	} else if triggerRatio < 0 {
		triggerRatio = 0
	}
	memstats.triggerRatio = triggerRatio
	trigger := ^uint64(0)
	if gcpercent >= 0 {
		// Multiply the live heap size from the last mark by (1 + triggerRatio)
		trigger = uint64(float64(memstats.heap_marked) * (1 + triggerRatio))
		...
	}
	memstats.gc_trigger = trigger
	...
}

gcSetTriggerRatio derives from triggerRatio the heap size at which the next GC will trigger. triggerRatio is adjusted after every GC cycle; it is computed by gcControllerState.endCycle, which is called from MarkDone.
func (c *gcControllerState) endCycle() float64 {
	const triggerGain = 0.5
	// Goal heap growth rate = (heap goal - live heap) / live heap
	goalGrowthRatio := float64(memstats.next_gc-memstats.heap_marked) / float64(memstats.heap_marked)
	// Actual heap growth rate = total heap size / live heap size - 1
	actualGrowthRatio := float64(memstats.heap_live)/float64(memstats.heap_marked) - 1
	// Duration of the GC mark phase
	assistDuration := nanotime() - c.markStartTime
	// CPU utilization of the mark phase; the target is 0.25
	utilization := gcBackgroundUtilization
	// Add assist utilization; avoid divide by zero.
	if assistDuration > 0 {
		// assistTime is the total time goroutines spent assisting with marking.
		// Extra CPU utilization = assist time / (mark duration * number of Ps)
		utilization += float64(c.assistTime) / float64(assistDuration*int64(gomaxprocs))
	}
	// Trigger error = goal growth rate - previous trigger ratio
	//   - utilization / goal utilization * (actual growth rate - previous trigger ratio)
	triggerError := goalGrowthRatio - memstats.triggerRatio - utilization/gcGoalUtilization*(actualGrowthRatio-memstats.triggerRatio)
	// Adjust the trigger ratio by half of the error
	triggerRatio := memstats.triggerRatio + triggerGain*triggerError
	return triggerRatio
}
The triggerRatio feedback is fairly involved. From the error term we can tell:
- the higher the actual growth rate, the smaller the trigger error; when it goes below 0, the next GC triggers earlier;
- the higher the CPU utilization, the smaller the trigger error; when it goes below 0, the next GC triggers earlier;
- the larger the previous trigger ratio, the smaller the trigger error; when it goes below 0, the next GC triggers earlier;
This analysis also explains why, in the GODEBUG=gctrace=1 output above, GC clearly fired before the heap had doubled: the triggerError feedback pulled the trigger point forward.
Start GC
We can call runtime.GC to trigger GC manually, but in practice that is not the usual entry point. Normally GC is triggered from runtime.mallocgc when memory is allocated, or by Go's background monitor thread sysmon, which periodically checks and calls runtime.forcegchelper.
func GC() {
	// Read the current GC cycle count
	n := atomic.Load(&work.cycles)
	// Wait for the previous cycle's mark, mark termination and
	// sweep termination phases to finish
	gcWaitOnMark(n)
	// Trigger a new GC cycle
	gcStart(gcTrigger{kind: gcTriggerCycle, n: n + 1})
	// Same as above, for the cycle just started
	gcWaitOnMark(n + 1)
	// Sweep all pending memory management units
	for atomic.Load(&work.cycles) == n+1 && sweepone() != ^uintptr(0) {
		sweep.nbgsweep++
		// Yield the P
		Gosched()
	}
	for atomic.Load(&work.cycles) == n+1 && atomic.Load(&mheap_.sweepers) != 0 {
		Gosched()
	}
	mp := acquirem()
	cycle := atomic.Load(&work.cycles)
	if cycle == n+1 || (gcphase == _GCmark && cycle == n+2) {
		// Publish the heap profile snapshot for this cycle
		mProf_PostSweep()
	}
	releasem(mp)
}

- First it reads the GC cycle count, then calls gcWaitOnMark to wait for the previous cycle's mark, mark termination and sweep termination phases to finish;
- It calls gcStart to trigger a new GC cycle, then calls gcWaitOnMark to wait for the current cycle's mark and mark termination phases to finish;
- It calls sweepone to sweep all pending memory management units, calling Gosched to yield the P in between;
- After this round's sweeping work is done, it calls mProf_PostSweep to publish the heap-profile snapshot for this cycle;
GC startup
The diagram below shows the complete GC flow; it can be used as a map while reading the source:
gcStart is a fairly long function. Let's walk through the following parts of gcStart:
func gcStart(trigger gcTrigger) {
	...
	// Re-check the trigger conditions and sweep any remaining unswept spans
	for trigger.test() && sweepone() != ^uintptr(0) {
		sweep.nbgsweep++
	}
	// Acquire the global startSema semaphore
	semacquire(&work.startSema)
	// Check the trigger conditions again under the lock
	if !trigger.test() {
		semrelease(&work.startSema)
		return
	}
	// Record whether this cycle was triggered manually via runtime.GC
	work.userForced = trigger.kind == gcTriggerCycle
	semacquire(&gcsema)
	semacquire(&worldsema)
	// Start the background mark workers
	gcBgMarkStartWorkers()
	// Reset mark-related state
	systemstack(gcResetMarkState)
	// Initialize the work structure
	work.stwprocs, work.maxprocs = gomaxprocs, gomaxprocs
	if work.stwprocs > ncpu {
		work.stwprocs = ncpu
	}
	work.heap0 = atomic.Load64(&memstats.heap_live)
	work.pauseNS = 0
	work.mode = mode
	// Record the start time
	now := nanotime()
	work.tSweepTerm = now
	work.pauseStart = now
	// Stop the world (STW)
	systemstack(stopTheWorldWithSema)
	// Make sure sweeping has finished before concurrent marking starts
	systemstack(func() {
		finishsweep_m()
	})
	// Clear sched.sudogcache and sync.Pools
	clearpools()
	// Increment the GC cycle count
	work.cycles++
	// Reset the pacer state and mark the start of a new GC cycle
	gcController.startCycle()
	work.heapGoal = memstats.next_gc
	// Set the GC phase to _GCmark,
	// which enables the write barrier
	setGCPhase(_GCmark)
	// Initialize the state needed for background scanning
	gcBgMarkPrepare() // Must happen before assist enable.
	// Enqueue the root objects: stacks, global variables and other roots
	gcMarkRootPrepare()
	// Mark all active tiny alloc blocks
	gcMarkTinyAllocs()
	// Enable mutator assists
	atomic.Store(&gcBlackenEnabled, 1)
	// Record the mark phase start time
	gcController.markStartTime = now
	mp = acquirem()
	// Restart the world; background workers begin marking heap objects
	systemstack(func() {
		now = startTheWorldWithSema(trace.enabled)
		// Record how long the world was stopped and when marking began
		work.pauseNS += now - work.pauseStart
		work.tMark = now
	})
	semrelease(&worldsema)
	...
}

- It calls trigger.test twice to check whether the garbage collection conditions still hold (this function was discussed above);
- It acquires the work.startSema lock and calls gcBgMarkStartWorkers to start the background mark workers (covered in detail later);
- It initializes the work structure, setting the number of goroutines garbage collection needs, the completed GC count, and so on;
- Before GC starts, it calls gcController.startCycle to reset the pacer state and record that a new GC cycle has begun;
- It calls setGCPhase to set the GC state to _GCmark, which enables the write barrier;
- It calls gcBgMarkPrepare to initialize the state needed for background scanning;
- It calls gcMarkRootPrepare to enqueue the root objects: stacks, global variables, and so on;
- It calls gcMarkTinyAllocs to mark all active tiny alloc blocks;
- It sets gcBlackenEnabled to enable mutator assists;
- After recording the mark start time, it calls startTheWorldWithSema to resume the program; the background tasks begin marking objects in the heap;
The diagram below shows the state changes during gcStart, the STW pauses, and the window during which the write barrier is enabled:
The above is only a rough walk-through of the function; let's look at some of the important pieces below.
startCycle
func (c *gcControllerState) startCycle() {
	c.scanWork = 0
	c.bgScanCredit = 0
	c.assistTime = 0
	c.dedicatedMarkTime = 0
	c.fractionalMarkTime = 0
	c.idleMarkTime = 0
	// Enforce a minimum value for next_gc
	if memstats.next_gc < memstats.heap_live+1024*1024 {
		memstats.next_gc = memstats.heap_live + 1024*1024
	}
	// gcBackgroundUtilization defaults to 0.25:
	// the target fraction of Ps occupied by GC
	totalUtilizationGoal := float64(gomaxprocs) * gcBackgroundUtilization
	// dedicatedMarkWorkersNeeded is 25% of the number of Ps,
	// rounded to the nearest integer (+0.5 then truncate)
	c.dedicatedMarkWorkersNeeded = int64(totalUtilizationGoal + 0.5)
	utilError := float64(c.dedicatedMarkWorkersNeeded)/totalUtilizationGoal - 1
	const maxUtilError = 0.3
	if utilError < -maxUtilError || utilError > maxUtilError {
		if float64(c.dedicatedMarkWorkersNeeded) > totalUtilizationGoal {
			c.dedicatedMarkWorkersNeeded--
		}
		// The per-P utilization target for workers in gcMarkWorkerFractionalMode
		c.fractionalUtilizationGoal = (totalUtilizationGoal - float64(c.dedicatedMarkWorkersNeeded)) / float64(gomaxprocs)
	} else {
		c.fractionalUtilizationGoal = 0
	}
	if debug.gcstoptheworld > 0 {
		c.dedicatedMarkWorkersNeeded = int64(gomaxprocs)
		c.fractionalUtilizationGoal = 0
	}
	for _, p := range allp {
		p.gcAssistTime = 0
		p.gcFractionalMarkTime = 0
	}
	// Compute the parameters for GC assists
	c.revise()
}

Note the computation of dedicatedMarkWorkersNeeded and fractionalUtilizationGoal; they are used later when choosing the working mode of the mark workers.
Mark tiny alloc
func gcMarkTinyAllocs() {
	for _, p := range allp {
		// Mark the tiny alloc block in each P's mcache
		c := p.mcache
		if c == nil || c.tiny == 0 {
			continue
		}
		_, span, objIndex := findObject(c.tiny, 0, 0)
		gcw := &p.gcw
		// Grey the object and add it to the gcWork mark queue
		greyobject(c.tiny, 0, 0, span, gcw, objIndex)
	}
}

The tiny block data structure was also discussed in the memory allocation article. The point here is to find the tiny block in each P's mcache, mark it, and add it to the gcWork mark queue. What the gcWork mark queue is will be covered next, in the section on executing the mark.
Write barrier
When the GC mark phase is set, setGCPhase decides from the new phase value whether the write barrier needs to be enabled:
func setGCPhase(x uint32) {
	atomic.Store(&gcphase, x)
	writeBarrier.needed = gcphase == _GCmark || gcphase == _GCmarktermination
	writeBarrier.enabled = writeBarrier.needed || writeBarrier.cgo
}

The compiler inserts the barriers in the writebarrier function in src/cmd/compile/internal/ssa/writebarrier.go. As its comment says:
// writebarrier pass inserts write barriers for store ops (Store, Move, Zero)
// when necessary (the condition above). It rewrites store ops to branches
// and runtime calls, like
//
// if writeBarrier.enabled {
//     gcWriteBarrier(ptr, val) // Not a regular Go call
// } else {
//     *ptr = val
// }
That is, write barriers are inserted around the Store, Move, and Zero SSA operations.
With dlv breakpoints we can find the gcWriteBarrier assembly at go/src/runtime/asm_amd64.s:1395. The assembly function calls runtime.wbBufFlush to flush the buffered write-barrier records into the GC work queue.
func wbBufFlush(dst *uintptr, src uintptr) {
	...
	systemstack(func() {
		...
		wbBufFlush1(getg().m.p.ptr())
	})
}

func wbBufFlush1(_p_ *p) {
	// Get the pointers buffered on this P
	start := uintptr(unsafe.Pointer(&_p_.wbBuf.buf[0]))
	n := (_p_.wbBuf.next - start) / unsafe.Sizeof(_p_.wbBuf.buf[0])
	ptrs := _p_.wbBuf.buf[:n]
	_p_.wbBuf.next = 0
	gcw := &_p_.gcw
	pos := 0
	for _, ptr := range ptrs {
		// Locate the object the pointer refers to
		obj, span, objIndex := findObject(ptr, 0, 0)
		if obj == 0 {
			continue
		}
		mbits := span.markBitsForIndex(objIndex)
		// Skip objects that are already marked
		if mbits.isMarked() {
			continue
		}
		// Mark the object
		mbits.setMarked()
		// Mark the span
		arena, pageIdx, pageMask := pageIndexOf(span.base())
		if arena.pageMarks[pageIdx]&pageMask == 0 {
			atomic.Or8(&arena.pageMarks[pageIdx], pageMask)
		}
		if span.spanclass.noscan() {
			gcw.bytesMarked += uint64(span.elemsize)
			continue
		}
		ptrs[pos] = obj
		pos++
	}
	// Add the objects to the gcWork queue
	gcw.putBatch(ptrs[:pos])
	// Reset the write barrier buffer
	_p_.wbBuf.reset()
}

The write barrier here follows the same pattern as concurrent marking, so you may want to read the concurrent marking section first and come back. wbBufFlush1 walks the write-barrier buffer, calls findObject to locate each object, sets its mark bits, adds the still-unmarked objects to the gcWork queue for scanning, and finally resets the write-barrier buffer.
stopTheWorldWithSema and startTheWorldWithSema
stopTheWorldWithSema and startTheWorldWithSema are the paired core functions for stopping and restarting the world.
func stopTheWorldWithSema() {
_g_ := getg()
lock(&sched.lock)
sched.stopwait = gomaxprocs
// Set gcwaiting; the scheduler parks when it sees this flag
atomic.Store(&sched.gcwaiting, 1)
// Send preemption signals
preemptall()
// Pause the current P
_g_.m.p.ptr().status = _Pgcstop // Pgcstop is only diagnostic.
sched.stopwait--
// Switch every P that is blocked in a syscall to _Pgcstop
for _, p := range allp {
s := p.status
if s == _Psyscall && atomic.Cas(&p.status, s, _Pgcstop) {
if trace.enabled {
traceGoSysBlock(p)
traceProcStop(p)
}
p.syscalltick++
sched.stopwait--
}
}
// Stop all idle Ps
for {
p := pidleget()
if p == nil {
break
}
p.status = _Pgcstop
sched.stopwait--
}
wait := sched.stopwait > 0
unlock(&sched.lock)
if wait {
for {
// wait for 100 us
if notetsleep(&sched.stopnote, 100*1000) {
noteclear(&sched.stopnote)
break
}
// Send the preemption signal again
preemptall()
}
}
// Safety inspection
bad := ""
if sched.stopwait != 0 {
bad = "stopTheWorld: not stopped (stopwait != 0)"
} else {
for _, p := range allp {
if p.status != _Pgcstop {
bad = "stopTheWorld: not stopped (status != _Pgcstop)"
}
}
}
if atomic.Load(&freezing) != 0 {
lock(&deadlock)
lock(&deadlock)
}
if bad != "" {
throw(bad)
}
}

This method uses sched.stopwait to check whether every P has stopped. It first calls preemptall to send preemption signals to all running Gs, then CASes the Ps that are in _Psyscall to _Pgcstop, stops all idle Ps, and finally, if some Ps are still running, waits for them to stop.
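The stopwait countdown can be modeled with a few atomics. Below is a deterministic toy version with hypothetical states and names; in the real runtime, running Ps stop themselves at safe points after being preempted, and idle Ps come off the pidleget list rather than a CAS loop:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

const (
	pRunning int32 = iota
	pSyscall
	pIdle
	pGcstop
)

// stopAll drives every P to pGcstop and returns the final stopwait value;
// the world counts as stopped only when it reaches 0.
func stopAll(status []int32) int32 {
	stopwait := int32(len(status))
	// Ps in syscall or idle are moved to pGcstop directly via CAS.
	for i := range status {
		s := atomic.LoadInt32(&status[i])
		if (s == pSyscall || s == pIdle) && atomic.CompareAndSwapInt32(&status[i], s, pGcstop) {
			stopwait--
		}
	}
	// Running Ps stop themselves at the next safe point; simulate that here.
	for i := range status {
		if atomic.CompareAndSwapInt32(&status[i], pRunning, pGcstop) {
			stopwait--
		}
	}
	return stopwait
}

func main() {
	status := []int32{pRunning, pSyscall, pIdle, pSyscall, pIdle, pRunning, pIdle, pRunning}
	fmt.Println(stopAll(status) == 0) // true: every P reached pGcstop
}
```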
func startTheWorldWithSema(emitTraceEvent bool) int64 {
mp := acquirem() // disable preemption because it can be holding p in a local var
// Poll netpoll events and add the corresponding Gs to the run queue
if netpollinited() {
list := netpoll(0) // non-blocking
injectglist(&list)
}
lock(&sched.lock)
procs := gomaxprocs
if newprocs != 0 {
procs = newprocs
newprocs = 0
}
// Expand or shrink global processors
p1 := procresize(procs)
// Clear the gcwaiting flag
sched.gcwaiting = 0
// If sysmon ( Background monitoring thread ) Wake it up while waiting
if sched.sysmonwait != 0 {
sched.sysmonwait = 0
notewakeup(&sched.sysmonnote)
}
unlock(&sched.lock)
// Wake each P that has runnable work
for p1 != nil {
p := p1
p1 = p1.link.ptr()
if p.m != 0 {
mp := p.m.ptr()
p.m = 0
if mp.nextp != 0 {
throw("startTheWorld: inconsistent mp->nextp")
}
mp.nextp.set(p)
notewakeup(&mp.park)
} else {
// Start M to run P
newm(nil, p, -1)
}
}
startTime := nanotime()
if emitTraceEvent {
traceGCSTWDone()
}
// If there are idle Ps and no spinning M, wake or create an M
wakep()
releasem(mp)
return startTime
}

startTheWorldWithSema is much simpler. It first fetches runnable Gs from the netpoller and injects them into the run queue, then walks the P list returned by procresize, waking an existing M (or creating a new one) for each P that has work to run.
Creating the background mark workers
func gcBgMarkStartWorkers() {
// Traverse all of P
for _, p := range allp {
// If it has been started, it will not be started repeatedly
if p.gcBgMarkWorker == 0 {
// Create for each global processor a Goroutine
go gcBgMarkWorker(p)
// Wait on the bgMarkReady semaphore until the worker has started
notetsleepg(&work.bgMarkReady, -1)
noteclear(&work.bgMarkReady)
}
}
}

gcBgMarkStartWorkers creates a Goroutine for every P to run the background mark task; each of these Goroutines runs gcBgMarkWorker. notetsleepg waits for gcBgMarkWorker to signal the bgMarkReady semaphore before moving on to the next P.
Although a background mark task is started for every P, only about 25% of CPU capacity may be spent running them at any one time; the scheduler enforces this in its scheduling loop runtime.schedule by calling gcController.findRunnableGCWorker.

Before looking at that method, let's introduce a concept: the mark worker mode. There are currently three modes, which together keep the background mark workers at the target utilization.
type gcMarkWorkerMode int

const (
	// gcMarkWorkerDedicatedMode indicates that the P of a mark
	// worker is dedicated to running that mark worker. The mark
	// worker should run without preemption.
	gcMarkWorkerDedicatedMode gcMarkWorkerMode = iota

	// gcMarkWorkerFractionalMode indicates that a P is currently
	// running the "fractional" mark worker. The fractional worker
	// is necessary when GOMAXPROCS*gcBackgroundUtilization is not
	// an integer. The fractional worker should run until it is
	// preempted and will be scheduled to pick up the fractional
	// part of GOMAXPROCS*gcBackgroundUtilization.
	gcMarkWorkerFractionalMode

	// gcMarkWorkerIdleMode indicates that a P is running the mark
	// worker because it has nothing else to do. The idle worker
	// should run until it is preempted and account its time
	// against gcController.idleMarkTime.
	gcMarkWorkerIdleMode
)
As the code comments explain:

- gcMarkWorkerDedicatedMode: the P is dedicated to marking objects and will not be preempted by the scheduler;
- gcMarkWorkerFractionalMode: the default mark utilization target is 25%, so when GOMAXPROCS is not a multiple of 4 the quota cannot be met with whole dedicated workers alone; this mode runs a part-time worker to make up the fractional remainder of the utilization goal;
- gcMarkWorkerIdleMode: the P has nothing to run except the mark worker, so it runs the garbage collector's mark task until it is preempted;
func (c *gcControllerState) findRunnableGCWorker(_p_ *p) *g {
...
// Atomically decrement *ptr if it is positive; report whether the decrement succeeded
decIfPositive := func(ptr *int64) bool {
if *ptr > 0 {
if atomic.Xaddint64(ptr, -1) >= 0 {
return true
}
// We lost a race
atomic.Xaddint64(ptr, +1)
}
return false
}
// Decrement dedicatedMarkWorkersNeeded; on success the worker runs in dedicated mode
if decIfPositive(&c.dedicatedMarkWorkersNeeded) {
_p_.gcMarkWorkerMode = gcMarkWorkerDedicatedMode
} else if c.fractionalUtilizationGoal == 0 {
// No need for fractional workers.
return nil
} else {
// Time to perform the marking task
delta := nanotime() - gcController.markStartTime
if delta > 0 && float64(_p_.gcFractionalMarkTime)/float64(delta) > c.fractionalUtilizationGoal {
// Nope. No need to run a fractional worker.
return nil
}
_p_.gcMarkWorkerMode = gcMarkWorkerFractionalMode
}
gp := _p_.gcBgMarkWorker.ptr()
casgstatus(gp, _Gwaiting, _Grunnable)
return gp
}

In findRunnableGCWorker, the dedicatedMarkWorkersNeeded counter decides whether the worker runs in the gcMarkWorkerDedicatedMode mark worker mode. dedicatedMarkWorkersNeeded is initialized in gcControllerState.startCycle.
I won't paste the formula here; it lives in gcControllerState.startCycle. Roughly speaking, on an 8-core CPU dedicatedMarkWorkersNeeded comes out to 2, while on a 6-core CPU, since 6 is not divisible by 4, the computed dedicatedMarkWorkersNeeded is 1 and gcMarkWorkerFractionalMode is needed to make up the remaining CPU utilization.
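That computation can be sketched directly. The sketch below follows the shape of gcControllerState.startCycle (simplified; constant names are borrowed from the runtime):

```go
package main

import "fmt"

const gcBackgroundUtilization = 0.25 // target fraction of CPU spent marking

// markWorkerPlan mirrors the startCycle logic: round the utilization goal
// to a whole number of dedicated workers, and if the rounding error exceeds
// 30%, drop one dedicated worker and cover the remainder with a fractional
// worker instead.
func markWorkerPlan(gomaxprocs int) (dedicated int64, fractionalGoal float64) {
	totalGoal := float64(gomaxprocs) * gcBackgroundUtilization
	dedicated = int64(totalGoal + 0.5) // round to nearest whole worker
	utilError := float64(dedicated)/totalGoal - 1
	const maxUtilError = 0.3
	if utilError < -maxUtilError || utilError > maxUtilError {
		if float64(dedicated) > totalGoal {
			dedicated--
		}
		fractionalGoal = (totalGoal - float64(dedicated)) / float64(gomaxprocs)
	}
	return
}

func main() {
	d8, f8 := markWorkerPlan(8)
	d6, f6 := markWorkerPlan(6)
	fmt.Println(d8, f8 == 0) // 2 true: 8*0.25 divides evenly, no fractional worker
	fmt.Println(d6, f6 > 0)  // 1 true: the half-worker remainder runs fractionally
}
```

For 6 cores the goal is 1.5 workers; rounding up to 2 overshoots by 33%, so the runtime keeps 1 dedicated worker and schedules a fractional worker for about 8.3% of each P's time.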
gcMarkWorkerIdleMode is entered from the scheduler's findrunnable, just before a P would go to sleep:
func findrunnable() (gp *g, inheritTime bool) {
...
stop:
// If we are in the GC mark phase and mark work is available, run the mark worker G
if gcBlackenEnabled != 0 && _p_.gcBgMarkWorker != 0 && gcMarkWorkAvailable(_p_) {
_p_.gcMarkWorkerMode = gcMarkWorkerIdleMode
gp := _p_.gcBgMarkWorker.ptr()
// Mark the local P's dedicated mark G as _Grunnable
casgstatus(gp, _Gwaiting, _Grunnable)
return gp, false
}
...
}

Readers of my article on the Go scheduling loop will recognize this spot: by the time preemptive scheduling reaches here, the P usually has no G left to run and is about to go to sleep, so it is safe to run the mark task first.
If you haven't read about the scheduling loop yet, see: Detailed explanation of the Go scheduling loop source code, https://www.luozhiyun.com/archives/448.
Concurrent scanning and marking
Concurrent scan marking can be summarized in the following steps:

- Wrap the current P into a parkInfo and call gopark to put the current G to sleep; before sleeping, bind the G to the P's gcBgMarkWorker field so it can be woken later;
- Depending on the mark worker mode, call gcDrain with different flags to perform the marking;
- Check whether all background mark tasks are done and no work remains; if so, call gcMarkDone to enter the mark-completion phase;
Background mark: sleeping and waiting
func gcBgMarkWorker(_p_ *p) {
gp := getg()
type parkInfo struct {
m muintptr
attach puintptr
}
gp.m.preemptoff = "GC worker init"
// initialization park
park := new(parkInfo)
gp.m.preemptoff = ""
// Set current M And it's forbidden to seize
park.m.set(acquirem())
// Set current P
park.attach.set(_p_)
// Notify gcBgMarkStartWorkers that it can continue
notewakeup(&work.bgMarkReady)
for {
// Let the current G Go to sleep
gopark(func(g *g, parkp unsafe.Pointer) bool {
park := (*parkInfo)(parkp)
releasem(park.m.ptr())
// Set the associated P
if park.attach != 0 {
p := park.attach.ptr()
park.attach.set(nil)
// Install the current G as the P's gcBgMarkWorker
if !p.gcBgMarkWorker.cas(0, guintptr(unsafe.Pointer(g))) {
return false
}
}
return true
}, unsafe.Pointer(park), waitReasonGCWorkerIdle, traceEvGoBlock, 0)
...
}
}

As we saw in gcBgMarkStartWorkers, every P gets a Goroutine responsible for mark work, but not every P actually runs its mark task at once: the background mark tasks' default CPU budget is 25%. So gcBgMarkWorker initializes park, binds the G to the P's gcBgMarkWorker field, and goes to sleep.
Which mark worker actually gets to run is controlled by the scheduler's loop runtime.schedule through gcController.findRunnableGCWorker; that code was shown above, so I won't repeat it here.
Background marking
After being woken, the worker selects a marking strategy based on gcMarkWorkerMode; each strategy calls runtime.gcDrain with different flags:
func gcBgMarkWorker(_p_ *p) {
gp := getg()
...
for {
...
// Check that the P's gcBgMarkWorker is still this G; end the task if not
if _p_.gcBgMarkWorker.ptr() != gp {
break
}
// prohibit G Be seized
park.m.set(acquirem())
// Record the start time
startTime := nanotime()
_p_.gcMarkWorkerStartTime = startTime
decnwait := atomic.Xadd(&work.nwait, -1)
systemstack(func() {
// Set up G Is waiting so that its stack can be scanned
casgstatus(gp, _Grunning, _Gwaiting)
// Determine the mode of background marking task
switch _p_.gcMarkWorkerMode {
default:
throw("gcBgMarkWorker: unexpected gcMarkWorkerMode")
case gcMarkWorkerDedicatedMode:
// In this mode P You should focus on the execution of tags
gcDrain(&_p_.gcw, gcDrainUntilPreempt|gcDrainFlushBgCredit)
if gp.preempt {
// When preempted, all of the G All kicked to the global run queue
lock(&sched.lock)
for {
gp, _ := runqget(_p_)
if gp == nil {
break
}
globrunqput(gp)
}
unlock(&sched.lock)
}
// Continue with tag execution
gcDrain(&_p_.gcw, gcDrainFlushBgCredit)
case gcMarkWorkerFractionalMode:
// Execution tag
gcDrain(&_p_.gcw, gcDrainFractional|gcDrainUntilPreempt|gcDrainFlushBgCredit)
case gcMarkWorkerIdleMode:
// Execution tag , Until it is preempted or reaches a certain amount
gcDrain(&_p_.gcw, gcDrainIdle|gcDrainUntilPreempt|gcDrainFlushBgCredit)
}
// recovery G State to run
casgstatus(gp, _Gwaiting, _Grunning)
})
...
}
}

The differences between the mark worker modes were covered above; scroll back up if you need a refresher. The marking itself happens inside the switch statement, where each mode passes a different set of flags into gcDrain.
Note that one of the arguments passed to gcDrain is a gcWork struct. It serves as each P's private cache of objects waiting to be scanned, giving the garbage collector a produce/consume abstraction for mark work. The struct holds two important work buffers, wbuf1 and wbuf2:
When objects are pushed to or popped from the struct, the wbuf1 buffer is always tried first; once wbuf1 runs out of space (or out of objects), the two buffers are swapped, and when both buffers are full (or both empty), objects are spilled to, or fetched from, the global work buffer:
func (w *gcWork) tryGet() uintptr {
wbuf := w.wbuf1
...
// wbuf1 When there is no data in the buffer
if wbuf.nobj == 0 {
// wbuf1 And wbuf2 Exchange objects
w.wbuf1, w.wbuf2 = w.wbuf2, w.wbuf1
wbuf = w.wbuf1
if wbuf.nobj == 0 {
owbuf := wbuf
// Fetch a buffer from the global work.full queue
wbuf = trygetfull()
...
}
}
wbuf.nobj--
return wbuf.obj[wbuf.nobj]
}

Back in gcBgMarkWorker: after the marking finishes, the worker wraps up the round:
func gcBgMarkWorker(_p_ *p) {
gp := getg()
...
for {
...
// Add up the time taken
duration := nanotime() - startTime
switch _p_.gcMarkWorkerMode {
case gcMarkWorkerDedicatedMode:
atomic.Xaddint64(&gcController.dedicatedMarkTime, duration)
atomic.Xaddint64(&gcController.dedicatedMarkWorkersNeeded, 1)
case gcMarkWorkerFractionalMode:
atomic.Xaddint64(&gcController.fractionalMarkTime, duration)
atomic.Xaddint64(&_p_.gcFractionalMarkTime, duration)
case gcMarkWorkerIdleMode:
atomic.Xaddint64(&gcController.idleMarkTime, duration)
}
incnwait := atomic.Xadd(&work.nwait, +1)
// Determine whether all background marking tasks have been completed , And there are no more tasks
if incnwait == work.nproc && !gcMarkWorkAvailable(nil) {
// Cancel and P The associated
_p_.gcBgMarkWorker.set(nil)
// allow G Be seized
releasem(park.m.ptr())
// Ready to enter the completion marking phase
gcMarkDone()
// Re-attach to a P before sleeping again.
// Preemption was allowed above, so by this point we may be on a different P.
// If re-attaching fails, the worker task ends.
park.m.set(acquirem())
park.attach.set(_p_)
}
}
}

gcBgMarkWorker uses incnwait to determine whether it is the last running worker, then calls gcMarkWorkAvailable to verify that neither its gcWork nor the global queue has tasks left; if everything is done, it calls gcMarkDone to enter the mark-completion phase.
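The incnwait == work.nproc test is the classic "last worker out turns off the lights" pattern. A deterministic toy of it (simplified; the runtime additionally checks gcMarkWorkAvailable here and revalidates everything inside gcMarkDone):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// lastWorkerOut simulates nproc workers that all start marking (decrementing
// nwait) and then finish one by one (incrementing it). It returns how many
// workers observed nwait == nproc on finish, i.e. believed they were last.
func lastWorkerOut(nproc int32) int {
	nwait := nproc // initially every worker counts as waiting
	for i := int32(0); i < nproc; i++ {
		atomic.AddInt32(&nwait, -1) // worker starts marking
	}
	last := 0
	for i := int32(0); i < nproc; i++ {
		if atomic.AddInt32(&nwait, 1) == nproc { // worker finishes
			last++ // only the final increment restores nwait to nproc
		}
	}
	return last
}

func main() {
	fmt.Println(lastWorkerOut(4)) // 1: exactly one worker triggers gcMarkDone
}
```

Because AddInt32 returns the post-increment value atomically, exactly one finishing worker can ever see the count reach nproc, no matter how the finishes interleave.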
Mark scan
Let's take a look at gcDrain:
func gcDrain(gcw *gcWork, flags gcDrainFlags) {
gp := getg().m.curg
// Whether to return when the preemption flag is seen
preemptible := flags&gcDrainUntilPreempt != 0
// Whether to flush scan credit to the background pool, reducing assist work and waking assist Gs
flushBgCredit := flags&gcDrainFlushBgCredit != 0
// Whether to perform only a certain amount of work
idle := flags&gcDrainIdle != 0
// Record the initial number of scans
initScanWork := gcw.scanWork
checkWork := int64(1<<63 - 1)
var check func() bool
if flags&(gcDrainIdle|gcDrainFractional) != 0 {
// drainCheckThreshold Default 100000
checkWork = initScanWork + drainCheckThreshold
if idle {
check = pollWork
} else if flags&gcDrainFractional != 0 {
check = pollFractionalWorkerExit
}
}
// If the root object is not finished scanning , Scan the root object first
if work.markrootNext < work.markrootJobs {
// It circulates until it is preempted or STW
for !(gp.preempt && (preemptible || atomic.Load(&sched.gcwaiting) != 0)) {
// Take a value from the root object scan queue
job := atomic.Xadd(&work.markrootNext, +1) - 1
if job >= work.markrootJobs {
break
}
// Perform root object scanning
markroot(gcw, job)
if check != nil && check() {
goto done
}
}
}
...
}

At the start, gcDrain picks a strategy based on its flags:
- gcDrainUntilPreempt: return when the G is preempted;
- gcDrainIdle: call runtime.pollWork and return when the P has other pending Gs;
- gcDrainFractional: call runtime.pollFractionalWorkerExit and return when the worker's CPU usage exceeds 1.2 times fractionalUtilizationGoal;
After setting up the check variable, gcDrain runs runtime.markroot to scan the root objects; after each root job it calls check to decide whether the marking task should stop, and if so it jumps to the done label and exits.
Once the root jobs are finished, gcDrain consumes the mark queue:
func gcDrain(gcw *gcWork, flags gcDrainFlags) {
...
// Root objects are now in the mark queue; consume the queue
// It circulates until it is preempted or STW
for !(gp.preempt && (preemptible || atomic.Load(&sched.gcwaiting) != 0)) {
// Put part of the local work back on the global queue
if work.full == 0 {
gcw.balance()
}
// Access to task
b := gcw.tryGetFast()
if b == 0 {
b = gcw.tryGet()
if b == 0 {
wbBufFlush(nil, 0)
b = gcw.tryGet()
}
}
// Can't get the object , Tag queue is empty , Out of the loop
if b == 0 {
break
}
// Scan the acquired object
scanobject(b, gcw)
// If you've scanned a certain number of objects ,gcCreditSlack The value is 2000
if gcw.scanWork >= gcCreditSlack {
// Flush the locally accumulated scan work to the global counter
atomic.Xaddint64(&gcController.scanWork, gcw.scanWork)
if flushBgCredit {
// Recording the number of bytes of memory scanned is used to reduce the workload of auxiliary marking
gcFlushBgCredit(gcw.scanWork - initScanWork)
initScanWork = 0
}
checkWork -= gcw.scanWork
gcw.scanWork = 0
if checkWork <= 0 {
checkWork += drainCheckThreshold
if check != nil && check() {
break
}
}
}
}
done:
// Flush the locally accumulated scan work to the global counter
if gcw.scanWork > 0 {
atomic.Xaddint64(&gcController.scanWork, gcw.scanWork)
if flushBgCredit {
// Recording the number of bytes of memory scanned is used to reduce the workload of auxiliary marking
gcFlushBgCredit(gcw.scanWork - initScanWork)
}
gcw.scanWork = 0
}
}

Before fetching from the local cache, runtime.gcWork.balance is called to push part of the gcWork cache back to the global queue; this is mainly used to balance load across different Ps.
A task is then fetched from the gcWork cache and handed to scanobject, which scans from the given address and greys the live objects it finds. runtime.gcFlushBgCredit records the number of bytes scanned, which reduces the workload of the assist markers.
To summarize how objects move in and out of gcWork: dequeuing happens in the scanobject path above, which pops cached objects from gcWork and scans them; at the same time, any live object discovered during the scan is enqueued back into gcWork.
Besides scanobject, the write barrier, root scanning, and stack scanning all add extra grey objects to gcWork to await processing.
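That enqueue/dequeue flow is tricolor marking over a grey worklist. A self-contained toy version (hypothetical object type, not the runtime's representation; the marked flag plays the role of the mark bit, and membership in the worklist is what makes an object grey):

```go
package main

import "fmt"

// obj is a toy heap object; children stand in for its pointer fields.
type obj struct {
	name     string
	children []*obj
	marked   bool
}

// drain consumes the grey worklist the way gcDrain consumes gcWork:
// pop an object, scan it, and grey any unmarked children it references.
func drain(grey []*obj) {
	for len(grey) > 0 {
		o := grey[len(grey)-1]
		grey = grey[:len(grey)-1]
		for _, c := range o.children {
			if !c.marked {
				c.marked = true // shade the child grey
				grey = append(grey, c)
			}
		}
	}
}

func main() {
	g := &obj{name: "G"}
	e := &obj{name: "E", children: []*obj{g}}
	d := &obj{name: "D", children: []*obj{e}}
	garbage := &obj{name: "X"} // referenced by no root

	root := d
	root.marked = true // roots are marked up front, as in markroot
	drain([]*obj{root})

	fmt.Println(d.marked, e.marked, g.marked, garbage.marked) // true true true false
}
```

Anything still unmarked (white) after the drain, like the garbage object here, is what the sweep phase reclaims.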
Root marking
func markroot(gcw *gcWork, i uint32) {
baseFlushCache := uint32(fixedRootCount)
baseData := baseFlushCache + uint32(work.nFlushCacheRoots)
baseBSS := baseData + uint32(work.nDataRoots)
baseSpans := baseBSS + uint32(work.nBSSRoots)
baseStacks := baseSpans + uint32(work.nSpanRoots)
end := baseStacks + uint32(work.nStackRoots)
switch {
// Flush all spans held in mcaches; requires STW
case baseFlushCache <= i && i < baseData:
flushmcache(int(i - baseFlushCache))
// Scan the .data segment (initialized read-write globals)
case baseData <= i && i < baseBSS:
for _, datap := range activeModules() {
markrootBlock(datap.data, datap.edata-datap.data, datap.gcdatamask.bytedata, gcw, int(i-baseData))
}
// Scan the .bss segment (uninitialized globals)
case baseBSS <= i && i < baseSpans:
for _, datap := range activeModules() {
markrootBlock(datap.bss, datap.ebss-datap.bss, datap.gcbssmask.bytedata, gcw, int(i-baseBSS))
}
// scanning finalizers queue
case i == fixedRootFinalizers:
for fb := allfin; fb != nil; fb = fb.alllink {
cnt := uintptr(atomic.Load(&fb.cnt))
scanblock(uintptr(unsafe.Pointer(&fb.fin[0])), cnt*unsafe.Sizeof(fb.fin[0]), &finptrmask[0], gcw, nil)
}
// Free the stacks of dead Gs
case i == fixedRootFreeGStacks:
systemstack(markrootFreeGStacks)
// scanning MSpan.specials
case baseSpans <= i && i < baseStacks:
markrootSpans(gcw, int(i-baseSpans))
// Scan each one G The stack
default:
// Get the G
var gp *g
if baseStacks <= i && i < end {
gp = allgs[i-baseStacks]
} else {
throw("markroot: bad index")
}
// Record the time of waiting to start
status := readgstatus(gp) // We are not in a scan state
if (status == _Gwaiting || status == _Gsyscall) && gp.waitsince == 0 {
gp.waitsince = work.tstart
}
// Handed over to g0 scan
systemstack(func() {
userG := getg().m.curg
selfScan := gp == userG && readgstatus(userG) == _Grunning
// If you scan your own , Then change your own g The state of
if selfScan {
casgstatus(userG, _Grunning, _Gwaiting)
userG.waitreason = waitReasonGarbageCollectionScan
}
// Suspend the G so it stops running
stopped := suspendG(gp)
if stopped.dead {
gp.gcscandone = true
return
}
if gp.gcscandone {
throw("g already scanned")
}
// scanning g The stack
scanstack(gp, gcw)
gp.gcscandone = true
resumeG(stopped)
if selfScan {
casgstatus(userG, _Gwaiting, _Grunning)
}
})
}
}

If the .data and .bss scans above look puzzling, Wikipedia's Data segment entry (https://en.wikipedia.org/wiki/Data_segment) clarifies:
The .data segment contains any global or static variables which have a pre-defined value and can be modified.
The BSS segment, also known as uninitialized data, is usually adjacent to the data segment.
The data segment holds global variables that have explicit initial values; the BSS segment holds uninitialized (zero-valued) data.
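The same distinction applies to Go package-level variables (a rough illustration; exact symbol placement is decided by the compiler and linker):

```go
package main

import "fmt"

// Has an explicit initializer baked into the binary, so it is expected to
// land in .data, which markroot scans via markrootBlock(datap.data, ...).
var limit = 1024

// Holds only the zero value, so it is expected to land in .bss,
// which markroot scans via markrootBlock(datap.bss, ...).
var buf [4096]byte

func main() {
	fmt.Println(limit, buf[0]) // 1024 0
}
```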
Object scanning
func scanobject(b uintptr, gcw *gcWork) {
// obtain b Of heapBits object
hbits := heapBitsForAddr(b)
// obtain span
s := spanOfUnchecked(b)
// span Corresponding object size
n := s.elemsize
if n == 0 {
throw("scanobject n == 0")
}
// Maximum scan per time 128KB
if n > maxObletBytes {
// Large object. Break into oblets for better
// parallelism and lower latency.
if b == s.base() {
if s.spanclass.noscan() {
// Bypass the whole scan.
gcw.bytesMarked += uint64(n)
return
}
// Split anything past the first 128KB into oblets and queue them back into gcWork to scan later
for oblet := b + maxObletBytes; oblet < s.base()+s.elemsize; oblet += maxObletBytes {
if !gcw.putFast(oblet) {
gcw.put(oblet)
}
}
}
n = s.base() + s.elemsize - b
if n > maxObletBytes {
n = maxObletBytes
}
}
var i uintptr
for i = 0; i < n; i += sys.PtrSize {
// Get the corresponding bit
// Find bits for this word.
if i != 0 {
hbits = hbits.next()
}
bits := hbits.bits()
// Check scan bit Determine whether to continue scanning
if i != 1*sys.PtrSize && bits&bitScan == 0 {
break // no more pointers in this object
}
// If it's not a pointer, continue
if bits&bitPointer == 0 {
continue // not a pointer
}
// Take the value of the pointer
obj := *(*uintptr)(unsafe.Pointer(b + i))
if obj != 0 && obj-b >= n {
// Search objects in heap according to address value
if obj, span, objIndex := findObject(obj, b, i); obj != 0 {
// call greyobject Tag the object and put it in the tag queue
greyobject(obj, b, i, span, gcw, objIndex)
}
}
}
gcw.bytesMarked += uint64(n)
gcw.scanWork += int64(i)
}

Assist marking: mutator assists
We mentioned mutator assists at the start of this analysis: their job is to keep the heap from growing too fast. If a running G allocates memory while GC is in progress, that G is asked to help with part of the marking work. The principle is simple and fair: however much memory you allocate, that much marking work you must complete.
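The debt/credit bookkeeping can be sketched on its own. The toy below (hypothetical type; the real code also converts between bytes and scan work via assistWorkPerByte/assistBytesPerWork) captures the flow of mallocgc into gcAssistAlloc:

```go
package main

import "fmt"

// assistState is a toy version of the per-G gcAssistBytes plus the global
// bgScanCredit pool: allocation debits the G, background scanning credits
// the pool, and a G in debt steals from the pool before marking itself.
type assistState struct {
	gcAssistBytes int64 // per-G credit; negative means "in debt"
	bgScanCredit  int64 // credit earned by background mark workers
}

// malloc debits size bytes and returns how many bytes of marking the G
// still has to do itself after stealing from the global pool.
func (a *assistState) malloc(size int64) (selfScanBytes int64) {
	a.gcAssistBytes -= size
	if a.gcAssistBytes >= 0 {
		return 0 // not in debt, no assist needed
	}
	debt := -a.gcAssistBytes
	stolen := a.bgScanCredit
	if stolen > debt {
		stolen = debt
	}
	a.bgScanCredit -= stolen
	a.gcAssistBytes += stolen
	if a.gcAssistBytes < 0 {
		// Remaining debt must be paid by scanning (gcAssistAlloc1 in the runtime).
		selfScanBytes = -a.gcAssistBytes
		a.gcAssistBytes = 0
	}
	return
}

func main() {
	a := &assistState{bgScanCredit: 600}
	fmt.Println(a.malloc(500)) // 0: debt fully covered by global credit
	fmt.Println(a.malloc(300)) // 200: only 100 left in the pool, scan the rest
}
```

A G whose debt cannot be covered even by its own scanning sleeps in the assist queue, which is the gcParkAssist path shown below.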
The entry point for mutator assists is the mallocgc function in go/src/runtime/malloc.go:
func mallocgc(size uintptr, typ *_type, needzero bool) unsafe.Pointer {
...
var assistG *g
if gcBlackenEnabled != 0 {
assistG = getg()
if assistG.m.curg != nil {
assistG = assistG.m.curg
}
// Debit the allocation size from the G's assist credit
assistG.gcAssistBytes -= int64(size)
if assistG.gcAssistBytes < 0 {
// This G is in debt.
gcAssistAlloc(assistG)
}
}
...
return x
}

On every allocation, mallocgc checks whether the gcAssistBytes field has gone negative; the field stores how many bytes of assist-marking credit the current Goroutine has left. If it is negative, the G is in debt and calls gcAssistAlloc to take credit from the global pool bgScanCredit:
func gcAssistAlloc(gp *g) {
...
retry:
// Count the number of tagging tasks that need to be done
debtBytes := -gp.gcAssistBytes
scanWork := int64(gcController.assistWorkPerByte * float64(debtBytes))
if scanWork < gcOverAssistWork {
scanWork = gcOverAssistWork
debtBytes = int64(gcController.assistBytesPerWork * float64(scanWork))
}
// Load the global background scan credit
bgScanCredit := atomic.Loadint64(&gcController.bgScanCredit)
stolen := int64(0)
if bgScanCredit > 0 {
if bgScanCredit < scanWork {
stolen = bgScanCredit
gp.gcAssistBytes += 1 + int64(gcController.assistBytesPerWork*float64(stolen))
} else {
stolen = scanWork
gp.gcAssistBytes += debtBytes
}
// Deduct the stolen amount from the global credit
atomic.Xaddint64(&gcController.bgScanCredit, -stolen)
scanWork -= stolen
// scanWork reaching 0 means bgScanCredit fully covered the debt
if scanWork == 0 {
return
}
}
// Reaching here means bgScanCredit was smaller than scanWork;
// the G must complete the remaining marking itself (gcAssistAlloc1 / gcDrainN) and return
systemstack(func() {
// Perform the tagging task
gcAssistAlloc1(gp, scanWork)
})
completed := gp.param != nil
gp.param = nil
if completed {
gcMarkDone()
}
if gp.gcAssistBytes < 0 {
if gp.preempt {
Gosched()
goto retry
}
// If credit is still insufficient, put the current Goroutine to sleep
// in the global assist queue until a background mark task wakes it
if !gcParkAssist() {
goto retry
}
}
}

If the global credit is still insufficient, the current Goroutine goes to sleep, joining the global assist queue to wait until a background mark task wakes it.
gcFlushBgCredit, called as memory is scanned, is responsible for waking the assist Goroutines:
func gcFlushBgCredit(scanWork int64) {
// No Goroutine is waiting in the assist queue
if work.assistQueue.q.empty() {
// The current credit will be added directly to the global credit bgScanCredit
atomic.Xaddint64(&gcController.bgScanCredit, scanWork)
return
}
scanBytes := int64(float64(scanWork) * gcController.assistBytesPerWork)
lock(&work.assistQueue.lock)
// While the assist queue is not empty, pay off sleeping Gs
for !work.assistQueue.q.empty() && scanBytes > 0 {
gp := work.assistQueue.q.pop()
// Wake up the Goroutine
if scanBytes+gp.gcAssistBytes >= 0 {
scanBytes += gp.gcAssistBytes
gp.gcAssistBytes = 0
ready(gp, 0, false)
} else {
gp.gcAssistBytes += scanBytes
scanBytes = 0
work.assistQueue.q.pushBack(gp)
break
}
}
// Any leftover credit is added to the global pool
if scanBytes > 0 {
scanWork = int64(float64(scanBytes) * gcController.assistWorkPerByte)
atomic.Xaddint64(&gcController.bgScanCredit, scanWork)
}
unlock(&work.assistQueue.lock)
}

gcFlushBgCredit pops sleeping Goroutines from the assist queue; if the newly earned credit covers a Goroutine's debt, that Goroutine is woken. Whatever credit is left over is added to the global pool.
Taken together, the mechanism works like this: allocation debits each G's credit, background scanning earns global credit, and a G in debt first steals from the global pool, then marks on its own behalf, and finally sleeps in the assist queue until woken.
Finish marking
As we saw in the gcBgMarkWorker analysis above, once marking completes the worker calls gcMarkDone to carry out the mark-completion work.
func gcMarkDone() {
semacquire(&work.markDoneSema)
top:
// Check again if the task is finished
if !(gcphase == _GCmark && work.nwait == work.nproc && !gcMarkWorkAvailable(nil)) {
semrelease(&work.markDoneSema)
return
}
semacquire(&worldsema)
gcMarkDoneFlushed = 0
systemstack(func() {
gp := getg().m.curg
casgstatus(gp, _Grunning, _Gwaiting)
// Traverse all P
forEachP(func(_p_ *p) {
// Flush the P's write barrier buffer into its gcWork
wbBufFlush1(_p_)
// take gcWork To join the global queue
_p_.gcw.dispose()
// Indicates this gcWork flushed work to the global queue
if _p_.gcw.flushedWork {
atomic.Xadd(&gcMarkDoneFlushed, 1)
_p_.gcw.flushedWork = false
} else if debugCachedWork {
...
}
...
})
casgstatus(gp, _Gwaiting, _Grunning)
})
if gcMarkDoneFlushed != 0 {
if debugCachedWork {
// Release paused gcWorks.
atomic.Xadd(&gcWorkPauseGen, 1)
}
semrelease(&worldsema)
goto top
}
// Record the start time of the completion marking phase and STW Start time
now := nanotime()
work.tMarkTerm = now
work.pauseStart = now
// prohibit G Be seized
getg().m.preemptoff = "gcing"
// STW
systemstack(stopTheWorldWithSema)
...
// Disable assist GC and the background mark workers
atomic.Store(&gcBlackenEnabled, 0)
// Wake all Gs sleeping because of GC assists
gcWakeAllAssists()
semrelease(&work.markDoneSema)
schedEnableUser(true)
// Calculate the next trigger gc Needed heap size
nextTriggerRatio := gcController.endCycle()
// Execution marks terminate
gcMarkTermination(nextTriggerRatio)
}

gcMarkDone calls forEachP to iterate over all Ps, moving the tasks in each P's gcWork to the global queue; each gcWork that still had work to flush increments gcMarkDoneFlushed. After visiting every P, if gcMarkDoneFlushed is non-zero it jumps back to the top label and loops again, until no local queue has tasks left.
Next, gcBlackenEnabled is set to 0, which turns off assist marking and background marking; the blocked assist Goroutines are woken; and schedEnableUser re-enables scheduling of user Goroutines. Note that we are inside STW at this point, so the woken Goroutines do not run immediately; they wait until STW ends.
Finally, gcMarkTermination is called to carry out mark termination.
Mark termination
func gcMarkTermination(nextTriggerRatio float64) {
// Disable assist GC and the background mark workers
atomic.Store(&gcBlackenEnabled, 0)
// take GC The phase switches to _GCmarktermination
setGCPhase(_GCmarktermination)
work.heap1 = memstats.heap_live
// Record the start time
startTime := nanotime()
mp := acquirem()
mp.preemptoff = "gcing"
_g_ := getg()
_g_.m.traceback = 2
gp := _g_.m.curg
// Set up G The state is waiting
casgstatus(gp, _Grunning, _Gwaiting)
gp.waitreason = waitReasonGarbageCollection
// Switch to g0 function
systemstack(func() {
// Run the final mark check under STW
gcMark(startTime)
})
systemstack(func() {
work.heap2 = work.bytesMarked
...
// Set up current GC Stage to close , And disable the write barrier
setGCPhase(_GCoff)
// Wake up the background cleaning task
gcSweep(work.mode)
})
_g_.m.traceback = 0
casgstatus(gp, _Gwaiting, _Grunning)
// Statistics and reset cleaning status related codes
...
// Bump the GC count, then wake the Gs waiting for sweep
lock(&work.sweepWaiters.lock)
memstats.numgc++
injectglist(&work.sweepWaiters.list)
unlock(&work.sweepWaiters.lock)
// Performance statistics
mProf_NextCycle()
// again start The World
systemstack(func() { startTheWorldWithSema(true) })
// Performance statistics
mProf_Flush()
// Move the mark queue's workbufs to the free list so they can be recycled
prepareFreeWorkbufs()
// Release unused stacks
systemstack(freeStackSpans)
// Make sure that each P Of mcache All be flush
systemstack(func() {
forEachP(func(_p_ *p) {
_p_.mcache.prepareForSweep()
})
})
...
semrelease(&worldsema)
semrelease(&gcsema)
releasem(mp)
mp = nil
}

gcMarkTermination mostly does confirmation and bookkeeping. On entry it sets the GC phase to _GCmarktermination, then calls gcMark to confirm that all marking really is finished. It then sets the phase to _GCoff and calls gcSweep to start sweeping. What follows (elided) is statistics code: memory in use, GC pause times, CPU utilization, and so on. Finally come the confirmation steps: making sure every P's mcache is flushed, unused stacks are freed, workbufs are moved to the free list for recycling, and so on.
Background sweeping
func gcSweep(mode gcMode) {
if gcphase != _GCoff {
throw("gcSweep being done but phase is not GCoff")
}
lock(&mheap_.lock)
mheap_.sweepgen += 2
// Reset the sweep-done flag
mheap_.sweepdone = 0
...
mheap_.pagesSwept = 0
mheap_.sweepArenas = mheap_.allArenas
mheap_.reclaimIndex = 0
mheap_.reclaimCredit = 0
unlock(&mheap_.lock)
if go115NewMCentralImpl {
sweep.centralIndex.clear()
}
// If it's not parallel GC
if !_ConcurrentSweep || mode == gcForceBlockMode {
...
return
}
// Wake the background sweep task
lock(&sweep.lock)
if sweep.parked {
sweep.parked = false
ready(sweep.g, 0, true)
}
unlock(&sweep.lock)
}

gcSweep's main job is to reset the sweep-phase state and then wake the sweeper goroutine. That background sweeper is set up by bgsweep, which is started from gcenable while the main goroutine initializes:
gcenable
func gcenable() {
// Kick off sweeping and scavenging.
c := make(chan int, 2)
// Set up asynchronous sweep
go bgsweep(c)
<-c
}

bgsweep
func bgsweep(c chan int) {
// Register the sweeper goroutine
sweep.g = getg()
// Park and wait to be woken
lockInit(&sweep.lock, lockRankSweep)
lock(&sweep.lock)
sweep.parked = true
c <- 1
goparkunlock(&sweep.lock, waitReasonGCSweepWait, traceEvGoBlock, 1)
// The sweep loop
for {
// Sweep one span, then yield to the scheduler
for sweepone() != ^uintptr(0) {
sweep.nbgsweep++
Gosched()
}
// Free some unused mark-queue buffers back to the heap
for freeSomeWbufs(true) {
Gosched()
}
lock(&sweep.lock)
// Check whether the sweepdone flag is still 0;
// if sweeping is not finished, keep looping
if !isSweepDone() {
unlock(&sweep.lock)
continue
}
// Otherwise park the background sweep task
sweep.parked = true
goparkunlock(&sweep.lock, waitReasonGCSweepWait, traceEvGoBlock, 1)
}
}

The actual sweeping in bgsweep is done by sweepone, which looks in the heap for a span that needs sweeping. It returns the number of pages it released back to the heap, or ^uintptr(0) if there was nothing left to sweep:
func sweepone() uintptr {
_g_ := getg()
sweepRatio := mheap_.sweepPagesPerByte // For debugging
_g_.m.locks++
// Check whether sweeping is already complete
if atomic.Load(&mheap_.sweepdone) != 0 {
_g_.m.locks--
return ^uintptr(0)
}
atomic.Xadd(&mheap_.sweepers, +1)
// Find a span that needs sweeping
var s *mspan
sg := mheap_.sweepgen
for {
if go115NewMCentralImpl {
s = mheap_.nextSpanForSweep()
} else {
s = mheap_.sweepSpans[1-sg/2%2].pop()
}
if s == nil {
atomic.Store(&mheap_.sweepdone, 1)
break
}
if state := s.state.get(); state != mSpanInUse {
continue
}
// A span whose sweepgen equals mheap.sweepgen - 2 needs sweeping; claim it with a CAS
if s.sweepgen == sg-2 && atomic.Cas(&s.sweepgen, sg-2, sg-1) {
break
}
}
// Sweep the span
npages := ^uintptr(0)
if s != nil {
npages = s.npages
// Reclaiming memory
if s.sweep(false) {
atomic.Xadduintptr(&mheap_.reclaimCredit, npages)
} else {
npages = 0
}
}
_g_.m.locks--
return npages
}

When looking for a span, sweepone uses the state field and checks whether sweepgen equals mheap.sweepgen - 2 to decide whether the span needs sweeping. The span is ultimately swept by mspan.sweep.
Let's take a quick look at the implementation of sweep:
func (s *mspan) sweep(preserve bool) bool {
if !go115NewMCentralImpl {
return s.oldSweep(preserve)
}
_g_ := getg()
sweepgen := mheap_.sweepgen
// Account for the number of pages swept
atomic.Xadd64(&mheap_.pagesSwept, int64(s.npages))
spc := s.spanclass
size := s.elemsize
c := _g_.m.p.ptr().mcache
...
// Compute the number of objects freed
nalloc := uint16(s.countAlloc())
nfreed := s.allocCount - nalloc
s.allocCount = nalloc
s.freeindex = 0 // reset allocation index to start of span.
s.allocBits = s.gcmarkBits
s.gcmarkBits = newMarkBits(s.nelems)
s.refillAllocCache(0)
// Set span.sweepgen equal to mheap.sweepgen
atomic.Store(&s.sweepgen, sweepgen)
if spc.sizeclass() != 0 {
// Handle spans for small objects.
if nfreed > 0 {
s.needzero = 1
c.local_nsmallfree[spc.sizeclass()] += uintptr(nfreed)
}
if !preserve {
if nalloc == 0 {
// Release the span directly back to the heap
mheap_.freeSpan(s)
return true
}
// Release the span back to mcentral
if uintptr(nalloc) == s.nelems {
mheap_.central[spc].mcentral.fullSwept(sweepgen).push(s)
} else {
mheap_.central[spc].mcentral.partialSwept(sweepgen).push(s)
}
}
} else if !preserve {
// Handle the reclamation of large objects
if nfreed != 0 {
// Free large object span to heap.
if debug.efence > 0 {
s.limit = 0 // prevent mlookup from finding this span
sysFault(unsafe.Pointer(s.base()), size)
} else {
// Release the span directly back to the heap
mheap_.freeSpan(s)
}
c.local_nlargefree++
c.local_largefree += size
return true
}
// Add a large span directly onto the full+swept list.
mheap_.central[spc].mcentral.fullSwept(sweepgen).push(s)
}
return false
}

mspan.sweep counts the surviving objects, swaps gcmarkBits into allocBits so that unmarked slots become allocatable again, and then returns the span either to the heap (if it is completely free) or to the appropriate mcentral list.

Summary
When I used Java I had only a superficial understanding of the mark-sweep algorithm; going deep into Go's tri-color algorithm for the first time has taught me a great deal. This article took a long time to write because GC is closely tied to memory allocation and goroutine scheduling, so you need to understand those two topics first in order to understand how the GC works.

Limited by the time I have for work and study, I did not explain the root-marking code very clearly; if you are interested, you can study it yourself starting from gcBgMarkWorker.
Reference
Garbage Collection In Go https://www.ardanlabs.com/blog/2018/12/garbage-collection-in-go-part1-semantics.html
Visualizing Garbage Collection Algorithms https://spin.atomicobject.com/2014/09/03/visualizing-garbage-collection-algorithms/
Proposal: Eliminate STW stack re-scanning https://go.googlesource.com/proposal/+/master/design/17503-eliminate-rescan.md
Uniprocessor Garbage Collection Techniques https://www.cs.cmu.edu/~fp/courses/15411-f14/misc/wilson94-gc.pdf
https://golang.design/under-the-hood/zh-cn/part2runtime/ch08gc/barrier/
golang-garbage-collector https://draveness.me/golang/docs/part3-runtime/ch07-memory/golang-garbage-collector
Golang Source Code Exploration (3): GC Implementation Principles https://www.cnblogs.com/zkweb/p/7880099.html
Real-time garbage collection on general-purpose machines http://www.yuasa.kuis.kyoto-u.ac.jp/~yuasa/YuasasSnapshot.pdf
On-the-Fly Garbage Collection: An Exercise in Cooperation https://lamport.azurewebsites.net/pubs/garbage.pdf
Write Barrier Elision for Concurrent Garbage Collectors https://www.semanticscholar.org/paper/Write-barrier-elision-for-concurrent-garbage-Vechev-Bacon/c046c3388ee506028f0e68f8636b3cd22420ddc0
Garbage-First Garbage Collection http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.63.6386&rep=rep1&type=pdf
https://github.com/golang-design/under-the-hood/issues/20
garbage-collection https://github.com/ardanlabs/gotraining/tree/master/reading#garbage-collection