Go language GC implementation principle and source code analysis
2022-06-24 17:23:00 【luozhiyun】
Please credit the source when reprinting. This article was first published on luozhiyun's blog: https://www.luozhiyun.com/archives/475
This article is based on the Go 1.15.7 source code.
Introduction
Tricolor marking
The tricolor marking algorithm divides objects into three colors: black, grey, and white.
- Black: the object has been marked, and every object it references has also been marked (objects the program still needs);
- Grey: the object has been marked, but the objects it references have not all been marked yet (GC still has to scan outward from it);
- White: the object has not been marked (potential garbage);
When the garbage collector starts working, it traverses from the GC roots. The traversal can be broken into the following steps:
- Shade the GC root objects grey;
- Take an object from the grey set, mark it black, and shade the objects it references grey;
- Repeat step 2 until there are no grey objects left to mark;
- When traversal ends, the remaining white objects are unreachable from the GC roots and can be reclaimed.
The process is illustrated below:
Now let's look at the problems with tricolor marking.
Problems with tricolor marking
Over-marking: the floating garbage problem
Suppose E has already been marked (it is grey) when D drops its reference to E. In principle the objects E/F/G should now be reclaimed, but because E is already grey, it is still traversed as a live object. The result is that these objects remain marked as live, and their memory is not reclaimed in the current GC cycle.
Memory that should have been reclaimed but was not is called "floating garbage". The process is shown in the figure below:
Under-marking: the dangling pointer problem
Besides the problem above, there is the opposite one: missed marks. Suppose the GC thread has already turned E grey and D black when the mutator makes grey E drop its reference to white G, and black D acquire a reference to white G. When the GC thread resumes, E no longer references G, so it never puts G into the grey set; and although D now references G, D is already black and will not be traversed again.
The end result: G stays in the white set and is swept as garbage even though it is still reachable. This directly breaks the correctness of the application, which is unacceptable, and it is exactly the problem Go's GC has to solve.
Memory barriers
To solve the dangling pointer problem above, we need to introduce barrier techniques to guarantee data consistency.
A memory barrier is a type of barrier instruction that causes a central processing unit (CPU) or compiler to enforce an ordering constraint on memory operations issued before and after the barrier instruction. This typically means that operations issued prior to the barrier are guaranteed to be performed before operations issued after the barrier.
In other words, a memory barrier is a barrier instruction that makes the CPU or the compiler enforce an ordering constraint on the memory operations issued before and after it: operations issued before the barrier are guaranteed to execute before operations issued after it.
To keep the marking algorithm correct, one of the following invariants must hold:
- Strong tricolor invariant (strong tri-color invariant): black objects never point to white objects, only to grey or black objects;
- Weak tricolor invariant (weak tri-color invariant): a black object may point to a white object, as long as that white object can still be reached from some grey object;
Depending on the type of operation intercepted, memory barriers can be divided into read barriers and write barriers. Go uses only write barriers; the reason is given in 《Uniprocessor Garbage Collection Techniques》:
If a non copying collector is used the use of a read barrier is an unnecessary expense. There is no need to protect the mutator from seeing an invalid version of a pointer. Write barrier techniques are cheaper, because heap writes are several times less common than heap reads
In other words, for a garbage collector that does not copy objects, a read barrier is an unnecessary expense, because there is no need to protect the mutator from seeing a stale version of a pointer. Write barriers are also cheaper, because heap writes are several times less common than heap reads.
Let's look at how write barriers work.
Dijkstra Write barrier
Go 1.7 and earlier used the Dijkstra write barrier, implemented along the lines of the following pseudocode:
writePointer(slot, ptr):
    shade(ptr)
    *slot = ptr

If the object ptr points to is white, shade(ptr) marks it grey. This preserves the strong tricolor invariant: it guarantees that the object ptr points to is no longer white by the time it is assigned into *slot.
As shown below, the object D that the root points to is marked black, and the object E that D references is marked grey. If D drops its reference to E and references the object B instead, the write barrier is triggered and marks B grey.
Although the Dijkstra write barrier is simple to implement and preserves the strong tricolor invariant, 《Proposal: Eliminate STW stack re-scanning》 points out its drawbacks:
In particular, it presents a trade-off for pointers on stacks: either writes to pointers on the stack must have write barriers, which is prohibitively expensive, or stacks must be permagrey.
Because objects on the stack are also treated as root objects during garbage collection, either writes to pointers on the stack must also go through the write barrier, which adds a prohibitive overhead to pointer writes, or stacks must be treated as permanently grey (permagrey).
Go 1.7 chose to mark stacks as permanently grey, but that requires re-scanning (re-scan) these stacks under STW during the mark termination phase. The reason is described below:
without stack write barriers, we can't ensure that the stack won't later contain a reference to a white object, so a scanned stack is only black until its goroutine executes again, at which point it conservatively reverts to grey. Thus, at the end of the cycle, the garbage collector must re-scan grey stacks to blacken them and finish marking any remaining heap pointers.
Yuasa Write barrier
The Yuasa write barrier is a deletion barrier (deletion barrier) technique proposed by Yuasa in 《Real-time garbage collection on general-purpose machines》. The idea is that when the mutator deletes a white pointer from a grey or white object, the write barrier notifies the concurrent collector of this change.
The algorithm uses the following write barrier to keep the program correct under incremental or concurrent garbage collection; the pseudocode is:
writePointer(slot, ptr):
    shade(*slot)
    *slot = ptr

To avoid losing the path from grey objects to white objects, the barrier has to assume that *slot may later turn black. To make sure the object previously stored in *slot does not get lost before ptr is assigned, shade(*slot) marks that old object grey. Every write then establishes a path from a grey object to a grey or white object, so the deletion barrier preserves the weak tricolor invariant: the downstream objects that an old object referenced are now protected by a grey object.
Hybrid write barrier
As mentioned above, Go 1.7 and earlier used the Dijkstra write barrier to maintain the tricolor invariant. While re-scanning, Go must guarantee that object references do not change, so it pauses the program (STW), marks all stack objects grey and re-scans them, which typically costs 10 to 100 milliseconds.
From 《Proposal: Eliminate STW stack re-scanning》 (https://go.googlesource.com/proposal/+/master/design/17503-eliminate-rescan.md) we can see that, to eliminate the performance cost of re-scanning, Go 1.8 introduced the hybrid write barrier, which combines the Yuasa write barrier and the Dijkstra write barrier. The pseudocode is:
writePointer(slot, ptr):
    shade(*slot)
    if current stack is grey:
        shade(ptr)
    *slot = ptr

This both simplifies the GC process and removes the re-scanning cost in the mark termination phase. The basic idea of the hybrid write barrier is:
the write barrier shades the object whose reference is being overwritten, and, if the current goroutine's stack has not yet been scanned, also shades the reference being installed.
Translated: shade the object whose reference is being overwritten, and, if the current goroutine's stack has not yet been scanned, also shade the reference being installed.
At the same time, all objects newly allocated during GC are immediately marked black. This can be seen in the mallocgc function in go/src/runtime/malloc.go, which handles memory allocation:
func mallocgc(size uintptr, typ *_type, needzero bool) unsafe.Pointer {
	...
	dataSize := size
	// Get the mcache, used for tiny and small object allocation
	c := gomcache()
	var x unsafe.Pointer
	// noscan is true if the object contains no pointers
	noscan := typ == nil || typ.ptrdata == 0
	// maxSmallSize = 32768 (32 KB)
	if size <= maxSmallSize {
		// maxTinySize = 16 bytes
		if noscan && size < maxTinySize {
			...
		} else {
			...
		}
	// Allocations larger than 32 KB go through the mheap
	} else {
		...
	}
	...
	// Objects newly allocated during GC are marked black
	if gcphase != _GCoff {
		gcmarknewobject(span, uintptr(x), size, scanSize)
	}
	...
	return x
}

During the mark phase, newly allocated objects are marked black, so that objects in freshly allocated stack and heap memory are not incorrectly reclaimed.
Analysis
GC phases
The GC code lives in runtime/mgc.go. From the comments there we know that GC is divided into 4 phases:
- sweep termination
  - Triggers STW; all Ps (processors) enter a safe-point;
  - Sweeps any spans that have not yet been swept (readers unfamiliar with spans can see my post: Detailed explanation of Go memory allocation source code https://www.luozhiyun.com/archives/434);
- the mark phase
  - Changes the GC state from _GCoff to _GCmark, enables the write barrier and mutator assists, and enqueues the root objects;
  - Resumes program execution; mark workers and mutator assists start marking objects in memory concurrently. Any pointer write and any newly created pointer value is covered by the write barrier, and all newly created objects are marked black immediately;
  - GC performs root marking, which includes scanning all stacks, global objects, and runtime data structures outside the heap. Scanning a goroutine's stack stops that goroutine, shades all pointers found on its stack grey, and then resumes the goroutine;
  - While draining the grey object queue, GC blackens each grey object and shades the objects it points to grey;
  - GC uses a distributed termination algorithm (distributed termination algorithm) to detect when there is no more root marking work and no more grey objects; at that point GC transitions to mark termination;
- mark termination
  - STW, then change the GC state to _GCmarktermination, and stop the GC worker threads and mutator assists;
  - Perform housekeeping, such as flushing the mcaches;
- the sweep phase
  - Change the GC state to _GCoff, initialize the sweep state and disable the write barrier;
  - Resume program execution; from this point on, newly created objects are white again;
  - Sweep all memory management units concurrently in the background;
Note the mutator assists mentioned above. They exist because of the following situation:
during the collection that the Goroutine dedicated to GC will not finish the Marking work before the heap memory in-use reaches its limit
GC dedicates 25% of the CPU to marking work, so it is possible for the GC mark workers to fall behind the application's allocation rate, in which case marking would never finish. When that happens, the application's own goroutines are recruited to help with the marking:
If the collector determines that it needs to slow down allocations, it will recruit the application Goroutines to assist with the Marking work. This is called a Mutator Assist. One positive side effect of Mutator Assist is that it helps to finish the collection faster.
When the next GC happens
The timing of the next GC can be controlled with the GOGC environment variable. It defaults to 100, meaning the next GC is triggered only after the heap has grown by 100%.
This value represents a ratio of how much new heap memory can be allocated before the next collection has to start.
The official explanation: if 4 MB of heap is currently in use, the next GC fires when heap usage reaches 8 MB. Let's look at a concrete example:
package main
import (
"fmt"
)
func main() {
fmt.Println("start.")
container := make([]int, 8)
fmt.Println("> loop.")
for i := 0; i < 32*1000*1000; i++ {
container = append(container, i)
}
fmt.Println("< loop.")
}

Note: it's recommended to run this experiment on Linux. If you don't have a Linux environment you can, as I did, run a Linux VM under Windows 10 and connect to it with VS Code's remote mode.
After building, gctrace can be used to trace GC activity:
$ go build main.go
$ GODEBUG=gctrace=1 ./main
start.
> loop.
gc 1 @0.004s 4%: 0.22+1.4+0.021 ms clock, 1.7+0.009/0.40/0.073+0.16 ms cpu, 4->5->1 MB, 5 MB goal, 8 P
gc 2 @0.006s 4%: 0.10+1.6+0.020 ms clock, 0.83+0.12/0/0+0.16 ms cpu, 4->6->1 MB, 5 MB goal, 8 P
gc 3 @0.009s 16%: 0.035+5.5+2.2 ms clock, 0.28+0/0.47/0.007+18 ms cpu, 4->15->15 MB, 5 MB goal, 8 P
...
< loop.
This shows 3 GC runs. Let's decode what the fields mean:
gc 1 @0.004s 4%: 0.22+1.4+0.021 ms clock, 1.7+0.009/0.40/0.073+0.16 ms cpu, 4->5->1 MB, 5 MB goal, 8 P

- gc 1: the 1st GC since the program started
- @0.004s: time since the program started
- 4%: percentage of total CPU time spent on GC marking work so far

Wall-clock time:
- 0.22 ms: STW at the start of the mark phase
- 1.4 ms: concurrent marking
- 0.021 ms: STW at mark termination

CPU time spent on garbage collection:
- 1.7 ms: STW at the start of the mark phase
- 0.009 ms: time taken by mutator assists
- 0.40 ms: time taken by mark workers
- 0.073 ms: time taken by idle mark workers
- 0.16 ms: STW at mark termination

Memory:
- 4 MB: heap size before marking started
- 5 MB: heap size after marking finished
- 1 MB: live heap size after marking finished
- 5 MB goal: goal size of the in-use heap after marking finishes
- 8 P: number of processors used
From the GC memory fields above we can see that the heap was 4 MB before marking started. Because marking runs concurrently, 5 MB was in use by the time marking finished, which means 1 MB was allocated during GC. After marking, only 1 MB of the heap was live, which also means the heap can grow to about 2 MB before the next GC round starts.
We can also see that the goal of 5 MB equals the in-use size after marking here. In more complex situations they are not equal, because the goal is computed from the current memory usage:
the goal is calculated based on the current amount of the heap memory in-use, the amount of heap memory marked as live, and timing calculations about the additional allocations that will occur while the collection is running.
Conditions for triggering GC
Whether GC should be triggered is checked by gcTrigger.test. Let's see how it decides whether garbage collection is needed:
func (t gcTrigger) test() bool {
	if !memstats.enablegc || panicking != 0 || gcphase != _GCoff {
		return false
	}
	switch t.kind {
	case gcTriggerHeap:
		// Heap allocation has reached the trigger size computed by the pacer
		return memstats.heap_live >= memstats.gc_trigger
	case gcTriggerTime:
		if gcpercent < 0 {
			return false
		}
		lastgc := int64(atomic.Load64(&memstats.last_gc_nanotime))
		// Force a new cycle if none has been triggered for a while
		return lastgc != 0 && t.now-lastgc > forcegcperiod
	case gcTriggerCycle:
		// A new GC cycle was requested; skip if one has already started
		return int32(t.n-work.cycles) > 0
	}
	return true
}

The gcTriggerTime interval is determined by forcegcperiod, which defaults to 2 minutes. Next, let's look at triggering GC by heap size.
For gcTriggerHeap, GC is triggered when heap allocation reaches the trigger size computed by the pacer. heap_live is updated as memory is allocated; gc_trigger is computed in the runtime.gcSetTriggerRatio function.
func gcSetTriggerRatio(triggerRatio float64) {
	// gcpercent is determined by the GOGC environment variable
	if gcpercent >= 0 {
		// 1 by default (GOGC=100)
		scalingFactor := float64(gcpercent) / 100
		// The maximum trigger ratio is 0.95 * scalingFactor
		maxTriggerRatio := 0.95 * scalingFactor
		if triggerRatio > maxTriggerRatio {
			triggerRatio = maxTriggerRatio
		}
		// The minimum trigger ratio is 0.6 * scalingFactor
		minTriggerRatio := 0.6 * scalingFactor
		if triggerRatio < minTriggerRatio {
			triggerRatio = minTriggerRatio
		}
	} else if triggerRatio < 0 {
		triggerRatio = 0
	}
	memstats.triggerRatio = triggerRatio
	trigger := ^uint64(0)
	if gcpercent >= 0 {
		// Multiply the live heap size from the last mark by (1 + triggerRatio)
		trigger = uint64(float64(memstats.heap_marked) * (1 + triggerRatio))
		...
	}
	memstats.gc_trigger = trigger
	...
}

gcSetTriggerRatio derives from triggerRatio the heap size at which the next GC will trigger. triggerRatio is adjusted after every GC cycle; it is computed by gcControllerState.endCycle, which is called from MarkDone.
func (c *gcControllerState) endCycle() float64 {
	const triggerGain = 0.5
	// Goal heap growth rate = (heap goal - live heap) / live heap
	goalGrowthRatio := float64(memstats.next_gc-memstats.heap_marked) / float64(memstats.heap_marked)
	// Actual heap growth rate = total heap size / live heap size - 1
	actualGrowthRatio := float64(memstats.heap_live)/float64(memstats.heap_marked) - 1
	// Duration of the GC mark phase
	assistDuration := nanotime() - c.markStartTime
	// CPU utilization of the mark phase; the target is 0.25
	utilization := gcBackgroundUtilization
	// Add assist utilization; avoid divide by zero.
	if assistDuration > 0 {
		// assistTime is the total time goroutines spent assisting with marking.
		// Extra CPU utilization = assist time / (mark duration * number of Ps)
		utilization += float64(c.assistTime) / float64(assistDuration*int64(gomaxprocs))
	}
	// Trigger error = goal growth rate - previous trigger ratio
	//   - utilization / goal utilization * (actual growth rate - previous trigger ratio)
	triggerError := goalGrowthRatio - memstats.triggerRatio - utilization/gcGoalUtilization*(actualGrowthRatio-memstats.triggerRatio)
	// Adjust the trigger ratio by half of the error
	triggerRatio := memstats.triggerRatio + triggerGain*triggerError
	return triggerRatio
}
The triggerRatio feedback is fairly involved. From the error term we can tell:
- the higher the actual growth rate, the smaller the trigger error; when it goes below 0, the next GC triggers earlier;
- the higher the CPU utilization, the smaller the trigger error; when it goes below 0, the next GC triggers earlier;
- the larger the previous trigger ratio, the smaller the trigger error; when it goes below 0, the next GC triggers earlier;
This analysis also explains why, in the GODEBUG=gctrace=1 output above, GC clearly fired before the heap had doubled: the triggerError feedback pulled the trigger point forward.
Start GC
We can call runtime.GC to trigger GC manually, but in practice that is not the usual entry point. Normally GC is triggered from runtime.mallocgc when memory is allocated, or by Go's background monitor thread sysmon, which periodically checks and calls runtime.forcegchelper.
func GC() {
	// Read the current GC cycle count
	n := atomic.Load(&work.cycles)
	// Wait for the previous cycle's mark, mark termination and
	// sweep termination phases to finish
	gcWaitOnMark(n)
	// Trigger a new GC cycle
	gcStart(gcTrigger{kind: gcTriggerCycle, n: n + 1})
	// Same as above, for the cycle just started
	gcWaitOnMark(n + 1)
	// Sweep all pending memory management units
	for atomic.Load(&work.cycles) == n+1 && sweepone() != ^uintptr(0) {
		sweep.nbgsweep++
		// Yield the P
		Gosched()
	}
	for atomic.Load(&work.cycles) == n+1 && atomic.Load(&mheap_.sweepers) != 0 {
		Gosched()
	}
	mp := acquirem()
	cycle := atomic.Load(&work.cycles)
	if cycle == n+1 || (gcphase == _GCmark && cycle == n+2) {
		// Publish the heap profile snapshot for this cycle
		mProf_PostSweep()
	}
	releasem(mp)
}

- First it reads the GC cycle count, then calls gcWaitOnMark to wait for the previous cycle's mark, mark termination and sweep termination phases to finish;
- It calls gcStart to trigger a new GC cycle, then calls gcWaitOnMark to wait for the current cycle's mark and mark termination phases to finish;
- It calls sweepone to sweep all pending memory management units, calling Gosched to yield the P in between;
- After this round's sweeping work is done, it calls mProf_PostSweep to publish the heap-profile snapshot for this cycle;
GC startup
The diagram below shows the complete GC flow; it can be used as a map while reading the source:
gcStart is a fairly long function. Let's walk through the following parts of gcStart:
func gcStart(trigger gcTrigger) {
	...
	// Re-check the trigger conditions and sweep any remaining unswept spans
	for trigger.test() && sweepone() != ^uintptr(0) {
		sweep.nbgsweep++
	}
	// Acquire the global startSema semaphore
	semacquire(&work.startSema)
	// Check the trigger conditions again under the lock
	if !trigger.test() {
		semrelease(&work.startSema)
		return
	}
	// Record whether this cycle was triggered manually via runtime.GC
	work.userForced = trigger.kind == gcTriggerCycle
	semacquire(&gcsema)
	semacquire(&worldsema)
	// Start the background mark workers
	gcBgMarkStartWorkers()
	// Reset mark-related state
	systemstack(gcResetMarkState)
	// Initialize the work structure
	work.stwprocs, work.maxprocs = gomaxprocs, gomaxprocs
	if work.stwprocs > ncpu {
		work.stwprocs = ncpu
	}
	work.heap0 = atomic.Load64(&memstats.heap_live)
	work.pauseNS = 0
	work.mode = mode
	// Record the start time
	now := nanotime()
	work.tSweepTerm = now
	work.pauseStart = now
	// Stop the world (STW)
	systemstack(stopTheWorldWithSema)
	// Make sure sweeping has finished before concurrent marking starts
	systemstack(func() {
		finishsweep_m()
	})
	// Clear sched.sudogcache and sync.Pools
	clearpools()
	// Increment the GC cycle count
	work.cycles++
	// Reset the pacer state and mark the start of a new GC cycle
	gcController.startCycle()
	work.heapGoal = memstats.next_gc
	// Set the GC phase to _GCmark,
	// which enables the write barrier
	setGCPhase(_GCmark)
	// Initialize the state needed for background scanning
	gcBgMarkPrepare() // Must happen before assist enable.
	// Enqueue the root objects: stacks, global variables and other roots
	gcMarkRootPrepare()
	// Mark all active tiny alloc blocks
	gcMarkTinyAllocs()
	// Enable mutator assists
	atomic.Store(&gcBlackenEnabled, 1)
	// Record the mark phase start time
	gcController.markStartTime = now
	mp = acquirem()
	// Restart the world; background workers begin marking heap objects
	systemstack(func() {
		now = startTheWorldWithSema(trace.enabled)
		// Record how long the world was stopped and when marking began
		work.pauseNS += now - work.pauseStart
		work.tMark = now
	})
	semrelease(&worldsema)
	...
}

- It calls trigger.test twice to check whether the garbage collection conditions still hold (this function was discussed above);
- It acquires the work.startSema lock and calls gcBgMarkStartWorkers to start the background mark workers (covered in detail later);
- It initializes the work structure, setting the number of goroutines garbage collection needs, the completed GC count, and so on;
- Before GC starts, it calls gcController.startCycle to reset the pacer state and record that a new GC cycle has begun;
- It calls setGCPhase to set the GC state to _GCmark, which enables the write barrier;
- It calls gcBgMarkPrepare to initialize the state needed for background scanning;
- It calls gcMarkRootPrepare to enqueue the root objects: stacks, global variables, and so on;
- It calls gcMarkTinyAllocs to mark all active tiny alloc blocks;
- It sets gcBlackenEnabled to enable mutator assists;
- After recording the mark start time, it calls startTheWorldWithSema to resume the program; the background tasks begin marking objects in the heap;
The diagram below shows the state changes during gcStart, the STW pauses, and the window during which the write barrier is enabled:
The above is only a rough walk-through of the function; let's look at some of the important pieces below.
startCycle
func (c *gcControllerState) startCycle() {
	c.scanWork = 0
	c.bgScanCredit = 0
	c.assistTime = 0
	c.dedicatedMarkTime = 0
	c.fractionalMarkTime = 0
	c.idleMarkTime = 0
	// Enforce a minimum value for next_gc
	if memstats.next_gc < memstats.heap_live+1024*1024 {
		memstats.next_gc = memstats.heap_live + 1024*1024
	}
	// gcBackgroundUtilization defaults to 0.25:
	// the target fraction of Ps occupied by GC
	totalUtilizationGoal := float64(gomaxprocs) * gcBackgroundUtilization
	// dedicatedMarkWorkersNeeded is 25% of the number of Ps,
	// rounded to the nearest integer (+0.5 then truncate)
	c.dedicatedMarkWorkersNeeded = int64(totalUtilizationGoal + 0.5)
	utilError := float64(c.dedicatedMarkWorkersNeeded)/totalUtilizationGoal - 1
	const maxUtilError = 0.3
	if utilError < -maxUtilError || utilError > maxUtilError {
		if float64(c.dedicatedMarkWorkersNeeded) > totalUtilizationGoal {
			c.dedicatedMarkWorkersNeeded--
		}
		// The per-P utilization target for workers in gcMarkWorkerFractionalMode
		c.fractionalUtilizationGoal = (totalUtilizationGoal - float64(c.dedicatedMarkWorkersNeeded)) / float64(gomaxprocs)
	} else {
		c.fractionalUtilizationGoal = 0
	}
	if debug.gcstoptheworld > 0 {
		c.dedicatedMarkWorkersNeeded = int64(gomaxprocs)
		c.fractionalUtilizationGoal = 0
	}
	for _, p := range allp {
		p.gcAssistTime = 0
		p.gcFractionalMarkTime = 0
	}
	// Compute the parameters for GC assists
	c.revise()
}

Note the computation of dedicatedMarkWorkersNeeded and fractionalUtilizationGoal; they are used later when choosing the working mode of the mark workers.
Mark tiny alloc
func gcMarkTinyAllocs() {
	for _, p := range allp {
		// Mark the tiny alloc block in each P's mcache
		c := p.mcache
		if c == nil || c.tiny == 0 {
			continue
		}
		_, span, objIndex := findObject(c.tiny, 0, 0)
		gcw := &p.gcw
		// Grey the object and add it to the gcWork mark queue
		greyobject(c.tiny, 0, 0, span, gcw, objIndex)
	}
}

The tiny block data structure was also discussed in the memory allocation article. The point here is to find the tiny block in each P's mcache, mark it, and add it to the gcWork mark queue. What the gcWork mark queue is will be covered next, in the section on executing the mark.
Write barrier
When the GC mark phase is set, setGCPhase decides from the new phase value whether the write barrier needs to be enabled:
func setGCPhase(x uint32) {
	atomic.Store(&gcphase, x)
	writeBarrier.needed = gcphase == _GCmark || gcphase == _GCmarktermination
	writeBarrier.enabled = writeBarrier.needed || writeBarrier.cgo
}

The compiler inserts the barriers in the writebarrier function in src/cmd/compile/internal/ssa/writebarrier.go. As its comment says:
// writebarrier pass inserts write barriers for store ops (Store, Move, Zero)
// when necessary (the condition above). It rewrites store ops to branches
// and runtime calls, like
//
// if writeBarrier.enabled {
//     gcWriteBarrier(ptr, val) // Not a regular Go call
// } else {
//     *ptr = val
// }
That is, write barriers are inserted around the Store, Move, and Zero SSA operations.
With dlv breakpoints we can find the gcWriteBarrier assembly at go/src/runtime/asm_amd64.s:1395. The assembly function calls runtime.wbBufFlush to flush the buffered write-barrier records into the GC work queue.
func wbBufFlush(dst *uintptr, src uintptr) {
	...
	systemstack(func() {
		...
		wbBufFlush1(getg().m.p.ptr())
	})
}

func wbBufFlush1(_p_ *p) {
	// Get the pointers buffered on this P
	start := uintptr(unsafe.Pointer(&_p_.wbBuf.buf[0]))
	n := (_p_.wbBuf.next - start) / unsafe.Sizeof(_p_.wbBuf.buf[0])
	ptrs := _p_.wbBuf.buf[:n]
	_p_.wbBuf.next = 0
	gcw := &_p_.gcw
	pos := 0
	for _, ptr := range ptrs {
		// Locate the object the pointer refers to
		obj, span, objIndex := findObject(ptr, 0, 0)
		if obj == 0 {
			continue
		}
		mbits := span.markBitsForIndex(objIndex)
		// Skip objects that are already marked
		if mbits.isMarked() {
			continue
		}
		// Mark the object
		mbits.setMarked()
		// Mark the span
		arena, pageIdx, pageMask := pageIndexOf(span.base())
		if arena.pageMarks[pageIdx]&pageMask == 0 {
			atomic.Or8(&arena.pageMarks[pageIdx], pageMask)
		}
		if span.spanclass.noscan() {
			gcw.bytesMarked += uint64(span.elemsize)
			continue
		}
		ptrs[pos] = obj
		pos++
	}
	// Add the objects to the gcWork queue
	gcw.putBatch(ptrs[:pos])
	// Reset the write barrier buffer
	_p_.wbBuf.reset()
}

The write barrier here follows the same pattern as concurrent marking, so you may want to read the concurrent marking section first and come back. wbBufFlush1 walks the write-barrier buffer, calls findObject to locate each object, sets its mark bits, adds the still-unmarked objects to the gcWork queue for scanning, and finally resets the write-barrier buffer.
stopTheWorldWithSema and startTheWorldWithSema
stopTheWorldWithSema and startTheWorldWithSema are the paired core functions for stopping and restarting the world.
func stopTheWorldWithSema() {
_g_ := getg()
lock(&sched.lock)
sched.stopwait = gomaxprocs
// Set gcwaiting; the scheduler parks when it sees this flag
atomic.Store(&sched.gcwaiting, 1)
// Send preemption signals
preemptall()
// Pause the current P
_g_.m.p.ptr().status = _Pgcstop // Pgcstop is only diagnostic.
sched.stopwait--
// Switch every P that is blocked in a syscall to _Pgcstop
for _, p := range allp {
s := p.status
if s == _Psyscall && atomic.Cas(&p.status, s, _Pgcstop) {
if trace.enabled {
traceGoSysBlock(p)
traceProcStop(p)
}
p.syscalltick++
sched.stopwait--
}
}
// Stop all idle Ps
for {
p := pidleget()
if p == nil {
break
}
p.status = _Pgcstop
sched.stopwait--
}
wait := sched.stopwait > 0
unlock(&sched.lock)
if wait {
for {
// wait for 100 us
if notetsleep(&sched.stopnote, 100*1000) {
noteclear(&sched.stopnote)
break
}
// Send the preemption signal again
preemptall()
}
}
// Safety inspection
bad := ""
if sched.stopwait != 0 {
bad = "stopTheWorld: not stopped (stopwait != 0)"
} else {
for _, p := range allp {
if p.status != _Pgcstop {
bad = "stopTheWorld: not stopped (status != _Pgcstop)"
}
}
}
if atomic.Load(&freezing) != 0 {
lock(&deadlock)
lock(&deadlock)
}
if bad != "" {
throw(bad)
}
}

This method uses sched.stopwait to check whether every P has stopped. It first calls preemptall to send preemption signals to all running Gs, then CASes the Ps that are in _Psyscall to _Pgcstop, stops all idle Ps, and finally, if some Ps are still running, waits for them to stop.
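The stopwait countdown can be modeled with a few atomics. Below is a deterministic toy version with hypothetical states and names; in the real runtime, running Ps stop themselves at safe points after being preempted, and idle Ps come off the pidleget list rather than a CAS loop:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

const (
	pRunning int32 = iota
	pSyscall
	pIdle
	pGcstop
)

// stopAll drives every P to pGcstop and returns the final stopwait value;
// the world counts as stopped only when it reaches 0.
func stopAll(status []int32) int32 {
	stopwait := int32(len(status))
	// Ps in syscall or idle are moved to pGcstop directly via CAS.
	for i := range status {
		s := atomic.LoadInt32(&status[i])
		if (s == pSyscall || s == pIdle) && atomic.CompareAndSwapInt32(&status[i], s, pGcstop) {
			stopwait--
		}
	}
	// Running Ps stop themselves at the next safe point; simulate that here.
	for i := range status {
		if atomic.CompareAndSwapInt32(&status[i], pRunning, pGcstop) {
			stopwait--
		}
	}
	return stopwait
}

func main() {
	status := []int32{pRunning, pSyscall, pIdle, pSyscall, pIdle, pRunning, pIdle, pRunning}
	fmt.Println(stopAll(status) == 0) // true: every P reached pGcstop
}
```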
func startTheWorldWithSema(emitTraceEvent bool) int64 {
mp := acquirem() // disable preemption because it can be holding p in a local var
// Poll netpoll events and add the corresponding Gs to the run queue
if netpollinited() {
list := netpoll(0) // non-blocking
injectglist(&list)
}
lock(&sched.lock)
procs := gomaxprocs
if newprocs != 0 {
procs = newprocs
newprocs = 0
}
// Expand or shrink global processors
p1 := procresize(procs)
// Clear the gcwaiting flag
sched.gcwaiting = 0
// If sysmon ( Background monitoring thread ) Wake it up while waiting
if sched.sysmonwait != 0 {
sched.sysmonwait = 0
notewakeup(&sched.sysmonnote)
}
unlock(&sched.lock)
// Wake each P that has runnable work
for p1 != nil {
p := p1
p1 = p1.link.ptr()
if p.m != 0 {
mp := p.m.ptr()
p.m = 0
if mp.nextp != 0 {
throw("startTheWorld: inconsistent mp->nextp")
}
mp.nextp.set(p)
notewakeup(&mp.park)
} else {
// Start M to run P
newm(nil, p, -1)
}
}
startTime := nanotime()
if emitTraceEvent {
traceGCSTWDone()
}
// If there are idle Ps and no spinning M, wake or create an M
wakep()
releasem(mp)
return startTime
}

startTheWorldWithSema is much simpler. It first fetches runnable Gs from the netpoller and injects them into the run queue, then walks the P list returned by procresize, waking an existing M (or creating a new one) for each P that has work to run.
Creating the background mark workers
func gcBgMarkStartWorkers() {
// Traverse all of P
for _, p := range allp {
// If it has been started, it will not be started repeatedly
if p.gcBgMarkWorker == 0 {
// Create for each global processor a Goroutine
go gcBgMarkWorker(p)
// Wait on the bgMarkReady semaphore until the worker has started
notetsleepg(&work.bgMarkReady, -1)
noteclear(&work.bgMarkReady)
}
}
}

gcBgMarkStartWorkers creates a Goroutine for every P to run the background mark task; each of these Goroutines runs gcBgMarkWorker. notetsleepg waits for gcBgMarkWorker to signal the bgMarkReady semaphore before moving on to the next P.
Although a background mark task is started for every P, only about 25% of CPU capacity may be spent running them at any one time; the scheduler enforces this in its scheduling loop runtime.schedule by calling gcController.findRunnableGCWorker.

Before looking at that method, let's introduce a concept: the mark worker mode. There are currently three modes, which together keep the background mark workers at the target utilization.
type gcMarkWorkerMode int

const (
	// gcMarkWorkerDedicatedMode indicates that the P of a mark
	// worker is dedicated to running that mark worker. The mark
	// worker should run without preemption.
	gcMarkWorkerDedicatedMode gcMarkWorkerMode = iota

	// gcMarkWorkerFractionalMode indicates that a P is currently
	// running the "fractional" mark worker. The fractional worker
	// is necessary when GOMAXPROCS*gcBackgroundUtilization is not
	// an integer. The fractional worker should run until it is
	// preempted and will be scheduled to pick up the fractional
	// part of GOMAXPROCS*gcBackgroundUtilization.
	gcMarkWorkerFractionalMode

	// gcMarkWorkerIdleMode indicates that a P is running the mark
	// worker because it has nothing else to do. The idle worker
	// should run until it is preempted and account its time
	// against gcController.idleMarkTime.
	gcMarkWorkerIdleMode
)
As the code comments explain:

- gcMarkWorkerDedicatedMode: the P is dedicated to marking objects and will not be preempted by the scheduler;
- gcMarkWorkerFractionalMode: the default mark utilization target is 25%, so when GOMAXPROCS is not a multiple of 4 the quota cannot be met with whole dedicated workers alone; this mode runs a part-time worker to make up the fractional remainder of the utilization goal;
- gcMarkWorkerIdleMode: the P has nothing to run except the mark worker, so it runs the garbage collector's mark task until it is preempted;
func (c *gcControllerState) findRunnableGCWorker(_p_ *p) *g {
...
// Atomically decrement *ptr if it is positive; report whether the decrement succeeded
decIfPositive := func(ptr *int64) bool {
if *ptr > 0 {
if atomic.Xaddint64(ptr, -1) >= 0 {
return true
}
// We lost a race
atomic.Xaddint64(ptr, +1)
}
return false
}
// Decrement dedicatedMarkWorkersNeeded; on success the worker runs in dedicated mode
if decIfPositive(&c.dedicatedMarkWorkersNeeded) {
_p_.gcMarkWorkerMode = gcMarkWorkerDedicatedMode
} else if c.fractionalUtilizationGoal == 0 {
// No need for fractional workers.
return nil
} else {
// Time to perform the marking task
delta := nanotime() - gcController.markStartTime
if delta > 0 && float64(_p_.gcFractionalMarkTime)/float64(delta) > c.fractionalUtilizationGoal {
// Nope. No need to run a fractional worker.
return nil
}
_p_.gcMarkWorkerMode = gcMarkWorkerFractionalMode
}
gp := _p_.gcBgMarkWorker.ptr()
casgstatus(gp, _Gwaiting, _Grunnable)
return gp
}

In findRunnableGCWorker, the dedicatedMarkWorkersNeeded counter decides whether the worker runs in the gcMarkWorkerDedicatedMode mark worker mode. dedicatedMarkWorkersNeeded is initialized in gcControllerState.startCycle.
I won't paste the formula here; it lives in gcControllerState.startCycle. Roughly speaking, on an 8-core CPU dedicatedMarkWorkersNeeded comes out to 2, while on a 6-core CPU, since 6 is not divisible by 4, the computed dedicatedMarkWorkersNeeded is 1 and gcMarkWorkerFractionalMode is needed to make up the remaining CPU utilization.
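That computation can be sketched directly. The sketch below follows the shape of gcControllerState.startCycle (simplified; constant names are borrowed from the runtime):

```go
package main

import "fmt"

const gcBackgroundUtilization = 0.25 // target fraction of CPU spent marking

// markWorkerPlan mirrors the startCycle logic: round the utilization goal
// to a whole number of dedicated workers, and if the rounding error exceeds
// 30%, drop one dedicated worker and cover the remainder with a fractional
// worker instead.
func markWorkerPlan(gomaxprocs int) (dedicated int64, fractionalGoal float64) {
	totalGoal := float64(gomaxprocs) * gcBackgroundUtilization
	dedicated = int64(totalGoal + 0.5) // round to nearest whole worker
	utilError := float64(dedicated)/totalGoal - 1
	const maxUtilError = 0.3
	if utilError < -maxUtilError || utilError > maxUtilError {
		if float64(dedicated) > totalGoal {
			dedicated--
		}
		fractionalGoal = (totalGoal - float64(dedicated)) / float64(gomaxprocs)
	}
	return
}

func main() {
	d8, f8 := markWorkerPlan(8)
	d6, f6 := markWorkerPlan(6)
	fmt.Println(d8, f8 == 0) // 2 true: 8*0.25 divides evenly, no fractional worker
	fmt.Println(d6, f6 > 0)  // 1 true: the half-worker remainder runs fractionally
}
```

For 6 cores the goal is 1.5 workers; rounding up to 2 overshoots by 33%, so the runtime keeps 1 dedicated worker and schedules a fractional worker for about 8.3% of each P's time.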
gcMarkWorkerIdleMode is entered from the scheduler's findrunnable, just before a P would go to sleep:
func findrunnable() (gp *g, inheritTime bool) {
...
stop:
// If we are in the GC mark phase and mark work is available, run the mark worker G
if gcBlackenEnabled != 0 && _p_.gcBgMarkWorker != 0 && gcMarkWorkAvailable(_p_) {
_p_.gcMarkWorkerMode = gcMarkWorkerIdleMode
gp := _p_.gcBgMarkWorker.ptr()
// Mark the local P's dedicated mark G as _Grunnable
casgstatus(gp, _Gwaiting, _Grunnable)
return gp, false
}
...
}

Readers of my article on the Go scheduling loop will recognize this spot: by the time preemptive scheduling reaches here, the P usually has no G left to run and is about to go to sleep, so it is safe to run the mark task first.
If you haven't read about the scheduling loop yet, see: Detailed explanation of the Go scheduling loop source code, https://www.luozhiyun.com/archives/448.
Concurrent scanning and marking
Concurrent scan marking can be summarized in the following steps:

- Wrap the current P into a parkInfo and call gopark to put the current G to sleep; before sleeping, bind the G to the P's gcBgMarkWorker field so it can be woken later;
- Depending on the mark worker mode, call gcDrain with different flags to perform the marking;
- Check whether all background mark tasks are done and no work remains; if so, call gcMarkDone to enter the mark-completion phase;
Background mark: sleeping and waiting
func gcBgMarkWorker(_p_ *p) {
gp := getg()
type parkInfo struct {
m muintptr
attach puintptr
}
gp.m.preemptoff = "GC worker init"
// initialization park
park := new(parkInfo)
gp.m.preemptoff = ""
// Set current M And it's forbidden to seize
park.m.set(acquirem())
// Set current P
park.attach.set(_p_)
// Notify gcBgMarkStartWorkers that it can continue
notewakeup(&work.bgMarkReady)
for {
// Let the current G Go to sleep
gopark(func(g *g, parkp unsafe.Pointer) bool {
park := (*parkInfo)(parkp)
releasem(park.m.ptr())
// Set the associated P
if park.attach != 0 {
p := park.attach.ptr()
park.attach.set(nil)
// Install the current G as the P's gcBgMarkWorker
if !p.gcBgMarkWorker.cas(0, guintptr(unsafe.Pointer(g))) {
return false
}
}
return true
}, unsafe.Pointer(park), waitReasonGCWorkerIdle, traceEvGoBlock, 0)
...
}
}

As we saw in gcBgMarkStartWorkers, every P gets a Goroutine responsible for mark work, but not every P actually runs its mark task at once: the background mark tasks' default CPU budget is 25%. So gcBgMarkWorker initializes park, binds the G to the P's gcBgMarkWorker field, and goes to sleep.
Which mark worker actually gets to run is controlled by the scheduler's loop runtime.schedule through gcController.findRunnableGCWorker; that code was shown above, so I won't repeat it here.
Background marking
After being woken, the worker selects a marking strategy based on gcMarkWorkerMode; each strategy calls runtime.gcDrain with different flags:
func gcBgMarkWorker(_p_ *p) {
gp := getg()
...
for {
...
// Check that the P's gcBgMarkWorker is still this G; end the task if not
if _p_.gcBgMarkWorker.ptr() != gp {
break
}
// prohibit G Be seized
park.m.set(acquirem())
// Record the start time
startTime := nanotime()
_p_.gcMarkWorkerStartTime = startTime
decnwait := atomic.Xadd(&work.nwait, -1)
systemstack(func() {
// Set up G Is waiting so that its stack can be scanned
casgstatus(gp, _Grunning, _Gwaiting)
// Determine the mode of background marking task
switch _p_.gcMarkWorkerMode {
default:
throw("gcBgMarkWorker: unexpected gcMarkWorkerMode")
case gcMarkWorkerDedicatedMode:
// In this mode P You should focus on the execution of tags
gcDrain(&_p_.gcw, gcDrainUntilPreempt|gcDrainFlushBgCredit)
if gp.preempt {
// When preempted, all of the G All kicked to the global run queue
lock(&sched.lock)
for {
gp, _ := runqget(_p_)
if gp == nil {
break
}
globrunqput(gp)
}
unlock(&sched.lock)
}
// Continue with tag execution
gcDrain(&_p_.gcw, gcDrainFlushBgCredit)
case gcMarkWorkerFractionalMode:
// Execution tag
gcDrain(&_p_.gcw, gcDrainFractional|gcDrainUntilPreempt|gcDrainFlushBgCredit)
case gcMarkWorkerIdleMode:
// Execution tag , Until it is preempted or reaches a certain amount
gcDrain(&_p_.gcw, gcDrainIdle|gcDrainUntilPreempt|gcDrainFlushBgCredit)
}
// recovery G State to run
casgstatus(gp, _Gwaiting, _Grunning)
})
...
}
}

The differences between the mark worker modes were covered above; scroll back up if you need a refresher. The marking itself happens inside the switch statement, where each mode passes a different set of flags into gcDrain.
Note that one of the arguments passed to gcDrain is a gcWork struct. It serves as each P's private cache of objects waiting to be scanned, giving the garbage collector a produce/consume abstraction for mark work. The struct holds two important work buffers, wbuf1 and wbuf2:
When objects are pushed to or popped from the struct, the wbuf1 buffer is always tried first; once wbuf1 runs out of space (or out of objects), the two buffers are swapped, and when both buffers are full (or both empty), objects are spilled to, or fetched from, the global work buffer:
func (w *gcWork) tryGet() uintptr {
wbuf := w.wbuf1
...
// wbuf1 When there is no data in the buffer
if wbuf.nobj == 0 {
// wbuf1 And wbuf2 Exchange objects
w.wbuf1, w.wbuf2 = w.wbuf2, w.wbuf1
wbuf = w.wbuf1
if wbuf.nobj == 0 {
owbuf := wbuf
// Fetch a buffer from the global work.full queue
wbuf = trygetfull()
...
}
}
wbuf.nobj--
return wbuf.obj[wbuf.nobj]
}

Back in gcBgMarkWorker: after the marking finishes, the worker wraps up the round:
func gcBgMarkWorker(_p_ *p) {
gp := getg()
...
for {
...
// Add up the time taken
duration := nanotime() - startTime
switch _p_.gcMarkWorkerMode {
case gcMarkWorkerDedicatedMode:
atomic.Xaddint64(&gcController.dedicatedMarkTime, duration)
atomic.Xaddint64(&gcController.dedicatedMarkWorkersNeeded, 1)
case gcMarkWorkerFractionalMode:
atomic.Xaddint64(&gcController.fractionalMarkTime, duration)
atomic.Xaddint64(&_p_.gcFractionalMarkTime, duration)
case gcMarkWorkerIdleMode:
atomic.Xaddint64(&gcController.idleMarkTime, duration)
}
incnwait := atomic.Xadd(&work.nwait, +1)
// Determine whether all background marking tasks have been completed , And there are no more tasks
if incnwait == work.nproc && !gcMarkWorkAvailable(nil) {
// Cancel and P The associated
_p_.gcBgMarkWorker.set(nil)
// allow G Be seized
releasem(park.m.ptr())
// Ready to enter the completion marking phase
gcMarkDone()
// Re-attach to a P before sleeping again.
// Preemption was allowed above, so by this point we may be on a different P.
// If re-attaching fails, the worker task ends.
park.m.set(acquirem())
park.attach.set(_p_)
}
}
}

gcBgMarkWorker uses incnwait to determine whether it is the last running worker, then calls gcMarkWorkAvailable to verify that neither its gcWork nor the global queue has tasks left; if everything is done, it calls gcMarkDone to enter the mark-completion phase.
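The incnwait == work.nproc test is the classic "last worker out turns off the lights" pattern. A deterministic toy of it (simplified; the runtime additionally checks gcMarkWorkAvailable here and revalidates everything inside gcMarkDone):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// lastWorkerOut simulates nproc workers that all start marking (decrementing
// nwait) and then finish one by one (incrementing it). It returns how many
// workers observed nwait == nproc on finish, i.e. believed they were last.
func lastWorkerOut(nproc int32) int {
	nwait := nproc // initially every worker counts as waiting
	for i := int32(0); i < nproc; i++ {
		atomic.AddInt32(&nwait, -1) // worker starts marking
	}
	last := 0
	for i := int32(0); i < nproc; i++ {
		if atomic.AddInt32(&nwait, 1) == nproc { // worker finishes
			last++ // only the final increment restores nwait to nproc
		}
	}
	return last
}

func main() {
	fmt.Println(lastWorkerOut(4)) // 1: exactly one worker triggers gcMarkDone
}
```

Because AddInt32 returns the post-increment value atomically, exactly one finishing worker can ever see the count reach nproc, no matter how the finishes interleave.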
Mark scan
Let's take a look at gcDrain:
func gcDrain(gcw *gcWork, flags gcDrainFlags) {
gp := getg().m.curg
// Whether to return when the preemption flag is seen
preemptible := flags&gcDrainUntilPreempt != 0
// Whether to flush scan credit to the background pool, reducing assist work and waking assist Gs
flushBgCredit := flags&gcDrainFlushBgCredit != 0
// Whether to perform only a certain amount of work
idle := flags&gcDrainIdle != 0
// Record the initial number of scans
initScanWork := gcw.scanWork
checkWork := int64(1<<63 - 1)
var check func() bool
if flags&(gcDrainIdle|gcDrainFractional) != 0 {
// drainCheckThreshold Default 100000
checkWork = initScanWork + drainCheckThreshold
if idle {
check = pollWork
} else if flags&gcDrainFractional != 0 {
check = pollFractionalWorkerExit
}
}
// If the root object is not finished scanning , Scan the root object first
if work.markrootNext < work.markrootJobs {
// It circulates until it is preempted or STW
for !(gp.preempt && (preemptible || atomic.Load(&sched.gcwaiting) != 0)) {
// Take a value from the root object scan queue
job := atomic.Xadd(&work.markrootNext, +1) - 1
if job >= work.markrootJobs {
break
}
// Perform root object scanning
markroot(gcw, job)
if check != nil && check() {
goto done
}
}
}
...
}

At the start, gcDrain picks a strategy based on its flags:
- gcDrainUntilPreempt: return when the G is preempted;
- gcDrainIdle: call runtime.pollWork and return when the P has other pending Gs;
- gcDrainFractional: call runtime.pollFractionalWorkerExit and return when the worker's CPU usage exceeds 1.2 times fractionalUtilizationGoal;
After setting up the check variable, gcDrain runs runtime.markroot to scan the root objects; after each root job it calls check to decide whether the marking task should stop, and if so it jumps to the done label and exits.
Once the root jobs are finished, gcDrain consumes the mark queue:
func gcDrain(gcw *gcWork, flags gcDrainFlags) {
...
// Root objects are now in the mark queue; consume the queue
// It circulates until it is preempted or STW
for !(gp.preempt && (preemptible || atomic.Load(&sched.gcwaiting) != 0)) {
// Put part of the local work back on the global queue
if work.full == 0 {
gcw.balance()
}
// Access to task
b := gcw.tryGetFast()
if b == 0 {
b = gcw.tryGet()
if b == 0 {
wbBufFlush(nil, 0)
b = gcw.tryGet()
}
}
// Can't get the object , Tag queue is empty , Out of the loop
if b == 0 {
break
}
// Scan the acquired object
scanobject(b, gcw)
// If you've scanned a certain number of objects ,gcCreditSlack The value is 2000
if gcw.scanWork >= gcCreditSlack {
// Flush the locally accumulated scan work to the global counter
atomic.Xaddint64(&gcController.scanWork, gcw.scanWork)
if flushBgCredit {
// Recording the number of bytes of memory scanned is used to reduce the workload of auxiliary marking
gcFlushBgCredit(gcw.scanWork - initScanWork)
initScanWork = 0
}
checkWork -= gcw.scanWork
gcw.scanWork = 0
if checkWork <= 0 {
checkWork += drainCheckThreshold
if check != nil && check() {
break
}
}
}
}
done:
// Flush the locally accumulated scan work to the global counter
if gcw.scanWork > 0 {
atomic.Xaddint64(&gcController.scanWork, gcw.scanWork)
if flushBgCredit {
// Recording the number of bytes of memory scanned is used to reduce the workload of auxiliary marking
gcFlushBgCredit(gcw.scanWork - initScanWork)
}
gcw.scanWork = 0
}
}

Before fetching from the local cache, runtime.gcWork.balance is called to push part of the gcWork cache back to the global queue; this is mainly used to balance load across different Ps.
A task is then fetched from the gcWork cache and handed to scanobject, which scans from the given address and greys the live objects it finds. runtime.gcFlushBgCredit records the number of bytes scanned, which reduces the workload of the assist markers.
To summarize how objects move in and out of gcWork: dequeuing happens in the scanobject path above, which pops cached objects from gcWork and scans them; at the same time, any live object discovered during the scan is enqueued back into gcWork.
Besides scanobject, the write barrier, root scanning, and stack scanning all add extra grey objects to gcWork to await processing.
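That enqueue/dequeue flow is tricolor marking over a grey worklist. A self-contained toy version (hypothetical object type, not the runtime's representation; the marked flag plays the role of the mark bit, and membership in the worklist is what makes an object grey):

```go
package main

import "fmt"

// obj is a toy heap object; children stand in for its pointer fields.
type obj struct {
	name     string
	children []*obj
	marked   bool
}

// drain consumes the grey worklist the way gcDrain consumes gcWork:
// pop an object, scan it, and grey any unmarked children it references.
func drain(grey []*obj) {
	for len(grey) > 0 {
		o := grey[len(grey)-1]
		grey = grey[:len(grey)-1]
		for _, c := range o.children {
			if !c.marked {
				c.marked = true // shade the child grey
				grey = append(grey, c)
			}
		}
	}
}

func main() {
	g := &obj{name: "G"}
	e := &obj{name: "E", children: []*obj{g}}
	d := &obj{name: "D", children: []*obj{e}}
	garbage := &obj{name: "X"} // referenced by no root

	root := d
	root.marked = true // roots are marked up front, as in markroot
	drain([]*obj{root})

	fmt.Println(d.marked, e.marked, g.marked, garbage.marked) // true true true false
}
```

Anything still unmarked (white) after the drain, like the garbage object here, is what the sweep phase reclaims.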
Root marking
func markroot(gcw *gcWork, i uint32) {
baseFlushCache := uint32(fixedRootCount)
baseData := baseFlushCache + uint32(work.nFlushCacheRoots)
baseBSS := baseData + uint32(work.nDataRoots)
baseSpans := baseBSS + uint32(work.nBSSRoots)
baseStacks := baseSpans + uint32(work.nSpanRoots)
end := baseStacks + uint32(work.nStackRoots)
switch {
// Flush all spans held in mcaches; requires STW
case baseFlushCache <= i && i < baseData:
flushmcache(int(i - baseFlushCache))
// Scan the .data segment (initialized read-write globals)
case baseData <= i && i < baseBSS:
for _, datap := range activeModules() {
markrootBlock(datap.data, datap.edata-datap.data, datap.gcdatamask.bytedata, gcw, int(i-baseData))
}
// Scan the .bss segment (uninitialized globals)
case baseBSS <= i && i < baseSpans:
for _, datap := range activeModules() {
markrootBlock(datap.bss, datap.ebss-datap.bss, datap.gcbssmask.bytedata, gcw, int(i-baseBSS))
}
// scanning finalizers queue
case i == fixedRootFinalizers:
for fb := allfin; fb != nil; fb = fb.alllink {
cnt := uintptr(atomic.Load(&fb.cnt))
scanblock(uintptr(unsafe.Pointer(&fb.fin[0])), cnt*unsafe.Sizeof(fb.fin[0]), &finptrmask[0], gcw, nil)
}
// Free the stacks of dead Gs
case i == fixedRootFreeGStacks:
systemstack(markrootFreeGStacks)
// scanning MSpan.specials
case baseSpans <= i && i < baseStacks:
markrootSpans(gcw, int(i-baseSpans))
// Scan each one G The stack
default:
// Get the G
var gp *g
if baseStacks <= i && i < end {
gp = allgs[i-baseStacks]
} else {
throw("markroot: bad index")
}
// Record the time of waiting to start
status := readgstatus(gp) // We are not in a scan state
if (status == _Gwaiting || status == _Gsyscall) && gp.waitsince == 0 {
gp.waitsince = work.tstart
}
// Handed over to g0 scan
systemstack(func() {
userG := getg().m.curg
selfScan := gp == userG && readgstatus(userG) == _Grunning
// If you scan your own , Then change your own g The state of
if selfScan {
casgstatus(userG, _Grunning, _Gwaiting)
userG.waitreason = waitReasonGarbageCollectionScan
}
// Suspend the G so it stops running
stopped := suspendG(gp)
if stopped.dead {
gp.gcscandone = true
return
}
if gp.gcscandone {
throw("g already scanned")
}
// scanning g The stack
scanstack(gp, gcw)
gp.gcscandone = true
resumeG(stopped)
if selfScan {
casgstatus(userG, _Gwaiting, _Grunning)
}
})
}
}

If the .data and .bss scans above look puzzling, Wikipedia's Data segment entry (https://en.wikipedia.org/wiki/Data_segment) clarifies:
The .data segment contains any global or static variables which have a pre-defined value and can be modified.
The BSS segment, also known as uninitialized data, is usually adjacent to the data segment.
The data segment holds global variables that have explicit initial values; the BSS segment holds uninitialized (zero-valued) data.
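The same distinction applies to Go package-level variables (a rough illustration; exact symbol placement is decided by the compiler and linker):

```go
package main

import "fmt"

// Has an explicit initializer baked into the binary, so it is expected to
// land in .data, which markroot scans via markrootBlock(datap.data, ...).
var limit = 1024

// Holds only the zero value, so it is expected to land in .bss,
// which markroot scans via markrootBlock(datap.bss, ...).
var buf [4096]byte

func main() {
	fmt.Println(limit, buf[0]) // 1024 0
}
```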
Object scanning
func scanobject(b uintptr, gcw *gcWork) {
// obtain b Of heapBits object
hbits := heapBitsForAddr(b)
// obtain span
s := spanOfUnchecked(b)
// span Corresponding object size
n := s.elemsize
if n == 0 {
throw("scanobject n == 0")
}
// Maximum scan per time 128KB
if n > maxObletBytes {
// Large object. Break into oblets for better
// parallelism and lower latency.
if b == s.base() {
if s.spanclass.noscan() {
// Bypass the whole scan.
gcw.bytesMarked += uint64(n)
return
}
// Split anything past the first 128KB into oblets and queue them back into gcWork to scan later
for oblet := b + maxObletBytes; oblet < s.base()+s.elemsize; oblet += maxObletBytes {
if !gcw.putFast(oblet) {
gcw.put(oblet)
}
}
}
n = s.base() + s.elemsize - b
if n > maxObletBytes {
n = maxObletBytes
}
}
var i uintptr
for i = 0; i < n; i += sys.PtrSize {
// Get the corresponding bit
// Find bits for this word.
if i != 0 {
hbits = hbits.next()
}
bits := hbits.bits()
// Check scan bit Determine whether to continue scanning
if i != 1*sys.PtrSize && bits&bitScan == 0 {
break // no more pointers in this object
}
// If it's not a pointer, continue
if bits&bitPointer == 0 {
continue // not a pointer
}
// Take the value of the pointer
obj := *(*uintptr)(unsafe.Pointer(b + i))
if obj != 0 && obj-b >= n {
// Search objects in heap according to address value
if obj, span, objIndex := findObject(obj, b, i); obj != 0 {
// call greyobject Tag the object and put it in the tag queue
greyobject(obj, b, i, span, gcw, objIndex)
}
}
}
gcw.bytesMarked += uint64(n)
gcw.scanWork += int64(i)
}

Assist marking: mutator assists
We mentioned mutator assists at the start of this analysis: their job is to keep the heap from growing too fast. If a running G allocates memory while GC is in progress, that G is asked to help with part of the marking work. The principle is simple and fair: however much memory you allocate, that much marking work you must complete.
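The debt/credit bookkeeping can be sketched on its own. The toy below (hypothetical type; the real code also converts between bytes and scan work via assistWorkPerByte/assistBytesPerWork) captures the flow of mallocgc into gcAssistAlloc:

```go
package main

import "fmt"

// assistState is a toy version of the per-G gcAssistBytes plus the global
// bgScanCredit pool: allocation debits the G, background scanning credits
// the pool, and a G in debt steals from the pool before marking itself.
type assistState struct {
	gcAssistBytes int64 // per-G credit; negative means "in debt"
	bgScanCredit  int64 // credit earned by background mark workers
}

// malloc debits size bytes and returns how many bytes of marking the G
// still has to do itself after stealing from the global pool.
func (a *assistState) malloc(size int64) (selfScanBytes int64) {
	a.gcAssistBytes -= size
	if a.gcAssistBytes >= 0 {
		return 0 // not in debt, no assist needed
	}
	debt := -a.gcAssistBytes
	stolen := a.bgScanCredit
	if stolen > debt {
		stolen = debt
	}
	a.bgScanCredit -= stolen
	a.gcAssistBytes += stolen
	if a.gcAssistBytes < 0 {
		// Remaining debt must be paid by scanning (gcAssistAlloc1 in the runtime).
		selfScanBytes = -a.gcAssistBytes
		a.gcAssistBytes = 0
	}
	return
}

func main() {
	a := &assistState{bgScanCredit: 600}
	fmt.Println(a.malloc(500)) // 0: debt fully covered by global credit
	fmt.Println(a.malloc(300)) // 200: only 100 left in the pool, scan the rest
}
```

A G whose debt cannot be covered even by its own scanning sleeps in the assist queue, which is the gcParkAssist path shown below.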
The entry point for mutator assists is the mallocgc function in go/src/runtime/malloc.go:
func mallocgc(size uintptr, typ *_type, needzero bool) unsafe.Pointer {
...
var assistG *g
if gcBlackenEnabled != 0 {
assistG = getg()
if assistG.m.curg != nil {
assistG = assistG.m.curg
}
// Debit the allocation size from the G's assist credit
assistG.gcAssistBytes -= int64(size)
if assistG.gcAssistBytes < 0 {
// This G is in debt.
gcAssistAlloc(assistG)
}
}
...
return x
}

On every allocation, mallocgc checks whether the gcAssistBytes field has gone negative; the field stores how many bytes of assist-marking credit the current Goroutine has left. If it is negative, the G is in debt and calls gcAssistAlloc to take credit from the global pool bgScanCredit:
func gcAssistAlloc(gp *g) {
...
retry:
// Count the number of tagging tasks that need to be done
debtBytes := -gp.gcAssistBytes
scanWork := int64(gcController.assistWorkPerByte * float64(debtBytes))
if scanWork < gcOverAssistWork {
scanWork = gcOverAssistWork
debtBytes = int64(gcController.assistBytesPerWork * float64(scanWork))
}
// Load the global background scan credit
bgScanCredit := atomic.Loadint64(&gcController.bgScanCredit)
stolen := int64(0)
if bgScanCredit > 0 {
if bgScanCredit < scanWork {
stolen = bgScanCredit
gp.gcAssistBytes += 1 + int64(gcController.assistBytesPerWork*float64(stolen))
} else {
stolen = scanWork
gp.gcAssistBytes += debtBytes
}
// Deduct the stolen amount from the global credit
atomic.Xaddint64(&gcController.bgScanCredit, -stolen)
scanWork -= stolen
// scanWork reaching 0 means bgScanCredit fully covered the debt
if scanWork == 0 {
return
}
}
// Reaching here means bgScanCredit was smaller than scanWork;
// the G must complete the remaining marking itself (gcAssistAlloc1 / gcDrainN) and return
systemstack(func() {
// Perform the tagging task
gcAssistAlloc1(gp, scanWork)
})
completed := gp.param != nil
gp.param = nil
if completed {
gcMarkDone()
}
if gp.gcAssistBytes < 0 {
if gp.preempt {
Gosched()
goto retry
}
// If credit is still insufficient, put the current Goroutine to sleep
// in the global assist queue until a background mark task wakes it
if !gcParkAssist() {
goto retry
}
}
}

If the global credit is still insufficient, the current Goroutine goes to sleep, joining the global assist queue to wait until a background mark task wakes it.
gcFlushBgCredit, called as memory is scanned, is responsible for waking the assist Goroutines:
func gcFlushBgCredit(scanWork int64) {
// No Goroutine is waiting in the assist queue
if work.assistQueue.q.empty() {
// The current credit will be added directly to the global credit bgScanCredit
atomic.Xaddint64(&gcController.bgScanCredit, scanWork)
return
}
scanBytes := int64(float64(scanWork) * gcController.assistBytesPerWork)
lock(&work.assistQueue.lock)
// While the assist queue is not empty, pay off sleeping Gs
for !work.assistQueue.q.empty() && scanBytes > 0 {
gp := work.assistQueue.q.pop()
// Wake up the Goroutine
if scanBytes+gp.gcAssistBytes >= 0 {
scanBytes += gp.gcAssistBytes
gp.gcAssistBytes = 0
ready(gp, 0, false)
} else {
gp.gcAssistBytes += scanBytes
scanBytes = 0
work.assistQueue.q.pushBack(gp)
break
}
}
// Any leftover credit is added to the global pool
if scanBytes > 0 {
scanWork = int64(float64(scanBytes) * gcController.assistWorkPerByte)
atomic.Xaddint64(&gcController.bgScanCredit, scanWork)
}
unlock(&work.assistQueue.lock)
}

gcFlushBgCredit pops sleeping Goroutines from the assist queue; if the newly earned credit covers a Goroutine's debt, that Goroutine is woken. Whatever credit is left over is added to the global pool.
Taken together, the mechanism works like this: allocation debits each G's credit, background scanning earns global credit, and a G in debt first steals from the global pool, then marks on its own behalf, and finally sleeps in the assist queue until woken.
Finish marking
As we saw in the gcBgMarkWorker analysis above, once marking completes the worker calls gcMarkDone to carry out the mark-completion work.
func gcMarkDone() {
semacquire(&work.markDoneSema)
top:
// Check again if the task is finished
if !(gcphase == _GCmark && work.nwait == work.nproc && !gcMarkWorkAvailable(nil)) {
semrelease(&work.markDoneSema)
return
}
semacquire(&worldsema)
gcMarkDoneFlushed = 0
systemstack(func() {
gp := getg().m.curg
casgstatus(gp, _Grunning, _Gwaiting)
// Traverse all P
forEachP(func(_p_ *p) {
// Flush the P's write barrier buffer into its gcWork
wbBufFlush1(_p_)
// take gcWork To join the global queue
_p_.gcw.dispose()
// Indicates this gcWork flushed work to the global queue
if _p_.gcw.flushedWork {
atomic.Xadd(&gcMarkDoneFlushed, 1)
_p_.gcw.flushedWork = false
} else if debugCachedWork {
...
}
...
})
casgstatus(gp, _Gwaiting, _Grunning)
})
if gcMarkDoneFlushed != 0 {
if debugCachedWork {
// Release paused gcWorks.
atomic.Xadd(&gcWorkPauseGen, 1)
}
semrelease(&worldsema)
goto top
}
// Record the start time of the completion marking phase and STW Start time
now := nanotime()
work.tMarkTerm = now
work.pauseStart = now
// prohibit G Be seized
getg().m.preemptoff = "gcing"
// STW
systemstack(stopTheWorldWithSema)
...
// Disable assist GC and the background mark workers
atomic.Store(&gcBlackenEnabled, 0)
// Wake all Gs sleeping because of GC assists
gcWakeAllAssists()
semrelease(&work.markDoneSema)
schedEnableUser(true)
// Calculate the next trigger gc Needed heap size
nextTriggerRatio := gcController.endCycle()
// Execution marks terminate
gcMarkTermination(nextTriggerRatio)
}

gcMarkDone calls forEachP to iterate over all Ps, moving the tasks in each P's gcWork to the global queue; each gcWork that still had work to flush increments gcMarkDoneFlushed. After visiting every P, if gcMarkDoneFlushed is non-zero it jumps back to the top label and loops again, until no local queue has tasks left.
Next, gcBlackenEnabled is set to 0, which turns off assist marking and background marking; the blocked assist Goroutines are woken; and schedEnableUser re-enables scheduling of user Goroutines. Note that we are inside STW at this point, so the woken Goroutines do not run immediately; they wait until STW ends.
Finally, gcMarkTermination is called to carry out mark termination.
Mark termination
func gcMarkTermination(nextTriggerRatio float64) {
// Disable assist GC and the background mark workers
atomic.Store(&gcBlackenEnabled, 0)
// take GC The phase switches to _GCmarktermination
setGCPhase(_GCmarktermination)
work.heap1 = memstats.heap_live
// Record the start time
startTime := nanotime()
mp := acquirem()
mp.preemptoff = "gcing"
_g_ := getg()
_g_.m.traceback = 2
gp := _g_.m.curg
// Set up G The state is waiting
casgstatus(gp, _Grunning, _Gwaiting)
gp.waitreason = waitReasonGarbageCollection
// Switch to g0 function
systemstack(func() {
// Run the final mark check under STW
gcMark(startTime)
})
systemstack(func() {
work.heap2 = work.bytesMarked
...
// Set up current GC Stage to close , And disable the write barrier
setGCPhase(_GCoff)
// Wake up the background cleaning task
gcSweep(work.mode)
})
_g_.m.traceback = 0
casgstatus(gp, _Gwaiting, _Grunning)
// Statistics and reset cleaning status related codes
...
// Bump the GC count, then wake the Gs waiting for sweep
lock(&work.sweepWaiters.lock)
memstats.numgc++
injectglist(&work.sweepWaiters.list)
unlock(&work.sweepWaiters.lock)
// Performance statistics
mProf_NextCycle()
// again start The World
systemstack(func() { startTheWorldWithSema(true) })
// Performance statistics
mProf_Flush()
// Move the mark queue's workbufs to the free list so they can be recycled
prepareFreeWorkbufs()
// Release unused stacks
systemstack(freeStackSpans)
// Make sure that each P Of mcache All be flush
systemstack(func() {
forEachP(func(_p_ *p) {
_p_.mcache.prepareForSweep()
})
})
...
semrelease(&worldsema)
semrelease(&gcsema)
releasem(mp)
mp = nil
}

gcMarkTermination mostly does confirmation and bookkeeping. On entry it sets the GC phase to _GCmarktermination, then calls gcMark to confirm that all marking really is finished. It then sets the phase to _GCoff and calls gcSweep to start sweeping. What follows (elided) is statistics code: memory in use, GC pause times, CPU utilization, and so on. Finally come the confirmation steps: making sure every P's mcache is flushed, unused stacks are freed, workbufs are moved to the free list for recycling, and so on.
Background sweeping
func gcSweep(mode gcMode) {
if gcphase != _GCoff {
throw("gcSweep being done but phase is not GCoff")
}
lock(&mheap_.lock)
mheap_.sweepgen += 2
// Reset the sweep-done flag
mheap_.sweepdone = 0
...
mheap_.pagesSwept = 0
mheap_.sweepArenas = mheap_.allArenas
mheap_.reclaimIndex = 0
mheap_.reclaimCredit = 0
unlock(&mheap_.lock)
if go115NewMCentralImpl {
sweep.centralIndex.clear()
}
// If it's not parallel GC
if !_ConcurrentSweep || mode == gcForceBlockMode {
...
return
}
// Wake the background sweep task
lock(&sweep.lock)
if sweep.parked {
sweep.parked = false
ready(sweep.g, 0, true)
}
unlock(&sweep.lock)
}

gcSweep's main job is to reset the sweep-phase state and then wake the sweeper goroutine. That background sweeper is set up by bgsweep, which is started from gcenable while the main goroutine initializes:
gcenable
func gcenable() {
// Kick off sweeping and scavenging.
c := make(chan int, 2)
// Set up asynchronous sweep
go bgsweep(c)
<-c
}

bgsweep
func bgsweep(c chan int) {
// Register the sweeper goroutine
sweep.g = getg()
// Park and wait to be woken
lockInit(&sweep.lock, lockRankSweep)
lock(&sweep.lock)
sweep.parked = true
c <- 1
goparkunlock(&sweep.lock, waitReasonGCSweepWait, traceEvGoBlock, 1)
// The sweep loop
for {
// Sweep one span, then yield to the scheduler
for sweepone() != ^uintptr(0) {
sweep.nbgsweep++
Gosched()
}
// Free some unused mark-queue buffers back to the heap
for freeSomeWbufs(true) {
Gosched()
}
lock(&sweep.lock)
// Check whether the sweepdone flag is still 0;
// if sweeping is not finished, keep looping
if !isSweepDone() {
unlock(&sweep.lock)
continue
}
// Otherwise park the background sweep task
sweep.parked = true
goparkunlock(&sweep.lock, waitReasonGCSweepWait, traceEvGoBlock, 1)
}
}

The actual sweeping in bgsweep is done by sweepone, which looks in the heap for a span that needs sweeping. It returns the number of pages it released back to the heap, or ^uintptr(0) if there was nothing left to sweep:
func sweepone() uintptr {
_g_ := getg()
sweepRatio := mheap_.sweepPagesPerByte // For debugging
_g_.m.locks++
// Check whether sweeping is already complete
if atomic.Load(&mheap_.sweepdone) != 0 {
_g_.m.locks--
return ^uintptr(0)
}
atomic.Xadd(&mheap_.sweepers, +1)
// Find a span that needs sweeping
var s *mspan
sg := mheap_.sweepgen
for {
if go115NewMCentralImpl {
s = mheap_.nextSpanForSweep()
} else {
s = mheap_.sweepSpans[1-sg/2%2].pop()
}
if s == nil {
atomic.Store(&mheap_.sweepdone, 1)
break
}
if state := s.state.get(); state != mSpanInUse {
continue
}
// A span whose sweepgen equals mheap.sweepgen - 2 needs sweeping; claim it with a CAS
if s.sweepgen == sg-2 && atomic.Cas(&s.sweepgen, sg-2, sg-1) {
break
}
}
// Sweep the span
npages := ^uintptr(0)
if s != nil {
npages = s.npages
// Reclaiming memory
if s.sweep(false) {
atomic.Xadduintptr(&mheap_.reclaimCredit, npages)
} else {
npages = 0
}
}
_g_.m.locks--
return npages
}

When looking for a span, sweepone uses the state field and checks whether sweepgen equals mheap.sweepgen - 2 to decide whether the span needs sweeping. The span is ultimately swept by mspan.sweep.
Let's take a quick look at the implementation of sweep:
func (s *mspan) sweep(preserve bool) bool {
if !go115NewMCentralImpl {
return s.oldSweep(preserve)
}
_g_ := getg()
sweepgen := mheap_.sweepgen
// Account for the number of pages swept
atomic.Xadd64(&mheap_.pagesSwept, int64(s.npages))
spc := s.spanclass
size := s.elemsize
c := _g_.m.p.ptr().mcache
...
// Compute the number of objects freed
nalloc := uint16(s.countAlloc())
nfreed := s.allocCount - nalloc
s.allocCount = nalloc
s.freeindex = 0 // reset allocation index to start of span.
s.allocBits = s.gcmarkBits
s.gcmarkBits = newMarkBits(s.nelems)
s.refillAllocCache(0)
// Set span.sweepgen equal to mheap.sweepgen
atomic.Store(&s.sweepgen, sweepgen)
if spc.sizeclass() != 0 {
// Handle spans for small objects.
if nfreed > 0 {
s.needzero = 1
c.local_nsmallfree[spc.sizeclass()] += uintptr(nfreed)
}
if !preserve {
if nalloc == 0 {
// Release the span directly back to the heap
mheap_.freeSpan(s)
return true
}
// Release the span back to mcentral
if uintptr(nalloc) == s.nelems {
mheap_.central[spc].mcentral.fullSwept(sweepgen).push(s)
} else {
mheap_.central[spc].mcentral.partialSwept(sweepgen).push(s)
}
}
} else if !preserve {
// Handle the reclamation of large objects
if nfreed != 0 {
// Free large object span to heap.
if debug.efence > 0 {
s.limit = 0 // prevent mlookup from finding this span
sysFault(unsafe.Pointer(s.base()), size)
} else {
// Release the span directly back to the heap
mheap_.freeSpan(s)
}
c.local_nlargefree++
c.local_largefree += size
return true
}
// Add a large span directly onto the full+swept list.
mheap_.central[spc].mcentral.fullSwept(sweepgen).push(s)
}
return false
}

mspan.sweep counts the surviving objects, swaps gcmarkBits into allocBits so that unmarked slots become allocatable again, and then returns the span either to the heap (if it is completely free) or to the appropriate mcentral list.

Summary
When I used Java I had only a superficial understanding of the mark-sweep algorithm; going deep into Go's tri-color algorithm for the first time has taught me a great deal. This article took a long time to write because GC is closely tied to memory allocation and goroutine scheduling, so you need to understand those two topics first in order to understand how the GC works.

Limited by the time I have for work and study, I did not explain the root-marking code very clearly; if you are interested, you can study it yourself starting from gcBgMarkWorker.
Reference
Garbage Collection In Go https://www.ardanlabs.com/blog/2018/12/garbage-collection-in-go-part1-semantics.html
Visualizing Garbage Collection Algorithms https://spin.atomicobject.com/2014/09/03/visualizing-garbage-collection-algorithms/
Proposal: Eliminate STW stack re-scanning https://go.googlesource.com/proposal/+/master/design/17503-eliminate-rescan.md
Uniprocessor Garbage Collection Techniques https://www.cs.cmu.edu/~fp/courses/15411-f14/misc/wilson94-gc.pdf
https://golang.design/under-the-hood/zh-cn/part2runtime/ch08gc/barrier/
golang-garbage-collector https://draveness.me/golang/docs/part3-runtime/ch07-memory/golang-garbage-collector
Golang Source Code Exploration (3): GC Implementation Principles https://www.cnblogs.com/zkweb/p/7880099.html
Real-time garbage collection on general-purpose machines http://www.yuasa.kuis.kyoto-u.ac.jp/~yuasa/YuasasSnapshot.pdf
On-the-Fly Garbage Collection: An Exercise in Cooperation https://lamport.azurewebsites.net/pubs/garbage.pdf
Write Barrier Elision for Concurrent Garbage Collectors https://www.semanticscholar.org/paper/Write-barrier-elision-for-concurrent-garbage-Vechev-Bacon/c046c3388ee506028f0e68f8636b3cd22420ddc0
Garbage-First Garbage Collection http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.63.6386&rep=rep1&type=pdf
https://github.com/golang-design/under-the-hood/issues/20
garbage-collection https://github.com/ardanlabs/gotraining/tree/master/reading#garbage-collection