
MIT6.824-lab2D-2022 (a detailed explanation of log compaction)

2022-06-10 10:51:00 Xingping XP


Preface

Because I was busy with final exams, and because 2D required refactoring most of my code, this post comes quite a while after the previous one.

1. Background of Log Compaction

First, we need to understand why log compaction is needed at all. Let's look at the description in the paper:

Raft’s log grows during normal operation to incorporate more client requests, but in a practical system, it cannot grow without bound. As the log grows longer, it occupies more space and takes more time to replay. This will eventually cause availability problems without some mechanism to discard obsolete information that has accumulated in the log.

In other words: Raft's log grows during normal operation to accommodate more client requests, but in a real system it cannot grow indefinitely. As the log gets longer, it takes up more space and takes more time to replay. Without some mechanism to discard the stale information accumulated in the log, this will eventually lead to availability problems.
So we use a Snapshot to implement log compaction in a simple way.

2. Understanding Snapshots

2.1 How Snapshots Work

Figure 12 in the paper illustrates the role of snapshots well:
(Figure 12 from the paper: a log before and after snapshotting)

The picture is easy to understand; let me explain it briefly. Suppose the log stores updates to two variables, x and y. The updates to x are, in order, 3, 2, 0, 5; the updates to y are 1, 9, 7. The entries at log indices 1~5 have been committed, which means the node no longer needs them in the log. So we take the final state produced by that prefix, namely x=0, y=9, store it as the snapshot, and record the last log index covered by the snapshot (the last included index) along with its term. The new log then only needs the uncommitted entries 6 and 7, and its length shrinks from 7 to 2.
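To make the example concrete, here is a toy Go sketch (not lab code) that replays the committed prefix and produces exactly the snapshot state. The interleaving of the x and y updates below is my own assumption; only the per-variable order is given above.

```go
package main

import "fmt"

type op struct {
	key   string
	value int
}

// replayPrefix applies the first lastIncludedIndex entries of the log and
// returns the resulting state, i.e. exactly what the snapshot stores.
func replayPrefix(log []op, lastIncludedIndex int) map[string]int {
	state := map[string]int{}
	for i := 0; i < lastIncludedIndex; i++ {
		state[log[i].key] = log[i].value
	}
	return state
}

func main() {
	// The log from the example: indices 1..7, of which 1..5 are committed.
	log := []op{{"x", 3}, {"y", 1}, {"y", 9}, {"x", 2}, {"x", 0}, {"x", 5}, {"y", 7}}
	snap := replayPrefix(log, 5)
	fmt.Println(snap["x"], snap["y"]) // the snapshot stores x=0, y=9
}
```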

So snapshots are taken per node: every Raft node maintains its own snapshot, and the snapshot is equivalent to the committed portion of its log.

2.2 How Snapshots Relate to AppendEntries

Going back to log replication in Raft, the commit process works like this: the leader sends AppendEntries to each node to synchronize logs, the nodes reply with the RPC result, and the leader updates matchIndex. Once more than half of the nodes have successfully replicated an entry, the leader sets commitIndex to the highest such matchIndex, and the entry is then applied by the ticker. The next heartbeat propagates the new commitIndex to the followers.
It is therefore possible that some nodes (behind a network partition, or crashed) have not caught up, while the leader has already committed the corresponding entries and discarded them through its own snapshot. In that case the leader must send its snapshot to those followers and have them install it.
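The majority-commit rule described above can be sketched in a few lines. This is a simplified illustration, not the lab's code; note that the real rule additionally requires that the entry at the new commitIndex belongs to the leader's current term (the paper's Figure 8 restriction).

```go
package main

import (
	"fmt"
	"sort"
)

// majorityMatch returns the highest index replicated on a majority of
// servers, given each server's matchIndex (with the leader's own last log
// index included in the slice). After sorting, the median position is
// guaranteed to be replicated on a majority.
func majorityMatch(matchIndex []int) int {
	sorted := append([]int(nil), matchIndex...)
	sort.Ints(sorted)
	return sorted[(len(sorted)-1)/2]
}

func main() {
	// 5 servers: three of them have replicated up to at least index 5,
	// so the leader may advance commitIndex to 5 (but not to 8).
	fmt.Println(majorityMatch([]int{8, 5, 5, 2, 8}))
}
```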

2.3 Snapshot Performance Trade-offs

First, the paper's description:

There are two more issues that impact snapshotting performance. First, servers must decide when to snapshot. If a server snapshots too often, it wastes disk bandwidth and energy; if it snapshots too infrequently, it risks exhausting its storage capacity, and it increases the time required to replay the log during restarts. One simple strategy is to take a snapshot when the log reaches a fixed size in bytes. If this size is set to be significantly larger than the expected size of a snapshot, then the disk bandwidth overhead for snapshotting will be small.

The second performance issue is that writing a snapshot can take a significant amount of time, and we do not want this to delay normal operations. The solution is to use copy-on-write techniques so that new updates can be accepted without impacting the snapshot being written. For example, state machines built with functional data structures naturally support this. Alternatively, the operating system’s copy-on-write support (e.g., fork on Linux) can be used to create an in-memory snapshot of the entire state machine (our implementation uses this approach).

So snapshotting raises two issues:

  • Snapshotting too often wastes disk bandwidth and energy, while snapshotting too rarely risks exhausting storage and lengthens log replay on restart. A reasonable snapshot policy is the answer: the strategy mentioned in the paper is to set a size threshold on the log and take a snapshot whenever the log exceeds it.
  • The other issue is the time spent writing a snapshot, since writing can contend with normal operations and hold locks. The paper's answer is the classic copy-on-write technique: new updates can be accepted while the snapshot is being written, for example via functional data structures or the operating system's copy-on-write support (fork on Linux).
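The paper's fixed-size policy can be sketched as a single predicate. The function and parameter names here are my own; in lab3 the threshold arrives as the maxraftstate argument and the current size comes from persister.RaftStateSize().

```go
package main

import "fmt"

// shouldSnapshot sketches the paper's simple policy: take a snapshot once
// the persisted Raft state reaches a fixed size in bytes. A threshold of
// -1 conventionally means snapshotting is disabled.
func shouldSnapshot(raftStateSize, maxRaftState int) bool {
	if maxRaftState < 0 {
		return false
	}
	return raftStateSize >= maxRaftState
}

func main() {
	fmt.Println(shouldSnapshot(900, 1000))  // below the threshold: keep going
	fmt.Println(shouldSnapshot(1200, 1000)) // over the threshold: snapshot now
	fmt.Println(shouldSnapshot(1200, -1))   // snapshotting disabled
}
```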

3. lab2D

3.1 The Interaction Flow

lab2D does not actually require you to solve those performance issues. Let's start with the interaction diagram provided in the official lab guide.
(Interaction diagram from the lab guide: the calls between the service, the Raft node, and the persister)

This picture is a good way to review the whole of lab2. I will give a general overview of the full flow later; here I only describe the snapshot part.
As the figure shows, on a single node the service layer calls two functions on the Raft node: Snapshot and CondInstallSnapshot. These are what we need to write in this lab. The Raft node must persist its snapshot information via the SaveStateAndSnapshot function, and the service reads that persisted snapshot back with ReadSnapshot. Both of these persistence functions are already provided by the lab in persist.go.
Between nodes, snapshots are transferred via the InstallSnapshot RPC, from whichever node is currently the leader.

3.2 Snapshot and CondInstallSnapshot

  • We need to implement these two functions ourselves in lab2; let's go over their role and implementation.
    First, the Snapshot function. The paper has a relevant passage about this approach:

This snapshotting approach departs from Raft’s strong leader principle, since followers can take snapshots without the knowledge of the leader. However, we think this departure is justified. While having a leader helps avoid conflicting decisions in reaching consensus, consensus has already been reached when snapshotting, so no decisions conflict. Data still only flows from leaders to followers, just followers can now reorganize their data.

Snapshot is called by the service on the Raft node, causing the node to update its own snapshot. One might think this violates Raft's strong leader principle, since followers can take snapshots without the leader's knowledge. But it is in fact reasonable: a snapshot only reorganizes data on which consensus has already been reached, so no decisions conflict. Data still flows only from leaders to followers; followers merely use snapshots to reduce their storage burden.

Now let's look at the code implementation:

// Snapshot the service says it has created a snapshot that has
// all info up to and including index. this means the
// service no longer needs the log through (and including)
// that index. Raft should now trim its log as much as possible.
//
// index is the index the snapshot has been applied through, and snapshot is
// the byte stream handed down by the service layer; it covers all data up to
// and including index.
// This function discards the log entries covered by the snapshot, records the
// snapshot's last included index/term, and persists everything. This is the
// peer's own proactive trimming and does not conflict with the leader
// sending snapshots.
func (rf *Raft) Snapshot(index int, snapshot []byte) {
	// Your code here (2D).
	if rf.killed() {
		return
	}

	rf.mu.Lock()
	defer rf.mu.Unlock()

	// A snapshot may not cover uncommitted entries, so refuse if index is
	// beyond commitIndex; and if our snapshot point is already at or past
	// index, there is nothing to do.
	if rf.lastIncludeIndex >= index || index > rf.commitIndex {
		return
	}

	// Rebuild the trimmed log: a dummy entry at position 0, then every
	// entry after index.
	sLogs := make([]LogEntry, 0)
	sLogs = append(sLogs, LogEntry{})
	for i := index + 1; i <= rf.getLastIndex(); i++ {
		sLogs = append(sLogs, rf.restoreLog(i))
	}

	// Update the snapshot's last included index/term.
	if index == rf.getLastIndex()+1 {
		rf.lastIncludeTerm = rf.getLastTerm()
	} else {
		rf.lastIncludeTerm = rf.restoreLogTerm(index)
	}

	rf.lastIncludeIndex = index
	rf.logs = sLogs

	// Everything covered by the snapshot counts as committed and applied.
	if index > rf.commitIndex {
		rf.commitIndex = index
	}
	if index > rf.lastApplied {
		rf.lastApplied = index
	}

	// Persist both the Raft state and the snapshot.
	rf.persister.SaveStateAndSnapshot(rf.persistData(), snapshot)
}

The specific explanations are in the code comments; what the function does is not hard to follow. It is worth mentioning that persister.SaveStateAndSnapshot() is also explained in the lab introduction: if the node is killed and crashes, this is the persisted data it needs after restarting.
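As an aside, persistData() is typically a labgob encoding of the Figure 2 persistent state plus the two snapshot fields added in 2D. Here is a minimal round-trip sketch using the standard library's encoding/gob, which labgob wraps; the struct and field names are illustrative, not the lab's exact ones.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// persistentState holds what a persistData()-style function would serialize:
// the Figure 2 persistent state plus the snapshot's last included index/term.
type persistentState struct {
	CurrentTerm      int
	VotedFor         int
	Logs             []string // simplified; the real code stores []LogEntry
	LastIncludeIndex int
	LastIncludeTerm  int
}

// encodeState serializes the state; gob encodes exported fields only, which
// is why labgob warns about lowercase field names.
func encodeState(s persistentState) []byte {
	w := new(bytes.Buffer)
	gob.NewEncoder(w).Encode(s)
	return w.Bytes()
}

// decodeState is the inverse, as run on restart after a crash.
func decodeState(data []byte) persistentState {
	var s persistentState
	gob.NewDecoder(bytes.NewBuffer(data)).Decode(&s)
	return s
}

func main() {
	in := persistentState{CurrentTerm: 4, VotedFor: 2,
		Logs: []string{"x<-5", "y<-7"}, LastIncludeIndex: 5, LastIncludeTerm: 3}
	out := decodeState(encodeState(in))
	fmt.Println(out.CurrentTerm, out.LastIncludeIndex, out.LastIncludeTerm)
}
```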

  • The CondInstallSnapshot function: combined with the explanation in the lab introduction, this one is also easy to understand. Its background is that after a snapshot is sent, it must be delivered on applyCh, while at the same time AppendEntries also delivers log entries there, which could conflict. The handout describes the goal as:

to avoid the requirement that snapshots and log entries sent on applyCh are coordinated.

In practice, as long as applying is properly synchronized with a mutex, the problem is avoided. That is why the lab notes this API is deprecated and discourages implementing it; simply returning true is enough.

// CondInstallSnapshot
// A service wants to switch to snapshot. Only do so if Raft hasn't
// have more recent info since it communicate the snapshot on applyChan.
func (rf *Raft) CondInstallSnapshot(lastIncludedTerm int, lastIncludedIndex int, snapshot []byte) bool {
	// Your code here (2D).
	return true
}

3.3 The InstallSnapshot RPC

This part implements the leader-side updating shown in the interaction diagram. As in lab2A and lab2B, the paper gives a specification table to implement from.
(Figure from the paper: the InstallSnapshot RPC specification table)
A brief word on when a snapshot needs to be sent. As mentioned before, the leader sends a snapshot when the log entries a follower needs have already been discarded. So the call site belongs in the log-replication path: check whether rf.nextIndex[server]-1 < rf.lastIncludeIndex, i.e. the follower is behind the leader's own snapshot point.

			// installSnapshot: if rf.nextIndex[server]-1 is smaller than
			// lastIncludeIndex, the follower's log is behind the leader's
			// snapshot point, so send the snapshot instead of log entries.
			if rf.nextIndex[server]-1 < rf.lastIncludeIndex {
				go rf.leaderSendSnapShot(server)
				rf.mu.Unlock()
				return
			}
  • RPC structures:
type InstallSnapshotArgs struct {
	Term             int    // the sender's (leader's) term
	LeaderId         int    // the sender's LeaderId
	LastIncludeIndex int    // log index the snapshot has been applied through
	LastIncludeTerm  int    // term of that last included entry
	Data             []byte // the raw byte stream of the snapshot
	//Done bool
}

type InstallSnapshotReply struct {
	Term int
}

As for the parameter design, the lab introduction notes that some fields, such as offset for splitting snapshots into chunks, are not required in this experiment.

  • RPC handler implementation:
// InstallSnapShot RPC Handler
func (rf *Raft) InstallSnapShot(args *InstallSnapshotArgs, reply *InstallSnapshotReply) {
	rf.mu.Lock()
	if rf.currentTerm > args.Term {
		reply.Term = rf.currentTerm
		rf.mu.Unlock()
		return
	}

	rf.currentTerm = args.Term
	reply.Term = args.Term

	rf.status = Follower
	rf.votedFor = -1
	rf.voteNum = 0
	rf.persist()
	rf.votedTimer = time.Now()

	if rf.lastIncludeIndex >= args.LastIncludeIndex {
		rf.mu.Unlock()
		return
	}

	// Trim the log: everything up to LastIncludeIndex is covered by the
	// snapshot and counts as applied; keep only the entries after it.
	index := args.LastIncludeIndex
	tempLog := make([]LogEntry, 0)
	tempLog = append(tempLog, LogEntry{})

	for i := index + 1; i <= rf.getLastIndex(); i++ {
		tempLog = append(tempLog, rf.restoreLog(i))
	}

	rf.lastIncludeTerm = args.LastIncludeTerm
	rf.lastIncludeIndex = args.LastIncludeIndex

	rf.logs = tempLog
	if index > rf.commitIndex {
		rf.commitIndex = index
	}
	if index > rf.lastApplied {
		rf.lastApplied = index
	}
	rf.persister.SaveStateAndSnapshot(rf.persistData(), args.Data)

	msg := ApplyMsg{
		SnapshotValid: true,
		Snapshot:      args.Data,
		SnapshotTerm:  rf.lastIncludeTerm,
		SnapshotIndex: rf.lastIncludeIndex,
	}
	rf.mu.Unlock()

	rf.applyChan <- msg
}

This handler essentially just passes Data upward to be applied directly. Of course, the existing log may contain entries beyond the snapshot, so those are split off and kept. Note that every snapshot installation must be persisted, as shown in the interaction diagram.
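To see why the service side must distinguish these messages, here is a self-contained sketch of an applyCh consumer. The ApplyMsg shape matches the lab's struct, but drain and the trace format are my own illustration.

```go
package main

import "fmt"

// ApplyMsg mirrors the lab's message type: command messages and snapshot
// messages arrive on the same channel, so the receiver must branch on the
// Valid flags.
type ApplyMsg struct {
	CommandValid bool
	Command      interface{}
	CommandIndex int

	SnapshotValid bool
	Snapshot      []byte
	SnapshotTerm  int
	SnapshotIndex int
}

// drain consumes applyCh until it is closed and returns a trace of what the
// service would do with each message.
func drain(applyCh chan ApplyMsg) []string {
	var trace []string
	for msg := range applyCh {
		switch {
		case msg.CommandValid:
			trace = append(trace, fmt.Sprintf("apply command %d", msg.CommandIndex))
		case msg.SnapshotValid:
			trace = append(trace, fmt.Sprintf("install snapshot through %d", msg.SnapshotIndex))
		}
	}
	return trace
}

func main() {
	applyCh := make(chan ApplyMsg, 2)
	applyCh <- ApplyMsg{CommandValid: true, Command: "x<-5", CommandIndex: 6}
	applyCh <- ApplyMsg{SnapshotValid: true, Snapshot: []byte("state"), SnapshotTerm: 3, SnapshotIndex: 7}
	close(applyCh)
	fmt.Println(drain(applyCh))
}
```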

3.4 Index Offset Tips

Index handling is the most tedious part of 2D: every index computation must account for the snapshot offset. For example, in the log of 1~7 shown above, after taking a snapshot through index 5, the entry with real index 7 no longer sits at slice position 7 but at position 2; you have to apply the snapshot offset to recover the true index. It is best to wrap this in helper functions, which makes the 2D modifications much more readable.
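The helper-function approach can look like the following sketch. The names (getLastIndex, restoreLog, restoreLogTerm, lastIncludeIndex) match the ones used in the code above, but the struct here is a standalone toy, not the lab's Raft struct.

```go
package main

import "fmt"

type LogEntry struct {
	Term    int
	Command interface{}
}

// raftLog is a toy holder for the trimmed log: logs[0] is a dummy entry and
// real entries start at slice position 1, offset by lastIncludeIndex.
type raftLog struct {
	lastIncludeIndex int
	lastIncludeTerm  int
	logs             []LogEntry
}

// getLastIndex returns the global (true) index of the last log entry.
func (r *raftLog) getLastIndex() int {
	return r.lastIncludeIndex + len(r.logs) - 1
}

// restoreLog maps a global index to the entry stored in the trimmed slice.
func (r *raftLog) restoreLog(globalIndex int) LogEntry {
	return r.logs[globalIndex-r.lastIncludeIndex]
}

// restoreLogTerm returns the term at a global index, falling back to the
// snapshot's recorded term at the snapshot boundary itself.
func (r *raftLog) restoreLogTerm(globalIndex int) int {
	if globalIndex == r.lastIncludeIndex {
		return r.lastIncludeTerm
	}
	return r.logs[globalIndex-r.lastIncludeIndex].Term
}

func main() {
	// Snapshot covers global indices 1..5; the two uncommitted entries
	// 6 and 7 remain in the trimmed log.
	r := &raftLog{
		lastIncludeIndex: 5,
		lastIncludeTerm:  3,
		logs:             []LogEntry{{}, {Term: 3, Command: "x<-5"}, {Term: 3, Command: "y<-7"}},
	}
	fmt.Println(r.getLastIndex())    // global index of the last entry: 7
	fmt.Println(r.restoreLogTerm(5)) // term at the snapshot boundary: 3
}
```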

4. Debugging Notes

  • Finishing lab2D actually takes a long time. Although 2D's index changes are the most tedious part of all of lab2, that is not what makes it hardest: 2D requires regression testing against 2A, 2B, and 2C before all the tests pass. I ran into plenty of bugs, and since I am writing this blog a while after my final exams, I'll recall the ones I still remember.
  • In my earlier 2A/2B/2C code I used select for asynchrony. But select turned out to hide many problems: hidden blocking, deadlocks, resource leaks, and contention between multiple tickers. Not only did it hurt readability, it could also make tests time out. I found this by patiently printing logs: the failing tests kept sending elections, and two Raft nodes were never receiving each other's replies. Each held one vote, so no leader could be elected. The nodes turned out to be stuck on the apply channel, and tracing back, the culprit was a select that sent while holding the lock. The fix was to restructure the whole ticker: split it into three tickers, for committing logs, for elections, and for log replication.
  • Another problem: to make 2C's Figure 8 test and 2B's "RPC counts aren't too high" test both pass, you must pay close attention to the sleep intervals of the three tickers and coordinate them. If they are out of tune, one test can easily fail outright, or fixing one breaks the other.
  • If you can stay compatible with the previous three labs, the key points of the fourth are the timing of sending snapshots, the index offset changes, and the persistence they involve. For example, the persist from lab2C now additionally persists two fields: the snapshot's last included index and its term.

Summary

Sticking with this all the way through is not easy. I got a bit lucky with 2C, and 2D required many modifications. Once you write the three RPCs down, there really are not that many cases; the paper itself is the most concise reference. I may write a distributed-systems summary and retrospective after lab2. Then in June I'll settle down, grind algorithms, and get back into Java.

Original site: https://yzsam.com/2022/161/202206101023548633.html

Copyright notice: this article was written by [Xingping XP]; please include a link to the original when reposting. Thanks.