Baidu fourth-round interview: Redis persistence mechanisms and an analysis of RDB/AOF use cases

2022-06-24 12:44:00 【Java program ape】

1. How does Redis persistence work?

What is persistence? It is a staple Redis interview question. Simply put, persistence means putting data on a device that does not lose it when the power goes out, which in practice usually means a hard disk.

First, let's look at what a database does when it handles a write. There are five main steps:

1. The client sends the write operation to the server (the data is in the client's memory).
2. The database server receives the write request (the data is in the server's memory).
3. The server calls the write system call to write the data to disk (the data is in the kernel's buffer in system memory).
4. The operating system transfers the data in the buffer to the disk controller (the data is in the disk cache).
5. The disk controller writes the data to the physical media (the data is now really on disk).

Fault analysis

A write goes through the five steps above. With those steps in mind, let's look at failures at different levels:

When the database process fails but the system kernel is still intact, the data is safe as long as step 3 has completed, because the operating system will carry out the remaining steps and make sure the data ends up on disk.
When the system loses power, all the caches mentioned in the five steps are lost, and both the database and the operating system stop working. Only data that has completed step 5 is guaranteed to survive a power failure.

Having walked through these five steps, we probably want answers to the following questions:

How often does the database call write to move data into the kernel buffer?
How long does the kernel take to write the data in the system buffer to the disk controller?
When does the disk controller write its cached data to the physical media?

The first question is generally under the full control of the database. For the second, the operating system has its own default policy, but we can also use the fsync family of calls provided by the POSIX API to force the operating system to write data from the kernel buffer to the disk controller. The third question seems to be out of the database's reach, but in practice the disk's write cache is disabled in most setups, or the cache is enabled for reads only, so writes are not cached and go straight to the disk.

The recommended practice is to enable the write cache only if your disk device has a battery backup.
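
The gap between write and fsync can be made concrete with a few lines of Python. This is only a minimal sketch, assuming an ordinary file on a POSIX-like system; the durable_append helper and the file name are made up for illustration, and the step numbers in the comments refer to the five steps listed earlier.

```python
import os
import tempfile


def durable_append(path: str, payload: bytes) -> None:
    """Append payload and return only after fsync has been requested."""
    with open(path, "ab") as f:
        f.write(payload)      # data sits in the process's own user-space buffer
        f.flush()             # write system call: data reaches the kernel buffer (step 3)
        os.fsync(f.fileno())  # fsync: ask the kernel to push it to the disk device (step 4)
        # In the model described above, committing the data to the
        # physical media (step 5) is then up to the disk controller.


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.log")
    durable_append(demo_path, b"SET key value\n")
    with open(demo_path, "rb") as f:
        print(f.read())
```

Leaving out the os.fsync call is what "letting the operating system decide" means in practice: the write returns quickly, but a power failure shortly afterwards may lose the data.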

Data corruption

Data corruption here means that the data can no longer be recovered. So far we have talked about making sure data is actually written to disk, but being written to disk does not mean the data cannot be corrupted. For example, a single write request may involve two separate write operations; if an accident happens, one of them may complete safely while the other has not finished yet. If the database's data files are not organized sensibly, this can leave the data completely unrecoverable.

There are usually three strategies for organizing data so that data files are not damaged beyond recovery:

The first is the crudest: recoverability is not guaranteed by how the data is organized at all. Instead, synchronized backups are configured, and a damaged data file is restored from a backup. This is what happens with MongoDB when the operation log is turned off and Replica Sets are configured.
The second adds an operation log on top of the first: every operation is recorded, so the data can be recovered by replaying the log. Because the operation log is written by sequential appends, the log itself never ends up in an unrecoverable state. This is similar to MongoDB with the operation log turned on.
The safest is to never modify old data in place and to complete every write by appending only. The data itself is then a log, so it can never end up unrecoverable. CouchDB is a good example of this approach.
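
The second strategy, an append-only operation log that is replayed on recovery, can be sketched as follows. This is an illustration only and not how MongoDB or CouchDB store data: the JSON-lines format and the log_operation and recover helpers are assumptions made for this example.

```python
import json
import os
import tempfile


def log_operation(log_path: str, op: dict) -> None:
    """Append one operation as a JSON line; existing bytes are never rewritten."""
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(op) + "\n")
        f.flush()
        os.fsync(f.fileno())


def recover(log_path: str) -> dict:
    """Rebuild the key/value state by replaying the log from the beginning."""
    state: dict = {}
    with open(log_path, encoding="utf-8") as f:
        for line in f:
            try:
                op = json.loads(line)
            except json.JSONDecodeError:
                break  # torn final record from a crash: stop at the last good entry
            if op["op"] == "set":
                state[op["key"]] = op["value"]
            elif op["op"] == "del":
                state.pop(op["key"], None)
    return state
```

Because records are only ever appended, a crash can damage at most the last record, and everything before it can still be replayed.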

2. Redis provides RDB persistence and AOF persistence

Advantages of the RDB mechanism and its use cases

RDB persistence writes a snapshot of the in-memory data set to disk at specified intervals. It is also the default persistence method: the data in memory is written to a binary file as a snapshot, and the default file name is dump.rdb.

Snapshot persistence can be triggered automatically through configuration. We can configure Redis to take a snapshot automatically if at least m keys are modified within n seconds. Here is the default snapshot configuration:

   # take a snapshot if at least 1 key changed within 900 seconds
   save 900 1
   # take a snapshot if at least 10 keys changed within 300 seconds
   save 300 10
   # take a snapshot if at least 10000 keys changed within 60 seconds
   save 60 10000

RDB file saving process

Redis calls fork, so there is now a child process and a parent process.
The parent process continues to handle client requests, while the child process writes the memory contents to a temporary file. Thanks to the operating system's copy-on-write mechanism, the parent and child share the same physical pages; when the parent handles a write request, the OS creates a copy of the page to be modified for the parent instead of writing to the shared page. The data in the child's address space is therefore a snapshot of the whole database at the moment of the fork.
When the child process has finished writing the snapshot to the temporary file, it replaces the original snapshot file with the temporary file and then exits.
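
The same fork, write to a temporary file, then rename sequence can be tried out in Python on a POSIX system. This is a toy sketch rather than Redis's implementation: the background_save helper is hypothetical, and JSON stands in for the RDB binary format.

```python
import json
import os
import tempfile


def background_save(data: dict, target_path: str) -> int:
    """Fork a child that writes `data` as it is right now; return the child's pid."""
    pid = os.fork()
    if pid == 0:
        # Child: its view of `data` is frozen at the moment of the fork.
        exit_code = 1
        try:
            tmp_path = f"{target_path}.tmp-{os.getpid()}"
            with open(tmp_path, "w", encoding="utf-8") as f:
                json.dump(data, f)
                f.flush()
                os.fsync(f.fileno())
            os.replace(tmp_path, target_path)  # atomic swap of the old snapshot
            exit_code = 0
        finally:
            os._exit(exit_code)  # never fall back into the parent's code path
    return pid


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "dump.json")
    db = {"counter": 1}
    child = background_save(db, path)
    db["counter"] = 2          # the parent keeps serving writes
    os.waitpid(child, 0)
    with open(path, encoding="utf-8") as f:
        print(json.load(f))    # the snapshot still holds the value from fork time
```

The parent's later write does not show up in the snapshot, which is exactly the point-in-time behavior described above.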

A client can also use the save or bgsave command to tell Redis to take a snapshot. The save command takes the snapshot in the main thread; because Redis handles all client requests in a single main thread, this blocks every client request, so it is not recommended.

Another thing to note is that every snapshot writes the complete in-memory data set to disk; it does not incrementally sync only the dirty data. If the data set is large and writes are frequent, this inevitably causes a lot of disk I/O and can seriously affect performance.

Advantages

With RDB, your whole Redis database is contained in a single file, which is very convenient for backups. For example, you might plan to archive the data once a day.
Backups are easy: an RDB file can simply be moved to other storage media.
Restoring a large data set from RDB is faster than restoring it from AOF.
RDB maximizes Redis performance: the only thing the parent process has to do to save an RDB file is fork a child process, which then handles all the remaining work. The parent process never has to perform any disk I/O.

Disadvantages

If you need to minimize data loss when the server fails, RDB is not for you. Redis lets you set different save points to control how often the RDB file is saved, but because an RDB file has to store the state of the whole data set, saving it is not a cheap operation. You will therefore probably save the RDB file only every 5 minutes or more, in which case an outage can cost you several minutes of data.
Every time it saves an RDB file, Redis has to fork() a child process, which does the actual persistence work. When the data set is large, fork() can be time consuming and cause the server to stop serving clients for some milliseconds; if the data set is very large and CPU time is very tight, the pause may even last a full second. AOF rewriting also requires fork(), but however long the interval between AOF rewrites is, data durability is not affected.

AOF file saving process

Redis appends every write command it receives to a file (appendonly.aof by default) using the write function.

When Redis restarts, it rebuilds the whole database in memory by re-executing the write commands saved in the file. Of course, because the OS caches the changes made by write in the kernel, they may not reach the disk immediately, so AOF persistence can still lose some modifications. However, we can use the configuration file to tell Redis when to force the OS to write to disk with the fsync function. There are three options (the default is to fsync once per second):

# enable AOF persistence
appendonly yes
# fsync on every write command: slowest, but fully durable; not recommended
appendfsync always
# fsync once per second: a good compromise between performance and durability; recommended
appendfsync everysec
# leave it entirely to the OS: best performance, durability not guaranteed
appendfsync no
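
To see what the three appendfsync settings trade off, here is a toy append-only log in Python. It is a simplified sketch and not Redis's implementation: the AppendOnlyLog class is hypothetical, and the everysec policy is checked inline on the next append rather than in a background thread.

```python
import os
import tempfile
import time


class AppendOnlyLog:
    """Toy append-only log supporting the three fsync policies."""

    def __init__(self, path: str, policy: str = "everysec") -> None:
        if policy not in ("always", "everysec", "no"):
            raise ValueError(f"unknown fsync policy: {policy}")
        self.policy = policy
        self.fsync_calls = 0
        self._file = open(path, "ab")
        self._last_fsync = time.monotonic()

    def append(self, command: bytes) -> None:
        self._file.write(command)
        self._file.flush()  # hand the bytes to the kernel (write)
        if self.policy == "always":
            self._fsync()
        elif self.policy == "everysec" and time.monotonic() - self._last_fsync >= 1.0:
            self._fsync()
        # policy "no": leave the timing entirely to the operating system

    def _fsync(self) -> None:
        os.fsync(self._file.fileno())
        self.fsync_calls += 1
        self._last_fsync = time.monotonic()

    def close(self) -> None:
        self._file.close()
```

With "always", every append pays for an fsync; with "everysec", up to about one second of appended commands may exist only in the kernel buffer; with "no", the window is whatever the operating system chooses.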

The AOF approach raises another problem: the persistence file keeps growing. For example, if we call incr test 100 times, the file has to store all 100 commands, yet 99 of them are redundant, because a single set test 100 in the file is enough to restore the database state.
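
The incr example can be checked with a short Python sketch. It models only a tiny subset of commands (SET and INCR on integers), and the replay and compact helpers are hypothetical names introduced for this illustration.

```python
from typing import Dict, List


def replay(commands: List[str]) -> Dict[str, int]:
    """Apply SET and INCR commands to an empty state and return the result."""
    state: Dict[str, int] = {}
    for line in commands:
        parts = line.split()
        name = parts[0].upper()
        if name == "SET":
            state[parts[1]] = int(parts[2])
        elif name == "INCR":
            state[parts[1]] = state.get(parts[1], 0) + 1
        else:
            raise ValueError(f"unsupported command: {line}")
    return state


def compact(commands: List[str]) -> List[str]:
    """Return one SET per key that reproduces the same final state."""
    return [f"SET {key} {value}" for key, value in replay(commands).items()]


if __name__ == "__main__":
    print(compact(["INCR test"] * 100))  # ['SET test 100']
```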

To compact the AOF file, Redis provides the bgrewriteaof command. When Redis receives this command, it saves the in-memory data to a temporary file as commands, in a way similar to taking a snapshot, and finally replaces the original file. The process is as follows:

Redis calls fork, so there are now a parent and a child process.
The child process, working from the database snapshot in memory, writes the commands needed to rebuild the database state to a temporary file.
The parent process continues to handle client requests and keeps appending write commands to the original AOF file. At the same time, it buffers the write commands it receives. This ensures nothing goes wrong if the child's rewrite fails.
When the child has finished writing the snapshot to the temporary file as commands, it signals the parent. The parent then appends the buffered write commands to the temporary file.
The parent can now replace the old AOF file with the temporary file by renaming it, and write commands received from then on are appended to the new AOF file.

Note that rewriting the AOF file does not read the old AOF file. Instead, the whole in-memory database is written out as commands to a new AOF file, which is somewhat like taking a snapshot.
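
The role of the buffered write commands during a rewrite is easier to follow in a single-process model. In the sketch below, plain lists stand in for the AOF files and a dictionary copy stands in for the forked child's view of memory; the RewritableLog class and its method names are assumptions made for illustration.

```python
from typing import Dict, List


class RewritableLog:
    """Single-process model of the AOF rewrite steps."""

    def __init__(self) -> None:
        self.state: Dict[str, str] = {}
        self.aof: List[str] = []             # the current AOF file
        self.rewrite_buffer: List[str] = []  # writes received while a rewrite runs
        self._snapshot: Dict[str, str] = {}
        self.rewriting = False

    def set(self, key: str, value: str) -> None:
        command = f"SET {key} {value}"
        self.state[key] = value
        self.aof.append(command)             # the old file keeps growing as usual
        if self.rewriting:
            self.rewrite_buffer.append(command)

    def begin_rewrite(self) -> None:
        self._snapshot = dict(self.state)    # stands in for the forked child's view
        self.rewrite_buffer = []
        self.rewriting = True

    def finish_rewrite(self) -> None:
        new_aof = [f"SET {k} {v}" for k, v in self._snapshot.items()]
        new_aof.extend(self.rewrite_buffer)  # catch up on writes made meanwhile
        self.aof = new_aof                   # stands in for the rename
        self.rewrite_buffer = []
        self.rewriting = False
```

If the rewrite were abandoned before finish_rewrite, the old log would still contain every command, which is why a failed rewrite loses nothing.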

Advantages

Using AOF persistence makes Redis much more durable: you can choose among different fsync policies, such as no fsync, fsync once per second, or fsync on every write command. The default policy is fsync once per second; with this setting Redis still performs well, and even if there is an outage you lose at most one second of data (fsync runs in a background thread, so the main thread can keep serving command requests).
The AOF file is an append-only log, so writing to it requires no seek. Even if the log ends with a half-written command for some reason (for example, the disk filled up or the write was interrupted), the redis-check-aof tool can easily fix the problem. Redis can automatically rewrite the AOF in the background when the file becomes too large: the rewritten AOF file contains the minimum set of commands needed to rebuild the current data set. The whole rewrite is absolutely safe, because while Redis creates the new AOF file it keeps appending commands to the existing one, so the existing AOF file is not lost even if an outage occurs during the rewrite. Once the new AOF file is complete, Redis switches from the old file to the new one and starts appending to it.
The AOF file stores all writes to the database in order, in the format of the Redis protocol, so its contents are easy to read and easy to parse. Exporting an AOF file is also simple. For instance, if you accidentally run the FLUSHALL command, then as long as the AOF file has not been rewritten, you can stop the server, remove the FLUSHALL command at the end of the AOF file, and restart Redis to restore the data set to its state before FLUSHALL was executed.
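
Because the commands are stored in the Redis protocol format (arrays of bulk strings), removing a trailing FLUSHALL can be scripted. The following is a minimal sketch, assuming a plain protocol-format AOF with no other content mixed in; parse_commands and drop_trailing_flushall are hypothetical helpers, and in practice this should only be tried on a copy of the file with the server stopped.

```python
from typing import List, Tuple


def parse_commands(data: bytes) -> List[Tuple[int, int, List[bytes]]]:
    """Parse arrays of bulk strings; return (start, end, args) for each command."""
    commands = []
    pos = 0
    while pos < len(data):
        start = pos
        if data[pos:pos + 1] != b"*":
            raise ValueError(f"expected '*' at offset {pos}")
        line_end = data.index(b"\r\n", pos)
        argc = int(data[pos + 1:line_end])
        pos = line_end + 2
        args = []
        for _ in range(argc):
            if data[pos:pos + 1] != b"$":
                raise ValueError(f"expected '$' at offset {pos}")
            line_end = data.index(b"\r\n", pos)
            length = int(data[pos + 1:line_end])
            pos = line_end + 2
            args.append(data[pos:pos + length])
            pos += length + 2  # skip the payload and its trailing CRLF
        commands.append((start, pos, args))
    return commands


def drop_trailing_flushall(data: bytes) -> bytes:
    """Return the AOF bytes without a final FLUSHALL command, if there is one."""
    commands = parse_commands(data)
    if commands and commands[-1][2] and commands[-1][2][0].upper() == b"FLUSHALL":
        return data[:commands[-1][0]]
    return data
```

The function only removes FLUSHALL when it is the very last command, which matches the recovery scenario described above; if other writes followed it, cutting it out would need more care.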

Disadvantages

For the same data set, an AOF file is usually larger than an RDB file.
Depending on the fsync policy, AOF may be slower than RDB. In general, performance with fsync once per second is still very high, and turning fsync off makes AOF as fast as RDB, even under high load. Under a heavy write load, however, RDB gives better guarantees on maximum latency.
AOF has had bugs of this kind in the past: because of certain commands, reloading the AOF file failed to restore the data set exactly as it was when saved (the blocking command BRPOPLPUSH, for instance, once caused such a bug). Tests for this case have been added to the test suite: they automatically generate random, complex data sets and reload them to make sure everything is fine. Such bugs are uncommon in AOF files, but by comparison they are almost impossible with RDB.

How to choose

Generally speaking, if you want data safety comparable to what PostgreSQL provides, you should use both persistence methods at the same time. If you care a lot about your data but can still tolerate losing a few minutes of it, you can use RDB persistence alone.

Original site

Copyright notice
This article was written by [Java program ape]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2021/05/20210526171258213t.html
