How to ensure cache and database consistency

2022-07-26 16:52:00 【Tao Ge is still fly】

How the cache and the database work together

As the business grows and your project handles more and more requests, reading from the database on every request is bound to cause performance problems.

The usual practice at this stage is to introduce a cache to improve read performance. But how should it be used?

The figure shows the general flow of a request.

Specific process:

Write requests still write only to the database.
Read requests read the cache first. If the entry is not in the cache, they read from the database and rebuild the cache.
When data is written to the cache, an expiration time is set.

This way, infrequently accessed data in the cache gradually expires and is evicted over time. What remains in the cache is frequently accessed hot data, so cache utilization is maximized.
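The flow above can be sketched in a few lines. This is a minimal illustration, assuming plain dictionaries stand in for the real database and cache. The names `read`, `write`, and `TTL_SECONDS` are hypothetical and do not come from the original article.

```python
import time

database = {"user:1": "Alice"}   # stand-in for the real database
cache = {}                       # key -> (value, expires_at)
TTL_SECONDS = 60                 # hypothetical expiration time


def write(key, value):
    """Write path: only the database is written."""
    database[key] = value


def read(key, now=None):
    """Read path: try the cache first, then fall back to the database."""
    now = time.monotonic() if now is None else now
    entry = cache.get(key)
    if entry is not None and entry[1] > now:
        return entry[0]          # cache hit that has not expired yet
    value = database.get(key)    # cache miss or expired entry
    if value is not None:
        # Rebuild the cache and set the expiration time.
        cache[key] = (value, now + TTL_SECONDS)
    return value
```

Note that after `write("user:1", "Bob")`, readers keep seeing the cached `"Alice"` until the entry expires. That window is the consistency problem the rest of the article deals with.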

Data consistency issues (non-concurrent)

When data is updated, we have to operate on the cache as well as the database. Specifically, when a piece of data is modified, we update not only the database but also the cache.

But when both the database and the cache are updated, there is a question of ordering. There are 2 options:

Update the cache first, then update the database
Update the database first, then update the cache

Leave concurrency aside for now. In the normal case, whichever goes first, the two stay consistent. What we need to focus on is the failure case.

Because the operation is split into two steps, it is quite possible that the first step succeeds and the second step fails.

Update the cache first, then update the database

If the cache update succeeds but the database update fails, the cache holds the latest value while the database still holds the old value.

Read requests can hit the cache at this point and get the correct value. However, once the cache entry expires, the old value is read from the database, and the cache is rebuilt with the old value as well.

Users then find that data they modified earlier has changed back, which affects the business.

Update the database first, then update the cache

If the database update succeeds but the cache update fails, the database holds the latest value while the cache still holds the old value.

Subsequent read requests read the old data. Only after the cache entry expires do they get the correct value from the database.

Users then find that a change they just made is not visible, and that the data only changes after a while. This also affects the business.

Making sure the second step succeeds is the key to solving the problem

Retry after a failure until the operation succeeds. To avoid tying up too many resources, the retry should be asynchronous: write the retry request to a message queue and let a dedicated consumer retry until it succeeds. Or, more directly, to avoid the second step failing at all, put the cache operation straight into the message queue and let the consumer operate on the cache.

The message queue guarantees reliability: a message written to the queue is not lost until it has been consumed successfully (so a project restart is not a concern).
The message queue guarantees delivery: downstream consumers pull messages from the queue, and a message is deleted only after it is consumed successfully. Otherwise it keeps being redelivered (which fits our scenario).
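A rough sketch of the asynchronous retry idea follows. It assumes an in-process `queue.Queue` as a stand-in for a durable message queue, and a hypothetical `FlakyCache` class that fails a fixed number of times so the retry path is exercised. A real consumer would also add backoff between attempts.

```python
import queue


class FlakyCache:
    """In-memory cache that fails its first `failures` writes (illustration only)."""

    def __init__(self, failures):
        self.failures = failures
        self.data = {}

    def set(self, key, value):
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("cache unavailable")
        self.data[key] = value


def write(database, cache, retry_queue, key, value):
    database[key] = value                  # step 1: update the database
    try:
        cache.set(key, value)              # step 2: update the cache
    except ConnectionError:
        retry_queue.put((key, value))      # step 2 failed: hand it to the queue


def drain(cache, retry_queue):
    """Consumer loop: a message is dropped only once it has been applied."""
    attempts = 0
    while not retry_queue.empty():
        key, value = retry_queue.get()
        attempts += 1
        try:
            cache.set(key, value)
        except ConnectionError:
            retry_queue.put((key, value))  # not consumed successfully: redeliver
    return attempts
```

With `FlakyCache(failures=3)`, the write path fails once and the consumer needs three attempts before the cache finally holds the new value.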

As for the risk that writing to the queue fails, and the maintenance cost of the message queue:

Queue write failure: the probability that the cache operation and the write to the message queue both fail at the same time is actually very small.
Maintenance cost: message queues are already commonly used in our projects, so the maintenance cost does not increase much.

Another way: subscribe to the database change log, then operate on the cache

Take MySQL as an example. When a row is modified, MySQL generates a change log (Binlog). We can subscribe to this Binlog, obtain the specific data that was changed, and then delete the corresponding cache entry based on it.
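The consumer side of such a subscription might look roughly like this. The event dictionary here is a hypothetical shape chosen for illustration, not the format produced by MySQL or by any particular Binlog client, and the `table:primary_key` cache key layout is also an assumption.

```python
def cache_key(table, primary_key):
    """Hypothetical key layout: one cache entry per row."""
    return f"{table}:{primary_key}"


def handle_change_event(event, cache):
    """Delete the cache entry for the row described by a change event.

    `event` is an assumed dict such as
    {"type": "update", "table": "user", "pk": 1}.
    """
    if event["type"] in ("update", "delete"):
        # pop with a default so an already-missing entry is not an error
        cache.pop(cache_key(event["table"], event["pk"]), None)
```

Because the handler only deletes, replaying the same event twice is harmless, which suits a queue that may redeliver messages.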

Consistency problems under concurrency

Suppose we use the scheme of updating the database first and then updating the cache, and assume both steps always succeed. What happens under concurrency?

Thread A and thread B both need to update the same piece of data. The following can happen:

1. Thread A updates the database (X = 1)
2. Thread B updates the database (X = 2)
3. Thread B updates the cache (X = 2)
4. Thread A updates the cache (X = 1)

In the end, X is 1 in the cache and 2 in the database, so the two are inconsistent.
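The interleaving above can be replayed deterministically. The sketch below does not use real threads. It simply executes the four steps in the order listed, with dictionaries standing in for the database and the cache (both are assumptions made for illustration).

```python
database, cache = {}, {}


def update_db(value):
    database["X"] = value


def update_cache(value):
    cache["X"] = value


# (thread, operation, value) in the order the steps happen to be scheduled
steps = [
    ("A", update_db, 1),
    ("B", update_db, 2),
    ("B", update_cache, 2),
    ("A", update_cache, 1),
]
for _thread, operation, value in steps:
    operation(value)

print(database["X"], cache["X"])  # prints: 2 1
```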

In other words, although A started before B, B took less time to operate on the database and the cache, so the steps executed out of order and the final result is not what we expected. The scheme of updating the cache first and then updating the database has a similar problem.

In addition, if the cache is updated every time the data changes, the cached data may not be read any time soon. A lot of infrequently accessed data ends up in the cache, which wastes cache resources.

So at this point we need to consider another scheme: deleting the cache.

Can deleting the cache ensure consistency?

There are also 2 schemes for deleting the cache:

Delete the cache first, then update the database
Update the database first, then delete the cache

As before, if the second step fails, the data becomes inconsistent.

So here we focus on the concurrent case:

Delete the cache first, then update the database

If 2 threads read and write the data concurrently, the following can happen:

1. Thread A wants to update X = 2 (original value X = 1)
2. Thread A deletes the cache first
3. Thread B reads the cache, finds nothing, and reads the old value from the database (X = 1)
4. Thread A writes the new value to the database (X = 2)
5. Thread B writes the old value to the cache (X = 1)

In the end, X is 1 in the cache (the old value) and 2 in the database (the new value), so the two are inconsistent.

So with the scheme of deleting the cache first and then updating the database, a concurrent read and write can still leave the data inconsistent.

Update the database first, then delete the cache

1. X does not exist in the cache (database X = 1)
2. Thread A reads the database and gets the old value (X = 1)
3. Thread B updates the database (X = 2)
4. Thread B deletes the cache
5. Thread A writes the old value to the cache (X = 1)

In the end, X is 1 in the cache (the old value) and 2 in the database (the new value), so inconsistency occurs here too.

For the data to become inconsistent here, three conditions must all hold:
The cache entry has just expired
A read request and a write request run concurrently
Updating the database and deleting the cache (steps 3-4) takes less time than reading the database and writing the cache (steps 2 and 5)
In practice the probability of this is very low, because writing to the database usually takes a lock first, so a database write usually takes longer than a database read.
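To see what that unlikely timing looks like, the sketch below uses two real threads and forces the ordering with `threading.Event`, so that the writer finishes steps 3-4 between the reader's steps 2 and 5. The dictionaries and function names are assumptions for illustration. In a real system nothing forces this order, which is why the case is rare.

```python
import threading

database = {"X": 1}
cache = {}                        # step 1: X is not cached (it just expired)
a_has_read = threading.Event()
b_is_done = threading.Event()


def reader_a():
    value = cache.get("X")
    if value is None:
        value = database["X"]     # step 2: reads the old value 1
        a_has_read.set()
        b_is_done.wait()          # force the slow path: B finishes first
        cache["X"] = value        # step 5: writes the old value to the cache


def writer_b():
    a_has_read.wait()
    database["X"] = 2             # step 3: update the database
    cache.pop("X", None)          # step 4: delete the cache
    b_is_done.set()


threads = [threading.Thread(target=reader_a), threading.Thread(target=writer_b)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

After both threads finish, the database holds 2 while the cache holds the stale 1.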

Master-slave replication delay and delayed double deletion

Problem 1: In both delete-cache schemes, an old value can be written back to the cache.

Problem 2:

With the scheme of updating the database first and then deleting the cache, read/write separation combined with master-slave replication delay can also cause inconsistency:

1. Thread A updates the master library X = 2 (original value X = 1)
2. Thread A deletes the cache
3. Thread B queries the cache, misses, and reads the old value from the slave library (slave X = 1)
4. The slave library finishes synchronizing (master and slave X = 2)
5. Thread B writes the old value to the cache (X = 1)

In the end, X is 1 in the cache (the old value) and 2 in both the master and slave libraries (the new value), so inconsistency occurs here too.

Solving Problem 1: after thread A deletes the cache and updates the database, it sleeps for a while and then deletes the cache again.

Solving Problem 2: thread A can generate a delayed message and write it to the message queue. The consumer then deletes the cache after the delay.

Both schemes aim to clear the cache, so that the next read fetches the latest value from the database and writes it to the cache.
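A minimal sketch of delayed double deletion against the replication-delay scenario follows. It assumes `threading.Timer` as a stand-in for a delayed message in a message queue, and dictionaries for the master, the slave, and the cache. `DELAY_SECONDS` is a hypothetical value. In practice it has to be longer than the replication delay plus the time a reader needs to rebuild the cache.

```python
import threading

master = {"X": 1}
slave = {"X": 1}
cache = {"X": 1}
DELAY_SECONDS = 0.2   # hypothetical; tune to replication lag in a real system


def delete_cache(key):
    cache.pop(key, None)


def update_with_delayed_double_delete(key, value):
    master[key] = value                   # update the master library
    delete_cache(key)                     # first delete
    timer = threading.Timer(DELAY_SECONDS, delete_cache, args=(key,))
    timer.start()                         # second delete, after the delay
    return timer


timer = update_with_delayed_double_delete("X", 2)
cache["X"] = slave["X"]     # a concurrent reader re-caches the stale slave value
slave.update(master)        # replication catches up
stale_seen = cache.get("X") # the stale value is visible during the delay window
timer.join()                # wait for the second delete to run
```

Once the timer has fired, X is no longer in the cache, so the next read rebuilds it from an up-to-date library. An in-process timer is lost if the process restarts, which is one reason the article suggests a delayed message in a message queue instead.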

Original site

Copyright notice
This article was written by [Tao Ge is still fly]. Please include the original link when reposting. Thank you.
https://yzsam.com/2022/207/202207261620119277.html