
Frequently asked questions about Redis

2022-06-26 00:07:00 Just put a flower in heaven and earth

Cache breakdown

What is cache breakdown

Cache breakdown refers to a request for data that is not in the cache but does exist in the database.
Generally this happens because a cached entry has expired. If that entry is a hot key, many users are accessing it concurrently; their requests all arrive at the same moment, find nothing in the cache, and all hit the database at once to fetch the data, causing a surge in database traffic and a sudden spike in load.
In other words: a piece of data is cached and every request returns it quickly from the cache, but at some point the cache entry expires, and a request fails to find the data in the cache. We say this request has "broken through" the cache.

How to solve the cache breakdown problem

There are roughly three ways to tackle cache breakdown:

  • Let only one request through to the database, then rebuild the cache
    Use Redis's set command (for example: SET mykey Redis EX 1000 NX) to set a flag. The caller that sets it successfully is let through; callers that fail wait and poll. The request that is let through goes on to rebuild the cache (see the sketch after this list).
  • Background refresh
    The idea of this scheme is to run a scheduled task in the background that proactively refreshes data that is about to expire.
  • Never expire
    Why does the cache get broken down? Because an expiration time is set and the entry is evicted once it expires. So just set the key to never expire. Simple and crude!
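
A minimal sketch of the mutex idea in Python with redis-py (the key names, TTLs, and the `load_from_db` callback are illustrative assumptions, not part of the original article):

```python
import time

import redis

r = redis.Redis()

def get_with_mutex(key, load_from_db, ttl=1000, lock_ttl=10):
    """Read path where only one caller is allowed to rebuild the cache."""
    val = r.get(key)
    if val is not None:
        return val
    lock_key = f"lock:{key}"
    while True:
        # SET ... NX EX: only one caller sets the flag; the rest poll.
        if r.set(lock_key, 1, nx=True, ex=lock_ttl):
            try:
                val = r.get(key)  # re-check: someone may have rebuilt it already
                if val is None:
                    val = load_from_db(key)
                    r.set(key, val, ex=ttl)
                return val
            finally:
                r.delete(lock_key)
        time.sleep(0.05)  # setting the flag failed: wait, then poll the cache
        val = r.get(key)
        if val is not None:
            return val
```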

Cache penetration

What is cache penetration

Cache penetration refers to requests for data that exists neither in the cache nor in the database, sent by users at high density over a short time. Every such request reaches the database service and puts pressure on the database.
Generally these are malicious requests: the sender knows the data does not exist (it is neither cached nor stored), yet keeps sending such requests. Enough of them can easily overwhelm the database.

How to solve the problem of cache penetration

There are generally two solutions: caching empty objects and the Bloom filter.

Caching empty objects

When the storage layer misses, cache the returned empty object anyway, and give it an expiration time. Subsequent reads of this data are then served from the cache, protecting the backend data source.
Drawbacks: if null values can be cached, the cache needs extra space for the extra keys, and there may be a lot of null keys; and even with an expiration time on a null value, the cache layer and storage layer will disagree for a while, which hurts businesses that need consistency. How can this be mitigated?

  • Cache the result even when the query comes back empty, but set a shorter expiration time, or clean up the cached entry once data for that key is inserted (see the sketch after this list).
  • Filter out keys that cannot possibly exist: put all possible keys into a large bitmap and filter queries through that bitmap.
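
A minimal sketch of caching empty objects, assuming a `query_db` stand-in for the real storage-layer lookup and a sentinel value of our own choosing:

```python
import redis

r = redis.Redis()
NULL_SENTINEL = b"__NULL__"  # assumed marker meaning "the DB has no such row"

def query_db(user_id):
    """Stand-in for the real storage-layer lookup; returns None on a miss."""
    return None

def get_user(user_id, ttl=3600, null_ttl=60):
    key = f"user:{user_id}"
    val = r.get(key)
    if val is not None:
        return None if val == NULL_SENTINEL else val
    row = query_db(user_id)
    if row is None:
        # Cache the miss with a short TTL so repeated lookups skip the DB.
        r.set(key, NULL_SENTINEL, ex=null_ttl)
        return None
    r.set(key, row, ex=ttl)
    return row
```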

The Bloom filter

After reading the article "Bloom filter details", you should be able to implement a simple Bloom filter with Redis. So how do we solve cache penetration with a Bloom filter?
It is very simple: add all of our Redis keys to the Bloom filter in advance, then check whether each requested key is in the set. If it is not, the key can be treated as illegal and the request ended right there. This filters out most bad requests up front and protects the database. A minimal sketch follows.
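
A minimal Bloom filter sketch on top of a Redis bitmap, using SETBIT/GETBIT via redis-py (the key name, bitmap size, and hash count are illustrative assumptions; size them from your expected key count and target false-positive rate):

```python
import hashlib

import redis

r = redis.Redis()
BLOOM_KEY = "bloom:keys"  # assumed bitmap key
BLOOM_BITS = 2 ** 24      # ~16M bits; pick based on expected key count

def _offsets(key, k=3):
    """Derive k bit positions from independent-ish hashes of the key."""
    for i in range(k):
        digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
        yield int(digest, 16) % BLOOM_BITS

def bloom_add(key):
    for off in _offsets(key):
        r.setbit(BLOOM_KEY, off, 1)

def bloom_might_contain(key):
    # Any clear bit -> the key definitely isn't in the set; all bits set ->
    # it *may* be (false positives are possible, false negatives are not).
    return all(r.getbit(BLOOM_KEY, off) for off in _offsets(key))
```

On each request, call `bloom_might_contain(key)` first and reject the request when it returns False.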

Cache avalanche

What is a cache avalanche

A cache avalanche means that a large amount of cached data reaches its expiration time within a very short window, while many requests are querying that data. All of those requests therefore go straight to the database, database traffic surges, and the database may crash. At that moment the cache holds no data but the database does.

How to solve the problem of cache avalanche

Common solutions are as follows:

  • Data preheating
    To prevent a cache avalanche, hot data can be preheated in advance, which avoids a flood of requests arriving right after launch while the cache is still empty. Data preheating means that, before formal deployment, you walk through the data likely to be accessed and manually trigger the loading of the various cache keys, so that potentially hot data is already loaded into the cache.

  • Staggered expiration
    Staggered expiration is the simplest way to prevent a cache avalanche: when setting a key's expiration time, add a short random offset so that expiration times are spread as evenly as possible. This avoids the avalanche caused by a large batch of keys expiring at the same moment (see the sketch after this list).

  • Second-level cache
    A1 is the primary cache and A2 is a copy. When A1 misses, fall back to A2. A1 gets a short expiration time, A2 a long one.

  • Rate limiting and degradation
    After a cache failure, control the number of threads that read the database and write the cache, by locking or queuing. For example, allow only one thread per key to query the data and write the cache; other threads wait or are degraded directly.

  • Redis high availability
    In the extreme case of a cache avalanche, every Redis server is down and nothing can serve requests. What we can do there is improve Redis's availability: since a Redis instance can go down, add more instances so that when one fails the others keep working. In practice that means building a cluster.
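
A minimal sketch of the staggered-expiration idea (the base TTL and jitter window are illustrative assumptions):

```python
import random

import redis

r = redis.Redis()

def set_with_jitter(key, value, base_ttl=3600, jitter=300):
    # Spread expirations over [base_ttl, base_ttl + jitter] seconds so a
    # batch of keys written together doesn't all expire at the same moment.
    r.set(key, value, ex=base_ttl + random.randint(0, jitter))
```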

Double writes between Redis and the DB

What is data double writing

Data double writing means the same data is written to two places: the cache (Redis) and the database. Every update must keep the two copies in step; when they diverge, reads may return stale data.

How to solve the data double write problem

Using a cache improves performance and relieves database pressure, but it can also introduce data inconsistency. To keep the cache and the database consistent, there are three classic patterns:

  • Cache-Aside Pattern
    The cache-aside pattern is proposed to minimize inconsistency between cache and database as much as possible. It splits into a read path and a write path (shown in the sketch after this list).
    Read path: read the cache first; on a hit, return the data directly. On a miss, read the database, put the data into the cache, and return the response.
    Write path: on an update, update the database first, then delete the cache.
  • Read-Through/Write-Through
    In the Read/Write-Through pattern, the service treats the cache as the main data store: the application interacts with the database and cache entirely through an abstract cache layer. This pattern also splits into a read path and a write path.
    Read path: read data from the cache and return it; on a miss, load from the database, write into the cache, and return the response. The flow is very similar to Cache-Aside; in fact Read-Through just adds a cache-provider layer on top of Cache-Aside. It makes the code more concise and also reduces load on the data source.
    Write path: in Write-Through mode, when a write request occurs, the cache abstraction layer updates both the data source and the cached data.
  • Write-Behind
    Write-Behind resembles Read-Through/Write-Through in that the cache provider is responsible for reading and writing both the cache and the database. The big difference: Read/Write-Through updates the cache and the database synchronously, whereas Write-Behind only updates the cache and does not write the database directly, updating it asynchronously in batches.
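
A minimal sketch of the Cache-Aside read and write paths, with an in-memory dict standing in for the real database:

```python
import redis

r = redis.Redis()
_store = {}  # stand-in for the real database

def db_query(key):
    return _store.get(key)

def db_update(key, value):
    _store[key] = value

def read(key, ttl=3600):
    """Read path: cache first; on a miss, read the DB and backfill the cache."""
    val = r.get(key)
    if val is not None:
        return val
    val = db_query(key)
    if val is not None:
        r.set(key, val, ex=ttl)
    return val

def write(key, value):
    """Write path: update the database first, then delete the cache."""
    db_update(key, value)
    r.delete(key)
```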

A comparison of the three patterns

  • Cache-Aside is relatively simple to implement, but you have to maintain two data stores: the cache and the database (repository).
  • Read/Write-Through exposes only one data store (the cache), but is a bit more complex to implement.
  • Write-Behind is similar to Read/Write-Through, except that Write-Behind persists data asynchronously while Read/Write-Through persists synchronously.
  • Write-Behind's advantage is that operating directly on memory is fast, and multiple operations can be merged before being persisted to the database. Its disadvantage is that data can be lost, for example if the system loses power.

Problems with Cache-Aside

In fact, Cache-Aside is the pattern we use most in real development, so let's analyze the problems it can run into.

Question 1: when updating data, should Cache-Aside delete the cache or update the cache?
The answer is to delete the cache rather than update it. Consider an example with two threads A and B writing concurrently. A updates the database first, but because of network delay, B then updates the database and also updates the cache; only afterwards does A update the cache. Now the database and the cache hold different data. If we delete the cache instead of updating it, this dirty-data problem cannot occur.
Compared with deleting the cache, updating it has two further disadvantages:

  • If the cached value is the result of a complex computation, updating the cache frequently wastes that work.
  • In write-heavy, read-light workloads, the data is often updated again before it has ever been read, which also wastes work (in fact, caching is not very cost-effective for write-heavy scenarios).

Question 2: with double writes, should we operate on the database first or on the cache first?
The answer is the database first. For example, suppose there are two requests: request A performs an update and request B performs a read. If we operate on the cache first, the two threads may interleave as follows:

  1. Thread A initiates the write and deletes the cache first
  2. Thread B initiates the read and finds no data in the cache
  3. Thread B goes on to read the DB and gets the old data
  4. Thread B then puts the old data into the cache
  5. Thread A writes the latest data to the DB

In this interleaving the database holds the new data while the cache holds the old data, so the two are inconsistent.

Schemes to ensure double-write consistency

There are roughly three schemes for keeping Redis and the DB consistent: the delayed double delete strategy, the delete-cache retry mechanism, and reading the binlog to delete the cache asynchronously. These complement the ideas discussed above; each embodies a different design, but all three lean toward concrete implementation.

Delayed double delete strategy

The so-called delayed double delete works like this: delete the cache first, update the database, sleep for a while, then delete the cache again (a sketch follows below).
Drawback: combined with a cache timeout, the worst case is that the data stays inconsistent for the length of the timeout, and the write request also takes longer.
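
A minimal sketch of delayed double delete (the delay value and the `db_update` stand-in are illustrative assumptions; Question 3 below explains why you might move the second delete into a background thread):

```python
import time

import redis

r = redis.Redis()

def db_update(key, value):
    pass  # stand-in for the real database update

def delayed_double_delete(key, value, delay=0.5):
    r.delete(key)          # first delete
    db_update(key, value)  # update the database
    time.sleep(delay)      # long enough for in-flight reads to finish
    r.delete(key)          # second delete clears any dirty value reads wrote
```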

Question 1: how do we choose the sleep time?
Answer: base it on how long your project's read path takes. The point is to make sure any in-flight read finishes before the write completes, so the second delete can remove the dirty cache data those reads may have written.
A possible interleaving: write request A deletes the cached data first, but before A updates the database a read request B arrives. B reads the old data from the database, then A updates the database. If A's sleep time is very short, A performs its second delete quickly; B, however, still needs to do a pile of computation after its read, and by the time B finishes and writes the data into the cache, A may already have completed its second delete. The result is inconsistent data.

Question 2: what if we use a read-write-splitting (primary/replica) architecture?
Answer: the delayed double delete strategy still applies; just base the sleep time on the primary-to-replica replication lag, plus a few hundred ms.

Question 3: this synchronous strategy reduces throughput. What can we do?
Answer: make the second delete asynchronous, that is, start a thread and delete in the background, so the write request can return without sleeping first. This increases throughput.

Delete cache retry mechanism

Whether you use delayed double delete or Cache-Aside's update-the-database-then-delete-the-cache, what happens if the delete step fails? A failed cache delete also leaves the data inconsistent. In that case we can introduce a delete-cache retry mechanism. The general steps are as follows (sketched after this list):

  • The write request updates the database
  • The cache delete fails for some reason
  • Put the key whose delete failed onto a message queue
  • Consume the message from the queue and retrieve the key
  • Retry the cache delete
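
A minimal sketch of the retry mechanism, using an in-process queue.Queue as a stand-in for a real message queue (Kafka, RocketMQ, etc.):

```python
import queue

import redis

r = redis.Redis()
retry_queue = queue.Queue()  # stand-in for a real MQ

def delete_cache(key):
    try:
        r.delete(key)
    except redis.RedisError:
        retry_queue.put(key)  # the delete failed: enqueue the key

def retry_worker(max_attempts=5):
    while True:
        key = retry_queue.get()  # consume a failed key from the queue
        for _ in range(max_attempts):
            try:
                r.delete(key)    # retry the cache delete
                break
            except redis.RedisError:
                continue
```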

Reading the binlog to delete the cache asynchronously

The delete-cache retry mechanism solves the inconsistency caused by failed deletes, but it intrudes into a lot of business code. Alternatively, the database's binlog can drive the eviction of keys. Taking MySQL as an example, you can use Alibaba's canal to collect the binlog and send it to an MQ queue, then write a simple cache-deleting subscriber that consumes the binlog messages, deletes cache entries according to each update log, and confirms the processed update via an ACK mechanism, keeping the database and the cache consistent.

It is easy to see that reading the binlog to delete the cache asynchronously is really just another implementation of the delete-cache retry mechanism.

Note the problem caused by primary/replica lag: if a read request arrives after the write request has updated the primary and deleted the cached data, but replication lag means the replica still holds the old data, the read will fetch the stale data from the replica and put it into the cache, and we are inconsistent again. The fix for this situation is to subscribe to the replica's log instead. Some readers may ask: if one primary has several replicas, which replica do we subscribe to? To answer that, it helps to know how the open-source tool canal works: canal subscribes to MySQL's binlog disguised as a replica; it is middleware for data synchronization.

Contention problems in Redis under concurrency

What is the Redis contention problem

This is also a very common production problem. Multiple clients write the same key concurrently, and data that should have arrived first arrives later, so the wrong version wins; or multiple clients fetch the same key at the same time, modify the value, and write it back, and any mis-ordering corrupts the data. These scenarios are collectively called the "Redis contention problem". Note that it usually appears only when Redis faces heavy traffic, say tens of thousands of reads and writes per second; applications with low request volume will not hit it.

How to solve the Redis contention problem

As the description above shows, contention arises in many scenarios, and different scenarios call for different solutions.

Scenario one
Multiple clients need to operate on keyA at the same time, but keyA's value must first be queried from the database and then written to the cache. Two points need attention: first, the database query plus the cache write should be atomic; second, the ordering of the clients' requests.

Scenario two
Multiple requests each decrement the stock of some product: 1. read the current stock value; 2. compute the new stock value; 3. write the new stock value back.
This can go wrong: request A reads a stock of 30, request B also reads 30; A subtracts 5, leaving 25, and B subtracts 5, also leaving 25. Here there is no requirement on the order of the updates, yet the final stock clearly differs from what we expect (it should be 20).

Distributed lock + timestamp scheme

This scheme suits scenarios that require ordering, such as scenario one.
A distributed lock plus a timestamp solves it: the distributed lock handles the first point (atomicity), and the timestamp provides the ordering. Before a later request executes its set, it checks the timestamp: if the existing value's timestamp is greater than its own, it abandons the modification; if it is smaller, it goes ahead and writes (a sketch follows below).

Note that this scheme requires all systems' clocks to agree, otherwise the timestamps are meaningless; a version number can be used instead.
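
A minimal sketch of the lock-plus-timestamp idea, storing the value and its timestamp in a hash (the key layout and lock TTL are illustrative assumptions; swap the timestamp for a version number if clocks can't be trusted):

```python
import time

import redis

r = redis.Redis()

def set_if_newer(key, value, ts, lock_ttl=5):
    """Write `value` only if `ts` is newer than the stored timestamp."""
    lock_key = f"lock:{key}"
    # The SET NX lock makes read-compare-write atomic across clients.
    while not r.set(lock_key, 1, nx=True, ex=lock_ttl):
        time.sleep(0.01)
    try:
        current = r.hgetall(key)
        if current and float(current[b"ts"]) > ts:
            return False  # a newer write already landed; abandon this one
        r.hset(key, mapping={"value": value, "ts": ts})
        return True
    finally:
        r.delete(lock_key)
```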

Using Redis's WATCH

This scheme suits scenarios that do not require strict ordering, such as scenario two.
Be careful not to use it on a partitioned cluster: the keys watched in a transaction must live on the same node. For background, see the article "In-depth understanding of Redis transactions". A sketch of the pattern follows.
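
A minimal sketch of the inventory decrement from scenario two using WATCH/MULTI/EXEC in redis-py; EXEC is aborted (raising WatchError) if the watched key changed, and we simply retry:

```python
import redis

r = redis.Redis()

def decrement_stock(key, amount):
    """Optimistic read-compute-write: retried whenever the key changes."""
    with r.pipeline() as pipe:
        while True:
            try:
                pipe.watch(key)              # EXEC will abort if key changes
                stock = int(pipe.get(key) or 0)
                if stock < amount:
                    pipe.unwatch()
                    return False             # not enough stock
                pipe.multi()                 # start queuing the transaction
                pipe.set(key, stock - amount)
                pipe.execute()               # raises WatchError on conflict
                return True
            except redis.WatchError:
                continue                     # another client won; retry
```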

Using a message queue

This method is a general solution for high-concurrency scenarios and works whether or not ordering is required.

When concurrency is too high, a message middleware can serialize the parallel reads and writes: put the Redis set operations into a queue and execute them one by one.


Copyright notice
This article was created by [Just put a flower in heaven and earth]. Please include a link to the original when reposting. Thanks!
https://yzsam.com/2022/177/202206252118150762.html