当前位置：网站首页>PHP solves the problems of cache avalanche, cache penetration and cache breakdown of redis

PHP solves the problems of cache avalanche, cache penetration and cache breakdown of redis

2022-07-05 10:18:00 【Emma'】

One ： Preface

Design a caching system , The question that has to be considered is ： Cache penetration 、 Avalanche effect in cache breakdown and failure .

Two ： Cache penetration

Cache penetration refers to querying a certain nonexistent data , Because the cache is written passively on Miss , And for the sake of fault tolerance , If no data can be found from the storage tier, it will not be written to the cache , This will cause the non-existent data to be queried by the storage layer every time it is requested , It loses the meaning of caching . A large amount of requested data is not obtained from the cache , Cause the database to go , It's possible to bring down the database , Paralyze the whole service . When the flow is large , Probably DB It's gone , If someone takes advantage of something that doesn't exist key Attack our applications frequently , This is the loophole .
For example, the article list , Generally, our primary key ID Are unsigned self increasing types , Some people want to destroy your database , Use negative numbers for each request ID, and ID Negative records don't exist in the database at all .

Solution

There are many ways to effectively solve the problem of cache penetration , The most common is the use of bloon filter , Hash all possible data to a large enough bitmap in , A certain nonexistent data will be This bitmap Intercept , Thus, the query pressure on the underlying storage system is avoided . There's also a simpler and rougher way （ This is what we use ）, If the data returned by a query is empty （ No matter how many There is no such thing as , Or a system failure ）, We still cache this empty result , But its expiration time will be very short , Up to five minutes .

For things like ID Illegal requests with negative numbers are directly filtered out , Use of Blum filter (Bloom Filter). The bloon filter ：

<?php
class Bloom {
    
 
    //  The number of hash functions 
    protected $hashNum = 3;
 
    //  The size of the digit group 
    protected $bitArrayCount = 1024*10;
 
    //  Digit group 
    protected $bitArray = [];
 
    public function __construct()
    {
    
        //  Build default digit group , All set to  false
        $this->bitArray = array_pad([], $this->bitArrayCount, false);
    }
 
    /** *  obtain  hash  function ; That is, in the bit array , Need to be changed to  true  The index of  * @param string $key  Elements  * @return array */
    protected function getIndexes($key)
    {
    
        $indexes = [];
        for ($i = 0; $i < $this->hashNum; $i ++) {
    
            $index = sprintf('%u', crc32($key . $i));  //  Use  crc32  hash 
            $index = $index % $this->bitArrayCount;       //  obtain   In digit group   Location 
            $indexes[] = $index;
        }
 
        return $indexes;
    }
 
    /** *  Add elements to the filter  * @param string $key  Elements to add  */
    public function addItem($key)
    {
    
        $indexes = $this->getIndexes($key);
 
        //  take  hash  The corresponding bit of the result is modified to  true
        foreach ($indexes as $index) {
    
            $this->bitArray[$index] = true;
        }
    }
 
    /** *  Whether this element exists in the filter ; true  Indicates that it is likely to exist ,false  It must not exist  * @param string $key  Elements  * @return array */
    public function mightExist($key)
    {
    
        $indexes = $this->getIndexes($key);
 
        foreach ($indexes as $index) {
    
            if (! $this->bitArray[$index]) {
    
                return false;
            }
        }
        return true;
    }
}
 
class Test
{
    
    public function run()
    {
    
        $bloom = new Bloom();
 
        //  Add... To the filter  1000  Elements 
        for ($i = 0; $i < 100; $i ++) {
    
            $bloom->addItem($i);
        }
 
        //  test   Filter judgment result 
        for ($i = 90; $i < 110; $i ++) {
    
            
            $mightExist = $bloom->mightExist($i);
 
            if ($mightExist) {
    
                echo "might exist ", $i, PHP_EOL;
            } else {
    
                echo "not exist ", $i, PHP_EOL;
            }
        }
    }
}
 
(new Test())->run();

For records that cannot be found in the database , We still store the empty data in the cache , Of course, a shorter expiration time is usually set .

// Set up articles ID by -10000 The cache of is empty 
$id = -10000;
$redis->set('article_content_' . $id, '', 60);
 
var_dump($redis->get('article_content_' . $id));

3、 ... and ： Cache avalanche

Cache avalanche is when we set up the cache with the same expiration time , Causes the cache to fail at the same time at a certain time , Forward all requests to DB,DB Avalanche with excessive instantaneous pressure .
The reason for invalidating the cache set ：

redis The server is down .
Set the same expiration time for cached data , This causes the cache set to fail in a certain period of time .

Solution

The avalanche effect of cache failure has a terrible impact on the underlying system . Most system designers consider using lock or queue to guarantee the single line of cache cheng （ process ） Write , So as to avoid a large number of concurrent requests falling on the underlying storage system in case of failure . Here's a simple solution to spread the cache failure time , For example, we can add a random value to the original failure time , such as 1-5 Minutes at random , In this way, the repetition rate of each cache expiration time will be reduced , It's hard to trigger a collective failure .
How to solve cache set invalidation ：

For the reason 1, Can achieve redis High availability ,Redis Cluster perhaps Redis Sentinel( sentry ) And so on .
For the reason 2, Add a random value when setting the cache expiration time , Avoid cache expiration at the same time .

<?php
 
$redis = new Redis();
$redis->connect('127.0.0.1', 6379, 60);
$redis->auth('');
 
// Set the expiration time plus a random value 
$redis->set('article_content_1', ' Article content ', 60 + mt_rand(1, 60));
$redis->set('article_content_2', ' Article content ', 60 + mt_rand(1, 60));

Use a dual cache strategy , Set up two caches , Raw cache and standby cache , When the original cache expires , Access alternate cache , Set the standby cache expiration time to be longer .

// Raw cache 
$redis->set('article_content_2', ' Article content ', 60);
// Set the standby cache , Set the expiration time longer 
$redis->set('article_content_backup_2', ' Article content ', 1800);

Four ： Cache breakdown

For some with expiration set key, If these key It may be accessed at some point in time with super high concurrency , It's a very “ hotspot ” The data of . This is the time , A question needs to be considered ： The cache is “ breakdown ” The problem of , The difference between this and cache avalanche is that this is for a certain key cache , The former is a lot of key.
The difference between cache breakdown and cache avalanche is that this is for a certain hot key cache , The avalanche is aimed at the centralized invalidation of a large number of caches .

When the cache expires at a certain point in time , Right at this point in time Key There are a lot of concurrent requests coming , These requests usually find that the cache is expired from the back end DB Load data and reset to cache , At this time, a large number of concurrent requests may instantly put the back end DB Overwhelmed .

Solution
1、 Make this popular key Your cache never expires .
there “ Never expire ” It has two meanings ：

(1) from redis Look up , It's true that the expiration time is not set , That's the guarantee , There will be no hot spots key Overdue problem , That is to say “ Physics ” Not overdue .
(2) functionally , If it doesn't expire , That's static ？ So we keep the expiration date key Corresponding value in , If it's found to be overdue , Build the cache through a background asynchronous thread , That is to say “ Logic ” Be overdue

From the perspective of actual combat , This method is very performance friendly , The only drawback is when building the cache , The rest of the threads ( Threads that are not building the cache ) Maybe it's old data , But for general Internet functions, this is tolerable .
2、 Use mutexes , adopt redis Of setnx Implement mutually exclusive lock .
A common practice in the industry , It's using mutex. To put it simply , When the cache fails （ Judge that the value is empty ）, Not immediately load db, Instead, use some operations with the return value of the successful operation of the caching tool first （ such as Redis Of SETNX perhaps Memcache Of ADD） Go to set One mutex key, When the operation returns success , Proceed again load db And reset the cache ; otherwise , Just try the whole thing again get Caching method .

SETNX, yes 「SET if Not eXists」 Abbreviation , That is, it can only be set when it does not exist , It can be used to achieve the lock effect .

<?php
 
function getRedis()
{
    
    $redis = new Redis();
    $redis->connect('127.0.0.1', 6379, 60);
    return $redis;
}
 
// Lock 
function lock($key, $random)
{
    
    $redis = getRedis();
    // Set the lock timeout , Avoid lock release failure ,del() operation failed , Generate deadlock .
    $ret = $redis->set($key, $random, ['nx', 'ex' => 3 * 60]);
    return $ret;
}
 
// Unlock 
function unLock($key, $random)
{
    
    $redis = getRedis();
    // The function of random numbers here is , Prevent the update cache operation from taking too long , The effective time of the lock has been exceeded , Cause other requests to get the lock .
    // But after the last request to update the cache , Delete the lock without judgment , The locks created by other requests will be deleted by mistake .
    if ($redis->get($key) == $random) {
    
        $redis->del($key);
    }
}
 
// Get article data from cache 
function getArticleInCache($id)
{
    
    $redis = getRedis();
    $key = 'article_content_' . $id;
    $ret = $redis->get($key);
    if ($ret === false) {
    
        // Generate lock key
        $lockKey = $key . '_lock';
        // Generate random number , Used to set the value of the lock , It will be used when releasing the lock later 
        $random = mt_rand();
        // Get the mutex 
        if (lock($lockKey, $random)) {
    
            // This is pseudo code , Means to get article data from the database 
            $value = $db->getArticle($id);
            // Update cache , The expiration time can be adjusted according to the situation 
            $redis->set($key, $value, 2 * 60);
            // Release the lock 
            unLock($lockKey, $random);
        } else {
    
            // wait for 200 millisecond , Then get the cache value again , Let other processes that get the lock get the data and set the cache 
            usleep(200);
            getArticleInCache($id);
        }
    } else {
    
        return $ret;
    }
}

3、" advance " Use mutexes (mutex key)：
stay value Internal settings 1 Timeout value (timeout1), timeout1 More than practical memcache timeout(timeout2) Small . When from cache Read timeout1 When it's found out it's overdue , Extend now timeout1 And reset to cache. Then load the data from the database and set it to cache in .

4、 resource protection ：
use netflix Of hystrix, It can be used as a resource isolation and protection main thread pool , If you apply this to the construction of cache, it's not too bad .

Four solutions ： There is no best but the best

8、 ... and ： summary

For business systems , It's always a case by case analysis , No best , Only the most suitable . Last , For the cache system common cache full and data loss problems , It needs to be analyzed according to the specific business , Usually we use LRU Policy handling overflow ,Redis Of RDB and AOF Persistence strategy to ensure the data security under certain circumstances .

原网站

版权声明
本文为[Emma']所创，转载请带上原文链接，感谢
https://yzsam.com/2022/186/202207050948044821.html