Redis - hot key issues
2022-07-02 06:25:00 【Qihai Jianren】
This article introduces the Redis hot key problem: what causes hot keys, how to monitor and discover them, and how to deal with them.
The hot key problem is simple to state: a burst of requests hits a few fixed keys in Redis at the same moment. Redis assigns each key to a hash slot, and as long as the number of nodes stays the same, the slot range owned by each node generally does not change. As a result, all requests for one fixed key land on a single Redis node, which can overwhelm the cache service.
Real-life examples are easy to find. In the bilibili app, the content of a hot news page such as RNG winning the 2022 MSI is typically stored in Redis. After a PUSH notification goes out to all users, a large number of them may open the news page within moments, so requests for the key behind that news item spike and a Redis hot key problem appears.
1. What is the hot key problem
The hot key problem means that at some moment a very large number of requests access a few fixed keys in Redis. This can cause cache breakdown, with requests falling through to the DB, overwhelming both the cache service and the DB service and hurting the availability of the application.
Widely published and heavily viewed hot news, hot comments, celebrity live streams and similar read-heavy, write-light scenarios typically produce hot keys. Redis query performance is much higher than the DB's, but it still has a ceiling. For example, while on duty during the Spring Festival I saw a midnight app splash-screen campaign bring down a Redis node; the node restarted and every key on it became unavailable for queries.
After contacting the DBA on the operations team, I was told that a single Redis node at our company generally handles about 20,000 QPS, so queries for a single fixed key cannot exceed that figure.
When a service reads data, the data is usually sharded (by Redis hash slot), so accesses to a given key are served by one particular Redis node. When the access volume exceeds what that node can handle, a hot key problem arises.
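To make the sharding concrete, here is a minimal sketch of how Redis Cluster maps a key to one of its 16384 hash slots (CRC16 of the key, modulo 16384, with hash-tag handling). The key names in the demo are hypothetical; the point is only that a given key always maps to the same slot, and therefore to the same node.

```python
import binascii

SLOT_COUNT = 16384  # number of hash slots in Redis Cluster


def key_slot(key: str) -> int:
    """Return the Redis Cluster hash slot for a key (CRC16 mod 16384)."""
    data = key.encode("utf-8")
    # Hash tags: if the key contains a non-empty "{...}" section,
    # only the part between the first "{" and the next "}" is hashed.
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            data = data[start + 1:end]
    # binascii.crc_hqx with an initial value of 0 is CRC16-CCITT (XMODEM),
    # the CRC16 variant used for key hashing in Redis Cluster.
    return binascii.crc_hqx(data, 0) % SLOT_COUNT


if __name__ == "__main__":
    # Hypothetical key names, for illustration only.
    for k in ("news:msi2022", "news:other", "{user1000}.following"):
        print(k, "->", key_slot(k))
```

However many clients request `news:msi2022`, every request resolves to the same slot, which is why a single node ends up absorbing all of the traffic.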
2. What kind of key counts as a hot key
Whether a key is hot is usually determined by how often it is requested. There is no specific number that defines a hot key, but the following examples can serve as a reference:
(1) QPS concentrated on a particular key: the Redis instance handles 20,000 QPS (queries per second) in total, and one key alone receives more than 10,000 accesses per second.
(2) Bandwidth usage concentrated on a particular key: a large number of HGETALL requests per second are sent to a HASH key that has thousands of members and a total size of more than 1 MB.
(3) CPU time concentrated on a particular key: a large number of ZRANGE requests per second are sent to a ZSET key with tens of thousands of members.
3. Harm caused by hot keys
When hot keys appear in Redis, they often bring serious harm and hidden risks.
- Traffic concentrates and hits the physical NIC limit
When requests for a hot key exceed the network card bandwidth limit of the host where the node runs, the concentrated traffic also causes other services on that server to fail.
- Too many requests bring down the cache shard
As mentioned above, single-node query performance in Redis is limited. When queries for a hot key exceed the node's performance threshold, the requests consume a large amount of CPU, affect other requests and degrade overall performance. In serious cases the cache shard is brought down; one symptom is the Redis node restarting itself, at which point every key stored on that node is unavailable, which affects other businesses too.
- Access skew in a cluster architecture
One data shard is accessed heavily while the others sit idle. The connections to that shard may be exhausted, and new connection requests are rejected.
- DB breakdown leads to a business avalanche
When the request pressure on a hot key exceeds what Redis can bear, cache breakdown follows easily. Once the cache is down, new requests may go straight to the DB layer in large numbers. Because the DB layer's query performance is weaker than the cache layer's, the DB is prone to an avalanche under heavy load, seriously affecting the business.
- In rush-buying or flash-sale scenarios, too many requests for the key holding an item's inventory may exceed Redis's processing capacity and lead to overselling.
4. How to monitor and discover hot keys
- Estimate hot keys from business experience
This approach is quite workable. For an on-the-hour flash sale, for example, the key holding the activity information and the key holding the flash-sale products shown in the top section of the page are usually hot keys. Not every hot key can be predicted accurately, though. On an e-commerce platform it is hard to predict when merchants will launch a popular flash sale, but analyzing the historical activities of different merchants gives some reference.
- Monitor and collect on the business side
Here the application adds a line of statistics code before each Redis operation and reports asynchronously. As with log collection, it records the command, result and latency of each Redis operation and sends them as asynchronous messages to a collection message queue. The drawback is that this intrudes on business code; usually the middleware team can build it into the Redis client library it packages for internal use. If a good Daas platform is available, monitoring can be done at the proxy layer instead, so the business does not need to be aware of it and Redis monitoring is viewed centrally on the Daas platform.
- Use Redis's own commands
(1) The monitor command: it captures the commands received by the Redis server in real time, and you can then write code to count which keys are hot. Ready-made analysis tools such as redis-faina also exist. Under high concurrency, however, this command risks blowing up memory and also reduces Redis performance.
(2) The hotkeys option: Redis 4.0.3 added hot key discovery to redis-cli; just add the --hotkeys option when running redis-cli. When there are many keys, however, it runs slowly. Reference: Redis 4.0 hot key query methods.
These commands work in theory, but companies generally do not allow connecting directly to Redis nodes to run commands; instead, hot key analysis and monitoring are viewed through the Daas platform.
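As a rough illustration of "write code to count the hot keys" from MONITOR output, the sketch below parses captured lines and counts how often each key appears. It assumes the usual MONITOR line layout (timestamp, `[db client-address]`, then double-quoted arguments) and, as a simplification, treats the first argument after the command name as the key, which holds for commands such as GET, HGETALL and ZRANGE but not for every command. The function names are made up for this example.

```python
import re
from collections import Counter

# Matches one double-quoted argument, allowing backslash escapes inside it.
_ARG = re.compile(r'"((?:[^"\\]|\\.)*)"')


def count_keys(monitor_lines):
    """Count key accesses in lines captured from the MONITOR command."""
    counts = Counter()
    for line in monitor_lines:
        args = _ARG.findall(line)
        # args[0] is the command name, args[1] its first argument.
        # Simplifying assumption: the first argument is the key.
        if len(args) >= 2:
            counts[args[1]] += 1
    return counts


def hot_keys(monitor_lines, top_n=3):
    """Return the top_n most frequently accessed keys with their counts."""
    return count_keys(monitor_lines).most_common(top_n)


if __name__ == "__main__":
    # Hypothetical captured lines, for illustration only.
    sample = [
        '1656714300.000001 [0 127.0.0.1:50001] "GET" "news:msi2022"',
        '1656714300.000002 [0 127.0.0.1:50002] "GET" "news:msi2022"',
        '1656714300.000003 [0 127.0.0.1:50003] "HGETALL" "user:42"',
        '1656714300.000004 [0 127.0.0.1:50001] "PING"',
    ]
    print(hot_keys(sample))
```

Because MONITOR itself adds load, a script like this is better suited to short sampling windows than to continuous monitoring.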
5. Solutions for hot keys
Given the causes and harm of Redis hot keys described above, several solutions are used in practice.
(1) Use a second-level cache
Use a local cache such as ehcache or GuavaCache; even a HashMap will do. Once a hot key is discovered, load it into the application's JVM. Requests for that hot key are then served directly from the local cache instead of going to Redis.
With a local cache, the many requests for the same key are spread evenly across application machines by network-layer load balancing, which avoids having a fixed key hit a single Redis node and also saves one network round trip.
The unavoidable downside is that for businesses requiring strong cache consistency, more effort has to go into keeping the distributed caches consistent, which increases system complexity.
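The idea of a local second-level cache can be sketched in a few lines. The article's examples are JVM libraries (ehcache, GuavaCache); the Python class below is only an illustrative stand-in, and `LocalHotKeyCache`, `loader` and the TTL value are assumptions made for this example. A short TTL bounds how stale a locally cached hot key can become.

```python
import time


class LocalHotKeyCache:
    """Tiny in-process cache with a per-entry time to live (TTL)."""

    def __init__(self, loader, ttl_seconds=1.0, clock=time.monotonic):
        self._loader = loader      # called on a miss, e.g. a Redis GET
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}         # key -> (value, expires_at)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]        # served locally, no network round trip
        value = self._loader(key)  # miss or expired: go to the remote cache
        self._entries[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key):
        """Drop a key so that the next get() reloads it."""
        self._entries.pop(key, None)


if __name__ == "__main__":
    remote = {"news:msi2022": "RNG wins"}  # dict standing in for Redis
    cache = LocalHotKeyCache(remote.__getitem__, ttl_seconds=0.5)
    print(cache.get("news:msi2022"))  # first call loads from "Redis"
    print(cache.get("news:msi2022"))  # second call is served locally
```

This sketch is not thread-safe and never evicts entries; a production cache would need locking (or a concurrent map) and a size bound.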
(2) Spread the hot key across different servers
This scheme is also simple: do not let a fixed key always go to the same Redis node. Keep a copy of the key on several Redis nodes, and when a request for the hot key arrives, pick one of the copies at random, read the value and return it. This relieves the query pressure of a hot key on a single Redis node.
Because Redis assigns hash slots based on the key, at initialization time you can append a random suffix to the key, such as a number in the range 0-2N, so that the resulting backup keys are scattered across the Redis nodes. At query time, append one of these suffixes at random and query that backup key, so that reads and writes no longer concentrate on a single Redis node.
This is only one way of doing it: the hot key's value is backed up on different Redis nodes by varying the key suffix. If you really want a copy of the hot key on every Redis node, it is better done at the proxy layer, transparently to the client. That presumes the company's DBA team is strong enough; otherwise you have to compute and maintain the copies yourself.
Backup keys also inevitably face the distributed cache consistency problem. Redis's publish/subscribe feature can help here: when the original key changes, each backup listens for the change and synchronizes. Alternatively, you can iterate over all backup keys and update them together.
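A minimal sketch of the suffix approach follows. A plain dict stands in for the Redis client, and the `#` separator, the function names and the number of copies are all assumptions made for illustration. Writes go to every backup key (the "iterate over all backups" option above) and each read picks one copy at random.

```python
import random


def backup_keys(hot_key, copies):
    """Names of the backup keys: hot_key#0 ... hot_key#(copies-1)."""
    return [f"{hot_key}#{i}" for i in range(copies)]


def write_all(store, hot_key, value, copies):
    """Write the value to every backup key so any copy can serve reads."""
    for k in backup_keys(hot_key, copies):
        store[k] = value


def read_any(store, hot_key, copies, rng=random):
    """Read from one randomly chosen backup key."""
    return store[f"{hot_key}#{rng.randrange(copies)}"]


if __name__ == "__main__":
    store = {}  # dict standing in for a Redis cluster client
    write_all(store, "news:msi2022", "RNG wins", copies=4)
    print(sorted(store))
    print(read_any(store, "news:msi2022", copies=4))
```

Different suffixes hash to different slots, but nothing guarantees that the copies land on distinct nodes; with few nodes it is worth checking the slot of each backup key and adjusting the number of copies accordingly.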
(3) Split the hot key
A core part of the hot key problem is that there are too many requests. The hot key stores hot information that every user request must query, such as the information for a midnight flash sale, and the volume of user requests per unit time is huge, so the number of queries for the hot key is huge as well. One line of attack is therefore to refine and split the key so that different users request different keys.
In a flash-sale scenario, for instance, different users may hit different activity strategy IDs according to audience rules. The activity metadata can therefore be split by strategy, giving finer-grained activity keys. When a request arrives, only the activity key bound to the user's audience strategy is queried, so all users' queries for activity information are spread across the keys of different strategies, avoiding a flood of queries against one fixed key. The random suffix above follows a similar idea: split or back up the fixed key.
(4) Isolate Redis for core and non-core businesses
Single-node query performance in Redis is limited. When queries for a hot key exceed the node's performance threshold, the cache shard is brought down and Redis reads and writes become unavailable for every business on that node. To keep core businesses unaffected when a hot key causes trouble, isolate Redis for core and non-core businesses in advance; at the very least, the Redis cluster holding the hot keys should be isolated from core businesses.
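As a small illustration of splitting by strategy, the sketch below derives a per-strategy key from the user's audience. The rule format, the strategy IDs and the key layout are all hypothetical; a real system would take them from its own audience rules.

```python
def strategy_for(user, rules, default="default"):
    """Return the id of the first strategy whose audience rule matches."""
    for strategy_id, matches in rules:
        if matches(user):
            return strategy_id
    return default


def activity_key(activity_id, strategy_id):
    """Per-strategy key, replacing a single key shared by all users."""
    return f"activity:{activity_id}:strategy:{strategy_id}"


if __name__ == "__main__":
    # Hypothetical audience rules: (strategy id, predicate on the user).
    rules = [
        ("vip", lambda u: u.get("vip", False)),
        ("new_user", lambda u: u.get("orders", 0) == 0),
    ]
    for user in ({"vip": True}, {"orders": 0}, {"orders": 7}):
        print(activity_key("midnight-sale", strategy_for(user, rules)))
```

How well this spreads the load depends on how evenly users fall into strategies; if one strategy covers most users, its key remains hot.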
6. Mature solutions in the industry
Having covered the ideas for handling hot keys, let's see whether the industry has a mature, automated solution. At its core such a solution has only two steps: 1. the system continuously monitors for hot keys; 2. when a hot key is found, it sends a notification and handles it accordingly. Youzan published an article, "Youzan's transparent multi-level cache solution (TMC)", which also discusses the hot key problem, so we use it as the example.
Before introducing the scheme, let's look at why it was designed, that is, which pain points it addresses.
Many e-commerce merchants use Youzan's services, and merchants run "flash sale" and "promotion" activities at irregular times, which causes cache hotspot access in applications along the "marketing activity", "product details" and "trade order" paths:
- Activity time, activity type, participating products and similar details are unpredictable, so cache hotspot access is unpredictable as well.
- During cache hotspot access, the application layer generates a large number of cache requests for a small number of hot keys. This hits the distributed cache system, consumes a large amount of intranet bandwidth and ultimately affects the stability of the application layer.
To deal with these problems, a solution is needed that automatically discovers hotspots and serves hot cache access requests from a cache placed locally in the application layer. This is why TMC was created. Its system architecture consists of the following components:
Jedis-Client: the direct entry point for interaction between Java applications and the cache servers; its interface definitions are no different from the native Jedis-Client.
Hermes-SDK: a self-developed SDK that encapsulates the "hotspot discovery + local cache" functions; Jedis-Client integrates these capabilities by interacting with it.
Hermes server cluster: receives the cache access data reported by Hermes-SDK, performs hotspot detection, and pushes hot keys to Hermes-SDK for local caching.
Cache cluster: consists of a proxy layer and a storage layer, and provides a unified distributed cache service entry point for application clients.
Basic components: the etcd cluster and the Apollo configuration center, which give TMC its "cluster push" and "unified configuration" capabilities.
(1) Monitoring hot keys
For monitoring hot keys, Youzan collects data on the client side. The article "Youzan's transparent multi-level cache solution (TMC): design thinking" contains this sentence:
"TMC modified the JedisPool and Jedis classes of the native jedis package, and integrated the initialization logic of the Hermes-SDK package, which provides TMC's "hotspot discovery" + "local cache" functions, into JedisPool's initialization."
In other words, Youzan rewrote the native jedis jar and added the Hermes-SDK package to it in order to do hotspot discovery and local caching.
From the monitoring perspective, for every key access request made through Jedis-Client, Hermes-SDK asynchronously reports a key access event to the Hermes server cluster through its communication module, so that the cluster can perform "hotspot detection" on the reported data.
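The source does not spell out TMC's detection algorithm here, so the following is only a generic sketch of how a server could turn reported access events into a hot key decision: a sliding time window per key with a count threshold. The class name, threshold and window size are assumptions, not TMC's actual implementation.

```python
import time
from collections import defaultdict, deque


class HotKeyDetector:
    """Flags a key as hot when it is accessed at least `threshold`
    times within the last `window_seconds` seconds."""

    def __init__(self, threshold, window_seconds=1.0, clock=time.monotonic):
        self._threshold = threshold
        self._window = window_seconds
        self._clock = clock
        self._events = defaultdict(deque)  # key -> timestamps of accesses

    def report(self, key):
        """Record one access event; return True if the key is now hot."""
        now = self._clock()
        events = self._events[key]
        events.append(now)
        # Drop timestamps that have fallen out of the window.
        while events and events[0] <= now - self._window:
            events.popleft()
        return len(events) >= self._threshold


if __name__ == "__main__":
    detector = HotKeyDetector(threshold=1000, window_seconds=1.0)
    hot = False
    for _ in range(1000):
        hot = detector.report("news:msi2022")
    print("hot:", hot)
```

Keeping every timestamp is simple but memory-hungry at high request rates; bucketed counters per time slice are a common way to approximate the same window more cheaply.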
(2) Notifying the system to handle them
For handling hot keys, Youzan uses a second-level cache.
Once a hot key has been detected, the Hermes server cluster notifies the Hermes-SDK in each business system through various channels, in effect telling it: "This key is a hot key, remember to cache it locally." Hermes-SDK then caches the key locally. For later requests, when Hermes-SDK sees that the key is hot, it takes the value directly from the local cache instead of accessing the cluster. There are various ways to deliver the notification; this article only presents the idea.
How to ensure cache consistency
Finally, a note on how cache consistency is ensured when a second-level cache is used.
Hermes-SDK's hotspot module caches only hot key data; the vast majority of non-hot key data is stored by the cache cluster.
When a change to a hot key invalidates its value, Hermes-SDK synchronously invalidates the local cache to guarantee strong consistency locally.
When a change to a hot key invalidates its value, Hermes-SDK broadcasts an event through the etcd cluster to asynchronously invalidate the local caches of the other nodes in the business application cluster, guaranteeing eventual consistency across the cluster.
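The two invalidation rules can be sketched with in-process stand-ins. Here `Broadcast` plays the role of the cluster-wide event channel (etcd in TMC) and a shared dict plays the cache cluster; both classes and all names are hypothetical and only illustrate the ordering: the writer invalidates its own local copy synchronously, while peers are invalidated when the broadcast is delivered.

```python
class Broadcast:
    """In-process stand-in for a cluster-wide event channel."""

    def __init__(self):
        self._subscribers = []
        self._pending = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, sender, key):
        # Queued and delivered later, to mimic asynchronous delivery.
        self._pending.append((sender, key))

    def deliver(self):
        pending, self._pending = self._pending, []
        for sender, key in pending:
            for callback in self._subscribers:
                callback(sender, key)


class AppNode:
    """One application instance with a local cache of hot keys."""

    def __init__(self, name, remote, bus):
        self.name = name
        self.remote = remote  # shared dict standing in for the cache cluster
        self.local = {}       # local cache
        self.bus = bus
        bus.subscribe(self._on_invalidate)

    def get(self, key):
        if key not in self.local:
            self.local[key] = self.remote[key]
        return self.local[key]

    def set(self, key, value):
        self.remote[key] = value
        self.local.pop(key, None)         # synchronous local invalidation
        self.bus.publish(self.name, key)  # asynchronous invalidation of peers

    def _on_invalidate(self, sender, key):
        if sender != self.name:
            self.local.pop(key, None)
```

Between `set` on one node and delivery of the broadcast, other nodes may still serve the old value from their local caches; that window is what "eventual consistency across the cluster" refers to.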
References:
Talk about how to solve the Redis hot key problem
Youzan's transparent multi-level cache solution (TMC): design thinking