当前位置:网站首页>Design a key value cache to save the results of the most recent Web server queries
Design a key value cache to save the results of the most recent Web server queries
2022-07-06 13:08:00 【Geometer】
The old rule is to look at the question first
Use cases We’ll scope the problem to handle only the following use cases
User sends a search request resulting in a cache hit
User sends a search request resulting in a cache miss
Service has high availability Constraints and assumptions
State assumptions Traffic is not evenly distributed
Popular queries should almost always be in the cache
Need to determine how to expire/refresh Serving from cache requires fast lookups
Low latency between machines Limited memory in cache
Need to determine what to keep/remove
Need to cache millions of queries 10 million users 10 billion queries per month
Calculation :
The ordered lists of cached keys are query, value: results
query - 50 bytes
title - 20 bytes
snippet - 200 bytes
Total: 270 bytes
That is to say, a single cache probably needs 270bytes
If 10 billion Every data of our users is unique
It can be calculated that the storage consumption of a month is 2.7t
4000 requests per second = 01 billion requests per month
That is, about 4000 requests per second, on average
Create a high level design
Because the capacity of cache is limited , So we need to set an expiration policy , Here we can use LRU Mechanism
The specific design is as follows :
1.client Send a request to webserver, Run the reverse proxy
2.webserver Execute the query api
3. Inquire about api Will perform the following operations
1) Analytic grammar :
Delete tag
Break the text down into terms
Fix spelling mistakes
Normalized capitalization
Convert queries to use Boolean operations
2) Check whether the cache is hit :
If hit :
utilize LRU Mechanism update cache
Return cached content
If you don't hit :
Use Reverse Index Service Find matching documents , Rank the documents found by using matching pairs , The highest ranking document is the document to be found
Use Document Service return titles and snippets
Update the contents of the cache , to update LRU
About LRU The implementation of is actually a good design , Yes, it is hash+ The structural design of the linked list can make the time complexity of its insertion and search constant ,lc There is a related algorithm problem , If you forget, you can have a look , This question Java It's faster to write ,cpp Write all the basic operations by yourself hhh
https://leetcode.cn/problems/lru-cache/
Inquire about api The implementation of the :
class QueryApi(object):
def __init__(self, memory_cache, reverse_index_service):
self.memory_cache = memory_cache
self.reverse_index_service = reverse_index_service
def parse_query(self, query):
"""Remove markup, break text into terms, deal with typos, normalize capitalization, convert to use boolean operations. """
...
def process_query(self, query):
query = self.parse_query(query)
results = self.memory_cache.get(query)
if results is None:
results = self.reverse_index_service.process_search(query)
self.memory_cache.set(query, results)
return results
Implementation of nodes :
class Node(object):
def __init__(self, query, results):
self.query = query
self.results = results
The realization of linked list :
class LinkedList(object):
def __init__(self):
self.head = None
self.tail = None
def move_to_front(self, node):
...
def append_to_front(self, node):
...
def remove_from_tail(self):
...
Cache implementation :
class Cache(object):
def __init__(self, MAX_SIZE):
self.MAX_SIZE = MAX_SIZE
self.size = 0
self.lookup = {
} # key: query, value: node
self.linked_list = LinkedList()
def get(self, query)
"""Get the stored query result from the cache. Accessing a node updates its position to the front of the LRU list. """
node = self.lookup[query]
if node is None:
return None
self.linked_list.move_to_front(node)
return node.results
def set(self, results, query):
"""Set the result for the given query key in the cache. When updating an entry, updates its position to the front of the LRU list. If the entry is new and the cache is at capacity, removes the oldest entry before the new entry is added. """
node = self.lookup[query]
if node is not None:
# Key exists in cache, update the value
node.results = results
self.linked_list.move_to_front(node)
else:
# Key does not exist in cache
if self.size == self.MAX_SIZE:
# Remove the oldest entry from the linked list and lookup
self.lookup.pop(self.linked_list.tail.query, None)
self.linked_list.remove_from_tail()
else:
self.size += 1
# Add the new key and value
new_node = Node(query, results)
self.linked_list.append_to_front(new_node)
self.lookup[query] = new_node
Cache finer conditions :
Page content changes
Delete a page or add a new page
Page ranking changes
The most direct way to solve these problems is to set a maximum time
Thank you very much for reading to the end , Thank you for your support . Interested students can see this post of mine
System design study ( One )Design Pastebin.com (or Bit.ly)
边栏推荐
- 继承和多态(下)
- 记录:动态Web项目servlet访问数据库404错误之解决
- Basic DOS commands
- [算法] 劍指offer2 golang 面試題2:二進制加法
- [algorithm] sword finger offer2 golang interview question 13: sum of numbers of two-dimensional submatrix
- [algorithm] sword finger offer2 golang interview question 7: 3 numbers with 0 in the array
- XV Function definition and call
- TYUT太原理工大学2022软工导论考试题型大纲
- First acquaintance with C language (Part 2)
- [算法] 剑指offer2 golang 面试题8:和大于或等于k的最短子数组
猜你喜欢

Novatel board oem617d configuration step record

十分钟彻底掌握缓存击穿、缓存穿透、缓存雪崩
![[Chongqing Guangdong education] Shandong University College Physics reference materials](/img/56/4ac44729c3e480a4f779d6a890363a.jpg)
[Chongqing Guangdong education] Shandong University College Physics reference materials

How to ensure data consistency between MySQL and redis?
![[算法] 剑指offer2 golang 面试题2:二进制加法](/img/c2/6f6c3bd4d70252ba73addad6a3a9c1.png)
[算法] 剑指offer2 golang 面试题2:二进制加法
![[dry goods] cycle slip detection of suggestions to improve the fixed rate of RTK ambiguity](/img/9d/7284c1399964d3fb48886f12e4941c.jpg)
[dry goods] cycle slip detection of suggestions to improve the fixed rate of RTK ambiguity
![[algorithm] sword finger offer2 golang interview question 9: subarray with product less than k](/img/65/fc3fb5a217a3b44f506b695af53e2c.png)
[algorithm] sword finger offer2 golang interview question 9: subarray with product less than k

基本Dos命令
![[algorithm] sword finger offer2 golang interview question 12: the sum of the left and right sub arrays is equal](/img/11/ee0628a68542236fc641966579a31a.png)
[algorithm] sword finger offer2 golang interview question 12: the sum of the left and right sub arrays is equal

编辑距离(多源BFS)
随机推荐
Compile GDAL source code with nmake (win10, vs2022)
Shortest Hamilton path (pressure DP)
Comparative analysis of the execution efficiency of MySQL 5.7 statistical table records
MySQL shutdown is slow
[算法] 剑指offer2 golang 面试题10:和为k的子数组
TYUT太原理工大学2022数据库考试题型大纲
[algorithm] sword finger offer2 golang interview question 2: binary addition
[algorithm] sword finger offer2 golang interview question 7: 3 numbers with 0 in the array
[dry goods] cycle slip detection of suggestions to improve the fixed rate of RTK ambiguity
How to ensure data consistency between MySQL and redis?
最短Hamilton路径 (状压DP)
[Chongqing Guangdong education] Shandong University College Physics reference materials
GPS高程拟合抗差中误差的求取代码实现
How to reduce the shutdown time of InnoDB database?
异常:IOException:Stream Closed
分支语句和循环语句
String类
165. Compare version number - string
10 minutes pour maîtriser complètement la rupture du cache, la pénétration du cache, l'avalanche du cache
一文搞定 UDP 和 TCP 高频面试题!