当前位置:网站首页>系统设计学习(二)Design a key-value cache to save the results of the most recent web server queries
系统设计学习(二)Design a key-value cache to save the results of the most recent web server queries
2022-07-06 09:19:00 【几何学家】
老规矩先看题
Use cases We’ll scope the problem to handle only the following use cases
User sends a search request resulting in a cache hit
User sends a search request resulting in a cache miss
Service has high availability Constraints and assumptions
State assumptions Traffic is not evenly distributed
Popular queries should almost always be in the cache
Need to determine how to expire/refresh Serving from cache requires fast lookups
Low latency between machines Limited memory in cache
Need to determine what to keep/remove
Need to cache millions of queries 10 million users 10 billion queries per month
计算:
缓存存储的键的有序列表分别为 query, value: results
query - 50 bytes
title - 20 bytes
snippet - 200 bytes
Total: 270 bytes
也就是说单条缓存大概需要270bytes
如果10 billion的用户每条数据都是独一无二的话
可以计算得到一个月的存储消耗为2.7t
4000 requests per second = 01 billion requests per month
也就是说每秒大概四千次请求平均
Create a high level design
因为缓存的容量有限制,所以我们要设置一个过期策略,这里的话可以采用LRU机制
具体设计大概如下:
1.client发送一个请求到webserver,运行反向代理
2.webserver执行查询api
3.查询api会执行下述操作
1)解析语法:
删除标记
将文本分解为术语
修复拼写错误
规范化大写
将查询转换为使用布尔运算
2)检查是否命中缓存:
如果命中:
利用LRU机制更新缓存
返回缓存内容
如果没有命中:
使用Reverse Index Service查找匹配的文档,对查找出来的文档利用匹配对进行排序得出排名,其中排名最高的就是要查找的文档
使用Document Service 返回 titles and snippets
更新缓存的内容,更新LRU
关于LRU的实现其实是一个很好的设计,是用hash+链表的结构设计能够使得其插入和查找的时间复杂度都为常数级别,lc上有一道相关的算法题,忘了的可以看一下,这题Java写起来比较快,cpp要自己写一遍所有的基本操作hhh
https://leetcode.cn/problems/lru-cache/
查询api的实现:
class QueryApi(object):
def __init__(self, memory_cache, reverse_index_service):
self.memory_cache = memory_cache
self.reverse_index_service = reverse_index_service
def parse_query(self, query):
"""Remove markup, break text into terms, deal with typos, normalize capitalization, convert to use boolean operations. """
...
def process_query(self, query):
query = self.parse_query(query)
results = self.memory_cache.get(query)
if results is None:
results = self.reverse_index_service.process_search(query)
self.memory_cache.set(query, results)
return results
节点的实现:
class Node(object):
def __init__(self, query, results):
self.query = query
self.results = results
链表的实现:
class LinkedList(object):
def __init__(self):
self.head = None
self.tail = None
def move_to_front(self, node):
...
def append_to_front(self, node):
...
def remove_from_tail(self):
...
缓存的实现:
class Cache(object):
def __init__(self, MAX_SIZE):
self.MAX_SIZE = MAX_SIZE
self.size = 0
self.lookup = {
} # key: query, value: node
self.linked_list = LinkedList()
def get(self, query)
"""Get the stored query result from the cache. Accessing a node updates its position to the front of the LRU list. """
node = self.lookup[query]
if node is None:
return None
self.linked_list.move_to_front(node)
return node.results
def set(self, results, query):
"""Set the result for the given query key in the cache. When updating an entry, updates its position to the front of the LRU list. If the entry is new and the cache is at capacity, removes the oldest entry before the new entry is added. """
node = self.lookup[query]
if node is not None:
# Key exists in cache, update the value
node.results = results
self.linked_list.move_to_front(node)
else:
# Key does not exist in cache
if self.size == self.MAX_SIZE:
# Remove the oldest entry from the linked list and lookup
self.lookup.pop(self.linked_list.tail.query, None)
self.linked_list.remove_from_tail()
else:
self.size += 1
# Add the new key and value
new_node = Node(query, results)
self.linked_list.append_to_front(new_node)
self.lookup[query] = new_node
缓存更细的条件:
页面内容更改
删除页面或添加新页面
页面排名更改
解决这些最直接的办法是设置一个最大时间
非常感谢阅读到最后,也感谢您的支持。感兴趣的同学可以看一下我的这一篇帖子
系统设计学习(一)Design Pastebin.com (or Bit.ly)
边栏推荐
- VLSM variable length subnet mask partition tips
- GNSS positioning accuracy index calculation
- Fabrication d'un sac à dos simple fairygui
- Basic DOS commands
- [算法] 剑指offer2 golang 面试题13:二维子矩阵的数字之和
- [Chongqing Guangdong education] Shandong University College Physics reference materials
- Fairygui bar subfamily (scroll bar, slider, progress bar)
- 【干货】提升RTK模糊度固定率的建议之周跳探测
- 【GNSS】抗差估计(稳健估计)原理及程序实现
- 地球围绕太阳转
猜你喜欢
Fairygui bar subfamily (scroll bar, slider, progress bar)
Edit distance (multi-source BFS)
FairyGUI簡單背包的制作
面试必备:聊聊分布式锁的多种实现!
【干货】提升RTK模糊度固定率的建议之周跳探测
[算法] 劍指offer2 golang 面試題2:二進制加法
FairyGUI按钮动效的混用
[算法] 剑指offer2 golang 面试题12:左右两边子数组的和相等
XV Function definition and call
Role movement in the first person perspective
随机推荐
记录:动态Web项目servlet访问数据库404错误之解决
FGUI工程打包发布&导入Unity&将UI显示出来的方式
[算法] 剑指offer2 golang 面试题13:二维子矩阵的数字之和
Error: sorting and subscript out of bounds
Basic DOS commands
Fabrication d'un sac à dos simple fairygui
[algorithm] sword finger offer2 golang interview question 9: subarray with product less than k
341. Flatten nested list iterator
Record: newinstance() obsolete replacement method
All in one 1405: sum and product of prime numbers
FairyGUI簡單背包的制作
[algorithm] sword finger offer2 golang interview question 13: sum of numbers of two-dimensional submatrix
【干货】提升RTK模糊度固定率的建议之周跳探测
【RTKLIB 2.4.3 b34 】版本更新简介一
面试必备:聊聊分布式锁的多种实现!
[algorithm] sword finger offer2 golang interview question 3: the number of 1 in the binary form of the first n numbers
The earth revolves around the sun
音乐播放(Toggle && PlayerPrefs)
What are the advantages of using SQL in Excel VBA
Problems and solutions of robust estimation in rtklib single point location spp