当前位置:网站首页>系统设计学习(二)Design a key-value cache to save the results of the most recent web server queries
系统设计学习(二)Design a key-value cache to save the results of the most recent web server queries
2022-07-06 09:19:00 【几何学家】
老规矩先看题
Use cases We’ll scope the problem to handle only the following use cases
User sends a search request resulting in a cache hit
User sends a search request resulting in a cache miss
Service has high availability Constraints and assumptions
State assumptions Traffic is not evenly distributed
Popular queries should almost always be in the cache
Need to determine how to expire/refresh Serving from cache requires fast lookups
Low latency between machines Limited memory in cache
Need to determine what to keep/remove
Need to cache millions of queries 10 million users 10 billion queries per month
计算:
缓存存储的键的有序列表分别为 query, value: results
query - 50 bytes
title - 20 bytes
snippet - 200 bytes
Total: 270 bytes
也就是说单条缓存大概需要270bytes
如果10 billion的用户每条数据都是独一无二的话
可以计算得到一个月的存储消耗为2.7t
4000 requests per second = 01 billion requests per month
也就是说每秒大概四千次请求平均
Create a high level design
因为缓存的容量有限制,所以我们要设置一个过期策略,这里的话可以采用LRU机制
具体设计大概如下:
1.client发送一个请求到webserver,运行反向代理
2.webserver执行查询api
3.查询api会执行下述操作
1)解析语法:
删除标记
将文本分解为术语
修复拼写错误
规范化大写
将查询转换为使用布尔运算
2)检查是否命中缓存:
如果命中:
利用LRU机制更新缓存
返回缓存内容
如果没有命中:
使用Reverse Index Service查找匹配的文档,对查找出来的文档利用匹配对进行排序得出排名,其中排名最高的就是要查找的文档
使用Document Service 返回 titles and snippets
更新缓存的内容,更新LRU
关于LRU的实现其实是一个很好的设计,是用hash+链表的结构设计能够使得其插入和查找的时间复杂度都为常数级别,lc上有一道相关的算法题,忘了的可以看一下,这题Java写起来比较快,cpp要自己写一遍所有的基本操作hhh
https://leetcode.cn/problems/lru-cache/
查询api的实现:
class QueryApi(object):
def __init__(self, memory_cache, reverse_index_service):
self.memory_cache = memory_cache
self.reverse_index_service = reverse_index_service
def parse_query(self, query):
"""Remove markup, break text into terms, deal with typos, normalize capitalization, convert to use boolean operations. """
...
def process_query(self, query):
query = self.parse_query(query)
results = self.memory_cache.get(query)
if results is None:
results = self.reverse_index_service.process_search(query)
self.memory_cache.set(query, results)
return results
节点的实现:
class Node(object):
def __init__(self, query, results):
self.query = query
self.results = results
链表的实现:
class LinkedList(object):
def __init__(self):
self.head = None
self.tail = None
def move_to_front(self, node):
...
def append_to_front(self, node):
...
def remove_from_tail(self):
...
缓存的实现:
class Cache(object):
def __init__(self, MAX_SIZE):
self.MAX_SIZE = MAX_SIZE
self.size = 0
self.lookup = {
} # key: query, value: node
self.linked_list = LinkedList()
def get(self, query)
"""Get the stored query result from the cache. Accessing a node updates its position to the front of the LRU list. """
node = self.lookup[query]
if node is None:
return None
self.linked_list.move_to_front(node)
return node.results
def set(self, results, query):
"""Set the result for the given query key in the cache. When updating an entry, updates its position to the front of the LRU list. If the entry is new and the cache is at capacity, removes the oldest entry before the new entry is added. """
node = self.lookup[query]
if node is not None:
# Key exists in cache, update the value
node.results = results
self.linked_list.move_to_front(node)
else:
# Key does not exist in cache
if self.size == self.MAX_SIZE:
# Remove the oldest entry from the linked list and lookup
self.lookup.pop(self.linked_list.tail.query, None)
self.linked_list.remove_from_tail()
else:
self.size += 1
# Add the new key and value
new_node = Node(query, results)
self.linked_list.append_to_front(new_node)
self.lookup[query] = new_node
缓存更细的条件:
页面内容更改
删除页面或添加新页面
页面排名更改
解决这些最直接的办法是设置一个最大时间
非常感谢阅读到最后,也感谢您的支持。感兴趣的同学可以看一下我的这一篇帖子
系统设计学习(一)Design Pastebin.com (or Bit.ly)
边栏推荐
- Combination of fairygui check box and progress bar
- [算法] 剑指offer2 golang 面试题5:单词长度的最大乘积
- Affichage du changement de valeur du Buff de gain de l'interface graphique de défaillance
- Acwing-116 pilot brother
- 【rtklib】在rtk下使用抗差自适应卡尔曼滤波初步实践
- Halcon knowledge: gray_ Tophat transform and bottom cap transform
- GPS高程拟合抗差中误差的求取代码实现
- [算法] 剑指offer2 golang 面试题13:二维子矩阵的数字之和
- Record: I accidentally wrote a recursion next time
- 错误: 找不到符号
猜你喜欢
FairyGUI按钮动效的混用
Novatel board oem617d configuration step record
Heap sort [handwritten small root heap]
Mixed use of fairygui button dynamics
Basic DOS commands
MySQL 三万字精华总结 + 面试100 问,吊打面试官绰绰有余(收藏系列
[algorithm] sword finger offer2 golang interview question 12: the sum of the left and right sub arrays is equal
Liste des boucles de l'interface graphique de défaillance
堆排序【手写小根堆】
[algorithm] sword finger offer2 golang interview question 5: maximum product of word length
随机推荐
Fairygui gain buff value change display
FairyGUI复选框与进度条的组合使用
抗差估计在rtklib的pntpos函数(标准单点定位spp)中的c代码实现
几道高频的JVM面试题
Employment of cashier [differential constraint]
Particle system for introduction to unity3d Foundation (attribute introduction + case production of flame particle system)
MySQL 三万字精华总结 + 面试100 问,吊打面试官绰绰有余(收藏系列
[算法] 剑指offer2 golang 面试题7:数组中和为0的3个数字
RTKLIB: demo5 b34f.1 vs b33
Chromatic judgement bipartite graph
FairyGUI增益BUFF數值改變的顯示
[algorithm] sword finger offer2 golang interview question 3: the number of 1 in the binary form of the first n numbers
[算法] 劍指offer2 golang 面試題2:二進制加法
[algorithm] sword finger offer2 golang interview question 8: the shortest subarray with a sum greater than or equal to K
isEmpty 和 isBlank 的用法区别
[Chongqing Guangdong education] reference materials for regional analysis and planning of Pingdingshan University
Shortest Hamilton path (pressure DP)
All in one 1405: sum and product of prime numbers
Record: solution of 404 error of servlet accessing database in dynamic web project
[rtklib 2.4.3 B34] version update introduction I