当前位置:网站首页>系统设计学习(二)Design a key-value cache to save the results of the most recent web server queries
系统设计学习(二)Design a key-value cache to save the results of the most recent web server queries
2022-07-06 09:19:00 【几何学家】
老规矩先看题
Use cases We’ll scope the problem to handle only the following use cases
User sends a search request resulting in a cache hit
User sends a search request resulting in a cache miss
Service has high availability Constraints and assumptions
State assumptions Traffic is not evenly distributed
Popular queries should almost always be in the cache
Need to determine how to expire/refresh Serving from cache requires fast lookups
Low latency between machines Limited memory in cache
Need to determine what to keep/remove
Need to cache millions of queries 10 million users 10 billion queries per month
计算:
缓存存储的键的有序列表分别为 query, value: results
query - 50 bytes
title - 20 bytes
snippet - 200 bytes
Total: 270 bytes
也就是说单条缓存大概需要270bytes
如果10 billion的用户每条数据都是独一无二的话
可以计算得到一个月的存储消耗为2.7t
4000 requests per second = 01 billion requests per month
也就是说每秒大概四千次请求平均
Create a high level design
因为缓存的容量有限制,所以我们要设置一个过期策略,这里的话可以采用LRU机制
具体设计大概如下:
1.client发送一个请求到webserver,运行反向代理
2.webserver执行查询api
3.查询api会执行下述操作
1)解析语法:
删除标记
将文本分解为术语
修复拼写错误
规范化大写
将查询转换为使用布尔运算
2)检查是否命中缓存:
如果命中:
利用LRU机制更新缓存
返回缓存内容
如果没有命中:
使用Reverse Index Service查找匹配的文档,对查找出来的文档利用匹配对进行排序得出排名,其中排名最高的就是要查找的文档
使用Document Service 返回 titles and snippets
更新缓存的内容,更新LRU
关于LRU的实现其实是一个很好的设计,是用hash+链表的结构设计能够使得其插入和查找的时间复杂度都为常数级别,lc上有一道相关的算法题,忘了的可以看一下,这题Java写起来比较快,cpp要自己写一遍所有的基本操作hhh
https://leetcode.cn/problems/lru-cache/
查询api的实现:
class QueryApi(object):
def __init__(self, memory_cache, reverse_index_service):
self.memory_cache = memory_cache
self.reverse_index_service = reverse_index_service
def parse_query(self, query):
"""Remove markup, break text into terms, deal with typos, normalize capitalization, convert to use boolean operations. """
...
def process_query(self, query):
query = self.parse_query(query)
results = self.memory_cache.get(query)
if results is None:
results = self.reverse_index_service.process_search(query)
self.memory_cache.set(query, results)
return results
节点的实现:
class Node(object):
def __init__(self, query, results):
self.query = query
self.results = results
链表的实现:
class LinkedList(object):
def __init__(self):
self.head = None
self.tail = None
def move_to_front(self, node):
...
def append_to_front(self, node):
...
def remove_from_tail(self):
...
缓存的实现:
class Cache(object):
def __init__(self, MAX_SIZE):
self.MAX_SIZE = MAX_SIZE
self.size = 0
self.lookup = {
} # key: query, value: node
self.linked_list = LinkedList()
def get(self, query)
"""Get the stored query result from the cache. Accessing a node updates its position to the front of the LRU list. """
node = self.lookup[query]
if node is None:
return None
self.linked_list.move_to_front(node)
return node.results
def set(self, results, query):
"""Set the result for the given query key in the cache. When updating an entry, updates its position to the front of the LRU list. If the entry is new and the cache is at capacity, removes the oldest entry before the new entry is added. """
node = self.lookup[query]
if node is not None:
# Key exists in cache, update the value
node.results = results
self.linked_list.move_to_front(node)
else:
# Key does not exist in cache
if self.size == self.MAX_SIZE:
# Remove the oldest entry from the linked list and lookup
self.lookup.pop(self.linked_list.tail.query, None)
self.linked_list.remove_from_tail()
else:
self.size += 1
# Add the new key and value
new_node = Node(query, results)
self.linked_list.append_to_front(new_node)
self.lookup[query] = new_node
缓存更细的条件:
页面内容更改
删除页面或添加新页面
页面排名更改
解决这些最直接的办法是设置一个最大时间
非常感谢阅读到最后,也感谢您的支持。感兴趣的同学可以看一下我的这一篇帖子
系统设计学习(一)Design Pastebin.com (or Bit.ly)
边栏推荐
- [Chongqing Guangdong education] Shandong University College Physics reference materials
- MySQL 三万字精华总结 + 面试100 问,吊打面试官绰绰有余(收藏系列
- 平衡二叉树详解 通俗易懂
- Fairygui loop list
- Database table splitting strategy
- Pride-pppar source code analysis
- Record: I accidentally wrote a recursion next time
- 雇佣收银员【差分约束】
- XV Function definition and call
- What are the advantages of using SQL in Excel VBA
猜你喜欢
使用rtknavi进行RT-PPP测试
Mysql database reports an error: row size too large (> 8126) Changing some columns to TEXT or BLOB or using ROW_ FORMAT=DY
[algorithm] sword finger offer2 golang interview question 4: numbers that appear only once
编辑距离(多源BFS)
[algorithm] sword finger offer2 golang interview question 10: subarray with sum K
音乐播放(Toggle && PlayerPrefs)
Basic DOS commands
[algorithm] sword finger offer2 golang interview question 9: subarray with product less than k
一文搞定 UDP 和 TCP 高频面试题!
KF UD分解之UD分解基础篇【1】
随机推荐
几道高频的JVM面试题
编辑距离(多源BFS)
Compile GDAL source code with nmake (win10, vs2022)
最短Hamilton路径 (状压DP)
Mixed use of fairygui button dynamics
How to reduce the shutdown time of InnoDB database?
Mysql database reports an error: row size too large (> 8126) Changing some columns to TEXT or BLOB or using ROW_ FORMAT=DY
堆排序【手写小根堆】
[algorithm] sword finger offer2 golang interview question 13: sum of numbers of two-dimensional submatrix
Fgui project packaging and Publishing & importing unity & the way to display the UI
[algorithm] sword finger offer2 golang interview question 10: subarray with sum K
Role movement in the first person perspective
C code implementation of robust estimation in rtklib's pntpos function (standard single point positioning spp)
rtklib单点定位spp使用抗差估计遇到的问题及解决
[算法] 剑指offer2 golang 面试题1:整数除法
Comparative analysis of the execution efficiency of MySQL 5.7 statistical table records
Fairygui loop list
RTKLIB: demo5 b34f. 1 vs b33
Unity3d, Alibaba cloud server, platform configuration
【GNSS】抗差估计(稳健估计)原理及程序实现