System Design Learning (3): Design Amazon's sales rank by category feature
2022-07-06 09:19:00 【几何学家】
As usual, let's look at the problem statement first.
Use cases
We'll scope the problem to handle only the following use cases:
Service calculates the past week’s most popular products by category
User views the past week’s most popular products by category
Service has high availability
Out of scope
The general e-commerce site: design components only for calculating sales rank
Constraints and assumptions
State assumptions
Traffic is not evenly distributed
Items can be in multiple categories
Items cannot change categories
There are no subcategories, i.e. foo/bar/baz
Results must be updated hourly
More popular products might need to be updated more frequently
10 million products
1000 categories
1 billion transactions per month
100 billion read requests per month
100:1 read to write ratio
The first step is to run some rough back-of-the-envelope usage calculations.
First, let's see how much storage each transaction requires:
created_at 5 bytes
product_id 8 bytes
category_id 4 bytes
seller_id 8 bytes
buyer_id 8 bytes
quantity 4 bytes
total_price 5 bytes
total ~40 bytes
At 1 billion transactions per month, that comes to 40 GB of new transaction content per month,
or about 1.44 TB of new transaction content over 3 years.
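To sanity-check these numbers, here is a quick back-of-the-envelope calculation in Python; the ~2.5 million seconds per month figure is a common estimation rule of thumb, not something stated above:

BYTES_PER_TRANSACTION = 40          # ~40 bytes per transaction (rounded)
TRANSACTIONS_PER_MONTH = 1_000_000_000
READS_PER_MONTH = 100_000_000_000
SECONDS_PER_MONTH = 2_500_000       # rough rule of thumb

storage_per_month_gb = BYTES_PER_TRANSACTION * TRANSACTIONS_PER_MONTH / 1e9
print(storage_per_month_gb)                         # 40.0 GB per month
print(storage_per_month_gb * 36 / 1000)             # 1.44 TB over 3 years
print(TRANSACTIONS_PER_MONTH // SECONDS_PER_MONTH)  # ~400 transactions per second
print(READS_PER_MONTH // SECONDS_PER_MONTH)         # ~40,000 read requests per second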
With the rough calculations done, the next step is to sketch a high-level design of the system.
Once the outline is in place, we can fill in the details.
First, since the service computes the past week's most popular products per category, it needs to process every order. We can have the Sales API write its processed logs to a managed Object Store such as Amazon S3, rather than managing our own file system.
Assume the log entries are tab delimited and look like this:
timestamp product_id category_id qty total_price seller_id buyer_id
t1 product1 category1 2 20.00 1 1
t2 product1 category2 2 20.00 2 2
t2 product1 category2 1 10.00 2 3
t3 product2 category1 3 7.00 3 4
t4 product3 category2 7 2.00 4 5
t5 product4 category1 1 5.00 5 6
...
The Sales Rank Service can take the logs emitted by the Sales API as input, tally them internally with a distributed MapReduce job, and store the resulting sales_rank data in a SQL database.
We'll use a multi-step MapReduce:
1. Transform the data to (category, product_id), sum(quantity)
2. Perform a distributed sort
from mrjob.job import MRJob
from mrjob.step import MRStep


class SalesRanker(MRJob):

    def within_past_week(self, timestamp):
        """Return True if timestamp is within past week, False otherwise."""
        ...

    def mapper(self, _, line):
        """Parse each log line, extract and transform relevant lines.

        Emit key value pairs of the form:
        (category1, product1), 2
        (category2, product1), 2
        (category2, product1), 1
        (category1, product2), 3
        (category2, product3), 7
        (category1, product4), 1
        """
        timestamp, product_id, category_id, quantity, total_price, \
            seller_id, buyer_id = line.split('\t')
        if self.within_past_week(timestamp):
            # Cast quantity to int so the reducer can sum it
            yield (category_id, product_id), int(quantity)

    def reducer(self, key, values):
        """Sum values for each key.

        (category1, product1), 2
        (category2, product1), 3
        (category1, product2), 3
        (category2, product3), 7
        (category1, product4), 1
        """
        yield key, sum(values)

    def mapper_sort(self, key, value):
        """Construct key to ensure proper sorting.

        Transform key and value to the form:
        (category1, 2), product1
        (category2, 3), product1
        (category1, 3), product2
        (category2, 7), product3
        (category1, 1), product4

        The shuffle/sort step of MapReduce will then do a distributed
        sort on the keys, resulting in:
        (category1, 1), product4
        (category1, 2), product1
        (category1, 3), product2
        (category2, 3), product1
        (category2, 7), product3
        """
        category_id, product_id = key
        quantity = value
        yield (category_id, quantity), product_id

    def reducer_identity(self, key, values):
        # Pass each value through unchanged; the sorting happened in shuffle/sort
        for value in values:
            yield key, value

    def steps(self):
        """Run the map and reduce steps."""
        return [
            MRStep(mapper=self.mapper,
                   reducer=self.reducer),
            MRStep(mapper=self.mapper_sort,
                   reducer=self.reducer_identity),
        ]


if __name__ == '__main__':
    SalesRanker.run()
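With the __main__ guard above, the job can be run locally using mrjob's default inline runner; the log file name here is just an illustration:

$ python sales_ranker.py sales_log.tsv

Passing -r hadoop or -r emr to the same script would run it on a Hadoop cluster or on Amazon Elastic MapReduce instead.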
The result would be the following sorted list, which we can insert into the sales_rank table:
(category1, 1), product4
(category1, 2), product1
(category1, 3), product2
(category2, 3), product1
(category2, 7), product3
The sales_rank table structure looks roughly like this:
id int NOT NULL AUTO_INCREMENT
category_id int NOT NULL
total_sold int NOT NULL
product_id int NOT NULL
PRIMARY KEY(id)
FOREIGN KEY(category_id) REFERENCES Categories(id)
FOREIGN KEY(product_id) REFERENCES Products(id)
We'll create an index on id, category_id, and product_id to speed up lookups (logarithmic time instead of scanning the entire table) and to keep the data in memory.
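Written out as MySQL-flavored DDL, the table and its secondary indexes might look like the sketch below; the index names and the composite (category_id, total_sold) index are assumptions chosen to serve the popular-by-category query, not part of the original schema:

CREATE TABLE sales_rank (
    id INT NOT NULL AUTO_INCREMENT,
    category_id INT NOT NULL,
    total_sold INT NOT NULL,
    product_id INT NOT NULL,
    PRIMARY KEY (id),
    FOREIGN KEY (category_id) REFERENCES Categories(id),
    FOREIGN KEY (product_id) REFERENCES Products(id)
);

-- Secondary indexes so category lookups avoid a full table scan (names illustrative)
CREATE INDEX idx_sales_rank_category ON sales_rank (category_id, total_sold);
CREATE INDEX idx_sales_rank_product ON sales_rank (product_id);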
Then, when a user wants to view the past week's most popular products in a given category:
The client sends a request to the web server, which acts as a reverse proxy
The web server forwards the request to the Read API
The Read API reads the data from the SQL database
We'll use a public REST API:
$ curl https://amazon.com/api/v1/popular?category_id=1234
Response:
{
    "id": "100",
    "category_id": "1234",
    "total_sold": "100000",
    "product_id": "50",
},
{
    "id": "53",
    "category_id": "1234",
    "total_sold": "90000",
    "product_id": "200",
},
{
    "id": "75",
    "category_id": "1234",
    "total_sold": "80000",
    "product_id": "3",
},
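As an illustration of how the Read API might serve this endpoint, here is a minimal sketch; Flask, the sqlite3 driver, the sales_rank.db file name, and the LIMIT 10 cutoff are all assumptions made for the example, not part of the original design:

import sqlite3

from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route('/api/v1/popular')
def popular():
    """Return the past week's most popular products for a category."""
    category_id = request.args.get('category_id')
    # sales_rank.db is a hypothetical local database holding the sales_rank table
    conn = sqlite3.connect('sales_rank.db')
    rows = conn.execute(
        'SELECT id, category_id, total_sold, product_id '
        'FROM sales_rank WHERE category_id = ? '
        'ORDER BY total_sold DESC LIMIT 10',
        (category_id,)).fetchall()
    conn.close()
    return jsonify([
        {'id': r[0], 'category_id': r[1], 'total_sold': r[2], 'product_id': r[3]}
        for r in rows
    ])


if __name__ == '__main__':
    app.run()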
The next step would be to scale the existing design.