当前位置:网站首页>系统设计学习(三)Design Amazon‘s sales rank by category feature
系统设计学习(三)Design Amazon‘s sales rank by category feature
2022-07-06 09:19:00 【几何学家】
老规矩,先看题
Use cases
We’ll scope the problem to handle only the following use case
Service calculates the past week’s most popular products by category
User views the past week’s most popular products by category
Service has high availability
Out of scope
The general e-commerce site:Design components only for calculating sales rank Constraints and assumptions
State assumptions Traffic is not evenly distributed
Items can be in multiple categories
Items cannot change categories
There are no subcategories ie foo/bar/baz Results must be updated hourly
More popular products might need to be updated more frequently
10 million products
1000 categories
1 billion transactions per month
100 billion read requests per month
100:1 read to write ratio
那么接下来的第一步是进行计算,理清楚,如果要进行粗略的使用情况的计算的话
首先看下每笔交易要用到多大的存储空间
creared_at 5bytes
product_at 8bytes
category_id 4bytes
seller_id 8bytes
buyer_id 8bytes
quantity 4bytes
total_price 5bytes
total 40bytes
因为他说一个月1billion次交易,所以每个月要花费40gb去进行存储这些交易国恒
如果三年的话需要1.44tb进行存储
那么完成粗略的计算之后我们要简单的设计一下这个系统
完成了系统轮廓的设计之后我们要进行细节的填充了
首先,既然服务要统计过去一周最受欢迎的商品的话,那他就一定要统计所有的订单的信息,我们可以使用sale api去存储处理好的日志,在Object Store而不是管理我们自己的文件系统
我们假设看到的日志如下
timestamp product_id category_id qty total_price seller_id buyer_id
t1 product1 category1 2 20.00 1 1
t2 product1 category2 2 20.00 2 2
t2 product1 category2 1 10.00 2 3
t3 product2 category1 3 7.00 3 4
t4 product3 category2 7 2.00 4 5
t5 product4 category1 1 5.00 5 6
...
Sales Rank service可以使用sale api打印的日志作为输入数据,然后内部使用mapreduce进行一个分布式日志的统计,最后得出sales_rank等信息存储在一个SQL数据库中
我们将使用一个多部的mapreduce
1.转换数据为 (category, product_id), sum(quantity)
2.进行一个分布式的排序
class SalesRanker(MRJob):
def within_past_week(self, timestamp):
"""Return True if timestamp is within past week, False otherwise."""
...
def mapper(self, _ line):
"""Parse each log line, extract and transform relevant lines. Emit key value pairs of the form: (category1, product1), 2 (category2, product1), 2 (category2, product1), 1 (category1, product2), 3 (category2, product3), 7 (category1, product4), 1 """
timestamp, product_id, category_id, quantity, total_price, seller_id, \
buyer_id = line.split('\t')
if self.within_past_week(timestamp):
yield (category_id, product_id), quantity
def reducer(self, key, value):
"""Sum values for each key. (category1, product1), 2 (category2, product1), 3 (category1, product2), 3 (category2, product3), 7 (category1, product4), 1 """
yield key, sum(values)
def mapper_sort(self, key, value):
"""Construct key to ensure proper sorting. Transform key and value to the form: (category1, 2), product1 (category2, 3), product1 (category1, 3), product2 (category2, 7), product3 (category1, 1), product4 The shuffle/sort step of MapReduce will then do a distributed sort on the keys, resulting in: (category1, 1), product4 (category1, 2), product1 (category1, 3), product2 (category2, 3), product1 (category2, 7), product3 """
category_id, product_id = key
quantity = value
yield (category_id, quantity), product_id
def reducer_identity(self, key, value):
yield key, value
def steps(self):
"""Run the map and reduce steps."""
return [
self.mr(mapper=self.mapper,
reducer=self.reducer),
self.mr(mapper=self.mapper_sort,
reducer=self.reducer_identity),
]
结果是下面的排好序的列表,我们可以将其存入sales_rank表
(category1, 1), product4
(category1, 2), product1
(category1, 3), product2
(category2, 3), product1
(category2, 7), product3
sales_rank表的解构大概如下
id int NOT NULL AUTO_INCREMENT
category_id int NOT NULL
total_sold int NOT NULL
product_id int NOT NULL
PRIMARY KEY(id)
FOREIGN KEY(category_id) REFERENCES Categories(id)
FOREIGN KEY(product_id) REFERENCES Products(id)
我们将在 id 、 category_id 和 product_id 上创建索引以加快查找速度(记录时间而不是扫描整个表)并将数据保存在内存中。
这样的话,当用户想要访问过去一周最受欢迎的产品(每一个类别里面)的时候
client发送请求到webserver,执行反向代理,
webserver直接请求read api,
read api读数据从sql数据库中
我们使用一个公共的rest api
$ curl https://amazon.com/api/v1/popular?category_id=1234
返回:
{
"id": "100",
"category_id": "1234",
"total_sold": "100000",
"product_id": "50",
},
{
"id": "53",
"category_id": "1234",
"total_sold": "90000",
"product_id": "200",
},
{
"id": "75",
"category_id": "1234",
"total_sold": "80000",
"product_id": "3",
},
接下来的话就是对现有的结构进行一个扩展
边栏推荐
- Music playback (toggle & playerprefs)
- Novatel board oem617d configuration step record
- isEmpty 和 isBlank 的用法区别
- C code implementation of robust estimation in rtklib's pntpos function (standard single point positioning spp)
- Prove the time complexity of heap sorting
- FairyGUI增益BUFF数值改变的显示
- [untitled]
- Basic DOS commands
- 面试必备:聊聊分布式锁的多种实现!
- Unity3d, Alibaba cloud server, platform configuration
猜你喜欢
Fairygui character status Popup
The master of double non planning left the real estate company and became a programmer with an annual salary of 25W. There are too many life choices at the age of 25
Dark chain lock (lca+ difference on tree)
[算法] 劍指offer2 golang 面試題2:二進制加法
【GNSS数据处理】赫尔默特(helmert)方差分量估计解析及代码实现
Unity3d, Alibaba cloud server, platform configuration
XV Function definition and call
C code implementation of robust estimation in rtklib's pntpos function (standard single point positioning spp)
[dry goods] cycle slip detection of suggestions to improve the fixed rate of RTK ambiguity
Chromatic judgement bipartite graph
随机推荐
RTKLIB: demo5 b34f. 1 vs b33
国企秋招经验总结
[GNSS] robust estimation (robust estimation) principle and program implementation
[algorithm] sword finger offer2 golang interview question 12: the sum of the left and right sub arrays is equal
Fairygui bar subfamily (scroll bar, slider, progress bar)
记录:下一不小心写了个递归
[算法] 剑指offer2 golang 面试题6:排序数组中的两个数字之和
[算法] 剑指offer2 golang 面试题7:数组中和为0的3个数字
FairyGUI循環列錶
记录:newInstance()过时的代替方法
C code implementation of robust estimation in rtklib's pntpos function (standard single point positioning spp)
Implementation of Excel import and export functions
GPS高程拟合抗差中误差的求取代码实现
GNSS positioning accuracy index calculation
[算法] 剑指offer2 golang 面试题8:和大于或等于k的最短子数组
[算法] 剑指offer2 golang 面试题12:左右两边子数组的和相等
Role movement in the first person perspective
Fabrication d'un sac à dos simple fairygui
Rt-ppp test using rtknavi
FGUI工程打包发布&导入Unity&将UI显示出来的方式