当前位置:网站首页>System design learning (III) design Amazon's sales rank by category feature
System design learning (III) design Amazon's sales rank by category feature
2022-07-06 13:09:00 【Geometer】
Old rules , First look at the question
Use cases
We’ll scope the problem to handle only the following use case
Service calculates the past week’s most popular products by category
User views the past week’s most popular products by category
Service has high availability
Out of scope
The general e-commerce site:Design components only for calculating sales rank Constraints and assumptions
State assumptions Traffic is not evenly distributed
Items can be in multiple categories
Items cannot change categories
There are no subcategories ie foo/bar/baz Results must be updated hourly
More popular products might need to be updated more frequently
10 million products
1000 categories
1 billion transactions per month
100 billion read requests per month
100:1 read to write ratio
Then the first step is to calculate , Clear understanding , If you want to calculate the rough usage
First, let's see how much storage space each transaction needs
creared_at 5bytes
product_at 8bytes
category_id 4bytes
seller_id 8bytes
buyer_id 8bytes
quantity 4bytes
total_price 5bytes
total 40bytes
Because he said one month 1billion Transactions , So every month it costs 40gb To store these transactions Guoheng
If it takes three years 1.44tb For storage
Then after completing the rough calculation, we should simply design this system 
After completing the design of the system outline, we need to fill in the details
First , Since the service needs to count the most popular goods in the past week , Then he must count the information of all orders , We can use sale api Store the processed logs , stay Object Store Instead of managing our own file system
We assume that the log we see is as follows
timestamp product_id category_id qty total_price seller_id buyer_id
t1 product1 category1 2 20.00 1 1
t2 product1 category2 2 20.00 2 2
t2 product1 category2 1 10.00 2 3
t3 product2 category1 3 7.00 3 4
t4 product3 category2 7 2.00 4 5
t5 product4 category1 1 5.00 5 6
...
Sales Rank service have access to sale api Print the log as input data , Then use it internally mapreduce Conduct a distributed log statistics , At last, it comes to sales_rank Such information is stored in a SQL In the database
We will use a multipart mapreduce
1. Convert data to (category, product_id), sum(quantity)
2. Do a distributed sort
class SalesRanker(MRJob):
def within_past_week(self, timestamp):
"""Return True if timestamp is within past week, False otherwise."""
...
def mapper(self, _ line):
"""Parse each log line, extract and transform relevant lines. Emit key value pairs of the form: (category1, product1), 2 (category2, product1), 2 (category2, product1), 1 (category1, product2), 3 (category2, product3), 7 (category1, product4), 1 """
timestamp, product_id, category_id, quantity, total_price, seller_id, \
buyer_id = line.split('\t')
if self.within_past_week(timestamp):
yield (category_id, product_id), quantity
def reducer(self, key, value):
"""Sum values for each key. (category1, product1), 2 (category2, product1), 3 (category1, product2), 3 (category2, product3), 7 (category1, product4), 1 """
yield key, sum(values)
def mapper_sort(self, key, value):
"""Construct key to ensure proper sorting. Transform key and value to the form: (category1, 2), product1 (category2, 3), product1 (category1, 3), product2 (category2, 7), product3 (category1, 1), product4 The shuffle/sort step of MapReduce will then do a distributed sort on the keys, resulting in: (category1, 1), product4 (category1, 2), product1 (category1, 3), product2 (category2, 3), product1 (category2, 7), product3 """
category_id, product_id = key
quantity = value
yield (category_id, quantity), product_id
def reducer_identity(self, key, value):
yield key, value
def steps(self):
"""Run the map and reduce steps."""
return [
self.mr(mapper=self.mapper,
reducer=self.reducer),
self.mr(mapper=self.mapper_sort,
reducer=self.reducer_identity),
]
The result is an ordered list below , We can deposit it in sales_rank surface
(category1, 1), product4
(category1, 2), product1
(category1, 3), product2
(category2, 3), product1
(category2, 7), product3
sales_rank The deconstruction of the table is as follows
id int NOT NULL AUTO_INCREMENT
category_id int NOT NULL
total_sold int NOT NULL
product_id int NOT NULL
PRIMARY KEY(id)
FOREIGN KEY(category_id) REFERENCES Categories(id)
FOREIGN KEY(product_id) REFERENCES Products(id)
We will be in id 、 category_id and product_id Create an index on to speed up the search ( Record the time instead of scanning the entire table ) And keep the data in memory .
In this case , When users want to access the most popular products of the past week ( In each category ) When
client Send a request to webserver, Perform reverse proxy ,
webserver Direct request read api,
read api Read data from sql In the database
We use a public rest api
$ curl https://amazon.com/api/v1/popular?category_id=1234
return :
{
"id": "100",
"category_id": "1234",
"total_sold": "100000",
"product_id": "50",
},
{
"id": "53",
"category_id": "1234",
"total_sold": "90000",
"product_id": "200",
},
{
"id": "75",
"category_id": "1234",
"total_sold": "80000",
"product_id": "3",
},
The next step is to extend the existing structure 
边栏推荐
- How to reduce the shutdown time of InnoDB database?
- Iterable、Collection、List 的常见方法签名以及含义
- String class
- [algorithm] sword finger offer2 golang interview question 7: 3 numbers with 0 in the array
- 继承和多态(下)
- [algorithm] sword finger offer2 golang interview question 5: maximum product of word length
- Dark chain lock (lca+ difference on tree)
- Small exercise of library management system
- [dry goods] cycle slip detection of suggestions to improve the fixed rate of RTK ambiguity
- Error: sorting and subscript out of bounds
猜你喜欢

MYSQL索引钟B-TREE ,B+TREE ,HASH索引之间的区别和应用场景

系统设计学习(三)Design Amazon‘s sales rank by category feature

2022国赛Re1 baby_tree

基本Dos命令

阿里云微服务(二) 分布式服务配置中心以及Nacos的使用场景及实现介绍
![[algorithm] sword finger offer2 golang interview question 4: numbers that appear only once](/img/f7/23ffc81ec8e9161c15d863c1a67916.png)
[algorithm] sword finger offer2 golang interview question 4: numbers that appear only once

TYUT太原理工大学2022“mao gai”必背

MySQL shutdown is slow

Redis介绍与使用

国企秋招经验总结
随机推荐
Edit distance (multi-source BFS)
Abstract classes and interfaces
[算法] 剑指offer2 golang 面试题3:前n个数字二进制形式中1的个数
Code example of MATLAB reading GNSS observation value o file
系统设计学习(三)Design Amazon‘s sales rank by category feature
系统设计学习(一)Design Pastebin.com (or Bit.ly)
面渣逆袭:Redis连环五十二问,三万字+八十图详解。
【快趁你舍友打游戏,来看道题吧】
[algorithm] sword finger offer2 golang interview question 3: the number of 1 in the binary form of the first n numbers
Iterable、Collection、List 的常见方法签名以及含义
Usage differences between isempty and isblank
The earth revolves around the sun
The port is occupied because the service is not shut down normally
TYUT太原理工大学2022“mao gai”必背
Record: I accidentally wrote a recursion next time
初识C语言(上)
Problems and solutions of robust estimation in rtklib single point location spp
阿里云微服务(二) 分布式服务配置中心以及Nacos的使用场景及实现介绍
Excel导入,导出功能实现
Record: the solution of MySQL denial of access when CMD starts for the first time