System Design Study (1): Design Pastebin.com (or Bit.ly)
2022-07-06 09:19:00 【几何学家】
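Personally, I don't think system design comes up often in interviews in China, but I still find it very useful. Compared with memorizing stock interview answers, it has much more practical value, which is why North American interviews test it frequently (even for new grads). Anyway, without being too utilitarian about it, I want to study this subject, and in the long run it should be very worthwhile.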
Learning how to design scalable systems will help you become a better engineer.
In essence, system design is the process of finding trade-offs. The CAP theorem is a classic example:
Consistency - Every read receives the most recent write or an error
Availability - Every request receives a response, without guarantee that it contains the most recent version of the information
Partition Tolerance - The system continues to operate despite arbitrary partitioning due to network failures
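A distributed system cannot guarantee all three properties at the same time. Because networks are unreliable and partitions cannot be ruled out, partition tolerance is effectively mandatory, so the real trade-off is between consistency and availability.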
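To make the trade-off concrete, here is a toy Python model of two nodes (a primary and a replica) separated by a network partition. It is not a real database: `ToyStore` and `read_after_partition` are hypothetical names, and replication is reduced to a single flag. In CP mode the replica returns an error rather than risk serving stale data; in AP mode it always answers, possibly with an old value.

```python
class ToyStore:
    """Toy model of two nodes: a primary and a replica.

    Writes always land on the primary and are copied to the replica only
    while the network link between them is up (i.e. no partition).
    """

    def __init__(self, mode: str):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.primary = None
        self.replica = None
        self.partitioned = False

    def write(self, value):
        self.primary = value
        if not self.partitioned:
            self.replica = value  # replication only succeeds without a partition

    def read_from_replica(self):
        if self.partitioned and self.mode == "CP":
            # CP: the replica cannot confirm it has the latest write, so it returns an error.
            raise RuntimeError("unavailable during partition")
        # AP: always answer, even if the value may be stale.
        return self.replica


def read_after_partition(mode: str):
    store = ToyStore(mode)
    store.write("v1")
    store.partitioned = True  # the link between the nodes breaks
    store.write("v2")  # reaches the primary only
    try:
        return store.read_from_replica()
    except RuntimeError as err:
        return f"error: {err}"


print("AP:", read_after_partition("AP"))
print("CP:", read_after_partition("CP"))
```

The AP read returns the stale "v1" (available but not consistent), while the CP read returns an error (consistent but not available).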
Design Pastebin.com (or Bit.ly)
We’ll scope the problem to handle only the following use cases:
User enters a block of text and gets a randomly generated link
Expiration: the default setting does not expire; a timed expiration can optionally be set
User enters a paste’s url and views the contents
User is anonymous
Service tracks analytics of pages (monthly visit stats)
Service deletes expired pastes
Service has high availability
Out of scope:
User registers for an account
User verifies email
User logs into a registered account
User edits the document
User can set visibility
User can set the shortlink
Constraints and assumptions:
State assumptions
Traffic is not evenly distributed
Following a short link should be fast
Pastes are text only
Page view analytics do not need to be realtime
10 million users
10 million paste writes per month
100 million paste reads per month
10:1 read to write ratio
Size per paste:
1 KB content per paste
shortlink - 7 bytes
expiration_length_in_minutes - 4 bytes
created_at - 5 bytes
paste_path - 255 bytes
total = ~1.27 KB
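A quick back-of-the-envelope check of these numbers in Python. Two simplifying assumptions are not part of the original constraints: 1 KB = 1000 bytes, and a month has 30 days.

```python
# Back-of-the-envelope estimates from the stated constraints.
# Assumptions: 1 KB = 1000 bytes, and a month has 30 days.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000

field_bytes = {
    "content": 1000,  # 1 KB content per paste
    "shortlink": 7,
    "expiration_length_in_minutes": 4,
    "created_at": 5,
    "paste_path": 255,
}
bytes_per_paste = sum(field_bytes.values())  # 1271 bytes, i.e. ~1.27 KB

writes_per_month = 10_000_000
reads_per_month = 100_000_000

new_storage_gb_per_month = bytes_per_paste * writes_per_month / 1e9  # ~12.7 GB
writes_per_second = writes_per_month / SECONDS_PER_MONTH  # ~3.9 on average
reads_per_second = reads_per_month / SECONDS_PER_MONTH  # ~38.6 on average

print(f"{bytes_per_paste} B/paste, {new_storage_gb_per_month:.2f} GB/month")
print(f"{writes_per_second:.1f} writes/s, {reads_per_second:.1f} reads/s")
```

Since traffic is not evenly distributed, peak rates will be noticeably higher than these monthly averages.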
Use case: User enters a block of text and gets a randomly generated link
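We can use a relational database as a large hash table that maps the generated url to the file containing the paste. A NoSQL store could also serve as the hash table; choosing between a relational and a non-relational database is itself a trade-off. The discussion below assumes a relational database: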
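The Client sends a create paste request to the Web Server, which runs as a reverse proxy
The Web Server forwards the request to the Write API server
The Write API server does the following: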
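1. Generates a unique url
2. Saves a row to the SQL Database pastes table
3. Saves the paste data to the Object Store
4. Returns the url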
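The pastes table could have the following structure:
CREATE TABLE pastes (
    shortlink char(7) NOT NULL,                 -- the short url (hash table key)
    expiration_length_in_minutes int NOT NULL,  -- expiration time
    created_at datetime NOT NULL,               -- creation time
    paste_path varchar(255) NOT NULL,           -- path of the paste in the Object Store, i.e. the hash table's value
    PRIMARY KEY(shortlink)
);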
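To generate the url, hash the user's IP address plus a timestamp with MD5, Base 62-encode the hash, and keep the first 7 characters. 62^7 ≈ 3.5 × 10^12 (about 3.5 trillion) distinct 7-character links, which leaves plenty of room. Because only a 7-character prefix is kept, different inputs can still collide, so the url has to be checked for uniqueness before it is saved.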
url = base_encode(md5(ip_address+timestamp))[:URL_LENGTH]
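A runnable Python sketch of the one-liner above together with the Write API steps. Several assumptions are not from the original: SQLite stands in for the SQL Database and a Python dict for the Object Store; `base_encode`, `generate_url`, `create_paste` and the `pastes/<shortlink>` path layout are hypothetical names; the alphabet ordering is arbitrary; and `0` minutes is used to mean "never expires".

```python
import hashlib
import sqlite3
import string
from datetime import datetime, timezone

# Assumed Base 62 alphabet ordering: 0-9, a-z, A-Z (any fixed ordering works).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
URL_LENGTH = 7


def base_encode(num: int, base: int = 62) -> str:
    """Encode a non-negative integer using the first `base` characters of ALPHABET."""
    if num == 0:
        return ALPHABET[0]
    digits = []
    while num > 0:
        num, remainder = divmod(num, base)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def generate_url(ip_address: str, timestamp: str) -> str:
    """MD5-hash ip_address + timestamp, Base 62-encode it, keep the first 7 characters."""
    digest = hashlib.md5((ip_address + timestamp).encode("utf-8")).digest()
    return base_encode(int.from_bytes(digest, "big"))[:URL_LENGTH]


def create_paste(conn, object_store, ip_address, paste_contents, expiration_length_in_minutes=0):
    """Hypothetical Write API handler; returns the response body as a dict."""
    created_at = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    attempt = 0
    while True:
        # 1. Generate a url; the attempt counter changes the input after a collision.
        shortlink = generate_url(ip_address, f"{created_at}#{attempt}")
        paste_path = f"pastes/{shortlink}"  # hypothetical Object Store key layout
        try:
            # 2. Save the row; the PRIMARY KEY rejects a duplicate shortlink.
            with conn:
                conn.execute(
                    "INSERT INTO pastes VALUES (?, ?, ?, ?)",
                    (shortlink, expiration_length_in_minutes, created_at, paste_path),
                )
            break
        except sqlite3.IntegrityError:
            attempt += 1  # url already taken: generate another one
    # 3. Save the paste data to the Object Store (a dict stands in for it here).
    object_store[paste_path] = paste_contents
    # 4. Return the url.
    return {"shortlink": shortlink}


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pastes (shortlink char(7) NOT NULL, "
    "expiration_length_in_minutes int NOT NULL, created_at datetime NOT NULL, "
    "paste_path varchar(255) NOT NULL, PRIMARY KEY(shortlink))"
)
object_store = {}
first = create_paste(conn, object_store, "203.0.113.7", "Hello World!", 60)
second = create_paste(conn, object_store, "203.0.113.7", "Hello again!")
print(first, second, 62 ** URL_LENGTH)
```

MD5 is used here only to spread inputs over the key space, not for security. Letting the PRIMARY KEY reject duplicates, instead of running a SELECT before the INSERT, avoids a race in which two concurrent requests both see the same url as free; on a collision the attempt counter is mixed into the hash input, so the retry produces a different url.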
We'll expose a public REST API:
$ curl -X POST -H "Content-Type: application/json" \
    --data '{ "expiration_length_in_minutes": "60", "paste_contents": "Hello World!" }' \
    https://pastebin.com/api/v1/paste

Response:

{
    "shortlink": "foobar"
}
Use case: User enters a paste’s url and views the contents
The Client sends a get paste request to the Web Server
The Web Server forwards the request to the Read API server
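The Read API server does the following:
1. Checks the SQL Database for the generated url
2. If the url is in the SQL Database, fetches the paste contents from the Object Store
3. Otherwise, returns an error message to the user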
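A minimal sketch of the read path, assuming SQLite for the SQL Database and a dict for the Object Store (both stand-ins). `get_paste`, the sample rows, and the rule that `0` minutes means "never expires" are hypothetical. It also hides pastes that have expired but have not been deleted yet; that check is a design choice of this sketch rather than something stated in the original.

```python
import sqlite3
from datetime import datetime, timedelta

SCHEMA = """CREATE TABLE pastes (
    shortlink char(7) NOT NULL,
    expiration_length_in_minutes int NOT NULL,
    created_at datetime NOT NULL,
    paste_path varchar(255) NOT NULL,
    PRIMARY KEY(shortlink)
)"""


def get_paste(conn, object_store, shortlink, now):
    """Hypothetical Read API handler; `now` is a naive UTC datetime."""
    # Look up the url in the SQL Database.
    row = conn.execute(
        "SELECT expiration_length_in_minutes, created_at, paste_path "
        "FROM pastes WHERE shortlink = ?",
        (shortlink,),
    ).fetchone()
    if row is None:
        return {"error": "paste not found"}
    minutes, created_at, paste_path = row
    # Hide pastes that expired but were not deleted yet (0 = never expires, an assumption).
    created = datetime.strptime(created_at, "%Y-%m-%d %H:%M:%S")
    if minutes > 0 and created + timedelta(minutes=minutes) <= now:
        return {"error": "paste expired"}
    # Fetch the paste contents from the Object Store.
    return {"paste_contents": object_store[paste_path], "created_at": created_at}


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO pastes VALUES (?, ?, ?, ?)",
    [
        ("abc1234", 60, "2022-07-06 08:00:00", "pastes/abc1234"),
        ("xyz7890", 0, "2022-07-01 00:00:00", "pastes/xyz7890"),
    ],
)
object_store = {"pastes/abc1234": "Hello World!", "pastes/xyz7890": "Never expires"}
now = datetime(2022, 7, 6, 9, 19)
found = get_paste(conn, object_store, "xyz7890", now)
expired = get_paste(conn, object_store, "abc1234", now)
missing = get_paste(conn, object_store, "nope000", now)
print(found, expired, missing)
```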
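Use case: Service tracks analytics of pages
Since page view analytics do not need to be realtime, we can simply run a MapReduce job over the Web Server logs to generate monthly hit counts: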
from mrjob.job import MRJob


class HitCounts(MRJob):
    def extract_url(self, line):
        """Extract the generated url from the log line."""
        # Assumed (hypothetical) log format: "<ISO-8601 timestamp> <url> ..."
        return line.split()[1]

    def extract_year_month(self, line):
        """Return the year and month portions of the timestamp."""
        return line.split()[0][:7]  # e.g. "2016-01-15T10:00:00" -> "2016-01"

    def mapper(self, _, line):
        """Parse each log line and emit key value pairs of the form ((2016-01, url0), 1)."""
        url = self.extract_url(line)
        period = self.extract_year_month(line)
        yield (period, url), 1

    def reducer(self, key, values):
        """Sum values for each key, e.g. ((2016-01, url0), 2)."""
        yield key, sum(values)


if __name__ == '__main__':
    HitCounts.run()
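To see what the job computes without a Hadoop cluster or mrjob installed, the map, shuffle and reduce phases can be simulated in plain Python. The log lines and their format (`<timestamp> <url> ...`) are made up for illustration, and `run_locally` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical log format: "<ISO-8601 timestamp> <url> ..."
LOG_LINES = [
    "2016-01-03T10:00:00 url0 GET",
    "2016-01-15T11:30:00 url0 GET",
    "2016-01-20T08:45:00 url1 GET",
    "2016-02-01T09:00:00 url0 GET",
]


def mapper(line):
    """Emit ((year-month, url), 1) for one log line."""
    timestamp, url = line.split()[:2]
    yield (timestamp[:7], url), 1


def reducer(key, values):
    """Sum the counts for one (year-month, url) key."""
    yield key, sum(values)


def run_locally(lines):
    groups = defaultdict(list)
    for line in lines:  # map phase
        for key, value in mapper(line):
            groups[key].append(value)  # shuffle: group values by key
    results = {}
    for key in sorted(groups):  # reduce phase
        for out_key, total in reducer(key, groups[key]):
            results[out_key] = total
    return results


hit_counts = run_locally(LOG_LINES)
for (period, url), count in hit_counts.items():
    print(period, url, count)
```

On a real cluster the shuffle is done by the framework across machines; only the mapper and reducer are user code, which is why the HitCounts job defines just those two methods.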
Use case: Service deletes expired pastes
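One possible way to implement this, sketched in Python under assumptions that are not in the original: SQLite stands in for the SQL Database, a dict for the Object Store, `delete_expired` and the sample rows are hypothetical, and `0` minutes means "never expires". A periodic job finds rows whose `created_at` plus `expiration_length_in_minutes` lies in the past and deletes them together with their stored contents.

```python
import sqlite3


def delete_expired(conn, object_store, now):
    """Delete expired pastes and their Object Store entries; return the deleted shortlinks.

    `now` is a 'YYYY-MM-DD HH:MM:SS' string. expiration_length_in_minutes = 0 is
    treated as "never expires" (an assumption of this sketch).
    """
    expired = conn.execute(
        "SELECT shortlink, paste_path FROM pastes "
        "WHERE expiration_length_in_minutes > 0 "
        "AND datetime(created_at, '+' || expiration_length_in_minutes || ' minutes') <= datetime(?)",
        (now,),
    ).fetchall()
    with conn:  # commit the deletes as one transaction
        conn.executemany("DELETE FROM pastes WHERE shortlink = ?", [(s,) for s, _ in expired])
    for _, paste_path in expired:
        object_store.pop(paste_path, None)
    return [s for s, _ in expired]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pastes (shortlink char(7) NOT NULL, "
    "expiration_length_in_minutes int NOT NULL, created_at datetime NOT NULL, "
    "paste_path varchar(255) NOT NULL, PRIMARY KEY(shortlink))"
)
rows = [
    ("abc1234", 60, "2022-07-06 08:00:00", "pastes/abc1234"),  # expired at 09:00
    ("def5678", 600, "2022-07-06 08:00:00", "pastes/def5678"),  # expires at 18:00
    ("xyz7890", 0, "2022-07-01 00:00:00", "pastes/xyz7890"),  # never expires
]
conn.executemany("INSERT INTO pastes VALUES (?, ?, ?, ?)", rows)
object_store = {row[3]: "contents" for row in rows}
deleted = delete_expired(conn, object_store, "2022-07-06 09:19:00")
remaining = [r[0] for r in conn.execute("SELECT shortlink FROM pastes ORDER BY shortlink")]
print(deleted, remaining)
```

In practice this would run periodically (for example from a scheduler), so a paste may stay in storage a little longer than its expiration time.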
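Discussing bottlenecks is very important. For example, which problems does putting a load balancer in front of multiple servers solve? What about a CDN? Master-slave replicas? How do you find the trade-offs among them? None of these questions has a standard answer.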