当前位置:网站首页>System design learning (I) design pastebin com (or Bit.ly)
System design learning (I) design pastebin com (or Bit.ly)
2022-07-06 13:08:00 【Geometer】
The idea of this project is to study system design recently , I want to sort out what I learned every time , At the same time, it can also help you better understand what I learned .
In fact, I personally feel that the system design is not often tested in domestic interviews , But personally, I think it's very useful , Compared with eight part essay , It has more practical value , Therefore, in the interview in North America, you often get ( Even if it's new grad),anyway, In the spirit of not being particularly utilitarian , I want to learn this course , It should be beneficial in the long run .
Learning how to design scalable systems will help you become a better engineer.
In fact, the essence of system design should be to find trade off The process of
Just to give you an example , With CAP Take the principle as an example :
Consistency - Every read receives the most recent write or an error
Availability - Every request receives a response, without guarantee that it contains the most recent version of the information
Partition Tolerance - The system continues to operate despite arbitrary partitioning due to network failures
The network cannot meet these three conditions at the same time , So you need to consistency and availability Find one of them trade off
The following link can help you
scalability
Design Pastebin.com (or Bit.ly)
Let's take a look at the title requirements first
We’ll scope the problem to handle only the following use cases:
User enters a block of text and gets a randomly generated link Expiration
Default setting does not expire
Can optionally set a timed expiration
User enters a paste’s url and views the contents
User is anonymous
Service tracks analytics of pages Monthly visit stats
Service deletes expired pastes Service has high availability
Out of scope User:
registers for an account
User verifies email
User logs into a registered account
User edits the document
User can set visibility
User can set the shortlink
Constraints and assumptions:
State assumptions
Traffic is not evenly distributed
Following a short link should be fast
Pastes are text only
Page view analytics do not need to be realtime 10 million
users 10 million paste writes per month 100
million paste reads per month 10:1 read to write ratio
After getting the title, we can get :
1 KB content per paste
shortlink - 7 bytes
expiration_length_in_minutes - 4 bytes
created_at - 5 bytes
paste_path - 255 bytes
total = ~1.27 KB
That is to say, each piece of data will probably consume 1.27KB Resources for
so
It will consume about every month 12.7GB
It can also be calculated that about four write operations are performed per second , Forty read operations
After getting these basic information, we can use it in system design !
Draw what you need outline
The next step is to start the formal design :
Let's look at the first one case:User enters a block of text and gets a randomly generated link Expiration
Therefore, there should be a relational database to store as a hash table url A mapping to a file , But beyond that NoSQL It can also act as a hash table , About whether to use relational or non relational database , I think we still need to find one trade off, The following discussion is based on relational database :
client Send a create request to web server, Run the reverse proxy
web server request write api server
write api server The following operations have been done :
1. Generate a url( Check url Does it exist in hash table , If there is , Regenerate a , otherwise , Storage , If it supports user creation url Words , It can also not be generated url Switch to the user's url, Of course, it also needs to be checked )
2. preservation SQL database
3. Save to Object Store
4. Go back to this url
First, define the table structure of the database
shortlink char(7) NOT NULL//url
expiration_length_in_minutes int NOT NULL// Expiration time
created_at datetime NOT NULL// Creation time
paste_path varchar(255) NOT NULL// route , It's a hash table Of value
PRIMARY KEY(shortlink)
Set the primary key to shortlink Create an index back , The database uses this index to enforce uniqueness , Besides, it can also be found in create_at Create an additional index on the field to speed up the search , And save these data in memory to speed up the search
In order to generate url, have access to ip+ The timestamp is hashing , Reuse Base 62 Encoding , Take before 7 Bits as output ,62^7 Probably able to map 36 Billion relationship
url = base_encode(md5(ip_address+timestamp))[:URL_LENGTH]
Use a common REST API
$ curl -X POST --data '{ "expiration_length_in_minutes": "60", \
"paste_contents": "Hello World!" }' https://pastebin.com/api/v1/paste
Returnee
{
"shortlink": "foobar"
}
Use case: User enters a paste’s url and views the contents
client Send a request to web server
web server call read api
read api Do the following :
Check sql Of url, If url In a relational database , Then we find value, otherwise , Return error
Use case: Service tracks analytics of pages
Because real-time analysis is not required , So we can simply use MapReduce To make statistics
class HitCounts(MRJob):
def extract_url(self, line):
"""Extract the generated url from the log line."""
...
def extract_year_month(self, line):
"""Return the year and month portions of the timestamp."""
...
def mapper(self, _, line):
"""Parse each log line, extract and transform relevant lines. Emit key value pairs of the form: (2016-01, url0), 1 (2016-01, url0), 1 (2016-01, url1), 1 """
url = self.extract_url(line)
period = self.extract_year_month(line)
yield (period, url), 1
def reducer(self, key, values):
"""Sum values for each key. (2016-01, url0), 2 (2016-01, url1), 1 """
yield key, sum(values)
Use case: Service deletes expired pastes
If you want to delete expired data , You need to scan the entire database regularly , Find the expired timestamp , Then delete ( The latter is marked as expired )
The above is the basic implementation of the system
Then we need to consider scalability
About scalability , I think everyone should have his own opinion , But it is very important to keep repeating the test , Find bottlenecks , There are three steps to find a solution to the bottleneck
It is very important to discuss the bottleneck problem , For example, what problems can be solved by adding load balancing of multiple servers ?CDN? Master slave copy ? How to find between these trade off Well ? There is no standard answer to these things
边栏推荐
- 167. Sum of two numbers II - input ordered array - Double pointers
- [Chongqing Guangdong education] Shandong University College Physics reference materials
- TYUT太原理工大学2022软工导论大题汇总
- 面试必备:聊聊分布式锁的多种实现!
- [algorithm] sword finger offer2 golang interview question 9: subarray with product less than k
- Novatel board oem617d configuration step record
- Excel导入,导出功能实现
- Role movement in the first person perspective
- Rt-ppp test using rtknavi
- [算法] 剑指offer2 golang 面试题12:左右两边子数组的和相等
猜你喜欢
系统设计学习(三)Design Amazon‘s sales rank by category feature
[GNSS data processing] Helmert variance component estimation analysis and code implementation
[algorithm] sword finger offer2 golang interview question 10: subarray with sum K
Chromatic judgement bipartite graph
平衡二叉树详解 通俗易懂
XV Function definition and call
TYUT太原理工大学2022“mao gai”必背
闇の連鎖(LCA+树上差分)
3月15号 Go 1.18 正式版发布 了解最新特色以及使用方法
[算法] 剑指offer2 golang 面试题5:单词长度的最大乘积
随机推荐
Introduction pointer notes
The earth revolves around the sun
[algorithm] sword finger offer2 golang interview question 4: numbers that appear only once
Ten minutes to thoroughly master cache breakdown, cache penetration, cache avalanche
[algorithm] sword finger offer2 golang interview question 12: the sum of the left and right sub arrays is equal
[算法] 劍指offer2 golang 面試題2:二進制加法
Tyut Taiyuan University of technology 2022 introduction to software engineering examination question outline
[rtklib] preliminary practice of using robust adaptive Kalman filter under RTK
GPS高程拟合抗差中误差的求取代码实现
162. Find peak - binary search
染色法判定二分图
Realization of the code for calculating the mean square error of GPS Height Fitting
Record: solution of 404 error of servlet accessing database in dynamic web project
TYUT太原理工大学2022数据库大题之E-R图转关系模式
记录:初次cmd启动MySQL拒接访问之解决
PRIDE-PPPAR源码解析
Detailed explanation of balanced binary tree is easy to understand
错误: 找不到符号
继承和多态(下)
[algorithm] sword finger offer2 golang interview question 8: the shortest subarray with a sum greater than or equal to K