
System design: File Hosting Service

2022-06-24 02:54:00 Xiaochengxin post station

Requirements

Let's design a file hosting service such as Dropbox or Google Drive. Cloud file storage allows users to store their data on remote servers. These servers are usually maintained by cloud storage providers and made available to users over a network (typically the Internet). Users pay a monthly fee for cloud data storage. Similar services: OneDrive, Google Drive.

Difficulty level: Medium

1. Why cloud storage?

Cloud file storage services have become very popular recently because they simplify the storage and exchange of digital assets across multiple devices. The shift from using a single PC to using multiple devices with different platforms and operating systems, such as smartphones and tablets, each accessible at any time from different geographical locations, is considered the main reason for the enormous popularity of cloud storage services. Here are some of the top benefits of such services:

Availability: The motto of cloud storage services is to make data available anytime, anywhere. Users can access their files/photos from any device, anywhere, at any time.

Reliability and durability: Another benefit of cloud storage is that it promises 100% reliability and durability of data. By keeping multiple copies of the data on servers in different geographical locations, cloud storage ensures that users never lose their data.

Scalability: Users never have to worry about running out of storage space. With cloud storage, you can have as much space as you need, as long as you are willing to pay for it.

If you haven't used dropbox.com before, it is highly recommended to create an account there, upload/edit a file, and explore the different options their service offers. This will help you understand the concepts in this article.

2. System requirements and objectives

You should clarify the requirements at the beginning of the interview. Be sure to ask questions to find out the exact scope of the system the interviewer has in mind.

What do we wish to achieve with a cloud storage system? Here are the top-level requirements of our system:

1. Users should be able to upload and download their files/photos from any device.

2. Users should be able to share files or folders with other users.

3. Our service should support automatic synchronization between devices: after a file is updated on one device, it should be synchronized on all devices.

4. The system should support storing large files, up to a few GB.

5. ACID properties are required: atomicity, consistency, isolation, and durability of all file operations should be guaranteed.

6. Our system should support offline editing. Users should be able to add/delete/modify files while offline, and once they come online, all changes should be synchronized to the remote servers and other online devices.

Extended requirements

• The system should support snapshots of the data, so that users can go back to any version of their files.

3. Some design considerations

• We should expect huge read and write volumes.

• The read-to-write ratio is expected to be nearly the same.

• Internally, files can be stored in small parts or chunks (say 4MB); this provides a significant benefit: any failed operation needs to be retried only for the affected chunks. If a user's upload fails, only the failed chunks are retried.

• We can reduce the amount of data exchanged by transferring only the chunks that were updated.

• By removing duplicate chunks, we can save storage space and bandwidth.

• Keeping a local copy of the metadata (file name, size, etc.) on the client can save us a lot of round trips to the server.

• For small changes, clients can intelligently upload only the diffs instead of whole chunks, i.e., incremental uploads.

4. Capacity estimation and constraints

• Suppose we have 500 million total users and 100 million daily active users (DAU).

• Suppose that, on average, each user connects from three different devices.

• If, on average, a user has 200 files/photos, we will have 100 billion files.

• Assume an average file size of 100KB; this gives us 10 PB of total storage.

100 billion × 100KB => 10 PB

• We also assume we will have one million active connections per minute.
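The arithmetic behind these estimates can be checked with a few lines of Python (the figures are simply the assumptions stated in the bullets above):

```python
# Back-of-the-envelope capacity check for the estimates above.
total_users = 500_000_000        # 500 million total users
files_per_user = 200             # average files/photos per user
avg_file_size_bytes = 100_000    # 100KB average file size

total_files = total_users * files_per_user          # expected: 100 billion
total_storage_bytes = total_files * avg_file_size_bytes

print(f"total files: {total_files:,}")
print(f"total storage: {total_storage_bytes / 10**15:.0f} PB")  # 10 PB
```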

5. High-level design

Users will specify a folder as the workspace on their device. Any file/photo/folder placed in this folder will be uploaded to the cloud, and whenever a file is modified or deleted, the change will be reflected in cloud storage in the same way. Users can specify similar workspaces on all their devices, and any change made on one device will be propagated to all the other devices, so that they see the same workspace view everywhere.

At a high level, we need to store files along with their metadata, such as file name, file size, directory, and who the file is shared with. So we need servers that help clients upload/download files to/from cloud storage, and servers that update file and user metadata. We also need a mechanism to notify all clients whenever an update occurs, so that they can synchronize their files.

As shown in the figure below, block servers work with clients to upload/download files from cloud storage, and metadata servers keep the files' metadata updated in a SQL or NoSQL database. Synchronization servers handle the workflow of notifying all clients about the different changes so they can synchronize their files.

6. Component design

Let's go through the major components of our system one by one:

A. Client

The client application monitors the workspace folder on the user's machine and synchronizes all files/folders in it with the remote cloud storage. The client application works with the storage servers to upload, download, and modify the actual files in backend cloud storage. The client also interacts with the remote synchronization service, which handles any file metadata updates, such as changes to file name, size, modification date, etc.

Here are some essential client operations:

1. Upload and download files.

2. Detect file changes in the workspace folder.

3. Handle conflicts due to offline or concurrent updates.

How do we handle file transfers efficiently?

As mentioned above, we can break each file into smaller chunks so that we transfer only the modified chunks, not the whole file. Let's say we divide each file into fixed-size chunks of 4MB. We can statically calculate the optimal chunk size based on 1) the storage devices we use in the cloud, to optimize space utilization and input/output operations per second (IOPS); 2) network bandwidth; 3) the average file size in our storage; etc. In our metadata, we should also keep a record of each file and the chunks that constitute it.
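A minimal sketch of such a chunker in Python, using the 4MB fixed chunk size discussed above (the function name and the use of SHA-256 as a chunk identifier are assumptions for illustration):

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # the fixed 4MB chunk size discussed above

def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split a file's bytes into fixed-size chunks, each keyed by its SHA-256 hash.

    The (hash, bytes) pairs let metadata record which chunks constitute the file,
    and let a failed upload retry only the chunks that did not make it.
    """
    return [(hashlib.sha256(data[i:i + chunk_size]).hexdigest(),
             data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]
```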

Should we keep a copy of the metadata on the client?

Keeping a local copy of the metadata not only enables us to do offline updates, it also saves a lot of round trips to update the remote metadata.

How can clients efficiently listen for changes made by other clients?

One solution could be for clients to periodically check with the server for any changes. The problem with this approach is that there will be a delay in reflecting changes locally, since clients check for changes periodically rather than the server notifying them whenever a change happens. If the client checks the server frequently for changes, it will not only waste bandwidth, since the server has to return an empty response most of the time, but it will also keep the server busy. Pulling information in this manner is not scalable.

A solution to the above problem could be to use HTTP long polling.

With long polling, the client requests information from the server, expecting that the server may not respond immediately. If the server has no new data for the client when the poll is received, instead of sending an empty response, it holds the request open and waits until response information becomes available. Once it does have new information, the server immediately sends an HTTP/S response to the client, completing the open HTTP/S request. Upon receiving the server's response, the client can immediately issue another request to the server for future updates.
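The client side of this loop can be sketched as follows. The `fetch_updates` callable stands in for the blocking HTTP request to a hypothetical polling endpoint (the function and parameter names are assumptions, not a real API):

```python
def long_poll(fetch_updates, max_rounds=3):
    """Long-polling loop: `fetch_updates()` blocks (server-side) until changes
    are available or a timeout elapses, then returns a possibly-empty list.
    The client re-issues the request immediately after each response."""
    received = []
    for _ in range(max_rounds):
        updates = fetch_updates()   # blocks until the server responds
        if updates:
            received.extend(updates)
        # loop back and poll again right away; a real client would also
        # handle network errors and request timeouts here
    return received

# Stand-in for three server responses: two empty timeouts, one with data.
responses = iter([[], ["file_a changed"], []])
print(long_poll(lambda: next(responses)))  # ['file_a changed']
```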

Based on the above considerations, we can divide our client into the following four parts:

I. Internal Metadata Database: keeps track of all files, chunks, their versions, and their locations in the file system.

II. Chunker: splits files into smaller chunks. It is also responsible for reconstructing a file from its chunks. Our chunking algorithm will detect the parts of a file that have been modified by the user and transfer only those parts to cloud storage; this saves us bandwidth and synchronization time.

III. Watcher: monitors the local workspace folder and notifies the Indexer (discussed below) of any action performed by the user, e.g., when a user creates, deletes, or updates a file or folder. The Watcher also listens for any changes happening on other clients that are broadcast by the synchronization service.

IV. Indexer: processes the events received from the Watcher and updates the internal metadata database with information about the modified file chunks. Once the chunks are successfully committed/downloaded to cloud storage, the Indexer communicates with the remote synchronization service to broadcast the changes to other clients and update the remote metadata database.

How should clients handle slow servers?

If a server is busy or not responding, clients should back off exponentially. That is, if the server is too slow to respond, the client should delay its retries, and the delay should increase exponentially.
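Exponential backoff can be sketched as below. The added random jitter (a common refinement, not stated in the text above) prevents many clients from retrying in lockstep; the `sleep` parameter is injectable so the delays can be inspected in tests:

```python
import random
import time

def retry_with_backoff(operation, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Retry `operation`; after each failure wait base_delay * 2**attempt
    (plus a little random jitter), so delays grow exponentially."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```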

Should mobile clients synchronize remote changes immediately?

Unlike desktop or web clients, mobile clients usually synchronize on demand to save the user's bandwidth and space.

B. Metadata database

The metadata database is responsible for maintaining versioning and metadata information about files/chunks, users, and workspaces. It can be a relational database such as MySQL, or a NoSQL database service such as DynamoDB. Regardless of the type of database, the synchronization service should be able to provide a consistent view of the files via the database, especially when more than one user works on the same file simultaneously. Since NoSQL data stores do not support ACID properties in favor of scalability and performance, we would need to incorporate ACID support programmatically into the logic of the synchronization service if we choose such a database. However, using a relational database can simplify the implementation of the synchronization service, since relational databases support ACID properties natively.

The metadata database should store information about the following objects:

1. Chunks

2. Files

3. Users

4. Devices

5. Workspaces (sync folders)
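To make the object types above concrete, here is a minimal illustrative schema expressed as SQLite DDL (all table and column names are assumptions for this sketch, not the actual schema of any real service):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users      (user_id INTEGER PRIMARY KEY, email TEXT UNIQUE);
CREATE TABLE devices    (device_id INTEGER PRIMARY KEY,
                         user_id INTEGER REFERENCES users);
CREATE TABLE workspaces (workspace_id INTEGER PRIMARY KEY,
                         user_id INTEGER REFERENCES users);
CREATE TABLE files      (file_id INTEGER PRIMARY KEY,
                         workspace_id INTEGER REFERENCES workspaces,
                         name TEXT, size INTEGER, version INTEGER);
-- chunks are global so duplicates can be shared across files (deduplication)
CREATE TABLE chunks     (chunk_hash TEXT PRIMARY KEY, size INTEGER);
-- which chunks constitute a file, and in what order
CREATE TABLE file_chunks(file_id INTEGER REFERENCES files, seq INTEGER,
                         chunk_hash TEXT REFERENCES chunks,
                         PRIMARY KEY (file_id, seq));
""")
```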

C. Synchronization service

The synchronization service is the component that processes file updates made by a client and applies these changes to the other subscribed clients. It also synchronizes clients' local databases with the information stored in the remote metadata database. The synchronization service is the most important part of the system architecture because of its critical role in managing the metadata and synchronizing users' files. Desktop clients communicate with the synchronization service either to get updates from cloud storage, or to send files and updates to cloud storage and, potentially, to other users. If a client was offline for a period of time, it polls the system for new updates as soon as it comes online. When the synchronization service receives an update request, it checks with the metadata database for consistency and then proceeds with the update. Subsequently, a notification is sent to all subscribed users or devices to report the file update.

The synchronization service should be designed to transmit less data between clients and cloud storage, to achieve better response times. To meet this design goal, the synchronization service can employ a differencing algorithm to reduce the amount of data that needs to be synchronized. Instead of transmitting an entire file from the client to the server, or vice versa, we can transmit just the difference between two versions of the file; only the part of the file that has changed is transferred. This also decreases end-user bandwidth consumption and cloud data storage. As described above, we will divide our files into 4MB chunks and transfer only the modified chunks. The server and clients can calculate a hash (e.g., SHA-256) to see whether a local copy of a chunk needs updating. On the server, if we already have a chunk with the same hash (even from another user), we don't need to create another copy; we can use the same chunk. This is discussed in detail later under data deduplication.
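A sketch of the differencing idea using per-chunk SHA-256 hashes, as described above (function names and the tiny 10-byte chunk size in the example are purely illustrative):

```python
import hashlib

def chunk_hashes(data: bytes, chunk_size: int) -> list:
    """Per-chunk SHA-256 hashes for a file's contents."""
    return [hashlib.sha256(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)]

def changed_chunks(old: bytes, new: bytes, chunk_size: int) -> list:
    """Indices of chunks that differ between two versions of a file.

    Only these chunks need to be transferred; identical chunks are skipped."""
    old_h = chunk_hashes(old, chunk_size)
    new_h = chunk_hashes(new, chunk_size)
    return [i for i, h in enumerate(new_h)
            if i >= len(old_h) or h != old_h[i]]

# Only the second chunk differs, so only chunk 1 would be re-uploaded:
print(changed_chunks(b"A" * 10 + b"B" * 10, b"A" * 10 + b"C" * 10, chunk_size=10))  # [1]
```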

To provide an efficient and scalable synchronization protocol, we can consider using communication middleware between clients and the synchronization service. The messaging middleware should provide scalable message queuing and change notification to support a high number of clients using pull or push strategies. This way, multiple synchronization service instances can receive requests from a global request queue, and the communication middleware will be able to balance their load.

D. Message queuing service

An important part of our architecture is the messaging middleware, which should be able to handle a substantial number of requests. A scalable message queuing service that supports asynchronous message-based communication between clients and the synchronization service best fits the requirements of our application. The message queuing service supports asynchronous and loosely coupled message-based communication between the distributed components of the system. It should be able to efficiently store any number of messages in highly available, reliable, and scalable queues.

The message queuing service will implement two types of queues in our system. The request queue is a global queue shared by all clients: clients' requests to update the metadata database are first sent to the request queue, from which the synchronization service picks them up to update the metadata. The response queues, corresponding to individual subscribed clients, are responsible for delivering the update messages to each client. Since a message is deleted from a queue once a client receives it, we need to create a separate response queue for each subscribed client to share the update messages.
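The two-queue layout can be mimicked in-process with Python's `queue` module (a toy stand-in for real middleware; the function names are assumptions):

```python
import queue

request_queue = queue.Queue()   # global request queue, shared by all clients
response_queues = {}            # one response queue per subscribed client

def subscribe(client_id):
    response_queues[client_id] = queue.Queue()

def publish_update(origin_client, update):
    # Fan out a copy to every subscriber except the originator: since a
    # message is deleted from a queue once read, each client needs its own.
    for client_id, q in response_queues.items():
        if client_id != origin_client:
            q.put(update)

for c in ("A", "B", "C"):
    subscribe(c)
publish_update("A", {"file": "doc.txt", "version": 2})
print(response_queues["B"].get_nowait())  # B and C each receive a copy; A does not
```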

E. Cloud/block storage

Cloud/block storage stores the chunks of files uploaded by users. Clients directly interact with the storage to send and receive objects from it. Separating the metadata from the storage enables us to use any storage, either in the cloud or in-house.

7. File processing workflow

The sequence below shows the interaction between the components of the application when Client A updates a file that is shared with Clients B and C, so they should receive the update too. If the other clients are not online at the time of the update, the message queuing service keeps the update notifications in separate response queues until they come online later.

1. Client A uploads chunks to cloud storage.

2. Client A updates metadata and commits the changes.

3. Client A gets confirmation, and notifications about the changes are sent to Clients B and C.

4. Clients B and C receive the metadata changes and download the updated chunks.

8. Data deduplication

Data deduplication is a technique used to eliminate duplicate copies of data to improve storage utilization. It can also be applied to network data transfers to reduce the number of bytes that must be sent. For each new incoming chunk, we can calculate a hash of it and compare that hash with the hashes of all the existing chunks to see whether the same chunk is already present in our storage.

We can implement deduplication in our system in two ways:

A. Post-process deduplication

With post-process deduplication, new chunks are first stored on the storage device, and a process later analyzes the data looking for duplicates. The benefit is that clients do not need to wait for the hash calculation or lookup to complete before storing the data, ensuring that storage performance is not degraded. The drawbacks of this approach are: 1) we will unnecessarily store duplicate data, though only for a short time; 2) the duplicate data will still be transferred, consuming bandwidth.

B. In-line deduplication

Alternatively, deduplication hash calculations can be done in real time as clients enter data on their devices. If our system identifies a chunk that it has already stored, only a reference to the existing chunk is added in the metadata, rather than a full copy of the chunk. This approach gives us optimal network and storage usage.
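In-line deduplication can be sketched with a hash-keyed store: the second upload of identical bytes stores nothing new and just returns the existing reference (the dictionary stands in for block storage; names are illustrative):

```python
import hashlib

block_store = {}   # chunk hash -> chunk bytes (stands in for block storage)

def store_chunk(data: bytes) -> str:
    """In-line dedup: hash first, store the bytes only if the hash is new.

    The returned hash is the reference that the metadata keeps; a duplicate
    chunk adds only this reference, never a second copy of the bytes."""
    h = hashlib.sha256(data).hexdigest()
    if h not in block_store:
        block_store[h] = data      # first copy: actually store it
    return h

a = store_chunk(b"hello world")
b = store_chunk(b"hello world")    # duplicate: no second copy is stored
print(a == b, len(block_store))    # True 1
```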

9. Metadata partitioning

To scale the metadata database, we need to partition it so that it can store information about millions of users and billions of files/chunks. We need to come up with a partitioning scheme that divides and stores our data on different DB servers.

1. Vertical partitioning

We can partition our database so that tables related to one particular feature are stored on one server. For example, we can store all user-related tables in one database and all file/chunk-related tables in another. Though this approach is straightforward to implement, it has some issues:

1. Will we still have scale issues? What if we need to store trillions of chunks and our database cannot support such a huge number of records? How would we further partition such tables?

2. Joining two tables in two separate databases can cause performance and consistency issues. How frequently do we have to join the user and file tables?

2. Range-based partitioning

What if we store files/chunks in separate partitions based on the first letter of the file path? In that case, we save all files starting with the letter "A" in one partition, files starting with the letter "B" in another partition, and so on. This approach is called range-based partitioning. We can even combine certain less-frequent letters into one database partition. We should come up with this partitioning scheme statically so that we can always store and find a file in a predictable manner.

The main problem with this approach is that it can lead to unbalanced servers. For example, if we decide to put all files starting with the letter "E" into one DB partition, we may later realize that there are too many files starting with "E" to fit into a single DB partition.

3. Hash-based partitioning

In this scheme, we take a hash of the object we are storing and, based on this hash, determine the DB partition the object should go to. In our case, we can take the hash of the "FileID" of the file object we are storing to determine the partition where the file will reside. Our hashing function will randomly distribute objects into different partitions; for example, it can always map any ID to a number between [1…256], and that number will be the partition in which we store the object.

This approach can still lead to overloaded partitions, which can be solved by using consistent hashing.
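The static hash-to-[1…256] mapping described above can be sketched in a few lines (the choice of MD5 and the function name are assumptions for illustration; any uniform hash works):

```python
import hashlib

NUM_PARTITIONS = 256

def partition_for(file_id: str) -> int:
    """Map a FileID to a DB partition number in [1..256] via a hash.

    The mapping is deterministic, so the same file is always stored in,
    and looked up from, the same partition."""
    digest = hashlib.md5(file_id.encode()).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS + 1
```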

10. Caching

We can have two kinds of caches in our system. To deal with hot files/chunks, we can introduce a cache for block storage. We can use an off-the-shelf solution like Memcached, which can store whole chunks with their respective IDs/hashes, and block servers, before hitting block storage, can quickly check whether the cache has the desired chunk. Based on clients' usage patterns, we can determine how many cache servers we need. A high-end commercial server can have 144GB of memory; such a server can cache 36K chunks.

Which cache replacement policy would best fit our needs? When the cache is full and we want to replace a chunk with a newer/hotter one, how would we choose? Least Recently Used (LRU) can be a reasonable policy for our system: under this policy, we discard the least recently used chunk first. Similarly, we can have a cache for the metadata database.
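A minimal LRU block cache along the lines described above, built on `collections.OrderedDict` (class and method names are illustrative; a production system would use Memcached or similar):

```python
from collections import OrderedDict

class LRUBlockCache:
    """Evicts the least-recently-used chunk once the cache is full."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._blocks = OrderedDict()   # insertion order tracks recency

    def get(self, block_hash):
        if block_hash not in self._blocks:
            return None                        # cache miss: go to block storage
        self._blocks.move_to_end(block_hash)   # mark as most recently used
        return self._blocks[block_hash]

    def put(self, block_hash, data):
        if block_hash in self._blocks:
            self._blocks.move_to_end(block_hash)
        self._blocks[block_hash] = data
        if len(self._blocks) > self.capacity:
            self._blocks.popitem(last=False)   # discard least recently used
```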

11. Load balancer (LB)

We can add a load balancing layer at two places in our system:

1) Between clients and block servers, and

2) Between clients and metadata servers.

Initially, a simple round-robin approach can be adopted that distributes incoming requests equally among the backend servers. This LB is simple to implement and does not introduce any overhead. Another benefit of this approach is that if a server is dead, the LB will take it out of the rotation and stop sending any traffic to it.

A problem with round-robin LB is that it doesn't take server load into account. If a server is overloaded or slow, the LB will not stop sending new requests to it. To handle this, a more intelligent LB solution can be placed that periodically queries the backend servers about their load and adjusts traffic based on that.
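The round-robin scheme with dead-server removal can be sketched as follows (class and method names are assumptions; health would normally come from periodic health checks rather than explicit `mark_down` calls):

```python
import itertools

class RoundRobinBalancer:
    """Cycle through servers in order, skipping any that are marked down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(servers)
        self._cycle = itertools.cycle(self.servers)

    def mark_down(self, server):
        self.healthy.discard(server)   # take a dead server out of rotation

    def mark_up(self, server):
        self.healthy.add(server)       # restore a recovered server

    def next_server(self):
        # scan at most one full cycle for a healthy server
        for _ in range(len(self.servers)):
            s = next(self._cycle)
            if s in self.healthy:
                return s
        raise RuntimeError("no healthy servers available")
```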

12. Security, permissions, and file sharing

One of the biggest concerns users have when storing their files in the cloud is the privacy and security of their data, especially since in our system users can share their files with other users or even make them public. To handle this, we will store the permissions of each file in our metadata database to reflect which files are visible or modifiable by any given user.

References

grok_system_design_interview.pdf

Copyright notice: this article was created by [Xiaochengxin post station]. Please include a link to the original when reposting: https://yzsam.com/2021/10/20211025162420922U.html