Design of distributed (cluster) file system
2022-07-29 08:38:00 【Full stack programmer webmaster】
This article discusses distributed file systems. Because such a system is implemented on a cluster, it is also called a cluster file system. The article introduces the common problems in distributed file system design and the solutions GFS gives for them.
Key design points:
Performance
The way to improve performance is parallelism: decompose one task into multiple subtasks and run them at the same time.
GFS splits each file into blocks called chunks. Each chunk is stored separately, and the nodes that store chunks are called chunkservers. Reading and writing a file becomes reading and writing its chunks, and operations on different chunks can run in parallel, which improves efficiency. Each chunk is identified by a unique chunk handle. The chunk size needs to be chosen according to the characteristics of the application: if it is too large, parallelism suffers; if it is too small, there are more chunks, so the metadata takes more space and more time is spent resolving chunks.
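As a rough illustration of the chunking idea above, the sketch below maps a byte range of a file to the chunks it touches. The 64 MB default is the chunk size reported in the GFS paper; the helper name `chunks_for_range` is hypothetical and chosen only for this example.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size reported in the GFS paper


def chunks_for_range(offset, length, chunk_size=CHUNK_SIZE):
    """Return (chunk_index, start_in_chunk, n_bytes) for each chunk a read touches.

    Hypothetical helper: each returned piece could be fetched from a
    different chunkserver in parallel.
    """
    pieces = []
    end = offset + length
    while offset < end:
        index = offset // chunk_size   # logical chunk number 0, 1, ..., N
        start = offset % chunk_size    # position inside that chunk
        n_bytes = min(chunk_size - start, end - offset)
        pieces.append((index, start, n_bytes))
        offset += n_bytes
    return pieces
```

A smaller `chunk_size` yields more pieces for the same read, which means more parallelism but also more chunks for the metadata to track.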
High availability and reliability
Availability is measured by the proportion of the system's downtime to its total usage time; the smaller this proportion, the better.
Reliability is measured by the system's mean time between failures; the longer, the better.
High availability can be achieved through hardware redundancy: when a piece of hardware stops working, the system quickly switches to a backup. Reducing the switching time increases the availability of the system.
Improving fault tolerance increases the reliability of the system: the failure of one component does not affect the overall function.
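To make the effect of switching time concrete, here is a small sketch of the downtime proportion. It assumes a simplified model in which every failure costs exactly one switch to the backup, and it treats the mean time between failures as the uptime between two failures; the function name and the numbers are illustrative only.

```python
def downtime_fraction(mtbf_hours, switch_hours):
    """Fraction of total time the system is down.

    Assumed model: the system runs for `mtbf_hours`, fails, and is down
    for `switch_hours` while it switches to the backup.
    """
    return switch_hours / (mtbf_hours + switch_hours)


slow_switch = downtime_fraction(1000.0, 1.0)   # one hour to switch over
fast_switch = downtime_fraction(1000.0, 0.1)   # six minutes to switch over
```

With the same failure rate, the faster switch gives a smaller downtime proportion, which is the sense in which shorter switching time increases availability.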
GFS uses hardware redundancy to achieve both high availability and high reliability. Both the master and the chunkservers have backups. For chunkservers, three nodes back each other up, and the unit of backup is the chunk.
The master is backed up in active/standby (master/slave) mode. The primary master synchronizes its state (the operation log and checkpoints) to the backup node. When the primary becomes unavailable, the backup node can take over as the new primary.
Chunkservers are backed up in load-balancing mode. Each chunk is stored on three chunkservers, and every chunkserver takes part in serving requests. For a read request, the chunkserver closest to the client is chosen. For a write request, the master chooses one primary replica, and the primary replica is responsible for synchronizing the client's data to the other replicas.
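The read and write paths described above can be sketched as follows. This is a simplified model that follows the description in this article rather than the full GFS protocol; the function names, the `distance` table, and the `storage` dict standing in for chunkserver disks are all assumptions made for illustration.

```python
def pick_read_replica(replicas, distance):
    """Choose the replica closest to the client.

    `distance` maps a chunkserver name to a hypothetical network distance.
    """
    return min(replicas, key=lambda server: distance[server])


def write_chunk(replicas, primary, data, storage):
    """Apply a write at the primary replica, then propagate it to the others.

    `storage` is a dict standing in for what each chunkserver has on disk.
    """
    storage[primary] = data              # the primary applies the write first
    for server in replicas:
        if server != primary:
            storage[server] = data       # the primary syncs the other replicas
    return storage
```

For example, with replicas `["cs1", "cs2", "cs3"]` and distances `{"cs1": 5, "cs2": 1, "cs3": 3}`, a read goes to `cs2`, while a write sent to the primary ends up on all three chunkservers.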
Scalability
The cluster can be expanded dynamically as needed. This mainly means scaling storage capacity, that is, the ability to add chunkservers. Supporting dynamic expansion requires location transparency for chunk storage: users do not need to know which machines hold each chunk, because the system maintains this information dynamically. Adding or removing chunkservers does not affect how clients use the system.
GFS uses a single master node to store all metadata, which is how it achieves chunk location transparency.
The main work of the master:
File namespace management
Similar to the directory structure of a traditional file system, the namespace records the files that exist in the system.
Mapping files to chunks
Each file consists of several chunks. From the client's point of view, chunks are numbered from 0 to N. The master must map each logical chunk number (0, 1, …, N) to the internal chunk handle (the globally unique identifier of a chunk).
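A minimal sketch of this mapping, assuming the master keeps a per-file list of chunk handles so that the list position is the logical chunk number. The class and method names are hypothetical, and a simple counter stands in for whatever handle allocation a real master uses.

```python
import itertools


class Namespace:
    """Toy stand-in for the master's file-to-chunk mapping (names are hypothetical)."""

    def __init__(self):
        self._next_handle = itertools.count(1)   # source of globally unique handles
        self.files = {}                          # path -> list of chunk handles

    def create(self, path):
        self.files[path] = []

    def add_chunk(self, path):
        handle = next(self._next_handle)
        self.files[path].append(handle)          # list position is the logical number
        return handle

    def lookup(self, path, chunk_index):
        """Translate (file, logical chunk number) into a chunk handle."""
        return self.files[path][chunk_index]
```

Because handles come from one shared counter, two files never receive the same handle even though both number their own chunks from 0.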
Chunk location management
The master records which chunkservers hold each chunk; every chunk is stored on multiple chunkservers. This information is what makes chunk location independence possible. The master communicates with every chunkserver to learn which chunks it stores. When a new chunk is created, the master also selects, according to some algorithm, the chunkservers that will store it (three by default, giving three replicas).
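The article does not say which selection algorithm is used, so the sketch below assumes a simple rule purely for illustration: prefer the chunkservers that currently hold the fewest chunks. The function name and data structures are hypothetical.

```python
REPLICAS = 3  # default number of copies described above


def place_chunk(handle, chunk_counts, locations, replicas=REPLICAS):
    """Pick `replicas` chunkservers for a new chunk and record its locations.

    chunk_counts: chunkserver name -> number of chunks it already stores
    locations:    chunk handle -> list of chunkserver names (updated in place)
    """
    # Assumed rule: fewest stored chunks first, ties broken by server name.
    chosen = sorted(chunk_counts, key=lambda s: (chunk_counts[s], s))[:replicas]
    for server in chosen:
        chunk_counts[server] += 1
    locations[handle] = chosen
    return chosen
```

The `locations` table is what gives clients location transparency: they ask for a chunk handle, and only the master needs to know where the replicas live.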
Fault tolerance
While the system is running, a chunkserver may fail. The master must then re-replicate the chunks that were stored on that machine. Likewise, as machines are added, the stored chunks need to be rebalanced (chunk migration between chunkservers).
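Re-replication after a chunkserver failure might look roughly like this. It is a sketch under the assumption that the master holds a handle-to-servers table and simply takes the first eligible live server as the new home; the function name and the returned copy tasks are hypothetical.

```python
def re_replicate(failed, locations, live_servers, replicas=3):
    """Restore the replica count of every chunk that lived on `failed`.

    locations: chunk handle -> list of chunkserver names (updated in place)
    Returns a list of (handle, source, target) copy tasks.
    """
    tasks = []
    for handle, servers in locations.items():
        if failed not in servers:
            continue
        servers.remove(failed)
        candidates = [s for s in live_servers
                      if s != failed and s not in servers]
        # Copy from a surviving replica until the target count is restored.
        while len(servers) < replicas and candidates and servers:
            target = candidates.pop(0)
            tasks.append((handle, servers[0], target))
            servers.append(target)
    return tasks
```

Chunks that were not stored on the failed machine are left untouched, so the amount of copying is proportional to what that machine held.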
Metadata storage, backup, and recovery
All of the functions above depend on the metadata, which is kept mainly in memory. Changes to files in the system (mutations) are recorded in the operation log, which is similar to the log of a relational database system. The in-memory metadata is also written to disk in the form of checkpoints. From a checkpoint and the operation log entries that follow it, the state of the system can be restored by replaying on top of the checkpoint all the operations recorded after it. The master copies the checkpoints and the operation log to other machines to enable backup and recovery.
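Recovery from a checkpoint plus the operation log can be sketched as a replay loop. The metadata is modeled here as a plain dict from path to chunk handles, and the three operation kinds (`create`, `add_chunk`, `delete`) are assumptions made for the example, not the actual GFS log format.

```python
import copy


def apply_op(state, op):
    """Apply one logged mutation to the metadata (path -> list of chunk handles)."""
    kind, path, *args = op
    if kind == "create":
        state[path] = []
    elif kind == "add_chunk":
        state[path].append(args[0])
    elif kind == "delete":
        del state[path]


def recover(checkpoint, log_after_checkpoint):
    """Rebuild the in-memory metadata from a checkpoint plus the later log entries."""
    state = copy.deepcopy(checkpoint)    # leave the checkpoint itself untouched
    for op in log_after_checkpoint:
        apply_op(state, op)              # replay in the order the operations were logged
    return state
```

Only the log entries written after the checkpoint need to be replayed, which is why taking checkpoints keeps recovery time short.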
Consistency model
Consistency concerns whether, and when, other clients can see the changes after one client modifies the contents of a file.
GFS keeps replicas consistent after a successful change: all clients then see the same data no matter which replica they read from, which resembles an ordinary single-machine file system. The guarantee is weaker than full strong consistency, though (the GFS paper itself calls its consistency model relaxed): a failed change can leave replicas inconsistent. GFS also provides no concurrency control for reading and writing file contents: when more than one client changes the same region at the same time, the resulting content is undefined.
Concurrency control
Concurrency control ensures that the system stays correct when multiple clients run operations at the same time.
In GFS, operations on the file namespace are atomic, so multiple clients can perform them at the same time without problems; internally this is implemented with a locking mechanism. There is, however, no concurrency control for reads and writes of file contents.
GFS also provides one atomic operation on file contents: atomic record append.
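The lock-protected namespace operations mentioned above can be illustrated with a toy example. The article only says that a lock mechanism is used, so the single global lock here is a simplifying assumption, and the class name is hypothetical.

```python
import threading


class LockedNamespace:
    """Toy namespace whose create operation is atomic (single global lock assumed)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._files = set()

    def create(self, path):
        """Atomically create `path`; return False if it already exists."""
        with self._lock:                 # check and insert happen as one step
            if path in self._files:
                return False
            self._files.add(path)
            return True
```

If several clients call `create("/f")` at the same time, exactly one call returns `True`, because the existence check and the insertion cannot be interleaved.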
Summary:
These problems arise in every distributed file system, and different systems may solve them differently. When studying a new system, it helps to focus on these points first.
Publisher: Full stack programmer stack length. When reprinting, please cite the source: https://javaforall.cn/118611.html Link to the original text: https://javaforall.cn