Design of distributed (cluster) file system
2022-07-29 08:38:00 【Full stack programmer webmaster】
This article discusses distributed file systems. Because such a system is implemented on a cluster, it is also called a cluster file system. The article introduces the common problems in distributed file system design and the solutions that GFS gives.
Key design points:
Performance
The way to improve performance is parallelism: decompose a task into multiple subtasks and run them at the same time.
In GFS, a file is divided into blocks called chunks. Each chunk is stored separately, and a node that stores chunks is a chunkserver. Reads and writes of a file become reads and writes of chunks, and operations on different chunks can run in parallel, which improves efficiency. Each chunk is identified by a unique chunk handle. The chunk size needs to be chosen according to the characteristics of the application: if it is too large, parallelism suffers; if it is too small, there are more chunks, so more space is taken by metadata and more time is spent looking up chunks.
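As a rough illustration of how a file read turns into per-chunk reads, the sketch below splits a byte range into pieces that could be fetched in parallel. The function name and the tuple layout are hypothetical, and the 64 MB value is only an assumed default; the text above says the size should be tuned per application.

```python
from typing import List, Tuple

CHUNK_SIZE = 64 * 1024 * 1024  # assumed chunk size, tune per application


def split_into_chunk_reads(
    offset: int, length: int, chunk_size: int = CHUNK_SIZE
) -> List[Tuple[int, int, int]]:
    """Translate a byte range of a file into per-chunk pieces.

    Each piece is (chunk_index, offset_in_chunk, piece_length).
    Pieces touch different chunks, so they can be read in parallel.
    """
    pieces = []
    end = offset + length
    while offset < end:
        chunk_index = offset // chunk_size
        offset_in_chunk = offset % chunk_size
        # Read up to the end of this chunk or the end of the request.
        n = min(chunk_size - offset_in_chunk, end - offset)
        pieces.append((chunk_index, offset_in_chunk, n))
        offset += n
    return pieces
```

With a smaller `chunk_size` the same request yields more pieces (more parallelism, but more chunks for the master to track), which is the trade-off described above.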
High availability and reliability
Availability is measured here by the proportion of total usage time that the system is down on average; the smaller, the better.
Reliability is measured by the mean time between failures of the system; the longer, the better.
High availability can be achieved through hardware redundancy: when one piece of hardware stops working, the system quickly switches to a backup. Reducing the switching time increases the availability of the system.
Improving fault tolerance improves the reliability of the system: the failure of one component does not affect the overall function.
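To make the effect of switching time concrete, here is a small sketch that computes the share of time a system is down from its mean time between failures and the time needed to switch to a backup. The function name and the numbers are illustrative assumptions, not measurements from any real system.

```python
def downtime_ratio(mtbf_hours: float, switch_hours: float) -> float:
    """Share of total time spent down.

    mtbf_hours:   mean time the system runs between failures
    switch_hours: mean time needed to switch to the backup after a failure
    """
    return switch_hours / (mtbf_hours + switch_hours)


# Same failure rate, but a faster switch to the backup system.
slow_switch = downtime_ratio(mtbf_hours=999, switch_hours=1.0)
fast_switch = downtime_ratio(mtbf_hours=999, switch_hours=0.1)
```

Here `fast_switch` is smaller than `slow_switch`: the hardware fails just as often, but a shorter switch makes the system more available.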
GFS uses hardware redundancy to achieve both high availability and high reliability. Both the master and the chunkservers have backups. For chunkservers, 3 nodes back each other up, and the unit of backup is the chunk.
The master is backed up in active/standby (master/slave) mode. The primary master synchronizes its state (operation log and checkpoints) to the backup node; when the primary becomes unavailable, the backup node can take over as the new primary.
Chunkservers are backed up in load-balancing mode. Each chunk is stored on 3 chunkservers, and every one of them takes part in serving requests. For a read request, the chunkserver closest to the client is chosen; for a write request, the master chooses one primary replica, and the primary replica is responsible for synchronizing the client's data to the other replicas.
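The read and write paths described above can be sketched as follows. This is a simplified model under stated assumptions: `Replica`, its `distance` field, and both function names are hypothetical, and real network transfer, ordering, and failure handling are left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Replica:
    name: str
    distance: int  # hypothetical network distance from the client
    data: bytearray = field(default_factory=bytearray)


def pick_read_replica(replicas: List[Replica]) -> Replica:
    """Reads go to the replica closest to the client."""
    return min(replicas, key=lambda r: r.distance)


def write_via_primary(primary: Replica, secondaries: List[Replica], payload: bytes) -> None:
    """The primary applies the write, then propagates it to the other replicas."""
    primary.data += payload
    for replica in secondaries:
        replica.data += payload
```

Because any replica can serve reads, read load spreads across the 3 chunkservers, while routing writes through one primary keeps the copies identical.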
Scalability
The cluster can be expanded dynamically as needed. This mainly refers to scaling storage capacity, that is, the ability to add chunkservers. To support dynamic expansion, chunk storage locations must be transparent (location transparency): users do not need to know which machines hold each chunk, because the system maintains this dynamically. Adding or removing chunkservers therefore does not affect clients.
In GFS, a single master node stores all the metadata, which is how chunk location transparency is achieved.
The main work of the master:
File namespace management
Similar to the directory structure of a traditional file system, the namespace keeps track of the files that exist in the system.
Mapping files to chunks
Each file consists of several chunks. From the client's point of view, chunks are numbered from 0 up to N. The master needs to map each logical chunk number (0, 1, …, N) to the internal chunk handle (the globally unique identifier of a chunk).
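A minimal sketch of these two tables, assuming a flat dictionary keyed by path and a simple counter for handles. The class and method names are hypothetical and are not taken from GFS.

```python
from typing import Dict, List


class MasterMetadata:
    """Toy model of the namespace and the file-to-chunk mapping."""

    def __init__(self) -> None:
        # path -> chunk handles, indexed by logical chunk number 0..N
        self.files: Dict[str, List[int]] = {}
        self._next_handle = 0

    def create(self, path: str) -> None:
        if path in self.files:
            raise FileExistsError(path)
        self.files[path] = []

    def add_chunk(self, path: str) -> int:
        """Append a chunk to a file and return its globally unique handle."""
        handle = self._next_handle
        self._next_handle += 1
        self.files[path].append(handle)
        return handle

    def lookup(self, path: str, chunk_index: int) -> int:
        """Map (file, logical chunk number) to the internal chunk handle."""
        return self.files[path][chunk_index]
```

A client only ever names a file and a logical chunk number; the handle it gets back is unique across all files, not just within one.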
Chunk location management
The master records which chunkservers hold each chunk; every chunk is stored on multiple chunkservers. This information is what makes chunks location independent. The master communicates with every chunkserver to learn which chunks each one currently stores. When a new chunk is created, the master also selects, according to some algorithm, the chunkservers that will store it (3 by default, holding 3 replicas).
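The text does not say which selection algorithm is used. As one plausible policy, the sketch below picks the chunkservers that currently hold the fewest chunks; this least-loaded rule and the function name are assumptions for illustration only.

```python
from typing import Dict, List


def choose_chunkservers(chunk_counts: Dict[str, int], replicas: int = 3) -> List[str]:
    """Pick the `replicas` least-loaded chunkservers for a new chunk.

    chunk_counts maps a chunkserver name to the number of chunks it holds.
    Ties are broken by name so the result is deterministic.
    """
    if len(chunk_counts) < replicas:
        raise ValueError("not enough chunkservers for the requested replica count")
    ranked = sorted(chunk_counts, key=lambda server: (chunk_counts[server], server))
    return ranked[:replicas]
```

For example, with counts `{"cs1": 5, "cs2": 1, "cs3": 3, "cs4": 2}` the new chunk goes to `cs2`, `cs4`, and `cs3`, leaving the busiest server alone.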
Fault tolerance
While the system is running, a chunkserver may fail. The master then has to replicate the chunks that were stored on that machine onto other machines (re-replication). When machines are added, chunks also need to be rebalanced (rebalancing, that is, chunk migration between chunkservers).
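Re-replication can be sketched as a pass over the chunk location table. The function below is a simplified model: its name, the table layout, and the alphabetical choice of replacement server are assumptions, and the actual copying of chunk data from a surviving replica is omitted.

```python
from typing import Dict, Set


def re_replicate(
    locations: Dict[int, Set[str]],
    live_servers: Set[str],
    failed: str,
    target: int = 3,
) -> Dict[int, Set[str]]:
    """Drop a failed chunkserver and restore each chunk to `target` replicas.

    locations maps a chunk handle to the set of chunkservers holding it.
    """
    for servers in locations.values():
        servers.discard(failed)
        # Live servers that do not yet hold this chunk, in a fixed order.
        candidates = sorted(live_servers - servers - {failed})
        while len(servers) < target and candidates:
            servers.add(candidates.pop(0))
    return locations
```

Chunks that never lived on the failed machine are left untouched, so the amount of work is proportional to what that machine stored.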
Metadata storage, backup, and recovery
All of the functions above depend on the metadata, which is kept mainly in memory. Changes to files in the system (mutations) are recorded in the operation log (similar to the log in a relational database system). The in-memory metadata is also written to disk in the form of checkpoints. From a checkpoint and the operation log written after it, the state of the system can be restored by replaying on top of the checkpoint all the operations that were executed since. The master backs up its checkpoints and operation log to other machines to make backup and recovery possible.
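Recovery from a checkpoint plus the operation log can be sketched like this. The operation format (tuples such as `("create", path)`) and the function names are hypothetical; a real log would be a durable on-disk format.

```python
import copy
from typing import Dict, List, Tuple


def apply_op(state: Dict[str, List[int]], op: Tuple) -> None:
    """Apply one logged mutation to the in-memory metadata."""
    kind, path, *rest = op
    if kind == "create":
        state[path] = []
    elif kind == "add_chunk":
        state[path].append(rest[0])
    elif kind == "delete":
        state.pop(path, None)
    else:
        raise ValueError(f"unknown operation: {kind}")


def recover(checkpoint: Dict[str, List[int]], log_since_checkpoint: List[Tuple]) -> Dict[str, List[int]]:
    """Rebuild the metadata: start from the checkpoint, then replay the log."""
    state = copy.deepcopy(checkpoint)  # leave the checkpoint itself untouched
    for op in log_since_checkpoint:
        apply_op(state, op)
    return state
```

Taking checkpoints more often shortens the log that has to be replayed, which in turn shortens recovery time.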
Consistency model
Consistency concerns whether, and when, the other clients can see the contents after one client changes a file.
GFS is described here as following a strong consistency model: once one client's change completes, all other clients see it immediately, no matter which replica they read from, as in an ordinary single-machine file system (the GFS paper itself characterizes its model as relaxed). GFS does not provide concurrency control for reads and writes of file contents: when several clients change the same content at the same time, the resulting content is undefined.
Concurrency control
Concurrency control ensures that the system remains correct when several clients perform operations at the same time.
In GFS, operations on the file namespace are atomic, so several clients can use the namespace at the same time without problems; internally this is implemented with a locking mechanism. There is, however, no concurrency control for reads and writes of file contents.
GFS also offers one more atomic operation: atomic record append.
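The idea of an atomic namespace operation guarded by a lock can be sketched as below. A single global lock is a deliberate simplification chosen for brevity, and the `Namespace` class is hypothetical; it does not reproduce the locking scheme GFS actually uses.

```python
import threading
from typing import Set


class Namespace:
    """Toy namespace whose create operation is atomic."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._files: Set[str] = set()

    def create(self, path: str) -> bool:
        """Create a file; return False if it already exists.

        The check and the insert happen under one lock, so two clients
        creating the same path at the same time cannot both succeed.
        """
        with self._lock:
            if path in self._files:
                return False
            self._files.add(path)
            return True
```

If several threads call `create("/f")` concurrently, exactly one of them gets `True`; without the lock, the check and the insert could interleave.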
Summary:
Every distributed file system faces these problems, and different systems may solve them differently. When studying one, it helps to focus on these points first.
Publisher: Full stack programmer webmaster. When reprinting, please indicate the source: https://javaforall.cn/118611.html Original link: https://javaforall.cn