Research on next generation distributed file system
2022-06-10 18:24:00 【bandaoyu】
Next generation distributed file system
Shortcomings of the previous generation
Metadata stored in the back end: long I/O path, complex synchronization and locking mechanisms
In the previous generation of distributed file systems, hardware limitations of the time led some products (such as CephFS) to work around the metadata capacity bottleneck by storing metadata in the back-end RADOS cluster. The I/O path is long, the complex synchronization and locking mechanisms cause a large performance loss, and the price-performance ratio is poor.
Other products (such as HDFS) keep all metadata in memory. Metadata performance is good, but because memory capacity is limited, the number of files the system can support is relatively small and scalability is limited.
Next-generation improvements
New storage media: NVMe (Non-Volatile Memory Express) SSDs
Storage engine: a high-performance LSM storage engine
New network module: a high-speed network transport module, natively tailored for RoCE/RDMA high-performance networks
XSKY releases XGFS, a new next-generation distributed file system
In the future, 80% of the world's data will be unstructured. File protocols are the most common way to access unstructured data. According to IDC statistics, about 60% of China's software-defined storage market in 2019 was file storage.

Traditional file system
Traditional file systems have many limitations:
1. Metadata and data are stored locally, so the system cannot scale horizontally and has no node-level high availability;
2. Limited by metadata storage space and performance, the number of files that can actually be stored is limited, generally fewer than 1 billion, with storage capacity at the TB level;
3. The namespace is not unified: multiple mount directories cannot be interconnected, which makes the system complicated to use;
4. The file storage gateway cannot be scaled out, so bandwidth cannot be increased and access bottlenecks arise;
5. New workloads such as big data and containers are not supported.
Distributed file storage
In distributed file storage, the most complex part is storing and processing metadata. According to statistics, in most AI/ML analytics applications 90% of I/O consists of metadata operation requests.
In the previous generation of distributed file systems, hardware limitations of the time led some products (such as CephFS) to work around the metadata capacity bottleneck by storing metadata in the back-end RADOS cluster. The I/O path is long, the complex synchronization and locking mechanisms cause a large performance loss, and the price-performance ratio is poor.
Other products (such as HDFS) keep all metadata in memory. Metadata performance is good, but because memory capacity is limited, the number of files the system can support is relatively small and scalability is limited.
Is there an architecture that can meet the performance and capacity requirements of modern file system metadata processing at lower cost and with a minimalist design?
Today, the spread of large-capacity, high-speed SSDs makes it possible to have both. The NVMe protocol greatly reduces interface protocol overhead, SCM (storage-class memory) greatly improves media performance, and falling flash chip costs have made NVMe SSDs of 5 TB or more increasingly common.

With these advances in SSD technology and the steady growth in CPU core counts, all-flash metadata nodes can fully meet the needs of large-scale file systems. For example, just 5 TB of NVMe SSD metadata space is enough to store and process tens of billions of files.
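As a rough sanity check on that claim, the calculation below works out how many bytes of metadata space each file gets on average. It is only a back-of-the-envelope sketch: it assumes decimal units (1 TB = 10^12 bytes), takes 10 billion and 20 billion files as illustrative file counts, and ignores replication and storage-engine overhead, none of which the article specifies.

```python
# Back-of-the-envelope check: average metadata bytes available per file.
# Assumptions (not from the article): decimal units, no replication or
# storage-engine space overhead, illustrative file counts.

TB = 10**12  # bytes, decimal terabyte


def metadata_budget_bytes(capacity_bytes: int, file_count: int) -> float:
    """Average number of bytes of metadata space available per file."""
    return capacity_bytes / file_count


for files in (10 * 10**9, 20 * 10**9):
    budget = metadata_budget_bytes(5 * TB, files)
    print(f"{files:,} files -> {budget:.0f} bytes per file")
```

Under these assumptions, 5 TB gives each of 10 billion files an average budget of 500 bytes, and 250 bytes at 20 billion files, which is in the range of a compact inode-plus-name record.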
02 XGFS redefines the next generation of distributed file systems
XGFS (XSKY Global File System) is XSKY's new-generation distributed file storage system, with a single global namespace.
XGFS is built on a flexible SDS architecture and supports a rich set of protocols, including NFS, SMB, FTP, POSIX, HDFS, and Kubernetes CSI (Container Storage Interface). It can be used for common scenarios such as enterprise file sharing and backup and archiving, for high-bandwidth, large-capacity scenarios such as video surveillance, media management, and high-performance computing, and for the latest big data and container scenarios.

Figure: XGFS enterprise distributed storage system architecture
XGFS makes innovative use of the latest multi-core CPUs and high-capacity, high-performance NVMe SSDs. Just 3 all-flash, highly available metadata nodes (which can be shared with data nodes) can efficiently store and process data at the scale of 10 billion files while handling millions of metadata read and write requests per second, at a high price-performance ratio.
The XGFS data nodes, in turn, build on XSKY's reliable, self-developed distributed storage cluster, which has been proven in the market over a long period. It is mature and stable and can easily scale to thousands of nodes.

Figure: XGFS enterprise distributed storage system user interface
The XGFS metadata service architecture has the following advantages:
· It is designed around the latest generation of NVMe/SCM storage media and fully exploits the new media's nearly one million IOPS and GB-level bandwidth, easily meeting the file system's need for high-frequency metadata access;
· It uses a high-performance LSM storage engine, combined with a key-value design based on XSKY's own patented technology, to build a fully self-developed metadata service. It is compatible with POSIX file semantics and S3 object semantics, and also supports users/user groups, permissions/ACLs, extended attributes, and so on;
· Metadata is protected by journaling within each node and by strongly consistent replication between nodes, so the metadata cluster can readily handle fault scenarios such as slow disks, network anomalies, and node restarts or power failures, providing metadata access with RPO=0;
· It uses XSKY's self-developed high-speed network transport module, natively tailored for RoCE/RDMA high-performance networks, which greatly reduces the transmission latency of metadata replication packets between nodes and gives the whole metadata cluster higher IOPS.
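The article does not describe XSKY's patented key-value design, so the following is only a generic illustration of how file system metadata can be laid out on an ordered key-value store of the kind an LSM engine provides. Everything in it is hypothetical: the `SortedKV` class is a small in-memory stand-in for a real engine, and the key layout (8-byte big-endian parent inode number followed by the entry name) is one common textbook choice, not XGFS's actual format.

```python
import bisect
import json
import struct


class SortedKV:
    """In-memory stand-in for an ordered key-value store (hypothetical)."""

    def __init__(self):
        self._keys = []   # keys kept in sorted byte order
        self._vals = {}

    def put(self, key: bytes, value: bytes) -> None:
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def get(self, key: bytes):
        return self._vals.get(key)

    def scan_prefix(self, prefix: bytes):
        """Yield (key, value) pairs whose key starts with prefix, in order."""
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i], self._vals[self._keys[i]]
            i += 1


def dentry_key(parent_ino: int, name: str) -> bytes:
    # Big-endian encoding keeps all entries of one directory contiguous.
    return struct.pack(">Q", parent_ino) + name.encode("utf-8")


def create(kv: SortedKV, parent_ino: int, name: str, ino: int, mode: int) -> None:
    """Record a directory entry together with a minimal attribute set."""
    attrs = json.dumps({"ino": ino, "mode": mode}).encode("utf-8")
    kv.put(dentry_key(parent_ino, name), attrs)


def listdir(kv: SortedKV, parent_ino: int) -> list:
    """List a directory with a single ordered range scan."""
    prefix = struct.pack(">Q", parent_ino)
    return [k[len(prefix):].decode("utf-8") for k, _ in kv.scan_prefix(prefix)]


kv = SortedKV()
create(kv, 1, "data", 2, 0o040755)
create(kv, 1, "README", 3, 0o100644)
create(kv, 2, "part-0", 4, 0o100644)
print(listdir(kv, 1))  # ['README', 'data']
```

The point of such a layout is that a lookup by name is a single point read and a directory listing is a single sequential range scan, both of which map well onto fast flash media.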
The XGFS distributed file storage system consists of a metadata service cluster and hybrid-disk data services, so the product inherits XSKY's years of experience in distributed hybrid-disk storage and large-scale storage operations:
multi-level caching, support for replicas and EC (erasure coding), stretched-cluster active-active support, handling of disk and network sub-health, and so on. It is mature and stable, feature-rich, and simple to operate and maintain.
03 Product features
1. Global namespace
· Single namespace: provides a single, consistent, high-performance global file namespace that is easy to use;
· Rich protocol support: supports NFS, SMB, POSIX, FTP, HDFS, Kubernetes CSI, and other protocols, simplifying the enterprise IT architecture and freeing up the business;
· Support for emerging business scenarios: supports emerging workloads such as HPC, big data, and containers.
2. Flexible expansion
· Software-defined: node attributes are customizable, and commodity x86 servers of various brands as well as domestically made servers are supported;
· Flexible deployment: scales from 3 nodes to 4,096 nodes to meet different business needs;
· On-demand expansion: performance and capacity grow as nodes are added, meeting the performance and capacity requirements of a growing business.
3. Rich enterprise-class features
· Data redundancy: supports different redundancy strategies such as multiple replicas and EC, and provides three levels of fault-domain management: server, rack, and data center. Snapshot protection is supported;
· Supports file gateway load balancing and HA protection, and authentication methods including AD domain and LDAP domain integration and local authentication. Quota management is supported;
· With the embedded X3DS, files and objects can be replicated, migrated, backed up, and archived, among other rich data management functions. Public cloud platforms such as Alibaba Cloud and Baidu Cloud are also supported.
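To make the trade-off between the two redundancy strategies concrete, the sketch below compares usable capacity and fault tolerance for replication and for erasure coding. The specific configurations (3 replicas, EC 4+2) are illustrative assumptions, not XGFS defaults from the article; the formulas themselves are the standard ones for N-way replication and k+m erasure codes.

```python
# Usable capacity vs. fault tolerance for replication and erasure coding.
# The configurations below (3 replicas, EC 4+2) are illustrative assumptions.


def replica_usable_fraction(copies: int) -> float:
    """Fraction of raw capacity that holds unique data with N full copies."""
    return 1 / copies


def ec_usable_fraction(k: int, m: int) -> float:
    """Fraction of raw capacity that holds data with k data + m parity chunks."""
    return k / (k + m)


def replica_tolerated_failures(copies: int) -> int:
    """Copies that can be lost while one intact copy remains."""
    return copies - 1


def ec_tolerated_failures(k: int, m: int) -> int:
    """Chunks that can be lost while the data is still recoverable."""
    return m


print(f"3 replicas: {replica_usable_fraction(3):.1%} usable, "
      f"tolerates {replica_tolerated_failures(3)} failures")
print(f"EC 4+2:     {ec_usable_fraction(4, 2):.1%} usable, "
      f"tolerates {ec_tolerated_failures(4, 2)} failures")
```

In this example both schemes survive two simultaneous failures, but erasure coding roughly doubles the usable share of raw capacity, at the cost of extra computation and more expensive reconstruction.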
04 Typical application scenarios
XGFS can serve as an enterprise distributed file system, supporting a wide range of large-capacity unstructured data storage and analysis scenarios:
1. File sharing and enterprise office storage
A single global namespace that is easy to use. Supports office scenarios such as file sharing, network drives, and FTP.
2. Video surveillance, streaming media, and CDN storage
Horizontal scaling, rolling upgrades, and permanent data retention.
3. Big data and HPC back-end storage
Compatible with HDFS, with an efficient file metadata processing mechanism that flexibly handles AI/ML data analysis requirements.
4. Container shared storage
Supports the Kubernetes CSI interface, allowing multiple Pods to share data.
5. Centralized disaster recovery resource pool
Using X3DS (the XSKY three-dimensional data management system), it can serve as a large-capacity shared disaster recovery resource pool.
6. Enterprise data lake foundation
Supports Hadoop deployments with separated storage and compute, offers rich interface protocols, and can scale to thousands of nodes.
XSKY XGFS makes full use of the advantages of SDS, adapts to the latest NVMe SSD technology, and supports the latest HDFS and Kubernetes CSI protocols. It is cost-effective, with no compromise between performance and capacity, making it an ideal foundation for building an enterprise data lake.
Excerpted from: "XSKY releases XGFS, a new next-generation distributed file system", Ifeng.com