Research on next-generation distributed file systems

2022-06-10 18:24:00 bandaoyu

Next-generation distributed file systems

Shortcomings of the previous generation

Metadata stored in the back end (such as CephFS): long I/O path, complex synchronization and locking mechanisms, significant performance loss

Metadata stored entirely in memory (such as HDFS): good metadata performance, but limited memory capacity restricts the number of files and the room to scale

Next-generation improvements

New storage media: NVMe (Non-Volatile Memory Express) SSDs

Storage engine: high-performance LSM storage engine

New network module: high-speed network transport module, natively tailored for RoCE/RDMA high-performance networks

XSKY releases XGFS, its new next-generation distributed file system

About 80% of the world's data will be unstructured. File protocols are the most common way to access unstructured data; according to IDC statistics, file storage accounted for about 60% of China's software-defined storage market in 2019.

Traditional file systems

Traditional file systems have many limitations:

1. Metadata and data are stored locally, so the system cannot scale horizontally and has no node-level high availability;

2. Limited by metadata storage space and performance, the number of files that can actually be stored is limited, generally fewer than 1 billion, with storage capacity at the TB level;

3. The namespace is not unified: multiple mounted directories cannot interoperate, which complicates use;

4. The file storage gateway cannot scale out, so bandwidth cannot be increased and access bottlenecks arise;

5. Newer workloads such as big data and containers are not supported.

Distributed file storage

In distributed file storage, the hardest part is storing and processing metadata. According to statistics, in most AI/ML analytics applications, 90% of I/O requests are metadata operations.

The previous generation of distributed file systems was shaped by the hardware limitations of its time. To get around the metadata capacity bottleneck, some products (such as CephFS) store metadata in a back-end RADOS cluster. The I/O path is long, and complex synchronization and locking mechanisms cause significant performance loss, so cost-effectiveness is poor.

Other products (such as HDFS) keep all metadata in memory. Metadata performance is good, but limited memory capacity means the system supports relatively few files and has limited room to scale.

Is there an architecture that can meet the metadata performance and capacity requirements of a modern file system at lower cost and with a minimalist design?

Now the spread of large-capacity, high-speed SSDs makes it possible to have both. The NVMe protocol has greatly reduced interface protocol overhead, SCM (storage-class memory) has greatly improved media performance, and falling NAND flash costs have made NVMe SSDs of more than 5 TB increasingly common.

With these advances in SSD technology and ever-increasing CPU core counts, all-flash metadata nodes can fully meet the needs of large-scale file systems. For example, just 5 TB of NVMe SSD metadata space is enough to store and process files on the order of ten billion.
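
As a rough check on the sizing claim above, the following sketch divides the quoted metadata space by the quoted file count to get a per-file budget. The decimal terabyte (10^12 bytes) and the figure of exactly ten billion files are assumptions made for the arithmetic, since the article gives only orders of magnitude; the function name is hypothetical.

```python
TB = 10**12  # decimal terabyte; an assumption, the article does not say TB vs TiB


def metadata_budget_per_file(capacity_bytes: int, file_count: int) -> float:
    """Average metadata space available per file, in bytes."""
    if file_count <= 0:
        raise ValueError("file_count must be positive")
    return capacity_bytes / file_count


# 5 TB of NVMe SSD metadata space, ten billion files (assumed exact figure).
budget = metadata_budget_per_file(5 * TB, 10 * 10**9)
print(f"{budget:.0f} bytes of metadata space per file")
```

Under these assumptions the budget is 500 bytes per file, which would have to cover each file's directory entry and attributes plus any storage-engine overhead.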

02 XGFS: redefining the next generation of distributed file systems

XGFS (XSKY Global File System) is XSKY's new-generation distributed file storage system, with a single global namespace.

XGFS is based on a flexible SDS architecture and supports a rich set of protocols, including NFS, SMB, FTP, POSIX, HDFS, and Kubernetes CSI (Container Storage Interface). It serves common scenarios such as enterprise file sharing, backup, and archiving, as well as high-performance, high-bandwidth, large-capacity scenarios such as video surveillance, media asset management, and high-performance computing. It also supports newer big data and container scenarios.

XGFS enterprise distributed storage system architecture diagram

XGFS makes innovative use of the latest multi-core CPUs and high-capacity, high-performance NVMe SSDs. With only three all-flash, highly available metadata nodes (which can be co-located with data nodes), it can efficiently store and process data at the scale of 10 billion files while handling millions of metadata read and write requests per second, making it highly cost-effective.

The data nodes of XGFS, in turn, build on XSKY's reliable, independently developed distributed storage cluster, which has long been proven in the market. It is mature and stable and scales easily to thousands of nodes.

XGFS enterprise distributed storage system user interface

The XGFS metadata service architecture has the following advantages:

· Designed for the latest generation of NVMe/SCM storage media, it takes full advantage of the new media's nearly one million IOPS and GB-class bandwidth, easily meeting the file system's demand for high-frequency metadata access;

· Using a high-performance LSM storage engine combined with XSKY's patented key-value design, it builds a fully self-developed metadata service that is compatible with POSIX file semantics and S3 object semantics and also supports users/user groups, permissions/ACLs, extended attributes, and more;

· Metadata is protected by logging within each node and by strongly consistent replication across nodes, so the metadata cluster easily handles failures such as slow disks, network anomalies, and node restarts or power loss, providing metadata access with RPO = 0;

· XSKY's self-developed high-speed network transport module, natively tailored for RoCE/RDMA high-performance networks, greatly reduces the latency of metadata replication packets between nodes, giving the whole metadata cluster higher IOPS.
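
The article does not describe XSKY's patented key-value design, so the following is only a generic sketch of how file system metadata can be laid out on an ordered key-value store of the kind an LSM engine provides. Everything in it is an assumption for illustration: the `SortedKV` stand-in, the `D` + parent inode + name key encoding, and the `Namespace` helper. It shows one reason sorted keys suit metadata: listing a directory becomes a prefix range scan.

```python
import bisect
import struct


class SortedKV:
    """In-memory stand-in for an ordered key-value store (hypothetical, not an LSM engine)."""

    def __init__(self):
        self._keys = []   # keys kept sorted, as an ordered store exposes them
        self._vals = {}

    def put(self, key: bytes, value: bytes) -> None:
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def get(self, key: bytes):
        return self._vals.get(key)

    def scan_prefix(self, prefix: bytes):
        """Yield (key, value) pairs whose key starts with prefix, in key order."""
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            key = self._keys[i]
            yield key, self._vals[key]
            i += 1


def dentry_prefix(parent_ino: int) -> bytes:
    # Fixed-width big-endian inode number, so all entries of one directory
    # share an unambiguous prefix and sort next to each other.
    return b"D" + struct.pack(">Q", parent_ino)


def dentry_key(parent_ino: int, name: str) -> bytes:
    return dentry_prefix(parent_ino) + name.encode("utf-8")


class Namespace:
    ROOT_INO = 1

    def __init__(self):
        self.kv = SortedKV()
        self._next_ino = self.ROOT_INO + 1

    def create(self, parent_ino: int, name: str) -> int:
        """Add a directory entry and return the new inode number."""
        ino = self._next_ino
        self._next_ino += 1
        self.kv.put(dentry_key(parent_ino, name), struct.pack(">Q", ino))
        return ino

    def lookup(self, parent_ino: int, name: str):
        raw = self.kv.get(dentry_key(parent_ino, name))
        return struct.unpack(">Q", raw)[0] if raw is not None else None

    def listdir(self, parent_ino: int):
        prefix = dentry_prefix(parent_ino)
        return [k[len(prefix):].decode("utf-8")
                for k, _ in self.kv.scan_prefix(prefix)]


ns = Namespace()
home = ns.create(Namespace.ROOT_INO, "home")
ns.create(home, "b.txt")
ns.create(home, "a.txt")
ns.create(Namespace.ROOT_INO, "etc")
print(ns.listdir(home))                # entries of /home in key order
print(ns.listdir(Namespace.ROOT_INO))  # entries of / in key order
```

In this layout a point lookup resolves one path component, so a full path walk repeats `lookup` once per component.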

An XGFS distributed file storage system consists of the metadata service cluster and hybrid-disk data services, so the product inherits XSKY's years of experience with distributed hybrid-disk storage and large-scale storage operations:

Multi-level caching, support for replicas and EC (erasure coding), support for stretched-cluster active-active deployment, handling of sub-healthy disks and networks, and more. It is mature and stable, feature-rich, and simple to operate.

03 Product features

1. Global namespace

· Single namespace: provides a single global namespace with consistent, high-performance file access that is easy to use;

· Rich protocol support: supports NFS, SMB, POSIX, FTP, HDFS, Kubernetes CSI, and other protocols, simplifying the IT architecture and freeing up the business;

· Support for emerging scenarios: supports emerging workloads such as HPC, big data, and containers.

2. Flexible expansion

· Software-defined, with customizable node attributes and support for commodity x86 servers of various brands as well as domestic servers;

· Flexible deployment: scales from 3 nodes to 4096 nodes to meet different business needs;

· On-demand expansion: performance and capacity grow as nodes are added, meeting the needs of growing businesses.

3. Rich enterprise-class functionality

· Data redundancy: supports multiple replicas and EC as redundancy strategies, and provides fault-domain management at three levels: server, rack, and data center. Snapshot protection is supported;

· Supports file gateway load balancing and HA protection, and supports AD domain integration, LDAP domain integration, local authentication, and other authentication methods. Quota management is supported;

· With the embedded X3DS, files and objects can be replicated, migrated, backed up, and archived, among other data management functions. Public cloud platforms such as Alibaba Cloud and Baidu Cloud are also supported.
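
To make the trade-off between the replica and EC redundancy strategies listed above concrete, this sketch compares usable capacity and tolerated failures. The three-replica and EC 4+2 profiles are illustrative assumptions, not XGFS defaults, and the `Redundancy` class is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Redundancy:
    name: str
    data_units: int        # units holding original data
    redundant_units: int   # extra copies or parity units

    @property
    def usable_fraction(self) -> float:
        """Share of raw capacity that holds user data."""
        return self.data_units / (self.data_units + self.redundant_units)

    @property
    def tolerated_failures(self) -> int:
        """Units that can be lost without data loss (assumes an MDS code for EC)."""
        return self.redundant_units


# Illustrative profiles (assumptions, not taken from the article).
profiles = [
    Redundancy("3 replicas", data_units=1, redundant_units=2),
    Redundancy("EC 4+2", data_units=4, redundant_units=2),
]

for p in profiles:
    print(f"{p.name}: {p.usable_fraction:.0%} usable, "
          f"tolerates {p.tolerated_failures} failures")
```

Both profiles survive the loss of two units, but EC 4+2 keeps two thirds of raw capacity usable where three replicas keep one third, at the cost of extra computation on writes and rebuilds.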

04 Typical application scenarios

XGFS can serve as an enterprise distributed file system, supporting a wide range of large-capacity unstructured data storage and analysis scenarios:

1. File sharing and enterprise office storage

A single global namespace that is easy to use. Supports office scenarios such as file sharing, network drives, and FTP.

2. Video surveillance, streaming media, and CDN storage

Horizontal scaling, rolling upgrades, and permanent data retention.

3. Big data and HPC back-end storage

HDFS-compatible, with an efficient file metadata processing mechanism that flexibly handles AI/ML data analysis needs.

4. Container shared storage

Supports the Kubernetes CSI interface and lets multiple Pods share data.

5. Centralized disaster recovery resource pool

Together with X3DS (XSKY's stereo data management system), XGFS can serve as a large-capacity shared disaster recovery resource pool.

6. Enterprise data lake foundation

Supports Hadoop deployments that separate storage from compute, offers rich interface protocols, and scales to thousands of nodes.
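
For the container shared storage scenario above, multiple Pods share data by mounting the same volume. The sketch below builds a standard Kubernetes PersistentVolumeClaim that requests the `ReadWriteMany` access mode and prints it as JSON, which `kubectl apply -f` accepts. The storage class name `xgfs-shared`, the claim name, and the size are hypothetical. The article does not document the XGFS CSI driver's configuration, and it is an assumption here that the driver offers `ReadWriteMany` volumes.

```python
import json


def shared_pvc(name: str, storage_class: str, size: str) -> dict:
    """Build a PersistentVolumeClaim manifest that several Pods can mount at once."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteMany"],   # mountable read-write by many nodes
            "storageClassName": storage_class,  # hypothetical class name
            "resources": {"requests": {"storage": size}},
        },
    }


manifest = shared_pvc("shared-data", "xgfs-shared", "100Gi")
print(json.dumps(manifest, indent=2))
```

Each Pod would then reference the claim by name in its `volumes` section, so all of them see the same files.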

XSKY XGFS makes full use of the advantages of SDS, adapts to the latest NVMe SSD technology, and supports the latest HDFS and Kubernetes CSI protocols. Highly cost-effective and without compromise between performance and capacity, it is an ideal foundation for building an enterprise data lake.

Excerpt from: XSKY releases XGFS, its new next-generation distributed file system (Ifeng.com)

Copyright notice
This article was written by [bandaoyu]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/161/202206101735468810.html