当前位置:网站首页>Implementation of super large-scale warehouse clusters in large commercial banks

Implementation of super large-scale warehouse clusters in large commercial banks

2022-07-04 17:24:00 51CTO

This article is based on Teacher Chen Xiaoxin is 2021 Gdevops Global agile operations Summit - Guangzhou Railway Station 〗 The content of the live speech is organized .

 The landing practice of super large-scale warehouse clusters in large commercial banks _ data

Chen Xiaoxin

  Jianxin Jinke DB Product owner


  • have 8 year MPP Database work experience , CCB is developing a new generation MPP Architecture database Long Yun MPP DB Product owner , Responsible for CCB 4000 platform Greenplum Cluster planning 、 build 、 O & M and optimization .


Share summary

One 、 R & D background

Two 、 Application solutions

3、 ... and 、 Operation and maintenance solution


Hello everyone , I'm Chen Xiaoxin from CCB financial technology . It is a great honor to be here today to share our experience in the construction of super large-scale data warehouse clusters , We Jianxin Jinke introduced the technology of many cooperative companies , Jointly developed a product called Longyun MPP DB New generation cloud native data warehouse .


The data warehouse adopts metadata 、 Calculation 、 Storage three-tier separation architecture design , In the reserved MPP Under the premise of high-performance computing power of database , At the same time, it has high concurrency 、 High scalability 、 Dynamic resource scaling 、 Fault self-healing and other capabilities , It provides a foundation for the construction of super large-scale data clusters .


2020 year 3 month , The first application is launched on the data warehouse cluster . And then , Tieyuan 、 Public access 、 Journey management 、 Group consolidation 、 Bad assets and so on , Have been successfully launched . By the end of 2021 year 6 month , The scale of the data warehouse cluster has reached 16000 Servers , The amount of data exceeds 9PB, Run millions of jobs every day , function SQL Reach ten million level .


 The landing practice of super large-scale warehouse clusters in large commercial banks _ database _02

surface (1)


 The landing practice of super large-scale warehouse clusters in large commercial banks _ Metadata _03

chart (2)


chart (3) It's our whole dragon MPP DB Monitoring screen of . You can see , Our current version is 3.9.8, Calculate the cluster size 79 set , And near 24 Hour run SQL Count 、 near 1 Run for hours SQL Count 、 The number of connections 、 Resource utilization 、 Various health conditions and other information .


 The landing practice of super large-scale warehouse clusters in large commercial banks _ data _04

chart (3)


From tradition MPP database , To Longyun MPP DB, Here we first make a simple performance comparison .


Take the post source integration application of CCB as an example , Pictured (4). At present, we use Longyun in our post source application MPP DB Computing resources for , And the previous tradition MPP The computing resources of are basically equal , But the amount of data carried has reached the traditional MPP(200TB) Of 5 times , That is to say 1000TB.


Tieyuan runs every day 7 Ten thousand assignments ,100 About ten thousand SQL. chart (4) The graph on the left shows the number of jobs completed in each time period , It's on it base Job comparison , It's on it stage Job comparison . You can see , At every point in time , Red represents the Dragon MPP DB Number of jobs completed , Basically, it is larger than the tradition represented by blue MPP Number of jobs completed . in other words , When the amount of data expands 5 In the case of times , Long Yun MPP DB The performance of can still meet the application requirements .


 The landing practice of super large-scale warehouse clusters in large commercial banks _ data _05

chart (4)


One 、 R & D background


CCB has been in the construction of several warehouses for more than 20 years , Great achievements have been made , But also encountered many problems . Tradition MPP Database products , There are several common problems :


  • Insufficient concurrency and scalability , A large number of sub databases and sub tables cause serious data redundancy ;
  • Data storage and calculation are not separated , This leads to serious database isolation ;
  • upgrade 、 Capacity expansion 、 Fault recovery and other operations are complex and time-consuming , The operation and maintenance cost is high ;
  • Non cloud native architecture , Dynamic resource scheduling is difficult , And it is difficult to integrate into the cloud construction of CCB .


To solve the above problems , Our dragon MPP DB emerge as the times require .


Long Yun MPP DB The logical architecture can be divided into two modules , One is the management module , One is the user module , Pictured (5). The management module is mainly responsible for the management of basic resources 、 Create cluster 、 Start stop 、 Expansion and contraction, monitoring and alarm services . User modules are divided into 3 layer , That is, the metadata layer 、 Computing tier and shared storage tier .


 The landing practice of super large-scale warehouse clusters in large commercial banks _ Metadata _06

chart (5)


chart (6) It's our management console UI Interface . All resources are created 、 The destruction 、 Expansion and contraction capacity 、 upgrade 、 Fault self healing , And monitoring , This can be done on the console .


 The landing practice of super large-scale warehouse clusters in large commercial banks _ data _07

chart (6)


User module , chart (7) It's our metadata cluster , It is mainly used to provide metadata persistence storage, read and write 、 Business 、 Lock management and other services . Metadata cluster uses ETCD As service discovery and load balancing , Use FDB As a data storage layer . The stateless service layer in the middle is responsible for receiving and processing metadata requests from all computing clusters . Each layer of services can be expanded according to the load demand , To improve the service capacity .


 The landing practice of super large-scale warehouse clusters in large commercial banks _ database _08

chart (7)


Next is the computing layer , Pictured (8). In the computing layer , Each computing cluster is a database service of independent computing resources , Users can create computing clusters on demand 、 Delete 、 Expand and shrink capacity etc , Jobs can also be flexibly deployed among existing computing clusters . When the concurrency and expansion capacity of a set of computing clusters are insufficient , Users can realize the linear expansion of concurrency by creating new clusters .


 The landing practice of super large-scale warehouse clusters in large commercial banks _ database _09

chart (8)


Finally, the shared storage layer , Pictured (9). Shared storage uses object storage to persist user data , Data is written once , All computing clusters share . By using the massive file storage of object storage 、 High concurrency 、 High availability and persistence of data , Meet the application of massive data access 、 High job concurrency 、 Data security and other requirements .


 The landing practice of super large-scale warehouse clusters in large commercial banks _ database _10

chart (9)


Two 、 Application solutions


By using dragon MPP DB Such a service hierarchy , The architecture of data sharing , We optimize our application solutions . Pictured (10), The traditional MPP database , The application construction is vertical chimney , Each application needs to create one or more independent clusters . A large amount of data needs to be replicated between different clusters , Managing complex , And the waste of resources is serious . And the use of dragon MPP DB, The computing and concurrency requirements of applications can be met by creating computing clusters , Data replication is no longer required , At the same time, application jobs can be flexibly scheduled to different clusters in real time according to requirements , Greatly improve application flexibility and resource utilization .


 The landing practice of super large-scale warehouse clusters in large commercial banks _ data _11

Pictured (10)


3、 ... and 、 Operation and maintenance solution


In terms of operation and maintenance , Long Yun MPP DB It also provides a more efficient and convenient solution , Pictured (11). Because of the Dragon MPP DB All computing clusters are stateless , With the help of IaaS Rapid resource supply of services , We can quickly complete the creation or destruction of some nodes and even the whole cluster . It looks like , We can realize the dynamic expansion of the cluster 、 Shrinkage capacity 、 Upgrade and other operations . When a node failure occurs , It can also quickly isolate and recover failed nodes , Realize self-healing of faults , Greatly improve the operation and maintenance efficiency .


 The landing practice of super large-scale warehouse clusters in large commercial banks _ data _12

chart (11)


Over the past year , CCB Longyun MPP DB The server size of the cluster has increased 50 times , The amount of data has increased 45 times , There are already dozens of applications running on it . However, with the continuous increase of cluster size and application load , It turns out that all kinds of trivial problems have also begun to be solved by infinite methods , Cause a serious chain reaction :


  • Ten billion levels of metadata every day RPC How to respond stably to requests ;
  • How to efficiently meet the massive data access requirements of object storage ;
  • How to efficiently operate and maintain a super large-scale cluster ;
  • How to guarantee the high availability demand at the bank level .


To address these issues , We have carried out research and development in the following aspects .


Metadata service capability improvement , According to the service type and load , We split and distributed the metadata service , From the original day can handle a billion levels RPC request , Upgrade to a level that can handle 10 billion RPC request , While improving the service ability , It also improves high availability , Pictured (12):


 The landing practice of super large-scale warehouse clusters in large commercial banks _ Metadata _13

chart (12)


Storage service capability improvement , On the one hand, we merge through small files 、 Data prefetching 、 Unified cache layer establishment and other methods , Greatly reduce the pressure on storage ; On the other hand , Store each... For the object bucket The number of objects that can be stored and IO The problem of limited capacity , We create separate for each application tablespace, Every tablespace According to the demand, there are several bucket. This way bucket Split , Realize the shared storage IO Isolation and flow control , And avoid single bucket Problems of insufficient ability and inclination .


 The landing practice of super large-scale warehouse clusters in large commercial banks _ data _14

chart (13)


In terms of automatic monitoring and operation and maintenance , As mentioned earlier , Long Yun MPP DB It has the function of fault self-healing . meanwhile , By collecting jobs in real time 、SQL、 Storage 、 Server and other operation data , And aggregate and analyze these data , Such as whether the load meets historical expectations 、 Completion of key operations, etc , We can further judge whether the database performance is normal 、 Whether the load is inclined 、 Whether the resources are sufficient , And provide support for dynamic scheduling of resources and fault analysis and location , Pictured (14)、 chart (15).


 The landing practice of super large-scale warehouse clusters in large commercial banks _ Metadata _15

chart (14)


 The landing practice of super large-scale warehouse clusters in large commercial banks _ Metadata _16

chart (15)


Finally, high availability guarantee . Everybody knows , The system used by the bank , The requirements for high availability are very high . Based on the original distributed architecture and the high availability guarantee of fault self-healing , In order to cope with the overall failure at the cluster level 、AZ Level service failure 、 Data loss / Delete by mistake , We also offer cross AZ Deploy 、 Continuous metadata backup 、 Double active deployment and other schemes , It further improves the level of Longyun MPP High availability service capability of , Pictured (16).


 The landing practice of super large-scale warehouse clusters in large commercial banks _ database _17

chart (16)


Over the past few years , We have completed countless version iterations and online optimization . The mature development of a database product , Need products 、 framework 、 Research and development 、 Operation and maintenance 、 The long-term cooperation and investment of many people, such as application . In the Dragon MPP DB On , We :


  • It has gathered a large number of excellent R & D personnel from Jianxin Jinke and the industry ;
  • Provides the most complex 、 Richest 、 The application scenario with the highest load ;
  • CCB has more than 20 years of experience in data warehouse construction and operation , It can find product pain points fastest , Put forward the product design that best meets the needs of users .


 The landing practice of super large-scale warehouse clusters in large commercial banks _ database _18


 The landing practice of super large-scale warehouse clusters in large commercial banks _ data _19

原网站

版权声明
本文为[51CTO]所创,转载请带上原文链接,感谢
https://yzsam.com/2022/185/202207041533159384.html

随机推荐