当前位置:网站首页>Implementation of super large-scale warehouse clusters in large commercial banks
Implementation of super large-scale warehouse clusters in large commercial banks
2022-07-04 17:24:00 【51CTO】
This article is based on Teacher Chen Xiaoxin is 〖2021 Gdevops Global agile operations Summit - Guangzhou Railway Station 〗 The content of the live speech is organized .
Chen Xiaoxin
Jianxin Jinke DB Product owner
- have 8 year MPP Database work experience , CCB is developing a new generation MPP Architecture database Long Yun MPP DB Product owner , Responsible for CCB 4000 platform Greenplum Cluster planning 、 build 、 O & M and optimization .
Share summary
One 、 R & D background
Two 、 Application solutions
3、 ... and 、 Operation and maintenance solution
Hello everyone , I'm Chen Xiaoxin from CCB financial technology . It is a great honor to be here today to share our experience in the construction of super large-scale data warehouse clusters , We Jianxin Jinke introduced the technology of many cooperative companies , Jointly developed a product called Longyun MPP DB New generation cloud native data warehouse .
The data warehouse adopts metadata 、 Calculation 、 Storage three-tier separation architecture design , In the reserved MPP Under the premise of high-performance computing power of database , At the same time, it has high concurrency 、 High scalability 、 Dynamic resource scaling 、 Fault self-healing and other capabilities , It provides a foundation for the construction of super large-scale data clusters .
2020 year 3 month , The first application is launched on the data warehouse cluster . And then , Tieyuan 、 Public access 、 Journey management 、 Group consolidation 、 Bad assets and so on , Have been successfully launched . By the end of 2021 year 6 month , The scale of the data warehouse cluster has reached 16000 Servers , The amount of data exceeds 9PB, Run millions of jobs every day , function SQL Reach ten million level .
surface (1)
chart (2)
chart (3) It's our whole dragon MPP DB Monitoring screen of . You can see , Our current version is 3.9.8, Calculate the cluster size 79 set , And near 24 Hour run SQL Count 、 near 1 Run for hours SQL Count 、 The number of connections 、 Resource utilization 、 Various health conditions and other information .
chart (3)
From tradition MPP database , To Longyun MPP DB, Here we first make a simple performance comparison .
Take the post source integration application of CCB as an example , Pictured (4). At present, we use Longyun in our post source application MPP DB Computing resources for , And the previous tradition MPP The computing resources of are basically equal , But the amount of data carried has reached the traditional MPP(200TB) Of 5 times , That is to say 1000TB.
Tieyuan runs every day 7 Ten thousand assignments ,100 About ten thousand SQL. chart (4) The graph on the left shows the number of jobs completed in each time period , It's on it base Job comparison , It's on it stage Job comparison . You can see , At every point in time , Red represents the Dragon MPP DB Number of jobs completed , Basically, it is larger than the tradition represented by blue MPP Number of jobs completed . in other words , When the amount of data expands 5 In the case of times , Long Yun MPP DB The performance of can still meet the application requirements .
chart (4)
One 、 R & D background
CCB has been in the construction of several warehouses for more than 20 years , Great achievements have been made , But also encountered many problems . Tradition MPP Database products , There are several common problems :
- Insufficient concurrency and scalability , A large number of sub databases and sub tables cause serious data redundancy ;
- Data storage and calculation are not separated , This leads to serious database isolation ;
- upgrade 、 Capacity expansion 、 Fault recovery and other operations are complex and time-consuming , The operation and maintenance cost is high ;
- Non cloud native architecture , Dynamic resource scheduling is difficult , And it is difficult to integrate into the cloud construction of CCB .
To solve the above problems , Our dragon MPP DB emerge as the times require .
Long Yun MPP DB The logical architecture can be divided into two modules , One is the management module , One is the user module , Pictured (5). The management module is mainly responsible for the management of basic resources 、 Create cluster 、 Start stop 、 Expansion and contraction, monitoring and alarm services . User modules are divided into 3 layer , That is, the metadata layer 、 Computing tier and shared storage tier .
chart (5)
chart (6) It's our management console UI Interface . All resources are created 、 The destruction 、 Expansion and contraction capacity 、 upgrade 、 Fault self healing , And monitoring , This can be done on the console .
chart (6)
User module , chart (7) It's our metadata cluster , It is mainly used to provide metadata persistence storage, read and write 、 Business 、 Lock management and other services . Metadata cluster uses ETCD As service discovery and load balancing , Use FDB As a data storage layer . The stateless service layer in the middle is responsible for receiving and processing metadata requests from all computing clusters . Each layer of services can be expanded according to the load demand , To improve the service capacity .
chart (7)
Next is the computing layer , Pictured (8). In the computing layer , Each computing cluster is a database service of independent computing resources , Users can create computing clusters on demand 、 Delete 、 Expand and shrink capacity etc , Jobs can also be flexibly deployed among existing computing clusters . When the concurrency and expansion capacity of a set of computing clusters are insufficient , Users can realize the linear expansion of concurrency by creating new clusters .
chart (8)
Finally, the shared storage layer , Pictured (9). Shared storage uses object storage to persist user data , Data is written once , All computing clusters share . By using the massive file storage of object storage 、 High concurrency 、 High availability and persistence of data , Meet the application of massive data access 、 High job concurrency 、 Data security and other requirements .
chart (9)
Two 、 Application solutions
By using dragon MPP DB Such a service hierarchy , The architecture of data sharing , We optimize our application solutions . Pictured (10), The traditional MPP database , The application construction is vertical chimney , Each application needs to create one or more independent clusters . A large amount of data needs to be replicated between different clusters , Managing complex , And the waste of resources is serious . And the use of dragon MPP DB, The computing and concurrency requirements of applications can be met by creating computing clusters , Data replication is no longer required , At the same time, application jobs can be flexibly scheduled to different clusters in real time according to requirements , Greatly improve application flexibility and resource utilization .
Pictured (10)
3、 ... and 、 Operation and maintenance solution
In terms of operation and maintenance , Long Yun MPP DB It also provides a more efficient and convenient solution , Pictured (11). Because of the Dragon MPP DB All computing clusters are stateless , With the help of IaaS Rapid resource supply of services , We can quickly complete the creation or destruction of some nodes and even the whole cluster . It looks like , We can realize the dynamic expansion of the cluster 、 Shrinkage capacity 、 Upgrade and other operations . When a node failure occurs , It can also quickly isolate and recover failed nodes , Realize self-healing of faults , Greatly improve the operation and maintenance efficiency .
chart (11)
Over the past year , CCB Longyun MPP DB The server size of the cluster has increased 50 times , The amount of data has increased 45 times , There are already dozens of applications running on it . However, with the continuous increase of cluster size and application load , It turns out that all kinds of trivial problems have also begun to be solved by infinite methods , Cause a serious chain reaction :
- Ten billion levels of metadata every day RPC How to respond stably to requests ;
- How to efficiently meet the massive data access requirements of object storage ;
- How to efficiently operate and maintain a super large-scale cluster ;
- How to guarantee the high availability demand at the bank level .
To address these issues , We have carried out research and development in the following aspects .
Metadata service capability improvement , According to the service type and load , We split and distributed the metadata service , From the original day can handle a billion levels RPC request , Upgrade to a level that can handle 10 billion RPC request , While improving the service ability , It also improves high availability , Pictured (12):
chart (12)
Storage service capability improvement , On the one hand, we merge through small files 、 Data prefetching 、 Unified cache layer establishment and other methods , Greatly reduce the pressure on storage ; On the other hand , Store each... For the object bucket The number of objects that can be stored and IO The problem of limited capacity , We create separate for each application tablespace, Every tablespace According to the demand, there are several bucket. This way bucket Split , Realize the shared storage IO Isolation and flow control , And avoid single bucket Problems of insufficient ability and inclination .
chart (13)
In terms of automatic monitoring and operation and maintenance , As mentioned earlier , Long Yun MPP DB It has the function of fault self-healing . meanwhile , By collecting jobs in real time 、SQL、 Storage 、 Server and other operation data , And aggregate and analyze these data , Such as whether the load meets historical expectations 、 Completion of key operations, etc , We can further judge whether the database performance is normal 、 Whether the load is inclined 、 Whether the resources are sufficient , And provide support for dynamic scheduling of resources and fault analysis and location , Pictured (14)、 chart (15).
chart (14)
chart (15)
Finally, high availability guarantee . Everybody knows , The system used by the bank , The requirements for high availability are very high . Based on the original distributed architecture and the high availability guarantee of fault self-healing , In order to cope with the overall failure at the cluster level 、AZ Level service failure 、 Data loss / Delete by mistake , We also offer cross AZ Deploy 、 Continuous metadata backup 、 Double active deployment and other schemes , It further improves the level of Longyun MPP High availability service capability of , Pictured (16).
chart (16)
Over the past few years , We have completed countless version iterations and online optimization . The mature development of a database product , Need products 、 framework 、 Research and development 、 Operation and maintenance 、 The long-term cooperation and investment of many people, such as application . In the Dragon MPP DB On , We :
- It has gathered a large number of excellent R & D personnel from Jianxin Jinke and the industry ;
- Provides the most complex 、 Richest 、 The application scenario with the highest load ;
- CCB has more than 20 years of experience in data warehouse construction and operation , It can find product pain points fastest , Put forward the product design that best meets the needs of users .
边栏推荐
- Visual studio 2019 (localdb) mssqllocaldb SQL Server 2014 database version is 852 and cannot be opened. This server supports 782
- NFT流动性市场安全问题频发—NFT交易平台Quixotic被黑事件分析
- It's too convenient. You can complete the code release and approval by nailing it!
- 上网成瘾改变大脑结构:语言功能受影响,让人话都说不利索
- Spark 中的 Rebalance 操作以及与Repartition操作的区别
- Is it safe to open an account online
- 世界环境日 | 周大福用心服务推动减碳环保
- Why do you say that the maximum single table of MySQL database is 20million? Based on what?
- Start by counting
- 电子宠物小狗-内部结构是什么?
猜你喜欢
【云原生】服务网格是什么“格”?
Load test practice of pingcode performance test
NFT liquidity market security issues occur frequently - Analysis of the black incident of NFT trading platform quixotic
上网成瘾改变大脑结构:语言功能受影响,让人话都说不利索
一加10 Pro和iPhone 13怎么选?
Kunming Third Ring Road Closure project will pass through these places. Is there one near your home?
离线、开源版的 Notion—— 笔记软件Anytype 综合评测
智慧物流园区供应链管理系统解决方案:数智化供应链赋能物流运输行业供应链新模式
Years of training, towards Kata 3.0! Enter the safe container experience out of the box | dragon lizard Technology
7 RSA密码体制
随机推荐
【测试开发】软件测试——基础篇
La 18e Conférence internationale de l'IET sur le transport d'électricité en courant alternatif et en courant continu (acdc2022) s'est tenue avec succès en ligne.
上网成瘾改变大脑结构:语言功能受影响,让人话都说不利索
Rebalance operation in spark and its difference from repartition operation
安信证券属于什么档次 开户安全吗
图像检索(image retrieval)
将Opencv绘制图片显示在MFC Picture Control控件上
公司要上监控,Zabbix 和 Prometheus 怎么选?这么选准没错!
被PMP考试“折磨”出来的考试心得,值得你一览
GO开发:如何利用Go单例模式保障流媒体高并发的安全性?
Solution of commercial supply chain coordination system in the mineral industry: build a digital intelligent supply chain platform to ensure the safe supply of mineral resources
The Ministry of human resources and Social Security announced the new construction occupation
go-micro教程 — 第二章 go-micro v3 使用Gin、Etcd
Load test practice of pingcode performance test
ECCV 2022 released: 1629 papers were selected, and the employment rate was less than 20%
tp配置多数据库
Pytorch deep learning quick start tutorial
Is it safe for Anxin securities to open an account online? Is the account opening fee charged
World Environment Day | Chow Tai Fook serves wholeheartedly to promote carbon reduction and environmental protection
利用win10计划任务程序定时自动运行jar包