Implementing Super-Large-Scale Data Warehouse Clusters at a Large Commercial Bank
2022-07-04 17:24:00 【51CTO】
This article is compiled from Chen Xiaoxin's live talk at the 2021 Gdevops Global Agile Operations Summit (Guangzhou session).
Chen Xiaoxin
DB Product Owner, CCB Fintech (Jianxin Jinke)
- 8 years of experience working with MPP databases. Product owner of Longyun MPP DB, the new-generation MPP database being developed by CCB, and responsible for the planning, build-out, operation and maintenance, and optimization of CCB's 4,000-server Greenplum clusters.
Talk outline
1. R&D background
2. Application solutions
3. Operations and maintenance solutions
Hello everyone, I'm Chen Xiaoxin from CCB Fintech. It is a great honor to be here today to share our experience building super-large-scale data warehouse clusters. At CCB Fintech (Jianxin Jinke), we brought in technology from a number of partner companies and jointly developed a new-generation cloud-native data warehouse called Longyun MPP DB.
The data warehouse uses a three-tier architecture that separates metadata, compute, and storage. While retaining the high-performance computing power of an MPP database, it adds high concurrency, high scalability, dynamic resource scaling, and fault self-healing, which lays the foundation for building super-large-scale data clusters.
In March 2020, the first application went live on the data warehouse cluster. Since then, the source-aligned data (Tieyuan), public access, journey management, group consolidation, non-performing assets, and other applications have all gone live successfully. As of June 2021, the data warehouse cluster had grown to 16,000 servers holding more than 9 PB of data, running millions of jobs and tens of millions of SQL statements every day.
Table (1)
Figure (2)
Figure (3) shows the monitoring dashboard for our entire Longyun MPP DB deployment. It shows that the current version is 3.9.8 and that there are 79 compute clusters, along with the number of SQL statements run in the last 24 hours and in the last hour, the number of connections, resource utilization, and various health indicators.
Figure (3)
Before going further, here is a simple performance comparison between a traditional MPP database and Longyun MPP DB.
Take CCB's source-aligned (Tieyuan) integration application as an example, as shown in Figure (4). The compute resources this application currently uses on Longyun MPP DB are roughly equal to those it previously used on the traditional MPP database, yet the data volume it carries is five times that of the traditional MPP deployment (200 TB), that is, 1,000 TB.
The source-aligned application runs about 70,000 jobs and roughly 1 million SQL statements every day. The charts on the left of Figure (4) show the number of jobs completed in each time period: one compares base jobs and the other compares stage jobs. At every point in time, the number of jobs completed on Longyun MPP DB (red) is generally higher than on the traditional MPP database (blue). In other words, even with five times the data volume, Longyun MPP DB still meets the application's performance requirements.
Figure (4)
1. R&D background
CCB has been building data warehouses for more than 20 years. We have achieved a great deal, but we have also run into many problems. Traditional MPP database products share several common problems:
- Insufficient concurrency and scalability, which forces extensive database and table sharding and causes serious data redundancy;
- Storage and compute are not separated, which leaves databases severely siloed from one another;
- Upgrades, capacity expansion, fault recovery, and similar operations are complex and time-consuming, so O&M costs are high;
- The architecture is not cloud-native, so dynamic resource scheduling is difficult and the database is hard to integrate into CCB's cloud infrastructure.
Longyun MPP DB was developed to solve these problems.
The logical architecture of Longyun MPP DB consists of two modules, a management module and a user module, as shown in Figure (5). The management module handles basic resource management, cluster creation, start/stop, scaling out and in, and monitoring and alerting. The user module has three layers: the metadata layer, the compute layer, and the shared storage layer.
Figure (5)
Figure (6) shows our management console UI. Creating and destroying resources, scaling out and in, upgrades, fault self-healing, and monitoring can all be done from the console.
Figure (6)
Within the user module, Figure (7) shows our metadata cluster, which provides persistent metadata storage and read/write access, transactions, lock management, and other services. The metadata cluster uses ETCD for service discovery and load balancing and FDB as the data storage layer. The stateless service layer in the middle receives and processes metadata requests from all compute clusters. Each layer can be scaled out independently according to load to increase service capacity.
Figure (7)
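The value of a stateless middle tier is that any server can answer any request, because all state lives in the storage layer. The sketch below illustrates that idea under loud assumptions: `KVStore` is a hypothetical in-memory stand-in for FDB, `Registry` is a hypothetical stand-in for ETCD-based discovery, and round-robin is an assumed balancing policy. None of these are the real ETCD or FDB client APIs.

```python
import itertools


class KVStore:
    """Hypothetical in-memory stand-in for the persistent storage layer (FDB in the article)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value


class MetadataServer:
    """Stateless metadata server: it holds no data itself, so any instance can serve any request."""

    def __init__(self, name, store):
        self.name = name
        self.store = store
        self.handled = 0  # request counter, only for illustration

    def handle(self, op, key, value=None):
        self.handled += 1
        if op == "get":
            return self.store.get(key)
        if op == "put":
            self.store.put(key, value)
            return value
        raise ValueError(f"unknown op: {op}")


class Registry:
    """Hypothetical stand-in for service discovery (ETCD in the article) with round-robin balancing."""

    def __init__(self):
        self.servers = []
        self._cycle = iter(())

    def register(self, server):
        self.servers.append(server)
        self._cycle = itertools.cycle(self.servers)

    def pick(self):
        if not self.servers:
            raise RuntimeError("no metadata servers registered")
        return next(self._cycle)


store = KVStore()
registry = Registry()
for i in range(3):  # scale the stateless tier out by registering more servers
    registry.register(MetadataServer(f"meta-{i}", store))

# The write goes through meta-0 and the read through meta-1, yet both see the same data.
registry.pick().handle("put", "table:orders", {"columns": 12})
print(registry.pick().handle("get", "table:orders"))
```

Because the servers share one store, adding capacity to the service tier is just another `register` call; the storage tier would be scaled separately.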
Next is the compute layer, as shown in Figure (8). Each compute cluster is a database service with its own independent compute resources. Users can create, delete, scale out, and scale in compute clusters on demand, and jobs can be flexibly placed across existing compute clusters. When one compute cluster runs short of concurrency or capacity, users can scale concurrency linearly by creating new clusters.
Figure (8)
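A minimal sketch of the "add a cluster to add concurrency" idea, assuming each compute cluster has a fixed per-cluster concurrency limit (`max_concurrency`, a hypothetical parameter) and that jobs go to the cluster with the most free slots. The placement policy is an illustrative assumption, not the article's scheduler.

```python
from dataclasses import dataclass


@dataclass
class ComputeCluster:
    name: str
    max_concurrency: int  # hypothetical per-cluster limit on concurrent jobs
    running: int = 0

    def free_slots(self) -> int:
        return self.max_concurrency - self.running


def place_job(clusters):
    """Send a job to the cluster with the most free slots; return None if all are full."""
    candidates = [c for c in clusters if c.free_slots() > 0]
    if not candidates:
        return None
    target = max(candidates, key=lambda c: c.free_slots())
    target.running += 1
    return target


def place_with_scale_out(clusters, per_cluster_concurrency):
    """Place a job; if every cluster is saturated, create a new cluster and use it."""
    target = place_job(clusters)
    if target is None:
        new = ComputeCluster(f"cc-{len(clusters)}", per_cluster_concurrency)
        clusters.append(new)
        target = place_job([new])
    return target


clusters = [ComputeCluster("cc-0", 2)]
for _ in range(5):
    place_with_scale_out(clusters, 2)

print([(c.name, c.running) for c in clusters])
print(sum(c.max_concurrency for c in clusters))  # total concurrency grows with cluster count
```

Total concurrency is the sum of per-cluster limits, so it grows linearly with the number of clusters. This only works cheaply because storage is shared: a new cluster needs no data copied into it before it can run jobs.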
Finally, the shared storage layer , Pictured (9). Shared storage uses object storage to persist user data , Data is written once , All computing clusters share . By using the massive file storage of object storage 、 High concurrency 、 High availability and persistence of data , Meet the application of massive data access 、 High job concurrency 、 Data security and other requirements .
chart (9)
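The "write once, shared by all compute clusters" property can be illustrated with a toy object store that rejects overwrites. `WriteOnceObjectStore` and `ComputeCluster` below are hypothetical names; a real object store client has its own API, and the strict no-overwrite rule is an assumption made to show the idea.

```python
class WriteOnceObjectStore:
    """Hypothetical in-memory stand-in for object storage where each object is written once."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        if key in self._objects:
            raise FileExistsError(f"object {key!r} already exists; write a new key instead")
        self._objects[key] = bytes(data)

    def get(self, key):
        return self._objects[key]


class ComputeCluster:
    """A compute cluster keeps no local copy of table data; it reads from the shared store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def scan_bytes(self, keys):
        return sum(len(self.store.get(k)) for k in keys)


shared = WriteOnceObjectStore()
shared.put("ledger/part-0", b"x" * 1024)
shared.put("ledger/part-1", b"y" * 2048)

etl = ComputeCluster("etl", shared)
reporting = ComputeCluster("reporting", shared)
keys = ["ledger/part-0", "ledger/part-1"]
print(etl.scan_bytes(keys), reporting.scan_bytes(keys))  # both clusters read the same objects
```

Immutable objects make sharing safe: no cluster can change data underneath another, so readers never need to coordinate with each other.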
2. Application solutions
By adopting Longyun MPP DB's layered services and shared-data architecture, we optimized our application solutions. As shown in Figure (10), with a traditional MPP database, applications are built as vertical silos ("chimneys"): each application needs one or more dedicated clusters. Large amounts of data must be replicated between clusters, which is complex to manage and seriously wastes resources. With Longyun MPP DB, an application's compute and concurrency needs are met by creating compute clusters, data no longer needs to be replicated, and application jobs can be rescheduled to different clusters in real time as needed, which greatly improves application flexibility and resource utilization.
Figure (10)
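The storage saving from dropping per-application replication is simple arithmetic. The sketch below compares the two models using made-up dataset sizes and application usage (all hypothetical, not CCB figures): in the siloed model each application's cluster holds its own copy of every dataset it reads, while with shared storage each dataset is stored once.

```python
def siloed_footprint(dataset_sizes, app_usage):
    # Each application's dedicated cluster holds its own copy of every dataset it reads.
    return sum(dataset_sizes[d] for datasets in app_usage.values() for d in datasets)


def shared_footprint(dataset_sizes, app_usage):
    # With shared storage, each dataset used by any application is stored once.
    used = set().union(*app_usage.values())
    return sum(dataset_sizes[d] for d in used)


sizes_tb = {"customer": 50, "ledger": 120, "events": 80}  # hypothetical sizes in TB
usage = {  # hypothetical applications and the datasets they read
    "reporting": {"customer", "ledger"},
    "risk": {"ledger", "events"},
    "marketing": {"customer", "events"},
}
print(siloed_footprint(sizes_tb, usage), shared_footprint(sizes_tb, usage))
```

With these toy numbers the siloed layout stores 500 TB against 250 TB for the shared one; the gap widens as more applications read the same data.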
3. Operations and maintenance solutions
On the O&M side, Longyun MPP DB also offers a more efficient and convenient solution, as shown in Figure (11). Because all Longyun MPP DB compute clusters are stateless, we can use the rapid resource provisioning of IaaS services to quickly create or destroy individual nodes or even entire clusters. In this way we can scale clusters out and in dynamically, upgrade them, and perform similar operations. When a node fails, it can be quickly isolated and recovered, so faults heal themselves, which greatly improves O&M efficiency.
Figure (11)
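Self-healing for stateless nodes is often built as a reconciliation pass: compare the actual node set with the desired size, drop unhealthy nodes, and provision replacements. The sketch below assumes that pattern; `provision_node` is a hypothetical stand-in for an IaaS API call, and the health check is passed in as a plain function.

```python
import itertools

_node_ids = itertools.count(100)


def provision_node():
    """Hypothetical stand-in for an IaaS call that returns a fresh, healthy node ID."""
    return f"node-{next(_node_ids)}"


def reconcile(cluster, desired_size, is_healthy):
    """One self-healing pass: isolate unhealthy nodes, then converge on desired_size.

    Replacements need no data copied from failed nodes because compute nodes are stateless.
    """
    healthy = [n for n in cluster if is_healthy(n)]
    isolated = [n for n in cluster if not is_healthy(n)]
    while len(healthy) < desired_size:  # scale out or replace failed nodes
        healthy.append(provision_node())
    return healthy[:desired_size], isolated  # scale in if larger than desired


cluster = ["node-1", "node-2", "node-3", "node-4"]
failed = {"node-2"}
cluster, isolated = reconcile(cluster, 4, lambda n: n not in failed)
print(cluster, isolated)
```

The same function covers failure recovery, scale-out, and scale-in, since all three are just "converge to the desired node count with healthy nodes".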
Over the past year, the server count of CCB's Longyun MPP DB cluster has grown 50-fold and its data volume 45-fold, and dozens of applications now run on it. However, as cluster size and application load kept growing, all kinds of minor problems were magnified without limit and set off serious chain reactions:
- How to respond stably to metadata RPC requests on the order of ten billion per day;
- How to efficiently meet the massive data access demands placed on object storage;
- How to efficiently operate and maintain a super-large-scale cluster;
- How to meet bank-grade high-availability requirements.
To address these issues, we carried out R&D in the following areas.
Improving metadata service capacity. We split and distributed the metadata service by service type and load, raising its capacity from RPC requests on the order of one billion per day to the order of ten billion per day. This improved not only service capacity but also availability, as shown in Figure (12):
Figure (12)
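One way to picture "splitting the metadata service by service type" is a router that sends each RPC type to its own server pool, sized to that type's load. The pool names and sizes below are hypothetical (the article does not say how the service was split); the service types reuse the transaction and lock services mentioned for the metadata cluster, and round-robin within a pool is an assumed policy.

```python
import itertools
from collections import Counter


class MetadataRouter:
    """Route metadata RPCs to a separate server pool per service type (hypothetical split)."""

    def __init__(self, pools):
        # pools: service type -> list of server names; each pool is sized to its own load
        self._cycles = {svc: itertools.cycle(servers) for svc, servers in pools.items()}

    def route(self, service_type):
        if service_type not in self._cycles:
            raise KeyError(f"no pool for service type {service_type!r}")
        return next(self._cycles[service_type])


router = MetadataRouter({
    "catalog": ["cat-0", "cat-1", "cat-2", "cat-3"],  # assumed heaviest load: largest pool
    "txn": ["txn-0", "txn-1"],
    "lock": ["lock-0"],
})
requests = ["catalog"] * 8 + ["txn"] * 4 + ["lock"] * 2
load = Counter(router.route(r) for r in requests)
print(load)
```

Separate pools also help availability: overload or failure in one pool (say, locks) does not take down the others.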
Improving storage service capacity. On the one hand, we greatly reduced the pressure on storage through small-file merging, data prefetching, and a unified cache layer. On the other hand, because each object storage bucket is limited in the number of objects it can hold and in its IO capacity, we create a separate tablespace for each application, and each tablespace is backed by several buckets as needed. Splitting data across buckets this way provides IO isolation and flow control on shared storage and avoids the capacity shortfalls and skew of a single bucket.
Figure (13)
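A minimal sketch of per-application tablespaces spread over several buckets, assuming objects are mapped to buckets by a stable hash of their key and each bucket gets its own token-bucket rate limiter for flow control. Bucket naming, the hash choice, and the limiter parameters are all hypothetical; the article does not describe the actual mapping or throttling mechanism.

```python
import hashlib
import time


class TokenBucket:
    """Token-bucket limiter: refills at `rate` tokens per second, holding at most `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def try_acquire(self):
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class Tablespace:
    """One application's tablespace backed by several buckets (names are hypothetical)."""

    def __init__(self, app, n_buckets, rate, capacity, clock=time.monotonic):
        self.buckets = [f"{app}-bucket-{i}" for i in range(n_buckets)]
        self.limiters = {b: TokenBucket(rate, capacity, clock) for b in self.buckets}

    def bucket_for(self, object_key):
        # Stable hash: the same object key always maps to the same bucket.
        digest = hashlib.sha256(object_key.encode()).digest()
        return self.buckets[int.from_bytes(digest[:8], "big") % len(self.buckets)]

    def admit(self, object_key):
        """Return (bucket, allowed); each bucket is throttled independently."""
        bucket = self.bucket_for(object_key)
        return bucket, self.limiters[bucket].try_acquire()


# A frozen clock means no refill, which makes throttling easy to observe.
ts = Tablespace("risk", n_buckets=4, rate=100, capacity=2, clock=lambda: 0.0)
key = "orders/part-0001"
results = [ts.admit(key) for _ in range(3)]
print(results)  # the third request to the same bucket is throttled
```

Because each application has its own tablespace and each bucket its own limiter, a burst from one application or one hot bucket is throttled locally instead of degrading the shared storage for everyone.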
For automated monitoring and O&M, as mentioned earlier, Longyun MPP DB supports fault self-healing. In addition, we collect runtime data on jobs, SQL, storage, and servers in real time and aggregate and analyze it, checking for example whether load matches historical expectations and whether key jobs have completed. This lets us further determine whether database performance is normal, whether load is skewed, and whether resources are sufficient, and it supports dynamic resource scheduling as well as fault analysis and localization, as shown in Figures (14) and (15).
Figure (14)
Figure (15)
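Two of the checks mentioned above, "does load match historical expectations" and "is load skewed", can be approximated with simple statistics. The sketch assumes a z-score against past values and a max-to-mean ratio across nodes; both the metrics and the thresholds are illustrative choices, and the sample numbers are made up.

```python
from statistics import mean, pstdev


def deviates_from_history(current, history, z_threshold=3.0):
    """True if `current` is more than z_threshold standard deviations from the historical mean."""
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return current != mu
    return abs(current - mu) / sigma > z_threshold


def skew_ratio(per_node_load):
    """Busiest node's load divided by the average load; 1.0 means perfectly even."""
    avg = mean(per_node_load)
    return max(per_node_load) / avg if avg else 0.0


history = [70100, 69800, 70300, 70050, 69900]  # hypothetical: key jobs done by 06:00 on past days
print(deviates_from_history(52000, history), deviates_from_history(70000, history))
node_load = [40, 42, 39, 95]  # hypothetical CPU % per node
print(round(skew_ratio(node_load), 2))
```

In practice such signals would feed alerting and resource scheduling rather than act alone; the thresholds would need tuning against real workload history.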
Finally, high-availability assurance. As everyone knows, bank systems have very high availability requirements. On top of the availability provided by the distributed architecture and fault self-healing, and to cope with cluster-level failures, AZ-level service outages, and data loss or accidental deletion, we also provide cross-AZ deployment, continuous metadata backup, active-active deployment, and other schemes, further improving the high-availability service capability of Longyun MPP DB, as shown in Figure (16).
Figure (16)
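A simple way to reason about cross-AZ deployment is to spread nodes evenly across availability zones and check how much capacity survives the loss of one whole AZ. The round-robin placement below is an assumption for illustration; the article does not say how Longyun MPP DB places nodes.

```python
from collections import Counter


def spread_across_azs(nodes, azs):
    """Assign nodes to availability zones round-robin so no AZ holds a disproportionate share."""
    return {node: azs[i % len(azs)] for i, node in enumerate(nodes)}


def capacity_after_az_loss(placement, failed_az):
    """Fraction of nodes still available if an entire AZ goes down."""
    survivors = sum(1 for az in placement.values() if az != failed_az)
    return survivors / len(placement)


nodes = [f"node-{i}" for i in range(9)]
placement = spread_across_azs(nodes, ["az-a", "az-b", "az-c"])
print(Counter(placement.values()))
print(capacity_after_az_loss(placement, "az-b"))
```

With three AZs, losing any one leaves about two thirds of the nodes, so the surviving capacity should be planned to carry critical workloads on its own.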
Over the past few years, we have completed countless version iterations and online optimizations. A database product matures only through the long-term collaboration and investment of many people across product, architecture, R&D, O&M, and applications. With Longyun MPP DB, we:
- Brought together a large number of outstanding R&D engineers from CCB Fintech (Jianxin Jinke) and across the industry;
- Provided the most complex, the richest, and the most heavily loaded application scenarios;
- Drew on CCB's more than 20 years of experience building and operating data warehouses, which lets us find product pain points fastest and propose the product designs that best meet users' needs.