Star Ring Technology launches an independent and controllable cloud-native lakehouse platform to solve the persistent problems of the lake + warehouse hybrid architecture

2022-07-05 17:58:00 Star Ring Technology

In recent years, as enterprise digital transformation has advanced, the analysis and use of data has grown in both breadth and depth. In breadth, the types of data analyzed and the analysis scenarios have become more diverse. In depth, there is more emphasis on fused analysis of multi-source heterogeneous data and on deep mining of data value using data science techniques.

Meanwhile, to meet these diverse analysis needs, enterprise data platform architecture has also been evolving. A data lake or a data warehouse alone can no longer keep pace with the trends in data analysis, and more and more enterprises are building their data platforms on a hybrid architecture of “lake” (the Hadoop technology stack) + “warehouse” (the MPP technology stack). This hybrid architecture combines the respective technical strengths of the lake and the warehouse and can support diverse enterprise analysis scenarios to a certain extent, but it falls short in ease of use, maintainability, data processing efficiency, and storage cost.

Xuliuming, head of system architecture in Star Ring Technology's Government and Public Utilities Department, said that the “lake” (Hadoop technology stack) + “warehouse” (MPP technology stack) hybrid architecture is a product of compromise between technology and business in the evolution of data platform architecture. The Hadoop stack was originally designed mainly for offline batch processing of massive data, and it has inherent weaknesses in high-concurrency data marts, ad hoc queries, and transaction consistency. The MPP stack evolved from relational databases and supports transaction consistency and OLAP analysis performance well, but its analysis scenarios are limited: it focuses on structured data analysis and cannot support semi-structured or unstructured data storage, real-time computing, machine learning, and similar workloads. A few years ago, no mature technology stack in the industry could cover all of the “lake” + “warehouse” scenarios, which is why the “Hadoop+MPP” hybrid architecture emerged.

However, with the rise of multi-model database technology, the technical barriers between the “lake” and the “warehouse” are expected to be broken, and the concept of the lakehouse (lake-warehouse integration) has emerged. A lakehouse is a new, open data platform architecture that merges the data lake and the data warehouse and combines the strengths of both: it is built on the low-cost storage architecture of the data lake while inheriting the data processing, analysis, and management capabilities of the data warehouse.

From a technical point of view, the lakehouse architecture is based on multi-model data platform technology. It breaks with the traditional Hadoop+MPP mixed deployment model and unifies the lake and warehouse technical architecture. In the future, the lakehouse, as a new generation of big data architecture, will gradually replace standalone data lake and data warehouse architectures.

Driven by demand, the lakehouse era is arriving

Every technology iteration is driven by requirements, and data platform construction is no exception. In recent years, data analysis requirements have evolved in four ways:

First, data types have diversified: from structured data alone to the coexistence of structured, unstructured, semi-structured, and real-time message data.

Second, analysis scenarios have diversified: from statistical analysis alone to the coexistence of statistical analysis, label analysis, full-text search, predictive analysis, and even inference and analysis over graph data.

Third, analysis has become more real-time: from mainly offline analysis to real-time analysis, interactive analysis, self-service analysis, and so on.

Fourth, data management and control have become unified: from the original weak control model to strong control, reflected in unified data standards, unified data storage, unified data governance, and a unified data view.

As requirements have evolved, enterprise data platform architecture has iterated as well, passing through four main stages:

Database stage. In the 1980s, data analysis relied mainly on business databases and consisted of simple analysis within a single system.

Data warehouse stage. In the 1990s, the data warehouse concept took off. Enterprises began building their own data warehouse platforms, extracting business system data into the warehouse and running multidimensional, correlated, fused BI analysis to support decision making.

Data lake stage. Around 2010, with the rise of big data technology, the data lake concept followed. The data lake supports not only the processing of structured data but also the storage and querying of semi-structured and unstructured data. Data application scenarios also became more diverse, with new analysis scenarios such as real-time analysis, full-text search, and machine learning.

At this stage, the focus was on using different technology stacks to support different analysis scenarios, and little attention was paid to the ease of use and maintainability of the platform architecture. Many enterprises built very complex data platform architectures, which caused considerable trouble for later platform iteration and for operations and maintenance.

Lakehouse stage. In the past two years, the lakehouse concept has risen. Enterprises have begun to pay more attention to data platform architecture and to emphasize a unified architecture, relying on a one-stop multi-model data platform to cover the diverse analysis scenarios of both the data lake and the data warehouse.

Technology compromising with business: the “lake + warehouse” hybrid architecture faces multiple challenges

Before the lakehouse concept appeared, the lake + warehouse hybrid architecture had already existed in the industry for many years, and some enterprises also call this lake + warehouse architecture a lakehouse. In fact, a lakehouse is not simply a data lake plus a data warehouse.

The lake + warehouse hybrid architecture has several typical features:

The data lake and the data warehouse are two relatively independent systems deployed together on one data platform. The data lake is implemented on Hadoop technology and is mainly used to store multi-source heterogeneous data and to run batch, stream processing, and other workloads. The data warehouse is mainly based on MPP or a relational database and mainly serves BI analysis and query needs over structured data in OLAP scenarios. The lake and the warehouse are independent of each other and exchange data through ETL.

This architecture can meet enterprises' multi-scenario analysis needs to a certain extent, but it has some obvious disadvantages.

First, the hybrid deployment architecture is complex, which makes architecture design, project implementation, and delivery expensive and makes the platform hard to operate and maintain later.

Second, data redundancy is pronounced, which increases storage cost. Hadoop and MPP are both distributed systems, and distributed systems generally ensure high data reliability through redundant copies. Each technology already keeps redundant copies of its own data, and under the hybrid architecture some data inevitably exists on both the Hadoop platform and the MPP platform, which raises the redundancy ratio further and increases storage cost.

Third, the data processing chain is too long, which hurts query timeliness. Data usually has to enter the lake first, be batch processed, and then be loaded into the warehouse, where subject-oriented modeling and analysis are carried out before query services are finally provided to upper-layer applications. The whole chain is long, and the extra ETL step from lake to warehouse in the middle further affects query timeliness.

Fourth, data consistency is hard to guarantee, which adds data verification cost. Whether data moves from the lake to the warehouse or from the warehouse to the lake, under the hybrid architecture it is migrating between two data platforms. Consistency problems will inevitably arise during migration, adding extra verification cost.
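To make the verification cost in the fourth point concrete, here is a minimal sketch of the kind of check a team might run after each lake-to-warehouse load. Everything in it is an illustrative assumption rather than part of any product described in this article: the function names, the sample rows, and the choice of a row count plus an order-independent checksum as the comparison.

```python
import hashlib

MODULUS = 2 ** 256


def table_fingerprint(rows):
    """Return (row_count, checksum) for an iterable of row tuples.

    Each row is hashed with SHA-256 and the hashes are summed modulo 2**256,
    so the result does not depend on the order in which rows are read.
    """
    count = 0
    checksum = 0
    for row in rows:
        digest = hashlib.sha256(repr(tuple(row)).encode("utf-8")).digest()
        checksum = (checksum + int.from_bytes(digest, "big")) % MODULUS
        count += 1
    return count, checksum


def copies_match(lake_rows, warehouse_rows):
    """True when both copies have the same row count and checksum."""
    return table_fingerprint(lake_rows) == table_fingerprint(warehouse_rows)


# Hypothetical sample data: the same table as stored on each platform.
lake = [(1, "alice", 10.5), (2, "bob", 7.0)]
warehouse_ok = [(2, "bob", 7.0), (1, "alice", 10.5)]   # same rows, different order
warehouse_bad = [(1, "alice", 10.5)]                    # one row lost during ETL

print(copies_match(lake, warehouse_ok))
print(copies_match(lake, warehouse_bad))
```

A check like this has to scan both copies on every load, which is the extra cost the hybrid architecture incurs and a single shared copy would not.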

The lake + warehouse hybrid architecture is a product of compromise between technology and business; it is not a true lakehouse platform. A few years ago, no mature technology stack in the industry could cover all of the “lake” + “warehouse” scenarios, which is why the “Hadoop+MPP” hybrid architecture emerged.
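To put a rough number on the redundancy drawback listed above, the following back-of-the-envelope sketch estimates how many physical copies are kept per logical byte. The figures are assumptions for illustration only: 3 replicas on the Hadoop side (the HDFS default replication factor), 2 copies on the MPP side (a primary plus a mirror), and 40% of the lake data also loaded into the warehouse. The function name is hypothetical.

```python
def storage_amplification(lake_replicas, warehouse_copies, shared_fraction):
    """Physical bytes stored per logical byte.

    All data is assumed to land in the lake with `lake_replicas` copies, and
    `shared_fraction` of it is assumed to be loaded into the warehouse as
    well, where it is kept in `warehouse_copies` copies.
    """
    if not 0.0 <= shared_fraction <= 1.0:
        raise ValueError("shared_fraction must be between 0 and 1")
    return lake_replicas + shared_fraction * warehouse_copies


# Assumed figures, not measurements from any real deployment.
hybrid = storage_amplification(lake_replicas=3, warehouse_copies=2, shared_fraction=0.4)
single = storage_amplification(lake_replicas=3, warehouse_copies=2, shared_fraction=0.0)

print(f"hybrid: {hybrid:.1f}x, single platform: {single:.1f}x")
```

Under these assumptions the hybrid layout stores roughly 3.8 physical bytes per logical byte, against 3.0 when no data is duplicated across platforms. The gap widens as more of the lake is mirrored into the warehouse.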

Breaking the technical barrier between “lake” and “warehouse”, the lakehouse is the direction of future evolution. It has several characteristics:

Multi-model storage: the lakehouse platform provides unified data storage and management. It supports unified storage of structured, semi-structured, and unstructured data and supports multiple data storage models at the same time;

Unified architecture: the lakehouse has a four-layer unified architecture. At the resource management layer, a unified resource scheduling framework supports elastic scaling of compute and storage units. At the storage layer, unified data storage provides unified management of multi-source heterogeneous data. At the computing layer, a unified computing engine enables fused analysis of cross-model data. At the interface layer, a unified data interface gives upper-layer applications a unified, easy-to-use query interface. With a unified architecture, the lakehouse avoids the development difficulty, operations and maintenance difficulty, high storage cost, and low data processing efficiency caused by a hybrid architecture.

High performance: the lakehouse platform delivers better performance. Under the unified architecture, the data lake and data warehouse share an integrated design, which shortens the data processing chain, increases resource reuse, and improves timeliness.

Comprehensive enablement: the lakehouse platform meets the analysis needs of both the “lake” and the “warehouse” at the same time, supports diverse business scenarios, and can serve all kinds of enterprise business systems and analysis scenarios.

The lakehouse architecture is based on multi-model data platform technology. It breaks with the Hadoop+MPP mixed deployment model and unifies the lake and warehouse technical architecture, which makes it a true lakehouse platform.

Independent and controllable: Star Ring Technology's lakehouse solution

The overall architecture of the lakehouse solution launched by Star Ring Technology is divided into five layers:

The first layer is the infrastructure layer. It is compatible with the independent and controllable Xinchuang (domestic IT innovation) hardware ecosystem and with mainstream x86 and ARM servers, supports mainstream operating systems such as CentOS, Red Hat, Kylin V10, and UnionTech UOS, and supports mixed deployment across ARM and x86 architectures.

The second layer is the unified resource management layer. Star Ring Technology has launched TCOS, a containerized operating system based on cloud-native technology, which provides a unified resource scheduling framework. Through container orchestration, it schedules compute, storage, network, and other basic resources in a unified way.

The third layer is the unified storage management layer. Star Ring Technology has developed TDDMS, a unified distributed data management system that provides common storage management services for different storage engines, ensures data consistency, and unifies data operations and maintenance. The distributed data management system currently connects to 9 storage engines and supports storage for 10 data models. Users do not need to build a separate storage system for each model; unified storage management reduces operations and maintenance cost and avoids data silos. The plug-in design of the distributed data management system also makes it easy to extend for later business needs, so other storage engines can be connected as required.

The fourth layer is the unified computing engine layer. The distributed computing engine Transwarp Nucleon automatically matches high-performance algorithms to different storage engines. It supports different types of computing tasks such as batch and stream processing, and it also supports fused analysis of cross-model data, so users can work with data of different models in a single SQL statement, which lowers development difficulty and improves development efficiency.

The fifth layer is the unified data operation layer. It provides standard SQL syntax support, giving a single interface for different businesses and different data models. Simple SQL statements are enough to complete compound cross-model queries, and different data models can be operated on without connecting to different interfaces. Switching scenarios or databases no longer forces a switch of interface or development language, which greatly reduces development and migration costs.
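As a toy illustration of what "one SQL statement across several data models" can look like, the sketch below uses Python's built-in sqlite3 module as a stand-in. It is not Star Ring Technology's engine or interface. The three models are simply emulated as tables, and all table names, keys, and rows are invented for the example.

```python
import sqlite3

SCHEMA_AND_DATA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);  -- relational model
CREATE TABLE kv_tags (key TEXT PRIMARY KEY, value TEXT);     -- key-value model
CREATE TABLE edges (src INTEGER, dst INTEGER);               -- graph model: referrals
INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bo'), (3, 'Cy');
INSERT INTO kv_tags VALUES ('customer:1', 'vip'), ('customer:2', 'new');
INSERT INTO edges VALUES (1, 2), (1, 3), (2, 3);
"""

# A single statement that reads the relational, key-value, and graph data.
QUERY = """
SELECT c.name, t.value AS tag, COUNT(e.dst) AS referrals
FROM customers AS c
LEFT JOIN kv_tags AS t ON t.key = 'customer:' || c.id
LEFT JOIN edges AS e ON e.src = c.id
GROUP BY c.id, c.name, t.value
ORDER BY c.id
"""


def run_demo():
    """Build the in-memory demo database and return the query result rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA_AND_DATA)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for row in run_demo():
    print(row)
```

The point of the sketch is the calling pattern: the application issues one query through one interface and never has to switch client libraries or query languages per data model.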

Across these five layers, Star Ring Technology's lakehouse platform also provides full-life-cycle data management and control. It offers unified control of multi-model data and metadata and supports unified multi-tenant management, ensuring that tenants on the platform are fully isolated at the resource layer, the data layer, and the application layer.

Led by eight advantages, Star Ring Technology's lakehouse platform empowers users

What are the characteristics of Star Ring Technology's lakehouse platform?

Cloud native. Built on a cloud-native architecture with a containerized foundation, the platform automatically and elastically scales capacity up or down according to business load, improving overall resource utilization. Its components use a microservice architecture and are divided by functional module, which gives more flexibility in horizontal scaling and version updates.

Multi-model heterogeneous storage. Star Ring Technology provides a multi-model data management platform. To improve query efficiency, the same data can be stored using multiple data models, solving efficiency problems in different scenarios.

One lake, N warehouses multi-tenant system. For group enterprises, the platform can provide a multi-tenant system of one lake plus N warehouses. A central tenant is set up at group headquarters, and a group-level data lake is built within it. A unified data asset catalog is organized to form a data asset view, and a group-level data warehouse is built to serve group-level business analysis needs. Business units, subsidiaries, or data innovation teams can set up their own tenants on demand. Each tenant has an independent resource environment and an independent data development platform and tool set, and by sharing data from the unified data lake it can build data warehouses and data marts for its own business and subjects to meet its particular analysis needs.

Independent and controllable. This is reflected in two aspects. Internally, Star Ring Technology has persisted in technological innovation to achieve full independence and control. Externally, it is actively working on compatibility and adaptation with upstream and downstream Xinchuang vendors, embracing the whole Xinchuang ecosystem.
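The one lake, N warehouses layout described above can be pictured with a small data-structure sketch. This is only a mental model under stated assumptions: the class names, the `cpu_quota` field, and the sample orders are hypothetical and do not reflect the platform's actual tenant API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class DataLake:
    """The single group-level lake: a catalog of shared datasets."""
    catalog: Dict[str, List[Row]] = field(default_factory=dict)


@dataclass
class Tenant:
    """A tenant with its own resource quota and its own warehouse."""
    name: str
    cpu_quota: int                      # stand-in for an isolated resource environment
    lake: DataLake                      # every tenant reads the same shared lake
    warehouse: Dict[str, List[Row]] = field(default_factory=dict)

    def build_mart(self, mart_name: str, dataset: str,
                   keep: Callable[[Row], bool]) -> List[Row]:
        """Copy the selected lake rows into this tenant's own warehouse."""
        rows = [dict(r) for r in self.lake.catalog[dataset] if keep(r)]
        self.warehouse[mart_name] = rows
        return rows


# Hypothetical group-level lake and two tenants.
lake = DataLake(catalog={"orders": [
    {"unit": "retail", "amount": 120},
    {"unit": "logistics", "amount": 80},
    {"unit": "retail", "amount": 45},
]})
hq = Tenant(name="group-hq", cpu_quota=64, lake=lake)
retail = Tenant(name="retail-unit", cpu_quota=16, lake=lake)

hq.build_mart("all_orders", "orders", lambda r: True)
retail.build_mart("retail_orders", "orders", lambda r: r["unit"] == "retail")

print(len(hq.warehouse["all_orders"]), len(retail.warehouse["retail_orders"]))
```

Both tenants point at the same `DataLake` object, while each `warehouse` dictionary belongs to one tenant only, mirroring the shared-lake, isolated-warehouse split.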

Overall, the Star Ring Technology lakehouse platform offers 8 major advantages: multi-model storage, technological innovation, batch-stream collaboration, unified SQL, elastic scaling, Xinchuang independence, full-stack tools, and cost reduction with efficiency gains.

At present, Star Ring Technology's lakehouse solution has been applied in finance, government, transportation, postal services, healthcare, energy, and other industries, as well as in some large state-owned enterprises. Typical customers include Sinochem Group, China Post Group, and Guangzhou Rural Commercial Bank.

Copyright notice
This article was written by [Star Ring Technology]. Please include the original link when reposting. Thank you.
https://yzsam.com/2022/186/202207051747223684.html