Data Governance: The Evolution of Data Integration and Application Patterns

2022-08-02 20:33:00 software testnet

In enterprise data integration, many mature frameworks are already available, and different usage scenarios call for different application patterns. The main data integration patterns today are the federated database pattern, the middleware-based pattern, the master data integration pattern, the data warehouse pattern, and the data lake pattern. Each has a different technical focus, but all of them address data sharing and exchange, with the goal of data-driven business and data-driven management.

01 Federated database pattern

The federated database pattern gives data consumers (applications) an integrated view of the data: logically the data appears to live in one place, while physically it may be spread across multiple data sources. A federated database is made up of semi-autonomous database systems that share data with one another, with each member data source exposing access interfaces to the others. The member systems can be centralized database systems, distributed database systems, or other federated systems.

The figure below shows IBM's federated database architecture. An application can talk to the federated server through any supported interface (including ODBC, JDBC, or a Web services client). Users do not need to know where the data is stored, which SQL dialect a data source supports, or whether the source is Oracle 9i or IBM DB2. In short, to the user a federated database behaves like a single system.

[Figure: IBM federated database architecture]

The advantage of the federated pattern is unified access to different data sources through a view, which is very convenient for users and keeps the data real-time. For production applications with strict timeliness requirements, federation lets applications access data directly, without time-consuming changes to the data architecture. It is also a good fit for enterprises with strict security requirements that do not allow data to be copied or backed up.

The drawbacks are just as clear. Because all access goes through a federated view that is computed in real time, data conversion becomes critical, and the view itself cannot solve data quality or performance problems. Performance becomes a concern for every integration approach as enterprise data volumes grow, and although data federation has improved considerably here, its design still keeps it from matching other integration techniques. Data quality control means loading data rules and running data checks, which a federated view does not prioritize either. As a result, the federated pattern is a poor fit for scenarios that demand high data quality or heavy data conversion and processing, such as data governance and data warehousing.
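The idea of a federated view can be sketched in a few lines. This is a minimal illustration, not IBM's architecture: it assumes two hypothetical sources (`crm` and `erp`) that are stood up as separate SQLite databases attached to one connection, and a temporary view named `customer_totals` that joins them. All table, column, and function names here are invented for the example.

```python
import sqlite3


def build_federation() -> sqlite3.Connection:
    """Attach two independent (hypothetical) sources and expose one view."""
    conn = sqlite3.connect(":memory:")
    # Each ATTACH creates a separate database; here they stand in for
    # two physically separate source systems.
    conn.execute("ATTACH DATABASE ':memory:' AS crm")
    conn.execute("ATTACH DATABASE ':memory:' AS erp")
    conn.execute(
        "CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT)"
    )
    conn.execute(
        "CREATE TABLE erp.orders "
        "(id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO crm.customers VALUES (?, ?)",
        [(1, "Acme"), (2, "Globex")],
    )
    conn.executemany(
        "INSERT INTO erp.orders VALUES (?, ?, ?)",
        [(10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0)],
    )
    # A TEMP view may reference tables in any attached database, so it
    # can play the role of the federated view. No data is copied.
    conn.execute(
        """
        CREATE TEMP VIEW customer_totals AS
        SELECT c.name AS name, SUM(o.amount) AS total
        FROM crm.customers AS c
        JOIN erp.orders AS o ON o.customer_id = c.id
        GROUP BY c.id, c.name
        """
    )
    return conn


def customer_totals(conn: sqlite3.Connection) -> list:
    """Query the view; the caller never names the underlying sources."""
    return conn.execute(
        "SELECT name, total FROM customer_totals ORDER BY name"
    ).fetchall()
```

Because the view is evaluated at query time, a row added to `erp.orders` shows up in the next call to `customer_totals` without any reload step. The same property explains the performance concern: every query pays the cost of the cross-source join.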

02 Middleware-based data integration pattern

The middleware-based pattern integrates data by replication. Data integration middleware is software that logically or physically integrates data sources of different origins, formats, and characteristics, and provides reliable conversion, loading, and unified access services for distributed, autonomous, heterogeneous data sources. It is currently one of the more popular integration approaches: the middleware uses a unified global data model to access heterogeneous databases, legacy systems, Web resources, and so on.

[Image]

The main job of data integration middleware is to convert and wrap data according to its source, format, and characteristics, provide a unified high-level access service, and enable sharing across heterogeneous data sources. Early data integration middleware consisted mainly of a central integration processor (the middleware itself) and adapters. The middleware sits between the heterogeneous databases (data layer) and the applications (application layer): downward it coordinates the source systems, and upward it gives applications a unified data model and a common data access interface over the integrated data. The applications on each data source continue to do their own work, while the middleware provides high-level data access services over the heterogeneous sources. As the technology evolved, ETL tools became the mainstream form of data integration middleware. ETL covers the extraction, transformation, cleansing, and loading of data, and is described in more detail below.
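The processor-plus-adapters structure described above can be sketched as follows. This is an illustrative outline only: the `Customer` global model, the two adapter classes, the source field names (`cust_no`, `FULLNAME`, and so on), and the `Middleware` class are all hypothetical and do not come from any particular product.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterable, Protocol


@dataclass(frozen=True)
class Customer:
    """Hypothetical global data model shared by all applications."""
    customer_id: str
    name: str


class Adapter(Protocol):
    """Each adapter wraps one source and emits the global model."""
    def fetch(self) -> Iterable[Customer]: ...


class CsvAdapter:
    """Wraps a CSV export with columns cust_no and cust_name (assumed)."""
    def __init__(self, text: str) -> None:
        self._text = text

    def fetch(self) -> Iterable[Customer]:
        for row in csv.DictReader(io.StringIO(self._text)):
            yield Customer(row["cust_no"].strip(), row["cust_name"].strip())


class LegacyAdapter:
    """Wraps a legacy system that returns dicts with ID and FULLNAME."""
    def __init__(self, records: list) -> None:
        self._records = records

    def fetch(self) -> Iterable[Customer]:
        for rec in self._records:
            # Convert the legacy format into the global model.
            yield Customer(str(rec["ID"]), rec["FULLNAME"].title())


class Middleware:
    """Central processor: coordinates adapters below, serves apps above."""
    def __init__(self, adapters: Iterable[Adapter]) -> None:
        self._adapters = list(adapters)

    def all_customers(self) -> list:
        return [c for adapter in self._adapters for c in adapter.fetch()]
```

Applications call `Middleware.all_customers()` and only ever see `Customer` objects. Adding a new source means writing one more adapter, with no change to the applications.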

03 Master data integration pattern

Master data is the data shared across different business application systems, such as customers, suppliers, products, and employees. It is also the enterprise's core data. The master data integration pattern is essentially a data exchange pattern, aimed at keeping core data consistent, correct, complete, and timely across heterogeneous systems.

[Image]

Master data integration produces a single view of the data: multiple data sources are integrated into one master data view whose accuracy, consistency, and completeness are guaranteed, which improves data quality. A unified definition of business entities simplifies business processes and speeds up the business's response. Master data integration draws on federated databases, data interface integration, ESB-based middleware integration, and other techniques.
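One way to picture the single view is as a merge of per-source records into one record per entity. The sketch below assumes a simple survivorship rule that the article does not specify: records are applied oldest first, and a newer non-empty value overwrites an older one. The function name `build_golden_records` and the field names are hypothetical.

```python
def build_golden_records(records: list, key: str = "customer_id") -> dict:
    """Merge per-source records into one master record per key.

    Assumed rule: apply records in order of `updated_at` (ISO date
    strings), letting later non-empty values win; empty values never
    overwrite existing data.
    """
    golden: dict = {}
    for rec in sorted(records, key=lambda r: r["updated_at"]):
        merged = golden.setdefault(rec[key], {})
        for field, value in rec.items():
            if value not in (None, ""):
                merged[field] = value
    return golden


# Hypothetical records for the same customer from two systems.
sample = [
    {"customer_id": "C1", "name": "Acme", "email": "",
     "updated_at": "2022-03-01"},
    {"customer_id": "C1", "name": "Acme Corp", "email": "ops@acme.example",
     "updated_at": "2022-01-15"},
]
master = build_golden_records(sample)
```

Here the master record takes the newer name from the first record but keeps the email from the older one, since the newer record left that field empty. Real master data tools support much richer matching and survivorship rules; this only shows the shape of the operation.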

04 Data warehouse pattern

A data warehouse is a collection of data that supports a decision support system. The data is subject-oriented, integrated, and time-variant. A data warehouse holds data integrated from across the enterprise; this data is non-modifiable and stable, stored in read-only form, and does not change over time.

[Image]

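Subject-oriented: data in the warehouse is organized around specific subjects. A subject here is an abstract concept rather than something concrete; it usually refers to an aspect that the enterprise or an individual focuses on when using the warehouse. At a higher level, it is an abstraction under which data from the enterprise information systems is consolidated, classified, and analyzed. Examples include a finance subject, a human resources subject, and a production subject.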

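Integrated: the data in the warehouse is not simply or randomly pulled from the business systems. Because the sources feeding the warehouse are independent of one another, inconsistent values in the source data must be eliminated. The warehouse data is obtained by extracting, cleaning, transforming, and summarizing data from scattered, independent, heterogeneous databases, which keeps it consistent across the whole enterprise.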

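Stable: data in the business systems is constantly changing, so it always reflects the latest state. By contrast, the data warehouse is stable: once data enters the warehouse it is mostly queried and rarely modified, and the usual operations are periodic loads and refreshes. Warehouse data is obtained by extracting, cleaning, processing, summarizing, and organizing data from the original scattered databases, and inconsistencies in the source data must be removed so that the warehouse holds consistent, enterprise-wide information.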

The data warehouse is the foundation for decision support systems and data (knowledge) mining systems, so the credibility and completeness of its data directly affect the systems that depend on it. Integration usually relies on data integration tools (ETL) that periodically extract data from the sources into the warehouse, either in full or incrementally. The ETL process is the data integration process: data moves from heterogeneous sources into a unified warehouse, and the extraction, cleansing, transformation, and loading steps run as a serial or parallel pipeline.
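The full versus incremental extraction described above can be sketched as a small serial pipeline. The sketch makes several assumptions that are not in the article: source rows are plain dicts with an ISO-formatted `updated_at` field, incremental runs use that field as a watermark, and the warehouse is an SQLite table named `fact_orders`. All names are hypothetical.

```python
import sqlite3


def extract(source_rows: list, watermark=None) -> list:
    """Full extract when no watermark is given, otherwise incremental."""
    if watermark is None:
        return list(source_rows)
    return [r for r in source_rows if r["updated_at"] > watermark]


def clean(rows: list) -> list:
    """Drop rows that fail a simple (assumed) quality rule."""
    return [r for r in rows if r.get("amount") is not None]


def transform(rows: list) -> list:
    """Normalize values into the warehouse's column layout."""
    return [
        (r["order_id"], r["region"].strip().upper(),
         round(float(r["amount"]), 2), r["updated_at"])
        for r in rows
    ]


def load(conn: sqlite3.Connection, rows: list) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders ("
        "order_id INTEGER PRIMARY KEY, region TEXT, "
        "amount REAL, updated_at TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO fact_orders VALUES (?, ?, ?, ?)", rows
    )
    conn.commit()


def run_etl(conn: sqlite3.Connection, source_rows: list, watermark=None):
    """Run one serial ETL pass and return the new watermark."""
    rows = transform(clean(extract(source_rows, watermark)))
    load(conn, rows)
    return max((r[3] for r in rows), default=watermark)
```

A scheduler would call `run_etl` periodically, passing the watermark returned by the previous run so that only rows changed since then are extracted. Calling it with no watermark performs a full load.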

05 Data lake pattern

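The data lake is a new generation of data integration, management, and application pattern that grew out of the data warehouse concept. It first appeared to make up for the warehouse's shortcomings: long development cycles, high development costs, loss of detailed data, information silos that could not be resolved, and the inability to truly trace problems back to their source. With the development of big data technology, the data lake kept evolving and absorbed a range of technologies, including data warehousing, real-time and high-speed data streaming, machine learning, distributed storage, and others. It has gradually become a unified data management platform that can store and process all structured, semi-structured, and unstructured data and support big data processing, real-time analytics, and machine learning, giving enterprises a complete solution for becoming truly "data-driven".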

[Image]

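Unlike a data warehouse, which requires data to be sorted out, structured, and cleansed before it is loaded, a data lake takes the raw data from the source wholesale, with no questions asked. This opens up broad possibilities for later machine learning and data mining on the lake. The data lake also has a natural advantage in flexibility. In a traditional warehouse, the modeling paradigm means the business cannot change freely: any change ripples through the underlying data, so a traditional warehouse cannot keep up with business change.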

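With a data lake, even in an industry like the Internet, where new applications keep appearing and both the business and the data models keep changing, data can still enter the lake easily. Collection, cleansing, and normalization can be deferred until a business need arises. This is quite different from early warehouse thinking: for the enterprise, a data lake is more flexible and adapts faster to changes in the data applications above it. Because a data lake must store and process many kinds of data, it also uses a variety of integration techniques, such as ETL-based structured data integration, interface and service integration, file data integration, and real-time data integration.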
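Deferring structure until the data is needed is often called schema-on-read. The sketch below illustrates it under simplifying assumptions: the "lake" is an in-memory list of raw payload strings, payloads are expected to be JSON objects, and the `DataLake` class and its methods are hypothetical names invented for this example.

```python
import json
from typing import Iterator


class DataLake:
    """Toy lake: stores raw payloads untouched, applies schema on read."""

    def __init__(self) -> None:
        self._raw: list = []

    def ingest(self, source: str, payload: str) -> None:
        # No validation, cleansing, or modeling at ingest time.
        self._raw.append((source, payload))

    def read(self, source: str, fields: list) -> Iterator[dict]:
        """Project raw records onto the fields a consumer asks for."""
        for src, payload in self._raw:
            if src != source:
                continue
            try:
                doc = json.loads(payload)
            except json.JSONDecodeError:
                continue  # cleansing is deferred to read time
            if not isinstance(doc, dict):
                continue
            # Missing fields become None instead of blocking ingestion.
            yield {f: doc.get(f) for f in fields}


lake = DataLake()
lake.ingest("clicks", '{"user": "u1", "page": "/home"}')
# A new field appears later; ingestion needs no schema change.
lake.ingest("clicks", '{"user": "u2", "page": "/cart", "coupon": "X"}')
lake.ingest("clicks", "not json")
rows = list(lake.read("clicks", ["user", "coupon"]))
```

The second record carries a `coupon` field that did not exist when the first was written, yet both are stored as-is. The cost is that every reader must cope with missing fields and malformed records, which a warehouse would have rejected or fixed at load time.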

Closing remarks

Data integration is an important means of eliminating enterprise information silos, sharing data, and enabling data management and data applications. It connects data from different silos, whether on-premises or in the cloud, so that data is no longer isolated but interacts, and more value can be extracted from it. Applying data integration lets key elements such as the enterprise, processes, systems, organization, and staff work in synergy, which improves business efficiency. It also brings different types of data together so that business users can quickly get useful information for analysis, look at problems from a global, integrated perspective, and produce more accurate results.

Copyright notice
This article was written by [software testnet]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/214/202208021746215055.html