Basic architecture of a data lake
2022-06-09 02:13:00 [InfoQ]



- Why "lake" and not "river"? A river emphasizes flow ("all rivers run to the sea"), and every river eventually empties into the sea, whereas enterprise data needs to settle and accumulate over the long term, so "lake" is the more fitting term. Lake water is also naturally stratified to suit different ecosystems, which matches the storage-management requirements of a unified enterprise data platform: "hot" data sits on top, ready for use at any time, while warm and cold data reside on different storage media in the data center, balancing storage capacity against cost.
- Why not "sea"? The sea is boundless, whereas a lake has boundaries, and for a data lake those boundaries are the business boundaries of the enterprise or organization. This is why a data lake needs stronger data management and permission management capabilities.
- Another important reason for the name "lake" is that a data lake requires fine-grained management. A data lake that lacks control and governance eventually degenerates into a "data swamp": applications can no longer access the data effectively, and the data stored in it becomes worthless.
- Stores data as-is over the long term
- Provides effective management and centralized governance
- Provides multi-modal computing capabilities to meet diverse processing requirements
- Is business-oriented, providing a unified data view, data model, and data-processing results
- Data access
- Data migration
- Data governance
- Quality management
- Asset catalog
- Access control
- Task management
- Task orchestration
- Metadata management, etc.
- Stronger data access capabilities. Data access covers defining and managing a variety of external, heterogeneous data sources, and extracting and migrating data from those sources. What is extracted and migrated includes both the metadata of the external sources and the data they actually store.
- Stronger data management capabilities. Management capabilities fall into basic and extended ones. Basic management capabilities, which every data lake system needs, include managing the various kinds of metadata, data access control, and data asset management; the section "Data lake solutions from various vendors" discusses how different vendors support them. Extended management capabilities include task management, process orchestration, data quality, and data governance. Task management and process orchestration are used to manage, orchestrate, schedule, and monitor the various data-processing tasks in the data lake system. Typically, whoever builds the data lake buys or develops a customized data integration or data development subsystem/module to provide these capabilities; that custom system/module reads the data lake's metadata in order to integrate with the data lake system. Data quality and data governance are more complex issues. In general, a data lake system does not provide these functions directly, but it exposes various interfaces and metadata so that capable enterprises/organizations can integrate their existing data governance software or build custom solutions.
- Shareable metadata. The various compute engines in a data lake integrate deeply with the data in the lake, and this integration is built on the data lake's metadata. In a well-designed data lake system, a compute engine processing data can obtain the data's storage location, format, schema, distribution, and other information directly from the metadata and then process the data, without manual or programmatic intervention. Furthermore, a good data lake system can enforce access control on the data in the lake at granularities down to the database, table, column, and row level.
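The hot/warm/cold layering described in the first point can be illustrated with a simple age-based tiering rule. This is a minimal sketch assuming that tier placement depends only on the time since last access; the thresholds, tier names, and the function `choose_tier` are hypothetical and do not come from any particular product.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds: the article does not specify any; tune per workload.
HOT_WINDOW = timedelta(days=7)
WARM_WINDOW = timedelta(days=90)


def choose_tier(last_access: datetime, now: datetime) -> str:
    """Map a dataset's last-access time to a storage tier (hypothetical rule)."""
    age = now - last_access
    if age <= HOT_WINDOW:
        return "hot"   # e.g. fast media, ready for use at any time
    if age <= WARM_WINDOW:
        return "warm"  # e.g. standard storage
    return "cold"      # e.g. archive storage, lowest cost per GB


if __name__ == "__main__":
    now = datetime(2022, 6, 9)
    datasets = {
        "orders_today": now - timedelta(days=1),
        "orders_q1": now - timedelta(days=60),
        "orders_2019": now - timedelta(days=1000),
    }
    for name, last_access in datasets.items():
        print(name, choose_tier(last_access, now))
```

Real systems usually combine access frequency, data size, and business rules rather than age alone; the point here is only that placement is a policy applied over metadata, which is what lets capacity and cost be balanced.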
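The data access capability, extracting both the metadata and the stored data of an external source, can be sketched with Python's built-in `sqlite3` module standing in for an external database. This is an illustration under assumptions: the helpers `extract_table` and `land_as_is` are hypothetical, and the JSON Lines landing format is chosen only for readability.

```python
import json
import sqlite3


def extract_table(conn: sqlite3.Connection, table: str) -> dict:
    """Extract both the schema (metadata) and the rows of one source table."""
    # Table names cannot be bound as SQL parameters, so `table` must come from
    # a trusted source registry, never from user input.
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    columns = [
        {"name": col[1], "type": col[2]}
        for col in conn.execute(f"PRAGMA table_info({table})")
    ]
    names = [c["name"] for c in columns]
    # SELECT * returns columns in declaration order, matching table_info order.
    rows = [dict(zip(names, r)) for r in conn.execute(f"SELECT * FROM {table}")]
    return {"source_table": table, "schema": columns, "rows": rows}


def land_as_is(extracted: dict) -> str:
    """Serialize the schema header plus unmodified rows as JSON Lines."""
    header = json.dumps({"schema": extracted["schema"]})
    body = [json.dumps(r) for r in extracted["rows"]]
    return "\n".join([header, *body])


if __name__ == "__main__":
    src = sqlite3.connect(":memory:")  # stands in for an external data source
    src.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    src.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    print(land_as_is(extract_table(src, "users")))
```

Keeping the source schema next to the rows is what later allows the lake's metadata layer to describe the landed data without re-inspecting the source.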
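Task management and process orchestration, as described under data management capabilities, amount to running data-processing tasks in dependency order and tracking their status. The sketch below uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+); the pipeline, task names, and `run_pipeline` helper are hypothetical, and a real scheduler would add retries, timing, and alerting.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE = {
    "ingest_orders": set(),
    "ingest_users": set(),
    "clean_orders": {"ingest_orders"},
    "join_orders_users": {"clean_orders", "ingest_users"},
    "quality_check": {"join_orders_users"},
}


def run_pipeline(dag, run_task):
    """Run tasks in dependency order and record a status for each task.

    A task whose upstream task failed (or was skipped) is marked "skipped"
    instead of being run; independent branches still execute.
    """
    status = {}
    for task in TopologicalSorter(dag).static_order():
        if any(status[dep] != "ok" for dep in dag.get(task, ())):
            status[task] = "skipped"
            continue
        status[task] = "ok" if run_task(task) else "failed"
    return status


if __name__ == "__main__":
    # Simulate a failure in one task to show downstream tasks being skipped.
    print(run_pipeline(PIPELINE, run_task=lambda t: t != "clean_orders"))
```

The returned status map is the kind of information a monitoring view would surface; in practice, the task definitions themselves would read the data lake's metadata to locate their inputs.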
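The shareable-metadata point can be made concrete with a toy catalog: an engine looks up location, format, schema, and partitioning from the catalog instead of hard-coding them, and reads are checked at the database/table, column, and row level. This is a simplified sketch; `Catalog`, `TableMeta`, `Grant`, and the `s3://example-lake/...` path are hypothetical, and real systems push these filters down into the scan rather than filtering rows after loading them.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set


@dataclass
class TableMeta:
    location: str                    # storage location of the table's files
    fmt: str                         # data format, e.g. "parquet"
    schema: Dict[str, str]           # column name -> type
    partition_by: List[str] = field(default_factory=list)  # data distribution


@dataclass
class Grant:
    columns: Set[str]                                      # column-level permission
    row_filter: Optional[Callable[[dict], bool]] = None    # row-level permission


class Catalog:
    def __init__(self) -> None:
        self.tables: Dict[str, TableMeta] = {}   # key: "database.table"
        self.grants: Dict[tuple, Grant] = {}     # key: (user, "database.table")

    def register(self, name: str, meta: TableMeta) -> None:
        self.tables[name] = meta

    def grant(self, user: str, name: str, g: Grant) -> None:
        self.grants[(user, name)] = g

    def read(self, user: str, name: str, rows: List[dict]) -> List[dict]:
        """Apply table-, column- and row-level checks to rows an engine loaded."""
        g = self.grants.get((user, name))
        if name not in self.tables or g is None:
            raise PermissionError(f"{user} has no access to {name}")
        return [
            {c: v for c, v in r.items() if c in g.columns}
            for r in rows
            if g.row_filter is None or g.row_filter(r)
        ]


if __name__ == "__main__":
    cat = Catalog()
    cat.register("sales.orders", TableMeta(
        location="s3://example-lake/sales/orders/",  # hypothetical path
        fmt="parquet",
        schema={"id": "bigint", "region": "string", "amount": "double"},
        partition_by=["region"],
    ))
    cat.grant("analyst", "sales.orders",
              Grant(columns={"id", "amount"}, row_filter=lambda r: r["region"] == "EU"))
    meta = cat.tables["sales.orders"]   # what an engine would consult before reading
    print(meta.location, meta.fmt, meta.partition_by)
    rows = [{"id": 1, "region": "EU", "amount": 9.5},
            {"id": 2, "region": "US", "amount": 3.0}]
    print(cat.read("analyst", "sales.orders", rows))
```

Because every engine consults the same catalog, the same permissions apply no matter which engine runs the query, which is the practical benefit of making metadata shareable.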

