Basic architecture of a data lake
2022-07-27 12:33:00 【InfoQ】


- A "river" emphasizes flow: "all rivers run to the sea," and a river's water eventually drains away. Enterprise data, by contrast, needs to accumulate over the long term, so "lake" is a more fitting name than "river." Lake water is also naturally stratified, with each layer supporting a different ecosystem. This matches what a unified enterprise data center needs from its storage. "Hot" data sits on top, ready for use at any time. Warm and cold data live on different storage media, which balances storage capacity against cost.
- It is not called a "sea" because the sea is boundless, whereas a lake has boundaries: the business boundaries of the enterprise or organization. A data lake therefore needs stronger data management and permission management capabilities.
- Another important reason for the name "lake" is that a data lake needs careful management. A lake without control and governance eventually degenerates into a "data swamp." Applications can then no longer access the data effectively, and the data stored in it loses its value.
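The hot/warm/cold layering described above can be sketched as a simple tiering rule. This is a minimal illustration and is not taken from any particular product. It assumes that tier placement depends only on how recently an object was read. The thresholds (`HOT_WINDOW`, `WARM_WINDOW`), the `DataObject` type and the file paths are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable, List

# Hypothetical thresholds; a real policy would be tuned per workload and storage cost.
HOT_WINDOW = timedelta(days=7)
WARM_WINDOW = timedelta(days=90)


@dataclass
class DataObject:
    path: str
    last_accessed: datetime


def assign_tier(obj: DataObject, now: datetime) -> str:
    """Return "hot", "warm" or "cold" based on how recently the object was read."""
    age = now - obj.last_accessed
    if age <= HOT_WINDOW:
        return "hot"   # fast, expensive media (e.g. SSD), ready for use at any time
    if age <= WARM_WINDOW:
        return "warm"  # standard storage
    return "cold"      # cheapest media (e.g. archive storage)


def plan_tiers(objects: Iterable[DataObject], now: datetime) -> Dict[str, List[str]]:
    """Group object paths by the tier they should be stored on."""
    plan: Dict[str, List[str]] = {"hot": [], "warm": [], "cold": []}
    for obj in objects:
        plan[assign_tier(obj, now)].append(obj.path)
    return plan


if __name__ == "__main__":
    now = datetime(2022, 7, 27)
    objects = [
        DataObject("orders/2022-07-26.parquet", now - timedelta(days=1)),
        DataObject("orders/2022-05-01.parquet", now - timedelta(days=60)),
        DataObject("orders/2020-01-01.parquet", now - timedelta(days=900)),
    ]
    print(plan_tiers(objects, now))
    # {'hot': ['orders/2022-07-26.parquet'], 'warm': ['orders/2022-05-01.parquet'], 'cold': ['orders/2020-01-01.parquet']}
```

Real systems usually also weigh access frequency, object size and per-medium cost. Using recency alone keeps the example short.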
In short, a data lake should provide:
- Long-term storage of data as-is, in its original form
- Effective management and centralized governance
- Multi-modal computing capabilities that meet different processing requirements
- A business-oriented, unified data view, data model and set of data processing results
Its functions include:
- Data access (ingestion)
- Data migration
- Data governance
- Quality management
- Asset catalog
- Access control
- Task management
- Task orchestration
- Metadata management, etc.
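Task management and task orchestration mainly mean running data-processing jobs in dependency order and tracking their status. The sketch below assumes the pipeline is expressed as a dependency graph. It schedules that graph with Python's standard `graphlib.TopologicalSorter` (Python 3.9+). The task names, `run_pipeline`, `fake_run` and the stop-after-failure policy are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Set

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE: Dict[str, Set[str]] = {
    "ingest_orders": set(),
    "ingest_customers": set(),
    "clean_orders": {"ingest_orders"},
    "quality_check": {"clean_orders", "ingest_customers"},
    "build_report": {"quality_check"},
}


def run_pipeline(pipeline: Dict[str, Set[str]],
                 run_task: Callable[[str], bool]) -> Dict[str, str]:
    """Run tasks in dependency order and return each task's final status.

    run_task(name) executes one task and returns True on success. Policy
    (an assumption): after any failure no new batch is started, so every
    task not yet started, including the failed task's dependents, is
    reported as "skipped".
    """
    sorter = TopologicalSorter(pipeline)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies form a cycle
    status: Dict[str, str] = {}
    failed = False
    while sorter.is_active() and not failed:
        for task in sorter.get_ready():  # tasks whose dependencies have all succeeded
            if run_task(task):
                status[task] = "succeeded"
                sorter.done(task)  # unblocks tasks that depend on this one
            else:
                status[task] = "failed"
                failed = True
    for task in pipeline:
        status.setdefault(task, "skipped")  # never started
    return status


if __name__ == "__main__":
    executed = []

    def fake_run(task: str) -> bool:
        executed.append(task)  # a real system would submit a job and monitor it
        return True

    print(run_pipeline(PIPELINE, fake_run))
    print(executed)
```

The graph describes only what depends on what. The scheduler decides when each task can run, and independent tasks such as the two ingestion jobs could be dispatched in parallel.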
Beyond these functions, three capabilities are central to a data lake:
- Powerful data access. This is the ability to define and manage a wide range of external, heterogeneous data sources and to extract and migrate data from them. The extracted data includes both the metadata of the external sources and the data actually stored in them.
- Powerful data management. Management capabilities divide into basic and extended capabilities.
  - Basic capabilities include managing all kinds of metadata, controlling data access and managing data assets. Every data lake system needs them. The section "Data lake solutions from various vendors" discusses how each vendor supports them.
  - Extended capabilities include task management, process orchestration, and capabilities related to data quality and data governance.
  - Task management and process orchestration are used to manage, orchestrate, schedule and monitor the data-processing tasks in the data lake system. Typically, whoever builds the data lake buys or develops a customized data integration or data development subsystem or module to provide these capabilities. That module reads the relevant metadata of the data lake in order to integrate with it.
  - Data quality and data governance are more complex problems. A data lake system generally does not provide these functions directly. Instead, it exposes interfaces or metadata so that capable enterprises or organizations can integrate their existing data governance software or build customized solutions.
- Sharable metadata. The computing engines in a data lake are deeply integrated with the data stored in it, and this integration is built on the data lake's metadata. In a good data lake system, an engine that is processing data can read the data's storage location, format, schema, distribution and other information directly from the metadata. It can then process the data without manual or programmatic intervention. A good data lake system can also enforce access control on the data in the lake, at granularities down to the database, table, column and row.
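The "sharable metadata" idea can be illustrated with a toy catalog. Instead of hard-coding a table's location, format, schema and partition keys, an engine asks the catalog for them. The catalog also enforces table-, column- and row-level permissions. This is a sketch under stated assumptions. `Catalog`, `TableMetadata`, `Grant`, the `s3://example-lake/...` path and the user names are hypothetical and do not correspond to any vendor's API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class TableMetadata:
    location: str             # where the files live (hypothetical path below)
    file_format: str          # e.g. "parquet"
    schema: Dict[str, str]    # column name -> type
    partition_keys: List[str]


@dataclass
class Grant:
    columns: Set[str]                                       # column-level permission
    row_filter: Callable[[dict], bool] = lambda row: True   # row-level permission


class Catalog:
    """Toy shared metadata catalog with table-, column- and row-level grants."""

    def __init__(self) -> None:
        self.tables: Dict[str, TableMetadata] = {}        # "db.table" -> metadata
        self.grants: Dict[Tuple[str, str], Grant] = {}    # (user, "db.table") -> grant

    def register(self, name: str, meta: TableMetadata) -> None:
        self.tables[name] = meta

    def grant(self, user: str, name: str, grant: Grant) -> None:
        self.grants[(user, name)] = grant

    def describe(self, user: str, name: str) -> TableMetadata:
        """What an engine looks up before reading: location, format, visible schema, partitions."""
        g = self._check(user, name)
        meta = self.tables[name]
        visible = {c: t for c, t in meta.schema.items() if c in g.columns}
        return TableMetadata(meta.location, meta.file_format, visible, list(meta.partition_keys))

    def filter_rows(self, user: str, name: str, rows: List[dict]) -> List[dict]:
        """Apply the row filter, then drop columns the user may not see."""
        g = self._check(user, name)
        return [{c: v for c, v in r.items() if c in g.columns}
                for r in rows if g.row_filter(r)]

    def _check(self, user: str, name: str) -> Grant:
        if (user, name) not in self.grants:
            raise PermissionError(f"{user} has no access to {name}")
        return self.grants[(user, name)]


if __name__ == "__main__":
    cat = Catalog()
    cat.register("sales.orders", TableMetadata(
        "s3://example-lake/sales/orders/", "parquet",
        {"order_id": "bigint", "region": "string", "amount": "double", "card_no": "string"},
        ["dt"]))
    # The analyst may not see card numbers and may only read the "east" region.
    cat.grant("analyst", "sales.orders",
              Grant({"order_id", "region", "amount"}, lambda r: r["region"] == "east"))
    print(cat.describe("analyst", "sales.orders"))
    rows = [{"order_id": 1, "region": "east", "amount": 9.5, "card_no": "x"},
            {"order_id": 2, "region": "west", "amount": 3.0, "card_no": "y"}]
    print(cat.filter_rows("analyst", "sales.orders", rows))
```

Every engine goes through the same catalog, so the analyst's restrictions apply no matter which engine reads the table. In a real system, the filtering would be pushed into the engine's scan rather than applied to rows that have already been read.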

