Basic architecture of a data lake
2022-07-27 12:33:00 【InfoQ】


- A "river" emphasizes flow: "all rivers run into the sea", and a river eventually empties into the sea. Enterprise data, by contrast, needs to settle and accumulate over the long term, so "lake" is a more fitting name than "river". In addition, lake water naturally stratifies into layers that support different ecosystems. This matches an enterprise's need to build a unified data center for storing and managing data: "hot" data sits in the top layer, ready for use at any time, while warm and cold data are kept on different storage media in the data center, balancing storage capacity against cost.
- It is not called a "sea" because a sea is boundless, whereas a "lake" has a boundary. That boundary is the business boundary of the enterprise or organization, so a data lake needs stronger data management and permission management capabilities.
- Another important reason for the name "lake" is that a data lake needs fine-grained management. A data lake that lacks control and governance will eventually degenerate into a "data swamp", where applications cannot access the data effectively and the data stored in it loses its value.
- Long-term storage of data as is, in its original form
- Effective management and centralized governance
- Multi-modal computing capabilities to meet processing requirements
- Business orientation, providing unified data views, data models, and data processing results
- Data access
- Data migration
- Data governance
- Quality management
- Asset catalog
- Access control
- Task management
- Task orchestration
- Metadata management, and so on
- Stronger data access capabilities. Data access capability covers defining and managing various external heterogeneous data sources, as well as extracting and migrating data from them. The extracted and migrated data includes both the metadata of the external data sources and the data actually stored in them.
- Stronger data management capabilities. Management capabilities can be divided into basic and extended capabilities. Basic management capabilities include metadata management, data access control, and data asset management. They are essential for any data lake system, and the later section "Data lake solutions from various vendors" discusses in detail how each vendor supports them. Extended management capabilities include task management, process orchestration, and capabilities related to data quality and data governance. Task management and process orchestration are used to manage, orchestrate, schedule, and monitor the tasks that process data in the data lake system. Usually, the data lake builder buys or develops a custom data integration or data development subsystem or module to provide these capabilities, and that custom system or module integrates with the data lake by reading the data lake's metadata. Data quality and data governance are more complex issues. In general, a data lake system does not provide these functions directly, but it opens up various interfaces or metadata so that capable enterprises and organizations can integrate existing data governance software or build custom solutions.
- Shareable metadata. The computing engines in a data lake are deeply integrated with the data in the lake, and that integration is built on the data lake's metadata. In a good data lake system, a computing engine can obtain the storage location, data format, data schema, data distribution, and other information directly from the metadata when it processes data, and then process the data directly, without manual or programmatic intervention. Furthermore, a good data lake system can also apply access control to the data in the lake, with a granularity that reaches the database, table, column, and row levels.
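
The shareable-metadata and access-control points above can be illustrated with a small sketch. It is not taken from any particular data lake product: `TableMeta`, `Policy`, `Catalog`, and `scan`, as well as the `s3://example-bucket/...` location, are hypothetical names chosen for illustration, and rows are passed in memory instead of being read from storage. The idea is only that an engine asks the shared catalog where and how a table is stored, and that column-level and row-level rules are applied from the same place.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Set, Tuple

Row = Dict[str, object]


@dataclass(frozen=True)
class TableMeta:
    """Metadata an engine needs before it can process a table (hypothetical)."""
    database: str
    table: str
    location: str           # where the files live
    file_format: str        # e.g. "parquet"
    schema: Dict[str, str]  # column name -> type name


@dataclass(frozen=True)
class Policy:
    """What one user may read from one table (hypothetical)."""
    columns: Set[str]                  # column-level control
    row_filter: Callable[[Row], bool]  # row-level control


class Catalog:
    """Shared metadata store that every computing engine consults."""

    def __init__(self) -> None:
        self._tables: Dict[Tuple[str, str], TableMeta] = {}
        self._policies: Dict[Tuple[str, str, str], Policy] = {}

    def register(self, meta: TableMeta) -> None:
        self._tables[(meta.database, meta.table)] = meta

    def grant(self, user: str, database: str, table: str,
              columns: Iterable[str],
              row_filter: Callable[[Row], bool] = lambda row: True) -> None:
        self._policies[(user, database, table)] = Policy(set(columns), row_filter)

    def resolve(self, database: str, table: str) -> TableMeta:
        return self._tables[(database, table)]

    def policy_for(self, user: str, database: str, table: str) -> Policy:
        try:
            return self._policies[(user, database, table)]
        except KeyError:
            raise PermissionError(
                f"{user} has no grant on {database}.{table}") from None


def scan(catalog: Catalog, user: str, database: str, table: str,
         rows: Iterable[Row]) -> List[Row]:
    """Return only the rows and columns the user is allowed to see."""
    meta = catalog.resolve(database, table)             # location, format, schema
    policy = catalog.policy_for(user, database, table)  # table-level check
    visible = [c for c in meta.schema if c in policy.columns]
    return [{c: row[c] for c in visible}
            for row in rows if policy.row_filter(row)]


catalog = Catalog()
catalog.register(TableMeta(
    database="sales", table="orders",
    location="s3://example-bucket/sales/orders/",  # assumed example path
    file_format="parquet",
    schema={"order_id": "int", "region": "string", "amount": "double"},
))
catalog.grant("analyst", "sales", "orders",
              columns={"order_id", "region"},
              row_filter=lambda row: row["region"] == "EU")

rows = [
    {"order_id": 1, "region": "EU", "amount": 10.0},
    {"order_id": 2, "region": "US", "amount": 20.0},
]
print(scan(catalog, "analyst", "sales", "orders", rows))
```

In this sketch the analyst sees only the `order_id` and `region` columns of rows in the EU region, and a user without a grant gets a `PermissionError`. A real system would keep the catalog in a shared service and push the filters down into the engine, but the division of responsibilities is the same.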

