003: What does AWS consider a data lake?
2022-06-12 09:36:00 【YoungerChina】
What is a data lake?
Store all of your data in a centralized repository at any scale.
1. What is a data lake?
A data lake is a centralized repository that lets you store all of your structured and unstructured data at any scale. You can store data as-is, without structuring it first, and run different types of analytics on it, from dashboards and visualizations to big data processing, real-time analytics, and machine learning, to guide better decisions.
2. Why do you need a data lake?
Organizations that successfully generate business value from their data outperform their peers. According to an Aberdeen survey, organizations that implemented a data lake outperformed similar companies by 9% in organic revenue growth. These leaders were able to run new types of analytics, such as machine learning over new sources stored in the data lake: log files, clickstream data, social media, and internet-connected devices. This helped them attract and retain customers, boost productivity, proactively maintain equipment, and make informed decisions, so they could identify and act on business growth opportunities faster.
3. Basic elements of a data lake and analytics solution
When organizations build a data lake and analytics platform, they need to consider several key capabilities, including:
Data movement
A data lake lets you import any amount of data, including data that arrives in real time. You can collect data from multiple sources and move it into the data lake in its original format. This lets you scale to data of any size while saving the time otherwise spent defining data structures, schemas, and transformations.
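The idea of landing data in its original format and deferring the schema can be illustrated with a minimal sketch. It assumes a local directory standing in for the lake and JSON-lines records; the function names `ingest_raw` and `read_with_schema` and the sample fields are hypothetical and do not come from any AWS API.

```python
import json
import tempfile
from pathlib import Path


def ingest_raw(lake_dir: Path, source: str, payload: str) -> Path:
    """Append one record as received, one per line, without parsing it."""
    target = lake_dir / "raw" / f"{source}.jsonl"
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("a", encoding="utf-8") as f:
        f.write(payload.rstrip("\n") + "\n")
    return target


def read_with_schema(path: Path, fields: list[str]) -> list[dict]:
    """Apply a schema only at read time: keep requested fields, fill gaps with None."""
    rows = []
    with path.open(encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            rows.append({name: record.get(name) for name in fields})
    return rows


# A temporary directory plays the role of the lake (assumption for the sketch).
lake = Path(tempfile.mkdtemp())
clicks = ingest_raw(lake, "clickstream", '{"user": "u1", "page": "/home", "ms": 120}')
ingest_raw(lake, "clickstream", '{"user": "u2", "page": "/cart"}')

rows = read_with_schema(clicks, ["user", "ms"])
print(rows)  # [{'user': 'u1', 'ms': 120}, {'user': 'u2', 'ms': None}]
```

The second record has no `ms` field, yet ingestion does not reject it; the gap only becomes visible when a reader asks for that field.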
Securely store and catalog data
A data lake lets you store relational data (for example, operational databases and data from line-of-business applications) and non-relational data (for example, data from mobile apps, IoT devices, and social media). It also lets you understand what data is in the lake through crawling, cataloging, and indexing. Finally, the data must be secured so that your data assets are protected.
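Crawling and cataloging can be pictured as scanning each dataset and recording what fields and types it actually contains. The following is a rough sketch under simplifying assumptions: datasets are in-memory lists of JSON lines, and the `crawl` function and sample datasets are made up for illustration rather than taken from a real catalog service.

```python
import json
from collections import defaultdict


def crawl(datasets: dict[str, list[str]]) -> dict[str, dict]:
    """Build one catalog entry per dataset: record count plus observed fields and types."""
    catalog = {}
    for name, lines in datasets.items():
        fields = defaultdict(set)
        for line in lines:
            for key, value in json.loads(line).items():
                fields[key].add(type(value).__name__)
        catalog[name] = {
            "records": len(lines),
            "fields": {k: sorted(v) for k, v in sorted(fields.items())},
        }
    return catalog


# Hypothetical raw datasets: one relational-style, one from IoT devices.
raw = {
    "orders": ['{"id": 1, "total": 9.5}', '{"id": 2, "total": 12}'],
    "devices": ['{"sensor": "t-01", "temp": 21.4}'],
}
catalog = crawl(raw)
print(catalog["orders"])
# {'records': 2, 'fields': {'id': ['int'], 'total': ['float', 'int']}}
```

Note that `total` shows up with two types. Surfacing that kind of inconsistency is one reason a catalog is useful before anyone queries the data.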
Analytics
A data lake lets various roles in the organization, such as data scientists, data developers, and business analysts, access data with the analytic tools and frameworks of their choice. This includes open source frameworks such as Apache Hadoop, Presto, and Apache Spark, as well as commercial offerings from data warehouse and business intelligence vendors. A data lake lets you run analytics without moving the data to a separate analytics system (how should this be understood?).
Machine learning
A data lake lets organizations generate different types of insights, including reporting on historical data and machine learning, where models are built to forecast likely outcomes and suggest a range of prescribed actions to achieve the optimal result.
4. The value of a data lake
The ability to harness more data, from more sources, in less time, and to let users process and analyze that data in different ways, leads to better and faster decision-making. Examples of where a data lake adds value include:
Improved customer interactions
A data lake can combine customer data from a CRM platform with social media analytics and a marketing platform that includes purchase history and incident tickets, helping the business understand its most profitable customer segments, the causes of customer churn, and the promotions or rewards that will increase loyalty.
Improved R&D innovation choices
A data lake can help your R&D team test hypotheses, refine assumptions, and assess results, such as choosing the right materials in product design for better performance, doing genomic research that leads to more effective medication, or understanding customers' willingness to pay for different attributes.
Increased operational efficiency
The Internet of Things (IoT) introduces more ways to collect data on processes such as manufacturing, with real-time data coming from internet-connected devices. A data lake makes it easy to store machine-generated IoT data and run analytics on it to discover ways to reduce operational costs and increase quality.
5. The challenges of data lakes
The main challenge with a data lake architecture is that raw data is stored with no oversight of its contents. For a data lake to make data usable, it needs defined mechanisms to catalog and secure data. Without these elements, data cannot be found or trusted, resulting in a "data swamp". Meeting the needs of wider audiences requires data lakes to have governance, semantic consistency, and access controls.
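To make the catalog and access control requirements concrete, here is a small sketch of a lake that refuses reads on anything it has no catalog entry for. The `CatalogEntry` and `GovernedLake` classes, the role names, and the sample dataset are all hypothetical, chosen only to illustrate the idea; real deployments would rely on a managed catalog and permission service.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    dataset: str
    owner: str
    description: str
    allowed_roles: set[str] = field(default_factory=set)


class GovernedLake:
    """Tracks which datasets are cataloged and who may read them."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        self._entries[entry.dataset] = entry

    def can_read(self, dataset: str, role: str) -> bool:
        entry = self._entries.get(dataset)
        # Uncataloged data is treated as unreadable: nobody can vouch for it.
        return entry is not None and role in entry.allowed_roles

    def search(self, keyword: str) -> list[str]:
        """Find datasets by description, which only works for cataloged data."""
        kw = keyword.lower()
        return sorted(
            e.dataset for e in self._entries.values() if kw in e.description.lower()
        )


lake = GovernedLake()
lake.register(
    CatalogEntry(
        dataset="orders",
        owner="sales-data-team",
        description="Customer purchase history, one row per order",
        allowed_roles={"analyst"},
    )
)

print(lake.can_read("orders", "analyst"))          # True
print(lake.can_read("orders", "intern"))           # False
print(lake.can_read("untracked_dump", "analyst"))  # False
print(lake.search("purchase"))                     # ['orders']
```

A dataset that was never registered is neither searchable nor readable here, which is the "data swamp" failure mode turned into an explicit rule.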