003: What does AWS think a data lake is?
2022-06-12 09:36:00 [YoungerChina]
What is a data lake?
Store all of your data, at any scale, in a centralized repository
1. What is a data lake?
A data lake is a centralized repository that lets you store all of your structured and unstructured data at any scale. You can store data as-is, without having to structure it first, and run different types of analytics, from dashboards and visualizations to big data processing, real-time analytics, and machine learning, to guide better decisions.
2. Why do you need a data lake?
Organizations that successfully generate business value from their data will outperform their peers. According to an Aberdeen survey, organizations that implemented a data lake outperformed similar companies by 9% in organic revenue growth. These leaders were able to perform new types of analytics, such as machine learning, over new sources stored in the data lake, including log files, clickstream data, social media, and internet-connected devices. This helped them identify and act on business growth opportunities faster by attracting and retaining customers, boosting productivity, proactively maintaining equipment, and making informed decisions.
3. Essential elements of a data lake and analytics solution
As organizations build data lakes and analytics platforms, they need to consider a number of key capabilities, including:
Data movement
Data lakes allow you to import any amount of data, including data that arrives in real time. Data is collected from multiple sources and moved into the data lake in its original format. This process lets you scale to data of any size while saving the time otherwise spent defining data structures, schemas, and transformations.
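To make "moved into the data lake in its original format" concrete, here is a minimal Python sketch of raw ingestion. It assumes a local directory standing in for object storage such as Amazon S3 and a Hive-style `source=<name>/date=<YYYY-MM-DD>` folder layout; `ingest_raw` is a hypothetical helper, not an AWS API. The point it illustrates is that the payload bytes are written unchanged, so no schema or transformation has to be defined before data lands.

```python
import tempfile
import uuid
from datetime import datetime, timezone
from pathlib import Path


def ingest_raw(lake_root: Path, source: str, payload: bytes,
               ext: str, received_at: datetime) -> Path:
    """Land one payload in the lake exactly as received: no parsing, no schema."""
    # Hive-style partition folders by source and arrival date (an assumed layout).
    partition = (lake_root / "raw" / f"source={source}"
                 / f"date={received_at:%Y-%m-%d}")
    partition.mkdir(parents=True, exist_ok=True)
    # A random file name avoids collisions when many producers write at once.
    target = partition / f"{uuid.uuid4().hex}.{ext}"
    target.write_bytes(payload)  # the bytes are stored unchanged
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        lake = Path(tmp)
        now = datetime(2022, 6, 12, 9, 36, tzinfo=timezone.utc)
        # Three sources in three formats, none of them modeled up front.
        ingest_raw(lake, "clickstream", b'{"user": 1, "page": "/home"}\n', "json", now)
        ingest_raw(lake, "crm", b"id,name\n1,Alice\n", "csv", now)
        ingest_raw(lake, "app_logs", b"INFO service started\n", "log", now)
        for path in sorted(p for p in lake.rglob("*") if p.is_file()):
            print(path.relative_to(lake).as_posix())
```

Partitioning by source and date is a common design choice because query engines can skip whole folders when a query filters on those keys; parsing is deferred until someone actually reads the data.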
Securely store and catalog data
Data lakes allow you to store relational data (for example, operational databases and data from line-of-business applications) and non-relational data (for example, data from mobile apps, IoT devices, and social media). They also let you understand what is in the lake by crawling, cataloging, and indexing the data. Finally, the data must be secured to ensure that your data assets are protected.
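Crawling and cataloging can be illustrated with a toy crawler. Managed services such as AWS Glue crawlers populate a data catalog at scale; the Python sketch below is not their API, only an illustration under simple assumptions: each dataset is a top-level folder of JSON Lines files, and its "schema" is the set of field names and value types observed on read. `infer_jsonl_schema` and `crawl` are hypothetical names.

```python
import json
import tempfile
from pathlib import Path


def infer_jsonl_schema(path: Path) -> dict:
    """Scan every record and map each column to the JSON value types seen."""
    columns: dict[str, set[str]] = {}
    rows = 0
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            rows += 1
            for key, value in record.items():
                columns.setdefault(key, set()).add(type(value).__name__)
    return {"rows": rows, "columns": {k: sorted(v) for k, v in columns.items()}}


def crawl(lake_root: Path) -> dict:
    """Walk the lake and build one catalog entry per top-level dataset folder."""
    catalog: dict[str, dict] = {}
    for path in sorted(lake_root.rglob("*.jsonl")):
        rel = path.relative_to(lake_root)
        entry = catalog.setdefault(rel.parts[0], {"files": [], "rows": 0, "columns": {}})
        stats = infer_jsonl_schema(path)
        entry["files"].append(rel.as_posix())  # index where the data lives
        entry["rows"] += stats["rows"]
        for col, types in stats["columns"].items():
            # Merge types across files so inconsistencies become visible.
            merged = set(entry["columns"].get(col, [])) | set(types)
            entry["columns"][col] = sorted(merged)
    return catalog


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        lake = Path(tmp)
        (lake / "clickstream").mkdir()
        (lake / "clickstream" / "part-0.jsonl").write_text(
            '{"user": 1, "page": "/home"}\n{"user": 2, "page": "/cart", "ms": 35.5}\n',
            encoding="utf-8")
        (lake / "iot").mkdir()
        (lake / "iot" / "part-0.jsonl").write_text(
            '{"device": "pump-7", "temp_c": 71.2}\n', encoding="utf-8")
        print(json.dumps(crawl(lake), indent=2))
```

Because the schema is inferred rather than declared, a field that appears with two types (say `int` in one file and `str` in another) is listed with both, which is the kind of inconsistency a catalog helps surface.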
Analytics
Data lakes allow various roles in your organization, such as data scientists, data developers, and business analysts, to access data with their choice of analytic tools and frameworks. This includes open-source frameworks such as Apache Hadoop, Presto, and Apache Spark, as well as commercial offerings from data warehouse and business intelligence vendors. Data lakes let you run analytics without having to move your data to a separate analytics system.
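Running analytics "without moving the data to a separate analytics system" is usually understood as querying in place: the engine reads the files where they are stored and applies a schema at read time. Engines such as Presto and Apache Spark do this over distributed storage; the minimal Python sketch below shows the same idea on local JSON Lines files. The folder layout and the `page` field are assumptions for illustration, and `scan` and `page_views` are hypothetical names.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path
from typing import Iterator


def scan(dataset_dir: Path) -> Iterator[dict]:
    """Stream records straight from the raw files; the schema is applied on read."""
    for path in sorted(dataset_dir.glob("*.jsonl")):
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                if line.strip():  # ignore blank lines
                    yield json.loads(line)


def page_views(dataset_dir: Path) -> Counter:
    """Count views per page, skipping records that lack a 'page' field."""
    counts: Counter = Counter()
    for record in scan(dataset_dir):
        page = record.get("page")
        if page is not None:
            counts[page] += 1
    return counts


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        clicks = Path(tmp) / "clickstream"
        clicks.mkdir()
        (clicks / "part-0.jsonl").write_text(
            '{"user": 1, "page": "/home"}\n{"user": 2, "page": "/cart"}\n',
            encoding="utf-8")
        # A later file carries an extra field; nothing had to be migrated.
        (clicks / "part-1.jsonl").write_text(
            '{"user": 3, "page": "/home", "referrer": "ad"}\n{"user": 4}\n',
            encoding="utf-8")
        print(page_views(clicks).most_common())
```

Each query pays the cost of parsing raw text, which is why production lakes often add columnar formats and partitioning; the principle of reading data where it lives stays the same.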
Machine learning
Data lakes allow organizations to generate different types of insights, including reporting on historical data and machine learning, where models are built to forecast likely outcomes and to suggest a range of prescribed actions for achieving the best result.
4. The value of a data lake
The ability to harness more data, from more sources, in less time, and to let users process and analyze that data in different ways, leads to better and faster decision-making. Examples where data lakes have added value include:
Improved customer interactions
A data lake can combine customer data from a CRM platform with social media analytics and with a marketing platform that includes purchase history and incident tickets. This lets the business understand its most profitable customer segments, the causes of customer churn, and the promotions or rewards that will increase loyalty.
Improved R&D innovation choices
A data lake can help your R&D teams test their hypotheses, refine assumptions, and assess results: for example, choosing the right materials in product design for better performance, conducting genomic research that leads to more effective medication, or understanding how much customers are willing to pay for different attributes.
Increased operational efficiency
The Internet of Things (IoT) introduces more ways to collect data on processes such as manufacturing, including real-time data from internet-connected devices. A data lake makes it easy to store and analyze machine-generated IoT data to find ways to reduce operating costs and improve quality.
5. The challenges of data lakes
The main challenge of a data lake architecture is that raw data is stored without any oversight of its contents. For a data lake to make data usable, it needs defined mechanisms to catalog and secure that data. Without these elements, data cannot be found or trusted, and the result is a "data swamp." Meeting the needs of a wider audience requires a data lake to have governance, semantic consistency, and access controls.
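The governance point can be sketched as a rule that every read goes through the catalog and an access check, so uncataloged data shows up as "swamp" instead of being silently consumed. This is a minimal Python illustration with hypothetical class and dataset names (`GovernedLake`, `CatalogEntry`, `crm_customers`); in-memory dictionaries stand in for real storage and for a real permission system.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    owner: str            # who answers questions about the dataset
    description: str      # what the data means, so consumers can trust it
    allowed_roles: set[str] = field(default_factory=set)


class GovernedLake:
    """Toy lake in which every read goes through the catalog and an access check."""

    def __init__(self) -> None:
        self.catalog: dict[str, CatalogEntry] = {}
        self.storage: dict[str, list[dict]] = {}

    def register(self, name: str, entry: CatalogEntry, rows: list[dict]) -> None:
        """Store data together with its catalog metadata."""
        self.catalog[name] = entry
        self.storage[name] = rows

    def dump_uncataloged(self, name: str, rows: list[dict]) -> None:
        """Land raw data with no metadata: the habit that turns a lake into a swamp."""
        self.storage[name] = rows

    def swamp_report(self) -> list[str]:
        """Datasets that exist in storage but cannot be found or trusted."""
        return sorted(set(self.storage) - set(self.catalog))

    def read(self, name: str, role: str) -> list[dict]:
        entry = self.catalog.get(name)
        if entry is None:
            raise LookupError(f"{name!r} is not in the catalog")
        if role not in entry.allowed_roles:
            raise PermissionError(f"role {role!r} may not read {name!r}")
        return list(self.storage[name])  # hand out a copy, not the stored list


if __name__ == "__main__":
    lake = GovernedLake()
    lake.register("crm_customers",
                  CatalogEntry("sales-ops", "One row per CRM customer", {"analyst"}),
                  [{"id": 1, "segment": "enterprise"}])
    lake.dump_uncataloged("export_final_v2", [{"col1": "?"}])
    print(lake.read("crm_customers", role="analyst"))
    print("swamp:", lake.swamp_report())
    try:
        lake.read("crm_customers", role="intern")
    except PermissionError as err:
        print("denied:", err)
```

Keeping an owner and a description in each catalog entry is what gives consumers semantic context; the role check stands in for the access controls the paragraph calls for.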