003: What does AWS consider a data lake?
2022-06-12 09:36:00 【YoungerChina】
What is a data lake?
Store all of your data in a centralized repository, at any scale.
1. What is a data lake?
A data lake is a centralized repository that allows you to store all of your structured and unstructured data at any scale. You can store the data as it is, without having to structure it first, and run different types of analytics on it, from dashboards and visualizations to big data processing, real-time analytics, and machine learning, to guide better decisions.
2. Why is a data lake needed?
Organizations that successfully generate business value from their data will outperform their peers. According to an Aberdeen survey, organizations that implemented a data lake outperformed similar companies by 9% in organic revenue growth. These leaders were able to run new types of analytics, such as machine learning, over new sources stored in the data lake: log files, clickstream data, social media, and internet-connected devices. This helped them attract and retain customers, raise productivity, maintain equipment proactively, and make informed decisions, so that they could identify and act on business growth opportunities faster.
3. Essential elements of a data lake and analytics solution
As organizations build data lakes and an analytics platform, they need to consider a number of key capabilities, including:
Data movement
A data lake allows you to import any amount of data, including data that arrives in real time. Data is collected from multiple sources and moved into the data lake in its original format. This process lets you scale to data of any size while saving the time otherwise spent defining data structures, schemas, and transformations up front.
Securely store and catalog data
A data lake allows you to store relational data (for example, operational databases and data from line-of-business applications) and non-relational data (for example, data from mobile apps, IoT devices, and social media). It also lets you understand what data is in the lake by crawling, cataloging, and indexing it. Finally, the data must be secured so that your data assets are protected.
Analytics
A data lake allows various roles in the organization (data scientists, data developers, and business analysts) to access the data with the analytic tools and frameworks of their choice. This includes open-source frameworks such as Apache Hadoop, Presto, and Apache Spark, as well as commercial offerings from data warehouse and business intelligence vendors. A data lake allows you to run analytics without moving the data to a separate analytics system (how should this be understood?).
Machine learning
A data lake allows organizations to generate different types of insights, including reporting on historical data and machine learning, where models are built to forecast likely outcomes and to suggest a range of prescribed actions for achieving the best result.
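The "Data movement" element above (store data in its original form, decide on structure later) is often called schema-on-read. The following is only a minimal sketch of that idea using the Python standard library and a temporary local directory standing in for the lake. The function names `ingest_raw` and `read_with_schema`, the `clickstream` source, and the sample records are all hypothetical; a real lake would use object storage and a query engine instead.

```python
import json
import tempfile
from pathlib import Path


def ingest_raw(lake_dir: Path, source: str, records: list[dict]) -> Path:
    """Append records to the lake exactly as received (no schema enforced)."""
    target = lake_dir / f"{source}.jsonl"
    with target.open("a", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record) + "\n")
    return target


def read_with_schema(path: Path, schema: dict[str, type]) -> list[dict]:
    """Apply a schema at read time: keep only the requested fields, cast
    them, and skip records that lack a field or cannot be cast."""
    rows = []
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            raw = json.loads(line)
            try:
                rows.append({name: cast(raw[name]) for name, cast in schema.items()})
            except (KeyError, TypeError, ValueError):
                continue
    return rows


# A temporary directory plays the role of the lake (assumption for the demo).
lake = Path(tempfile.mkdtemp(prefix="lake_"))
clicks = ingest_raw(lake, "clickstream", [
    {"user": "u1", "page": "/home", "ms": "120"},
    {"user": "u2", "page": "/cart", "ms": 340, "referrer": "ad"},
    {"user": "u3"},  # an incomplete record is still stored as-is
])

# The structure is chosen only now, by the consumer of the data.
latency_view = read_with_schema(clicks, {"user": str, "ms": int})
print(latency_view)
```

All three records land in the lake unchanged; only the read step decides which fields matter, so a different analysis could later read the same file with a different schema.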
4. The value of a data lake
The ability to use more data, from more sources, in less time, and to let users process and analyze that data in different ways, leads to better and faster decision making. Examples where data lakes have added value include:
Improved customer interactions
A data lake can combine customer data from a CRM platform with social media analytics, a marketing platform that includes purchase history, and incident tickets, so that the business can understand its most profitable customer cohort, the causes of customer churn, and the promotions or rewards that will increase loyalty.
Better R&D innovation choices
A data lake can help your R&D teams test their hypotheses, refine assumptions, and assess results: for example, choosing the right materials in product design to improve performance, doing genomic research that leads to more effective medication, or understanding how much customers are willing to pay for different attributes.
Increased operational efficiency
The Internet of Things (IoT) introduces more ways to collect data on processes such as manufacturing, including real-time data from internet-connected devices. A data lake makes it easy to store machine-generated IoT data and run analytics on it, in order to find ways to reduce operating costs and improve quality.
5. The challenges of data lakes
The main challenge with a data lake architecture is that raw data is stored with no oversight of its contents. For a data lake to make data usable, it needs defined mechanisms to catalog and secure that data. Without these elements, data cannot be found or trusted, and the result is a "data swamp". Meeting the needs of a wider audience requires data lakes to have governance, semantic consistency, and access controls.
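To make the catalog and access-control requirement concrete, here is a small sketch in Python. It is an illustration under stated assumptions, not how AWS or any particular product implements it: the `Catalog` and `CatalogEntry` classes, the role names, and the dataset locations are all hypothetical. The point it shows is that data is only findable if it has been registered, and only readable by roles that were explicitly granted access.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    location: str                      # where the raw data lives (hypothetical path)
    description: str                   # human-readable meaning of the dataset
    allowed_roles: set[str] = field(default_factory=set)


class Catalog:
    """A toy in-memory catalog: registration, keyword search, access checks."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, name: str, entry: CatalogEntry) -> None:
        self._entries[name] = entry

    def search(self, keyword: str) -> list[str]:
        """Return names of datasets whose name or description mentions keyword."""
        needle = keyword.lower()
        return sorted(
            name for name, entry in self._entries.items()
            if needle in name.lower() or needle in entry.description.lower()
        )

    def resolve(self, name: str, role: str) -> str:
        """Return the dataset location, or raise if it is unknown or forbidden."""
        if name not in self._entries:
            raise KeyError(f"{name!r} is not cataloged, so it cannot be found")
        entry = self._entries[name]
        if role not in entry.allowed_roles:
            raise PermissionError(f"role {role!r} may not read {name!r}")
        return entry.location


catalog = Catalog()
catalog.register("clickstream", CatalogEntry(
    "lake/raw/clickstream/", "Web click events", {"analyst", "data_scientist"}))
catalog.register("crm_customers", CatalogEntry(
    "lake/raw/crm/", "Customer records from the CRM", {"data_scientist"}))

print(catalog.search("customer"))
print(catalog.resolve("clickstream", "analyst"))
```

A file dropped into the lake without a `register` call would exist in storage but be invisible to `search` and `resolve`, which is the "data swamp" situation described above.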