Heating up the data in the data lake?
2022-07-06 11:21:00 【Jiedao jdon】
Data lakes: ending data silos with a single repository for big data analytics. Imagine one place that stores all of your data for analysis, supporting product-led growth and business insight. Sadly, the data lake idea fell out of favor for a while, because early attempts were built on Hadoop-based repositories that ran on premises and lacked resources and scalability. We ended up with a "Hadoop hangover."
Data lakes used to be known for their management challenges and slow time to value. But the accelerating adoption of cloud object storage, together with the exponential growth of data, has made them attractive again.
In fact, we need data lakes to support data analysis now more than ever. Cloud object storage was initially popular as a cost-effective way to temporarily store or archive data, but it has since caught on more broadly because it is cheap, secure, durable, and elastic. It is not only cost-effective; it is also easy to stream data into.
Data lake or data swamp?
The economics, built-in security, and scalability of cloud object storage encourage enterprises to store more and more data, creating huge data lakes with unlimited potential for analytics. Enterprises understand that having more data, not less, can be a strategic advantage. Unfortunately, many data lake projects in recent history have failed because the data lake turned into a data swamp: a collection of cold data that is not easy to access or use. Many have found that sending data to the cloud is easy, but giving users across the organization access to that data, and letting them draw insight from it, is hard. These data lakes have become dumping grounds for multi-structured datasets that pile up and gather digital dust, delivering none of the promised strategic advantage.
In short, cloud object storage, like Hadoop, was not built for general-purpose analytics. To gain insight, data has to be transformed and moved out of the lake into an analytical database such as Splunk, MySQL, or Oracle, depending on the use case. This process is complex, slow, and expensive. It is also a challenge because the industry currently faces a shortage of data engineers, who are needed to clean and transform the data and to build the data pipelines that feed it into these analytical systems.
Gartner found that, despite these well-known challenges, more than half of enterprises plan to invest in a data lake within the next two years. Data lakes have a surprising number of use cases, from investigating network intrusions through security logs to researching and improving customer experience. It is no wonder that enterprises still hold on to the promise of the data lake. So how can we drain the swamp and make sure these efforts do not fail? And crucially, how do we unlock and provide access to the data stored in the cloud, which is the most important obstacle of all?
Warming up cold cloud storage
It is possible (and preferable) to make cloud object storage "hot" for data analytics, but this requires rethinking the architecture. We need to make storage look and feel like a database, in essence turning cloud object storage into a high-performance analytical database or warehouse. Having "hot data" means being able to access it quickly and easily within minutes, not weeks or months, even when processing tens of terabytes every day. This kind of performance requires a different approach to data pipelines, one that avoids transformation and movement. The required architecture should be as simple as compressing, indexing, and publishing data through well-known APIs to tools such as Kibana and/or Looker, so that data is stored once with minimal movement and processing.
One of the most important ways to heat up data for analysis is to enable search. Specifically, search is the ultimate democratizer of data, allowing self-service selection and publishing of data streams without needing IT administrators or database engineers. All data should be fully searchable and available for analysis with existing data tools. Imagine giving users the ability to search and query at will, ask questions easily, and analyze data with ease. Most well-known data warehouse and data lake platforms do not provide this key capability.
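The compress, index, and search idea described above can be illustrated with a very small sketch. It is not how any particular data lake platform works: the in-memory `object_store` dictionary stands in for a cloud bucket, and the names `put_logs` and `search`, the object keys, and the sample log lines are all hypothetical. The point is only that data is compressed and stored once, indexed on the way in, and then searched in place without being moved into a separate database.

```python
import gzip
import re
from collections import defaultdict

# Hypothetical stand-in for a cloud object store: object key -> gzip-compressed bytes.
object_store = {}

# Inverted index: token -> set of (object key, line number) postings.
index = defaultdict(set)

# Very simple tokenizer (an assumption): runs of letters, digits, underscores, and dots.
TOKEN = re.compile(r"[a-z0-9_.]+")


def put_logs(key, lines):
    """Compress the lines, store them once, and index them on the way in."""
    object_store[key] = gzip.compress("\n".join(lines).encode("utf-8"))
    for line_no, line in enumerate(lines):
        for token in TOKEN.findall(line.lower()):
            index[token].add((key, line_no))


def search(*terms):
    """Return lines containing every term, reading only objects the index points to."""
    postings = [index.get(term.lower(), set()) for term in terms]
    if not postings:
        return []
    hits = set.intersection(*postings)
    cache = {}  # decompress each matching object at most once
    results = []
    for key, line_no in sorted(hits):
        if key not in cache:
            cache[key] = gzip.decompress(object_store[key]).decode("utf-8").split("\n")
        results.append(cache[key][line_no])
    return results


put_logs("logs/2022-07-05.gz", [
    "login failed user=alice ip=10.0.0.7",
    "login ok user=bob ip=10.0.0.8",
])
put_logs("logs/2022-07-06.gz", [
    "login failed user=carol ip=10.0.0.7",
])

# Self-service query: failed logins from one address, across all stored objects.
matches = search("failed", "10.0.0.7")
for line in matches:
    print(line)
```

A real deployment would keep the index itself in object storage and expose it through an API that existing tools already understand; the sketch only shows why indexing at write time lets a user ask a question without first building a pipeline into another system.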
But some forward-looking companies have found a way. Take BAI Communications, for example, whose data lake strategy adopts this type of architecture. In major commuter cities, BAI provides state-of-the-art communications infrastructure (cellular, Wi-Fi, broadcast, radio, and IP networks). BAI streams its data into a centralized data lake built on Amazon S3 cloud object storage, where it is secure and complies with numerous government regulations. Using a data lake built on cloud object storage, with analytics activated through a multi-API data lake platform, BAI can find, access, and analyze its data faster and more easily than before, with more controlled costs. The company is using insights generated from its global networks over the years to help rail operators maintain traffic flow and optimize routes, turning data insight into business value. This approach proved especially valuable when the pandemic hit, because BAI was able to understand the impact of COVID-19 on regional public transport networks around the world, so that they could continue to provide critical connectivity for citizens.