Heating up the data in your data lake?
2022-07-06 11:21:00 【Jiedao jdon】
Data lakes: ending data silos with a single repository for big data analytics. Imagine having one place to store all of your data for analysis, in support of product-led growth and business insight. Sadly, the data lake idea fell out of favor for a while, because early attempts were built on Hadoop-based repositories that ran on-premises and lacked resources and scalability. The result was a "Hadoop hangover."
In the past, data lakes were known for their management challenges and slow time to value. But the accelerating adoption of cloud object storage, together with the exponential growth of data, has made them attractive again.
In fact, we need data lakes to support data analytics now more than ever. Cloud object storage first caught on as a cost-effective way to store data temporarily or archive it, but it has since become widely adopted because it is cheap, secure, durable, and elastic. It is not only cost-effective; it is also easy to stream data into.
Data lake or data swamp?
The economics, built-in security, and scalability of cloud object storage encourage enterprises to store more and more data, creating huge data lakes with enormous potential for analytics. Enterprises understand that having more data, not less, can be a strategic advantage. Unfortunately, many data lake projects in recent years have failed because the data lake turned into a data swamp: a collection of cold data that is not easy to access or use. Many organizations have found that sending data to the cloud is easy, but giving users across the organization access to that data, and letting them draw insight from it, is hard. These data lakes have become dumping grounds for multi-structured data sets that pile up and gather digital dust, delivering none of the promised strategic advantage.
Put simply, cloud object storage, unlike Hadoop, was not built for general-purpose analytics. To gain insight, data must be transformed and moved out of the lake into an analytics database such as Splunk, MySQL, or Oracle, depending on the use case. This process is complex, slow, and expensive. It is also a challenge because the industry currently faces a shortage of data engineers, who are needed to clean and transform the data and to build the pipelines that load it into these analytics systems.
Gartner has found that, despite these well-known challenges, more than half of enterprises plan to invest in a data lake within the next two years. Data lakes have a surprising number of use cases, from investigating network intrusions through security logs to researching and improving the customer experience. It is no wonder that enterprises still hold on to the promise of the data lake. So how can we drain the swamp and make sure these efforts do not fail? And, crucially, how do we unlock and provide access to data stored in the cloud, which is the most important obstacle of all?
Turning up the heat on cold cloud storage
It is possible (and preferable) to heat up cloud object storage for data analytics, but doing so requires rethinking the architecture. We need to make sure the storage has the look and feel of a database, in essence turning cloud object storage into a high-performance analytics database or warehouse. "Hot data" must be quickly and easily accessible within minutes, not weeks or months, even when tens of megabytes are being processed every day. Performance of this kind calls for a different approach to data pipelines, one that avoids transforming and moving data. The required architecture is as simple as compressing the data, indexing it, and publishing it through well-known APIs to tools such as Kibana and/or Looker, so that data is stored once and movement and processing are kept to a minimum.
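As a rough illustration of the compress, index, and search-in-place idea, here is a minimal Python sketch. It is not the architecture of any particular product: the `object_store` dictionary is a hypothetical stand-in for a bucket in a service such as Amazon S3, the `index` structure is a toy inverted index, and the function names and sample log records are invented for the example.

```python
import gzip
import json
from collections import defaultdict

# Hypothetical stand-in for a cloud object store: object key -> compressed bytes.
object_store = {}
# Toy inverted index (an assumption for this sketch): token -> keys of objects containing it.
index = defaultdict(set)


def put_records(key, records):
    """Compress a batch of records once and index the tokens they contain."""
    payload = "\n".join(json.dumps(r) for r in records).encode("utf-8")
    object_store[key] = gzip.compress(payload)
    for record in records:
        for value in record.values():
            for token in str(value).lower().split():
                index[token].add(key)


def search(term):
    """Return matching records, decompressing only the objects the index points to."""
    term = term.lower()
    hits = []
    for key in sorted(index.get(term, ())):
        lines = gzip.decompress(object_store[key]).decode("utf-8").splitlines()
        for line in lines:
            record = json.loads(line)
            if any(term in str(v).lower().split() for v in record.values()):
                hits.append(record)
    return hits


# Invented sample data: two daily log objects written once, never moved.
put_records("logs/2022-07-05.jsonl.gz", [
    {"host": "web-1", "msg": "login failed"},
    {"host": "web-2", "msg": "login ok"},
])
put_records("logs/2022-07-06.jsonl.gz", [
    {"host": "db-1", "msg": "backup complete"},
])

print(search("failed"))
```

The point of the sketch is that the data is written once in compressed form and queried where it sits; only the index decides which objects need to be read, so nothing is copied into a separate analytics database.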
One of the most important ways to heat up data for analytics is to enable search. Specifically, search is the ultimate democratization of data: it allows self-service selection and publishing of data streams without the need for an IT administrator or database engineer. All data should be fully searchable and available for analysis with existing data tools. Imagine giving users the ability to search and query at will, to ask questions easily, and to analyze data with ease. Most well-known data warehouse and data lake platforms do not provide this key capability.
But some forward-looking companies have found a way. Take BAI Communications, whose data lake strategy adopts this type of architecture. In major commuter cities, BAI provides state-of-the-art communications infrastructure (cellular, Wi-Fi, broadcast, radio, and IP networks). BAI streams its data into a centralized data lake built on Amazon S3 cloud object storage, where it is secure and compliant with many government regulations. With a data lake built on cloud object storage and activated for analytics through a multi-API data lake platform, BAI can find, access, and analyze its data faster and more easily than before, and with more control over cost. The company is using the insights generated by its global networks over the years to help rail operators keep traffic flowing and optimize routes, turning data insight into business value. This approach proved especially valuable during the pandemic, because BAI was able to learn more about the impact of COVID-19 on regional public transport networks around the world and so continue to provide critical connectivity for citizens.