Heating up data in the data lake?
2022-07-06 11:21:00 · Jiedao jdon
Data lake: ending data silos with a single repository for big data analytics. Imagine a single place to store all of your data for analysis, supporting product-led growth and business insight. Sadly, the idea of the data lake was once dismissed, because early attempts were built on Hadoop-based repositories that were on-premises and lacked elasticity and availability. We have gotten over the "Hadoop hangover."
In the past, data lakes were notorious for management challenges and slow time to value. But the accelerated adoption of cloud object storage, together with the exponential growth of data, has made them attractive again.
In fact, we need data lakes to support analytics now more than ever. Although cloud object storage first gained popularity as a cost-effective way to temporarily store or archive data, it has since become mainstream because it is cheap, secure, durable, and elastic. It is not only cost-effective but also easy to stream data into.
Data lake or data swamp?
The economics, built-in security, and scalability of cloud object storage encourage enterprises to store more and more data, creating huge data lakes with enormous analytical potential. Enterprises understand that having more data, not less, can become a strategic advantage. Unfortunately, many recent data lake projects have failed because the data lake became a data swamp: a pool of cold data that is not easy to access or use. Many have discovered that sending data to the cloud is easy, but giving users across the organization access to that data, and letting them draw insight from it, is hard. These data lakes have become dumping grounds for multi-structured data sets, accumulating digital dust without any of the promised strategic advantage.
In short, cloud object storage was not built for general-purpose analytics the way Hadoop was. To gain insight, data must be transformed and moved out of the lake into an analytics database such as Splunk, MySQL, or Oracle, depending on the use case. This process is complex, slow, and expensive. It is also a challenge because the industry currently faces a shortage of data engineers, who are needed to clean and transform the data and build the pipelines required to feed these analytics systems.
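The kind of cleaning and flattening work that keeps those data engineers busy can be illustrated with a minimal sketch (the log format and field names here are invented for illustration, not taken from any particular system): parse raw JSON event lines as they might land in a lake bucket, drop malformed records, and flatten the rest into rows a downstream analytics database could load.

```python
import json

# Hypothetical raw event lines, as they might land in a data lake bucket.
raw_lines = [
    '{"ts": "2022-07-06T11:21:00Z", "user": {"id": 42}, "event": "login"}',
    'not valid json',  # malformed records are common in raw landing zones
    '{"ts": "2022-07-06T11:22:05Z", "user": {"id": 7}, "event": "search"}',
]

def transform(lines):
    """Clean and flatten raw events into rows for a downstream warehouse."""
    rows = []
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # drop records that fail to parse
        rows.append({
            "ts": event["ts"],
            "user_id": event["user"]["id"],
            "event": event["event"],
        })
    return rows

rows = transform(raw_lines)
print(rows[0]["user_id"])  # 42
```

Even this toy version hints at the real cost: every new source format means more parsing, validation, and schema-mapping code before the data is usable.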
Gartner finds that, despite these well-known challenges, more than half of enterprises plan to invest in a data lake within the next two years. Data lakes have a surprising number of use cases, from investigating network intrusions through security logs to researching and improving the customer experience. It is no wonder enterprises still believe in the promise of the data lake. So how do we drain the swamp and make sure these efforts do not fail? And, most critically, how do we unlock and provide access to the data stored in the cloud, the biggest obstacle of all?
Warming up cold cloud storage
It is possible, and in fact ideal, for cloud object storage to serve hot data for analytics, but this requires rethinking the architecture. We need to give the storage the look and feel of a database, essentially turning cloud object storage into a high-performance analytics database or warehouse. Hot data must be accessible quickly and easily, in minutes rather than weeks or months, even when ingesting tens of terabytes per day. That level of performance requires a different approach to the data pipeline, one that avoids shuffling and moving data. The required architecture makes it as simple as compressing and indexing data and publishing it through well-known APIs to tools such as Kibana and/or Looker, storing data once to minimize movement and processing.
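The "compress, index, store once" idea can be sketched with the standard library alone (the record layout is invented for illustration): records are gzip-compressed once, as they would be at rest in object storage, while a small index maps field values to record positions so a query reads only what it needs instead of triggering a pipeline to move the data elsewhere.

```python
import gzip
import json
from collections import defaultdict

# Hypothetical records as they might sit in a data lake.
records = [
    {"id": 1, "service": "auth", "level": "ERROR"},
    {"id": 2, "service": "billing", "level": "INFO"},
    {"id": 3, "service": "auth", "level": "INFO"},
]

# Compress once on write, as object stores encourage.
blob = gzip.compress(json.dumps(records).encode("utf-8"))

# Build a lightweight index over a field so lookups avoid a full scan.
index = defaultdict(list)
for pos, rec in enumerate(records):
    index[rec["service"]].append(pos)

def lookup(service):
    """Decompress the stored blob and return only the indexed records."""
    data = json.loads(gzip.decompress(blob).decode("utf-8"))
    return [data[pos] for pos in index[service]]

print([r["id"] for r in lookup("auth")])  # [1, 3]
```

Real platforms apply the same pattern at scale, keeping the compressed data in place in object storage and serving the index through the query APIs the article mentions.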
One of the most important ways to make data analytics more accessible is to promote search. Specifically, search is the ultimate democratization of data, allowing self-service selection and publishing of data streams without an IT administrator or database engineer. All data should be fully searchable and analyzable with existing data tools. Imagine giving users the ability to search and query at will, ask questions easily, and analyze data effortlessly. Most well-known data warehouse and data lake platforms do not provide this key capability.
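Full-text search over lake data can be approximated with an inverted index, shown here as a minimal stdlib sketch (the sample log messages are invented): every term maps to the documents containing it, so any user can query freely without waiting for an engineer to build a pipeline first.

```python
from collections import defaultdict

# Hypothetical log messages stored in a data lake.
documents = {
    0: "connection timeout while reaching payment gateway",
    1: "user login succeeded from mobile client",
    2: "payment retry scheduled after gateway timeout",
}

# Build the inverted index: term -> set of document ids containing it.
inverted = defaultdict(set)
for doc_id, text in documents.items():
    for term in text.lower().split():
        inverted[term].add(doc_id)

def search(query):
    """Return ids of documents containing every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(inverted.get(terms[0], set()))
    for term in terms[1:]:
        result &= inverted.get(term, set())
    return result

print(sorted(search("gateway timeout")))  # [0, 2]
```

This is the same structure search engines and log-analytics tools build internally; the point here is only that search-first access lets users ask ad hoc questions of lake data directly.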
But some forward-looking companies have found a way. Take BAI Communications, for example, whose data lake strategy adopts this type of architecture. In major commuter cities, BAI provides state-of-the-art communications infrastructure (cellular, Wi-Fi, broadcast, radio, and IP networking). BAI built a centralized data lake for its data streams on Amazon S3 cloud object storage, where the data is kept secure and compliant with numerous government regulations. With a data lake built on cloud object storage and a data lake platform that activates analytics through many APIs, BAI can find, access, and analyze its data faster and more easily than before, at a more controlled cost. The company leverages the insights generated by its global networks over the years to help rail operators keep traffic flowing and optimize routes, turning data insight into business value. This approach proved especially valuable during the pandemic, because BAI could better understand the impact of COVID-19 on regional public transport networks around the world and continue to provide critical connectivity for citizens.