当前位置:网站首页>Heating data in data lake?
Heating data in data lake?
2022-07-06 11:21:00 【Jiedao jdon】
Data Lake : End the data island through a repository for big data analysis . Imagine , There is a single place to store all your data for analysis , To support product led growth and business insight . It is sad , The idea of data lake was once ignored , Because early attempts were based on Hadoop On the repository , These repositories are local , Lack of resources and availability Extensibility . We use “Hadoop Hangover ” It's over .
In the past, data lake was famous for its management challenges and slow speed of value realization . But the accelerated adoption of cloud object storage , And the exponential growth of data , Make them attractive again .
in fact , Now more than ever, we need data lake to support data analysis . Although cloud object storage was initially popular as a cost-effective way to temporarily store or archive data , But it has become popular , Because it's cheap 、 Security 、 Durable and elastic . It is not only cost-effective , And it's easy to stream data .
Data lake or data swamp ?
The economics of cloud object storage 、 Built in security and scalability encourage enterprises to store more and more data -- Create a huge data lake with infinite potential for data analysis . Enterprises understand , Have more data ( Not less ) It can become a strategic advantage . Unfortunately , In recent history , Many data Lake projects have failed , Because the data lake has become a data swamp -- It consists of cold data that is not easy to access or use . Many people find out , Sending data to the cloud is easy , But let users of the whole organization have access to these data , And get inspiration from it , It's hard to . These data lakes have become garbage dumps for multi structure data sets , Accumulate and collect digital dust , Strategic advantage without any promise .
In short , Cloud object storage is not built for general analysis , Don't like Hadoop like that . To gain insight , Data must be transformed and removed from the lake , Enter the analysis database , Such as Splunk、MySQL or Oracle, Depending on usage . This process is complex 、 Slow and expensive . It's also a challenge , Because the industry is currently facing a shortage of data engineers , They need to clean up and transform data , And establish the required data pipeline , To incorporate it into these analysis systems .
Gartner Find out , Despite these well-known challenges , More than half of the enterprises plan to invest in the data Lake in the next two years . The data lake has a surprising number of use cases , From investigating network intrusion through security logs to researching and improving customer experience . It is no wonder that enterprises still adhere to the promise of data Lake . that , How can we clear the marsh , Make sure these efforts don't fail ? And the key is , How do we unlock and provide access to data stored in the cloud -- This is the most important of all obstacles ?
Improve the heat of cold cloud storage
It is possible for cloud objects to be stored and heated for data analysis ( And it's the best ), But this requires rethinking framework . We need to ensure that storage has the look and feel of a database , In essence , Turn cloud object storage into a high-performance analysis database or warehouse . Have " Thermal data " Need to access quickly and easily in a few minutes , Not weeks or months , Even when processing tens of megabytes every day . This type of performance requires a different approach to data pipelining , Avoid switching and moving . The required architecture is like compression 、 Index and pass well-known API Publish data to Kibana and / or Looker It's as simple as tools , For one-time storage , Reduce movement and handling .
One of the most important ways to increase the popularity of data analysis is by promoting search . say concretely , Search is the ultimate democratization of data , Allow self-service data flow selection and publishing , Without the need for IT Administrator or database engineer . All data should be completely searchable , And you can use existing data tools for analysis . Imagine , Let users have the ability to search and query at will , Ask questions easily , Easily analyze data . Most well-known data warehouse and data Lake platforms do not provide this key function .
But some forward-looking companies have found ways . With BAI Take communication companies for example , Its data Lake strategy adopts this type of architecture . In major commuter cities ,BAI Provide the most advanced communication infrastructure ( cellular 、Wi-Fi、 radio broadcast 、 Radio and IP The Internet ).BAI Build its data flow on Amazon S3 Centralized data lake on cloud object storage , It's safe there , And comply with many government regulations . Use a data Lake built on cloud object storage , And through many API Data Lake platform activation analysis ,BAI It can be faster than before 、 Easier to find 、 Access and analyze its data , And the cost is more controlled . The company is taking advantage of the insights generated by its global network over the years , Help railway operators maintain traffic flow and optimize routes , Turn data insight into business value . This method has proved particularly valuable in the event of a pandemic , because BAI Be able to learn more about COVID-19 Impact on regional public transport networks around the world , So that they can continue to provide key connections for citizens .
边栏推荐
- FRP intranet penetration
- 02 staff information management after the actual project
- MySQL completely uninstalled (windows, MAC, Linux)
- [recommended by bloggers] asp Net WebService background data API JSON (with source code)
- Deoldify项目问题——OMP:Error#15:Initializing libiomp5md.dll,but found libiomp5md.dll already initialized.
- JDBC原理
- Machine learning -- census data analysis
- Idea import / export settings file
- Dotnet replaces asp Net core's underlying communication is the IPC Library of named pipes
- Ansible实战系列二 _ Playbook入门
猜你喜欢
[download app for free]ineukernel OCR image data recognition and acquisition principle and product application
La table d'exportation Navicat génère un fichier PDM
QT creator runs the Valgrind tool on external applications
MySQL主从复制、读写分离
MySQL master-slave replication, read-write separation
Classes in C #
Did you forget to register or load this tag 报错解决方法
一键提取pdf中的表格
学习问题1:127.0.0.1拒绝了我们的访问
Basic use of redis
随机推荐
One click extraction of tables in PDF
引入了junit为什么还是用不了@Test注解
02 staff information management after the actual project
Tcp/ip protocol (UDP)
软件测试与质量学习笔记3--白盒测试
What does usart1 mean
Ansible practical Series III_ Task common commands
Ansible实战系列二 _ Playbook入门
Unable to call numpy in pycharm, with an error modulenotfounderror: no module named 'numpy‘
Number game
图像识别问题 — pytesseract.TesseractNotFoundError: tesseract is not installed or it‘s not in your path
[number theory] divisor
[蓝桥杯2021初赛] 砝码称重
SSM整合笔记通俗易懂版
【博主推荐】C# Winform定时发送邮箱(附源码)
Windows cannot start the MySQL service (located on the local computer) error 1067 the process terminated unexpectedly
Some notes of MySQL
Record a problem of raspberry pie DNS resolution failure
Antlr4 uses keywords as identifiers
Ubuntu 20.04 安装 MySQL