Data Lake (2): What is Hudi
2022-08-02 12:19:00 【InfoQ】
What is Hudi
Apache Hudi is an open source data lake solution developed and open sourced by Uber. The name is short for Hadoop Upserts Deletes and Incrementals. Hudi manages large-scale analytical datasets on HDFS and supports inserting and updating data as well as consuming it incrementally. Its main purpose is to reduce data latency during ingestion.
Hudi is very lightweight and can be integrated with Spark and Flink as a library.
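As a rough illustration of what using Hudi as a library from Spark looks like, a write is usually configured through datasource options. The sketch below only builds those option dictionaries. The option keys are the ones documented in the Hudi Spark quick start for 0.x releases and should be checked against the version you run. The helper names, the table name and the field names are hypothetical.

```python
def hudi_upsert_options(table_name, record_key, precombine_field, partition_field):
    """Build Spark datasource options for an upsert into a Hudi table.

    Option keys follow the Hudi 0.x Spark quick start (an assumption about
    the version in use); the function itself is an illustrative helper.
    """
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.operation": "upsert",
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.partitionpath.field": partition_field,
    }


def hudi_incremental_read_options(begin_instant):
    """Build Spark datasource options for an incremental query."""
    return {
        "hoodie.datasource.query.type": "incremental",
        "hoodie.datasource.read.begin.instanttime": begin_instant,
    }


# Hypothetical table and column names, used only for illustration.
write_opts = hudi_upsert_options("trips", "uuid", "ts", "city")
read_opts = hudi_incremental_read_options("20220802121900")

# With a Spark session that has the Hudi bundle on its classpath, the
# options would typically be passed like this (not executed here):
#   df.write.format("hudi").options(**write_opts).mode("append").save(base_path)
#   spark.read.format("hudi").options(**read_opts).load(base_path)
```

Keeping the options in plain dictionaries makes it easy to reuse the same settings across jobs and to inspect them without starting Spark.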
Hudi official website:
https://hudi.apache.org
Hudi builds on Parquet columnar storage and Avro row storage, and avoids creating small files in order to provide efficient, low-latency data access. It provides upserts, incremental pulls, and full pulls on HDFS datasets. Hudi has the following features:
- Fast upserts with pluggable indexing.
- Atomic data operations with rollback support.
- Snapshot isolation between writers and queries.
- Savepoints for data recovery.
- File size management, with layout driven by statistics.
- Asynchronous compaction of row and columnar data.
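To make the terms upsert, full pull, incremental pull and rollback more concrete, here is a toy in-memory model. It is only a sketch of the semantics described above: it is not Hudi's API and says nothing about how Hudi stores data. The class `ToyTable`, its methods and the sample rows are all hypothetical.

```python
class ToyTable:
    """In-memory stand-in for a table with a commit timeline (not Hudi's API)."""

    def __init__(self):
        # List of (commit_time, {key: row}) tuples in commit order.
        self.commits = []

    def upsert(self, commit_time, rows, key="id"):
        """Record one commit that inserts new keys and updates existing ones."""
        self.commits.append((commit_time, {row[key]: row for row in rows}))

    def rollback(self):
        """Undo the most recent commit and return its commit time."""
        return self.commits.pop()[0]

    def snapshot(self):
        """Full pull: the latest version of every record."""
        merged = {}
        for _, batch in self.commits:
            merged.update(batch)
        return merged

    def incremental(self, after):
        """Incremental pull: records written by commits later than `after`."""
        changed = {}
        for commit_time, batch in self.commits:
            if commit_time > after:
                changed.update(batch)
        return changed


table = ToyTable()
table.upsert("001", [{"id": 1, "fare": 10.0}, {"id": 2, "fare": 20.0}])
table.upsert("002", [{"id": 2, "fare": 25.0}, {"id": 3, "fare": 30.0}])

full = table.snapshot()                  # ids 1, 2, 3; id 2 has the updated fare
delta = table.incremental(after="001")   # only ids 2 and 3, changed by commit "002"
table.rollback()                         # drop commit "002"
after_rollback = table.snapshot()        # back to the state after commit "001"
```

The second commit both updates an existing record (id 2) and inserts a new one (id 3), which is what distinguishes an upsert from a plain insert. A downstream consumer that has already processed commit "001" only needs the incremental result rather than the full snapshot.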