Data Lake (2): What is Hudi
2022-08-02 12:19:00 【InfoQ】
What is Hudi
Apache Hudi is an open source data lake solution developed and open sourced by Uber. The name is short for Hadoop Upserts Deletes and Incrementals. Hudi manages large-scale analytical datasets on HDFS and supports inserting, updating, and incrementally consuming data. Its main purpose is to reduce data latency during ingestion.
Hudi is lightweight and can be integrated with Spark and Flink as a library.
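As a rough illustration of what using Hudi as a library from Spark can look like, the sketch below builds the write options for an upsert and passes them to the Spark datasource writer. The option keys are the ones commonly shown for Hudi's Spark datasource and may differ between Hudi versions, so check them against the release you use. The table name `trips`, the field names, and the helper functions are hypothetical placeholders, and `write_upsert` assumes a PySpark DataFrame plus the Hudi Spark bundle on the classpath.

```python
def hudi_upsert_options(table_name, record_key, precombine_field, partition_field):
    """Build Spark datasource options for a Hudi upsert (keys may vary by Hudi version)."""
    return {
        "hoodie.table.name": table_name,
        # Field that uniquely identifies a record within the table.
        "hoodie.datasource.write.recordkey.field": record_key,
        # Field used to pick the latest version when two records share a key.
        "hoodie.datasource.write.precombine.field": precombine_field,
        # Field that decides which partition a record is written to.
        "hoodie.datasource.write.partitionpath.field": partition_field,
        "hoodie.datasource.write.operation": "upsert",
    }


def write_upsert(df, base_path, options):
    """Write a PySpark DataFrame to a Hudi table; needs the Hudi Spark bundle."""
    df.write.format("hudi").options(**options).mode("append").save(base_path)


# Hypothetical table and column names, for illustration only.
options = hudi_upsert_options("trips", "trip_id", "updated_at", "city")
```

Only the options dictionary is built here; calling `write_upsert` requires a running Spark session with Hudi available.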
Hudi official website:
https://hudi.apache.org

Hudi builds on Parquet columnar storage and Avro row storage, and avoids creating small files so that data access stays efficient and low-latency. It provides upserts, incremental pulls, and full pulls on HDFS datasets. Hudi has the following features:
- Fast upserts, backed by a pluggable index.
- Atomic publishing of data, with rollback support.
- Snapshot isolation between writers and queries.
- Savepoints for data recovery.
- File size management, with the layout guided by statistics.
- Asynchronous compaction of row and columnar data.
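To make the ideas of upsert, full pull, incremental pull, and rollback more concrete, here is a toy in-memory model of a table with a commit timeline. It is not Hudi's implementation or API: the class `ToyTable`, its methods, and the sample records are all hypothetical, and real Hudi persists commits as files on storage rather than in a Python list.

```python
class ToyTable:
    """In-memory stand-in for a table with a commit timeline (not Hudi's code)."""

    def __init__(self):
        self.commits = []  # list of (commit_time, {key: record}) in write order

    def upsert(self, commit_time, records):
        # A whole batch is stored as one commit, so it becomes visible all at once.
        self.commits.append((commit_time, {r["id"]: r for r in records}))

    def snapshot(self):
        # Full pull: replay commits in order; later writes replace earlier ones by key.
        state = {}
        for _, batch in self.commits:
            state.update(batch)
        return state

    def incremental(self, since):
        # Incremental pull: only records written by commits after `since`.
        changed = {}
        for commit_time, batch in self.commits:
            if commit_time > since:
                changed.update(batch)
        return changed

    def rollback_to(self, commit_time):
        # Keep commits up to and including `commit_time`, discard the rest.
        self.commits = [c for c in self.commits if c[0] <= commit_time]


table = ToyTable()
table.upsert(1, [{"id": "a", "fare": 10}, {"id": "b", "fare": 20}])
table.upsert(2, [{"id": "b", "fare": 25}, {"id": "c", "fare": 30}])

full = table.snapshot()              # records a, b (updated), c
delta = table.incremental(since=1)   # only what commit 2 wrote: b and c
table.rollback_to(1)
restored = table.snapshot()          # back to the state after commit 1
```

The point of the sketch is that a downstream consumer can ask for changes since a given commit instead of rescanning the whole table, which is what makes incremental consumption cheaper than a full pull.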