Data Lake (2): What is Hudi
2022-08-02 12:19:00 【InfoQ】
What is Hudi
Apache Hudi is an open source data lake solution developed and open sourced by Uber. The name is short for Hadoop Upserts Deletes and Incrementals. Hudi manages large-scale analytical datasets on HDFS and supports inserting, updating, and incrementally consuming data. Its main purpose is to reduce data latency in the ingestion process.
Hudi is very lightweight and can be integrated with Spark and Flink as a library.
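To make the "integrated as a library" point concrete, here is a minimal sketch of the options a Spark job might pass to the Hudi data source for an upsert. The option keys are Hudi write configurations; the helper `hudi_write_options`, the table name, the column names, and the path are hypothetical and only for illustration.

```python
def hudi_write_options(table_name, record_key, precombine_field, partition_field):
    """Build the option map for a Hudi upsert through the Spark DataFrame writer."""
    return {
        "hoodie.table.name": table_name,
        # "upsert" updates rows whose key already exists and inserts the rest.
        "hoodie.datasource.write.operation": "upsert",
        # Column that uniquely identifies a record.
        "hoodie.datasource.write.recordkey.field": record_key,
        # Column used to pick the latest version when a batch has duplicate keys.
        "hoodie.datasource.write.precombine.field": precombine_field,
        # Column that decides the partition directory of a record.
        "hoodie.datasource.write.partitionpath.field": partition_field,
    }


# Hypothetical table and column names.
opts = hudi_write_options("trips", "trip_id", "updated_at", "city")

# Assuming a SparkSession started with the Hudi Spark bundle on its classpath
# and a DataFrame `df` that has the columns above, the write would look like:
# df.write.format("hudi").options(**opts).mode("append").save("/tmp/hudi/trips")
```

The Spark call is left as a comment so the snippet runs without a cluster; the exact bundle version and session settings depend on the Spark and Hudi releases in use.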
Hudi official website:
https://hudi.apache.org

Hudi stores data as Parquet columnar files and Avro row-based files, and it avoids creating small files so that data access stays efficient and low-latency. It provides upserts, incremental pulls, and full pulls on HDFS datasets. Hudi has the following features:
- Fast upserts with pluggable indexing.
- Atomic publishing of data, with rollback support.
- Snapshot isolation between writers and queries.
- Savepoints for data recovery.
- File size management, with layout driven by statistics.
- Asynchronous compaction of row and columnar data.
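The semantics behind upserts, incremental pulls, and rollback can be sketched with a toy in-memory table. This is not Hudi's API or storage layout: the class `ToyTable`, its methods, and the sample rows are all hypothetical, and integer commit times stand in for a real timeline.

```python
class ToyTable:
    """In-memory stand-in for a keyed table with a commit timeline (not Hudi's API)."""

    def __init__(self, key_field="id"):
        self.key_field = key_field
        self.records = {}    # record key -> (row, commit time of the last write)
        self.timeline = []   # (commit time, table state before that commit)

    def upsert(self, commit_time, rows):
        # Build the new state on a copy, then swap it in with one assignment,
        # so readers see either the old state or the new one, never a mix.
        new_state = dict(self.records)
        for row in rows:
            new_state[row[self.key_field]] = (row, commit_time)
        self.timeline.append((commit_time, self.records))
        self.records = new_state

    def snapshot(self):
        # Full pull: the latest version of every record.
        rows = [row for row, _ in self.records.values()]
        return sorted(rows, key=lambda r: r[self.key_field])

    def incremental(self, since):
        # Incremental pull: only records written after commit time `since`.
        rows = [row for row, t in self.records.values() if t > since]
        return sorted(rows, key=lambda r: r[self.key_field])

    def rollback(self):
        # Undo the most recent commit by restoring the state saved before it.
        commit_time, previous = self.timeline.pop()
        self.records = previous
        return commit_time


table = ToyTable()
table.upsert(1, [{"id": "a", "fare": 10}, {"id": "b", "fare": 20}])
table.upsert(2, [{"id": "b", "fare": 25}, {"id": "c", "fare": 30}])

full = table.snapshot()               # latest version of a, b, and c
changed = table.incremental(since=1)  # only b and c, written by commit 2
undone = table.rollback()             # removes commit 2
after_rollback = table.snapshot()     # back to a and the original b
```

A real table persists files and commit metadata rather than Python dictionaries, but the contract is the same: a key decides whether a row is inserted or updated, and the commit time decides what an incremental reader sees.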









