Data Lake (2): What is Hudi
2022-08-02 12:19:00 【InfoQ】
What is Hudi
Apache Hudi is an open source data lake solution developed and open sourced by Uber. The name is short for Hadoop Upserts Deletes and Incrementals. Hudi manages large-scale analytical datasets on HDFS and supports inserting, updating, and incrementally consuming data. Its main purpose is to reduce data latency during ingestion.
Hudi is very lightweight and can be integrated with Spark and Flink as a library.
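As a rough illustration of what using Hudi as a library from Spark can look like, the sketch below builds the write options for an upsert and hands them to the Spark DataFrame writer. The table name, field names (`uuid`, `ts`, `region`) and the helper functions are hypothetical. The `hoodie.*` keys are Hudi datasource options, but they should be checked against the documentation for the Hudi version in use, and the Hudi Spark bundle is assumed to be on the classpath.

```python
def hudi_upsert_options(table_name, record_key, precombine_field, partition_field):
    """Build Hudi datasource write options for an upsert (hypothetical helper)."""
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.operation": "upsert",
        # Field that uniquely identifies a record.
        "hoodie.datasource.write.recordkey.field": record_key,
        # Field used to pick the latest version when keys collide.
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.partitionpath.field": partition_field,
    }


def upsert(df, base_path, options):
    """Write a Spark DataFrame to a Hudi table (needs Spark plus the Hudi bundle)."""
    df.write.format("hudi").options(**options).mode("append").save(base_path)


# Assumed names, for illustration only.
opts = hudi_upsert_options("trips", "uuid", "ts", "region")
```

Calling `upsert(df, base_path, opts)` with an existing Spark DataFrame and a table path would then insert new keys and update existing ones.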
Hudi official website:
https://hudi.apache.org
Hudi stores data in the columnar Parquet format and the row-based Avro format, and avoids creating small files, to provide efficient, low-latency data access. It provides upserts, incremental pulls, and full pulls on HDFS datasets. Hudi has the following features:
- Fast upserts with pluggable indexing.
- Atomic data operations with rollback support.
- Snapshot isolation between writers and queries.
- Savepoints for data recovery.
- File size and layout management based on statistics.
- Asynchronous compaction of row-based and columnar data.
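To make a few of these features concrete, here is a toy in-memory model of upserts, incremental pulls, and savepoints. It is only a conceptual sketch: the `ToyTable` class and its methods are invented for illustration and do not reflect Hudi's actual storage layout or API.

```python
import copy


class ToyTable:
    """In-memory stand-in for an upsertable table; not Hudi's real design."""

    def __init__(self):
        self.records = {}     # record key -> (commit_time, row); doubles as the index
        self.commit_time = 0
        self.savepoints = {}  # commit_time -> snapshot of records

    def upsert(self, rows, key="id"):
        """Insert new keys and overwrite existing ones in a single commit."""
        staged = dict(self.records)  # stage changes on a copy
        next_commit = self.commit_time + 1
        for row in rows:
            staged[row[key]] = (next_commit, dict(row))
        # Publish everything at once so readers never see a partial commit.
        self.records, self.commit_time = staged, next_commit
        return next_commit

    def snapshot(self):
        """Full pull: the latest version of every record."""
        return [row for _, row in self.records.values()]

    def incremental(self, since):
        """Incremental pull: only records written after commit `since`."""
        return [row for t, row in self.records.values() if t > since]

    def savepoint(self):
        """Remember the current state so it can be restored later."""
        self.savepoints[self.commit_time] = copy.deepcopy(self.records)
        return self.commit_time

    def restore(self, commit_time):
        """Return the table to a previously saved state."""
        self.records = copy.deepcopy(self.savepoints[commit_time])
        self.commit_time = commit_time


table = ToyTable()
c1 = table.upsert([{"id": 1, "city": "SF"}, {"id": 2, "city": "NYC"}])
sp = table.savepoint()
c2 = table.upsert([{"id": 2, "city": "LA"}, {"id": 3, "city": "SEA"}])
changed = table.incremental(since=c1)  # rows for ids 2 and 3 only
table.restore(sp)                      # back to the state after the first commit
```

The second `upsert` updates record 2 and inserts record 3, the incremental pull returns just those two rows, and restoring the savepoint discards the second commit.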