Hudi for the Data Lake (1): Introduction to Hudi
2022-07-06 00:01:00 【Electro optic flicker】
0. Links to related articles
Big data fundamentals: a summary of articles
1. What is Hudi?
Apache Hudi (pronounced "hoodie") is a next-generation streaming data lake platform. It brings core warehouse and database functionality directly to the data lake. Hudi provides tables, transactions, efficient upserts and deletes, advanced indexes, streaming ingestion services, data clustering and compaction optimizations, and concurrency control, all while keeping the data in open source file formats.
Apache Hudi is not only suited to streaming workloads; it also lets you build efficient incremental batch pipelines. Companies including Uber, Amazon, ByteDance, and Robinhood use Hudi to transform their production data lakes.
Apache Hudi can easily be used on any cloud storage platform. Its advanced performance optimizations speed up analytical workloads on any popular query engine, including Apache Spark, Flink, Presto, Trino, and Hive.
2. Hudi's position in big data
Hudi brings stream processing to big data, providing fresh data while being an order of magnitude more efficient than traditional batch processing.

3. Features of Hudi
- Upsert support with fast, pluggable indexing
- Atomic data publishing with rollback support
- Snapshot isolation between writers and queries
- Savepoints for data recovery
- File size management, with layout driven by statistics
- Asynchronous compaction of row-based and columnar data
- Timeline metadata for tracking lineage
- Data layout optimization through clustering
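To make the first few features more concrete, here is a small toy model in plain Python. It is a sketch for intuition only: `ToyTable`, `upsert`, `rollback`, and the `id` key field are hypothetical names invented for this illustration and are not part of Hudi's API. It shows how a key-based index turns a write into an update-or-insert, how a commit is published in one step, and how a timeline of commits makes rollback possible.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Toy in-memory table. NOT Hudi's API, only an illustration of the ideas."""
    records: dict = field(default_factory=dict)   # record key -> row (plays the role of the index)
    timeline: list = field(default_factory=list)  # ordered list of (instant, snapshot) commits

    def upsert(self, instant, rows, key="id"):
        # Stage changes on a copy so readers of self.records never see a partial write.
        staged = dict(self.records)
        for row in rows:
            staged[row[key]] = row  # key lookup: update if the key exists, insert otherwise
        # Publish in one step: swap in the new state and record the commit on the timeline.
        self.records = staged
        self.timeline.append((instant, staged))

    def rollback(self):
        # Undo the latest commit and restore the snapshot of the commit before it.
        if not self.timeline:
            return
        self.timeline.pop()
        self.records = dict(self.timeline[-1][1]) if self.timeline else {}


table = ToyTable()
table.upsert("001", [{"id": 1, "city": "SF"}, {"id": 2, "city": "NY"}])
table.upsert("002", [{"id": 2, "city": "LA"}, {"id": 3, "city": "TX"}])
print(sorted(table.records))      # [1, 2, 3]
print(table.records[2]["city"])   # LA
table.rollback()
print(table.records[2]["city"])   # NY
```

A real Hudi table stores data in files on distributed storage and its index and timeline are far more elaborate; the toy model only mirrors the behavior a user observes: the second write updates key 2 and inserts key 3, and the rollback restores the state of the first commit.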
4. Hudi release timeline by version
Official GitHub tags page: Tags · apache/hudi · GitHub




Download links and feature notes for each historical Hudi release: Download | Apache Hudi

Note: This series of Hudi posts consists of study notes written while working through the official Hudi website, with some personal interpretation added. Please excuse any shortcomings.
Note: Links to other related articles (including posts on Hudi and other big data topics) are collected here -> Big data fundamentals: a summary of articles