Hudi for the Data Lake (1): Introduction to Hudi
2022-07-06 00:01:00 【Electro optic flicker】
Contents
4. Hudi release timeline
0. Links to related articles
Big data fundamentals: article index
1. What is Hudi?
Apache Hudi (pronounced "hoodie") is a next-generation streaming data lake platform. It brings core warehouse and database functionality directly to the data lake. Hudi provides tables, transactions, efficient upserts and deletes, advanced indexes, streaming ingestion services, data clustering and compaction optimizations, and concurrency control, all while keeping the data in open source file formats.
Apache Hudi is not only suited to streaming workloads; it also lets you build efficient incremental batch pipelines. Companies including Uber, Amazon, ByteDance, and Robinhood use Hudi to transform their production data lakes.
Apache Hudi can be used on any cloud storage platform. Its advanced performance optimizations speed up analytical workloads on any of the popular query engines, including Apache Spark, Flink, Presto, Trino, and Hive.
2. Hudi's position in big data
Hudi brings stream processing to big data, providing fresh data while being an order of magnitude more efficient than traditional batch processing.

3. Hudi features
- Fast upserts with pluggable indexing
- Atomic data operations with rollback support
- Snapshot isolation between writers and queries
- Savepoints for data recovery
- File size management and layout driven by statistics
- Asynchronous compaction of row-based and columnar data
- A timeline that tracks metadata lineage
- Data layout optimization through clustering
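
To make the first few features more concrete, here is a minimal in-memory sketch of what upsert, atomic commits, snapshot isolation, and rollback mean. It is a toy model, not Hudi's API or implementation: the class name `ToyTable`, its methods, and the `key_field` / `ordering_field` parameters are all hypothetical names chosen for illustration. Real Hudi persists its timeline and data files on storage and uses indexes to locate records.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

Record = Dict[str, Any]


@dataclass
class ToyTable:
    """Toy in-memory table. NOT Hudi's API, only an illustration of the ideas."""

    key_field: str       # hypothetical: field that uniquely identifies a record
    ordering_field: str  # hypothetical: field used to pick the newer version
    # Each timeline entry is the full table state after one commit.
    timeline: List[Dict[Any, Record]] = field(default_factory=lambda: [{}])

    def snapshot(self) -> Dict[Any, Record]:
        """Return the latest committed state; readers never see a partial batch."""
        return self.timeline[-1]

    def upsert(self, batch: List[Record]) -> int:
        """Apply a whole batch as one commit and return the commit number."""
        new_state = dict(self.snapshot())  # copy, so older snapshots stay intact
        for rec in batch:
            key = rec[self.key_field]
            current = new_state.get(key)
            # Insert new keys; update existing keys only with newer-or-equal data.
            if current is None or rec[self.ordering_field] >= current[self.ordering_field]:
                new_state[key] = rec
        self.timeline.append(new_state)  # the commit becomes visible in one step
        return len(self.timeline) - 1

    def rollback(self) -> None:
        """Drop the most recent commit and restore the previous snapshot."""
        if len(self.timeline) > 1:
            self.timeline.pop()


table = ToyTable(key_field="id", ordering_field="ts")
table.upsert([{"id": 1, "ts": 1, "city": "SF"}, {"id": 2, "ts": 1, "city": "NYC"}])
table.upsert([{"id": 1, "ts": 2, "city": "LA"}, {"id": 3, "ts": 2, "city": "SEA"}])
print(table.snapshot()[1]["city"])  # record 1 was updated, record 3 was inserted
table.rollback()
print(table.snapshot()[1]["city"])  # the second commit is undone
```

The sketch keeps a full copy of the table per commit purely for simplicity; it is meant to show the semantics, not how a storage-efficient system would achieve them.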
4. Hudi release timeline
Release tags on GitHub: Tags · apache/hudi · GitHub




Download links and feature notes for each historical Hudi release: Download | Apache Hudi

Note: This series of Hudi posts is a set of study notes based on the official Hudi website, with some personal interpretation added. Please bear with any shortcomings.
Note: Links to other related articles (including the Hudi posts and other big data posts) are collected here -> Big data fundamentals: article index