Hudi for Data Lakes (1): Introduction to Hudi

2022-07-06 00:01:00 Electro optic flicker

Contents

0. Links to related articles

1. What is Hudi?

2. Hudi's place in big data

3. Features of Hudi

4. Release history of Hudi versions


0. Links to related articles

Summary of articles on big data fundamentals

1. What is Hudi?

Apache Hudi (pronounced "hoodie") is a next-generation streaming data lake platform. Apache Hudi brings core warehouse and database functionality directly to the data lake. Hudi provides tables, transactions, efficient upserts and deletes, advanced indexes, streaming ingestion services, data clustering and compaction optimizations, and concurrency control, all while keeping the data in open source file formats.

Apache Hudi is not only well suited to streaming workloads; it also lets you build efficient incremental batch pipelines. Companies including Uber, Amazon, ByteDance, and Robinhood use Hudi to transform their production data lakes.

Apache Hudi can easily be used on any cloud storage platform. Hudi's advanced performance optimizations make analytical workloads faster with any of the popular query engines, including Apache Spark, Flink, Presto, Trino, and Hive.
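To make the upsert capability described in this section a little more concrete, here is a minimal sketch of the write options that Hudi's Spark quick start uses for an upsert. The option keys are the ones documented for the Spark datasource around the time of writing and may differ in other versions; the table name `trips`, the column names `uuid`, `ts`, and `partitionpath`, and the helper `hudi_upsert_options` are hypothetical values chosen for illustration only.

```python
from typing import Dict


def hudi_upsert_options(
    table_name: str,
    record_key: str,
    precombine_field: str,
    partition_field: str,
) -> Dict[str, str]:
    """Build a dict of Hudi Spark datasource write options for an upsert.

    Hypothetical helper: it only assembles option keys and does not talk to Spark.
    """
    return {
        # Name of the Hudi table being written.
        "hoodie.table.name": table_name,
        # Column that uniquely identifies a record.
        "hoodie.datasource.write.recordkey.field": record_key,
        # Column used to pick the winner when two records share a key.
        "hoodie.datasource.write.precombine.field": precombine_field,
        # Column that determines the partition path.
        "hoodie.datasource.write.partitionpath.field": partition_field,
        # Update existing keys, insert new ones.
        "hoodie.datasource.write.operation": "upsert",
    }


# Assumed table and column names, for illustration only.
options = hudi_upsert_options("trips", "uuid", "ts", "partitionpath")

# With a SparkSession that has the Hudi bundle on its classpath, a DataFrame
# `df` containing those columns, and a storage location `base_path` (both
# assumed here), the write would look roughly like:
#   df.write.format("hudi").options(**options).mode("append").save(base_path)
```

The helper runs without Spark, so the options can be inspected before they are passed to a real write.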

2. Hudi's place in big data

Hudi brings stream processing to big data, providing fresh data while being an order of magnitude more efficient than traditional batch processing.

3. Features of Hudi

  1. Fast upserts with pluggable indexing
  2. Atomic data operations with rollback support
  3. Snapshot isolation between writers and queries
  4. Savepoints for data recovery
  5. File size management, with layout driven by statistics
  6. Asynchronous compaction of row-based and columnar data
  7. A timeline that tracks metadata lineage
  8. Clustering to optimize the data layout
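Several of the features listed above (upserts, atomic commits, rollback, snapshot isolation, and the timeline) can be illustrated with a toy in-memory model. This is only a conceptual sketch and is not Hudi's API or storage format: the class `ToyTable`, its methods, and the field names `id` and `ts` are all invented for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ToyTable:
    """Toy model of a table with a commit timeline. NOT the Hudi API."""

    # Timeline: (instant, snapshot) pairs in commit order, oldest first.
    timeline: List[Tuple[str, Dict[str, dict]]] = field(default_factory=list)

    def snapshot(self) -> Dict[str, dict]:
        """Return a copy of the latest committed snapshot (empty if none)."""
        return dict(self.timeline[-1][1]) if self.timeline else {}

    def upsert(self, instant: str, records: List[dict],
               key: str = "id", ordering: str = "ts") -> None:
        """Insert new keys and update existing ones, then commit."""
        new = self.snapshot()  # work on a copy; readers never see it half-done
        for rec in records:
            old = new.get(rec[key])
            # Keep the incoming record unless the stored one is newer.
            if old is None or rec[ordering] >= old[ordering]:
                new[rec[key]] = rec
        # The single append is the "atomic publish" step of this toy model.
        self.timeline.append((instant, new))

    def rollback(self) -> Optional[str]:
        """Drop the latest commit and return its instant (None if empty)."""
        if not self.timeline:
            return None
        return self.timeline.pop()[0]


table = ToyTable()
table.upsert("001", [{"id": "a", "ts": 1, "v": 10}, {"id": "b", "ts": 1, "v": 20}])
reader_view = table.snapshot()  # a reader holding the snapshot at instant 001
table.upsert("002", [{"id": "a", "ts": 2, "v": 11}, {"id": "c", "ts": 2, "v": 30}])
after_second_commit = table.snapshot()
rolled_back = table.rollback()  # undo instant 002
after_rollback = table.snapshot()
```

In this sketch the reader's earlier snapshot is unaffected by the second commit, and rolling back removes the latest instant from the timeline so the table returns to its previous state. The real system achieves this with files and metadata on storage rather than in-memory dictionaries.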

4. Release history of Hudi versions

GitHub tags page: Tags · apache/hudi · GitHub

Downloads and release notes for each historical version: Download | Apache Hudi


Note: This series of Hudi blog posts is a record of my study of the official Hudi website, with some personal interpretation added. Please bear with any shortcomings.

Note: Links to other related articles (including Hudi and other big data posts) are collected here -> Summary of articles on big data fundamentals


Copyright notice
This article was written by [Electro optic flicker]. Please include a link to the original when reposting.
https://yzsam.com/2022/02/202202140248373647.html