Data Lake (20): Current limitations of Flink's Iceberg support, and a comparison of Iceberg and Hudi
2022-07-27 12:21:00 【51CTO】
Current Limitations of Flink with Iceberg, and Iceberg vs. Hudi
1. Current limitations of Flink with Iceberg
- Iceberg does not currently support querying a table's metadata through Flink SQL; this has to be done with the Java API.
- Flink does not support creating Iceberg tables with hidden partitions.
- Flink does not support Iceberg tables that define a watermark.
- Flink does not support adding, dropping, or renaming columns.
- Flink's support for the Iceberg connector is still incomplete.
2. Iceberg compared with Hudi
Iceberg and Hudi are both data lake technologies. Judging by community activity, Iceberg appears to be on track to overtake Hudi. The two have the following in common:
- Both are ways of organizing data on top of existing storage formats.
- Both provide ACID guarantees, with some support for transactions and concurrent execution.
- Both provide row-level data modification.
- Both provide a degree of schema evolution, for example adding, modifying, and deleting columns.
- Both support compaction, merging small files into larger ones.
- Both support time travel queries against snapshot data.
- Both support batch and real-time reads and writes.
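The small-file merging mentioned in the list above can be pictured as a bin-packing step: small files are grouped so that each group can be rewritten as one file close to a target size. The sketch below is a toy model only, not Iceberg's or Hudi's actual compaction code. The `DataFile` class, the `plan_compaction` function, and the 128 MB target are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    """Hypothetical stand-in for a data file tracked by a table."""
    path: str
    size_mb: int


def plan_compaction(files, target_mb=128):
    """Group small files with first-fit decreasing bin packing.

    Each returned group is a list of files whose combined size does not
    exceed target_mb and that could be rewritten as a single file.
    """
    # Only files below the target size are candidates; largest first.
    small = sorted(
        (f for f in files if f.size_mb < target_mb),
        key=lambda f: f.size_mb,
        reverse=True,
    )
    groups = []
    for f in small:
        for group in groups:
            # Put the file into the first group that still has room.
            if sum(x.size_mb for x in group) + f.size_mb <= target_mb:
                group.append(f)
                break
        else:
            groups.append([f])
    # Rewriting a single file on its own gains nothing, so drop such groups.
    return [g for g in groups if len(g) > 1]


files = [
    DataFile("a.parquet", 10),
    DataFile("b.parquet", 200),
    DataFile("c.parquet", 60),
    DataFile("d.parquet", 50),
    DataFile("e.parquet", 100),
]
plan = plan_compaction(files)
for group in plan:
    print([f.path for f in group], sum(f.size_mb for f in group), "MB")
```

Whether such a plan is triggered by hand or by a background service is exactly where the two systems differ, which the comparison that follows covers.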
Iceberg and Hudi differ in the following ways:
- Iceberg supports the Parquet, Avro, and ORC data formats; Hudi supports Parquet and Avro.
- The two store and query data differently.
Iceberg supports only one table storage mode, a storage structure made up of metadata files, manifest files, and data files. A query first reads the metadata, filters it to find the manifest files belonging to the relevant snapshot, and then locates the corresponding data files. Hudi supports two table storage modes, Copy On Write (merge at write time) and Merge On Read (merge at read time), and a query reads the corresponding snapshot data directly.
- For small-file compaction, Iceberg only supports merging small files manually through its API, whereas Hudi can merge small files automatically according to configuration.
- When integrating with Spark, Iceberg currently has better Spark SQL support, while Hudi's Spark integration relies more on Spark DataFrame API operations.
- On schema, Iceberg's schema is decoupled from the compute engine and does not depend on any engine, whereas Hudi's schema depends on the compute engine's schema.
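The Iceberg lookup path described in this section (table metadata, then the manifest files of a snapshot, then data files) can be pictured with a small in-memory model. This is a simplified sketch under stated assumptions, not the Iceberg API: the classes `TableMetadata`, `Snapshot`, `Manifest`, and `DataFile`, the function `plan_scan`, and the single string partition value are all hypothetical. A real Iceberg table also places a manifest list between the snapshot and its manifest files, which is omitted here.

```python
from dataclasses import dataclass


@dataclass
class DataFile:
    path: str
    partition: str  # hypothetical single partition value, e.g. a date


@dataclass
class Manifest:
    data_files: list

    def partitions(self):
        """Partition values covered by this manifest, used for pruning."""
        return {f.partition for f in self.data_files}


@dataclass
class Snapshot:
    snapshot_id: int
    manifests: list


@dataclass
class TableMetadata:
    current_snapshot_id: int
    snapshots: dict  # snapshot_id -> Snapshot


def plan_scan(meta, partition, snapshot_id=None):
    """Resolve metadata -> snapshot -> manifests -> data file paths."""
    # Default to the current snapshot; passing an older id models time travel.
    wanted = meta.current_snapshot_id if snapshot_id is None else snapshot_id
    snapshot = meta.snapshots[wanted]
    paths = []
    for manifest in snapshot.manifests:
        # Skip whole manifests that cannot contain the requested partition.
        if partition not in manifest.partitions():
            continue
        paths.extend(f.path for f in manifest.data_files if f.partition == partition)
    return paths


m1 = Manifest([DataFile("f1.parquet", "2022-07-26"), DataFile("f2.parquet", "2022-07-26")])
m2 = Manifest([DataFile("f3.parquet", "2022-07-27")])
meta = TableMetadata(
    current_snapshot_id=2,
    snapshots={1: Snapshot(1, [m1]), 2: Snapshot(2, [m1, m2])},
)

print(plan_scan(meta, "2022-07-27"))                 # current snapshot
print(plan_scan(meta, "2022-07-27", snapshot_id=1))  # older snapshot
```

The point of the model is that the data files to read are found by walking metadata rather than by listing directories, and that reading an older snapshot only changes the starting point of that walk.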