Data Lake (20): Current Limitations of Flink's Iceberg Integration, and Iceberg Compared with Hudi
2022-07-27 03:11:00 【Hua Weiyun】
Current limitations of Flink's Iceberg integration, and Iceberg compared with Hudi
1. Current limitations of Flink's Iceberg integration
- Querying a table's metadata through Flink SQL is not currently supported for Iceberg; it has to be done with the Java API.
- Flink does not support creating Iceberg tables with hidden partitions.
- Flink does not support Iceberg tables with watermarks.
- Flink does not support adding, dropping, or renaming columns.
- Flink's support for the Iceberg connector is still incomplete.
2. Iceberg compared with Hudi
Iceberg and Hudi are both data lake technologies. Judging by community activity, Iceberg shows signs of overtaking Hudi. The two have the following in common:
- Both are ways of organizing data built on top of storage formats.
- Both provide ACID guarantees, with some degree of transaction and concurrent-execution support.
- Both provide row-level data modification.
- Both provide some schema evolution capability, for example adding, modifying, and dropping columns.
- Both support data compaction to deal with small files.
- Both support time travel queries against snapshot data.
- Both support batch and real-time reads and writes.
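The time travel capability listed above can be illustrated with a toy model. The sketch below is not Iceberg's or Hudi's API; the `Snapshot` class, the `snapshot_as_of` function, and the file names are all hypothetical, and the only idea it shows is that a table keeps a log of snapshots and a time travel query reads the latest snapshot committed at or before the requested time.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical snapshot record: an id, a commit time, and the visible files."""
    snapshot_id: int
    committed_at_ms: int            # commit time in epoch milliseconds
    data_files: Tuple[str, ...]     # data files visible in this snapshot


def snapshot_as_of(log: List[Snapshot], as_of_ms: int) -> Optional[Snapshot]:
    """Return the latest snapshot committed at or before as_of_ms, or None."""
    candidates = [s for s in log if s.committed_at_ms <= as_of_ms]
    return max(candidates, key=lambda s: s.committed_at_ms, default=None)


# A made-up snapshot log: the third commit replaces two small files with one.
log = [
    Snapshot(1, 1_000, ("a.parquet",)),
    Snapshot(2, 2_000, ("a.parquet", "b.parquet")),
    Snapshot(3, 3_000, ("c.parquet",)),
]

# Reading "as of" t=2500 sees the table as it was after the second commit.
print(snapshot_as_of(log, 2_500).data_files)  # ('a.parquet', 'b.parquet')
```

Because older snapshots keep pointing at their own files, a rewrite such as the third commit does not affect a reader that asks for an earlier point in time.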
Iceberg and Hudi differ in the following ways:
- Iceberg supports the Parquet, Avro, and ORC data formats; Hudi supports Parquet and Avro.
- The two differ in their data storage and query mechanisms.
Iceberg supports only one table storage mode, in which metadata files, manifest files, and data files make up the storage structure. A query first reads the metadata, filters it to find the manifest files belonging to the relevant snapshot, and then locates the corresponding data files. Hudi supports two table storage modes, Copy On Write (merge at write time) and Merge On Read (merge at read time), and a query reads the corresponding snapshot data directly.
- For small-file compaction, Iceberg only supports merging small files manually through an API, whereas Hudi can merge small files automatically according to its configuration.
- When integrating with Spark, Iceberg currently has better Spark SQL support, while Hudi's Spark integration relies more on Spark DataFrame API operations.
- Regarding schema, Iceberg's schema is decoupled from the compute engine and does not depend on any particular engine, whereas Hudi's schema depends on the compute engine's schema.
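The Iceberg lookup path described above (metadata, then the snapshot's manifest files, then data files) can be sketched roughly as follows. This is a simplified toy model, not the Iceberg library: the classes `TableMetadata`, `Manifest`, and `DataFile`, the function `plan_scan`, and the `day` column with its min/max statistics are assumptions made for illustration, and the real format carries more layers and statistics than shown here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DataFile:
    """Hypothetical data file entry with min/max stats for a 'day' column."""
    path: str
    min_day: str   # smallest 'day' value in the file (ISO date string)
    max_day: str   # largest 'day' value in the file (ISO date string)


@dataclass
class Manifest:
    """Hypothetical manifest: a list of data file entries."""
    files: List[DataFile]


@dataclass
class TableMetadata:
    """Hypothetical table metadata: current snapshot plus per-snapshot manifests."""
    current_snapshot_id: int
    snapshots: Dict[int, List[Manifest]]   # snapshot id -> its manifests


def plan_scan(meta: TableMetadata, day: str) -> List[str]:
    """Walk metadata -> snapshot -> manifests -> data files, pruning by stats."""
    manifests = meta.snapshots[meta.current_snapshot_id]
    return [
        f.path
        for m in manifests
        for f in m.files
        if f.min_day <= day <= f.max_day   # ISO dates compare correctly as strings
    ]


d1 = DataFile("d1.parquet", "2022-07-01", "2022-07-10")
d2 = DataFile("d2.parquet", "2022-07-11", "2022-07-20")
meta = TableMetadata(
    current_snapshot_id=2,
    snapshots={
        1: [Manifest([d1])],
        2: [Manifest([d1]), Manifest([d2])],
    },
)

# Only files whose stats can contain the requested day are selected.
print(plan_scan(meta, "2022-07-15"))  # ['d2.parquet']
```

The point of the layering is that a query can discard data files using metadata alone, before opening any of them.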