Three ways to work with tables in Apache Iceberg
There are several ways to create tables in Apache Iceberg, for example through a Catalog implementation or through the org.apache.iceberg.Tables interface. This post briefly introduces how to use each of them.
Using a Hive catalog
As the name suggests, the Hive catalog stores Iceberg tables by connecting to a Hive MetaStore; its implementation class is org.apache.iceberg.hive.HiveCatalog. The following shows how to obtain a HiveCatalog from the hadoopConfiguration of the sparkContext:
import org.apache.iceberg.catalog.Catalog;
import org.apache.iceberg.hive.HiveCatalog;
// Reuse the Hadoop configuration from the current SparkSession
Catalog catalog = new HiveCatalog(spark.sparkContext().hadoopConfiguration());
The Catalog interface defines the operations on tables, such as createTable, loadTable, renameTable, and dropTable. To create a table we need to define a TableIdentifier, the table's Schema, and its partition information, as follows:
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;
import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.types.Types;

// Table "iteblog" in the "default" database
TableIdentifier name = TableIdentifier.of("default", "iteblog");

Schema schema = new Schema(
    Types.NestedField.required(1, "id", Types.IntegerType.get()),
    Types.NestedField.optional(2, "name", Types.StringType.get()),
    Types.NestedField.required(3, "age", Types.IntegerType.get()),
    Types.NestedField.optional(4, "ts", Types.TimestampType.withZone())
);

// Partition by year(ts) and by a 2-bucket hash of id
PartitionSpec spec = PartitionSpec.builderFor(schema).year("ts").bucket("id", 2).build();

Table table = catalog.createTable(name, schema, spec);
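The createTable call above is only one of the Catalog operations. As a minimal sketch of the others mentioned earlier (loadTable, renameTable, dropTable), reusing the catalog and name from the previous example; the identifier default.iteblog_renamed is just illustrative:

// Load the table we just created and inspect its schema
Table loaded = catalog.loadTable(name);
System.out.println(loaded.schema());

// Rename the table, then drop it (purge = true also deletes data files)
catalog.renameTable(name, TableIdentifier.of("default", "iteblog_renamed"));
catalog.dropTable(TableIdentifier.of("default", "iteblog_renamed"), true);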
Using a Hadoop catalog
The Hadoop catalog does not rely on a Hive MetaStore to store metadata; it uses HDFS or a similar file system instead. Note that the file system must support atomic rename operations, so it is not safe to keep Apache Iceberg metadata on a local file system or on object storage (S3, OSS, etc.). The following shows how to obtain a HadoopCatalog:
import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.hadoop.HadoopCatalog;

Configuration conf = new Configuration();
// Warehouse root under which databases and tables will be stored
String warehousePath = "hdfs://www.iteblog.com:8020/warehouse_path";
HadoopCatalog catalog = new HadoopCatalog(conf, warehousePath);
Like the Hive catalog, HadoopCatalog implements the Catalog interface, so it supports the same table operations, including createTable, loadTable, and dropTable. The following shows how to create an Iceberg table with HadoopCatalog:
import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.TableIdentifier;

// schema and spec are defined as in the HiveCatalog example above
TableIdentifier name = TableIdentifier.of("logging", "logs");
Table table = catalog.createTable(name, schema, spec);
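Loading and listing tables work the same way as with the Hive catalog, since both go through the Catalog interface. A short sketch, assuming the logging.logs table created above:

import java.util.List;
import org.apache.iceberg.catalog.Namespace;

// List the tables registered under the "logging" namespace
List<TableIdentifier> tableIds = catalog.listTables(Namespace.of("logging"));
// Load the table back by its identifier
Table logs = catalog.loadTable(TableIdentifier.of("logging", "logs"));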
Using Hadoop tables
Iceberg also supports tables that live in plain directories on HDFS. As with the Hadoop catalog, the file system must support atomic rename operations, so local file systems and object storage (S3, OSS, etc.) are not safe for Apache Iceberg metadata. Tables stored this way do not support every table operation; for example, renameTable is not supported. The following shows how to obtain a HadoopTables instance:
import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.Table;
import org.apache.iceberg.hadoop.HadoopTables;

Configuration conf = new Configuration();
HadoopTables tables = new HadoopTables(conf);
// A Hadoop table is identified by its location (a directory on HDFS)
String tableLocation = "hdfs://www.iteblog.com:8020/warehouse_path/iteblog";
Table table = tables.create(schema, spec, tableLocation);
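Because a Hadoop table is identified purely by its location, loading it back later only needs the same location string. A small sketch reusing the tables and tableLocation variables above:

// Load an existing path-based table; no catalog lookup is involved
Table existing = tables.load(tableLocation);
System.out.println(existing.location());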
In Spark, tables can be created and loaded through HiveCatalog, HadoopCatalog, or HadoopTables. If the table identifier passed in is not a path, HiveCatalog is used; otherwise Spark infers that it is a path-based table stored on HDFS.
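As a rough sketch of what this looks like from the DataFrame API (assuming the Iceberg Spark runtime is on the classpath and, for name-based loading, a Hive MetaStore is configured; the table names and path reuse the examples above):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Not a path, so the table is resolved through the Hive catalog
Dataset<Row> byName = spark.read().format("iceberg").load("default.iteblog");

// A path, so Spark treats it as a Hadoop table stored on HDFS
Dataset<Row> byPath = spark.read().format("iceberg")
    .load("hdfs://www.iteblog.com:8020/warehouse_path/logging/logs");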
Of course, where Apache Iceberg stores table metadata is pluggable, so custom metadata stores can be implemented. For example, AWS has opened community issues to store Apache Iceberg metadata in Glue; see #1633 and #1608.
Unless otherwise noted, this post is original content. When reposting, please credit 过往记忆 (https://www.iteblog.com/).
Original link: Three ways to work with tables in Apache Iceberg (https://www.iteblog.com/archives/9886.html)