
Three ways to operate tables in Apache Iceberg

2020-11-09 07:35:00 【osc_tjee7s】


Apache Iceberg offers several ways to create tables: through a Catalog, or through an implementation of the org.apache.iceberg.Tables interface. This article briefly introduces how to use each of them.


Contents

1 Use Hive catalog
2 Use Hadoop catalog
3 Use Hadoop tables

Use Hive catalog

As the name suggests, the Hive catalog connects to the Hive MetaStore and keeps track of Iceberg tables there. Its implementation class is org.apache.iceberg.hive.HiveCatalog. The following obtains a HiveCatalog from the hadoopConfiguration of the sparkContext:

import org.apache.iceberg.catalog.Catalog;
import org.apache.iceberg.hive.HiveCatalog;

// spark is an existing SparkSession
Catalog catalog = new HiveCatalog(spark.sparkContext().hadoopConfiguration());

The Catalog interface defines the methods for operating on tables, such as createTable, loadTable, renameTable, and dropTable. To create a table, we need to define a TableIdentifier, the table's Schema, and its partition information, as follows:

import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;
import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.types.Types;

TableIdentifier name = TableIdentifier.of("default", "iteblog");
Schema schema = new Schema(
    Types.NestedField.required(1, "id", Types.IntegerType.get()),
    Types.NestedField.optional(2, "name", Types.StringType.get()),
    Types.NestedField.required(3, "age", Types.IntegerType.get()),
    Types.NestedField.optional(4, "ts", Types.TimestampType.withZone())
);

// Partition by the year of "ts", then by a hash of "id" into 2 buckets
PartitionSpec spec = PartitionSpec.builderFor(schema).year("ts").bucket("id", 2).build();
Table table = catalog.createTable(name, schema, spec);
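
To make the partition spec above more concrete, here is a minimal sketch of what the year transform derives from a timestamp. It assumes, as the Iceberg table spec describes, that the partition value is the number of years since 1970, computed in UTC. The class and method names are hypothetical and use only the JDK; this is not Iceberg's own implementation, and the bucket transform (a hash of the value modulo the bucket count) is not reproduced here.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

// Hypothetical helper for illustration; not part of the Iceberg API.
final class YearTransformSketch {
    private YearTransformSketch() {}

    // Years since 1970, computed in UTC so that rows describing the same
    // instant with different zone offsets land in the same partition.
    static int yearOrdinal(OffsetDateTime ts) {
        return ts.withOffsetSameInstant(ZoneOffset.UTC).getYear() - 1970;
    }
}
```

Under these assumptions, a ts of 2020-11-09T07:35:00Z maps to 50, so all rows from 2020 share one year partition.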

Use Hadoop catalog

The Hadoop catalog does not rely on the Hive MetaStore; it stores metadata in HDFS or a similar file system. Note that the file system must support atomic rename, so it is not safe to store Apache Iceberg metadata on the local file system (local FS) or on object stores (S3, OSS, etc.). Here is an example of obtaining a HadoopCatalog:

import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.hadoop.HadoopCatalog;

Configuration conf = new Configuration();
String warehousePath = "hdfs://www.iteblog.com:8020/warehouse_path";
HadoopCatalog catalog = new HadoopCatalog(conf, warehousePath);
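
The atomicity requirement above comes from how commits work on a plain file system: each writer tries to publish the next metadata version, and exactly one writer may win. The following is a minimal sketch of that idea using only the JDK. It is an illustration under stated assumptions, not HadoopCatalog's code: exclusive file creation (CREATE_NEW) stands in for the atomic rename that HDFS provides, and the class name, method name, and file naming are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration of an optimistic, file-based commit.
final class MetadataCommitSketch {
    private MetadataCommitSketch() {}

    // Tries to publish metadataJson as the given version inside metadataDir.
    // Returns true if this writer won, or false if another writer already
    // published that version and the caller has to retry on a newer base.
    static boolean tryCommit(Path metadataDir, int version, String metadataJson) {
        Path target = metadataDir.resolve("v" + version + ".metadata.json");
        try {
            // CREATE_NEW fails if the file exists; the existence check and
            // the creation happen as one atomic file system operation.
            Files.write(target, metadataJson.getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If two writers both call tryCommit for version 2, only one call returns true. A real implementation would write the file elsewhere first and rename it into place so that readers never see a partially written file, which is why the rename itself has to be atomic.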

Like the Hive catalog, HadoopCatalog implements the Catalog interface, so it also provides the usual table operations, including createTable, loadTable, and dropTable. Here is an example of creating an Iceberg table with HadoopCatalog:

import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.TableIdentifier;

TableIdentifier name = TableIdentifier.of("logging", "logs");
Table table = catalog.createTable(name, schema, spec);
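
Because HadoopCatalog has no external metastore, a table identifier maps directly to a directory under the warehouse path. The sketch below illustrates that mapping, assuming each namespace level and the table name become nested directories. The class and method are hypothetical helpers written against the JDK only; they are not Iceberg API calls.

```java
// Hypothetical helper for illustration; not part of the Iceberg API.
final class WarehouseLayoutSketch {
    private WarehouseLayoutSketch() {}

    // Joins the warehouse path, the namespace levels, and the table name
    // with "/" to form the assumed table directory.
    static String tableLocation(String warehousePath, String[] namespace, String table) {
        String base = warehousePath;
        // Drop trailing slashes so the result never contains "//" after the base.
        while (base.endsWith("/")) {
            base = base.substring(0, base.length() - 1);
        }
        StringBuilder location = new StringBuilder(base);
        for (String level : namespace) {
            location.append('/').append(level);
        }
        return location.append('/').append(table).toString();
    }
}
```

With the warehouse path and identifier used above, this yields hdfs://www.iteblog.com:8020/warehouse_path/logging/logs.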

Use Hadoop tables

Iceberg also supports tables stored in a directory in HDFS. As with the Hadoop catalog, the file system must support atomic rename, so it is not safe to store Apache Iceberg metadata on the local file system (local FS) or on object stores (S3, OSS, etc.). Tables stored this way do not support all table operations; for example, renameTable is not supported. Here is an example of using HadoopTables:

import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.hadoop.HadoopTables;
import org.apache.iceberg.Table;

Configuration conf = new Configuration();
HadoopTables tables = new HadoopTables(conf);
// table_location is the directory path where the table is stored
Table table = tables.create(schema, spec, table_location);

Spark can create and load tables through HiveCatalog, HadoopCatalog, and HadoopTables. If the table name passed in is not a path, HiveCatalog is selected; otherwise Spark infers that the table is stored in HDFS.
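
The selection rule described above can be sketched in a few lines. This is an illustration only: it assumes that "is a path" simply means the string contains a "/", and the enum and method names are hypothetical rather than taken from Iceberg's Spark integration.

```java
// Hypothetical illustration of choosing a backend from a table string.
final class TableResolutionSketch {
    private TableResolutionSketch() {}

    enum Backend { HIVE_CATALOG, HADOOP_TABLES }

    // Strings containing "/" are treated as file system paths and handled
    // by HadoopTables; anything else is treated as a catalog identifier.
    static Backend resolve(String tableOrPath) {
        return tableOrPath.contains("/") ? Backend.HADOOP_TABLES : Backend.HIVE_CATALOG;
    }
}
```

Under this assumption, "default.iteblog" resolves to the Hive catalog, while "hdfs://www.iteblog.com:8020/warehouse_path/logging/logs" is loaded as a path-based table.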

Of course, where Apache Iceberg stores table metadata is pluggable, so we can customize how metadata is stored. For example, AWS has raised an issue with the community to store Apache Iceberg metadata in Glue; see #1633 and #1608.

Unless otherwise stated, all articles on this blog are original.
When reprinting, please add: Reprinted from Past Memory (https://www.iteblog.com/)
Link to this article: Three ways to operate tables in Apache Iceberg (https://www.iteblog.com/archives/9886.html)

Copyright notice
This article was written by [osc_tjee7s]. Please include a link to the original when reprinting. Thank you.
