当前位置:网站首页>Three ways to operate tables in Apache iceberg

Three ways to operate tables in Apache iceberg

2020-11-09 07:35:00 osc_tjee7s

stay Apache Iceberg There are many ways to create tables in , Among them is the use of Catalog How or how to implement org.apache.iceberg.Tables Interface . Let's briefly introduce how to use ..

Apache iceberg:Java API Quickstart
If you want to know in time Spark、Hadoop perhaps HBase Related articles , Welcome to WeChat official account. : iteblog_hadoop

Use Hive catalog

You can tell by the name ,Hive catalog It's through connections Hive Of MetaStore, hold Iceberg The table is stored in it , Its implementation class is org.apache.iceberg.hive.HiveCatalog, Here is the passage sparkContext Medium hadoopConfiguration To get HiveCatalog The way :

import org.apache.iceberg.hive.HiveCatalog;

Catalog catalog = new HiveCatalog(spark.sparkContext().hadoopConfiguration());

Catalog The interface defines the method of operation table , such as createTable, loadTable, renameTable, as well as dropTable. If you want to create a table , We need to define TableIdentifier, Tabular Schema And partition information , as follows :

import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;

TableIdentifier name = TableIdentifier.of("default", "iteblog");
Schema schema = new Schema(
    Types.NestedField.required(1, "id", Types.IntegerType.get()),
    Types.NestedField.optional(2, "name", Types.StringType.get()),
    Types.NestedField.required(3, "age", Types.IntegerType.get()),
    Types.NestedField.optional(4, "ts", Types.TimestampType.withZone())
);

PartitionSpec spec = PartitionSpec.builderFor(schema).year("ts").bucket("id", 2).build();
Table table = catalog.createTable(name, schema, spec);

Use Hadoop catalog

Hadoop catalog Do not rely on Hive MetaStore To store metadata , Its use HDFS Or a similar file system to store metadata . Be careful , File systems need to support atomic renaming operations , So the local file system (local FS)、 Object storage (S3、OSS etc. ) To store Apache Iceberg Metadata is not secure . Here's how to get HadoopCatalog Example :

import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.hadoop.HadoopCatalog;

Configuration conf = new Configuration();
String warehousePath = "hdfs://www.iteblog.com:8020/warehouse_path";
HadoopCatalog catalog = new HadoopCatalog(conf, warehousePath);

and Hive catalog equally ,HadoopCatalog Also realize Catalog Interface , So it also implements various operations of the table , Include createTable, loadTable, as well as dropTable. Here's how to use HadoopCatalog To create Iceberg Example :

import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.TableIdentifier;

TableIdentifier name = TableIdentifier.of("logging", "logs");
Table table = catalog.createTable(name, schema, spec);

Use Hadoop tables

Iceberg It also supports storing in HDFS Table in table of contents . and Hadoop catalog equally , File systems need to support atomic renaming operations , So the local file system (local FS)、 Object storage (S3、OSS etc. ) To store Apache Iceberg Metadata is not secure . Tables stored in this way do not support various operations of the table , For example, it doesn't support renameTable. Here's how to get HadoopTables Example :

import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.hadoop.HadoopTables;
import org.apache.iceberg.Table;

Configuration conf = new Configuration():
HadoopTables tables = new HadoopTables(conf);
Table table = tables.create(schema, spec, table_location);

stay Spark in , It supports HiveCatalog、HadoopCatalog as well as HadoopTables Way to create 、 Load table . If the incoming table is not a path , select HiveCatalog, otherwise Spark It will be inferred that the table is stored in HDFS Upper .

Of course ,Apache Iceberg The storage place of table metadata is pluggable , So we can customize the way metadata is stored , such as AWS Just one for the community issue, Its handle Apache Iceberg Metadata in is stored in glue Inside , See #1633#1608.

In addition to this blog post , It's all original !
Please add : Reprinted from Past memory (https://www.iteblog.com/)
Link to this article : 【Apache Iceberg There are three ways to operate the table 】(https://www.iteblog.com/archives/9886.html)

like (1) Appreciation Share (0)

版权声明
本文为[osc_tjee7s]所创,转载请带上原文链接,感谢