Data Lake (19): SQL API Reads Kafka Data and Writes to an Iceberg Table in Real Time
2022-07-31 11:31:00 【Lanson】
SQL API: Reading Kafka Data and Writing to an Iceberg Table in Real Time
The steps for reading data from Kafka in real time and writing it into an Iceberg table are as follows:
1. First, create the corresponding Iceberg table
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment tblEnv = StreamTableEnvironment.create(env);
env.enableCheckpointing(1000);

//1. Create the Catalog
tblEnv.executeSql("CREATE CATALOG hadoop_iceberg WITH (" +
        "'type'='iceberg'," +
        "'catalog-type'='hadoop'," +
        "'warehouse'='hdfs://mycluster/flink_iceberg')");

//2. Create the Iceberg table flink_iceberg_tbl3
tblEnv.executeSql("create table hadoop_iceberg.iceberg_db.flink_iceberg_tbl3(id int,name string,age int,loc string) partitioned by (loc)");
2. Write code to read Kafka data and write it to Iceberg in real time
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.TableResult;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class ReadKafkaToIceberg {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tblEnv = StreamTableEnvironment.create(env);
        env.enableCheckpointing(1000);

        /**
         * 1. The Catalog and the Iceberg table must be created in advance
         */
        //1. Create the Catalog
        tblEnv.executeSql("CREATE CATALOG hadoop_iceberg WITH (" +
                "'type'='iceberg'," +
                "'catalog-type'='hadoop'," +
                "'warehouse'='hdfs://mycluster/flink_iceberg')");

        //2. Create the Iceberg table flink_iceberg_tbl3 (already created in step one, so it is commented out here)
        // tblEnv.executeSql("create table hadoop_iceberg.iceberg_db.flink_iceberg_tbl3(id int,name string,age int,loc string) partitioned by (loc)");

        //3. Create the Kafka connector table to consume data from Kafka
        tblEnv.executeSql("create table kafka_input_table(" +
                " id int," +
                " name varchar," +
                " age int," +
                " loc varchar" +
                ") with (" +
                " 'connector' = 'kafka'," +
                " 'topic' = 'flink-iceberg-topic'," +
                " 'properties.bootstrap.servers'='node1:9092,node2:9092,node3:9092'," +
                " 'scan.startup.mode'='latest-offset'," +
                " 'properties.group.id' = 'my-group-id'," +
                " 'format' = 'csv'" +
                ")");

        //4. Enable table.dynamic-table-options.enabled
        Configuration configuration = tblEnv.getConfig().getConfiguration();
        // Allow the OPTIONS hint in SQL statements
        configuration.setBoolean("table.dynamic-table-options.enabled", true);

        //5. Write data into the table flink_iceberg_tbl3
        tblEnv.executeSql("insert into hadoop_iceberg.iceberg_db.flink_iceberg_tbl3 select id,name,age,loc from kafka_input_table");

        //6. Query the table data
        TableResult tableResult = tblEnv.executeSql("select * from hadoop_iceberg.iceberg_db.flink_iceberg_tbl3 /*+ OPTIONS('streaming'='true', 'monitor-interval'='1s')*/");
        tableResult.print();
    }
}
Start the code above, then produce the following data to the Kafka topic (a minimal producer sketch follows after the sample rows):
1,zs,18,beijing
2,ls,19,shanghai
3,ww,20,beijing
4,ml,21,shanghai
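Any Kafka client can produce these rows. Below is a minimal sketch using the Java KafkaProducer; the broker addresses, topic name, and CSV layout come from the connector definition above, while the class name and serializer settings are assumptions for illustration.

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Hypothetical helper class, not part of the original article.
public class ProduceTestData {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "node1:9092,node2:9092,node3:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // Each record is a CSV line matching the kafka_input_table schema: id,name,age,loc
        String[] rows = {
                "1,zs,18,beijing",
                "2,ls,19,shanghai",
                "3,ww,20,beijing",
                "4,ml,21,shanghai"
        };
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (String row : rows) {
                producer.send(new ProducerRecord<>("flink-iceberg-topic", row));
            }
            producer.flush();
        }
    }
}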
We can see the corresponding real-time output on the console, and checking the Iceberg table's directory on HDFS confirms that the data was written successfully.
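To double-check what landed in the table, one option is to read it once in batch mode. The sketch below reuses the catalog settings from above; the class name is made up for illustration.

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

// Hypothetical verification class, not part of the original article.
public class VerifyIcebergTable {
    public static void main(String[] args) {
        // Batch mode reads a one-off snapshot of the table instead of streaming changes.
        TableEnvironment batchEnv = TableEnvironment.create(EnvironmentSettings.inBatchMode());
        batchEnv.executeSql("CREATE CATALOG hadoop_iceberg WITH (" +
                "'type'='iceberg'," +
                "'catalog-type'='hadoop'," +
                "'warehouse'='hdfs://mycluster/flink_iceberg')");
        batchEnv.executeSql("select * from hadoop_iceberg.iceberg_db.flink_iceberg_tbl3").print();
    }
}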