Data Lake Iceberg in Practice, Lesson 37: Kafka writes to an Iceberg table, testing ENFORCED and NOT ENFORCED primary keys

2022-07-23 21:36:00 【*Spark*】

List of articles

Preface
1. Test approach
2. Testing NOT ENFORCED
3. Changing to ENFORCED: error
- 3.1 Test code
Summary

Preface

This lesson tests whether Iceberg, when reading data from Kafka, automatically updates its existing rows based on the id carried in the Kafka records as they enter the lake.
Test result: it does not.

1. Test approach

Produce data into Kafka and write it to Iceberg. With a primary key set on the Iceberg table, observe whether the rows are appended or updated.

2. Testing NOT ENFORCED

2.1 Test code

Test steps:
1. Select from Kafka.
2. Insert into Iceberg.
The code is as follows:

CREATE TABLE IF NOT EXISTS KafkaTableTest2_XXZH (
    `id` bigint,
    `data` STRING
) WITH (
    'connector' = 'kafka',
    'topic' = 'test2_xxzh',
    'properties.bootstrap.servers' = 'hadoop101:9092,hadoop102:9092,hadoop103:9092',
    'properties.group.id' = 'testGroup',
    'scan.startup.mode' = 'latest-offset',
    'csv.ignore-parse-errors'='true',
    'format' = 'csv'
);


CREATE CATALOG hive_iceberg_catalog WITH (
    'type'='iceberg',
    'catalog-type'='hive',
    'uri'='thrift://hadoop101:9083',
    'clients'='5',
    'property-version'='1',
    'warehouse'='hdfs:///user/hive/warehouse/hive_iceberg_catalog'
);
use catalog hive_iceberg_catalog;
CREATE TABLE IF NOT EXISTS ods_base.IcebergTest2_XXZH (
    `id` bigint,
    `data` STRING,
    primary key (id) not enforced
)with(
    'write.metadata.delete-after-commit.enabled'='true',
    'write.metadata.previous-versions-max'='5',
    'format-version'='2'
);

INSERT INTO hive_iceberg_catalog.ods_base.IcebergTest2_XXZH
SELECT * FROM default_catalog.default_database.KafkaTableTest2_XXZH;

2.2 Producing data

# kafka-console-producer.sh --broker-list hadoop101:9092,hadoop102:9092,hadoop103:9092 --topic test2_xxzh
>1,abc
[2022-07-22 14:55:51,643] WARN [Producer clientId=console-producer] Error while fetching metadata with correlation id 3 : {test2_xxzh=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
>2,bb
>3,cc
>4,dd
>5,ee
>3,cccc
>6,666
>4,ddddd
>

2.3 Running results

spark-sql (default)> select *  from ods_base.IcebergTest2_XXZH;
22/07/22 15:12:28 WARN HiveConf: HiveConf of name hive.metastore.event.db.notification.api.auth does not exist
id      data
3       cc
4       ddddd
5       ee
3       cccc
6       666
4       dd
Time taken: 0.405 seconds, Fetched 6 row(s)

Flink SQL query result: (screenshot not available)

2.4 Conclusion

The rows coming from Kafka did not update Iceberg according to the declared primary key. Iceberg wrote them in append mode: ids 3 and 4 each appear twice in the query result.
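
The duplicates can also be confirmed with an aggregate query instead of reading the rows by eye. This is a minimal sketch, assuming it runs in the same spark-sql session as in 2.3 against ods_base.IcebergTest2_XXZH; the alias row_count is only illustrative.

```sql
-- List every id that is stored more than once.
-- If the primary key had driven updates, this query would return no rows.
SELECT id, COUNT(*) AS row_count
FROM ods_base.IcebergTest2_XXZH
GROUP BY id
HAVING COUNT(*) > 1;
```

For the six rows shown in 2.3, ids 3 and 4 would each be reported with a count of 2.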

3. Changing to ENFORCED: error

3.1 Test code

Change the Iceberg table's primary key to ENFORCED and rerun.


Flink SQL> CREATE TABLE IF NOT EXISTS KafkaTableTest3_XXZH (
>     `id` bigint,
>     `data` STRING
> ) WITH (
>     'connector' = 'kafka',
>     'topic' = 'test2_xxzh',
>     'properties.bootstrap.servers' = 'hadoop101:9092,hadoop102:9092,hadoop103:9092',
>     'properties.group.id' = 'testGroup',
>     'scan.startup.mode' = 'latest-offset',
>     'csv.ignore-parse-errors'='true',
>     'format' = 'csv'
> );
> 
[INFO] Execute statement succeed.

Flink SQL> CREATE CATALOG hive_iceberg_catalog WITH (
>     'type'='iceberg',
>     'catalog-type'='hive',
>     'uri'='thrift://hadoop101:9083',
>     'clients'='5',
>     'property-version'='1',
>     'warehouse'='hdfs:///user/hive/warehouse/hive_iceberg_catalog'
> );
[INFO] Execute statement succeed.

Flink SQL> use catalog hive_iceberg_catalog;
[INFO] Execute statement succeed.

Flink SQL> CREATE TABLE IF NOT EXISTS ods_base.IcebergTest3_XXZH (
>     `id` bigint,
>     `data` STRING,
>     primary key (id) enforced
> )with(
>     'write.metadata.delete-after-commit.enabled'='true',
>     'write.metadata.previous-versions-max'='5',
>     'format-version'='2'
>  );
[ERROR] Could not execute SQL statement. Reason:
org.apache.flink.table.api.ValidationException: Flink doesn't support ENFORCED mode for PRIMARY KEY constraint. ENFORCED/NOT ENFORCED  controls if the constraint checks are performed on the incoming/outgoing data. Flink does not own the data therefore the only supported mode is the NOT ENFORCED mode

The error message says that Flink does not own the data, so the only mode it supports for a PRIMARY KEY constraint is NOT ENFORCED.

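Result: the CREATE TABLE statement fails with the error above, so the ENFORCED variant cannot be run.

Conclusion: in this test, Iceberg does not update data based on the primary key.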
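
The tables in this test were created without any upsert setting. Iceberg's Flink documentation describes a table property, write.upsert.enabled, that asks the Flink sink to upsert on the primary key of a format-version 2 table. The sketch below is an untested variant of the table from section 2, not something this article ran: the table name IcebergTest4_XXZH is hypothetical, and whether the property is honored depends on the Iceberg and Flink versions installed.

```sql
-- Hypothetical, untested variant of IcebergTest2_XXZH with upsert mode requested.
CREATE TABLE IF NOT EXISTS hive_iceberg_catalog.ods_base.IcebergTest4_XXZH (
    `id` bigint,
    `data` STRING,
    PRIMARY KEY (id) NOT ENFORCED    -- the only mode Flink accepts
) WITH (
    'format-version' = '2',          -- row-level deletes need the v2 table format
    'write.upsert.enabled' = 'true'  -- assumption: supported by the installed Iceberg release
);

-- Same load as in section 2, pointed at the new table.
INSERT INTO hive_iceberg_catalog.ods_base.IcebergTest4_XXZH
SELECT * FROM default_catalog.default_database.KafkaTableTest2_XXZH;
```

Repeating the steps from section 2 against such a table would show whether ids 3 and 4 end up with one row each.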