Hudi notes
2022-06-30 03:03:00 【胖胖胖胖胖虎】
Data structures


Flink on Hudi
https://www.jianshu.com/p/f509429c2f20
Syncing Hudi to Hive
https://blog.csdn.net/hjl18309163914/article/details/107844269
CREATE TABLE statement demo
CREATE TABLE sink_order_mysql_goods_order(
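`goods_order_id` bigint COMMENT 'Auto-increment primary key id'
, `goods_order_uid` string COMMENT 'Order uid'
, `customer_uid` string COMMENT 'Customer uid'
, `customer_name` string COMMENT 'Customer name'
, `student_uid` string COMMENT 'Student uid'
, `order_status` bigint COMMENT 'Order status: 1 = awaiting payment, 2 = partially paid, 3 = payment under review, 4 = paid, 5 = cancelled'
, `is_end` bigint COMMENT 'Whether the order is finished: 1 = not finished, 2 = finished'
, `discount_deduction` bigint COMMENT 'Total discount amount (unit: fen)'
, `contract_deduction` bigint COMMENT 'Amount deducted from the old contract (unit: fen)'
, `wallet_deduction` bigint COMMENT 'Amount deducted from the wallet (unit: fen)'
, `original_price` bigint COMMENT 'Original order price (unit: fen)'
, `real_price` bigint COMMENT 'Amount actually paid (unit: fen)'
, `pay_success_time` timestamp(3) COMMENT 'Time of full payment'
, `tags` string COMMENT 'Order tags (1 = new signing, 2 = renewal, 3 = additional subject, 4 = enrollment with new signing, 5 = class transfer with new signing, 6 = renewal with new signing, 7 = trial class with new signing)'
, `status` bigint COMMENT 'Validity (1 = active, 2 = inactive, 3 = payment timed out)'
, `remark` string COMMENT 'Order remark'
, `delete_flag` bigint COMMENT 'Deleted (1 = no, 2 = yes)'
, `test_flag` bigint COMMENT 'Test data (1 = no, 2 = yes)'
, `create_time` timestamp(3) COMMENT 'Creation time'
, `update_time` timestamp(3) COMMENT 'Update time'
, `create_by` string COMMENT 'UID of the creator (unique identifier)'
, `update_by` string COMMENT 'UID of the last updater (unique identifier)'
, `belong_school` bigint COMMENT 'Campus the order belongs to'
, `share_sale_no` string COMMENT 'Employee number of the shared salesperson'
,PRIMARY KEY (goods_order_id) NOT ENFORCED
) COMMENT 'Order table'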
WITH (
'connector' = 'hudi'
, 'path' = 'hdfs://hdfs-namenode-service:9000/hudi-warehouse/goods_order' -- the path is created automatically
, 'hoodie.datasource.write.recordkey.field' = 'goods_order_id' -- record key (primary key)
, 'write.precombine.field' = 'update_time' -- for records with the same key, keep the one with the largest value of this field; defaults to the ts field
, 'read.streaming.skip_compaction' = 'true' -- avoids consuming the same records twice in streaming reads
, 'write.bucket_assign.tasks' = '2' -- parallelism of the bucket assign operator
, 'write.tasks' = '2'
, 'compaction.tasks' = '1'
, 'write.operation' = 'upsert' -- upsert, insert, or bulk_insert (upsert is slower and not well suited to event-tracking data)
, 'write.rate.limit' = '20000' -- maximum number of records written per second
, 'table.type' = 'COPY_ON_WRITE' -- the compaction.* options only take effect for MERGE_ON_READ tables
, 'compaction.async.enabled' = 'true' -- online (async) compaction
, 'compaction.trigger.strategy' = 'num_or_time' -- compact when either the commit-count or the time threshold is reached
, 'compaction.delta_commits' = '20' -- default is 5
, 'compaction.delta_seconds' = '60' -- default is 1 hour
, 'hive_sync.enable' = 'true' -- enable Hive sync
, 'hive_sync.mode' = 'hms' -- sync through the Hive metastore (hms); default is jdbc
, 'hive_sync.metastore.uris' = 'thrift://hive-metastore-svc:9083' -- required, metastore URI and port
, 'hive_sync.jdbc_url' = 'jdbc:hive2://hive-service-svc:10000' -- required, HiveServer address
, 'hive_sync.table' = 'order_mysql_goods_order' -- required, name of the Hive table to create; the Hudi table schema and data are synced to it automatically
, 'hive_sync.db' = 'cdc_ods' -- required, name of the Hive database
, 'hive_sync.username' = 'root' -- required, HMS username
, 'hive_sync.password' = '123456' -- required, HMS password
, 'hive_sync.skip_ro_suffix' = 'true' -- drop the _ro suffix
);
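A minimal sketch of how a job might write into this table. The source table name `source_order_mysql_goods_order` is hypothetical (any table whose columns match the sink in order and type would do, for example a CDC source), and the 30-second checkpoint interval is an arbitrary choice. Checkpointing is enabled because the Hudi Flink writer commits data when a checkpoint completes. The quoted `SET` form assumes a recent Flink SQL client.

```sql
-- The Hudi Flink writer commits on checkpoint completion, so enable checkpointing.
-- The interval is an assumed value; tune it for the workload.
SET 'execution.checkpointing.interval' = '30s';

-- source_order_mysql_goods_order is a hypothetical source table
-- whose columns match the sink in order and type.
INSERT INTO sink_order_mysql_goods_order
SELECT * FROM source_order_mysql_goods_order;
```

With `write.operation` set to `upsert`, rows sharing a `goods_order_id` should collapse to the one with the largest `update_time`.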
Writing data to Hudi with Flink and reading it from Hive
https://blog.csdn.net/weixin_44131414/article/details/122983339
Flink SQL: writing Kafka data to Hudi in detail (Hudi COW and MOR)
https://www.233tw.com/database/117599
Yuque: Hudi Flink Q&A
https://www.yuque.com/docs/share/01c98494-a980-414c-9c45-152023bf3c17?#IsoNU
Issue log
The Hive sync job for Hudi fails
CREATE TABLE flink_cdc_sink_hudi_hive(
uuid varchar(20),
name varchar(10),
age int,
ts timestamp(3),
dt varchar(20)
)
PARTITIONED BY (dt)
with(
'connector'='hudi',
'path'= 'hdfs://hdfs-namenode-service:9000/flink_cdc_sink_hudi_hive',
'table.type'= 'MERGE_ON_READ',
'hoodie.datasource.write.recordkey.field'= 'uuid',
'write.precombine.field'= 'ts',
'write.tasks'= '1',
'write.rate.limit'= '2000',
'compaction.tasks'= '1',
'compaction.async.enabled'= 'true',
'compaction.trigger.strategy'= 'num_commits',
'compaction.delta_commits'= '1',
'changelog.enabled'= 'true',
'read.streaming.enabled'= 'true',
'read.streaming.check-interval'= '3',
'hive_sync.enable'= 'true',
'hive_sync.mode'= 'hms',
'hive_sync.metastore.uris'= 'thrift://hive-metastore-svc:9083',
'hive_sync.jdbc_url'= 'jdbc:hive2://hive-service-svc:10000',
'hive_sync.table'= 'flink_cdc_sink_hudi_hive',
'hive_sync.db'= 'default',
'hive_sync.username'= 'root',
'hive_sync.password'= '123456',
'hive_sync.support_timestamp'= 'true'
);
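A minimal sketch for exercising the sync path of this table. The row values are made up for illustration. Hive sync appears to run after a commit completes, so a single-row write may be enough to surface a sync failure.

```sql
-- Sample row only; the values are arbitrary and match the column order
-- (uuid, name, age, ts, dt) of the table defined above.
INSERT INTO flink_cdc_sink_hudi_hive VALUES
  ('id1', 'Danny', 23, TIMESTAMP '1970-01-01 00:00:01', 'par1');
```

For a MERGE_ON_READ table, a successful sync normally creates two Hive tables, suffixed `_ro` and `_rt`, in the configured database.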

The error is: java.lang.ClassNotFoundException: org.apache.hudi.org.apache.hadoop.hive.conf.HiveConf
docker logs 


https://www.yuque.com/docs/share/01c98494-a980-414c-9c45-152023bf3c17?#IsoNU

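Fix: rebuild the Hudi Flink bundle with the Hive 2 shade profile so that the relocated Hive classes are packaged into it:
mvn package -DskipTests -Drat.skip=true -Pflink-bundle-shade-hive2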


Following on from the above, Hive still did not sync. The error is as follows:




Blog references
1. Hudi queries, writes, and common issues: a summary