Exporting ADS-Layer Data to MySQL with Sqoop
2022-07-02 09:42:00 【小基基o_O】
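Background
- Export ADS-layer data to MySQL with Sqoop.
- When running sqoop export, add --columns explicitly to avoid some odd errors.
- Use regular expressions to extract the column names.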
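Process
- ADS-layer tables are unpartitioned, uncompressed, and stored in a row format.
- Keep the ADS-layer CREATE TABLE statements in a dedicated SQL file; whenever a table changes, update its statement in that file.
- Table names: ADS-layer Hive tables carry the ads_ prefix; drop the prefix when creating the corresponding MySQL table.
- Columns: the ADS-layer table and the MySQL table must have identical column names and column order, and every column name is wrapped in backticks (`).
- Iterate over the ADS-layer CREATE TABLE statements and use regular expressions to extract the table name and all column names.
- Pass them as arguments to the Sqoop command.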
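The extraction step can be sketched in isolation as follows. This is a minimal sketch, not the author's exact code: the function name parse_ads_ddl is hypothetical, and it assumes that column names are the only backtick-quoted tokens in the statement and that no string literal contains "--". Line comments are stripped first, because a lone backtick inside a comment would otherwise shift every match of the column regex.

```python
import re

def parse_ads_ddl(ddl):
    """Return (hive_table, mysql_table, column_names) for one Hive CREATE TABLE statement."""
    # Drop '--' line comments so a stray backtick in a comment cannot confuse the column regex
    ddl = re.sub(r'--[^\n]*', '', ddl)
    hive_tb = re.search(r'CREATE\s+EXTERNAL\s+TABLE\s+(\w+)', ddl, re.IGNORECASE).group(1)
    # Strip only the leading ads_ prefix, not every occurrence of "ads_"
    mysql_tb = re.sub(r'^ads_', '', hive_tb)
    # Column names are the backtick-quoted tokens, in declaration order
    columns = re.findall(r'`([^`]+)`', ddl)
    return hive_tb, mysql_tb, columns

sample_ddl = """
-- column names are wrapped in the ` symbol
CREATE EXTERNAL TABLE ads_purchase_order_info (
`prch_order_id` BIGINT COMMENT 'purchase order header id',
`exfactory_total_price` DOUBLE COMMENT 'total ex-factory price',
`insert_time` STRING COMMENT 'data insert date'
) COMMENT 'purchase info'
"""

hive_tb, mysql_tb, columns = parse_ads_ddl(sample_ddl)
columns_arg = '--columns ' + ','.join(columns)
print(hive_tb, mysql_tb, columns_arg)
```

The resulting columns_arg string has the form that Sqoop's --columns option expects: a comma-separated list with no backticks.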
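Code
ADS-layer CREATE TABLE statement (ADS层建表.sql)
-- Hive DDL: column names are wrapped in backticks, the table name is not
CREATE EXTERNAL TABLE ads_purchase_order_info (
`prch_order_id` BIGINT COMMENT 'purchase order header id',
`exfactory_total_price` DOUBLE COMMENT 'total ex-factory price',
`insert_time` STRING COMMENT 'data insert date'
) COMMENT 'purchase info';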
MySQL CREATE TABLE statement
CREATE TABLE purchase_order_info (
`prch_order_id` bigint COMMENT 'purchase order header id',
`exfactory_total_price` DOUBLE COMMENT 'total ex-factory price',
`insert_time` text COMMENT 'data insert date',
PRIMARY KEY (`prch_order_id`)
) COMMENT 'purchase info';
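Because the export relies on both tables sharing the same column names in the same order, it can help to check that before running Sqoop. The following is a minimal sketch under stated assumptions: the helper names backtick_columns and columns_match are hypothetical, and each column definition is assumed to sit on its own line beginning with a backtick-quoted name, which is how both statements above are written. Lines such as PRIMARY KEY (...) are ignored because they do not start with a backtick.

```python
import re

def backtick_columns(ddl):
    """Column names in declaration order, taken from lines that start with a backtick-quoted name."""
    return re.findall(r'^[ \t]*`([^`]+)`', ddl, re.MULTILINE)

def columns_match(hive_ddl, mysql_ddl):
    """True when both statements declare the same columns in the same order."""
    return backtick_columns(hive_ddl) == backtick_columns(mysql_ddl)

hive_ddl = """CREATE EXTERNAL TABLE ads_purchase_order_info (
`prch_order_id` BIGINT,
`exfactory_total_price` DOUBLE,
`insert_time` STRING
)"""

mysql_ddl = """CREATE TABLE purchase_order_info (
`prch_order_id` bigint,
`exfactory_total_price` DOUBLE,
`insert_time` text,
PRIMARY KEY (`prch_order_id`)
)"""

print(columns_match(hive_ddl, mysql_ddl))
```

A mismatch reported here is cheaper to diagnose than a failed or misaligned export job.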
Python
from re import findall, sub

# Shell, get_sqoop, read_sql_file and EXPORT_DIR_PREFIX are the author's own helpers (not shown here)
class Sqoop(Shell):
    def sqoop(self, cmd):
        # Collapse all whitespace so the multi-line template runs as a single command line
        return self.sh_cmd_and_alert(' '.join(cmd.split()))

    def sqoop_export(self, mysql_tb, export_dir, columns='', update_mode='allowinsert', update_key='prch_order_id'):
        """
        --columns: defaults to all columns when omitted; setting it is recommended to avoid puzzling bugs
        --update-mode: defaults to updateonly; can be changed to allowinsert
        --update-key: the anchor column(s) used for updates; separate multiple columns with commas
        """
        return self.sqoop(r'''
            {sqoop} export
            --connect jdbc:mysql://{host}:{port}/{database}
            --username '{username}' --password '{password}'
            --table {table} --num-mappers 1
            --input-fields-terminated-by '\001'
            --input-null-string '\\N' --input-null-non-string '\\N'
            --export-dir '{export_dir}' {columns}
            --update-key {update_key} --update-mode {update_mode}
        '''.format(
            sqoop=self.get('sqoop', 'sqoop'),
            host=self.get('mysql_host', 'localhost'),
            port=self.get('mysql_port', '3306'),
            database=self['mysql_db'],
            username=self.get('mysql_user', 'root'),
            password=self['mysql_pwd'],
            table=mysql_tb,
            export_dir=export_dir,
            columns=columns,
            update_key=update_key,
            update_mode=update_mode,
        ))

s = get_sqoop()
for ads_ddl in read_sql_file('ADS层建表.sql').split(';')[:-1]:
    # Column names are the backtick-quoted tokens of the statement
    columns = '--columns ' + ','.join(findall(r'`([^`]+)`', ads_ddl))
    hive_tb = findall(r'CREATE EXTERNAL TABLE (\S+)', ads_ddl)[0]
    # Strip only the leading ads_ prefix, not every occurrence of "ads_"
    mysql_tb = sub(r'^ads_', '', hive_tb)
    print(s.sqoop_export(mysql_tb, EXPORT_DIR_PREFIX + hive_tb, columns))
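For readers without the author's Shell helper, the same command can be assembled as a plain argument list. This is a minimal sketch, not the author's implementation: the function build_export_args, the database name, the password, and the HDFS path are all hypothetical, and nothing is executed. Only Sqoop options that appear in this article are used. shlex.join requires Python 3.8 or later.

```python
import shlex

def build_export_args(table, export_dir, columns, database, password,
                      host='localhost', port='3306', username='root',
                      update_key=None, update_mode='allowinsert'):
    """Build the sqoop export command as an argument list (nothing is executed here)."""
    args = [
        'sqoop', 'export',
        '--connect', f'jdbc:mysql://{host}:{port}/{database}',
        '--username', username,
        '--password', password,
        '--table', table,
        '--num-mappers', '1',
        '--input-fields-terminated-by', r'\001',
        '--input-null-string', r'\\N',
        '--input-null-non-string', r'\\N',
        '--export-dir', export_dir,
        '--columns', ','.join(columns),
    ]
    # Only request update/upsert behaviour when an anchor column is supplied
    if update_key:
        args += ['--update-key', update_key, '--update-mode', update_mode]
    return args

args = build_export_args(
    table='purchase_order_info',
    export_dir='/warehouse/ads/ads_purchase_order_info',  # hypothetical HDFS path
    columns=['prch_order_id', 'exfactory_total_price', 'insert_time'],
    database='demo_db',    # hypothetical database name
    password='demo_pwd',   # hypothetical password
    update_key='prch_order_id',
)
# Shell-quoted form, handy for logging or a dry run
print(shlex.join(args))
```

Passing the list to subprocess.run(args) instead of joining it into one string sidesteps shell quoting of the password and the \001 delimiter.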