当前位置:网站首页>使用Sqoop把ADS层数据导出到MySQL
使用Sqoop把ADS层数据导出到MySQL
2022-07-02 09:42:00 【小基基o_O】
背景
- 使用Sqoop把ADS层数据导出到MySQL
- 使用
sqoop export
时要添加--columns
,避免一些奇奇怪怪的报错 - 使用正则表达式获取字段名
流程
- ADS层不分区,不压缩,行存
- ADS层建表SQL要有单独的文件,如果表更新就要更新该文件的建表语句
- 表名:ADS层的HIVE表有
ads_
前缀,对应到MySQL建表时去掉前缀 - 字段:ADS层表和MySQL表的 字段名及字段顺序都要一致,用`符号包裹
- 遍历ADS层建表语句,使用正则表达式获取 表名、所有字段名
- 传参到Sqoop命令
代码
ADS层建表语句(ADS层建表.sql
)
-- HIVE建表语句,字段用`符号包裹,表名不需要包裹
CREATE EXTERNAL TABLE ads_purchase_order_info (
`prch_order_id` BIGINT COMMENT '采购订单头id',
`exfactory_total_price` DOUBLE COMMENT '出厂价总额',
`insert_time` STRING COMMENT '数据插入日期'
) COMMENT '采购信息';
MySQL建表语句
CREATE TABLE purchase_order_info (
`prch_order_id` bigint COMMENT '采购订单头id',
`exfactory_total_price` DOUBLE COMMENT '出厂价总额',
`insert_time` text COMMENT '数据插入日期',
PRIMARY KEY (`prch_order_id`)
) COMMENT '采购信息';
Python
class Sqoop(Shell):
def sqoop(self, cmd):
return self.sh_cmd_and_alert(' '.join(cmd.split()))
def sqoop_export(self, mysql_tb, export_dir, columns='', update_mode='allowinsert', update_key='prch_order_id'):
""" --columns缺省默认是全部列;建议加上,避免一些莫名其妙的bug --update-mode缺省默认是updateonly,可改为allowinsert --update-key是用于更新的锚列;多个列用逗号分隔 """
return self.sqoop(r''' {sqoop} export --connect jdbc:mysql://{host}:{port}/{database} --username '{username}' --password '{password}' --table {table} --num-mappers 1 --input-fields-terminated-by '\001' --input-null-string '\\N' --input-null-non-string '\\N' --export-dir '{export_dir}' {columns} '''.format(
sqoop=self.get('sqoop', 'sqoop'),
host=self.get('mysql_host', 'localhost'),
port=self.get('mysql_port', '3306'),
database=self['mysql_db'],
username=self.get('mysql_user', 'root'),
password=self['mysql_pwd'],
table=mysql_tb,
export_dir=export_dir,
columns=columns,
))
from re import findall
s = get_sqoop()
for ads_ddl in read_sql_file('ADS层建表.sql').split(';')[:-1]:
columns = '--columns ' + ','.join(findall('`([^`]+)`', ads_ddl))
hive_tb = findall(r'CREATE EXTERNAL TABLE (\S+)', ads_ddl)[0]
mysql_tb = hive_tb.replace('ads_', '')
print(s.sqoop_export(mysql_tb, EXPORT_DIR_PREFIX + hive_tb, columns))
因为是你
边栏推荐
- Enter the top six! Boyun's sales ranking in China's cloud management software market continues to rise
- mysql索引和事务
- 【2022 ACTF-wp】
- (C语言)八进制转换十进制
- Map和Set
- Natural language processing series (I) -- RNN Foundation
- Take you ten days to easily finish the finale of go micro services (distributed transactions)
- 基于Arduino和ESP8266的连接手机热点实验(成功)
- Differences between nodes and sharding in ES cluster
- Orb-slam2 data sharing and transmission between different threads
猜你喜欢
jenkins 凭证管理
YYGH-9-预约下单
MSI announced that its motherboard products will cancel all paper accessories
SVO2系列之深度滤波DepthFilter
Jenkins voucher management
自然语言处理系列(三)——LSTM
GGPUBR: HOW TO ADD ADJUSTED P-VALUES TO A MULTI-PANEL GGPLOT
动态内存(进阶四)
Thesis translation: 2022_ PACDNN: A phase-aware composite deep neural network for speech enhancement
Larvel modify table fields
随机推荐
YYGH-BUG-04
Leetcode14 longest public prefix
Dynamic memory (advanced 4)
jenkins 凭证管理
字符串回文hash 模板题 O(1)判字符串是否回文
On data preprocessing in sklearn
[C language] Yang Hui triangle, customize the number of lines of the triangle
XSS labs master shooting range environment construction and 1-6 problem solving ideas
[QT] Qt development environment installation (QT version 5.14.2 | QT download | QT installation)
Implementation of address book (file version)
还不会安装WSL 2?看这一篇文章就够了
Natural language processing series (III) -- LSTM
How to Easily Create Barplots with Error Bars in R
Filtre de profondeur de la série svo2
SCM power supply
HOW TO ADD P-VALUES ONTO A GROUPED GGPLOT USING THE GGPUBR R PACKAGE
多文件程序X32dbg动态调试
Leetcode14 最长公共前缀
CMake交叉编译
php 二维、多维 数组打乱顺序,PHP_php打乱数组二维数组多维数组的简单实例,php中的shuffle函数只能打乱一维