Data Warehouse Layered Design and Data Synchronization Issues, 220728
2022-07-29 08:08:00 【啊六六六】

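TODO: draw the technical architecture diagram.
Hadoop containers: the most common problem is a process that fails to start.
50070: HDFS NameNode web UI port (Hadoop 2.x)
8088: YARN ResourceManager web UI port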



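Sync modes: full sync, full overwrite, incremental sync (new rows only), incremental sync (new and updated rows)
Table types: snapshot table, full table, incremental table, zipper table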

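Overwrite versus append: > overwrites the target file, >> appends to it.
Redirection: redefining where input comes from or where output goes.
Output redirection: >
Input redirection: <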
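
The difference between the three operators is easy to check on a scratch file. The following is a minimal sketch; the temporary directory and file name are made up for the demonstration.

```shell
#!/usr/bin/env bash
# Scratch file under a temporary directory (hypothetical path for the demo).
tmp_dir=$(mktemp -d)
demo_file="${tmp_dir}/redirect_demo.txt"

echo "first"  > "${demo_file}"    # > creates the file or overwrites it
echo "second" > "${demo_file}"    # the previous content is replaced
echo "third" >> "${demo_file}"    # >> appends to the end of the file

# < feeds the file to the command's standard input.
line_count=$(wc -l < "${demo_file}")
line_count=$((line_count))        # drop padding that some wc versions add
first_line=$(head -n 1 < "${demo_file}")

echo "lines=${line_count} first=${first_line}"
rm -rf "${tmp_dir}"
```

Only two lines survive because the second echo overwrote the first one before the third was appended.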

ephemeral: short-lived, temporary
The file itself counts as one replica.

Combiner: comparable to map-side pre-aggregation in Spark.
Join: a shuffle join performed on the reduce side.

Support

Constraints

Primary key, unique, not null, foreign key, default value

Dimension tables do not hold much data.
Dimensions: small data volume, rarely change.
So they are overwritten in full on every sync.



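Dimension degeneration: folding a dimension's attributes into the fact table.
Not every dimension can be degenerated.
Purpose: fewer dimension tables and fewer joins, which improves query performance.
Drawback: more redundancy.
Cascading region hierarchies (province, city, county, township) cannot be degenerated.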
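
To make the trade-off concrete, here is a rough sketch that uses plain CSV files in place of warehouse tables. The file names, columns, and rows are all invented for illustration. It copies the product name from a small dimension file into every fact row, which removes the later join at the cost of repeating the name.

```shell
#!/usr/bin/env bash
tmp_dir=$(mktemp -d)

# Dimension file: product_id,product_name (made-up rows).
printf '1,keyboard\n2,monitor\n' > "${tmp_dir}/dim_product.csv"
# Fact file: order_id,product_id,amount (made-up rows).
printf '9001,1,35\n9002,2,120\n9003,1,70\n' > "${tmp_dir}/fact_order.csv"

# First file: load the dimension into an array keyed by product_id.
# Second file: append the matching product name to each fact row.
awk -F ',' -v OFS=',' '
  NR == FNR { name[$1] = $2; next }
  { print $0, name[$2] }
' "${tmp_dir}/dim_product.csv" "${tmp_dir}/fact_order.csv" \
  > "${tmp_dir}/fact_order_wide.csv"

wide_first=$(head -n 1 "${tmp_dir}/fact_order_wide.csv")
wide_rows=$(wc -l < "${tmp_dir}/fact_order_wide.csv")
wide_rows=$((wide_rows))

cat "${tmp_dir}/fact_order_wide.csv"
rm -rf "${tmp_dir}"
```

The widened file repeats "keyboard" on two rows, which is the redundancy mentioned above.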

Dimensional modeling process: starts with business research.
Select the business process: business research and data research.

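-m: number of map tasks Sqoop uses for the import (short form of --num-mappers)

Ask the administrator for the connection address.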

--fields-terminated-by "\001" \
\001 is Hive's default field delimiter.
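
Because \001 (Ctrl-A) is invisible in a terminal, it helps to see how a record that uses it can be built and split. This is a small sketch with invented field values; it does not touch Hive or Sqoop.

```shell
#!/usr/bin/env bash
delim=$'\001'    # the Ctrl-A byte, written with bash ANSI-C quoting

# Build one record with three fields separated by \001 (made-up values).
record=$(printf '%s\001%s\001%s' "1001" "Alice" "Beijing")

# cut splits on the same byte and returns the second field.
second_field=$(printf '%s\n' "${record}" | cut -d "${delim}" -f 2)

# Swap the invisible delimiter for a pipe to make the record readable.
readable=$(printf '%s\n' "${record}" | tr '\001' '|')

echo "${second_field}"
echo "${readable}"
```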

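Hive: builds a mapping between an HDFS directory and a Hive table
location: specifies the HDFS path that backs the Hive table
Not specified: defaults to a directory under /user/hive/warehouse
Specified: the table's HDFS directory is exactly the given directory
Purpose: the directory that stores the table's data
Query: Hive reads the mapped HDFS directory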

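Automated table creation depends on the schema files that Sqoop generates.
This rests on a precondition: the Oracle table structure must be usable in Hive.
Sqoop converts the types automatically, and Hive supports the resulting format.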
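
One way the schema file and the location setting could come together is sketched below. The helper build_avro_ddl, the table name, and both HDFS paths are hypothetical; only STORED AS AVRO, LOCATION, and the avro.schema.url table property are Hive features. The function only prints the statement and runs nothing against Hive.

```shell
#!/usr/bin/env bash
# Print a Hive DDL statement for one table; nothing is executed against Hive.
# All names and paths passed in are hypothetical examples.
build_avro_ddl() {
  local table_name=$1    # Hive table name
  local data_dir=$2      # HDFS directory holding the imported Avro files
  local schema_url=$3    # HDFS path of the .avsc schema file from Sqoop
  cat <<EOF
CREATE EXTERNAL TABLE IF NOT EXISTS ${table_name}
STORED AS AVRO
LOCATION '${data_dir}'
TBLPROPERTIES ('avro.schema.url'='${schema_url}');
EOF
}

ddl=$(build_avro_ddl "ods_demo_table" \
  "/data/dw/ods/one_make/full_imp/demo_table" \
  "/data/dw/ods/one_make/avsc/DEMO_TABLE.avsc")
echo "${ddl}"
```

Since the columns come from the schema file, the statement carries no column list, which is what makes generating it in a loop over many tables practical.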



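read: reads input and stores it in a variable
Linux: by default, input and output both go through the terminal
To keep output off the terminal, use output redirection
On Linux: x????????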
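
A short sketch of read combined with input redirection follows. The temporary file and the two table names are placeholders, not names from the real table list.

```shell
#!/usr/bin/env bash
tmp_file=$(mktemp)
printf 'table_a\ntable_b\n' > "${tmp_file}"    # placeholder table names

# read takes one line from standard input and stores it in the variable.
read -r first_table < "${tmp_file}"

# In a loop, the redirection after done feeds the file line by line.
table_count=0
while read -r table_name; do
  table_count=$((table_count + 1))
  echo "table ${table_count}: ${table_name}"
done < "${tmp_file}"

rm -f "${tmp_file}"
```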

^^: converts the table name to uppercase, as in ${p^^}
Backslash: the escape character
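
The case conversion can be tried on its own. This sketch assumes bash 4 or newer, and the table name is a made-up example.

```shell
#!/usr/bin/env bash
# Requires bash 4 or newer; the table name is a made-up example.
p="demo_table"
upper=${p^^}        # every character converted to uppercase
lower=${upper,,}    # the reverse: every character converted to lowercase
echo "${p} -> ${upper} -> ${lower}"
```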
Partitioned tables



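The options can be given in any order.
--outdir: the directory where the generated Java files and schema files are stored
What is inside the .java files?
The code of the MapReduce job that runs the import.
Running the 101 files, with a 30 s sleep after each, takes one to two hours.
101 schema files + 1 backup archive
Python is used more for data-processing programs; scheduling scripts are usually written in shell.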

cur_time=`date "+%F %T"`
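
For reference, %F expands to the date as YYYY-MM-DD and %T to the time as HH:MM:SS. The sketch below wraps the same call in a small logging helper; log_line is a hypothetical name, not part of the original script.

```shell
#!/usr/bin/env bash
# %F expands to YYYY-MM-DD and %T to HH:MM:SS.
cur_time=$(date "+%F %T")

# Hypothetical helper: prefix a message with the current timestamp.
log_line() {
  echo "$(date "+%F %T"): $*"
}

sample=$(log_line "import started")
echo "${sample}"
```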
#!/usr/bin/env bash
# /bin/bash
biz_date=20210101
biz_fmt_date=2021-01-01
dw_parent_dir=/data/dw/ods/one_make/full_imp
workhome=/opt/sqoop/one_make
full_imp_tables=${workhome}/full_import_tables.txt
# -p: create parent directories as needed, no error if the directory exists
mkdir -p ${workhome}/log
orcl_srv=oracle.bigdata.cn
orcl_port=1521
orcl_sid=helowin
orcl_user=ciss
orcl_pwd=123456
sqoop_import_params="sqoop import -Dmapreduce.job.user.classpath.first=true --outdir ${workhome}/java_code --as-avrodatafile"
sqoop_jdbc_params="--connect jdbc:oracle:thin:@${orcl_srv}:${orcl_port}:${orcl_sid} --username ${orcl_user} --password ${orcl_pwd}"
# load hadoop/sqoop env
source /etc/profile
while read p; do
# parallel execution import
${sqoop_import_params} ${sqoop_jdbc_params} --target-dir ${dw_parent_dir}/${p}/${biz_date} --table ${p^^} -m 1 &
# record the launch time and the command in the log
cur_time=`date "+%F %T"`
echo "${cur_time}: ${sqoop_import_params} ${sqoop_jdbc_params} --target-dir ${dw_parent_dir}/${p}/${biz_date} --table ${p} -m 1 &" >> ${workhome}/log/${biz_fmt_date}_full_imp.log
sleep 30
done < ${full_imp_tables}
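
The script launches each import in the background and moves on after 30 seconds without learning whether the import succeeded. A possible extension, sketched here with a stand-in function instead of a real sqoop call, is to remember each background PID and collect its exit status with wait. The function fake_import and the table names are invented for the demonstration.

```shell
#!/usr/bin/env bash
# Stand-in for one import; a real script would call sqoop here.
fake_import() {
  sleep 1
  [ "$1" != "bad_table" ]    # fail on purpose for one made-up table name
}

pids=()
names=()
for t in table_a bad_table table_b; do
  fake_import "${t}" &       # run in the background, as the script does
  pids+=("$!")               # $! is the PID of the last background job
  names+=("${t}")
done

failed=()
for i in "${!pids[@]}"; do
  if ! wait "${pids[$i]}"; then    # wait returns that job's exit status
    failed+=("${names[$i]}")
  fi
done
echo "failed: ${failed[*]}"
```

Tables that end up in failed could then be retried or written to the log.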

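$? is a shell variable that holds the exit status of the previous command: 0 if it succeeded, non-zero if it failed. Reading $? therefore tells you whether the last command failed, and the if statement compares that value with 0.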
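
A minimal sketch of that check is shown below. The helper name check_status is made up, and sh -c 'exit 3' only serves as a command with a known failing status.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command and report its exit status.
check_status() {
  "$@"                  # run the command passed as arguments
  local status=$?       # capture at once; the next command overwrites $?
  if [ "${status}" -eq 0 ]; then
    echo "ok"
  else
    echo "failed with status ${status}"
  fi
}

ok_msg=$(check_status true)
fail_msg=$(check_status sh -c 'exit 3')
echo "${ok_msg}"
echo "${fail_msg}"
```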
mkdir -p, --parents: create parent directories as needed; an existing directory is not treated as an error
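
The behaviour is easy to confirm in a temporary directory. In this sketch the nested path is a made-up example.

```shell
#!/usr/bin/env bash
tmp_root=$(mktemp -d)
log_dir="${tmp_root}/one_make/log"    # hypothetical nested path

mkdir -p "${log_dir}"                 # creates one_make and log together
first_status=$?
mkdir -p "${log_dir}"                 # already exists: still succeeds
second_status=$?
mkdir "${log_dir}" 2>/dev/null        # without -p this is an error
plain_status=$?

echo "first=${first_status} second=${second_status} plain=${plain_status}"
rm -rf "${tmp_root}"
```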
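backup: a spare copy (of files and the like); also support, reinforcement
preview
TODO: draw the technical architecture diagram.

TODO: go back over the review md notes or the videos.


Restart YARN, then restart the Spark ThriftServer.

TODO: when time permits, write a shell script that automates building the test database.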
#!/usr/bin/env bash
# /bin/bash
biz_date=20210101
biz_fmt_date=2021-01-01
dw_parent_dir=/data/dw/ods/one_make/test_full_imp
workhome=/opt/datas/shell
full_imp_tables=${workhome}/test_full_table.txt
mkdir -p ${workhome}/log
orcl_srv=oracle.bigdata.cn
orcl_port=1521
orcl_sid=helowin
orcl_user=ciss
orcl_pwd=123456
sqoop_import_params="sqoop import -Dmapreduce.job.user.classpath.first=true --outdir ${workhome}/java_code --as-avrodatafile"
sqoop_jdbc_params="--connect jdbc:oracle:thin:@${orcl_srv}:${orcl_port}:${orcl_sid} --username ${orcl_user} --password ${orcl_pwd}"
# load hadoop/sqoop env
source /etc/profile
while read p; do
# parallel execution import
${sqoop_import_params} ${sqoop_jdbc_params} --target-dir ${dw_parent_dir}/${p}/${biz_date} --table ${p^^} -m 1 &
cur_time=`date "+%F %T"`
echo "${cur_time}: ${sqoop_import_params} ${sqoop_jdbc_params} --target-dir ${dw_parent_dir}/${p}/${biz_date} --table ${p} -m 1 &" >> ${workhome}/log/${biz_fmt_date}_full_imp.log
sleep 30
done < ${full_imp_tables}

TODO: when time permits, read through the Runoob (菜鸟) shell syntax tutorial.