Data warehouse layered design and data synchronization (notes, 220728)
2022-07-29 08:16:00 【Ah, six six six】

TODO: draw the technical architecture diagram.
Hadoop containers: the most likely problem is that a process did not start successfully. Check the web UIs:
50070: HDFS NameNode web UI (Hadoop 2.x default port)
8088: YARN ResourceManager web UI



Synchronization modes: full synchronization, full overwrite, incremental synchronization (new records only), new-and-updated synchronization
Table types: snapshot table, full table, incremental table, zipper table

> >>
Overwrite and append
Redirection: point a command's input or output somewhere else
>: output redirection (overwrites the target file; >> appends to it)
<: input redirection
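
The difference between the redirection operators above can be checked on a throwaway file. This is a minimal sketch; the variable names (`demo_file`, `line_count`, `file_content`) are made up for illustration.

```shell
#!/usr/bin/env bash
# Overwrite (>) versus append (>>) on a temporary file.
demo_file=$(mktemp)

echo "first run"  > "${demo_file}"    # > truncates the file, then writes
echo "second run" > "${demo_file}"    # the previous content is gone
echo "third run" >> "${demo_file}"    # >> keeps existing content and appends

# < feeds the file to the command's standard input
line_count=$(wc -l < "${demo_file}" | tr -d ' ')
file_content=$(cat "${demo_file}")

echo "lines: ${line_count}"
echo "${file_content}"
rm -f "${demo_file}"
```

Only the second and third lines survive, because the second `>` discarded what the first one wrote.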

ephemeral: short-lived, temporary
The file itself counts as one replica

combiner: map-side pre-aggregation (Spark also pre-aggregates on the map side)
join: a reduce-side join is done during the shuffle (shuffle join)

Supported constraints:

Primary key, unique, not null, foreign key, default value

Dimension tables do not hold much data
Dimensions: little data, and changes seldom occur,
so each synchronization overwrites the table in full



Dimension degeneration: fold the dimension into the fact table
Not every dimension can be degenerated
Purpose: fewer dimension tables and fewer joins, which improves performance
Drawback: more redundancy
A cascading hierarchy such as province / city / county / township cannot be degenerated

Dimensional modeling process: start with business research
Select the business process: business research, data research

-m: number of map tasks Sqoop runs in parallel (-m 1 means a single mapper)

Ask the administrator for the connection address

--fields-terminated-by "\001" \
\001 (Ctrl-A) is Hive's default field separator
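
Because `\001` is invisible in a terminal, it helps to see how a delimited record is split. The sketch below fabricates one sample record (the values and the `emit_record` helper are hypothetical) and reads it back with standard tools.

```shell
#!/usr/bin/env bash
# emit_record is a hypothetical helper that prints one \001-delimited line.
emit_record() {
  printf 'C001\001Beijing\0012021-01-01\n'
}

# $'\001' is bash ANSI-C quoting for the single byte 0x01 (Ctrl-A).
second_field=$(emit_record | cut -d $'\001' -f 2)
echo "second field: ${second_field}"

# Replace the invisible delimiter with | to inspect the record by eye.
visible=$(emit_record | tr '\001' '|')
echo "${visible}"
```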

Hive: builds a mapping between an HDFS directory and a Hive table
location: specifies the HDFS path that the Hive table maps to
Not specified: defaults to a directory under /user/hive/warehouse
Specified: the table's HDFS directory is the given directory
Purpose: the directory that stores the table's data
Query: Hive simply reads the mapped HDFS directory

Automatic table creation relies on the schema file generated by Sqoop
The precondition is that the Oracle table structure can be used by Hive
Sqoop converts the types automatically, and Hive supports this (Avro) format
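
As a rough illustration of the two notes above (the `location` mapping and table creation from a Sqoop-generated schema), the sketch below only prints a Hive DDL statement; it does not run Hive. The `build_ddl` helper, the database name, and every path are assumptions for illustration, not the layout used in the notes.

```shell
#!/usr/bin/env bash
# Print (not execute) a Hive DDL statement for one Avro-backed table.
# All names and paths passed in below are hypothetical placeholders.
build_ddl() {
  local db=$1 table=$2 data_dir=$3 schema_file=$4
  cat <<EOF
CREATE EXTERNAL TABLE IF NOT EXISTS ${db}.${table}
STORED AS AVRO
LOCATION '${data_dir}'
TBLPROPERTIES ('avro.schema.url'='${schema_file}');
EOF
}

ddl=$(build_ddl demo_ods demo_table /data/demo/demo_table /data/demo/avsc/DEMO_TABLE.avsc)
echo "${ddl}"
```

With `avro.schema.url` set, Hive reads the column definitions from the schema file, which is why no column list appears in the statement.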



Purpose: store the data that was read in a variable
Linux: by default, both input and output go through the command line (terminal)
To avoid printing to the terminal, use output redirection
linux Next :x????????

^^: converts the table name to uppercase, as in ${p^^}
Backslash: escape character
Partitioned table
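
The `^^` expansion and backslash escaping noted above can be tried in isolation. A small sketch, assuming bash 4.0 or newer; the table name is a made-up example.

```shell
#!/usr/bin/env bash
# ${var^^} upper-cases the whole value (bash 4.0+).
table_name="demo_base_areas"   # hypothetical lower-case name from a table list
upper_name="${table_name^^}"
echo "${table_name} -> ${upper_name}"

# A backslash stops the shell from expanding the $ that follows it.
literal="\${table_name^^}"
echo "${literal}"
```

Upper-casing matters here because Oracle stores unquoted table names in uppercase, while the table list may be written in lowercase.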



The order of the options does not matter
--outdir: specifies where the generated Java file and schema file are stored
What is in the .java file?
The code for the MapReduce job
Running 101 tables with a 30s sleep between each takes about one to two hours
101 schema files + 1 backup archive
Python is often used to write data-processing programs; scheduling scripts are generally written in shell

cur_time=`date "+%F %T"`
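
The `date` format above is what stamps each log entry. A minimal sketch of the same idea; `log_line` is a hypothetical helper, not something from the course material.

```shell
#!/usr/bin/env bash
# %F expands to YYYY-MM-DD and %T to HH:MM:SS.
cur_time=$(date "+%F %T")
echo "${cur_time}"

# log_line is a hypothetical helper that prefixes a message with a timestamp.
log_line() {
  printf '%s: %s\n' "$(date "+%F %T")" "$1"
}

entry=$(log_line "import started")
echo "${entry}"
```

`$(...)` does the same job as the backticks but nests more cleanly.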
#!/usr/bin/env bash
# /bin/bash
# Full import: one Sqoop job per table listed in full_import_tables.txt
biz_date=20210101
biz_fmt_date=2021-01-01
dw_parent_dir=/data/dw/ods/one_make/full_imp
workhome=/opt/sqoop/one_make
full_imp_tables=${workhome}/full_import_tables.txt
mkdir -p ${workhome}/log
orcl_srv=oracle.bigdata.cn
orcl_port=1521
orcl_sid=helowin
orcl_user=ciss
orcl_pwd=123456
sqoop_import_params="sqoop import -Dmapreduce.job.user.classpath.first=true --outdir ${workhome}/java_code --as-avrodatafile"
sqoop_jdbc_params="--connect jdbc:oracle:thin:@${orcl_srv}:${orcl_port}:${orcl_sid} --username ${orcl_user} --password ${orcl_pwd}"
# load hadoop/sqoop env
source /etc/profile
while read p; do
# parallel execution import: & runs each Sqoop job in the background
${sqoop_import_params} ${sqoop_jdbc_params} --target-dir ${dw_parent_dir}/${p}/${biz_date} --table ${p^^} -m 1 &
# record the submitted command in the log with a timestamp
cur_time=`date "+%F %T"`
echo "${cur_time}: ${sqoop_import_params} ${sqoop_jdbc_params} --target-dir ${dw_parent_dir}/${p}/${biz_date} --table ${p} -m 1 &" >> ${workhome}/log/${biz_fmt_date}_full_imp.log
sleep 30
done < ${full_imp_tables}
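
The loop pattern in the script above (read a table list line by line, start one background job per table) can be tried without a cluster. In this dry-run sketch `fake_import` is a hypothetical stand-in for the real Sqoop command, and the final `wait` is an addition that the original script does not have.

```shell
#!/usr/bin/env bash
# Dry run of the loop pattern: one background job per table, then wait.
table_list=$(mktemp)
printf '%s\n' table_a table_b table_c > "${table_list}"
result_dir=$(mktemp -d)

# fake_import is a hypothetical stand-in for the sqoop import command.
fake_import() {
  local table=$1
  echo "imported ${table^^}" > "${result_dir}/${table}.done"
}

while read -r p; do
  [ -z "${p}" ] && continue     # skip blank lines in the table list
  fake_import "${p}" &          # & sends the job to the background
done < "${table_list}"

wait                            # block until every background job has finished
done_count=$(ls "${result_dir}" | wc -l | tr -d ' ')
echo "finished jobs: ${done_count}"
rm -rf "${table_list}" "${result_dir}"
```

Without `wait`, the script can exit while imports are still running, so anything that runs afterwards should not assume the data has landed.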

$?: a variable maintained by the shell that holds the exit status of the last command. If there was no error it is 0, so $? fetches the flag that says whether the last command failed, and the if statement then compares it with 0.
-p, --parents: create parent directories as needed; it is not an error if the directory already exists
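
Both notes above can be seen together in a short sketch: `mkdir -p` succeeds even when the directory exists, plain `mkdir` does not, and `$?` reports the difference. The directory layout is made up for the demo.

```shell
#!/usr/bin/env bash
base_dir=$(mktemp -d)            # throwaway directory for the demo
log_dir="${base_dir}/one/two/log"

mkdir -p "${log_dir}"            # creates one/, two/ and log/ in a single call
first_status=$?

mkdir -p "${log_dir}"            # already exists: still not an error with -p
second_status=$?

# Without -p an existing directory is an error; capture the non-zero status.
third_status=0
mkdir "${log_dir}" 2>/dev/null || third_status=$?

if [ "${third_status}" -ne 0 ]; then
  echo "plain mkdir failed with status ${third_status}"
fi
echo "statuses: ${first_status} ${second_status} ${third_status}"
rm -rf "${base_dir}"
```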
backup: a backup copy (of files etc.); reinforcements
preview
Draw a technical architecture ????

Review the md notes or the video?


Restart YARN, then restart the Spark ThriftServer

When there is time, automate building the test database with a shell script?
#!/usr/bin/env bash
# /bin/bash
biz_date=20210101
biz_fmt_date=2021-01-01
dw_parent_dir=/data/dw/ods/one_make/test_full_imp
workhome=/opt/datas/shell
full_imp_tables=${workhome}/test_full_table.txt
mkdir -p ${workhome}/log
orcl_srv=oracle.bigdata.cn
orcl_port=1521
orcl_sid=helowin
orcl_user=ciss
orcl_pwd=123456
sqoop_import_params="sqoop import -Dmapreduce.job.user.classpath.first=true --outdir ${workhome}/java_code --as-avrodatafile"
sqoop_jdbc_params="--connect jdbc:oracle:thin:@${orcl_srv}:${orcl_port}:${orcl_sid} --username ${orcl_user} --password ${orcl_pwd}"
# load hadoop/sqoop env
source /etc/profile
while read p; do
# parallel execution import
${sqoop_import_params} ${sqoop_jdbc_params} --target-dir ${dw_parent_dir}/${p}/${biz_date} --table ${p^^} -m 1 &
cur_time=`date "+%F %T"`
echo "${cur_time}: ${sqoop_import_params} ${sqoop_jdbc_params} --target-dir ${dw_parent_dir}/${p}/${biz_date} --table ${p} -m 1 &" >> ${workhome}/log/${biz_fmt_date}_full_imp.log
sleep 30
done < ${full_imp_tables}

When there is time, go through a beginner tutorial on shell syntax?