Data warehouse layered design and data synchronization (notes, 220728)

TODO: draw the technical architecture diagram.
Hadoop containers: the most common problem is that a daemon process did not start successfully. Check the web UIs:
50070 (HDFS NameNode)
8088 (YARN ResourceManager)
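A minimal check sketch (hostname and ports assume the Hadoop 2.x defaults; adjust to your container setup):
jps | grep -E 'NameNode|DataNode|ResourceManager|NodeManager'    # are the daemons up?
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:50070  # HDFS NameNode web UI
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8088   # YARN ResourceManager web UI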



Synchronization strategies: full sync, full overwrite, new-records-only sync, new-and-updated sync.
Table types: snapshot table, full table, incremental table, zipper table (a table that keeps the change history of each record).

> and >>: overwrite vs. append.
Redirection: point input or output at a new destination.
>: output redirection.
<: input redirection.
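A quick illustration (the file name is arbitrary):
echo "first"  > out.txt     # > overwrites out.txt
echo "second" >> out.txt    # >> appends to out.txt
wc -l < out.txt             # < feeds out.txt to wc's standard input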

ephemeral: lasting a short time.
In HDFS the file itself counts as one replica (a replication factor of 3 means the original block plus two copies).

combiner: map-side pre-aggregation in MapReduce, analogous to Spark's map-side combine.
join: a shuffle join is carried out on the reduce side.

Supported constraints: primary key, unique, not null, foreign key, default value.

The data volume of a dimension table is not that large.
Dimensions: little data, and changes rarely occur.
Each sync therefore overwrites the table in full.
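In Sqoop, a full overwrite can be expressed by deleting the target directory before each import; a hedged sketch (the table name is illustrative; connection details are the ones configured later in these notes):
sqoop import \
  --connect jdbc:oracle:thin:@oracle.bigdata.cn:1521:helowin \
  --username ciss --password 123456 \
  --table CISS_BASE_AREAS \
  --delete-target-dir \
  --target-dir /data/dw/ods/one_make/full_imp/ciss_base_areas \
  -m 1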



Dimension degeneration: fold the dimension's attributes into the fact table.
Not every dimension can be degenerated.
The purpose of degeneration: fewer dimension tables and fewer joins, which improves performance.
Drawback of degeneration: increased redundancy.
A multi-level linked dimension such as province/city/county/township cannot be degenerated.

Dimensional modeling process: select the business process, based on business research and data research.

-m: the number of parallel map tasks Sqoop uses for an import (the scripts below use -m 1).

Get the connection address from the database administrator.

--fields-terminated-by "\001" \
"\001" is Hive's default field separator.
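In context the flag belongs to a text-format import; a hedged sketch (table name illustrative; connection details as configured later in these notes):
sqoop import \
  --connect jdbc:oracle:thin:@oracle.bigdata.cn:1521:helowin \
  --username ciss --password 123456 \
  --table CISS_BASE_AREAS \
  --target-dir /data/dw/ods/one_make/full_imp/ciss_base_areas \
  --as-textfile \
  --fields-terminated-by "\001" \
  -m 1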

Hive: builds a mapping between HDFS and Hive tables.
location: specifies the HDFS path a Hive table maps to.
If not specified, it defaults to /user/hive/warehouse.
If specified, the table maps to the given HDFS directory.
Purpose: the directory stores the Hive table's data.
Querying: Hive simply reads the mapped HDFS directory.
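A minimal sketch (database, table, columns, and path are hypothetical) of mapping a table onto an HDFS directory:
hive -e "
CREATE EXTERNAL TABLE ods.ciss_demo (id STRING, name STRING)
STORED AS TEXTFILE
LOCATION '/data/dw/ods/one_make/full_imp/ciss_demo';
"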

Automatic table creation depends on the Schema file Sqoop generates.
The premise is that the Oracle table structure can be expressed in Hive; Sqoop converts the types automatically, and Hive supports the resulting (Avro) format.
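A hedged sketch (database, table, and paths are hypothetical) of creating the Hive table from a Sqoop-generated Avro schema file:
hive -e "
CREATE EXTERNAL TABLE ods.ciss_demo_avro
STORED AS AVRO
LOCATION '/data/dw/ods/one_make/full_imp/ciss_demo'
TBLPROPERTIES ('avro.schema.url'='hdfs:///data/dw/ods/one_make/avsc/CISS_DEMO.avsc');
"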



read: puts the data it reads into a variable.
Linux: by default, both input and output are the command line (the terminal).
If you don't want output on the command line, use output redirection.
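A tiny example of read filling a variable, with its input redirected from a file:
echo "ciss_base_areas" > tables.txt
read first_table < tables.txt     # read takes its input from the file, not the terminal
echo "${first_table}"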

${p^^}: converts the table name to uppercase.
Backslash: the escape character.
Partitioned tables.
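For example, Bash's case-conversion expansion (as used in the scripts below):
p=ciss_base_areas
echo "${p^^}"   # CISS_BASE_AREAS — all characters uppercased, as Oracle stores table names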



The command-line options do not have to appear in any particular order.
--outdir: specifies where the generated .java files and Schema files are stored.
What is in the .java file? The code that MapReduce executes (the record class Sqoop generates for the table).
Running all 101 tables with a 30s sleep between launches takes one or two hours.
Result: 101 Schema files + 1 compressed backup file.
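The backup could be produced with tar; a sketch assuming the generated files sit in ${workhome}/java_code:
cd ${workhome} && tar -czf java_code_${biz_date}.tar.gz java_code/   # one compressed backup of the generated files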
Data-processing programs are mostly written in Python; scheduling scripts generally use shell.

cur_time=`date "+%F %T"`    # %F = full date (e.g. 2021-01-01), %T = time (e.g. 08:16:00)
#!/usr/bin/env bash
# /bin/bash
biz_date=20210101
biz_fmt_date=2021-01-01
dw_parent_dir=/data/dw/ods/one_make/full_imp
workhome=/opt/sqoop/one_make
full_imp_tables=${workhome}/full_import_tables.txt
mkdir -p ${workhome}/log
orcl_srv=oracle.bigdata.cn
orcl_port=1521
orcl_sid=helowin
orcl_user=ciss
orcl_pwd=123456
sqoop_import_params="sqoop import -Dmapreduce.job.user.classpath.first=true --outdir ${workhome}/java_code --as-avrodatafile"
sqoop_jdbc_params="--connect jdbc:oracle:thin:@${orcl_srv}:${orcl_port}:${orcl_sid} --username ${orcl_user} --password ${orcl_pwd}"
# load hadoop/sqoop env
source /etc/profile
while read p; do
    # parallel execution import
    ${sqoop_import_params} ${sqoop_jdbc_params} --target-dir ${dw_parent_dir}/${p}/${biz_date} --table ${p^^} -m 1 &
    # log the launched command
    cur_time=`date "+%F %T"`
    echo "${cur_time}: ${sqoop_import_params} ${sqoop_jdbc_params} --target-dir ${dw_parent_dir}/${p}/${biz_date} --table ${p} -m 1 &" >> ${workhome}/log/${biz_fmt_date}_full_imp.log
    sleep 30
done < ${full_imp_tables}

$? is a built-in Linux shell variable that records whether the last command failed: it is 0 when there was no error. Reading $? retrieves that value, i.e. the success/failure flag of the last command, which the if statement then compares with 0.
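For example:
ls /nonexistent 2> /dev/null      # a command that fails
if [ $? -ne 0 ]; then
    echo "the previous command failed"
fi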
-p, --parents: create parent directories as needed; an existing directory is not treated as an error.
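For example:
mkdir -p /opt/sqoop/one_make/log   # succeeds even if the directory already exists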
backup: a reserve copy (of files etc.).
preview
TODO: draw the technical architecture diagram.

TODO: go back over the md notes or the video.


Restart YARN, then restart the Spark ThriftServer.
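A hedged sketch (script locations and the master URL are assumptions about this environment):
stop-yarn.sh && start-yarn.sh                            # restart YARN
${SPARK_HOME}/sbin/stop-thriftserver.sh                  # stop the Spark ThriftServer
${SPARK_HOME}/sbin/start-thriftserver.sh --master yarn   # start it again on YARN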

TODO: when there's time, write a shell script to automate building the test database.
#!/usr/bin/env bash
# /bin/bash
biz_date=20210101
biz_fmt_date=2021-01-01
dw_parent_dir=/data/dw/ods/one_make/test_full_imp
workhome=/opt/datas/shell
full_imp_tables=${workhome}/test_full_table.txt
mkdir -p ${workhome}/log
orcl_srv=oracle.bigdata.cn
orcl_port=1521
orcl_sid=helowin
orcl_user=ciss
orcl_pwd=123456
sqoop_import_params="sqoop import -Dmapreduce.job.user.classpath.first=true --outdir ${workhome}/java_code --as-avrodatafile"
sqoop_jdbc_params="--connect jdbc:oracle:thin:@${orcl_srv}:${orcl_port}:${orcl_sid} --username ${orcl_user} --password ${orcl_pwd}"
# load hadoop/sqoop env
source /etc/profile
while read p; do
    # parallel execution import
    ${sqoop_import_params} ${sqoop_jdbc_params} --target-dir ${dw_parent_dir}/${p}/${biz_date} --table ${p^^} -m 1 &
    cur_time=`date "+%F %T"`
    echo "${cur_time}: ${sqoop_import_params} ${sqoop_jdbc_params} --target-dir ${dw_parent_dir}/${p}/${biz_date} --table ${p} -m 1 &" >> ${workhome}/log/${biz_fmt_date}_full_imp.log
    sleep 30
done < ${full_imp_tables}

TODO: when there's time, go through the 菜鸟 (Runoob) shell tutorial.