Zipper table in data warehouse (compressed storage)
2022-07-03 23:40:00 【hzp666】
I. Introduction to zipper tables
1. What is a zipper table
A zipper table records the lifecycle of each piece of information. When the lifecycle of a record ends, a new record is started, and the current date is stored as its effective start date.
If a record is still valid, its effective end date is set to a maximum sentinel value (such as 9999-99-99), as in the following table (Table 1):
[Image: Table 1, an example zipper table]
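To make the lifecycle idea concrete, here is a minimal Python sketch (not part of the original article) that maintains a zipper-style history for a single record. The function name `apply_change` and the `status` field are illustrative assumptions; the open-ended end date reuses the article's '9999-99-99' string.

```python
from datetime import date, timedelta

OPEN_END = "9999-99-99"  # sentinel string used in the article; not a real calendar date


def apply_change(history, new_status, change_date):
    """Close the currently open row (if any) and open a new row starting on change_date."""
    day_before = (date.fromisoformat(change_date) - timedelta(days=1)).isoformat()
    updated = []
    for row in history:
        if row["end_date"] == OPEN_END:
            # The old version stops being valid the day before the change.
            row = {**row, "end_date": day_before}
        updated.append(row)
    # The new version is valid from the change date until further notice.
    updated.append({"status": new_status, "start_date": change_date, "end_date": OPEN_END})
    return updated


history = apply_change([], "unpaid", "2020-01-01")
history = apply_change(history, "paid", "2020-01-02")
for row in history:
    print(row)
```

After the second call the "unpaid" row is closed on 2020-01-01 and the "paid" row is the only one still open.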
2. Why use a zipper table
A zipper table suits data that does change, but where most records stay the same.
For example, an order moves through the states unpaid, paid, not yet shipped, and completed over about a week, and most of the time it does not change. Once the data reaches a certain scale, it cannot be saved in full every day. For example, with 1 billion users × 365 days, one full copy of the user information would be stored per day, so a daily full snapshot is inefficient.
The full snapshot table looks like this (Table 2):
Comparing it with the zipper table (Table 1) shows the advantage of the zipper table.
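The storage argument can be put into rough numbers. The sketch below is a back-of-the-envelope calculation in Python. The user count and day count come from the example above, while the figure of 10 million changed records per day is a made-up assumption for illustration and does not come from the article.

```python
def full_snapshot_rows(n_records, n_days):
    """Rows stored when every record is copied once per day."""
    return n_records * n_days


def zipper_rows(n_records, n_days, changed_per_day):
    """Rows stored in a zipper table: one initial row per record plus one new row per change."""
    return n_records + changed_per_day * (n_days - 1)


n_records = 1_000_000_000     # from the article's example
n_days = 365                  # from the article's example
changed_per_day = 10_000_000  # ASSUMPTION: hypothetical daily change volume

full = full_snapshot_rows(n_records, n_days)
zipper = zipper_rows(n_records, n_days, changed_per_day)
print(f"full snapshots: {full:,} rows")
print(f"zipper table:   {zipper:,} rows")
```

Under these assumed numbers the zipper table holds a small fraction of the rows; the real saving depends entirely on how often records actually change.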
3. How to use a zipper table
Filtering on effective start date <= a given date and effective end date >= that same date returns a full snapshot of the data as of that date.
For example:
select * from dw.t_order_info_his where start_date<='2020-01-01' and end_date>='2020-01-01'
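The same filter can be tried on a handful of in-memory rows. This is a small Python sketch, with sample rows and the helper name `snapshot` chosen for illustration. It relies on the dates being 'YYYY-MM-DD' strings, which compare correctly as plain text, and on '9999-99-99' sorting after any real date.

```python
def snapshot(zipper_rows, as_of):
    """Return the rows that were valid on as_of (start_date <= as_of <= end_date)."""
    return [r for r in zipper_rows if r["start_date"] <= as_of <= r["end_date"]]


# Illustrative rows: order 2 changed status on 2020-01-02.
zipper_rows = [
    {"id": "1", "order_status": "0", "start_date": "2020-01-01", "end_date": "9999-99-99"},
    {"id": "2", "order_status": "0", "start_date": "2020-01-01", "end_date": "2020-01-01"},
    {"id": "2", "order_status": "1", "start_date": "2020-01-02", "end_date": "9999-99-99"},
]

jan1 = [(r["id"], r["order_status"]) for r in snapshot(zipper_rows, "2020-01-01")]
jan2 = [(r["id"], r["order_status"]) for r in snapshot(zipper_rows, "2020-01-02")]
print(jan1)
print(jan2)
```

Each snapshot contains exactly one row per order: the version of the order that was in effect on the requested day.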
4. How a zipper table is formed
[Image: the zipper table formation process]
5. Zipper table production flow
All of the order data up to the current day and the data that changes each day in MySQL are combined into a new temporary zipper table. The temporary zipper table then overwrites the old zipper table. (This works around the fact that rows in a regular, non-transactional Hive table cannot be updated in place.)
6. Zipper table production process at the code level
Step one: Initialize the zipper table
1) Create the original order table dw.t_order_info and insert data (two days of data, 2020-01-01 and 2020-01-02)
create table dw.t_order_info(
`id` string COMMENT 'Order number',
`total_amount` decimal(10,2) COMMENT 'Order amount',
`order_status` string COMMENT 'Order status',
`create_time` string COMMENT 'Creation time',
`operate_time` string COMMENT 'Operation time'
) COMMENT 'Order table'
stored as parquet;
-- Insert into dw.t_order_info, the table created above and read by all later steps
insert into dw.t_order_info (id,total_amount,order_status,create_time,operate_time)values('1',100,'0','2020-01-01','2020-01-01');
insert into dw.t_order_info (id,total_amount,order_status,create_time,operate_time)values('2',100,'0','2020-01-01','2020-01-01');
insert into dw.t_order_info (id,total_amount,order_status,create_time,operate_time)values('3',100,'1','2020-01-01','2020-01-01');
insert into dw.t_order_info (id,total_amount,order_status,create_time,operate_time)values('2',100,'1','2020-01-01','2020-01-02');
insert into dw.t_order_info (id,total_amount,order_status,create_time,operate_time)values('4',100,'1','2020-01-02','2020-01-02');
insert into dw.t_order_info (id,total_amount,order_status,create_time,operate_time)values('5',100,'1','2020-01-02','2020-01-02');
2) Create the zipper table dw.t_order_info_his
create table dw.t_order_info_his(
`id` string COMMENT 'Order number',
`total_amount` decimal(10,2) COMMENT 'Order amount',
`order_status` string COMMENT 'Order status',
`create_time` string COMMENT 'Creation time',
`operate_time` string COMMENT 'Operation time',
`start_date` string COMMENT 'Effective start date',
`end_date` string COMMENT 'Effective end date'
) COMMENT 'Order zipper table'
stored as parquet;
-- Initialize the zipper table (2020-01-01 data)
insert into dw.t_order_info_his
select id,total_amount,order_status,create_time,operate_time,'2020-01-01' as start_date,'9999-99-99' end_date from dw.t_order_info a
where a.operate_time='2020-01-01';
Step two: Build the daily change data (new and modified records); run this every day
1) Select the day's changed records from the original order table dw.t_order_info by operation time
select
*
from dw.t_order_info
where operate_time='2020-01-02'
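The filter above returns new and modified orders in a single result set. As a rough illustration in Python (not from the original article), the two kinds can be told apart by checking whether an id already has a row in the zipper table. The names `split_changes` and `known_ids` are hypothetical.

```python
def split_changes(changed_rows, known_ids):
    """Split a day's changed rows into brand-new orders and modifications of known orders."""
    new_rows = [r for r in changed_rows if r["id"] not in known_ids]
    modified_rows = [r for r in changed_rows if r["id"] in known_ids]
    return new_rows, modified_rows


# Rows with operate_time = '2020-01-02' in the article's sample data.
changed = [
    {"id": "2", "order_status": "1"},
    {"id": "4", "order_status": "1"},
    {"id": "5", "order_status": "1"},
]
known_ids = {"1", "2", "3"}  # ids already present in the zipper table after initialization

new_rows, modified_rows = split_changes(changed, known_ids)
print([r["id"] for r in new_rows])
print([r["id"] for r in modified_rows])
```

The merge in the next step does not need this split, because the left join handles both cases; the sketch only shows what the change set contains.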
Step three: Merge the changed records with the existing history, add the new records, and insert the result into a temporary table
1) Create the temporary zipper table
create table dw.t_order_info_his_tmp(
`id` string COMMENT 'Order number',
`total_amount` decimal(10,2) COMMENT 'Order amount',
`order_status` string COMMENT 'Order status',
`create_time` string COMMENT 'Creation time',
`operate_time` string COMMENT 'Operation time',
`start_date` string COMMENT 'Effective start date',
`end_date` string COMMENT 'Effective end date'
) COMMENT 'Order zipper temporary table'
stored as parquet;
2) Insert the merged change data into the temporary table
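insert overwrite table dw.t_order_info_his_tmp
select * from
(
    -- Part 1: orders added or modified on 2020-01-02 become the new open records
    select
        id,
        total_amount,
        order_status,
        create_time,
        operate_time,
        '2020-01-02' start_date,
        '9999-99-99' end_date
    from dw.t_order_info where operate_time='2020-01-02'
    union all
    -- Part 2: existing zipper rows. An open row whose order changed today is closed
    -- by setting end_date to the day before the change; all other rows are kept as is.
    select oh.id,
        oh.total_amount,
        oh.order_status,
        oh.create_time,
        oh.operate_time,
        oh.start_date,
        if(oi.id is null, oh.end_date, date_add(oi.operate_time,-1)) end_date
    from dw.t_order_info_his oh left join
    (
        select
        *
        from dw.t_order_info
        where operate_time='2020-01-02'
    ) oi
    -- Only rows that are still open (end_date = '9999-99-99') can match a change
    on oh.id=oi.id and oh.end_date='9999-99-99'
)his
order by his.id, start_date;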
Step four: Overwrite the zipper table with the temporary table
insert overwrite table dw.t_order_info_his
select * from dw.t_order_info_his_tmp;
Querying dw.t_order_info_his shows that we now have the zipper table we wanted:
select * from dw.t_order_info_his
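To inspect the result without a Hive cluster, the steps can be replayed on the article's sample data. The sketch below uses Python's built-in sqlite3 module as a stand-in for Hive, which is an assumption of this example: table names drop the dw. prefix, SQLite's case expression and date() function replace Hive's if() and date_add(), total_amount is stored as real instead of decimal(10,2), and step four is done with delete plus insert instead of insert overwrite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table t_order_info (id text, total_amount real, order_status text,"
            " create_time text, operate_time text)")
cur.execute("create table t_order_info_his (id text, total_amount real, order_status text,"
            " create_time text, operate_time text, start_date text, end_date text)")
cur.executemany("insert into t_order_info values (?,?,?,?,?)", [
    ("1", 100, "0", "2020-01-01", "2020-01-01"),
    ("2", 100, "0", "2020-01-01", "2020-01-01"),
    ("3", 100, "1", "2020-01-01", "2020-01-01"),
    ("2", 100, "1", "2020-01-01", "2020-01-02"),
    ("4", 100, "1", "2020-01-02", "2020-01-02"),
    ("5", 100, "1", "2020-01-02", "2020-01-02"),
])

# Step one: initialize the zipper table with the 2020-01-01 data.
cur.execute("""
    insert into t_order_info_his
    select id, total_amount, order_status, create_time, operate_time,
           '2020-01-01', '9999-99-99'
    from t_order_info where operate_time = '2020-01-01'
""")

# Steps two and three: merge the 2020-01-02 changes into a temporary table.
cur.execute("""
    create table t_order_info_his_tmp as
    select id, total_amount, order_status, create_time, operate_time,
           '2020-01-02' as start_date, '9999-99-99' as end_date
    from t_order_info where operate_time = '2020-01-02'
    union all
    select oh.id, oh.total_amount, oh.order_status, oh.create_time, oh.operate_time,
           oh.start_date,
           case when oi.id is null then oh.end_date
                else date(oi.operate_time, '-1 day') end
    from t_order_info_his oh
    left join (select * from t_order_info where operate_time = '2020-01-02') oi
      on oh.id = oi.id and oh.end_date = '9999-99-99'
""")

# Step four: replace the zipper table contents with the temporary table.
cur.execute("delete from t_order_info_his")
cur.execute("insert into t_order_info_his select * from t_order_info_his_tmp")

result = cur.execute(
    "select id, order_status, start_date, end_date"
    " from t_order_info_his order by id, start_date"
).fetchall()
for row in result:
    print(row)
```

With the sample data this yields six rows: order 2 has a closed row (status 0, valid 2020-01-01 to 2020-01-01) and an open row starting 2020-01-02 (status 1), and every other order has a single open row.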
Step five: Organize into a daily script
Turn the date into a parameter, assemble the steps above into a daily script, and run it as a scheduled task.
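One possible way to parameterize the date is to generate the SQL text from a template. The following Python sketch is an assumption about how such a script could look, not the author's implementation. `build_daily_sql` and `DAILY_SQL` are hypothetical names, and how the generated text is submitted to Hive and scheduled is left out.

```python
from datetime import datetime

# Steps three and four from the article, with the hard-coded date replaced by {dt}.
DAILY_SQL = """
insert overwrite table dw.t_order_info_his_tmp
select * from (
    select id, total_amount, order_status, create_time, operate_time,
           '{dt}' start_date, '9999-99-99' end_date
    from dw.t_order_info where operate_time = '{dt}'
    union all
    select oh.id, oh.total_amount, oh.order_status, oh.create_time, oh.operate_time,
           oh.start_date,
           if(oi.id is null, oh.end_date, date_add(oi.operate_time, -1)) end_date
    from dw.t_order_info_his oh
    left join (select * from dw.t_order_info where operate_time = '{dt}') oi
      on oh.id = oi.id and oh.end_date = '9999-99-99'
) his;
insert overwrite table dw.t_order_info_his
select * from dw.t_order_info_his_tmp;
"""


def build_daily_sql(run_date):
    """Return the daily merge SQL for run_date; raises ValueError if the date is invalid."""
    # Parse and re-format so that only a valid, zero-padded YYYY-MM-DD value reaches the SQL.
    dt = datetime.strptime(run_date, "%Y-%m-%d").strftime("%Y-%m-%d")
    return DAILY_SQL.format(dt=dt)


sql = build_daily_sql("2020-01-03")
print(sql)
```

Validating the date before substituting it keeps malformed values out of the string comparisons on operate_time, which only work when every date has the same YYYY-MM-DD shape.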
Link: https://www.jianshu.com/p/cd8081701348