Zipper table in data warehouse (compressed storage)
2022-07-03 23:40:00 【hzp666】
I. Introduction to zipper tables
1. What is a zipper table
A zipper table records the lifecycle of each piece of information. When the lifecycle of a record ends, a new record is started, with the current date as its effective start date.
If a record is still valid, its effective end date is set to a maximum sentinel value (such as 9999-99-99), as in the following table (Table 1):
[Table 1: example zipper table (image not available)]
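To make the start and end dates concrete, here is a minimal Python sketch of how one record's lifecycle could be laid out as zipper rows. The function name `build_zipper` and the sample order are hypothetical; closing the old version on the day before the change is an assumption that matches the `date_add(..., -1)` convention used in the SQL later in this article.

```python
from datetime import date, timedelta

OPEN_END = "9999-99-99"  # sentinel this article uses for "still valid"


def build_zipper(order_id, changes):
    """Turn a date-ordered list of (change_date, status) into zipper rows.

    change_date is an ISO 'YYYY-MM-DD' string. Each change closes the
    previous version on the day before and opens a new version.
    """
    rows = []
    for change_date, status in changes:
        if rows:
            # End the previous version one day before the new one starts.
            prev_day = date.fromisoformat(change_date) - timedelta(days=1)
            rows[-1]["end_date"] = prev_day.isoformat()
        rows.append({
            "id": order_id,
            "order_status": status,
            "start_date": change_date,
            "end_date": OPEN_END,
        })
    return rows


# Hypothetical order "2": status 0 on 2020-01-01, status 1 from 2020-01-02.
for row in build_zipper("2", [("2020-01-01", "0"), ("2020-01-02", "1")]):
    print(row)
```

Only the latest version keeps the sentinel end date, so the current state of every record can be read by filtering on `end_date = '9999-99-99'`.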
2. Why use a zipper table
A zipper table suits data that changes over time but where most records stay the same.
For example, an order moves through unpaid, paid, awaiting shipment, and completed over about a week, and does not change most of the time. Once the data reaches a certain volume, saving a full copy every day is no longer practical. For example: 1 billion users × 365 days, with one copy of the user information per day. (Taking a full snapshot every day is inefficient.)
The full-snapshot table looks like this (Table 2):
Comparing it with the zipper table (Table 1) shows the advantage of the zipper table.
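The storage difference can be estimated with a rough back-of-the-envelope sketch. The user count and the 365 days come from the example above; the daily change volume (`changed_per_day`) is a purely hypothetical figure, and the model assumes no new users and one new zipper row per changed user per day.

```python
def full_snapshot_rows(entities, days):
    """Rows stored when every entity is copied once per day."""
    return entities * days


def zipper_rows(entities, days, changed_per_day):
    """One initial row per entity, plus one new row per change on each later day."""
    return entities + changed_per_day * (days - 1)


users = 1_000_000_000
days = 365
changed_per_day = 10_000_000  # hypothetical: 1% of users change each day

print(full_snapshot_rows(users, days))            # 365000000000
print(zipper_rows(users, days, changed_per_day))  # 4640000000
```

Under these assumptions the zipper table holds roughly 79 times fewer rows than the daily full snapshots. The actual ratio depends entirely on how often the data really changes.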
3. How to use a zipper table
Filtering on effective start date <= a given date and effective end date >= the same date returns a full snapshot of the data as of that date.
For example:
select * from dw.t_order_info_his where start_date<='2020-01-01' and end_date>='2020-01-01';
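The snapshot predicate can be tried out locally with a small sketch that uses Python's built-in `sqlite3` module instead of Hive. The table here has no `dw.` schema prefix and only four columns, and the rows are hypothetical data shaped like the orders used later in this article; the `where` clause is the same as in the query above.

```python
import sqlite3


def snapshot(conn, as_of):
    """Return (id, order_status) for every record valid on the given date."""
    cur = conn.execute(
        "select id, order_status from t_order_info_his "
        "where start_date <= ? and end_date >= ? order by id",
        (as_of, as_of),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t_order_info_his "
    "(id text, order_status text, start_date text, end_date text)"
)
# Hypothetical zipper rows: order 2 changed status on 2020-01-02,
# orders 4 and 5 were created on 2020-01-02.
conn.executemany(
    "insert into t_order_info_his values (?, ?, ?, ?)",
    [
        ("1", "0", "2020-01-01", "9999-99-99"),
        ("2", "0", "2020-01-01", "2020-01-01"),
        ("2", "1", "2020-01-02", "9999-99-99"),
        ("3", "1", "2020-01-01", "9999-99-99"),
        ("4", "1", "2020-01-02", "9999-99-99"),
        ("5", "1", "2020-01-02", "9999-99-99"),
    ],
)

print(snapshot(conn, "2020-01-01"))  # [('1', '0'), ('2', '0'), ('3', '1')]
print(snapshot(conn, "2020-01-02"))  # [('1', '0'), ('2', '1'), ('3', '1'), ('4', '1'), ('5', '1')]
```

Because the dates are stored as `YYYY-MM-DD` strings, plain string comparison orders them correctly, which is also why the `9999-99-99` sentinel works even though it is not a real calendar date.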
4. How a zipper table is formed
[Figure: zipper table formation process (image not available)]
5. Zipper table production flow
The full order data for the day and the data that changes each day in MySQL are merged to form new temporary zipper table data. The temporary zipper table then overwrites the old zipper table data. (This works around the fact that rows in an ordinary, non-transactional Hive table cannot be updated in place.)
6. Zipper table production process at the code level
Step 1: Initialize the zipper table
1) Create the original order table dw.t_order_info and insert data (two days of data, 2020-01-01 to 2020-01-02)
create table dw.t_order_info(
`id` string COMMENT 'Order ID',
`total_amount` decimal(10,2) COMMENT 'Order amount',
`order_status` string COMMENT 'Order status',
`create_time` string COMMENT 'Creation time',
`operate_time` string COMMENT 'Operation time'
) COMMENT 'Order table'
stored as parquet;
insert into dw.t_order_info (id,total_amount,order_status,create_time,operate_time) values('1',100,'0','2020-01-01','2020-01-01');
insert into dw.t_order_info (id,total_amount,order_status,create_time,operate_time) values('2',100,'0','2020-01-01','2020-01-01');
insert into dw.t_order_info (id,total_amount,order_status,create_time,operate_time) values('3',100,'1','2020-01-01','2020-01-01');
insert into dw.t_order_info (id,total_amount,order_status,create_time,operate_time) values('2',100,'1','2020-01-01','2020-01-02');
insert into dw.t_order_info (id,total_amount,order_status,create_time,operate_time) values('4',100,'1','2020-01-02','2020-01-02');
insert into dw.t_order_info (id,total_amount,order_status,create_time,operate_time) values('5',100,'1','2020-01-02','2020-01-02');
2) Create the zipper table dw.t_order_info_his
create table dw.t_order_info_his(
`id` string COMMENT 'Order ID',
`total_amount` decimal(10,2) COMMENT 'Order amount',
`order_status` string COMMENT 'Order status',
`create_time` string COMMENT 'Creation time',
`operate_time` string COMMENT 'Operation time',
`start_date` string COMMENT 'Effective start date',
`end_date` string COMMENT 'Effective end date'
) COMMENT 'Order zipper table'
stored as parquet;
-- Initialize the zipper table (2020-01-01 data)
insert into dw.t_order_info_his
select id,total_amount,order_status,create_time,operate_time,'2020-01-01' as start_date,'9999-99-99' end_date from dw.t_order_info a
where a.operate_time='2020-01-01';
Step 2: Extract the daily change data (both new and modified rows); run daily
1) The changed rows can be identified by the operate_time column of the original order table dw.t_order_info
select
*
from dw.t_order_info
where operate_time='2020-01-02';
Step 3: Merge the changed rows into the history, add the new rows, and insert the result into a temporary table
1) Create the temporary zipper table
create table dw.t_order_info_his_tmp(
`id` string COMMENT 'Order ID',
`total_amount` decimal(10,2) COMMENT 'Order amount',
`order_status` string COMMENT 'Order status',
`create_time` string COMMENT 'Creation time',
`operate_time` string COMMENT 'Operation time',
`start_date` string COMMENT 'Effective start date',
`end_date` string COMMENT 'Effective end date'
) COMMENT 'Order zipper temporary table'
stored as parquet;
2) Insert the merged data into the temporary table
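insert overwrite table dw.t_order_info_his_tmp
select * from
(
    -- New versions: every row added or modified on 2020-01-02 opens a record valid from that day
    select
        id,
        total_amount,
        order_status,
        create_time,
        operate_time,
        '2020-01-02' start_date,
        '9999-99-99' end_date
    from dw.t_order_info where operate_time='2020-01-02'
    union all
    -- Existing history: close the open record of any order that changed on 2020-01-02
    select oh.id,
        oh.total_amount,
        oh.order_status,
        oh.create_time,
        oh.operate_time,
        oh.start_date,
        -- No matching change: keep end_date. Otherwise end the record the day before the change
        if(oi.id is null, oh.end_date, date_add(oi.operate_time,-1)) end_date
    from dw.t_order_info_his oh left join
    (
        select
            *
        from dw.t_order_info
        where operate_time='2020-01-02'
    ) oi
    -- Only still-open records (end_date equal to the 9999-99-99 sentinel) can match a change
    on oh.id=oi.id and oh.end_date='9999-99-99'
) his
order by his.id, start_date;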
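The merge logic of this statement can also be read as a plain Python sketch, which may be easier to step through than the `union all` plus `left join`. The function name `merge_zipper` and the dictionary layout are hypothetical; the sample rows mirror the order data inserted in Step 1, reduced to the columns that matter for the merge.

```python
from datetime import date, timedelta

OPEN_END = "9999-99-99"  # sentinel for "still valid", as in the SQL


def merge_zipper(history, changes, run_date):
    """Merge one day's changed rows into existing zipper rows.

    history: zipper rows (dicts with id, order_status, start_date, end_date)
    changes: rows added or modified on run_date (dicts with id, order_status)
    """
    prev_day = (date.fromisoformat(run_date) - timedelta(days=1)).isoformat()
    changed_ids = {row["id"] for row in changes}
    merged = []
    for row in history:
        row = dict(row)  # copy so the input history is left untouched
        # Close the open version of every order that changed today.
        if row["end_date"] == OPEN_END and row["id"] in changed_ids:
            row["end_date"] = prev_day
        merged.append(row)
    for row in changes:
        # Every changed or new row opens a version valid from run_date.
        merged.append({**row, "start_date": run_date, "end_date": OPEN_END})
    merged.sort(key=lambda r: (r["id"], r["start_date"]))
    return merged


history = [
    {"id": "1", "order_status": "0", "start_date": "2020-01-01", "end_date": OPEN_END},
    {"id": "2", "order_status": "0", "start_date": "2020-01-01", "end_date": OPEN_END},
    {"id": "3", "order_status": "1", "start_date": "2020-01-01", "end_date": OPEN_END},
]
changes = [
    {"id": "2", "order_status": "1"},
    {"id": "4", "order_status": "1"},
    {"id": "5", "order_status": "1"},
]
merged = merge_zipper(history, changes, "2020-01-02")
for row in merged:
    print(row)
```

With this input, order 2 ends up with two rows (the old version closed on 2020-01-01 and a new open version from 2020-01-02), orders 1 and 3 stay open and unchanged, and orders 4 and 5 appear as new open rows. Like the SQL, the sketch is not idempotent: running it twice for the same date would add the changed rows a second time.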
Step 4: Overwrite the zipper table with the temporary table
insert overwrite table dw.t_order_info_his
select * from dw.t_order_info_his_tmp;
Querying dw.t_order_info_his now returns the zipper table we want:
select * from dw.t_order_info_his;
Step 5: Organize into a daily script
Turn the date into a parameter, package the steps above into a daily script, and run it as a scheduled task.
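One way to parameterize the date is sketched below in Python: the merge and overwrite statements from Steps 3 and 4 are kept as a template, and the run date is substituted in. The names `DAILY_SQL_TEMPLATE`, `default_run_date`, and `render_daily_sql` are hypothetical, and defaulting to yesterday's date is an assumption about the schedule. How the rendered SQL is submitted to Hive and scheduled is left open here.

```python
from datetime import date, timedelta

# Steps 3 and 4 from above, with the hard-coded date replaced by {run_date}.
DAILY_SQL_TEMPLATE = """
insert overwrite table dw.t_order_info_his_tmp
select * from
(
    select id, total_amount, order_status, create_time, operate_time,
           '{run_date}' start_date, '9999-99-99' end_date
    from dw.t_order_info where operate_time='{run_date}'
    union all
    select oh.id, oh.total_amount, oh.order_status, oh.create_time,
           oh.operate_time, oh.start_date,
           if(oi.id is null, oh.end_date, date_add(oi.operate_time,-1)) end_date
    from dw.t_order_info_his oh left join
    (
        select * from dw.t_order_info where operate_time='{run_date}'
    ) oi
    on oh.id=oi.id and oh.end_date='9999-99-99'
) his
order by his.id, start_date;

insert overwrite table dw.t_order_info_his
select * from dw.t_order_info_his_tmp;
"""


def default_run_date(today=None):
    """Assumed default: process the previous day's changes."""
    today = today or date.today()
    return (today - timedelta(days=1)).isoformat()


def render_daily_sql(run_date):
    """Substitute a strictly validated YYYY-MM-DD date into the template."""
    # The round trip rejects anything that is not exactly YYYY-MM-DD,
    # which also keeps arbitrary text out of the SQL.
    if date.fromisoformat(run_date).isoformat() != run_date:
        raise ValueError(f"run_date must be YYYY-MM-DD, got {run_date!r}")
    return DAILY_SQL_TEMPLATE.format(run_date=run_date)


print(render_daily_sql(default_run_date(date(2020, 1, 3))))
```

Validating the date before substitution matters because the value is spliced directly into the SQL text. A scheduler can then call the script once per day, passing an explicit date when a past day has to be reprocessed.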
Link: https://www.jianshu.com/p/cd8081701348