Zipper table in data warehouse (compressed storage)
2022-07-03 23:40:00 【hzp666】
I. Introduction to zipper tables
1. What is a zipper table
Zipper table: a table that records the lifecycle of each piece of information. Once a record's lifecycle ends, a new record is started, with the current date as its effective start date.
If the current information is still valid, the effective end date is filled with a maximum value (such as 9999-99-99), as in the following table (Table 1):
[Image: Table 1, example zipper table]
2. Why use a zipper table
A zipper table suits data that does change, but where most records stay the same.
For example, an order passes through states such as unpaid, paid, not shipped, and completed over about a week, and most of the time it does not change. Once the data reaches a certain scale, it is impractical to save a full copy every day. For example: 1 billion users × 365 days, with one copy of the user information per day. (A full daily snapshot is inefficient.)
The full snapshot table looks as follows (Table 2):
Comparing it with the zipper table (Table 1) shows the advantage of the zipper table.
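The storage argument above can be sketched with a little arithmetic. The workload below (1,000,000 records, 365 days, 1% of records changing per day) is a hypothetical assumption chosen for illustration, not a figure from the article, and the function names are made up for this sketch. It also assumes a record changes at most once per day.

```python
def full_snapshot_rows(num_records: int, num_days: int) -> int:
    """Rows stored when every record is copied once per day."""
    return num_records * num_days


def zipper_rows(num_records: int, total_changes: int) -> int:
    """Rows stored by a zipper table: one initial row per record,
    plus one new row each time a record changes."""
    return num_records + total_changes


# Hypothetical workload (an assumption, not data from the article).
records = 1_000_000
days = 365
changes_per_day = records // 100  # 1% of records change each day

full = full_snapshot_rows(records, days)
# Changes can only happen on the days after the initial load.
zipper = zipper_rows(records, changes_per_day * (days - 1))

print(full)    # 365000000
print(zipper)  # 4640000
```

Under these assumptions the zipper table holds roughly 1/80 of the rows of the daily full snapshot; the gap narrows as the daily change rate grows.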
3. How to use a zipper table
Filtering on effective start date <= a given date and effective end date >= that same date returns the full snapshot of the data as of that date.
For example:
select * from dw.t_order_info_his where start_date<='2020-01-01' and end_date>='2020-01-01'
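As a minimal sketch of this point-in-time filter, the snippet below runs the same predicate against an in-memory SQLite table instead of Hive. The table is a trimmed-down stand-in for dw.t_order_info_his (only four columns, no `dw.` schema), and the rows are the ones the article's sample orders would produce after the 2020-01-02 merge; both simplifications are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t_order_info_his ("
    "id text, order_status text, start_date text, end_date text)"
)
rows = [
    ("1", "0", "2020-01-01", "9999-99-99"),
    ("2", "0", "2020-01-01", "2020-01-01"),  # closed when order 2 changed
    ("2", "1", "2020-01-02", "9999-99-99"),
    ("3", "1", "2020-01-01", "9999-99-99"),
    ("4", "1", "2020-01-02", "9999-99-99"),
    ("5", "1", "2020-01-02", "9999-99-99"),
]
conn.executemany("insert into t_order_info_his values (?, ?, ?, ?)", rows)


def snapshot(as_of):
    """Return (id, order_status) for every row valid on the given date."""
    cur = conn.execute(
        "select id, order_status from t_order_info_his "
        "where start_date <= ? and end_date >= ? order by id",
        (as_of, as_of),
    )
    return cur.fetchall()


print(snapshot("2020-01-01"))  # [('1', '0'), ('2', '0'), ('3', '1')]
print(snapshot("2020-01-02"))
# [('1', '0'), ('2', '1'), ('3', '1'), ('4', '1'), ('5', '1')]
```

The dates are stored and compared as strings, which works here because the yyyy-MM-dd format sorts in the same order as the dates themselves, and the 9999-99-99 sentinel sorts after every real date.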
4. Zipper table formation process
[Image: zipper table formation process]
5. Zipper table production flow chart
All of the order data to date is combined with the data that changes in MySQL each day, forming the new temporary zipper table data. The temporary zipper table then overwrites the old zipper table data. (This works around the fact that data in a Hive table cannot be updated in place.)
6. Zipper table production process at the code level
Step 1: Initialize the zipper table
1) Create the original order table dw.t_order_info and insert data (two days of data, 2020-01-01 to 2020-01-02)
create table dw.t_order_info(
  `id` string COMMENT 'Order ID',
  `total_amount` decimal(10,2) COMMENT 'Order amount',
  `order_status` string COMMENT 'Order status',
  `create_time` string COMMENT 'Creation time',
  `operate_time` string COMMENT 'Operation time'
) COMMENT 'Order table'
stored as parquet;
-- 2020-01-01: orders 1, 2 and 3 are created
insert into dw.t_order_info (id,total_amount,order_status,create_time,operate_time) values('1',100,'0','2020-01-01','2020-01-01');
insert into dw.t_order_info (id,total_amount,order_status,create_time,operate_time) values('2',100,'0','2020-01-01','2020-01-01');
insert into dw.t_order_info (id,total_amount,order_status,create_time,operate_time) values('3',100,'1','2020-01-01','2020-01-01');
-- 2020-01-02: order 2 changes status, orders 4 and 5 are created
insert into dw.t_order_info (id,total_amount,order_status,create_time,operate_time) values('2',100,'1','2020-01-01','2020-01-02');
insert into dw.t_order_info (id,total_amount,order_status,create_time,operate_time) values('4',100,'1','2020-01-02','2020-01-02');
insert into dw.t_order_info (id,total_amount,order_status,create_time,operate_time) values('5',100,'1','2020-01-02','2020-01-02');
2) Create the zipper table dw.t_order_info_his
create table dw.t_order_info_his(
  `id` string COMMENT 'Order ID',
  `total_amount` decimal(10,2) COMMENT 'Order amount',
  `order_status` string COMMENT 'Order status',
  `create_time` string COMMENT 'Creation time',
  `operate_time` string COMMENT 'Operation time',
  `start_date` string COMMENT 'Effective start date',
  `end_date` string COMMENT 'Effective end date'
) COMMENT 'Order zipper table'
stored as parquet;
-- Initialize the zipper table (2020-01-01 data)
insert into dw.t_order_info_his
select id,total_amount,order_status,create_time,operate_time,'2020-01-01' as start_date,'9999-99-99' end_date from dw.t_order_info a
where a.operate_time='2020-01-01';
Step 2: Produce the daily change data (both new and modified rows); run daily
1) The day's changes can be selected by the operation time in the original order table dw.t_order_info
select
  *
from dw.t_order_info
where operate_time='2020-01-02';
Step 3: Merge the changed information, append the new information, and insert the result into a temporary table
1) Create the temporary zipper table
create table dw.t_order_info_his_tmp(
  `id` string COMMENT 'Order ID',
  `total_amount` decimal(10,2) COMMENT 'Order amount',
  `order_status` string COMMENT 'Order status',
  `create_time` string COMMENT 'Creation time',
  `operate_time` string COMMENT 'Operation time',
  `start_date` string COMMENT 'Effective start date',
  `end_date` string COMMENT 'Effective end date'
) COMMENT 'Temporary order zipper table'
stored as parquet;
2) Insert the merged change information into the temporary table
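insert overwrite table dw.t_order_info_his_tmp
select * from
(
  -- Branch 1: rows created or modified on 2020-01-02 open a new interval
  select
    id,
    total_amount,
    order_status,
    create_time,
    operate_time,
    '2020-01-02' start_date,
    '9999-99-99' end_date
  from dw.t_order_info where operate_time='2020-01-02'
  union all
  -- Branch 2: existing history; an open row whose order changed on 2020-01-02
  -- is closed with the day before the change, all other rows are kept as-is
  select oh.id,
    oh.total_amount,
    oh.order_status,
    oh.create_time,
    oh.operate_time,
    oh.start_date,
    if(oi.id is null, oh.end_date, date_add(oi.operate_time,-1)) end_date
  from dw.t_order_info_his oh left join
  (
    select
      *
    from dw.t_order_info
    where operate_time='2020-01-02'
  ) oi
  -- only open rows (end_date = '9999-99-99') can match, so closed rows stay untouched
  on oh.id=oi.id and oh.end_date='9999-99-99'
) his
order by his.id, start_date;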
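To make the merge logic easier to follow, here is a minimal Python sketch of the same two-branch union on plain dictionaries. It is an illustration under stated assumptions, not the article's implementation: the function name `merge_zipper` is hypothetical, the rows carry only the columns needed for the logic, and the input is the article's sample data (history initialized on 2020-01-01, changes from 2020-01-02).

```python
from datetime import date, timedelta

OPEN_END = "9999-99-99"  # sentinel end date used in the article


def merge_zipper(history, changes, run_date):
    """Return the new zipper rows after applying one day's changes.

    history:  dicts with id, order_status, start_date, end_date
    changes:  dicts with id, order_status (rows created or modified on run_date)
    run_date: 'YYYY-MM-DD' string
    """
    changed_ids = {row["id"] for row in changes}
    day_before = (date.fromisoformat(run_date) - timedelta(days=1)).isoformat()

    merged = []
    # Existing history: close open rows whose id changed, keep the rest.
    for row in history:
        kept = dict(row)  # copy, so the input list is not modified
        if row["id"] in changed_ids and row["end_date"] == OPEN_END:
            kept["end_date"] = day_before
        merged.append(kept)
    # Every new or modified row opens a new interval starting on run_date.
    for row in changes:
        merged.append({**row, "start_date": run_date, "end_date": OPEN_END})

    return sorted(merged, key=lambda r: (r["id"], r["start_date"]))


history = [
    {"id": "1", "order_status": "0", "start_date": "2020-01-01", "end_date": OPEN_END},
    {"id": "2", "order_status": "0", "start_date": "2020-01-01", "end_date": OPEN_END},
    {"id": "3", "order_status": "1", "start_date": "2020-01-01", "end_date": OPEN_END},
]
changes = [
    {"id": "2", "order_status": "1"},
    {"id": "4", "order_status": "1"},
    {"id": "5", "order_status": "1"},
]

result = merge_zipper(history, changes, "2020-01-02")
for row in result:
    print(row)
```

Order 2 ends up with two rows: the old status closed on 2020-01-01 and the new status open from 2020-01-02, while orders 1 and 3 keep their single open row.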
Step 4: Overwrite the zipper table with the temporary table
insert overwrite table dw.t_order_info_his
select * from dw.t_order_info_his_tmp;
Querying dw.t_order_info_his now returns the zipper table we wanted:
select * from dw.t_order_info_his;
Step 5: Organize into a daily script
Replace the hard-coded date with a parameter, collect the steps into a daily script, and run it as a scheduled task.
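One way to parameterize the date, sketched in Python under assumptions: the function name `build_daily_script` and the idea of rendering the SQL text from a template are illustrative choices, not something the article prescribes. The SQL itself is the merge and overwrite from steps 3 and 4 with the date replaced by a placeholder.

```python
from datetime import date

# Merge from step 3; {dt} stands for the run date.
MERGE_SQL = """
insert overwrite table dw.t_order_info_his_tmp
select * from (
  select id, total_amount, order_status, create_time, operate_time,
         '{dt}' start_date, '9999-99-99' end_date
  from dw.t_order_info where operate_time = '{dt}'
  union all
  select oh.id, oh.total_amount, oh.order_status, oh.create_time,
         oh.operate_time, oh.start_date,
         if(oi.id is null, oh.end_date, date_add(oi.operate_time, -1)) end_date
  from dw.t_order_info_his oh
  left join (select * from dw.t_order_info where operate_time = '{dt}') oi
    on oh.id = oi.id and oh.end_date = '9999-99-99'
) his
order by his.id, start_date;
"""

# Overwrite from step 4; it does not depend on the date.
OVERWRITE_SQL = """insert overwrite table dw.t_order_info_his
select * from dw.t_order_info_his_tmp;
"""


def build_daily_script(run_date: str) -> str:
    """Render the daily SQL for one run date (hypothetical helper)."""
    # Parsing first rejects anything that is not a real calendar date,
    # so arbitrary text never reaches the SQL.
    dt = date.fromisoformat(run_date).isoformat()
    return MERGE_SQL.format(dt=dt) + OVERWRITE_SQL


script = build_daily_script("2020-01-03")
print(script)
```

The rendered text could then be saved to a file and handed to whatever scheduler runs the daily Hive job; that wiring depends on the environment and is left out here.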
Link: https://www.jianshu.com/p/cd8081701348