ClickHouse: eliminating the gaps caused by GROUP BY
2022-07-01 02:09:00 【I'm a bad person】
I ran into this problem several months ago and always assumed I had written it up, only to discover that I never did (facepalm). The requirement was to compute DAU, retention rate and similar metrics grouped by date, country, supplier, device and other dimensions. I was not responsible for this requirement; a colleague hit date discontinuities caused by GROUP BY and asked me about it. This article offers only one approach. In the end my colleague did not use this scheme to solve the problem at the database level, but solved it in the Java program instead.
1. A simplified reproduction of the problem. For example, because dau and use_times for China are both 0 from April 5 to April 20, those days disappear from the result, which makes the time series discontinuous.
SELECT
    event_date,
    country_name,
    groupBitmapOr(active_user) AS dau,
    sum(use_times) AS use_times
FROM dws_app_global_active_retained
WHERE (country_name IN ('China', 'France', 'Hong Kong')) AND (event_date >= '2022-03-25')
GROUP BY
    event_date,
    country_name
ORDER BY
    country_name ASC,
    event_date ASC

┌─event_date─┬─country_name─┬─dau─┬─use_times─┐
│ 2022-03-27 │ China        │   1 │         1 │
│ 2022-03-28 │ China        │   1 │         2 │
│ 2022-03-29 │ China        │   2 │        64 │
│ 2022-03-30 │ China        │   1 │         4 │
│ 2022-03-31 │ China        │   2 │         4 │
│ 2022-04-01 │ China        │   1 │         5 │
│ 2022-04-02 │ China        │   1 │         1 │
│ 2022-04-03 │ China        │   1 │         1 │
│ 2022-04-04 │ China        │   1 │         2 │
│ 2022-04-21 │ China        │   1 │         5 │
│ 2022-04-28 │ China        │   1 │         6 │
│ 2022-04-29 │ China        │   1 │         4 │
│ 2022-04-30 │ China        │   1 │        12 │
│ 2022-05-01 │ China        │   1 │        10 │
│ 2022-05-02 │ China        │   1 │         9 │
│ 2022-05-03 │ China        │   1 │         3 │
│ 2022-05-04 │ China        │   1 │         2 │
│ 2022-05-05 │ China        │   1 │         6 │
│ 2022-05-06 │ China        │   1 │         3 │
│ 2022-05-07 │ China        │   1 │         5 │
│ 2022-05-08 │ China        │   1 │         4 │
│ 2022-03-25 │ Hong Kong    │   2 │         5 │
│ 2022-03-26 │ Hong Kong    │   1 │         2 │
│ 2022-03-27 │ Hong Kong    │   1 │         2 │
│ 2022-03-28 │ Hong Kong    │   2 │        26 │
│ 2022-03-29 │ Hong Kong    │   2 │        56 │
│ 2022-03-30 │ Hong Kong    │   3 │        17 │
│ 2022-03-31 │ Hong Kong    │   4 │        12 │
│ 2022-04-01 │ Hong Kong    │   1 │         1 │
│ 2022-04-02 │ Hong Kong    │   2 │        23 │
│ 2022-04-06 │ Hong Kong    │   1 │         1 │
│ 2022-04-07 │ Hong Kong    │   2 │         9 │
│ 2022-04-08 │ Hong Kong    │   2 │        29 │
└────────────┴──────────────┴─────┴───────────┘

2. Fill in the time gap first (WITH FILL STEP n)
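As a minimal sketch of what the modifier does, the following self-contained query uses synthetic data from the numbers table function (not the table used in this article; the column names d, label and v are made up for illustration). It produces three dates that are three days apart and lets WITH FILL STEP 1 generate the days in between. Generated rows receive default values in the other columns: 0 for numbers and an empty string for strings.

```sql
-- Synthetic input: 2022-04-01, 2022-04-04 and 2022-04-07.
-- WITH FILL STEP 1 adds the four missing days, so seven rows come back;
-- on the added rows label is '' and v is 0.
SELECT
    toDate('2022-04-01') + number * 3 AS d,
    concat('row_', toString(number)) AS label,
    number + 1 AS v
FROM numbers(3)
ORDER BY d ASC WITH FILL STEP 1
```

The modifier also accepts optional FROM and TO bounds, which is one way to extend the filled range beyond the first and last dates present in the data. The query from step 1 with the modifier added: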
SELECT
    event_date,
    country_name,
    groupBitmapOr(active_user) AS dau,
    sum(use_times) AS use_times
FROM dws_app_global_active_retained
WHERE (country_name IN ('China', 'France', 'Hong Kong')) AND (event_date >= '2022-03-25')
GROUP BY
    event_date,
    country_name
ORDER BY
    country_name ASC,
    event_date ASC WITH FILL STEP 1

┌─event_date─┬─country_name─┬─dau─┬─use_times─┐
│ 2022-03-27 │ China        │   1 │         1 │
│ 2022-03-28 │ China        │   1 │         2 │
│ 2022-03-29 │ China        │   2 │        64 │
│ 2022-03-30 │ China        │   1 │         4 │
│ 2022-03-31 │ China        │   2 │         4 │
│ 2022-04-01 │ China        │   1 │         5 │
│ 2022-04-02 │ China        │   1 │         1 │
│ 2022-04-03 │ China        │   1 │         1 │
│ 2022-04-04 │ China        │   1 │         2 │
│ 2022-04-05 │              │   0 │         0 │
│ 2022-04-06 │              │   0 │         0 │
│ 2022-04-07 │              │   0 │         0 │
│ 2022-04-08 │              │   0 │         0 │
│ 2022-04-09 │              │   0 │         0 │
│ 2022-04-10 │              │   0 │         0 │
│ 2022-04-11 │              │   0 │         0 │
│ 2022-04-12 │              │   0 │         0 │
│ 2022-04-13 │              │   0 │         0 │
│ 2022-04-14 │              │   0 │         0 │
│ 2022-04-15 │              │   0 │         0 │
│ 2022-04-16 │              │   0 │         0 │
│ 2022-04-17 │              │   0 │         0 │
│ 2022-04-18 │              │   0 │         0 │
│ 2022-04-19 │              │   0 │         0 │
│ 2022-04-20 │              │   0 │         0 │
│ 2022-04-21 │ China        │   1 │         5 │
│ 2022-04-22 │              │   0 │         0 │
│ 2022-04-23 │              │   0 │         0 │
│ 2022-04-24 │              │   0 │         0 │
│ 2022-04-25 │              │   0 │         0 │
│ 2022-04-26 │              │   0 │         0 │
│ 2022-04-27 │              │   0 │         0 │
│ 2022-04-28 │ China        │   1 │         6 │
│ 2022-04-29 │ China        │   1 │         4 │
│ 2022-04-30 │ China        │   1 │        12 │
│ 2022-05-01 │ China        │   1 │        10 │
│ 2022-05-02 │ China        │   1 │         9 │
│ 2022-05-03 │ China        │   1 │         3 │
│ 2022-05-04 │ China        │   1 │         2 │
│ 2022-05-05 │ China        │   1 │         6 │
│ 2022-05-06 │ China        │   1 │         3 │
│ 2022-05-07 │ China        │   1 │         5 │
│ 2022-05-08 │ China        │   1 │         4 │
│ 2022-03-25 │ Hong Kong    │   2 │         5 │
│ 2022-03-26 │ Hong Kong    │   1 │         2 │
│ 2022-03-27 │ Hong Kong    │   1 │         2 │
│ 2022-03-28 │ Hong Kong    │   2 │        26 │
│ 2022-03-29 │ Hong Kong    │   2 │        56 │
│ 2022-03-30 │ Hong Kong    │   3 │        17 │
│ 2022-03-31 │ Hong Kong    │   4 │        12 │
│ 2022-04-01 │ Hong Kong    │   1 │         1 │
│ 2022-04-02 │ Hong Kong    │   2 │        23 │
│ 2022-04-06 │ Hong Kong    │   1 │         1 │
│ 2022-04-07 │ Hong Kong    │   2 │         9 │
│ 2022-04-08 │ Hong Kong    │   2 │        29 │
└────────────┴──────────────┴─────┴───────────┘

3. A new problem now appears: country_name is empty in the filled rows. To close that gap, use the "fill with adjacent value" method (arrayFill) described in the earlier article 《Clickhouse Vacancy value processing》.
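A minimal sketch of arrayFill on a literal array (the values are illustrative only): the function scans from the first element to the last and replaces an element with its predecessor whenever the lambda returns 0, so each empty string takes the nearest preceding non-empty value. The first element is never replaced.

```sql
SELECT arrayFill(x -> x != '', ['China', '', '', 'Hong Kong', '']) AS filled
-- filled = ['China', 'China', 'China', 'Hong Kong', 'Hong Kong']
```

Applied to the filled result from step 2, the columns are collected into arrays with groupArray, zipped back into rows with arrayZip, and expanded with arrayJoin: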
WITH temp AS
(
    SELECT
        event_date,
        country_name,
        groupBitmapOr(active_user) AS dau,
        sum(use_times) AS use_times
    FROM dws_app_global_active_retained
    WHERE (country_name IN ('China', 'France', 'Hong Kong')) AND (event_date >= '2022-03-25')
    GROUP BY
        event_date,
        country_name
    ORDER BY
        country_name ASC,
        event_date ASC WITH FILL STEP 1
)
SELECT
    tuple.1 AS event_date,
    tuple.2 AS country_name,
    tuple.3 AS dau,
    tuple.4 AS use_times
FROM
(
    SELECT arrayJoin(
        arrayZip(
            groupArray(event_date),
            -- replace each empty country_name with the nearest preceding non-empty value
            arrayFill(x -> x != '', groupArray(country_name)),
            groupArray(dau),
            groupArray(use_times)
        )
    ) AS tuple
    FROM temp
)

┌─event_date─┬─country_name─┬─dau─┬─use_times─┐
│ 2022-03-27 │ China        │   1 │         1 │
│ 2022-03-28 │ China        │   1 │         2 │
│ 2022-03-29 │ China        │   2 │        64 │
│ 2022-03-30 │ China        │   1 │         4 │
│ 2022-03-31 │ China        │   2 │         4 │
│ 2022-04-01 │ China        │   1 │         5 │
│ 2022-04-02 │ China        │   1 │         1 │
│ 2022-04-03 │ China        │   1 │         1 │
│ 2022-04-04 │ China        │   1 │         2 │
│ 2022-04-05 │ China        │   0 │         0 │
│ 2022-04-06 │ China        │   0 │         0 │
│ 2022-04-07 │ China        │   0 │         0 │
│ 2022-04-08 │ China        │   0 │         0 │
│ 2022-04-09 │ China        │   0 │         0 │
│ 2022-04-10 │ China        │   0 │         0 │
│ 2022-04-11 │ China        │   0 │         0 │
│ 2022-04-12 │ China        │   0 │         0 │
│ 2022-04-13 │ China        │   0 │         0 │
│ 2022-04-14 │ China        │   0 │         0 │
│ 2022-04-15 │ China        │   0 │         0 │
│ 2022-04-16 │ China        │   0 │         0 │
│ 2022-04-17 │ China        │   0 │         0 │
│ 2022-04-18 │ China        │   0 │         0 │
│ 2022-04-19 │ China        │   0 │         0 │
│ 2022-04-20 │ China        │   0 │         0 │
│ 2022-04-21 │ China        │   1 │         5 │
│ 2022-04-22 │ China        │   0 │         0 │
│ 2022-04-23 │ China        │   0 │         0 │
│ 2022-04-24 │ China        │   0 │         0 │
│ 2022-04-25 │ China        │   0 │         0 │
│ 2022-04-26 │ China        │   0 │         0 │
│ 2022-04-27 │ China        │   0 │         0 │
│ 2022-04-28 │ China        │   1 │         6 │
│ 2022-04-29 │ China        │   1 │         4 │
│ 2022-04-30 │ China        │   1 │        12 │
│ 2022-05-01 │ China        │   1 │        10 │
│ 2022-05-02 │ China        │   1 │         9 │
│ 2022-05-03 │ China        │   1 │         3 │
│ 2022-05-04 │ China        │   1 │         2 │
│ 2022-05-05 │ China        │   1 │         6 │
│ 2022-05-06 │ China        │   1 │         3 │
│ 2022-05-07 │ China        │   1 │         5 │
│ 2022-05-08 │ China        │   1 │         4 │
│ 2022-03-25 │ Hong Kong    │   2 │         5 │
│ 2022-03-26 │ Hong Kong    │   1 │         2 │
│ 2022-03-27 │ Hong Kong    │   1 │         2 │
│ 2022-03-28 │ Hong Kong    │   2 │        26 │
│ 2022-03-29 │ Hong Kong    │   2 │        56 │
│ 2022-03-30 │ Hong Kong    │   3 │        17 │
│ 2022-03-31 │ Hong Kong    │   4 │        12 │
│ 2022-04-01 │ Hong Kong    │   1 │         1 │
│ 2022-04-02 │ Hong Kong    │   2 │        23 │
│ 2022-04-06 │ Hong Kong    │   1 │         1 │
│ 2022-04-07 │ Hong Kong    │   2 │         9 │
│ 2022-04-08 │ Hong Kong    │   2 │        29 │
└────────────┴──────────────┴─────┴───────────┘
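
Note that in the outputs above only China's gaps are filled: Hong Kong still jumps from 2022-04-02 to 2022-04-06, and France, which has no rows at all, never appears. One alternative, sketched here under assumptions rather than taken from the solution actually used, is to build the full date and country grid first and LEFT JOIN the aggregate onto it. The sketch assumes that event_date is of type Date and country_name is a String, that the reporting window is 2022-03-25 to 2022-05-08 (45 days), and that join_use_nulls is left at its default of 0 so that unmatched rows get 0 instead of NULL. The aliases d, c, g and a are made up for illustration.

```sql
SELECT
    event_date,
    country_name,
    dau,
    use_times
FROM
(
    -- Full grid: every day in the assumed window for every listed country.
    SELECT event_date, country_name
    FROM
    (
        SELECT arrayJoin(arrayMap(i -> toDate('2022-03-25') + i, range(45))) AS event_date
    ) AS d
    CROSS JOIN
    (
        SELECT arrayJoin(['China', 'France', 'Hong Kong']) AS country_name
    ) AS c
) AS g
LEFT JOIN
(
    -- The aggregate from step 1, unchanged.
    SELECT
        event_date,
        country_name,
        groupBitmapOr(active_user) AS dau,
        sum(use_times) AS use_times
    FROM dws_app_global_active_retained
    WHERE (country_name IN ('China', 'France', 'Hong Kong')) AND (event_date >= '2022-03-25')
    GROUP BY
        event_date,
        country_name
) AS a USING (event_date, country_name)
ORDER BY
    country_name ASC,
    event_date ASC
```

Unlike the outputs above, this returns one row per day for every listed country, including France and the days before China's first record, so whether it fits depends on how the report should treat countries and days with no activity.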