ClickHouse: eliminating the gaps caused by GROUP BY
2022-07-01 02:09:00 【I'm a bad person】
I ran into this problem several months ago. I always thought I had written it up, then suddenly found that I hadn't (facepalm). The requirement was to compute DAU, retention rate and similar metrics, broken down by date, country, supplier, device and other dimensions. It wasn't my requirement: a colleague found that GROUP BY left gaps in the dates and asked me about it. This article offers only one possible approach. In the end my colleague did not use it to solve the problem at the database level, but handled it in the Java application instead.
1. A simplified reproduction of the problem. For example, China's dau and use_times are 0 from April 5 to April 20, meaning there are no matching rows for those days, so GROUP BY produces no groups for them and the dates in the result are no longer continuous.
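Why the rows vanish: GROUP BY only emits a group for key combinations that actually occur in the filtered input; it never invents rows for missing dates. A minimal Python sketch of that behaviour, using made-up rows and hypothetical helper names (`group_by_date_country`, `missing_dates`), plus a check that lists the missing days:

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical raw rows (event_date, country_name, use_times). Days without
# activity have no row at all, which is what creates the gap.
raw = [
    (date(2022, 4, 3), "China", 1),
    (date(2022, 4, 4), "China", 2),
    (date(2022, 4, 21), "China", 5),
]


def group_by_date_country(rows):
    """Mimic GROUP BY event_date, country_name with sum(use_times):
    only keys that occur in the input produce an output row."""
    sums = defaultdict(int)
    for d, country, uses in rows:
        sums[(d, country)] += uses
    # ORDER BY country_name, event_date
    return sorted(sums.items(), key=lambda kv: (kv[0][1], kv[0][0]))


def missing_dates(grouped, country):
    """Dates between the country's first and last row that have no row."""
    present = {d for (d, c), _ in grouped if c == country}
    if not present:
        return []
    gaps, d = [], min(present)
    while d <= max(present):
        if d not in present:
            gaps.append(d)
        d += timedelta(days=1)
    return gaps


grouped = group_by_date_country(raw)
print(len(grouped))  # 3: nothing is produced for 2022-04-05 .. 2022-04-20
gaps = missing_dates(grouped, "China")
print(gaps[0], gaps[-1], len(gaps))  # 2022-04-05 2022-04-20 16
```

The ClickHouse query that reproduces the gap, and its output: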
SELECT
event_date,
country_name,
groupBitmapOr(active_user) AS dau,
sum(use_times) AS use_times
FROM dws_app_global_active_retained
WHERE (country_name IN ('China', 'France', 'Hong Kong')) AND (event_date >= '2022-03-25')
GROUP BY
event_date,
country_name
ORDER BY
country_name ASC,
event_date ASC
┌─event_date─┬─country_name─┬─dau─┬─use_times─┐
│ 2022-03-27 │ China │ 1 │ 1 │
│ 2022-03-28 │ China │ 1 │ 2 │
│ 2022-03-29 │ China │ 2 │ 64 │
│ 2022-03-30 │ China │ 1 │ 4 │
│ 2022-03-31 │ China │ 2 │ 4 │
│ 2022-04-01 │ China │ 1 │ 5 │
│ 2022-04-02 │ China │ 1 │ 1 │
│ 2022-04-03 │ China │ 1 │ 1 │
│ 2022-04-04 │ China │ 1 │ 2 │
│ 2022-04-21 │ China │ 1 │ 5 │
│ 2022-04-28 │ China │ 1 │ 6 │
│ 2022-04-29 │ China │ 1 │ 4 │
│ 2022-04-30 │ China │ 1 │ 12 │
│ 2022-05-01 │ China │ 1 │ 10 │
│ 2022-05-02 │ China │ 1 │ 9 │
│ 2022-05-03 │ China │ 1 │ 3 │
│ 2022-05-04 │ China │ 1 │ 2 │
│ 2022-05-05 │ China │ 1 │ 6 │
│ 2022-05-06 │ China │ 1 │ 3 │
│ 2022-05-07 │ China │ 1 │ 5 │
│ 2022-05-08 │ China │ 1 │ 4 │
│ 2022-03-25 │ Hong Kong │ 2 │ 5 │
│ 2022-03-26 │ Hong Kong │ 1 │ 2 │
│ 2022-03-27 │ Hong Kong │ 1 │ 2 │
│ 2022-03-28 │ Hong Kong │ 2 │ 26 │
│ 2022-03-29 │ Hong Kong │ 2 │ 56 │
│ 2022-03-30 │ Hong Kong │ 3 │ 17 │
│ 2022-03-31 │ Hong Kong │ 4 │ 12 │
│ 2022-04-01 │ Hong Kong │ 1 │ 1 │
│ 2022-04-02 │ Hong Kong │ 2 │ 23 │
│ 2022-04-06 │ Hong Kong │ 1 │ 1 │
│ 2022-04-07 │ Hong Kong │ 2 │ 9 │
│ 2022-04-08 │ Hong Kong │ 2 │ 29 │
└────────────┴──────────────┴─────┴───────────┘
2. First, fill in the missing dates (WITH FILL STEP n).
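The fill step can be modelled in a few lines of Python. This is a sketch written only to reproduce the tables in this post, not ClickHouse's implementation; the function name and sample rows are made up. It captures two details of the output: the inserted rows carry an empty country_name and zero metrics, and Hong Kong's gap from 2022-04-03 to 2022-04-05 is not filled. The model assumes a single fill cursor that runs over the whole sorted result and never moves backwards, which would explain why only the first country's gaps get filled; treat that as an assumption to verify on your ClickHouse version.

```python
from datetime import date, timedelta


def with_fill_step_1(rows):
    """Simplified model of
        ORDER BY country_name ASC, event_date ASC WITH FILL STEP 1
    that reproduces the output in this post (assumed behaviour, not CH source).

    rows: (event_date, country_name, dau, use_times) tuples already sorted by
    (country_name, event_date). One cursor runs over the whole result and only
    moves forward, so days are inserted only when a row's date is more than one
    day past the latest date seen so far. Inserted rows get default values
    ('' and 0) in every other column.
    """
    out, cursor = [], None
    for row in rows:
        day = row[0]
        if cursor is not None:
            fill = cursor + timedelta(days=1)
            while fill < day:
                out.append((fill, "", 0, 0))
                fill += timedelta(days=1)
        out.append(row)
        cursor = day if cursor is None else max(cursor, day)
    return out


rows = [
    (date(2022, 4, 4), "China", 1, 2),
    (date(2022, 4, 7), "China", 1, 5),
    (date(2022, 4, 2), "Hong Kong", 2, 23),  # date goes backwards: no fill
    (date(2022, 4, 6), "Hong Kong", 1, 1),   # still behind the cursor: no fill
]
for d, country, dau, uses in with_fill_step_1(rows):
    print(d, repr(country), dau, uses)
```

The actual ClickHouse query for this step: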
SELECT
event_date,
country_name,
groupBitmapOr(active_user) AS dau,
sum(use_times) AS use_times
FROM dws_app_global_active_retained
WHERE (country_name IN ('China', 'France', 'Hong Kong')) AND (event_date >= '2022-03-25')
GROUP BY
event_date,
country_name
ORDER BY
country_name ASC,
event_date ASC WITH FILL STEP 1
┌─event_date─┬─country_name─┬─dau─┬─use_times─┐
│ 2022-03-27 │ China │ 1 │ 1 │
│ 2022-03-28 │ China │ 1 │ 2 │
│ 2022-03-29 │ China │ 2 │ 64 │
│ 2022-03-30 │ China │ 1 │ 4 │
│ 2022-03-31 │ China │ 2 │ 4 │
│ 2022-04-01 │ China │ 1 │ 5 │
│ 2022-04-02 │ China │ 1 │ 1 │
│ 2022-04-03 │ China │ 1 │ 1 │
│ 2022-04-04 │ China │ 1 │ 2 │
│ 2022-04-05 │ │ 0 │ 0 │
│ 2022-04-06 │ │ 0 │ 0 │
│ 2022-04-07 │ │ 0 │ 0 │
│ 2022-04-08 │ │ 0 │ 0 │
│ 2022-04-09 │ │ 0 │ 0 │
│ 2022-04-10 │ │ 0 │ 0 │
│ 2022-04-11 │ │ 0 │ 0 │
│ 2022-04-12 │ │ 0 │ 0 │
│ 2022-04-13 │ │ 0 │ 0 │
│ 2022-04-14 │ │ 0 │ 0 │
│ 2022-04-15 │ │ 0 │ 0 │
│ 2022-04-16 │ │ 0 │ 0 │
│ 2022-04-17 │ │ 0 │ 0 │
│ 2022-04-18 │ │ 0 │ 0 │
│ 2022-04-19 │ │ 0 │ 0 │
│ 2022-04-20 │ │ 0 │ 0 │
│ 2022-04-21 │ China │ 1 │ 5 │
│ 2022-04-22 │ │ 0 │ 0 │
│ 2022-04-23 │ │ 0 │ 0 │
│ 2022-04-24 │ │ 0 │ 0 │
│ 2022-04-25 │ │ 0 │ 0 │
│ 2022-04-26 │ │ 0 │ 0 │
│ 2022-04-27 │ │ 0 │ 0 │
│ 2022-04-28 │ China │ 1 │ 6 │
│ 2022-04-29 │ China │ 1 │ 4 │
│ 2022-04-30 │ China │ 1 │ 12 │
│ 2022-05-01 │ China │ 1 │ 10 │
│ 2022-05-02 │ China │ 1 │ 9 │
│ 2022-05-03 │ China │ 1 │ 3 │
│ 2022-05-04 │ China │ 1 │ 2 │
│ 2022-05-05 │ China │ 1 │ 6 │
│ 2022-05-06 │ China │ 1 │ 3 │
│ 2022-05-07 │ China │ 1 │ 5 │
│ 2022-05-08 │ China │ 1 │ 4 │
│ 2022-03-25 │ Hong Kong │ 2 │ 5 │
│ 2022-03-26 │ Hong Kong │ 1 │ 2 │
│ 2022-03-27 │ Hong Kong │ 1 │ 2 │
│ 2022-03-28 │ Hong Kong │ 2 │ 26 │
│ 2022-03-29 │ Hong Kong │ 2 │ 56 │
│ 2022-03-30 │ Hong Kong │ 3 │ 17 │
│ 2022-03-31 │ Hong Kong │ 4 │ 12 │
│ 2022-04-01 │ Hong Kong │ 1 │ 1 │
│ 2022-04-02 │ Hong Kong │ 2 │ 23 │
│ 2022-04-06 │ Hong Kong │ 1 │ 1 │
│ 2022-04-07 │ Hong Kong │ 2 │ 9 │
│ 2022-04-08 │ Hong Kong │ 2 │ 29 │
└────────────┴──────────────┴─────┴───────────┘
3. This creates a new problem: the filled rows have an empty country_name. To fill those gaps we use the "fill with the adjacent value" method (arrayFill) described in my earlier article 《Clickhouse Vacancy value processing》.
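What the arrayFill part of the query does can be mirrored in Python. A minimal sketch; `array_fill` and `fill_country_names` are hypothetical names. Like arrayFill, the model never replaces the first element, so a result that started with a filled row would keep an empty country_name.

```python
from datetime import date


def array_fill(pred, arr):
    """Model of ClickHouse arrayFill(func, arr): walk left to right and replace
    arr[i] with the (already filled) arr[i - 1] whenever func(arr[i]) is false.
    The first element is never replaced."""
    out = list(arr)
    for i in range(1, len(out)):
        if not pred(out[i]):
            out[i] = out[i - 1]
    return out


def fill_country_names(rows):
    """Mirror the step-3 query: groupArray each column, forward-fill
    country_name with arrayFill(x -> x != '', ...), then arrayZip + arrayJoin
    turn the parallel arrays back into rows."""
    if not rows:
        return []
    dates, countries, daus, uses = (list(col) for col in zip(*rows))
    countries = array_fill(lambda x: x != "", countries)
    return list(zip(dates, countries, daus, uses))


rows = [
    (date(2022, 4, 4), "China", 1, 2),
    (date(2022, 4, 5), "", 0, 0),  # row inserted by WITH FILL
    (date(2022, 4, 6), "", 0, 0),  # row inserted by WITH FILL
    (date(2022, 4, 7), "China", 1, 5),
]
print([country for _, country, _, _ in fill_country_names(rows)])
# ['China', 'China', 'China', 'China']
```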
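WITH temp AS
(
    SELECT
        event_date,
        country_name,
        groupBitmapOr(active_user) AS dau,
        sum(use_times) AS use_times
    FROM dws_app_global_active_retained
    WHERE (country_name IN ('China', 'France', 'Hong Kong')) AND (event_date >= '2022-03-25')
    GROUP BY
        event_date,
        country_name
    ORDER BY
        country_name ASC,
        event_date ASC WITH FILL STEP 1
)
SELECT
    tuple.1 AS event_date,
    tuple.2 AS country_name,
    tuple.3 AS dau,
    tuple.4 AS use_times
FROM
(
    -- Turn each column of temp into an array, forward-fill the empty
    -- country_name values, then zip the arrays and unnest them back into rows.
    -- This relies on groupArray keeping temp's ORDER BY order; ClickHouse
    -- documents that as reliable only in limited cases (an ORDER BY subquery
    -- with a small enough result), so check it on large data.
    SELECT arrayJoin(
        arrayZip(
            groupArray(event_date),
            -- keep an element when the lambda is true, otherwise copy the previous one
            arrayFill(x -> x != '', groupArray(country_name)),
            groupArray(dau),
            groupArray(use_times)
        )
    ) AS tuple
    FROM temp
)
┌─event_date─┬─country_name─┬─dau─┬─use_times─┐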
│ 2022-03-27 │ China │ 1 │ 1 │
│ 2022-03-28 │ China │ 1 │ 2 │
│ 2022-03-29 │ China │ 2 │ 64 │
│ 2022-03-30 │ China │ 1 │ 4 │
│ 2022-03-31 │ China │ 2 │ 4 │
│ 2022-04-01 │ China │ 1 │ 5 │
│ 2022-04-02 │ China │ 1 │ 1 │
│ 2022-04-03 │ China │ 1 │ 1 │
│ 2022-04-04 │ China │ 1 │ 2 │
│ 2022-04-05 │ China │ 0 │ 0 │
│ 2022-04-06 │ China │ 0 │ 0 │
│ 2022-04-07 │ China │ 0 │ 0 │
│ 2022-04-08 │ China │ 0 │ 0 │
│ 2022-04-09 │ China │ 0 │ 0 │
│ 2022-04-10 │ China │ 0 │ 0 │
│ 2022-04-11 │ China │ 0 │ 0 │
│ 2022-04-12 │ China │ 0 │ 0 │
│ 2022-04-13 │ China │ 0 │ 0 │
│ 2022-04-14 │ China │ 0 │ 0 │
│ 2022-04-15 │ China │ 0 │ 0 │
│ 2022-04-16 │ China │ 0 │ 0 │
│ 2022-04-17 │ China │ 0 │ 0 │
│ 2022-04-18 │ China │ 0 │ 0 │
│ 2022-04-19 │ China │ 0 │ 0 │
│ 2022-04-20 │ China │ 0 │ 0 │
│ 2022-04-21 │ China │ 1 │ 5 │
│ 2022-04-22 │ China │ 0 │ 0 │
│ 2022-04-23 │ China │ 0 │ 0 │
│ 2022-04-24 │ China │ 0 │ 0 │
│ 2022-04-25 │ China │ 0 │ 0 │
│ 2022-04-26 │ China │ 0 │ 0 │
│ 2022-04-27 │ China │ 0 │ 0 │
│ 2022-04-28 │ China │ 1 │ 6 │
│ 2022-04-29 │ China │ 1 │ 4 │
│ 2022-04-30 │ China │ 1 │ 12 │
│ 2022-05-01 │ China │ 1 │ 10 │
│ 2022-05-02 │ China │ 1 │ 9 │
│ 2022-05-03 │ China │ 1 │ 3 │
│ 2022-05-04 │ China │ 1 │ 2 │
│ 2022-05-05 │ China │ 1 │ 6 │
│ 2022-05-06 │ China │ 1 │ 3 │
│ 2022-05-07 │ China │ 1 │ 5 │
│ 2022-05-08 │ China │ 1 │ 4 │
│ 2022-03-25 │ Hong Kong │ 2 │ 5 │
│ 2022-03-26 │ Hong Kong │ 1 │ 2 │
│ 2022-03-27 │ Hong Kong │ 1 │ 2 │
│ 2022-03-28 │ Hong Kong │ 2 │ 26 │
│ 2022-03-29 │ Hong Kong │ 2 │ 56 │
│ 2022-03-30 │ Hong Kong │ 3 │ 17 │
│ 2022-03-31 │ Hong Kong │ 4 │ 12 │
│ 2022-04-01 │ Hong Kong │ 1 │ 1 │
│ 2022-04-02 │ Hong Kong │ 2 │ 23 │
│ 2022-04-06 │ Hong Kong │ 1 │ 1 │
│ 2022-04-07 │ Hong Kong │ 2 │ 9 │
│ 2022-04-08 │ Hong Kong │ 2 │ 29 │
└────────────┴──────────────┴─────┴───────────┘
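The result above is still not completely gap-free: Hong Kong has no rows for 2022-04-03 to 2022-04-05, China starts at 2022-03-27 although the query asks for dates from 2022-03-25 (WITH FILL without FROM starts at the first existing value), and France, which has no data, is missing entirely. A more predictable route is to build the full country x date grid first and attach the aggregates to it, defaulting missing metrics to 0. An application-side fix such as my colleague's Java one has to do something along these lines; the Python below is only a minimal sketch of the idea, not that code, and it assumes the date range and the country list are known in advance (`dense_fill` is a hypothetical helper).

```python
from datetime import date, timedelta


def dense_fill(rows, countries, start, end):
    """Return one row per (country, day) in [start, end], ordered by country
    then date. rows are (event_date, country_name, dau, use_times) tuples from
    the plain GROUP BY query; missing combinations get dau = use_times = 0."""
    by_key = {(d, c): (dau, uses) for d, c, dau, uses in rows}
    out = []
    for country in sorted(countries):
        d = start
        while d <= end:
            dau, uses = by_key.get((d, country), (0, 0))
            out.append((d, country, dau, uses))
            d += timedelta(days=1)
    return out


sample = [
    (date(2022, 4, 2), "Hong Kong", 2, 23),
    (date(2022, 4, 6), "Hong Kong", 1, 1),
]
filled = dense_fill(sample, ["China", "France", "Hong Kong"],
                    date(2022, 4, 1), date(2022, 4, 6))
print(len(filled))  # 18: 3 countries x 6 days
```

Because the grid does not depend on which rows exist, every country gets every day, including countries with no data at all. Newer ClickHouse releases also document a `use_with_fill_by_sorting_prefix` setting that fills each value of the preceding ORDER BY columns independently; whether it is available depends on your version.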