ClickHouse: eliminating the gaps caused by GROUP BY
2022-07-01 02:09:00 【I'm a bad person】
I ran into this problem several months ago. I always thought I had written it up, and then suddenly found I had not (facepalm). The requirement was to report DAU, retention rate and similar metrics, broken down by date, country, supplier, device and so on. I was not responsible for this requirement; a colleague hit date discontinuities caused by GROUP BY and asked me about it. This article only offers one idea. In the end the colleague did not use this approach to solve the problem at the database level, but solved it in the Java program instead.
1. Reproducing the simplified problem. For example, because dau and use_times for China are both 0 from April 5 to April 20, those days disappear from the result, which produces a discontinuity in time.
SELECT
event_date,
country_name,
groupBitmapOr(active_user) AS dau,
sum(use_times) AS use_times
FROM dws_app_global_active_retained
WHERE (country_name IN ('China', 'France', 'Hong Kong')) AND (event_date >= '2022-03-25')
GROUP BY
event_date,
country_name
ORDER BY
country_name ASC,
event_date ASC
┌─event_date─┬─country_name─┬─dau─┬─use_times─┐
│ 2022-03-27 │ China │ 1 │ 1 │
│ 2022-03-28 │ China │ 1 │ 2 │
│ 2022-03-29 │ China │ 2 │ 64 │
│ 2022-03-30 │ China │ 1 │ 4 │
│ 2022-03-31 │ China │ 2 │ 4 │
│ 2022-04-01 │ China │ 1 │ 5 │
│ 2022-04-02 │ China │ 1 │ 1 │
│ 2022-04-03 │ China │ 1 │ 1 │
│ 2022-04-04 │ China │ 1 │ 2 │
│ 2022-04-21 │ China │ 1 │ 5 │
│ 2022-04-28 │ China │ 1 │ 6 │
│ 2022-04-29 │ China │ 1 │ 4 │
│ 2022-04-30 │ China │ 1 │ 12 │
│ 2022-05-01 │ China │ 1 │ 10 │
│ 2022-05-02 │ China │ 1 │ 9 │
│ 2022-05-03 │ China │ 1 │ 3 │
│ 2022-05-04 │ China │ 1 │ 2 │
│ 2022-05-05 │ China │ 1 │ 6 │
│ 2022-05-06 │ China │ 1 │ 3 │
│ 2022-05-07 │ China │ 1 │ 5 │
│ 2022-05-08 │ China │ 1 │ 4 │
│ 2022-03-25 │ Hong Kong │ 2 │ 5 │
│ 2022-03-26 │ Hong Kong │ 1 │ 2 │
│ 2022-03-27 │ Hong Kong │ 1 │ 2 │
│ 2022-03-28 │ Hong Kong │ 2 │ 26 │
│ 2022-03-29 │ Hong Kong │ 2 │ 56 │
│ 2022-03-30 │ Hong Kong │ 3 │ 17 │
│ 2022-03-31 │ Hong Kong │ 4 │ 12 │
│ 2022-04-01 │ Hong Kong │ 1 │ 1 │
│ 2022-04-02 │ Hong Kong │ 2 │ 23 │
│ 2022-04-06 │ Hong Kong │ 1 │ 1 │
│ 2022-04-07 │ Hong Kong │ 2 │ 9 │
│ 2022-04-08 │ Hong Kong │ 2 │ 29 │
└────────────┴──────────────┴─────┴───────────┘
2. Fill in the time gaps first (WITH FILL STEP n)
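To see WITH FILL in isolation, here is a minimal, self-contained sketch. The two inline rows are made-up toy data, not rows from dws_app_global_active_retained. Run against a ClickHouse server, it should return the two real rows plus rows for 2022-04-02 and 2022-04-03. In those filled rows the other columns take their type defaults: an empty string for country_name and 0 for dau.

```sql
-- Toy input: two days of data with a two-day hole between them.
SELECT
    event_date,
    country_name,
    dau
FROM
(
    SELECT toDate('2022-04-01') AS event_date, 'China' AS country_name, toUInt64(1) AS dau
    UNION ALL
    SELECT toDate('2022-04-04'), 'China', toUInt64(1)
)
-- STEP 1 on a Date column means one day per generated row.
ORDER BY event_date ASC WITH FILL STEP 1
```

In the article's query the same modifier is attached to event_date, the second sort key: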
SELECT
event_date,
country_name,
groupBitmapOr(active_user) AS dau,
sum(use_times) AS use_times
FROM dws_app_global_active_retained
WHERE (country_name IN ('China', 'France', 'Hong Kong')) AND (event_date >= '2022-03-25')
GROUP BY
event_date,
country_name
ORDER BY
country_name ASC,
event_date ASC WITH FILL STEP 1
┌─event_date─┬─country_name─┬─dau─┬─use_times─┐
│ 2022-03-27 │ China │ 1 │ 1 │
│ 2022-03-28 │ China │ 1 │ 2 │
│ 2022-03-29 │ China │ 2 │ 64 │
│ 2022-03-30 │ China │ 1 │ 4 │
│ 2022-03-31 │ China │ 2 │ 4 │
│ 2022-04-01 │ China │ 1 │ 5 │
│ 2022-04-02 │ China │ 1 │ 1 │
│ 2022-04-03 │ China │ 1 │ 1 │
│ 2022-04-04 │ China │ 1 │ 2 │
│ 2022-04-05 │ │ 0 │ 0 │
│ 2022-04-06 │ │ 0 │ 0 │
│ 2022-04-07 │ │ 0 │ 0 │
│ 2022-04-08 │ │ 0 │ 0 │
│ 2022-04-09 │ │ 0 │ 0 │
│ 2022-04-10 │ │ 0 │ 0 │
│ 2022-04-11 │ │ 0 │ 0 │
│ 2022-04-12 │ │ 0 │ 0 │
│ 2022-04-13 │ │ 0 │ 0 │
│ 2022-04-14 │ │ 0 │ 0 │
│ 2022-04-15 │ │ 0 │ 0 │
│ 2022-04-16 │ │ 0 │ 0 │
│ 2022-04-17 │ │ 0 │ 0 │
│ 2022-04-18 │ │ 0 │ 0 │
│ 2022-04-19 │ │ 0 │ 0 │
│ 2022-04-20 │ │ 0 │ 0 │
│ 2022-04-21 │ China │ 1 │ 5 │
│ 2022-04-22 │ │ 0 │ 0 │
│ 2022-04-23 │ │ 0 │ 0 │
│ 2022-04-24 │ │ 0 │ 0 │
│ 2022-04-25 │ │ 0 │ 0 │
│ 2022-04-26 │ │ 0 │ 0 │
│ 2022-04-27 │ │ 0 │ 0 │
│ 2022-04-28 │ China │ 1 │ 6 │
│ 2022-04-29 │ China │ 1 │ 4 │
│ 2022-04-30 │ China │ 1 │ 12 │
│ 2022-05-01 │ China │ 1 │ 10 │
│ 2022-05-02 │ China │ 1 │ 9 │
│ 2022-05-03 │ China │ 1 │ 3 │
│ 2022-05-04 │ China │ 1 │ 2 │
│ 2022-05-05 │ China │ 1 │ 6 │
│ 2022-05-06 │ China │ 1 │ 3 │
│ 2022-05-07 │ China │ 1 │ 5 │
│ 2022-05-08 │ China │ 1 │ 4 │
│ 2022-03-25 │ Hong Kong │ 2 │ 5 │
│ 2022-03-26 │ Hong Kong │ 1 │ 2 │
│ 2022-03-27 │ Hong Kong │ 1 │ 2 │
│ 2022-03-28 │ Hong Kong │ 2 │ 26 │
│ 2022-03-29 │ Hong Kong │ 2 │ 56 │
│ 2022-03-30 │ Hong Kong │ 3 │ 17 │
│ 2022-03-31 │ Hong Kong │ 4 │ 12 │
│ 2022-04-01 │ Hong Kong │ 1 │ 1 │
│ 2022-04-02 │ Hong Kong │ 2 │ 23 │
│ 2022-04-06 │ Hong Kong │ 1 │ 1 │
│ 2022-04-07 │ Hong Kong │ 2 │ 9 │
│ 2022-04-08 │ Hong Kong │ 2 │ 29 │
└────────────┴──────────────┴─────┴───────────┘
3. At this point a new problem appears: country_name is empty in the filled rows. So we use the "fill with adjacent value" method from the earlier article 《Clickhouse Vacancy value processing》 to fill those blanks (arrayFill).
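As a small standalone illustration of that function: arrayFill scans the array from left to right and replaces every element for which the lambda returns 0 with the element before it. The first element is never replaced. The array literal below is toy data chosen to mimic a country_name column after WITH FILL.

```sql
-- Empty strings are overwritten by the nearest non-empty value to their left.
SELECT arrayFill(x -> x != '', ['China', '', '', 'China', 'Hong Kong', '']) AS filled
-- filled = ['China','China','China','China','Hong Kong','Hong Kong']
```

Because the first element is kept as-is, a leading empty string would stay empty. The full query applies the same function to the grouped and filled result: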
WITH temp AS
(
SELECT
event_date,
country_name,
groupBitmapOr(active_user) AS dau,
sum(use_times) AS use_times
FROM dws_app_global_active_retained
WHERE (country_name IN ('China', 'France', 'Hong Kong')) AND (event_date >= '2022-03-25')
GROUP BY
event_date,
country_name
ORDER BY
country_name ASC,
event_date ASC WITH FILL STEP 1
)
SELECT
tuple.1 AS event_date,
tuple.2 AS country_name,
tuple.3 AS dau,
tuple.4 AS use_times
FROM
(
SELECT arrayJoin(
arrayZip(
groupArray(event_date),
-- carry the last non-empty country_name forward into the filled rows
arrayFill(x -> x != '', groupArray(country_name)),
groupArray(dau),
groupArray(use_times)
)
) AS tuple
FROM temp
)
┌─event_date─┬─country_name─┬─dau─┬─use_times─┐
│ 2022-03-27 │ China │ 1 │ 1 │
│ 2022-03-28 │ China │ 1 │ 2 │
│ 2022-03-29 │ China │ 2 │ 64 │
│ 2022-03-30 │ China │ 1 │ 4 │
│ 2022-03-31 │ China │ 2 │ 4 │
│ 2022-04-01 │ China │ 1 │ 5 │
│ 2022-04-02 │ China │ 1 │ 1 │
│ 2022-04-03 │ China │ 1 │ 1 │
│ 2022-04-04 │ China │ 1 │ 2 │
│ 2022-04-05 │ China │ 0 │ 0 │
│ 2022-04-06 │ China │ 0 │ 0 │
│ 2022-04-07 │ China │ 0 │ 0 │
│ 2022-04-08 │ China │ 0 │ 0 │
│ 2022-04-09 │ China │ 0 │ 0 │
│ 2022-04-10 │ China │ 0 │ 0 │
│ 2022-04-11 │ China │ 0 │ 0 │
│ 2022-04-12 │ China │ 0 │ 0 │
│ 2022-04-13 │ China │ 0 │ 0 │
│ 2022-04-14 │ China │ 0 │ 0 │
│ 2022-04-15 │ China │ 0 │ 0 │
│ 2022-04-16 │ China │ 0 │ 0 │
│ 2022-04-17 │ China │ 0 │ 0 │
│ 2022-04-18 │ China │ 0 │ 0 │
│ 2022-04-19 │ China │ 0 │ 0 │
│ 2022-04-20 │ China │ 0 │ 0 │
│ 2022-04-21 │ China │ 1 │ 5 │
│ 2022-04-22 │ China │ 0 │ 0 │
│ 2022-04-23 │ China │ 0 │ 0 │
│ 2022-04-24 │ China │ 0 │ 0 │
│ 2022-04-25 │ China │ 0 │ 0 │
│ 2022-04-26 │ China │ 0 │ 0 │
│ 2022-04-27 │ China │ 0 │ 0 │
│ 2022-04-28 │ China │ 1 │ 6 │
│ 2022-04-29 │ China │ 1 │ 4 │
│ 2022-04-30 │ China │ 1 │ 12 │
│ 2022-05-01 │ China │ 1 │ 10 │
│ 2022-05-02 │ China │ 1 │ 9 │
│ 2022-05-03 │ China │ 1 │ 3 │
│ 2022-05-04 │ China │ 1 │ 2 │
│ 2022-05-05 │ China │ 1 │ 6 │
│ 2022-05-06 │ China │ 1 │ 3 │
│ 2022-05-07 │ China │ 1 │ 5 │
│ 2022-05-08 │ China │ 1 │ 4 │
│ 2022-03-25 │ Hong Kong │ 2 │ 5 │
│ 2022-03-26 │ Hong Kong │ 1 │ 2 │
│ 2022-03-27 │ Hong Kong │ 1 │ 2 │
│ 2022-03-28 │ Hong Kong │ 2 │ 26 │
│ 2022-03-29 │ Hong Kong │ 2 │ 56 │
│ 2022-03-30 │ Hong Kong │ 3 │ 17 │
│ 2022-03-31 │ Hong Kong │ 4 │ 12 │
│ 2022-04-01 │ Hong Kong │ 1 │ 1 │
│ 2022-04-02 │ Hong Kong │ 2 │ 23 │
│ 2022-04-06 │ Hong Kong │ 1 │ 1 │
│ 2022-04-07 │ Hong Kong │ 2 │ 9 │
│ 2022-04-08 │ Hong Kong │ 2 │ 29 │
└────────────┴──────────────┴─────┴───────────┘
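Note that in the output above the Hong Kong rows still jump from 2022-04-02 to 2022-04-06, and the two countries start and end on different dates, so this approach did not fill every gap here. One common alternative is to build the full date × country grid first and LEFT JOIN the aggregates onto it. This is not what the author or the colleague used. The sketch below uses toy inline rows in place of the real GROUP BY result. The five-day window and the country list are assumptions chosen for illustration. It also relies on ClickHouse's default join_use_nulls = 0, under which unmatched rows get the column default (0) rather than NULL.

```sql
SELECT
    event_date,
    country_name,
    use_times
FROM
(
    -- Every (date, country) pair in the window: 5 days x 2 countries = 10 rows.
    SELECT
        addDays(toDate('2022-04-01'), number) AS event_date,
        arrayJoin(['China', 'Hong Kong']) AS country_name
    FROM numbers(5)
) AS grid
LEFT JOIN
(
    -- Toy stand-in for the aggregated result (illustrative values only).
    SELECT toDate('2022-04-01') AS event_date, 'China' AS country_name, toUInt64(5) AS use_times
    UNION ALL
    SELECT toDate('2022-04-04'), 'China', toUInt64(2)
    UNION ALL
    SELECT toDate('2022-04-02'), 'Hong Kong', toUInt64(23)
) AS agg USING (event_date, country_name)
ORDER BY
    country_name ASC,
    event_date ASC
```

With this shape every country gets one row per day in the window, and country_name never needs back-filling because it comes from the grid rather than from the filled rows.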