当前位置:网站首页>Flick SQL knows why (10): everyone uses accumulate window to calculate cumulative indicators
Flick SQL knows why (10): everyone uses accumulate window to calculate cumulative indicators
2022-07-03 13:10:00 【Big data sheep said】
What do you think , Baby , Not three in a row ???( Focus on + give the thumbs-up + Look again ), Affirmation of bloggers , It will urge bloggers to continuously output more high-quality practical content !!!
1. Preface
Source official account back office reply 1.13.2 cumulate window The wonderful way of analysis obtain .
This section is the third chapter of window aggregation , Last section introduced 1.13 window tvf tumble window Realization , This section focuses on the introduction 1.13. window tvf A blockbuster update for , namely cumulate window.
This section introduces you in detail from the following chapters cumulate window The ability of .
Introduction to application scenarios
Expected results
Solution Introduction
Summary and prospect
2. Introduction to application scenarios
Let's start with a simple survey : In a real-time scene , What kind of indicator demand scenario have you seen most ?
answer : Bloggers believe , It's not that PCU( That is, online at the same time PV,UV), It's cumulative over the period PV,UV indicators ( For example, the amount accumulated to the current minute every day PV,UV). Because such indicators are cumulative in a period of time , It has more statistical analysis value for analysts , And almost all composite indicators are based on the statistics of such indicators ( Otherwise, why do you need a day's data offline , Not a minute of data ).
This article will introduce the accumulation in the cycle PV,UV Indicators in flink 1.13 Version of the best solution .
3. Expected results
First, let's take a real case to see in the scenario of specific input values , What the output value should look like .
indicators : The cumulative number of minutes up to the current day money(sum(money)), duplicate removal id Count (count(distinct id)). Every day Represents that the window size is 1 God , minute Represents that the moving step is in minute level .
A wave of input data :
| time | id | money |
|---|---|---|
| 2021-11-01 00:01:00 | A | 3 |
| 2021-11-01 00:01:00 | B | 5 |
| 2021-11-01 00:01:00 | A | 7 |
| 2021-11-01 00:02:00 | C | 3 |
| 2021-11-01 00:03:00 | C | 10 |
Expected output data :
| time | count distinct id | sum money |
|---|---|---|
| 2021-11-01 00:01:00 | 2 | 15 |
| 2021-11-01 00:02:00 | 3 | 18 |
| 2021-11-01 00:03:00 | 3 | 28 |
Convert to a line chart with a length like this :

Accumulated on that day
You can see , Its characteristic is , The output result of each minute is the result accumulated from the zero point of the day to the current .
4. Solution Introduction
4.1.flink 1.13 Before
There are two alternative solutions
tumble window(1 Sky window ) + early-fire(1 minute )
group by(1 God ) + minibatch(1 minute )
But both solutions produce retract flow , About retract See the following article for the disadvantages of streaming :
[

Step on a hole | flink sql count And this kind of pit !
](http://mp.weixin.qq.com/s?__biz=MzkxNjA1MzM5OQ==&mid=2247487969&idx=1&sn=7215b101f49bd5e62b81a746dc6b15e6&chksm=c1549d19f623140f84cd436dd02be7e6ae08e28e5117bff88166237c09c17ca491a3e88a3a3d&scene=21#wechat_redirect)
also tumble window + early-fire The trigger mechanism is based on processing time rather than event time , See the following article for specific disadvantages :
https://mp.weixin.qq.com/s/L8-RSS6v3Ppts60CWngiOA
4.2.flink 1.13 And after
The birth of cumulate window solution , See the official website link for details :
https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/dev/table/sql/queries/window-tvf/#cumulate
As shown in the following official website documents , Introduce cumulate window The first sentence is cumulate window Very suitable for previous use tumble window + early-fire Scene . so to speak cumulate window It is accumulated in the user's calculation cycle PV,UV When it comes to indicators , Used tumble window + early-fire It is found that there are many pits in this scheme , The birth of the !

cumulate window
The computer system is shown in the figure below :

cumulate window
Or take the case just now to illustrate , Take the sky as the window , Output the cumulative value from zero point of the day to the current minute every minute , stay cumulate window in , The window division rules are as follows :
[2021-11-01 00:00:00, 2021-11-01 00:01:00]
[2021-11-01 00:00:00, 2021-11-01 00:02:00]
[2021-11-01 00:00:00, 2021-11-01 00:03:00] …
[2021-11-01 00:00:00, 2021-11-01 23:58:00]
[2021-11-01 00:00:00, 2021-11-01 23:59:00]
first window The statistics are the data of an interval ; the second window The statistics are the data of the first interval and the second interval ; Third window The statistic is the first interval , The data of the second interval and the third interval .
So in order to cumulate window Realize the above requirements , Concrete SQL as follows :
SELECT UNIX_TIMESTAMP(CAST(window_end AS STRING)) * 1000 as window_end,
window_start,
sum(money) as sum_money,
count(distinct id) as count_distinct_id
FROM TABLE(CUMULATE(
TABLE source_table
, DESCRIPTOR(row_time)
, INTERVAL '60' SECOND
, INTERVAL '1' DAY))
GROUP BY window_start,
window_end
among CUMULATE(TABLE source_table, DESCRIPTOR(row_time), INTERVAL '60' SECOND, INTERVAL '1' DAY) Medium INTERVAL '1' DAY Represents that the window size is 1 God ,INTERVAL '60' SECOND, The window is divided in steps of 60s.
among window_start, window_end The fields are cumulate window The type of automatic generation is timestamp(3).
window_start Fixed to the start time of the window .window_end Is the end time of a child window .
The final results are as follows .
input data :
| row_time | id | money |
|---|---|---|
| 2021-11-01 00:01:00 | A | 3 |
| 2021-11-01 00:01:00 | B | 5 |
| 2021-11-01 00:01:00 | A | 7 |
| 2021-11-01 00:02:00 | C | 3 |
| 2021-11-01 00:03:00 | C | 10 |
Output data :
| window_end | window_start | sum_money | count_distinct_id |
|---|---|---|---|
| 2021-11-21T00:01 | 1635696000000 | 2 | 15 |
| 2021-11-21T00:02 | 1635696000000 | 3 | 18 |
| 2021-11-21T00:03 | 1635696000000 | 3 | 28 |
Notes: When dividing the day level window, you must pay attention to the time zone !https://nightlies.apache.org/flink/flink-docs-master/zh/docs/dev/table/timezone/
4.3.cumulate window Principle analysis
First cumulate window It's a window , Its window calculation is triggered entirely by watermark Pushing . And tumble window equally .
Take the above day window minute accumulation case as an example :cumulate window One was maintained slice state and merged state,slice state Is the window data every minute ( It's called slicing ),merged state The function of is when watermark Push to the next minute , This one minute slice state Will be merge To merged stated in , therefore merged state The value in is the cumulative value from zero point of the day to the current minute , Our output is from merged state Got .
4.4.cumulate window How to solve tumble window + early-fire The problem of
- problem 1:tumble window + early-fire Deal with time triggered problems .
cumulate window It can be triggered with the advance of event time .
- problem 1:tumble window + early-fire retract Flow problem .
cumulate window yes append flow , Of course not retract The problem of flow .
5. summary
Source official account back office reply 1.13.2 cumulate window The wonderful way of analysis obtain .
This paper mainly introduces window tvf Realized cumulate window Scenario cases of aggregation indicators and their operation principle :
This paper introduces the accumulation in the cycle PV,UV It is our most commonly used indicator scenario .
stay tumble window + early-fire perhaps groupby + minibatch Accumulated during the calculation period PV,UV The problems are , The birth of cumulate window Helped us solve these problems , And a case is given to illustrate .
[

When we are doing flow batch Integration , What are we doing ?
](http://mp.weixin.qq.com/s?__biz=MzkxNjA1MzM5OQ==&mid=2247489496&idx=1&sn=016e580c5932b232005ff1d5e345588a&chksm=c1549b20f6231236571444c621f5c94dedc06daad7db0fa662674e544424e4a0d0fb16df3168&scene=21#wechat_redirect)
[

flink sql Know why ( Nine ):window tvf tumble window Wonderful ideas and solutions
](http://mp.weixin.qq.com/s?__biz=MzkxNjA1MzM5OQ==&mid=2247489404&idx=1&sn=af13ff231b113e702024e9efe068c54f&chksm=c1549b84f62312924ce6aed7df2578fa4cbe5d21ec1fe7f45f1bef80b0cf2f6967e11b0f053c&scene=21#wechat_redirect)
[

flink sql Know why ( 8、 ... and ):flink sql tumble window The wonderful way of analysis
](http://mp.weixin.qq.com/s?__biz=MzkxNjA1MzM5OQ==&mid=2247489338&idx=1&sn=941d8f05e194182071e55550969bc49e&chksm=c1549bc2f62312d48b6cd7d322ecf466df5610643a803f65db4f06e360a80a9471b1e1f35634&scene=21#wechat_redirect)
[

flink sql Know why ( 7、 ... and ): Not even the most suitable flink sql Of ETL and group agg I haven't seen any scenes ?
](http://mp.weixin.qq.com/s?__biz=MzkxNjA1MzM5OQ==&mid=2247489235&idx=1&sn=66c2b95aa3e22069a12b3d53b6d1d9f3&chksm=c1549a2bf623133d7a75732b5cea4bc304bf06a53777963d5cac8406958114873294dc3e4699&scene=21#wechat_redirect)
[

flink sql Know why ( 6、 ... and )| flink sql Appointment calcite( Just read this one )
](http://mp.weixin.qq.com/s?__biz=MzkxNjA1MzM5OQ==&mid=2247489112&idx=1&sn=21e86dab0e20da211c28cd0963b75ee2&chksm=c1549aa0f62313b6674833cd376b2a694752a154a63532ec9446c9c3013ef97f2d57b4e2eb64&scene=21#wechat_redirect)
[

flink sql Know why ( 5、 ... and )| Customize protobuf format
](http://mp.weixin.qq.com/s?__biz=MzkxNjA1MzM5OQ==&mid=2247488994&idx=1&sn=20236350b1c8cfc4ec5055687b35603d&chksm=c154991af623100c46c0ed224a8264be08235ab30c9f191df7400e69a8ee873a3b74859fb0b7&scene=21#wechat_redirect)
边栏推荐
- Tencent cloud tdsql database delivery and operation and maintenance Junior Engineer - some questions of Tencent cloud cloudlite certification (TCA) examination
- CVPR 2022 图像恢复论文
- [exercice 7] [principe de la base de données]
- 2022-02-10 introduction to the design of incluxdb storage engine TSM
- 2022-02-11 practice of using freetsdb to build an influxdb cluster
- context. Getexternalfilesdir() is compared with the returned path
- 剑指 Offer 14- II. 剪绳子 II
- elk笔记24--用gohangout替代logstash消费日志
- My creation anniversary: the fifth anniversary
- Idea full text search shortcut ctr+shift+f failure problem
猜你喜欢

Seven second order ladrc-pll structure design of active disturbance rejection controller

【数据库原理及应用教程(第4版|微课版)陈志泊】【第三章习题】

Application of ncnn neural network computing framework in orange school orangepi 3 lts development board

【R】【密度聚类、层次聚类、期望最大化聚类】

Four problems and isolation level of MySQL concurrency

Social community forum app ultra-high appearance UI interface

My creation anniversary: the fifth anniversary

Drop down refresh conflicts with recyclerview sliding (swiperefreshlayout conflicts with recyclerview sliding)

Low code platform international multilingual (I18N) technical solution

如何在微信小程序中获取用户位置?
随机推荐
Application of ncnn Neural Network Computing Framework in Orange Pi 3 Lts Development Board
Harmonic current detection based on synchronous coordinate transformation
sitesCMS v3.0.2发布,升级JFinal等依赖
(latest version) WiFi distribution multi format + installation framework
2022-01-27 redis cluster cluster proxy predixy analysis
Sword finger offer 11 Rotate the minimum number of the array
SSH登录服务器发送提醒
elk笔记24--用gohangout替代logstash消费日志
Project video based on Linu development
IDEA 全文搜索快捷键Ctr+Shift+F失效问题
【習題五】【數據庫原理】
C graphical tutorial (Fourth Edition)_ Chapter 15 interface: interfacesamplep268
GaN图腾柱无桥 Boost PFC(单相)七-PFC占空比前馈
2022-02-14 incluxdb cluster write data writetoshard parsing
基于同步坐标变换的谐波电流检测
[Exercice 5] [principe de la base de données]
Understanding of CPU buffer line
Analysis of the influence of voltage loop on PFC system performance
01 three solutions to knapsack problem (greedy dynamic programming branch gauge)
SSH login server sends a reminder