当前位置:网站首页>Flick SQL knows why (10): everyone uses accumulate window to calculate cumulative indicators
Flick SQL knows why (10): everyone uses accumulate window to calculate cumulative indicators
2022-07-03 13:10:00 【Big data sheep said】
What do you think , Baby , Not three in a row ???( Focus on + give the thumbs-up + Look again ), Affirmation of bloggers , It will urge bloggers to continuously output more high-quality practical content !!!
1. Preface
Source official account back office reply 1.13.2 cumulate window The wonderful way of analysis obtain .
This section is the third chapter of window aggregation , Last section introduced 1.13 window tvf tumble window Realization , This section focuses on the introduction 1.13. window tvf A blockbuster update for , namely cumulate window.
This section introduces you in detail from the following chapters cumulate window The ability of .
Introduction to application scenarios
Expected results
Solution Introduction
Summary and prospect
2. Introduction to application scenarios
Let's start with a simple survey : In a real-time scene , What kind of indicator demand scenario have you seen most ?
answer : Bloggers believe , It's not that PCU( That is, online at the same time PV,UV), It's cumulative over the period PV,UV indicators ( For example, the amount accumulated to the current minute every day PV,UV). Because such indicators are cumulative in a period of time , It has more statistical analysis value for analysts , And almost all composite indicators are based on the statistics of such indicators ( Otherwise, why do you need a day's data offline , Not a minute of data ).
This article will introduce the accumulation in the cycle PV,UV Indicators in flink 1.13 Version of the best solution .
3. Expected results
First, let's take a real case to see in the scenario of specific input values , What the output value should look like .
indicators : The cumulative number of minutes up to the current day money(sum(money)), duplicate removal id Count (count(distinct id)). Every day Represents that the window size is 1 God , minute Represents that the moving step is in minute level .
A wave of input data :
time | id | money |
---|---|---|
2021-11-01 00:01:00 | A | 3 |
2021-11-01 00:01:00 | B | 5 |
2021-11-01 00:01:00 | A | 7 |
2021-11-01 00:02:00 | C | 3 |
2021-11-01 00:03:00 | C | 10 |
Expected output data :
time | count distinct id | sum money |
---|---|---|
2021-11-01 00:01:00 | 2 | 15 |
2021-11-01 00:02:00 | 3 | 18 |
2021-11-01 00:03:00 | 3 | 28 |
Convert to a line chart with a length like this :
Accumulated on that day
You can see , Its characteristic is , The output result of each minute is the result accumulated from the zero point of the day to the current .
4. Solution Introduction
4.1.flink 1.13 Before
There are two alternative solutions
tumble window(1 Sky window ) + early-fire(1 minute )
group by(1 God ) + minibatch(1 minute )
But both solutions produce retract flow , About retract See the following article for the disadvantages of streaming :
[
Step on a hole | flink sql count And this kind of pit !
](http://mp.weixin.qq.com/s?__biz=MzkxNjA1MzM5OQ==&mid=2247487969&idx=1&sn=7215b101f49bd5e62b81a746dc6b15e6&chksm=c1549d19f623140f84cd436dd02be7e6ae08e28e5117bff88166237c09c17ca491a3e88a3a3d&scene=21#wechat_redirect)
also tumble window + early-fire The trigger mechanism is based on processing time rather than event time , See the following article for specific disadvantages :
https://mp.weixin.qq.com/s/L8-RSS6v3Ppts60CWngiOA
4.2.flink 1.13 And after
The birth of cumulate window solution , See the official website link for details :
https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/dev/table/sql/queries/window-tvf/#cumulate
As shown in the following official website documents , Introduce cumulate window The first sentence is cumulate window Very suitable for previous use tumble window + early-fire Scene . so to speak cumulate window It is accumulated in the user's calculation cycle PV,UV When it comes to indicators , Used tumble window + early-fire It is found that there are many pits in this scheme , The birth of the !
cumulate window
The computer system is shown in the figure below :
cumulate window
Or take the case just now to illustrate , Take the sky as the window , Output the cumulative value from zero point of the day to the current minute every minute , stay cumulate window in , The window division rules are as follows :
[2021-11-01 00:00:00, 2021-11-01 00:01:00]
[2021-11-01 00:00:00, 2021-11-01 00:02:00]
[2021-11-01 00:00:00, 2021-11-01 00:03:00] …
[2021-11-01 00:00:00, 2021-11-01 23:58:00]
[2021-11-01 00:00:00, 2021-11-01 23:59:00]
first window The statistics are the data of an interval ; the second window The statistics are the data of the first interval and the second interval ; Third window The statistic is the first interval , The data of the second interval and the third interval .
So in order to cumulate window Realize the above requirements , Concrete SQL as follows :
SELECT UNIX_TIMESTAMP(CAST(window_end AS STRING)) * 1000 as window_end,
window_start,
sum(money) as sum_money,
count(distinct id) as count_distinct_id
FROM TABLE(CUMULATE(
TABLE source_table
, DESCRIPTOR(row_time)
, INTERVAL '60' SECOND
, INTERVAL '1' DAY))
GROUP BY window_start,
window_end
among CUMULATE(TABLE source_table, DESCRIPTOR(row_time), INTERVAL '60' SECOND, INTERVAL '1' DAY)
Medium INTERVAL '1' DAY
Represents that the window size is 1 God ,INTERVAL '60' SECOND
, The window is divided in steps of 60s.
among window_start, window_end
The fields are cumulate window The type of automatic generation is timestamp(3).
window_start
Fixed to the start time of the window .window_end
Is the end time of a child window .
The final results are as follows .
input data :
row_time | id | money |
---|---|---|
2021-11-01 00:01:00 | A | 3 |
2021-11-01 00:01:00 | B | 5 |
2021-11-01 00:01:00 | A | 7 |
2021-11-01 00:02:00 | C | 3 |
2021-11-01 00:03:00 | C | 10 |
Output data :
window_end | window_start | sum_money | count_distinct_id |
---|---|---|---|
2021-11-21T00:01 | 1635696000000 | 2 | 15 |
2021-11-21T00:02 | 1635696000000 | 3 | 18 |
2021-11-21T00:03 | 1635696000000 | 3 | 28 |
Notes: When dividing the day level window, you must pay attention to the time zone !https://nightlies.apache.org/flink/flink-docs-master/zh/docs/dev/table/timezone/
4.3.cumulate window Principle analysis
First cumulate window It's a window , Its window calculation is triggered entirely by watermark Pushing . And tumble window equally .
Take the above day window minute accumulation case as an example :cumulate window One was maintained slice state and merged state,slice state Is the window data every minute ( It's called slicing ),merged state The function of is when watermark Push to the next minute , This one minute slice state Will be merge To merged stated in , therefore merged state The value in is the cumulative value from zero point of the day to the current minute , Our output is from merged state Got .
4.4.cumulate window How to solve tumble window + early-fire The problem of
- problem 1:tumble window + early-fire Deal with time triggered problems .
cumulate window It can be triggered with the advance of event time .
- problem 1:tumble window + early-fire retract Flow problem .
cumulate window yes append flow , Of course not retract The problem of flow .
5. summary
Source official account back office reply 1.13.2 cumulate window The wonderful way of analysis obtain .
This paper mainly introduces window tvf Realized cumulate window Scenario cases of aggregation indicators and their operation principle :
This paper introduces the accumulation in the cycle PV,UV It is our most commonly used indicator scenario .
stay tumble window + early-fire perhaps groupby + minibatch Accumulated during the calculation period PV,UV The problems are , The birth of cumulate window Helped us solve these problems , And a case is given to illustrate .
[
When we are doing flow batch Integration , What are we doing ?
](http://mp.weixin.qq.com/s?__biz=MzkxNjA1MzM5OQ==&mid=2247489496&idx=1&sn=016e580c5932b232005ff1d5e345588a&chksm=c1549b20f6231236571444c621f5c94dedc06daad7db0fa662674e544424e4a0d0fb16df3168&scene=21#wechat_redirect)
[
flink sql Know why ( Nine ):window tvf tumble window Wonderful ideas and solutions
](http://mp.weixin.qq.com/s?__biz=MzkxNjA1MzM5OQ==&mid=2247489404&idx=1&sn=af13ff231b113e702024e9efe068c54f&chksm=c1549b84f62312924ce6aed7df2578fa4cbe5d21ec1fe7f45f1bef80b0cf2f6967e11b0f053c&scene=21#wechat_redirect)
[
flink sql Know why ( 8、 ... and ):flink sql tumble window The wonderful way of analysis
](http://mp.weixin.qq.com/s?__biz=MzkxNjA1MzM5OQ==&mid=2247489338&idx=1&sn=941d8f05e194182071e55550969bc49e&chksm=c1549bc2f62312d48b6cd7d322ecf466df5610643a803f65db4f06e360a80a9471b1e1f35634&scene=21#wechat_redirect)
[
flink sql Know why ( 7、 ... and ): Not even the most suitable flink sql Of ETL and group agg I haven't seen any scenes ?
](http://mp.weixin.qq.com/s?__biz=MzkxNjA1MzM5OQ==&mid=2247489235&idx=1&sn=66c2b95aa3e22069a12b3d53b6d1d9f3&chksm=c1549a2bf623133d7a75732b5cea4bc304bf06a53777963d5cac8406958114873294dc3e4699&scene=21#wechat_redirect)
[
flink sql Know why ( 6、 ... and )| flink sql Appointment calcite( Just read this one )
](http://mp.weixin.qq.com/s?__biz=MzkxNjA1MzM5OQ==&mid=2247489112&idx=1&sn=21e86dab0e20da211c28cd0963b75ee2&chksm=c1549aa0f62313b6674833cd376b2a694752a154a63532ec9446c9c3013ef97f2d57b4e2eb64&scene=21#wechat_redirect)
[
flink sql Know why ( 5、 ... and )| Customize protobuf format
](http://mp.weixin.qq.com/s?__biz=MzkxNjA1MzM5OQ==&mid=2247488994&idx=1&sn=20236350b1c8cfc4ec5055687b35603d&chksm=c154991af623100c46c0ed224a8264be08235ab30c9f191df7400e69a8ee873a3b74859fb0b7&scene=21#wechat_redirect)
边栏推荐
- C graphical tutorial (Fourth Edition)_ Chapter 15 interface: interfacesamplep268
- Grid connection - Analysis of low voltage ride through and island coexistence
- Luogup3694 Bangbang chorus standing in line
- 【習題五】【數據庫原理】
- Understanding of CPU buffer line
- 剑指 Offer 15. 二进制中1的个数
- 剑指 Offer 14- II. 剪绳子 II
- 【习题六】【数据库原理】
- [exercise 6] [Database Principle]
- Logback 日志框架
猜你喜欢
studio All flavors must now belong to a named flavor dimension. Learn more
4. 无线体内纳米网:电磁传播模型和传感器部署要点
Drop down refresh conflicts with recyclerview sliding (swiperefreshlayout conflicts with recyclerview sliding)
2022-02-14 analysis of the startup and request processing process of the incluxdb cluster Coordinator
Grid connection - Analysis of low voltage ride through and island coexistence
IDEA 全文搜索快捷键Ctr+Shift+F失效问题
OpenHarmony应用开发之ETS开发方式中的Image组件
[Database Principle and Application Tutorial (4th Edition | wechat Edition) Chen Zhibo] [Chapter III exercises]
Xctf mobile--app3 problem solving
Deeply understand the mvcc mechanism of MySQL
随机推荐
[exercise 5] [Database Principle]
Sword finger offer 15 Number of 1 in binary
高效能人士的七个习惯
[Database Principle and Application Tutorial (4th Edition | wechat Edition) Chen Zhibo] [Chapter 6 exercises]
Seven habits of highly effective people
Sitescms v3.1.0 release, launch wechat applet
How to stand out quickly when you are new to the workplace?
【数据库原理及应用教程(第4版|微课版)陈志泊】【SQLServer2012综合练习】
CVPR 2022 image restoration paper
2022-01-27 use liquibase to manage MySQL execution version
Integer case study of packaging
[Database Principle and Application Tutorial (4th Edition | wechat Edition) Chen Zhibo] [Chapter V exercises]
context. Getexternalfilesdir() is compared with the returned path
[data mining review questions]
C graphical tutorial (Fourth Edition)_ Chapter 18 enumerator and iterator: enumerator samplep340
2022-01-27 redis cluster cluster proxy predixy analysis
自抗扰控制器七-二阶 LADRC-PLL 结构设计
Some thoughts on business
Glide 4.6.1 API initial
[review questions of database principles]