当前位置:网站首页>[Flink] temporal semantics and watermark
[Flink] temporal semantics and watermark
2022-07-04 07:11:00 【飝 鱻】
Temporal semantics and WaterMark
Time semantics
stay Flink Middle time can be divided into three kinds , Namely
1️⃣:Event Time: When the event was created
2️⃣:Ingestion Time: Data into the Flink Time for
3️⃣:Processing Time: Local system time to execute the operator , Machine related
- Talking about these three times is mainly to lead to watemark, Because in many scenes , The time of the event is what our business cares about , Calculate based on event time , Adopt a strategy , Whether it is real-time streaming data or historical data , Can ensure that the results are consistent, in order to more vividly describe the event time and the event flow into the system ( Here it means Flink) The relationship between , But in the process of transmission, the data cannot be transmitted to the program in the order of timestamp
1️⃣: For real-time stream computing , The general processing method is to process one element by one , In this way, real-time . But based on Event Time Some applications of , We require the accuracy of processing , Must be cached , Because when the first event arrives , I don't know that the later event occurred earlier than the current event , Therefore, it is necessary to wait until at least the second event arrives to determine whether to output the calculation result of the first event , This will cause delays .
2️⃣: But after the second event arrives , Is there any event earlier than its occurrence , Whether to continue caching and wait ? If you wait , How long to wait ? Therefore, there must be a mechanism strategy to ensure that there is no waiting , Trigger the current cached data calculation and output .
3️⃣: that , The current calculation has been calculated and output , If an earlier event arrives late , How to deal with ? We thought of two processing strategies :1, Add the late event to the last cached data and recalculate the output ; 2, Discard do not calculate the second strategy discard do not calculate easy to handle , The first strategy requires the last cached data , Here we will face another two problems :1, The cache cannot be cleared after the last cache data calculation ; 2, How long should the cache be kept , Because if you keep the cache , It is bound to increase the memory pressure of the whole system .
- It needs to be used waterMark 了
WaterMark
Watermark It's a measure Event Time The mechanism of progress , It's a hidden property of the data itself . Usually, a field in a record represents the occurrence time of the record . For example, based on Event Time The data of , Each of them contains a type of timestamp Properties of
、Based on this attribute, define a policy as offset 3s Of watermark, The watermark timestamp of this data is :
Timestamp of this attribute -3000
At this time, if the time of the time window we define is 15s, When the fifteenth second is up, it won't end , Because the waterMark Time ratio window Three seconds slow
The illustration watermark
We set an offset to 5 Of a second watermark Strategy , The size is 10 Second window , In order to better understand watermark, We make the following analogy , The time and space of data occurrence is A Time and space ,watermark The time space of is B Time and space , be B Time and space are always better than A Time and space are late 5 Second occurs
Pictured above , The small rectangular box represents the window size , The size is 10 second ,Flink By default, it will be based on the selected time ( Here is Event Time) Assign window . Suppose the time when the data occurred rowtime from 0 Start , Then the pre allocated window even [0,10),[10,20],[20,30],[30,40]
A The time on the timeline is certain , Again B The time on the timeline is also certain ,B The time on the space-time axis is relative A The time on the time axis is always late 5 second . In the same time coordinate system S Next , hypothesis S Time coordinates and A Time is the same , be A The time on the timeline is S The time value remains unchanged in the coordinate system , but B The time on the timeline is S The time value changes in the time coordinate system “ Big ” 5s 了 . In the first window [0,10], If a record rowtime by 10s Data in S In coordinate system 9s Arrived at the , But its watemark It's actually 10-5 = 5s, Have not reached the first window end Time, Therefore, window calculation will not be triggered ; If a record rowtime by 8s Data in S In coordinate system 12s Arrived at the , But it's watermark It's actually 8-5=3s Less than the previous watermark, Therefore, it is not updated at this time watermark( In general ),watermark The timestamp of is still 5 second , The trigger condition of the first window is not reached ; If a record rowtime by 12s Data in S In coordinate system 13s Arrived at the , Its watemark It's actually 12-5 = 7 > 5, to update watermark The timestamp of is 7 second , But the trigger condition of a window is not reached ; If a record rowtime by 15s The data of has arrived , Its watemark It's actually 15 -5 = 10s, The trigger condition is reached , Greater than window endTime, So the window triggers calculation , If there is another rowtime<10s The data arrives at , Will be discarded ( No settings latness Options )
- Code implementation
Event Time The use of must specify the timestamp call in the data sourceassignTimestampAndWatermarks
Method , Pass in aBoundedOutOfOrdernessTimestampExtractor
, You can specify - If the data is ordered , There is no need to delay triggering , You can just specify a timestamp
- About latness Set up ,latness Mainly dealing with late data
OutputTag<SenSorReading> late = new OutputTag<>("late");
dataStream
.keyBy("id")
.timeWindow(Time.seconds(15))
.allowedLateness(Time.minutes(1))// Allow the supermarket one minute
.sideOutputLateData(late);// The timeout data is divided into... Separately i A flow
- Data that takes less than one minute can be added to the calculation , If it takes more than one minute, it will be saved to late Waiting in the stream
Jump top
边栏推荐
- Cell reports: Wei Fuwen group of the Institute of zoology, Chinese Academy of Sciences analyzes the function of seasonal changes in the intestinal flora of giant pandas
- About how idea sets up shortcut key sets
- MySQL 45 lecture learning notes (VII) line lock
- 【GF(q)+LDPC】基于二值图GF(q)域的规则LDPC编译码设计与matlab仿真
- Why does the producer / consumer mode wait () use while instead of if (clear and understandable)
- Highly paid programmers & interview questions: how does redis of series 119 realize distributed locks?
- 【FreeRTOS】FreeRTOS學習筆記(7)— 手寫FreeRTOS雙向鏈錶/源碼分析
- There is no Chinese prompt below when inputting text in win10 Microsoft Pinyin input method
- Centos8 install mysql 7 unable to start up
- Computer connects raspberry pie remotely through putty
猜你喜欢
Pangu open source: multi support and promotion, the wave of chip industry
The most effective futures trend strategy: futures reverse merchandising
Boosting the Performance of Video Compression Artifact Reduction with Reference Frame Proposals and
How to share the source code anti disclosure scheme
Splicing plain text into JSON strings - easy language method
Review of enterprise security incidents: how can enterprises do a good job in preventing source code leakage?
响应式移动Web测试题
BasicVSR++: Improving Video Super-Resolutionwith Enhanced Propagation and Alignment
Shopping malls, storerooms, flat display, user-defined maps can also be played like this!
uniapp小程序分包
随机推荐
How does the inner roll break?
Selection (023) - what are the three stages of event propagation?
How to buy financial products in 2022?
Download address of the official website of national economic industry classification gb/t 4754-2017
[freertos] freertos Learning notes (7) - written freertos bidirectionnel Link LIST / source analysis
Mobile adaptation: vw/vh
Summary of June 2022
云Redis 有什么用? 云redis怎么用?
Google Chrome Portable Google Chrome browser portable version official website download method
请问旧版的的常用SQL怎么迁移到新版本里来?
Tar source code analysis 8
What is the use of cloud redis? How to use cloud redis?
在已經知道錶格列勾選一個顯示一列
Splicing plain text into JSON strings - easy language method
A new understanding of how to encrypt industrial computers: host reinforcement application
Industrial computer anti-virus
BasicVSR++: Improving Video Super-Resolutionwith Enhanced Propagation and Alignment
MySQL relearn 2- Alibaba cloud server CentOS installation mysql8.0
About how idea sets up shortcut key sets
2022年6月小结