当前位置:网站首页>Flink learning 8: data consistency
Flink learning 8: data consistency
2022-07-04 04:18:00 【hzp666】

1. brief introduction
In the distributed stream processing engine , High throughput Low latency , Is the core requirement .
At the same time, data consistency is also very important in distributed applications .
( In a precise scene , Accuracy and consistency are often required )

2.flink Data consistency
flink How to ensure the consistency of calculation state .
Asynchronous barrier snapshot mechanism , To achieve accurate data consistency .
When the task crashes or is canceled , You can use checkpoints or savepoints , To achieve recovery , Realize the replay of data flow , So as to achieve the consistency of tasks .( This mechanism will not sacrifice system performance )

2.1 Stateful and stateless events
Let's first look at what is a state event :
1. No state , That is, each event is independent , There is no correlation between events .
The output result is only related to the current event .
eg: Statistical weather temperature , When more than 40°C When , Issue high temperature alarm .( It has nothing to do with the previous temperature )
2. A stateful , That is, the event is related to the previous event state .
The output is a combination of previous events , Results of comprehensive consideration
eg: Count the recent 1 Hourly average temperature ,


2.2 Data consistency
When distributed systems introduce state , Naturally, the problem of data consistency is introduced .
According to the different correctness , Can be divided into 3 class :
1. The correctness is the lowest : At most once . When the fault occurs , Don't do anything? .
2. Medium accuracy : At least once . When the fault occurs , The system will not miss previous events , But the calculation may be repeated .( The final statistical value may be greater than or equal to the real data value )
3. The highest accuracy : Exactly the same . The aggregation result is consistent with the result without failure .
“” Exactly the same “” relative “” At least once “”, The system will be more complex , The processing speed will be relatively slow . Because there will be data alignment .

At the very beginning storm,samza At least once ,
Later, Storm Trident and Spark Streaming Although the accuracy and consistency are guaranteed , But at the expense of a lot of performance .
Flink Without sacrificing too much performance , Ensure accuracy once .

2.3 Flink Asynchronous barrier snapshot mechanism
2.3.1 Snapshot mechanism
First, let's see what the snapshot mechanism is : Record job status and data flow regularly

2.3.2 But the traditional snapshot mechanism , There are two main problems :

2.3.3 flink How to optimize the snapshot mechanism
1. Adopt asynchronous snapshot mechanism . be based on chandy-lamport Algorithm , A checkpoint mechanism has been developed , It is called asynchronous barrier checkpoint mechanism .

2. Asynchronous barrier snapshot mechanism

3. Checkpoint barrier , Is a special kind of internal message ,
Divide the data flow into multiple windows in time ,
One window corresponds to , A snapshot in the data stream .
Barrier by JobManager Broadcast to all computing tasks regularly source, And flow downstream with the data flow .
Each barrier is located at , Current snapshot and The split point of the next snapshot .
When the downstream data check the voucher , The snapshot action will be triggered , There is no need to pause this computing task .

4. In asynchronous checkpoints “ asynchronous ”

边栏推荐
- leetcode刷题:二叉树07(二叉树的最大深度)
- Exercises in quantum mechanics
- vue多级路由嵌套怎么动态缓存组件
- Common methods of threads
- My opinion on how to effectively telecommute | community essay solicitation
- Infiltration practice guest account mimikatz sunflower SQL rights lifting offline decryption
- Graduation project: design seckill e-commerce system
- 【读书会第十三期】多媒体处理工具 FFmpeg 工具集
- Support the first triggered go ticker
- Activiti7 task service - process variables (setvariable and setvariablelocal)
猜你喜欢

Wechat official account web page authorization

mysql数据库的存储

如何有效远程办公之我见 | 社区征文

SQL語句加强練習(MySQL8.0為例)

Brief explanation of depth first search (with basic questions)

leetcode刷题:二叉树04(二叉树的层序遍历)

Flink学习6:编程模型

Restore the subtlety of window position

Flink learning 7: application structure

leetcode刷题:二叉树07(二叉树的最大深度)
随机推荐
JS实现文字滚动 跑马灯效果
Graduation summary
Katalon中控件的参数化
VIM mapping command
01 QEMU starts the compiled image vfs: unable to mount root FS on unknown block (0,0)
leetcode刷题:二叉树05(翻转二叉树)
*. No main manifest attribute in jar
Pytest multi process / multi thread execution test case
Perf simple process for multithreaded profile
Unity 绘制弹球和台球的运动轨迹
Unity移动端游戏性能优化简谱之 画面表现与GPU压力的权衡
Katalon framework test web (XXVI) automatic email
Tcpclientdemo for TCP protocol interaction
【读书会第十三期】多媒体处理工具 FFmpeg 工具集
【微服务|openfeign】feign的两种降级方式|Fallback|FallbackFactory
How to dynamically cache components in Vue multi-level route nesting
疫情来袭--远程办公之思考|社区征文
PostgreSQL users cannot create table configurations by themselves
SDP中的SPA
Flink学习7:应用程序结构