Flink learning 8: data consistency
2022-07-04 04:18:00 【hzp666】
1. Introduction
In a distributed stream processing engine, high throughput and low latency are the core requirements.
Data consistency is just as important in distributed applications.
(Scenarios that demand precise results usually require both accuracy and consistency.)
2. Flink data consistency
This section looks at how Flink keeps computation state consistent.
Flink uses an asynchronous barrier snapshot mechanism to achieve exactly-once state consistency.
When a job crashes or is cancelled, it can be restored from a checkpoint or savepoint. The data stream is then replayed from that point, so the job's results stay consistent. (This mechanism does not cost much system performance.)
2.1 Stateful and stateless events
Let's first look at what state means for event processing:
1. Stateless: each event is independent, and there is no correlation between events.
The output depends only on the current event.
e.g. monitoring the temperature and raising a high-temperature alarm when it exceeds 40°C. (Earlier temperatures are irrelevant.)
2. Stateful: processing an event depends on state built up from earlier events.
The output combines the current event with the previous ones.
e.g. computing the average temperature over the last hour.
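The two temperature examples above can be sketched in plain Python. This is only an illustration, not Flink code: the names `high_temp_alarm` and `HourlyAverage` are hypothetical, timestamps are assumed to be in seconds, and readings are assumed to arrive in time order.

```python
from collections import deque


def high_temp_alarm(temperature, threshold=40.0):
    """Stateless: the result depends only on the current reading."""
    return temperature > threshold


class HourlyAverage:
    """Stateful: keeps the readings of the last hour as operator state."""

    def __init__(self, window_seconds=3600):
        self.window_seconds = window_seconds
        self.readings = deque()  # (timestamp, temperature) pairs, oldest first

    def update(self, timestamp, temperature):
        self.readings.append((timestamp, temperature))
        # Evict readings that have fallen out of the window.
        while self.readings[0][0] <= timestamp - self.window_seconds:
            self.readings.popleft()
        return sum(t for _, t in self.readings) / len(self.readings)
```

The contents of `readings` are exactly the kind of state that must survive a failure: if they are lost, the average after recovery is wrong, whereas the alarm function has nothing to lose.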
2.2 Data consistency
Once a distributed system introduces state, it also introduces the problem of data consistency.
By level of correctness, the guarantees fall into 3 classes:
1. Lowest correctness: at most once. When a failure occurs, nothing is done to recover, so events may be lost.
2. Medium correctness: at least once. When a failure occurs, no event is lost, but some events may be processed more than once. (A final count may therefore be greater than or equal to the true value.)
3. Highest correctness: exactly once. The aggregation result is the same as if no failure had occurred.
Compared with "at least once", "exactly once" makes the system more complex and processing somewhat slower, because the data has to be aligned.
Early on, Storm and Samza guaranteed only at least once.
Later, Storm Trident and Spark Streaming guaranteed exactly once, but at a considerable cost in performance.
Flink guarantees exactly once without sacrificing too much performance.
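The difference between the three guarantees can be seen in a toy model. The sketch below is a simplification under stated assumptions, not how any real engine is implemented: a single operator sums integers, a checkpoint stores the pair (source offset, state) every few events, and exactly one crash happens. The function `run` and its parameters are hypothetical names.

```python
def run(events, guarantee, crash_after=None, checkpoint_every=3):
    """Sum `events`, crashing once after `crash_after` events were processed."""
    total = 0            # operator state
    checkpoint = (0, 0)  # (source offset, state) at the last checkpoint
    pos = 0              # next source offset to read
    crashed = False
    while pos < len(events):
        total += events[pos]
        pos += 1
        if pos % checkpoint_every == 0:
            checkpoint = (pos, total)
        if crash_after is not None and not crashed and pos == crash_after:
            crashed = True
            saved_pos, saved_total = checkpoint
            if guarantee == "at-most-once":
                # State falls back to the checkpoint, but nothing is replayed:
                # events processed since the checkpoint are lost.
                total = saved_total
            elif guarantee == "at-least-once":
                # The source rewinds, but the state is not rolled back:
                # replayed events are counted twice.
                pos = saved_pos
            elif guarantee == "exactly-once":
                # Offset and state are restored from the same checkpoint.
                pos, total = saved_pos, saved_total
            else:
                raise ValueError(f"unknown guarantee: {guarantee}")
    return total


events = [1, 2, 3, 4, 5, 6, 7, 8]  # the true sum is 36
for guarantee in ("at-most-once", "at-least-once", "exactly-once"):
    print(guarantee, run(events, guarantee, crash_after=5))
```

With a crash after the fifth event, this model yields 27 for at most once (events 4 and 5 are lost), 45 for at least once (events 4 and 5 are counted twice), and 36 for exactly once.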
2.3 Flink asynchronous barrier snapshot mechanism
2.3.1 Snapshot mechanism
First, what is a snapshot mechanism? It periodically records the state of the job and of the data stream.
2.3.2 The traditional snapshot mechanism, however, has two main problems:
2.3.3 How Flink optimizes the snapshot mechanism
1. Flink adopts an asynchronous snapshot mechanism. Based on the Chandy-Lamport algorithm, it implements a checkpoint mechanism called asynchronous barrier snapshotting.
2. Asynchronous barrier snapshot mechanism
3. A checkpoint barrier is a special kind of internal message.
It divides the data stream into multiple windows over time,
and each window corresponds to one snapshot of the data stream.
Barriers are periodically broadcast by the JobManager to the sources of all computing tasks and flow downstream with the data.
Each barrier marks the split point between the current snapshot and the next one.
When a downstream operator receives a checkpoint barrier, it triggers its snapshot. The computing task does not need to pause.
4. The "asynchronous" in asynchronous checkpoints
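The barrier flow described in item 3 can be illustrated with a small simulation. This is a rough single-threaded sketch, not Flink's implementation: `Barrier`, `inject_barriers` and `Operator` are hypothetical names, the source injects a barrier after a fixed number of records instead of being triggered by the JobManager, and the asynchronous write of the snapshot to durable storage is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Barrier:
    checkpoint_id: int


def inject_barriers(records, interval):
    """Source side: emit a barrier after every `interval` records."""
    checkpoint_id = 0
    for i, record in enumerate(records, start=1):
        yield record
        if i % interval == 0:
            checkpoint_id += 1
            yield Barrier(checkpoint_id)


class Operator:
    """A stateful operator that snapshots its state when a barrier arrives."""

    def __init__(self, initial, update):
        self.state = initial
        self.update = update  # function (state, record) -> new state
        self.snapshots = {}   # checkpoint_id -> state at that barrier

    def process(self, stream):
        for element in stream:
            if isinstance(element, Barrier):
                # Record the state, then pass the barrier downstream.
                self.snapshots[element.checkpoint_id] = self.state
            else:
                self.state = self.update(self.state, element)
            yield element  # records and barriers both flow downstream


counter = Operator(0, lambda state, record: state + 1)
summer = Operator(0, lambda state, record: state + record)
stream = inject_barriers([1, 2, 3, 4, 5, 6], interval=2)
output = list(summer.process(counter.process(stream)))
print(counter.snapshots)
print(summer.snapshots)
```

Each snapshot reflects exactly the records that arrived before the corresponding barrier, in both operators, and neither operator stops processing to take it. After the run, snapshot 2 holds a count of 4 in `counter` and a sum of 10 in `summer`, which are consistent with each other.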