Flink learning 8: data consistency
2022-07-04 04:18:00 【hzp666】

1. Introduction
In a distributed stream processing engine, high throughput and low latency are the core requirements.
At the same time, data consistency is also very important in distributed applications.
(Use cases that demand precise results often require exact consistency.)

2. Data consistency in Flink
How does Flink ensure the consistency of computation state?
It uses an asynchronous barrier snapshot mechanism to achieve exactly-once data consistency.
When a task crashes or is canceled, you can recover from a checkpoint or savepoint and replay the data stream from that point, so that the job's results stay consistent. (The mechanism is designed to do this without a large performance cost.)

2.1 Stateful and stateless events
Let's first look at what stateless and stateful events are:
1. Stateless: each event is independent, and there is no correlation between events.
The output depends only on the current event.
e.g. monitoring the temperature and raising a high-temperature alarm when it exceeds 40°C. (Earlier readings do not matter.)
2. Stateful: the event is related to the state built up from previous events.
The output is a combined result that takes previous events into account.
e.g. computing the average temperature over the last hour.
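
The two temperature examples above can be sketched in plain Python (not Flink code). The names high_temp_alarm and RollingAverage, the 40°C default, and the use of timestamps in seconds are assumptions made for illustration only.

```python
from collections import deque


def high_temp_alarm(temp_c, threshold=40.0):
    """Stateless: the result depends only on the current reading."""
    return temp_c > threshold


class RollingAverage:
    """Stateful: the result depends on readings kept from earlier events."""

    def __init__(self, window_s=3600):
        self.window_s = window_s
        self.readings = deque()  # (timestamp_s, temp_c) pairs inside the window
        self.total = 0.0

    def add(self, ts, temp_c):
        """Record a reading and return the average over the last window_s seconds."""
        self.readings.append((ts, temp_c))
        self.total += temp_c
        # Evict readings that have fallen out of the window.
        while self.readings and self.readings[0][0] <= ts - self.window_s:
            _, old = self.readings.popleft()
            self.total -= old
        return self.total / len(self.readings)
```

The deque and running total inside RollingAverage are exactly the kind of state that must survive a failure for the output to stay correct; high_temp_alarm has nothing to lose.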


2.2 Data consistency
Once a distributed system introduces state, the problem of data consistency naturally follows.
By level of correctness, the guarantees fall into 3 classes:
1. Lowest correctness: at-most-once. When a failure occurs, nothing is done to recover, so events may be lost.
2. Medium correctness: at-least-once. When a failure occurs, the system will not miss any previous events, but some may be processed more than once. (The final aggregate may be greater than or equal to the true value.)
3. Highest correctness: exactly-once. The aggregation result is the same as it would have been without any failure.
Compared with at-least-once, exactly-once makes the system more complex and processing somewhat slower, because the data streams have to be aligned.
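
As a rough illustration of the three guarantees, here is a toy Python model (not Flink code) of a counter that fails once and recovers. The function run, its parameters, and the way each mode is modeled are simplifying assumptions: at-most-once loses the uncheckpointed state and replays nothing, at-least-once replays from the last checkpoint without rolling state back, and exactly-once rolls state back and replays.

```python
def run(events, mode, checkpoint_every=3, fail_at=5):
    """Sum `events`, simulating one failure just before offset `fail_at`."""
    if mode not in ("at-most-once", "at-least-once", "exactly-once"):
        raise ValueError(f"unknown mode: {mode}")
    total = 0
    checkpoint = (0, 0)  # (offset to resume from, total at that offset)
    offset = 0
    failed = False
    while offset < len(events):
        if offset == fail_at and not failed:
            failed = True
            saved_offset, saved_total = checkpoint
            if mode == "at-most-once":
                # State since the checkpoint is lost and nothing is replayed.
                total = saved_total
            elif mode == "at-least-once":
                # Events are replayed, but the state was not rolled back.
                offset = saved_offset
            else:
                # Roll back both the read position and the state, then replay.
                offset, total = saved_offset, saved_total
            continue
        total += events[offset]
        offset += 1
        if offset % checkpoint_every == 0:
            checkpoint = (offset, total)
    return total
```

With eight events of value 1 and the default parameters, the at-most-once run undercounts, the at-least-once run overcounts, and only the exactly-once run returns the true sum of 8.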

Early systems such as Storm and Samza only guaranteed at-least-once.
Later, Storm Trident and Spark Streaming guaranteed exactly-once, but at a considerable cost in performance.
Flink guarantees exactly-once without sacrificing much performance.

2.3 Flink's asynchronous barrier snapshot mechanism
2.3.1 Snapshot mechanism
First, what is a snapshot mechanism? It periodically records the state of the job and of the data flow.

2.3.2 The traditional snapshot mechanism has two main problems:

2.3.3 How Flink optimizes the snapshot mechanism
1. It adopts an asynchronous snapshot mechanism. Based on the Chandy-Lamport algorithm, Flink developed its own checkpoint mechanism, called the asynchronous barrier checkpoint mechanism.

2. Asynchronous barrier snapshot mechanism

3. A checkpoint barrier is a special kind of internal message.
Barriers divide the data stream into multiple windows in time,
and each window corresponds to one snapshot of the data stream.
Barriers are triggered periodically by the JobManager at the source of every computing task, and then flow downstream together with the data.
Each barrier marks the split point between the current snapshot and the next snapshot.
When a downstream operator receives a barrier, it triggers its snapshot action, without having to pause the computation.
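
The barrier idea can be sketched with a small single-input simulation in plain Python (not Flink code). The names inject_barriers and SumOperator and the tuple encoding of stream elements are assumptions for illustration; barrier alignment across several inputs and the asynchronous write of the snapshot are left out.

```python
BARRIER = "barrier"
RECORD = "record"


def inject_barriers(records, every):
    """Act as a source: emit records and insert a numbered barrier after each batch."""
    checkpoint_id = 0
    for count, value in enumerate(records, start=1):
        yield (RECORD, value)
        if count % every == 0:
            checkpoint_id += 1
            yield (BARRIER, checkpoint_id)


class SumOperator:
    """Keeps a running sum and snapshots it whenever a barrier arrives."""

    def __init__(self):
        self.total = 0
        self.snapshots = {}  # checkpoint id -> state at that barrier

    def process(self, stream):
        for kind, value in stream:
            if kind == BARRIER:
                # Snapshot covers exactly the records seen before this barrier.
                self.snapshots[value] = self.total
                yield (BARRIER, value)  # forward the barrier downstream
            else:
                self.total += value
                yield (RECORD, self.total)
```

Running SumOperator().process(inject_barriers([1, 2, 3, 4, 5], every=2)) stores the sums 3 and 10 as snapshots 1 and 2, while record processing simply continues after each barrier.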

4. The "asynchronous" in asynchronous checkpoints
