Flink learning 8: data consistency
2022-07-04 04:18:00 【hzp666】
1. Introduction
In a distributed stream processing engine, high throughput and low latency are the core requirements.
At the same time, data consistency is also very important in distributed applications.
(Scenarios that demand precise results usually require both accuracy and consistency.)
2. Flink data consistency
How does Flink guarantee the consistency of its computation state?
Flink uses an asynchronous barrier snapshot mechanism to achieve exactly-once data consistency.
When a task crashes or is cancelled, it can be recovered from a checkpoint or savepoint: the data stream is replayed from the saved position, so the task's results stay consistent. (This mechanism does not significantly sacrifice system performance.)
2.1 Stateful and stateless events
First, let's look at what distinguishes stateless from stateful processing:
1. Stateless: each event is independent, and there is no relationship between events.
The output depends only on the current event.
e.g. monitoring the weather temperature and issuing a high-temperature alarm whenever it exceeds 40°C. (This has nothing to do with earlier temperatures.)
2. Stateful: processing an event depends on the state built up from previous events.
The output is computed by taking previous events into account together with the current one.
e.g. computing the average temperature over the most recent hour.
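The difference can be illustrated with a short sketch. It is plain Python rather than the Flink API, and the names `high_temperature_alarm` and `HourlyAverage` are hypothetical; it assumes readings arrive in timestamp order, with timestamps in seconds. The stateless alarm looks only at the current reading, while the hourly average has to keep earlier readings as state, and in Flink it is exactly this kind of state that checkpoints must save.

```python
from collections import deque


# Stateless: the output depends only on the current event.
def high_temperature_alarm(temperature_c: float, threshold_c: float = 40.0) -> bool:
    """Return True if this single reading should raise a high-temperature alarm."""
    return temperature_c > threshold_c


# Stateful: the output depends on earlier events, which must be kept as state.
class HourlyAverage:
    """Average of the readings seen in the last `window_s` seconds (default: 1 hour).

    Assumes readings arrive in timestamp order.
    """

    def __init__(self, window_s: int = 3600):
        self.window_s = window_s
        self.readings = deque()  # state: (timestamp_s, temperature_c) pairs
        self.total = 0.0         # state: running sum of readings in the window

    def add(self, timestamp_s: int, temperature_c: float) -> float:
        self.readings.append((timestamp_s, temperature_c))
        self.total += temperature_c
        # Evict readings that have fallen out of the window (now - window, now].
        while self.readings[0][0] <= timestamp_s - self.window_s:
            _, old = self.readings.popleft()
            self.total -= old
        return self.total / len(self.readings)


print(high_temperature_alarm(41.0))  # depends on this reading alone
avg = HourlyAverage()
for ts, temp in [(0, 30.0), (1800, 40.0), (3600, 50.0)]:
    print(ts, avg.add(ts, temp))     # depends on the readings kept as state
```

The last call evicts the reading taken at time 0, because it is exactly one hour old, so the average covers only the two most recent readings.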
2.2 Data consistency
Once a distributed system introduces state, the problem of data consistency naturally follows.
Depending on the level of correctness guaranteed, consistency can be divided into 3 classes:
1. Lowest correctness: at-most-once. When a failure occurs, the system does nothing to recover, so events may be lost.
2. Medium correctness: at-least-once. When a failure occurs, the system does not miss any events, but some may be processed more than once. (The final count may therefore be greater than or equal to the true value.)
3. Highest correctness: exactly-once. The aggregated result is the same as it would have been without the failure.
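A toy simulation may make the three guarantees easier to compare. It is not how Flink implements them: the function `run` and its parameters are hypothetical, and it assumes a job that sums integers, takes a single checkpoint and crashes once. At-least-once is modelled as rewinding the source while keeping the results already produced (for example, totals already written to an external store).

```python
def run(events, checkpoint_at, crash_at, mode):
    """Sum `events` under one delivery guarantee, simulating a single crash.

    A checkpoint records (offset, total) after `checkpoint_at` events; the job
    crashes after `crash_at` events have been counted (checkpoint_at <= crash_at),
    while the event at index `crash_at` is still in flight.
    """
    total = sum(events[:crash_at])                  # work done before the crash
    checkpoint = (checkpoint_at, sum(events[:checkpoint_at]))

    if mode == "at-most-once":
        # Do nothing: the in-flight event is lost and the job just moves on.
        resume_from = crash_at + 1
    elif mode == "at-least-once":
        # Rewind the source to the checkpoint but keep the totals already
        # produced, so events between checkpoint and crash are counted twice.
        resume_from = checkpoint[0]
    elif mode == "exactly-once":
        # Rewind BOTH the source offset and the state to the checkpoint.
        resume_from, total = checkpoint
    else:
        raise ValueError(f"unknown mode: {mode}")

    return total + sum(events[resume_from:])


events = [1, 2, 3, 4, 5]  # failure-free total: 15
for mode in ("at-most-once", "at-least-once", "exactly-once"):
    print(mode, run(events, checkpoint_at=2, crash_at=4, mode=mode))
```

With a checkpoint after 2 events and a crash after 4, the three modes return 10, 22 and 15: at-most-once loses the in-flight event 5, at-least-once counts 3 and 4 twice, and exactly-once matches the failure-free result.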
Compared with at-least-once, exactly-once makes the system more complex and processing relatively slower, because of data alignment (a task must wait for the barriers on all of its inputs before taking a snapshot).
Early systems such as Storm and Samza only guaranteed at-least-once.
Later, Storm Trident and Spark Streaming guaranteed exactly-once, but at a considerable cost in performance.
Flink guarantees exactly-once without sacrificing too much performance.
2.3 Flink's asynchronous barrier snapshot mechanism
2.3.1 Snapshot mechanism
First, what is a snapshot mechanism? It periodically records the job's state together with its position in the data stream.
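A minimal sketch of such a traditional, synchronous snapshot, together with the recover-and-replay step described in section 2, might look like this. The class `CountingJob` is hypothetical, and it assumes a replayable source, modelled as a list that can be re-read from any offset in the way a message log can. Each snapshot stores the source offset together with a copy of the state, and no new event is read while the copy is being taken.

```python
import copy


class CountingJob:
    """Counts events per key and periodically snapshots (source offset, state)."""

    def __init__(self, source, snapshot_every):
        self.source = source              # replayable source: re-readable by offset
        self.snapshot_every = snapshot_every
        self.offset = 0                   # next position to read from the source
        self.counts = {}                  # operator state
        self.snapshot = (0, {})           # last completed snapshot

    def take_snapshot(self):
        # Synchronous snapshot: processing is paused while the state is copied.
        self.snapshot = (self.offset, copy.deepcopy(self.counts))

    def run(self, stop_after=None):
        """Process events; stop after `stop_after` events to simulate a crash."""
        processed = 0
        while self.offset < len(self.source):
            if stop_after is not None and processed == stop_after:
                return  # simulated crash: in-memory state is now untrustworthy
            key = self.source[self.offset]
            self.counts[key] = self.counts.get(key, 0) + 1
            self.offset += 1
            processed += 1
            if self.offset % self.snapshot_every == 0:
                self.take_snapshot()

    def restore(self):
        # Roll back both the source position and the state; run() then replays.
        self.offset, saved = self.snapshot
        self.counts = copy.deepcopy(saved)


job = CountingJob(["a", "b", "a", "c", "a", "b"], snapshot_every=2)
job.run(stop_after=5)  # crash after 5 events; last snapshot was at offset 4
job.restore()
job.run()              # replay from offset 4 to the end
print(job.counts)
```

Restoring rolls the state back to the snapshot at offset 4, and replaying the events at offsets 4 and 5 yields the same counts as a failure-free run. The pause inside `take_snapshot` is the cost that Flink's asynchronous mechanism aims to avoid.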
2.3.2 However, the traditional snapshot mechanism has two main problems:
2.3.3 How Flink optimizes the snapshot mechanism
1. It adopts an asynchronous snapshot mechanism: based on the Chandy-Lamport algorithm, Flink developed a checkpointing mechanism called asynchronous barrier snapshotting.
2. The asynchronous barrier snapshot mechanism
3. A checkpoint barrier is a special kind of internal message.
Barriers divide the data stream in time into multiple windows,
and each window corresponds to one snapshot of the data stream.
The JobManager periodically broadcasts barriers to all source tasks, and the barriers then flow downstream along with the data stream.
Each barrier marks the split point between the current snapshot and the next one.
When a downstream task receives a barrier, it triggers its snapshot action; the task does not need to pause its computation.
4. The "asynchronous" in asynchronous checkpoints
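The barrier description in item 3, and the data alignment mentioned in section 2.2, can be illustrated with a toy single-threaded model of one task that has several input channels. This is a sketch under stated assumptions, not Flink's implementation: the class `AlignedSumTask` and its fields are hypothetical, the snapshot is taken synchronously (the asynchronous persistence of state is not modelled), and the JobManager is left out. Once a channel has delivered its barrier, its later items are held back until the barrier has arrived on every channel; only then is the state recorded and the barrier forwarded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Barrier:
    """Marker in the stream: records before it belong to this checkpoint."""
    checkpoint_id: int


class AlignedSumTask:
    """Toy operator that sums records from several input channels with barrier alignment."""

    def __init__(self, num_channels):
        self.num_channels = num_channels
        self.total = 0                # operator state
        self.blocked = set()          # channels whose barrier has already arrived
        self.buffered = []            # (channel, item) held back during alignment
        self.snapshots = {}           # checkpoint_id -> state at the barrier
        self.forwarded_barriers = []  # barriers passed on to downstream tasks

    def receive(self, channel, item):
        if channel in self.blocked:
            # This channel is past its barrier: the item belongs to the next
            # checkpoint, so hold it back until alignment finishes.
            self.buffered.append((channel, item))
        elif isinstance(item, Barrier):
            self.blocked.add(channel)
            if len(self.blocked) == self.num_channels:
                # Barriers from all inputs have arrived: the state now covers
                # exactly the records before the barrier on every channel.
                self.snapshots[item.checkpoint_id] = self.total
                self.forwarded_barriers.append(item.checkpoint_id)
                self.blocked.clear()
                # Resume the held-back items in their original arrival order.
                pending, self.buffered = self.buffered, []
                for held_channel, held_item in pending:
                    self.receive(held_channel, held_item)
        else:
            self.total += item


# Example: two input channels; record 2 arrives after channel 0's barrier.
task = AlignedSumTask(num_channels=2)
arrivals = [(0, 1), (1, 10), (0, Barrier(1)), (0, 2),
            (1, 20), (1, Barrier(1)), (1, 30)]
for channel, item in arrivals:
    task.receive(channel, item)
print(task.snapshots, task.total)  # prints {1: 31} 63
```

Record 2 arrives on channel 0 after that channel's barrier, so it is buffered and excluded from checkpoint 1 (31 = 1 + 10 + 20). If the task skipped alignment and snapshotted only when the last barrier arrived, 2 would already be in the snapshot, and restoring it and then replaying channel 0 from its barrier would count 2 twice, which is at-least-once behaviour.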