当前位置:网站首页>Talk about the understanding of fault tolerance mechanism and state consistency in Flink framework
Talk about the understanding of fault tolerance mechanism and state consistency in Flink framework
2022-07-05 10:41:00 【InfoQ】
Fault tolerance mechanism
- restart app
- from checkpoint Read status in , Reset the status
- Start consuming and processing all data from checkpoint to failure
public class Test{
public static void main(String[] args) throws Exception{
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// Configuration of checkpoints
// The checkpoint start cycle is milliseconds
env.enableCheckpointing(300);
env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
// Set timeout
env.getCheckpointConfig().setCheckpointTimeout(60000);
// Set the maximum execution checkp Number
env.getCheckpointConfig().setMaxConcurrentCheckpoints(2);
// Set the minimum interval , Two checkpoint Between , The interval between the previous completion and the next start cannot be less than XXX
env.getCheckpointConfig().setMinPauseBetweenCheckpoints(100);
//env.getCheckpointConfig().setPreferCheckpointForRecovery();
// tolerate checkpoint How many times have you failed
env.getCheckpointConfig().setTolerableCheckpointFailureNumber(0);
// Configuration of restart policy
// Fixed delay restart every other 10 Seconds to restart three times
env.setRestartStrategy(RestartStrategies.fixedDelayRestart(3,10000L));
// Failure rate restart stay 10 Count restarts within minutes , stay 10 Try to restart three times in minutes , Every septum 1 Minutes to restart
env.setRestartStrategy(RestartStrategies.failureRateRestart(3, Time.minutes(10),Time.minutes(1)));
DataStreamSource<String> inputStream = env.socketTextStream("localhost", 7777);
DataStream<SensorReading> dataStream = inputStream.map(line -> {
String[] fields = line.split(",");
return new SensorReading(fields[0], new Long(fields[1]), new Double(fields[2]));
});
env.execute();
}
}
State consistency
- AT-MOST-ONCE( At most once ) When the mission fails , The easiest way is to do nothing , Neither restore the lost state , No replay of lost data .At-most-once The meaning of semantics is to handle an event at most once .
- AT-LEAST-ONCE( At least once ) In most real application scenarios , We hope not to lose Events . This type of protection is called atleast-once, It means that all the events have been dealt with , And some events can be handled multiple times .
- EXACTLY-ONCE( Exactly once ) Just once is the strictest guarantee , It's also the most difficult . Just dealing with semantics once doesn't just mean that no events are lost , It also means for every data , The internal state is only updated once .
@Public
public enum CheckpointingMode {
/**
* Sets the checkpointing mode to "exactly once". This mode means that the system will
* checkpoint the operator and user function state in such a way that, upon recovery,
* every record will be reflected exactly once in the operator state.
*
* <p>For example, if a user function counts the number of elements in a stream,
* this number will consistently be equal to the number of actual elements in the stream,
* regardless of failures and recovery.</p>
*
* <p>Note that this does not mean that each record flows through the streaming data flow
* only once. It means that upon recovery, the state of operators/functions is restored such
* that the resumed data streams pick up exactly at after the last modification to the state.</p>
*
* <p>Note that this mode does not guarantee exactly-once behavior in the interaction with
* external systems (only state in Flink's operators and user functions). The reason for that
* is that a certain level of "collaboration" is required between two systems to achieve
* exactly-once guarantees. However, for certain systems, connectors can be written that facilitate
* this collaboration.</p>
*
* <p>This mode sustains high throughput. Depending on the data flow graph and operations,
* this mode may increase the record latency, because operators need to align their input
* streams, in order to create a consistent snapshot point. The latency increase for simple
* dataflows (no repartitioning) is negligible. For simple dataflows with repartitioning, the average
* latency remains small, but the slowest records typically have an increased latency.</p>
*/
EXACTLY_ONCE,
/**
* Sets the checkpointing mode to "at least once". This mode means that the system will
* checkpoint the operator and user function state in a simpler way. Upon failure and recovery,
* some records may be reflected multiple times in the operator state.
*
* <p>For example, if a user function counts the number of elements in a stream,
* this number will equal to, or larger, than the actual number of elements in the stream,
* in the presence of failure and recovery.</p>
*
* <p>This mode has minimal impact on latency and may be preferable in very-low latency
* scenarios, where a sustained very-low latency (such as few milliseconds) is needed,
* and where occasional duplicate messages (on recovery) do not matter.</p>
*/
AT_LEAST_ONCE
}
- flink Internal assurance : rely on checkpoint
- source End : An external source is required to reset the read location of the data
- sink End : Need to ensure recovery from failure , Data is not repeatedly written to external systems . Two ways : Idempotent writing 、 Transaction write .
边栏推荐
- Activity enter exit animation
- A usage example that can be compatible with various database transactions
- 2022年化工自动化控制仪表考试试题及在线模拟考试
- How to write high-quality code?
- Apple 5g chip research and development failure? It's too early to get rid of Qualcomm
- Golang应用专题 - channel
- Redis如何实现多可用区?
- 《通信软件开发与应用》课程结业报告
- Comparative learning in the period of "arms race"
- 数据库中的范式:第一范式,第二范式,第三范式
猜你喜欢
WorkManager學習一
微信核酸检测预约小程序系统毕业设计毕设(7)中期检查报告
在C# 中实现上升沿,并模仿PLC环境验证 If 语句使用上升沿和不使用上升沿的不同
ModuleNotFoundError: No module named ‘scrapy‘ 终极解决方式
赛克瑞浦动力电池首台产品正式下线
How does redis implement multiple zones?
Web3基金会「Grant计划」赋能开发者,盘点四大成功项目
Go语言-1-开发环境配置
Comparative learning in the period of "arms race"
C语言实现QQ聊天室小项目 [完整源码]
随机推荐
PHP solves the problems of cache avalanche, cache penetration and cache breakdown of redis
In wechat applet, after jumping from one page to another, I found that the page scrolled synchronously after returning
【js学习笔记五十四】BFC方式
NAS与SAN
Share Net lightweight ORM
[vite] 1371 - develop vite plug-ins by hand
ModuleNotFoundError: No module named ‘scrapy‘ 终极解决方式
How to write high-quality code?
LDAP概述
uniapp
[paper reading] ckan: collaborative knowledge aware autonomous network for adviser systems
数组、、、
What are the top ten securities companies? Is it safe to open an account online?
Who is the "conscience" domestic brand?
[observation] with the rise of the "independent station" model of cross-border e-commerce, how to seize the next dividend explosion era?
flex4 和 flex3 combox 下拉框长度的解决办法
TypeError: Cannot read properties of undefined (reading ‘cancelToken‘)
[paper reading] kgat: knowledge graph attention network for recommendation
WorkManager學習一
Window下线程与线程同步总结