当前位置:网站首页>Talk about the understanding of fault tolerance mechanism and state consistency in Flink framework
Talk about the understanding of fault tolerance mechanism and state consistency in Flink framework
2022-07-05 10:41:00 【InfoQ】
Fault tolerance mechanism
- restart app
- from checkpoint Read status in , Reset the status
- Start consuming and processing all data from checkpoint to failure
public class Test{
public static void main(String[] args) throws Exception{
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// Configuration of checkpoints
// The checkpoint start cycle is milliseconds
env.enableCheckpointing(300);
env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
// Set timeout
env.getCheckpointConfig().setCheckpointTimeout(60000);
// Set the maximum execution checkp Number
env.getCheckpointConfig().setMaxConcurrentCheckpoints(2);
// Set the minimum interval , Two checkpoint Between , The interval between the previous completion and the next start cannot be less than XXX
env.getCheckpointConfig().setMinPauseBetweenCheckpoints(100);
//env.getCheckpointConfig().setPreferCheckpointForRecovery();
// tolerate checkpoint How many times have you failed
env.getCheckpointConfig().setTolerableCheckpointFailureNumber(0);
// Configuration of restart policy
// Fixed delay restart every other 10 Seconds to restart three times
env.setRestartStrategy(RestartStrategies.fixedDelayRestart(3,10000L));
// Failure rate restart stay 10 Count restarts within minutes , stay 10 Try to restart three times in minutes , Every septum 1 Minutes to restart
env.setRestartStrategy(RestartStrategies.failureRateRestart(3, Time.minutes(10),Time.minutes(1)));
DataStreamSource<String> inputStream = env.socketTextStream("localhost", 7777);
DataStream<SensorReading> dataStream = inputStream.map(line -> {
String[] fields = line.split(",");
return new SensorReading(fields[0], new Long(fields[1]), new Double(fields[2]));
});
env.execute();
}
}
State consistency
- AT-MOST-ONCE( At most once ) When the mission fails , The easiest way is to do nothing , Neither restore the lost state , No replay of lost data .At-most-once The meaning of semantics is to handle an event at most once .
- AT-LEAST-ONCE( At least once ) In most real application scenarios , We hope not to lose Events . This type of protection is called atleast-once, It means that all the events have been dealt with , And some events can be handled multiple times .
- EXACTLY-ONCE( Exactly once ) Just once is the strictest guarantee , It's also the most difficult . Just dealing with semantics once doesn't just mean that no events are lost , It also means for every data , The internal state is only updated once .
@Public
public enum CheckpointingMode {
/**
* Sets the checkpointing mode to "exactly once". This mode means that the system will
* checkpoint the operator and user function state in such a way that, upon recovery,
* every record will be reflected exactly once in the operator state.
*
* <p>For example, if a user function counts the number of elements in a stream,
* this number will consistently be equal to the number of actual elements in the stream,
* regardless of failures and recovery.</p>
*
* <p>Note that this does not mean that each record flows through the streaming data flow
* only once. It means that upon recovery, the state of operators/functions is restored such
* that the resumed data streams pick up exactly at after the last modification to the state.</p>
*
* <p>Note that this mode does not guarantee exactly-once behavior in the interaction with
* external systems (only state in Flink's operators and user functions). The reason for that
* is that a certain level of "collaboration" is required between two systems to achieve
* exactly-once guarantees. However, for certain systems, connectors can be written that facilitate
* this collaboration.</p>
*
* <p>This mode sustains high throughput. Depending on the data flow graph and operations,
* this mode may increase the record latency, because operators need to align their input
* streams, in order to create a consistent snapshot point. The latency increase for simple
* dataflows (no repartitioning) is negligible. For simple dataflows with repartitioning, the average
* latency remains small, but the slowest records typically have an increased latency.</p>
*/
EXACTLY_ONCE,
/**
* Sets the checkpointing mode to "at least once". This mode means that the system will
* checkpoint the operator and user function state in a simpler way. Upon failure and recovery,
* some records may be reflected multiple times in the operator state.
*
* <p>For example, if a user function counts the number of elements in a stream,
* this number will equal to, or larger, than the actual number of elements in the stream,
* in the presence of failure and recovery.</p>
*
* <p>This mode has minimal impact on latency and may be preferable in very-low latency
* scenarios, where a sustained very-low latency (such as few milliseconds) is needed,
* and where occasional duplicate messages (on recovery) do not matter.</p>
*/
AT_LEAST_ONCE
}
- flink Internal assurance : rely on checkpoint
- source End : An external source is required to reset the read location of the data
- sink End : Need to ensure recovery from failure , Data is not repeatedly written to external systems . Two ways : Idempotent writing 、 Transaction write .

边栏推荐
猜你喜欢

Learning note 4 -- Key Technologies of high-precision map (Part 2)

WorkManager的学习二

微信核酸检测预约小程序系统毕业设计毕设(7)中期检查报告

AtCoder Beginner Contest 254「E bfs」「F st表维护差分数组gcd」

How do programmers live as they like?
![[paper reading] ckan: collaborative knowledge aware autonomous network for adviser systems](/img/6c/5b14f47503033bc2c85a259a968d94.png)
[paper reading] ckan: collaborative knowledge aware autonomous network for adviser systems
![[dark horse morning post] Luo Yonghao responded to ridicule Oriental selection; Dong Qing's husband Mi Chunlei was executed for more than 700million; Geely officially acquired Meizu; Huawei releases M](/img/d7/4671b5a74317a8f87ffd36be2b34e1.jpg)
[dark horse morning post] Luo Yonghao responded to ridicule Oriental selection; Dong Qing's husband Mi Chunlei was executed for more than 700million; Geely officially acquired Meizu; Huawei releases M

【JS】提取字符串中的分数,汇总后算出平均分,并与每个分数比较,输出

第五届 Polkadot Hackathon 创业大赛全程回顾,获胜项目揭秘!

How to plan the career of a programmer?
随机推荐
PHP solves the problems of cache avalanche, cache penetration and cache breakdown of redis
Implementation of wechat applet bottom loading and pull-down refresh
Secteur non technique, comment participer à devops?
SAP ui5 objectpagelayout control usage sharing
Web Components
Go-2-Vim IDE常用功能
Node の MongoDB Driver
双向RNN与堆叠的双向RNN
使用bat命令一键启动常用浏览器
QT implements JSON parsing
DDOS攻击原理,被ddos攻击的现象
小程序框架Taro
Error: module not found: error: can't resolve 'xxx' in 'XXXX‘
沟通的艺术III:看人之间 之倾听
C#实现获取DevExpress中GridView表格进行过滤或排序后的数据
Ad20 make logo
php解决redis的缓存雪崩,缓存穿透,缓存击穿的问题
Flink CDC cannot monitor MySQL logs. Have you ever encountered this problem?
2022年流动式起重机司机考试题库及模拟考试
Node の MongoDB Driver