Flink CheckPoint : Exceeded checkpoint tolerable failure threshold
2022-06-12 08:53:00 【//Continuous margin_ documentary】
I. Problem description
The job failed because the checkpoint tolerable failure threshold had been exceeded.

II. Troubleshooting steps
1. Check the checkpoint settings
The checkpoint was clearly timing out, so my first instinct was to check the checkpoint settings.
The settings in the code were as follows:
// Trigger a checkpoint every 10 s (the interval is given in ms)
env.enableCheckpointing(10*1000);
// Set the checkpointing mode to at-least-once (the default is exactly-once)
env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.AT_LEAST_ONCE);
// Leave at least 500 ms between the end of one checkpoint and the start of the next
env.getCheckpointConfig().setMinPauseBetweenCheckpoints(500);
// A checkpoint must complete within one minute, otherwise it is discarded
env.getCheckpointConfig().setCheckpointTimeout(60000);
// Allow only one checkpoint to be in progress at a time
env.getCheckpointConfig().setMaxConcurrentCheckpoints(1);
// Retain externalized checkpoints after the job is cancelled
env.getCheckpointConfig().enableExternalizedCheckpoints(CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
// Prefer recovering from the latest checkpoint even if a more recent savepoint exists
env.getCheckpointConfig().setPreferCheckpointForRecovery(true);
I tried raising the timeout from 1 minute to 10 minutes, then repackaged and redeployed the job.
In the web UI the checkpoint still did not complete: its status stayed IN_PROGRESS with no progress. The only change was that the wait went from 1 minute to 10 minutes, and the job eventually failed.
At this point it seemed likely that the checkpoint settings were not the problem. Instead, the job itself probably had a bug, such as unreleased resources, that caused it to stall long enough for the checkpoint to time out.
2. Check the processing logic
The data channel turned out to be blocked. Printing the data showed that the task queries HBase through asynchronous I/O; for keys that do not exist, the dimension lookup timed out, which caused the checkpoint to fail.
The records whose dimension lookup timed out were printed for inspection.
3. Review of the problem
Root cause: HBase scan performs poorly, so the dimension lookups timed out and checkpoints could not be created.
Normally a dimension lookup with no matching data does not time out; it simply returns a null value. A scan, however, has to read through the table and takes a long time, so the lookup should use get to fetch the row by its exact rowkey.
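The difference between a lookup that returns nothing and one that stalls can be sketched with the JDK alone. This is only a minimal illustration, not the job's actual code: `lookupDimension`, the in-memory `DIM` map, the simulated latencies, and the 500 ms budget are all hypothetical stand-ins. It shows one defensive option, which the original job is not stated to use: bound each asynchronous lookup with a timeout and fall back to null, so a slow query cannot hold a record indefinitely.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class DimensionLookupSketch {
    // Hypothetical in-memory stand-in for the dimension table.
    private static final Map<String, String> DIM = Map.of("user_1", "Beijing");

    // Simulates an asynchronous lookup that needs delayMillis to answer.
    public static CompletableFuture<String> lookupDimension(String key, long delayMillis) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return DIM.get(key); // null when the key does not exist
        });
    }

    // Waits for the lookup, but completes it with null once timeoutMillis has elapsed.
    public static String lookupWithTimeout(String key, long delayMillis, long timeoutMillis) {
        return lookupDimension(key, delayMillis)
                .completeOnTimeout(null, timeoutMillis, TimeUnit.MILLISECONDS)
                .join();
    }

    public static void main(String[] args) {
        // Fast lookup, key present: the stored value is returned.
        System.out.println(lookupWithTimeout("user_1", 10, 500));
        // Fast lookup, key missing: null, without any timeout involved.
        System.out.println(lookupWithTimeout("missing", 10, 500));
        // Slow lookup: the 500 ms budget expires first, so null is returned.
        System.out.println(lookupWithTimeout("user_1", 5000, 500));
    }
}
```

A fallback like this only limits the damage. The slow query itself still has to be fixed, which is why the lookup is switched from scan to get.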
III. Solution
HBase offers only two ways to query data:
Fetch a single record by a specified rowkey: the get method.
Fetch a batch of records matching specified conditions: the scan method.
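Why the two access paths differ so much for a missing key can be illustrated with a small in-memory model. This sketch does not use the HBase client: `GetVsScanSketch`, its `TreeMap` storage, and the `rowsVisited` counter are hypothetical and only mimic a table sorted by rowkey. The `scan` method models an unbounded, filtered scan; a scan restricted by start and stop rows would behave differently.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

public class GetVsScanSketch {
    // Rows kept sorted by rowkey, loosely mimicking how HBase orders a table.
    private final TreeMap<String, String> rows = new TreeMap<>();
    // Number of rows the most recent query had to read.
    private long rowsVisited;

    public void put(String rowkey, String value) {
        rows.put(rowkey, value);
    }

    // get: a keyed lookup; a missing rowkey yields null without reading other rows.
    public String get(String rowkey) {
        String value = rows.get(rowkey);
        rowsVisited = (value == null) ? 0 : 1;
        return value;
    }

    // scan: walks the rows in order and tests each rowkey against the filter.
    // When nothing matches, every row in the table has been read.
    public String scan(Predicate<String> rowkeyFilter) {
        rowsVisited = 0;
        for (Map.Entry<String, String> e : rows.entrySet()) {
            rowsVisited++;
            if (rowkeyFilter.test(e.getKey())) {
                return e.getValue();
            }
        }
        return null;
    }

    public long rowsVisited() {
        return rowsVisited;
    }

    public static void main(String[] args) {
        GetVsScanSketch table = new GetVsScanSketch();
        for (int i = 0; i < 100_000; i++) {
            table.put(String.format("row%06d", i), "v" + i);
        }
        table.get("missing");
        System.out.println("get visited " + table.rowsVisited() + " rows");
        table.scan(k -> k.equals("missing"));
        System.out.println("scan visited " + table.rowsVisited() + " rows");
    }
}
```

In this model a missing key costs the get path nothing, while the scan path reads the whole table before returning null, which matches the timeouts seen for non-existent keys.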