Flink CheckPoint : Exceeded checkpoint tolerable failure threshold
2022-06-12 08:53:00 【//Continuous margin_ documentary】
One. Problem description
The job failed because the number of failed checkpoints exceeded the tolerable threshold (Exceeded checkpoint tolerable failure threshold).
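To make the error message easier to read, here is a minimal sketch of how such a threshold behaves. It is a simplified model, not Flink's actual implementation: the class and method names are made up for illustration. In Flink itself the threshold is configured with CheckpointConfig#setTolerableCheckpointFailureNumber. The model assumes that each failed checkpoint (for example, one that timed out) increments a counter of consecutive failures, that a successful checkpoint resets it, and that the job fails once the counter exceeds the threshold.

```java
// Simplified model of a "tolerable consecutive checkpoint failures" counter.
// Illustration only; names are hypothetical and this is not Flink code.
final class CheckpointFailureCounter {
    private final int tolerableFailures; // stands in for the configured threshold
    private int consecutiveFailures = 0;

    CheckpointFailureCounter(int tolerableFailures) {
        this.tolerableFailures = tolerableFailures;
    }

    /** A completed checkpoint resets the count of consecutive failures. */
    void onCheckpointSuccess() {
        consecutiveFailures = 0;
    }

    /** Returns true when the threshold is exceeded and the job should fail. */
    boolean onCheckpointFailure() {
        consecutiveFailures++;
        return consecutiveFailures > tolerableFailures;
    }

    public static void main(String[] args) {
        CheckpointFailureCounter counter = new CheckpointFailureCounter(2);
        System.out.println(counter.onCheckpointFailure()); // false: 1 failure, threshold 2
        System.out.println(counter.onCheckpointFailure()); // false: 2 failures, threshold 2
        System.out.println(counter.onCheckpointFailure()); // true: 3 failures exceed threshold 2
    }
}
```

Under this model, raising the threshold only delays the failure when every checkpoint times out, which matches what happened here.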

Two. Troubleshooting steps
1. Check the checkpoint settings
The checkpoint was clearly timing out, so my first instinct was to check the checkpoint settings.
The settings in the code are as follows:
// Start a checkpoint every 10 seconds (10 * 1000 ms)
env.enableCheckpointing(10 * 1000);
// Set the mode to at-least-once (the default is EXACTLY_ONCE)
env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.AT_LEAST_ONCE);
// Ensure at least 500 ms between the end of one checkpoint and the start of the next
env.getCheckpointConfig().setMinPauseBetweenCheckpoints(500);
// A checkpoint must complete within one minute, otherwise it is discarded
env.getCheckpointConfig().setCheckpointTimeout(60000);
// Allow only one checkpoint to be in progress at a time
env.getCheckpointConfig().setMaxConcurrentCheckpoints(1);
// Retain externalized checkpoints after the job is cancelled
env.getCheckpointConfig().enableExternalizedCheckpoints(CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
// Prefer recovering from the latest checkpoint even when a more recent savepoint exists
env.getCheckpointConfig().setPreferCheckpointForRecovery(true);
I tried changing the timeout from 1 minute to 10 minutes, then repackaged and redeployed the job.
Checking the web UI afterwards showed that the checkpoint still did not complete: its status stayed IN_PROGRESS with no progress. The only difference was that the wait grew from 1 minute to 10 minutes, and the job eventually failed.
At this point I suspected that the problem was not the checkpoint settings but a bug in the program, such as resources not being released, that caused the job to get stuck and the checkpoint to time out.
2. Check the processing logic

The data pipeline turned out to be blocked. After printing the data, I found that the task uses asynchronous I/O to query HBase, and that for keys that do not exist the dimension lookup timed out, which caused the checkpoint to fail.
Printing the records whose dimension lookup timed out:
3. Reviewing the problem
Root cause: the HBase scan performed poorly, so the dimension data queries timed out and the checkpoint could not be created.
Normally, a dimension query does not time out when no matching data exists; it simply returns a null value. A scan, however, has to go through the whole table and takes a long time, so the lookup should use get to query the exact row instead.
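As an extra safeguard, a lookup can be given a fallback so that a slow query yields an empty result instead of holding up the pipeline. The following is a minimal sketch using only the Java standard library (Java 9+), not the code from this job: LookupWithTimeout, lookup and slowQuery are hypothetical names. In a Flink job the comparable hook would be overriding the timeout method of AsyncFunction, and the blocking join() below is there only to keep the demonstration short; it would not belong in a real asynchronous function.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Sketch: a lookup that returns an empty result when the query is too slow.
final class LookupWithTimeout {
    /** Runs the query asynchronously; empty if it returns null or exceeds the timeout. */
    static Optional<String> lookup(Supplier<String> query, long timeoutMillis) {
        CompletableFuture<Optional<String>> future =
                CompletableFuture.supplyAsync(() -> Optional.ofNullable(query.get()));
        return future
                .completeOnTimeout(Optional.empty(), timeoutMillis, TimeUnit.MILLISECONDS)
                .join(); // blocking here only for the demonstration
    }

    /** Hypothetical slow query, used only to simulate a long-running scan. */
    static String slowQuery(long millis, String result) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return result;
    }

    public static void main(String[] args) {
        // Fast query: the value is present.
        System.out.println(lookup(() -> "dim-value", 500));
        // Slow query: the fallback (empty) is returned after about 100 ms.
        System.out.println(lookup(() -> slowQuery(2000, "late"), 100));
    }
}
```

A fallback like this only hides the symptom; the slow query itself still needs to be fixed.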
Three. Solution
HBase offers only two ways to query data:
Fetch a single record by a specified rowkey: the get method.
Fetch a batch of records matching specified conditions: the scan method.
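The difference between the two can be illustrated with a small in-memory model. This sketch does not use the HBase client (where the corresponding calls are Table.get with a Get and Table.getScanner with a Scan); it uses a sorted map as a stand-in for a table ordered by rowkey, and the class and method names are hypothetical. It assumes the scan filters rows one by one, so a missing key forces it to examine every row, while a get goes straight to the key and returns null.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory stand-in for a table sorted by rowkey (not an HBase API).
final class RowKeyLookupSketch {
    private final NavigableMap<String, String> rows = new TreeMap<>();

    void put(String rowKey, String value) {
        rows.put(rowKey, value);
    }

    /** get-style lookup: direct access by rowkey, null if the key is absent. */
    String get(String rowKey) {
        return rows.get(rowKey);
    }

    /** scan-style lookup with a filter: counts rows examined until a match is found. */
    int rowsExaminedByScan(String rowKey) {
        int examined = 0;
        for (Map.Entry<String, String> entry : rows.entrySet()) {
            examined++;
            if (entry.getKey().equals(rowKey)) {
                break;
            }
        }
        return examined;
    }

    public static void main(String[] args) {
        RowKeyLookupSketch table = new RowKeyLookupSketch();
        for (int i = 0; i < 1000; i++) {
            table.put(String.format("row%04d", i), "value-" + i);
        }
        System.out.println(table.get("missing"));                // null
        System.out.println(table.rowsExaminedByScan("missing")); // 1000
    }
}
```

For a dimension lookup keyed by rowkey, the get-style access is the one that stays cheap when the key does not exist.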