How Flink uses savepoint
2022-07-27 23:51:00 【Yisu cloud】
This article, "How Flink uses Savepoint", covers knowledge points that many readers do not fully understand, so the following summary has been put together. The content is detailed and the steps are clear, and it should have some reference value; I hope you gain something from reading it. Let's take a look.
1. Background
What is a savepoint, and why use one?
Savepoints guarantee data consistency during Flink job configuration iteration, Flink version upgrades, and blue-green deployments, improving fault tolerance and reducing recovery time.
Before that, a few concepts need to be introduced:
Snapshot (state snapshot)
Flink achieves fault tolerance through state snapshots.
Flink state: keyed state, operator state, ...
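To make the state types concrete, here is a minimal, illustrative keyed-state example (the class and state names are made up): a KeyedProcessFunction that keeps a per-key counter in ValueState. It is exactly this kind of state that snapshots capture.

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Emits a running count of events per key; the count lives in keyed state.
public class CountPerKey extends KeyedProcessFunction<String, String, Long> {

    private transient ValueState<Long> count;

    @Override
    public void open(Configuration parameters) {
        count = getRuntimeContext().getState(new ValueStateDescriptor<>("count", Long.class));
    }

    @Override
    public void processElement(String value, Context ctx, Collector<Long> out) throws Exception {
        Long current = count.value();
        current = (current == null) ? 1L : current + 1;
        count.update(current);
        out.collect(current);
    }
}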
Flink state backends answer three questions: A. How is state data stored? B. Where does the working state live at runtime? C. Where are state snapshots stored?

Note 1: since version 1.13, configuring the working state and configuring the snapshot state have been split into two separate interfaces, which makes them easier to understand:
StateBackend
CheckpointStorage
Note 2: FsStateBackend is generally used by default: runtime state is kept on the heap to ensure performance, and snapshots are written to HDFS to ensure fault tolerance. When a job has very large state, its state backend can be set to RocksDBStateBackend.
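As an illustration of the interface split described in Note 1, the following is a minimal configuration sketch using the post-1.13 Java API; the HDFS path is a placeholder, and HashMapStateBackend plus file-based checkpoint storage roughly corresponds to the older FsStateBackend.

import org.apache.flink.runtime.state.hashmap.HashMapStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// Working state: kept on the JVM heap for performance.
env.setStateBackend(new HashMapStateBackend());
// Snapshot state: written to durable storage such as HDFS for fault tolerance.
env.getCheckpointConfig().setCheckpointStorage("hdfs:///flink/checkpoints");
// For jobs with very large state, switch to RocksDB with incremental checkpoints:
// env.setStateBackend(new EmbeddedRocksDBStateBackend(true));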
Distributed snapshots
Checkpoint – a snapshot taken automatically by Flink for the purpose of being able to recover from faults. Checkpoints can be incremental, and are optimized for being restored quickly.
Aligned checkpoint

Unaligned checkpoint

Unaligned checkpoints ensure that the barrier reaches the sink as quickly as possible, which avoids overly long alignment times for applications with at least one slow data path. However, they add extra input/output pressure and can increase the checkpoint size, so they are not a good fit when I/O to the state backend is already the bottleneck.
Note: aligned checkpoints are used by default. Under backpressure, the generally preferred remedies are: 1. optimize the job logic; 2. increase parallelism. If that is not enough, unaligned checkpoints can be enabled, as sketched below.
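A configuration sketch, assuming Flink 1.14+ (the interval and timeout values are arbitrary):

import java.time.Duration;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// Exactly-once, aligned checkpoints every 60 s (the default mode once checkpointing is enabled).
env.enableCheckpointing(60_000, CheckpointingMode.EXACTLY_ONCE);
CheckpointConfig config = env.getCheckpointConfig();
// Let barriers overtake buffered records so a slow path does not stall the whole checkpoint.
config.enableUnalignedCheckpoints();
// Only switch to the unaligned path if alignment takes longer than 30 s.
config.setAlignedCheckpointTimeout(Duration.ofSeconds(30));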
Checkpoint & Savepoint
Checkpoints give Flink good fault tolerance: through the checkpointing mechanism, Flink can restore a job's state and its position in the computation.
A savepoint is a consistent image of a streaming job's execution state, created via Flink's checkpointing mechanism.
The main purpose of checkpoints is to provide a recovery mechanism for unexpectedly failed jobs (for example, a TaskManager or JobManager process crashing).
A checkpoint's life cycle is managed by Flink: Flink creates, manages, and deletes checkpoints, with no user interaction required.
Savepoints are created, owned, and deleted by the user. Their use cases are planned, manual backup and recovery.
Savepoint application scenarios: upgrading the Flink version, adjusting user logic, changing parallelism, and blue-green deployment. Savepoints therefore care more about portability and about supporting the job changes mentioned above.
Setting these conceptual differences aside, the current implementations of checkpoints and savepoints basically use the same code and produce the same format (RocksDB incremental checkpoints are the exception; there may be more such specialized implementations in the future).
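One practical consequence of the lifecycle difference: Flink normally deletes checkpoints when a job is cancelled, but they can be retained and then restored from manually, much like a savepoint (though without the portability guarantees). A sketch, assuming a Flink version that still offers this since-deprecated setter:

import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(60_000);
// Keep the last checkpoint around on cancellation instead of deleting it.
env.getCheckpointConfig().enableExternalizedCheckpoints(
        CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);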
2. How to use savepoints with Flink on YARN
Trigger a savepoint and keep it in HDFS; when the job is rescheduled, this gives the user the option of restoring from it.
Key point: triggering a savepoint requires the jobId, so when designing the data platform's metadata, the jobId must be stored.
Trigger a savepoint with YARN
$ bin/flink savepoint :jobId [:targetDirectory] -yid :yarnAppId
This triggers a savepoint for the job with ID :jobId and YARN application ID :yarnAppId, and returns the path of the created savepoint.
Cancel a job with a savepoint
$ bin/flink cancel -s [:targetDirectory] :jobId
This automatically triggers a savepoint for the job with ID :jobId and cancels the job. In addition, you can specify a target file system directory to store the savepoint in. The directory must be accessible to the JobManager(s) and TaskManager(s).
Resume from a savepoint
$ bin/flink run -s :savepointPath [:runArgs]
This submits the job and specifies the savepoint to resume from. You can give either the savepoint directory or the path to the _metadata file.
Skip state that cannot be mapped
By default, the resume operation tries to map all of the savepoint's state back to the program being restored. If an operator has been removed, the state that cannot be mapped to the new program can be skipped with --allowNonRestoredState (short: -n):
$ bin/flink run -s :savepointPath -n [:runArgs]
Delete a savepoint
$ bin/flink savepoint -d :savepointPath
This deletes the savepoint stored at :savepointPath.
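Putting these commands together, a hypothetical upgrade flow on YARN might look like this (the job ID, YARN application ID, savepoint path, and jar name are all made-up placeholders):

# 1. Stop the old job and take a final savepoint.
$ bin/flink cancel -s hdfs:///flink/savepoints deadbeefcafebabe0123456789abcdef -yid application_1658900000000_0042
# 2. Submit the upgraded jar, restoring from the savepoint path printed by step 1; -n skips state of removed operators.
$ bin/flink run -s hdfs:///flink/savepoints/savepoint-deadbe-0123456789ab -n my-job-v2.jar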
Appendix: Consistency semantics
Guaranteeing exactly-once (exactly once)
When a failure occurs in a stream processing application, results may be lost or duplicated. Depending on how the application is configured, Flink can produce any of the following outcomes:
Flink makes no attempt to recover from a snapshot (at most once)
Nothing is lost, but you may get duplicated results (at least once)
Nothing is lost or duplicated (exactly once)
Flink recovers from failures by rewinding and replaying the source data streams. When the ideal situation is described as exactly once, it does not mean that every event will be processed exactly once; rather, it means that every event affects the Flink-managed state exactly once.
Barrier alignment is only needed when exactly-once semantic guarantees are required. If this guarantee is not needed, barrier alignment can be turned off by configuring CheckpointingMode.AT_LEAST_ONCE, which improves performance.
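A one-line sketch of that configuration (the 60 s interval is arbitrary):

import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// AT_LEAST_ONCE skips barrier alignment: lower latency, but duplicates are possible after recovery.
env.enableCheckpointing(60_000, CheckpointingMode.AT_LEAST_ONCE);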
End-to-end exactly once
To achieve end-to-end exactly once, so that every event from the sources takes effect in the sinks exactly once, the following conditions must be met:
the sources must be replayable, and
the sinks must be transactional (or idempotent)
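As an example of a transactional sink, the Kafka connector can be configured for exactly-once delivery. This is only an illustrative sketch assuming the flink-connector-kafka dependency (Flink 1.14+); the broker address, topic, and transactional-id prefix are placeholders.

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.base.DeliveryGuarantee;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;

KafkaSink<String> sink = KafkaSink.<String>builder()
        .setBootstrapServers("broker:9092")
        .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                .setTopic("output-topic")
                .setValueSerializationSchema(new SimpleStringSchema())
                .build())
        // Records are committed to Kafka only when the checkpoint that covers them completes.
        .setDeliveryGuarantee(DeliveryGuarantee.EXACTLY_ONCE)
        .setTransactionalIdPrefix("flink-savepoint-demo")
        .build();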
That is the content of this article on "How Flink uses Savepoint". I believe you now have a general understanding of it, and I hope the content shared here is helpful. If you want to learn more, please follow the Yisu cloud industry information channel.