How Flink uses savepoint
2022-07-27 23:51:00 【Yisu cloud】
This article, "How Flink uses Savepoint", covers knowledge points that many readers do not fully understand, so the following summary has been put together. The content is detailed and the steps are clear, and it should have some reference value; I hope you gain something from reading it. Let's take a look.
1. Background
What is a savepoint, and why use one?
Savepoints guarantee data consistency during Flink job configuration iteration, Flink version upgrades, and blue-green deployments, improving fault tolerance and reducing recovery time.
Before that, a few concepts need to be introduced:
Snapshot (state snapshot)
Flink achieves fault tolerance through state snapshots.
Flink state: keyed state, operator state, ...
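To make the state types concrete, here is a minimal, illustrative keyed-state example (the class and state names are made up): a KeyedProcessFunction that keeps a per-key counter in ValueState. It is exactly this kind of state that snapshots capture.

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Emits a running count of events per key; the count lives in keyed state.
public class CountPerKey extends KeyedProcessFunction<String, String, Long> {

    private transient ValueState<Long> count;

    @Override
    public void open(Configuration parameters) {
        count = getRuntimeContext().getState(new ValueStateDescriptor<>("count", Long.class));
    }

    @Override
    public void processElement(String value, Context ctx, Collector<Long> out) throws Exception {
        Long current = count.value();
        current = (current == null) ? 1L : current + 1;
        count.update(current);
        out.collect(current);
    }
}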
Flink state backends answer three questions: A. How is state data stored? B. Where does the working state live at runtime? C. Where are state snapshots stored?

Note 1: since version 1.13, configuring the working state and configuring the snapshot state have been split into two separate interfaces, which makes them easier to understand:
StateBackend
CheckpointStorage
Note 2: FsStateBackend is generally used by default: runtime state is kept on the heap to ensure performance, and snapshots are written to HDFS to ensure fault tolerance. When a job has very large state, its state backend can be set to RocksDBStateBackend.
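As an illustration of the interface split described in Note 1, the following is a minimal configuration sketch using the post-1.13 Java API; the HDFS path is a placeholder, and HashMapStateBackend plus file-based checkpoint storage roughly corresponds to the older FsStateBackend.

import org.apache.flink.runtime.state.hashmap.HashMapStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// Working state: kept on the JVM heap for performance.
env.setStateBackend(new HashMapStateBackend());
// Snapshot state: written to durable storage such as HDFS for fault tolerance.
env.getCheckpointConfig().setCheckpointStorage("hdfs:///flink/checkpoints");
// For jobs with very large state, switch to RocksDB with incremental checkpoints:
// env.setStateBackend(new EmbeddedRocksDBStateBackend(true));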
Distributed snapshots
Checkpoint – a snapshot taken automatically by Flink for the purpose of being able to recover from faults. Checkpoints can be incremental, and are optimized for being restored quickly.
Aligned checkpoint

Unaligned checkpoint

Unaligned checkpoints ensure that the barrier reaches the sink as quickly as possible, which avoids overly long alignment times for applications with at least one slow data path. However, they add extra input/output pressure and can increase the checkpoint size, so they are not a good fit when I/O to the state backend is already the bottleneck.
Note: aligned checkpoints are used by default. Under backpressure, the generally preferred remedies are: 1. optimize the job logic; 2. increase parallelism. If that is not enough, unaligned checkpoints can be enabled, as sketched below.
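A configuration sketch, assuming Flink 1.14+ (the interval and timeout values are arbitrary):

import java.time.Duration;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// Exactly-once, aligned checkpoints every 60 s (the default mode once checkpointing is enabled).
env.enableCheckpointing(60_000, CheckpointingMode.EXACTLY_ONCE);
CheckpointConfig config = env.getCheckpointConfig();
// Let barriers overtake buffered records so a slow path does not stall the whole checkpoint.
config.enableUnalignedCheckpoints();
// Only switch to the unaligned path if alignment takes longer than 30 s.
config.setAlignedCheckpointTimeout(Duration.ofSeconds(30));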
Checkpoint & Savepoint
Checkpoints give Flink good fault tolerance: through the checkpointing mechanism, Flink can restore a job's state and its position in the computation.
A savepoint is a consistent image of a streaming job's execution state, created via Flink's checkpointing mechanism.
The main purpose of checkpoints is to provide a recovery mechanism for unexpectedly failed jobs (for example, a TaskManager or JobManager process crashing).
A checkpoint's life cycle is managed by Flink: Flink creates, manages, and deletes checkpoints, with no user interaction required.
Savepoints are created, owned, and deleted by the user. Their use cases are planned, manual backup and recovery.
Savepoint application scenarios: upgrading the Flink version, adjusting user logic, changing parallelism, and blue-green deployment. Savepoints therefore care more about portability and about supporting the job changes mentioned above.
Setting these conceptual differences aside, the current implementations of checkpoints and savepoints basically use the same code and produce the same format (RocksDB incremental checkpoints are the exception; there may be more such specialized implementations in the future).
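One practical consequence of the lifecycle difference: Flink normally deletes checkpoints when a job is cancelled, but they can be retained and then restored from manually, much like a savepoint (though without the portability guarantees). A sketch, assuming a Flink version that still offers this since-deprecated setter:

import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(60_000);
// Keep the last checkpoint around on cancellation instead of deleting it.
env.getCheckpointConfig().enableExternalizedCheckpoints(
        CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);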
2. How to use savepoints with Flink on YARN
Trigger a savepoint and keep it in HDFS; when the job is rescheduled, this gives the user the option of restoring from it.
Key point: triggering a savepoint requires the jobId, so when designing the data platform's metadata, the jobId must be stored.
Trigger a savepoint with YARN
$ bin/flink savepoint :jobId [:targetDirectory] -yid :yarnAppId
This triggers a savepoint for the job with ID :jobId and YARN application ID :yarnAppId, and returns the path of the created savepoint.
Cancel a job with a savepoint
$ bin/flink cancel -s [:targetDirectory] :jobId
This automatically triggers a savepoint for the job with ID :jobId and cancels the job. In addition, you can specify a target file system directory to store the savepoint in. The directory must be accessible to the JobManager(s) and TaskManager(s).
Resume from a savepoint
$ bin/flink run -s :savepointPath [:runArgs]
This submits the job and specifies the savepoint to resume from. You can give either the savepoint directory or the path to the _metadata file.
Skip state that cannot be mapped
By default, the resume operation tries to map all of the savepoint's state back to the program being restored. If an operator has been removed, the state that cannot be mapped to the new program can be skipped with --allowNonRestoredState (short: -n):
$ bin/flink run -s :savepointPath -n [:runArgs]
Delete a savepoint
$ bin/flink savepoint -d :savepointPath
This deletes the savepoint stored at :savepointPath.
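Putting these commands together, a hypothetical upgrade flow on YARN might look like this (the job ID, YARN application ID, savepoint path, and jar name are all made-up placeholders):

# 1. Stop the old job and take a final savepoint.
$ bin/flink cancel -s hdfs:///flink/savepoints deadbeefcafebabe0123456789abcdef -yid application_1658900000000_0042
# 2. Submit the upgraded jar, restoring from the savepoint path printed by step 1; -n skips state of removed operators.
$ bin/flink run -s hdfs:///flink/savepoints/savepoint-deadbe-0123456789ab -n my-job-v2.jar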
Appendix: Consistency semantics
Guaranteeing exactly-once (exactly once)
When a failure occurs in a stream processing application, results may be lost or duplicated. Depending on how the application is configured, Flink can produce any of the following outcomes:
Flink makes no attempt to recover from a snapshot (at most once)
Nothing is lost, but you may get duplicated results (at least once)
Nothing is lost or duplicated (exactly once)
Flink recovers from failures by rewinding and replaying the source data streams. When the ideal situation is described as exactly once, it does not mean that every event will be processed exactly once; rather, it means that every event affects the Flink-managed state exactly once.
Barrier alignment is only needed when exactly-once semantic guarantees are required. If this guarantee is not needed, barrier alignment can be turned off by configuring CheckpointingMode.AT_LEAST_ONCE, which improves performance.
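A one-line sketch of that configuration (the 60 s interval is arbitrary):

import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// AT_LEAST_ONCE skips barrier alignment: lower latency, but duplicates are possible after recovery.
env.enableCheckpointing(60_000, CheckpointingMode.AT_LEAST_ONCE);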
End-to-end exactly once
To achieve end-to-end exactly once, so that every event from the sources takes effect in the sinks exactly once, the following conditions must be met:
the sources must be replayable, and
the sinks must be transactional (or idempotent)
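As an example of a transactional sink, the Kafka connector can be configured for exactly-once delivery. This is only an illustrative sketch assuming the flink-connector-kafka dependency (Flink 1.14+); the broker address, topic, and transactional-id prefix are placeholders.

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.base.DeliveryGuarantee;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;

KafkaSink<String> sink = KafkaSink.<String>builder()
        .setBootstrapServers("broker:9092")
        .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                .setTopic("output-topic")
                .setValueSerializationSchema(new SimpleStringSchema())
                .build())
        // Records are committed to Kafka only when the checkpoint that covers them completes.
        .setDeliveryGuarantee(DeliveryGuarantee.EXACTLY_ONCE)
        .setTransactionalIdPrefix("flink-savepoint-demo")
        .build();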
That is the content of this article on "How Flink uses Savepoint". I believe you now have a general understanding of it, and I hope the content shared here is helpful. If you want to learn more, please follow the Yisu cloud industry information channel.