[Flink] Flink learning
2022-07-06 11:31:00 【kiraraLou】
Preface
What is Flink?
A stateful computing engine for unbounded and bounded data streams.
Common data architectures
- Traditional basic data architecture
- Microservice data architecture
- Big data architecture
- Stateful stream computing architecture
The biggest advantage of stateful stream computing: there is no need to pull the raw data out of external storage again and run a full recomputation, which can be very expensive.
Users also do not need scheduling and batch computing tools to fetch statistical results from a data warehouse and then write them back to storage. This reduces the time lost and the hardware storage consumed during computation.
Why Flink
Flink has the following advantages:
(1) Supports high throughput, low latency, and high performance at the same time
Spark Streaming, with its micro-batch model, cannot achieve comparably low latency.
Storm cannot deliver high throughput.
(2) Supports the event time (Event Time) concept
In stream computing, window computation is very important. Most frameworks compute windows using system time (Processing Time), that is, the current time of the host machine when the event reaches the computing framework for processing.
Flink supports window computation based on event time semantics. With this time-driven mechanism, even when events arrive out of order, the streaming system can still compute accurate results that preserve the order in which the events were originally generated.
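To make the difference concrete, here is a minimal sketch in plain Python (not the Flink API). The event fields `event_time` and `arrival_time`, the helper `window_counts`, and the window size of 10 are all assumptions made up for illustration. It assigns the same out-of-order events to tumbling windows twice, once by event time and once by arrival time.

```python
from collections import defaultdict

def window_counts(events, window_size, time_of):
    """Count events per tumbling window, using time_of(event) as the timestamp."""
    counts = defaultdict(int)
    for event in events:
        # Align the timestamp to the start of the window it falls into.
        window_start = (time_of(event) // window_size) * window_size
        counts[window_start] += 1
    return dict(sorted(counts.items()))

# Hypothetical events, listed in arrival order; some arrive late.
events = [
    {"event_time": 1, "arrival_time": 2},
    {"event_time": 12, "arrival_time": 13},
    {"event_time": 3, "arrival_time": 14},   # generated early, arrived late
    {"event_time": 11, "arrival_time": 15},
    {"event_time": 9, "arrival_time": 21},   # arrived very late
]

by_event_time = window_counts(events, 10, lambda e: e["event_time"])
by_processing_time = window_counts(events, 10, lambda e: e["arrival_time"])

print(by_event_time)       # {0: 3, 10: 2}
print(by_processing_time)  # {0: 1, 10: 3, 20: 1}
```

Grouping by event time puts every event in the window where it was generated, regardless of arrival order. A real engine also has to decide when a window is complete enough to emit, which this sketch leaves out.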
(3) Supports stateful computing
State means that the intermediate results of an operator are kept in memory or in a file system during stream computing. When the next event enters the operator, the current result can be computed from the intermediate result in the previous state, so there is no need to recompute over all the raw data every time.
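As a rough illustration of this idea (plain Python, not Flink's state API), the hypothetical `RunningAverage` operator below keeps only a count and a sum as its state. Each new event updates that state and produces a fresh result without touching earlier events again.

```python
class RunningAverage:
    """Toy stateful operator: its state is just (count, total)."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def process(self, value):
        # Update the intermediate result, then derive the current output from it.
        self.count += 1
        self.total += value
        return self.total / self.count

op = RunningAverage()
outputs = [op.process(v) for v in [10, 20, 60]]
print(outputs)  # [10.0, 15.0, 30.0]
```

The operator never stores the raw events, so the cost per event stays constant no matter how long the stream runs.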
(4) Supports highly flexible window (Window) operations
Stream data is continuous, so a window is needed to aggregate the data that falls within a certain range.
(5) Fault tolerance implemented with lightweight distributed snapshots (Snapshot)
Flink can automatically detect errors during event processing, such as node failures or network transmission problems. It uses checkpoints based on distributed snapshots to persist the state information during execution.
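The core idea can be sketched in a single process, assuming a replayable input and a made-up `run` helper (this is not how Flink coordinates snapshots across distributed operators). The sketch periodically saves the operator state together with the input offset; after a simulated crash it restores both and replays from that offset.

```python
import copy

def run(events, checkpoint_every, fail_at=None, checkpoint=None):
    """Count keys; return (final_counts or None on crash, latest checkpoint)."""
    counts, start = {}, 0
    if checkpoint is not None:
        # Restore the saved state and resume from the saved input position.
        counts = copy.deepcopy(checkpoint["state"])
        start = checkpoint["offset"]
    for offset in range(start, len(events)):
        if fail_at is not None and offset == fail_at:
            return None, checkpoint  # simulated crash: in-memory state is lost
        key = events[offset]
        counts[key] = counts.get(key, 0) + 1
        if (offset + 1) % checkpoint_every == 0:
            # Persist state and input position together as one snapshot.
            checkpoint = {"offset": offset + 1, "state": copy.deepcopy(counts)}
    return counts, checkpoint

events = ["a", "b", "a", "c", "a", "b"]
expected, _ = run(events, checkpoint_every=2)

crashed, saved = run(events, checkpoint_every=2, fail_at=3)
recovered, _ = run(events, checkpoint_every=2, checkpoint=saved)
print(recovered == expected)  # True
```

Because the state and the offset are saved together, the event processed after the last checkpoint but before the crash is replayed exactly once into the restored state, so the recovered counts match a run without failure.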
(6) Independent memory management on top of the JVM
Flink manages its own memory to reduce the impact of JVM GC on the system as much as possible. It serializes data objects into binary form and stores them in memory, which reduces the storage size of the data.
(7) Savepoints (Save Point)
Stopping an application for a period of time, for example during a cluster upgrade or downtime maintenance, may lead to data loss or inaccurate results. Flink uses savepoint technology to save a snapshot of the job's execution state to storage. When the job is restarted, it can restore the original computation state directly from the saved savepoint.
Flink vs Spark
Both support stream computing and batch processing.
Data processing architecture
Spark handles different types of data sets through a batch processing model. Stream data is divided into micro-batches (bounded data sets) for processing.
Flink handles different types of data sets through a stream processing model. Bounded data is treated as a special case of an unbounded stream and processed in streaming fashion, so batch processing and stream processing are unified in one streaming engine.
Data model
Spark uses the RDD model. A DStream in Spark Streaming is a collection of RDDs, each holding a small batch of data.
Flink's basic data model is the data stream, that is, a sequence of events (Event).
Runtime architecture
Spark computes in batches: it divides the DAG into stages, and the next stage can start only after the previous one has completed.
Flink is a standard stream computing architecture: once an event has been processed at one node, it can be sent directly to the next node for processing.
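The contrast between the two execution styles can be imitated with ordinary Python functions and generators. This is only an analogy under made-up names (`staged`, `pipelined`, a two-step parse-then-double job), not either engine's runtime, but the log shows the order in which records move through the steps.

```python
def staged(records, log):
    """Stage-by-stage: finish parsing everything before doubling anything."""
    parsed = []
    for r in records:
        log.append(f"parse {r}")
        parsed.append(int(r))
    doubled = []
    for n in parsed:
        log.append(f"double {n}")
        doubled.append(n * 2)
    return doubled

def pipelined(records, log):
    """Pipelined: each record flows through both steps before the next one starts."""
    def parse(source):
        for r in source:
            log.append(f"parse {r}")
            yield int(r)

    def double(source):
        for n in source:
            log.append(f"double {n}")
            yield n * 2

    return list(double(parse(records)))

staged_log, pipelined_log = [], []
staged_result = staged(["1", "2"], staged_log)
pipelined_result = pipelined(["1", "2"], pipelined_log)

print(staged_log)     # ['parse 1', 'parse 2', 'double 1', 'double 2']
print(pipelined_log)  # ['parse 1', 'double 1', 'parse 2', 'double 2']
```

Both versions return the same result; only the ordering differs. In the pipelined version the first record is fully processed before the second is even read, which is why per-event latency can be lower.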
Submission modes
- Session mode
- Per-Job mode
- Application mode
The main differences are the cluster lifecycle and resource allocation, and where the application's main method is executed: on the client or on the JobManager.