[Flink] Flink learning

2022-07-06 11:31:00 【kiraraLou】

Preface

What is Flink?
Flink is a stateful computation engine for unbounded and bounded data streams.

Common data architecture

Traditional data architecture
Microservice data architecture
Big data architecture
Stateful stream computing architecture

The biggest advantage of stateful stream computing: there is no need to read the raw data back out of external storage and run a full recomputation, which can be very expensive.
Users also no longer need schedulers and batch computing tools to pull statistics out of the data warehouse and write the results back to storage, which reduces both the time lost and the hardware storage used during computation.

Stateful computing architecture

Why Flink

Flink has the following advantages:
(1) High throughput, low latency, and high performance at the same time
Spark Streaming cannot achieve low latency.
Storm cannot deliver high throughput.

(2) Support for event time (Event Time)
In stream computing, window computation is very important. At present, most frameworks compute windows using system time (Processing Time), that is, the current time of the host at the moment the event reaches the computing framework for processing.

Flink supports window computation based on event-time semantics. With this time-driven mechanism, the stream system can still compute accurate results even when events arrive out of order, preserving the order in which the events were originally generated.
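
The difference between the two time semantics can be illustrated with a minimal plain-Python sketch. It does not use the Flink API; the 10-second tumbling window, the field names, and the sample events are all assumptions made for illustration, and it ignores how a real engine decides when a window is complete.

```python
from collections import defaultdict

WINDOW_SIZE = 10  # seconds; hypothetical tumbling window length


def window_start(timestamp, size=WINDOW_SIZE):
    """Return the start of the tumbling window that contains `timestamp`."""
    return timestamp - (timestamp % size)


def count_per_window(events, time_field):
    """Count events per tumbling window, keyed by the chosen time field."""
    counts = defaultdict(int)
    for event in events:
        counts[window_start(event[time_field])] += 1
    return dict(counts)


# Each event carries the time it was generated (event_time) and the time
# it reached the engine (processing_time). The third event arrives late.
events = [
    {"id": "a", "event_time": 1, "processing_time": 2},
    {"id": "b", "event_time": 12, "processing_time": 13},
    {"id": "c", "event_time": 8, "processing_time": 15},  # out of order
]

by_event_time = count_per_window(events, "event_time")
by_processing_time = count_per_window(events, "processing_time")
print(by_event_time)       # {0: 2, 10: 1}
print(by_processing_time)  # {0: 1, 10: 2}
```

With event time, the late event "c" is still counted in the window in which it was generated; with processing time it lands in the wrong window.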

(3) Support for stateful computation
State means that, during stream computation, an operator's intermediate results are kept in memory or in a file system. When the next event enters the operator, the current result is computed from the intermediate result held in the previous state, so there is no need to recompute from all the raw data every time.
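
As a rough illustration of this idea (plain Python, not Flink's state API), the hypothetical operator below keeps only a count and a running total per key, so each new event updates the average without revisiting earlier events. The class name and the sample key are assumptions.

```python
class RunningAverage:
    """Keeps per-key (count, total) state so each event is a constant-time update."""

    def __init__(self):
        self.state = {}  # key -> (count, total)

    def process(self, key, value):
        # Read the intermediate result left by earlier events for this key.
        count, total = self.state.get(key, (0, 0.0))
        count, total = count + 1, total + value
        # Store the updated intermediate result for the next event.
        self.state[key] = (count, total)
        return total / count


op = RunningAverage()
results = [op.process("sensor-1", v) for v in (10.0, 20.0, 30.0)]
print(results)  # [10.0, 15.0, 20.0]
```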

(4) Highly flexible window (Window) operations
Stream data is continuous, so a window is needed to aggregate the data that falls within a given range.
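
One way to picture windowing over a continuous stream is a count-based sliding window, sketched below in plain Python. The function name and parameters are assumptions for illustration; Flink's own window operators are not shown here.

```python
from collections import deque


def sliding_sums(stream, size, slide):
    """Yield the sum of the last `size` elements every `slide` elements."""
    window = deque(maxlen=size)  # old elements fall out automatically
    for i, value in enumerate(stream, start=1):
        window.append(value)
        # Emit once the first window is full, then once per slide step.
        if i >= size and (i - size) % slide == 0:
            yield sum(window)


print(list(sliding_sums([1, 2, 3, 4, 5, 6], size=3, slide=1)))  # [6, 9, 12, 15]
print(list(sliding_sums([1, 2, 3, 4, 5, 6], size=3, slide=3)))  # [6, 15]
```

When `slide` equals `size`, the windows no longer overlap and the sketch behaves like a tumbling window.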

(5) Fault tolerance based on lightweight distributed snapshots (Snapshot)
Flink can automatically detect errors during event processing, such as node failures and network transmission problems. Checkpoints based on distributed snapshots persist the state information during execution, so that a failed job can be restored from it.
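
The snapshot-and-recover idea can be sketched in a few lines of plain Python. This is a single-process simplification under stated assumptions: the source is a replayable list, the state is a dictionary holding an offset and a running total, and the failure is simulated. It does not model how Flink coordinates snapshots across parallel operators.

```python
import copy


def run_with_checkpoints(source, checkpoint_every, fail_at=None):
    """Sum a list of numbers, snapshotting (offset, total) periodically.

    If `fail_at` is set, one failure is simulated just before the element at
    that offset is processed, and the job resumes from the latest snapshot.
    """
    state = {"offset": 0, "total": 0}
    checkpoint = copy.deepcopy(state)
    failed = False
    while state["offset"] < len(source):
        if fail_at is not None and not failed and state["offset"] == fail_at:
            failed = True
            state = copy.deepcopy(checkpoint)  # roll back to the last snapshot
            continue
        state["total"] += source[state["offset"]]
        state["offset"] += 1
        if state["offset"] % checkpoint_every == 0:
            checkpoint = copy.deepcopy(state)  # persist offset and total together
    return state["total"]


print(run_with_checkpoints([1, 2, 3, 4, 5], checkpoint_every=2))             # 15
print(run_with_checkpoints([1, 2, 3, 4, 5], checkpoint_every=2, fail_at=3))  # 15
```

Because the offset and the total are snapshotted together, the elements replayed after the rollback are not double counted.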

(6) Independent memory management on top of the JVM
Flink manages its own memory to minimize the impact of JVM GC on the system. It serializes data objects into binary form before storing them in memory and deserializes them on access, which reduces the storage size of the data.
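
The general idea of keeping records as packed binary data rather than as individual objects can be sketched with Python's `struct` module. This is only an analogy, not Flink's actual memory layout: the record layout (a 4-byte integer id and an 8-byte float value) and the function names are assumptions.

```python
import struct

# Hypothetical record layout: a 4-byte int id followed by an 8-byte float value.
RECORD = struct.Struct("<id")


def pack_records(records):
    """Serialize (id, value) pairs into one contiguous bytearray."""
    buffer = bytearray(RECORD.size * len(records))
    for i, (rec_id, value) in enumerate(records):
        RECORD.pack_into(buffer, i * RECORD.size, rec_id, value)
    return buffer


def read_record(buffer, index):
    """Deserialize the record at `index` without touching the others."""
    return RECORD.unpack_from(buffer, index * RECORD.size)


records = [(1, 0.5), (2, 1.5), (3, 2.5)]
buffer = pack_records(records)
print(len(buffer))             # 36
print(read_record(buffer, 1))  # (2, 1.5)
```

The three records occupy exactly 36 bytes in one buffer, and the garbage collector sees a single object instead of one object per record.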

(7) Savepoints (Save Point)
Stopping an application for a period of time, for example during a cluster upgrade or maintenance downtime, may lead to data loss or inaccurate results. Flink uses savepoints to store a snapshot of the job's execution on persistent storage. When the job is restarted, it can resume directly from the previously saved savepoint and restore its original computation state.
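
A stop-and-resume cycle of this kind might look like the following plain-Python sketch. It is not the Flink savepoint mechanism: the JSON file, the `save` and `restore` helpers, and the state layout are assumptions chosen to show a snapshot written to storage before a planned stop and read back on restart.

```python
import json
import os
import tempfile


def process(events, state):
    """Add each unprocessed event to a running total and advance the offset."""
    for value in events[state["offset"]:]:
        state["total"] += value
        state["offset"] += 1
    return state


def save(state, path):
    """Write the job state to storage before a planned stop."""
    with open(path, "w") as f:
        json.dump(state, f)


def restore(path):
    """Read the job state back when the job is restarted."""
    with open(path) as f:
        return json.load(f)


events = [5, 10, 15, 20]
path = os.path.join(tempfile.mkdtemp(), "savepoint.json")

# First run: process two events, then stop for maintenance.
state = process(events[:2], {"offset": 0, "total": 0})
save(state, path)

# Restart: resume from the saved offset instead of reprocessing everything.
resumed = process(events, restore(path))
print(resumed)  # {'offset': 4, 'total': 50}
```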

Flink vs Spark

Both support stream computing and batch processing.

Data processing architecture
Spark handles different types of data sets in batch mode. Stream data is split into micro-batches (bounded data sets) and processed batch by batch.
Flink handles different types of data sets in stream mode. Bounded data can be treated as a stream and processed in streaming fashion, so batch processing and stream processing are unified in a single streaming engine.
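
The two directions of unification can be sketched in plain Python. Neither function uses Spark or Flink; the names are assumptions, and the sketch cuts batches by element count for simplicity rather than by time interval.

```python
def micro_batches(stream, batch_size):
    """Group a stream into bounded batches, as a micro-batch engine would."""
    batch = []
    for item in stream:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final, possibly smaller batch


def as_stream(bounded_dataset):
    """Treat a bounded data set as a stream that simply ends."""
    yield from bounded_dataset


print(list(micro_batches(range(5), batch_size=2)))  # [[0, 1], [2, 3], [4]]
print(sum(x for x in as_stream([1, 2, 3])))         # 6
```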

Data model
Spark uses the RDD model. A DStream in Spark Streaming is a collection of RDDs, one per micro-batch.
Flink's basic data model is the data stream and the sequence of events (Event).

Runtime architecture
Spark is a batch computing engine: it divides the DAG into stages, and a stage must finish before the next one can begin.
Flink is a standard stream computing architecture: once an event has been processed on one node, it can be sent straight to the next node for processing.
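
The difference in execution order can be made visible with a small plain-Python sketch. The two operators (`parse` and `double`) and the log format are assumptions; the sketch runs in a single process and only illustrates the ordering, not distributed scheduling.

```python
def staged(data, log):
    """Run each stage over the whole data set before starting the next."""
    parsed = []
    for x in data:
        log.append(f"parse {x}")
        parsed.append(int(x))
    doubled = []
    for x in parsed:
        log.append(f"double {x}")
        doubled.append(x * 2)
    return doubled


def pipelined(data, log):
    """Pass each event through both operators before reading the next one."""
    def parse(items):
        for x in items:
            log.append(f"parse {x}")
            yield int(x)

    def double(items):
        for x in items:
            log.append(f"double {x}")
            yield x * 2

    return list(double(parse(data)))


staged_log, pipelined_log = [], []
staged_result = staged(["1", "2"], staged_log)
pipelined_result = pipelined(["1", "2"], pipelined_log)
print(staged_log)     # ['parse 1', 'parse 2', 'double 1', 'double 2']
print(pipelined_log)  # ['parse 1', 'double 1', 'parse 2', 'double 2']
```

Both variants produce the same result; only the order in which the work happens differs.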

Submission modes

Session mode
Per-Job mode
Application mode

The main differences are the cluster life cycle and resource allocation, and where the application's main method is executed: on the client or on the JobManager.

Copyright notice
This article was written by [kiraraLou]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/187/202207060913062117.html