[Flink] Flink learning
2022-07-06 11:31:00 【kiraraLou】
Preface
What is Flink?
A stateful computation engine for unbounded and bounded data streams.
Common data architectures
- Traditional basic data architecture
- Microservice data architecture
- Big data architecture
- Stateful flow computing architecture
The biggest advantage of stateful stream computing: there is no need to read the raw data back out of external storage to run a full recomputation, which can be very expensive.
Users do not need scheduling and batch computing tools to pull statistical results out of a data warehouse and then persist them. This reduces both the time spent computing over the data and the storage hardware needed along the way.
Why Flink
Flink has the following advantages:
(1) High throughput, low latency, and high performance at the same time
Spark Streaming cannot achieve low latency.
Storm cannot deliver high throughput.
(2) Support for event time (Event Time)
In stream computing, window computation is very important. Most frameworks currently compute windows using processing time (Processing Time), that is, the current system time of the host at the moment an event is handed to the framework for processing.
Flink supports window computation with event-time semantics. Because windows are driven by the time at which each event was originally generated, the streaming system can still compute accurate results when events arrive out of order.
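A minimal sketch of event-time windowing in plain Java (not Flink's DataStream API; the `Event` record and the method names are hypothetical): each event is assigned to a 5-second tumbling window by its own timestamp, so the order in which events arrive does not change the per-window counts. It deliberately leaves out watermarks, which a real engine needs in order to decide when a window can be closed.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Conceptual sketch: assign events to tumbling windows by event time, not arrival time.
// Plain Java for illustration; this is not Flink's API.
public class EventTimeWindowSketch {

    // Hypothetical event: the timestamp (ms) at which it was produced at the source.
    record Event(String id, long eventTimeMs) {}

    // Start of the tumbling window [start, start + sizeMs) that contains the timestamp.
    static long windowStart(long eventTimeMs, long sizeMs) {
        return eventTimeMs - Math.floorMod(eventTimeMs, sizeMs);
    }

    // Counts events per window. The result depends only on event timestamps,
    // so any arrival order yields the same counts.
    static Map<Long, Integer> countPerWindow(List<Event> arrivals, long sizeMs) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (Event e : arrivals) {
            counts.merge(windowStart(e.eventTimeMs(), sizeMs), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Events arrive out of order: "c" was produced before "b".
        List<Event> arrivals = List.of(
                new Event("a", 1_000), new Event("b", 7_000),
                new Event("c", 4_000), new Event("d", 11_000));
        System.out.println(countPerWindow(arrivals, 5_000)); // {0=2, 5000=1, 10000=1}
    }
}
```

Counting by processing time instead would put "c" into whichever window was open when it happened to arrive, which is exactly the inaccuracy described above.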
(3) Support for stateful computation
State means saving an operator's intermediate results in memory or in a file system during stream computation. When the next event reaches the operator, it reads the intermediate result from the previous state and computes the current result from it, so results do not have to be recomputed from all the raw data every time.
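The idea of keeping intermediate results as state can be sketched as a per-key running count. This is plain Java under the assumption that the state is a simple in-memory map; Flink's keyed state and state backends are not shown, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Conceptual sketch of a stateful operator: a running count per key.
// Each event is combined with the stored intermediate result instead of
// recounting all raw data. Not Flink's state API.
public class StatefulCountSketch {

    // The operator's state: key -> count seen so far.
    private final Map<String, Long> countsByKey = new HashMap<>();

    // Processes one event: read the previous count, add one, store and return the new count.
    long process(String key) {
        return countsByKey.merge(key, 1L, Long::sum);
    }

    public static void main(String[] args) {
        StatefulCountSketch op = new StatefulCountSketch();
        for (String word : List.of("flink", "spark", "flink", "flink")) {
            System.out.println(word + " -> " + op.process(word));
        }
        // flink -> 1, spark -> 1, flink -> 2, flink -> 3
    }
}
```

Each call does constant work regardless of how many events came before, which is the point of keeping state rather than rescanning the raw data.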
(4) Highly flexible window (Window) operations
Data arrives continuously, so windows are needed to aggregate the data that falls within a certain range.
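One way to see how windows group continuous data is to compute which windows a single timestamp belongs to. The sketch below, written from scratch in plain Java (the method name is hypothetical and it is not taken from Flink's source), handles sliding windows of a given size and slide; a tumbling window is the special case where the slide equals the size.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: which sliding windows [start, start + size) contain a given timestamp?
// Illustrative plain Java, not Flink's window assigner.
public class SlidingWindowSketch {

    // Returns the start of every window that contains the timestamp, latest first.
    static List<Long> windowStarts(long timestamp, long size, long slide) {
        List<Long> starts = new ArrayList<>();
        // Latest window that starts at or before the timestamp.
        long lastStart = timestamp - Math.floorMod(timestamp, slide);
        for (long start = lastStart; start > timestamp - size; start -= slide) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        // 10 s windows sliding every 5 s: t = 12 s falls into [10, 20) and [5, 15).
        System.out.println(windowStarts(12_000, 10_000, 5_000)); // [10000, 5000]
        // Tumbling 5 s windows (slide == size): only [10, 15).
        System.out.println(windowStarts(12_000, 5_000, 5_000));  // [10000]
    }
}
```

With overlapping windows an event is counted once per window it falls into, which is why sliding windows cost more than tumbling ones of the same size.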
(5) Fault tolerance through lightweight distributed snapshots (Snapshot)
Flink can automatically detect failures during event processing, such as node crashes or network transmission problems. Using checkpoints based on distributed snapshots, it persists state information during execution so that a failed job can recover from it.
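The recovery idea behind checkpoints can be sketched on a single process: the operator state and the source position are snapshotted together, and after a failure both are restored and the source is replayed from the saved position. This is a simplified plain-Java illustration with hypothetical names; Flink's actual mechanism snapshots a distributed job asynchronously and is considerably more involved.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Conceptual sketch of checkpoint-based recovery for a word count.
// Not Flink's checkpoint implementation.
public class CheckpointSketch {

    // A consistent snapshot: how far the source had been read, plus a copy of the state.
    record Snapshot(int sourceOffset, Map<String, Long> state) {}

    // Counts words, checkpointing every `checkpointEvery` events; simulates one failure
    // when reaching offset `failAt` (-1 = never fail).
    static Map<String, Long> runWithFailure(List<String> source, int checkpointEvery, int failAt) {
        Snapshot checkpoint = new Snapshot(0, Map.of());
        boolean failed = false;
        while (true) {
            // (Re)start from the last completed checkpoint.
            Map<String, Long> state = new HashMap<>(checkpoint.state());
            int offset = checkpoint.sourceOffset();
            try {
                for (; offset < source.size(); offset++) {
                    if (!failed && offset == failAt) {
                        failed = true;
                        throw new IllegalStateException("simulated node failure");
                    }
                    state.merge(source.get(offset), 1L, Long::sum);
                    if ((offset + 1) % checkpointEvery == 0) {
                        checkpoint = new Snapshot(offset + 1, Map.copyOf(state));
                    }
                }
                return state;
            } catch (IllegalStateException e) {
                // Updates since the last checkpoint are discarded; the loop restores the snapshot.
            }
        }
    }

    public static void main(String[] args) {
        List<String> source = List.of("a", "b", "a", "c", "a");
        System.out.println(runWithFailure(source, 2, 3));  // fails after the checkpoint at offset 2
        System.out.println(runWithFailure(source, 2, -1)); // no failure; both print the same counts
    }
}
```

Because the offset is saved together with the state, the events after the checkpoint are replayed exactly once into the restored state, so the final counts match a run without failure.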
(6) Independent memory management on top of the JVM
Flink manages its own memory to minimize the impact of JVM garbage collection (GC) on the system. It serializes data objects into binary form and stores them in memory, which reduces the storage size of the data.
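The benefit of storing records as serialized bytes can be illustrated with a `java.nio.ByteBuffer`: a hypothetical (long userId, int clicks) record takes exactly 12 bytes in one preallocated block, with no per-record object headers or references for the GC to trace. This is only an illustration of the principle under that assumed record layout; Flink's own serializers and managed memory segments are not shown.

```java
import java.nio.ByteBuffer;

// Sketch: fixed-size binary records packed into one preallocated buffer.
// Illustrative only; not Flink's memory manager or serializer API.
public class BinaryRecordSketch {

    // Hypothetical record layout: userId (long, 8 bytes) followed by clicks (int, 4 bytes).
    static final int RECORD_SIZE = Long.BYTES + Integer.BYTES; // 12 bytes

    // Writes a record into the given slot using absolute (position-independent) puts.
    static void write(ByteBuffer buf, int slot, long userId, int clicks) {
        int pos = slot * RECORD_SIZE;
        buf.putLong(pos, userId);
        buf.putInt(pos + Long.BYTES, clicks);
    }

    static long readUserId(ByteBuffer buf, int slot) {
        return buf.getLong(slot * RECORD_SIZE);
    }

    static int readClicks(ByteBuffer buf, int slot) {
        return buf.getInt(slot * RECORD_SIZE + Long.BYTES);
    }

    public static void main(String[] args) {
        // One contiguous block for 1,000 records (12,000 bytes), allocated once.
        ByteBuffer buf = ByteBuffer.allocate(1_000 * RECORD_SIZE);
        write(buf, 0, 42L, 7);
        write(buf, 1, 43L, 3);
        System.out.println(readUserId(buf, 1) + " clicked " + readClicks(buf, 1) + " times"); // 43 clicked 3 times
    }
}
```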
(7) Savepoints (Save Point)
Stopping an application for a period of time, for example for a cluster upgrade or maintenance downtime, can lead to data loss or inaccurate results. Flink's savepoint mechanism saves a snapshot of the job's execution state to a storage medium; when the job is restarted, it can restore the previous computation state directly from that savepoint.
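The savepoint idea, persisting state to durable storage before a planned stop and reading it back on restart, can be sketched with a plain file. The file format, class, and method names below are made up for illustration; Flink's savepoints use their own format and are triggered through its CLI or REST API, which this sketch does not model.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Sketch: write the source offset and operator state to a file before a planned stop,
// then restore them when the job restarts. Hypothetical format, not Flink's.
public class SavepointSketch {

    record Restored(long offset, Map<String, Long> state) {}

    static void save(Path file, long offset, Map<String, Long> state) throws IOException {
        try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(file))) {
            out.writeLong(offset);
            out.writeInt(state.size());
            for (Map.Entry<String, Long> e : state.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeLong(e.getValue());
            }
        }
    }

    static Restored restore(Path file) throws IOException {
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            long offset = in.readLong();
            int n = in.readInt();
            Map<String, Long> state = new HashMap<>();
            for (int i = 0; i < n; i++) {
                state.put(in.readUTF(), in.readLong()); // key first, then value
            }
            return new Restored(offset, state);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("savepoint-sketch", ".bin");
        save(file, 5, Map.of("flink", 3L, "spark", 2L)); // before stopping for an upgrade
        Restored r = restore(file);                       // after the restart
        System.out.println("resume at offset " + r.offset() + " with state " + r.state());
        Files.deleteIfExists(file);
    }
}
```

Unlike the automatic checkpoints in (5), a savepoint is taken deliberately and survives the job being stopped, which is what makes planned upgrades safe.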
Flink vs Spark
Both support stream processing and batch processing.
Data processing architecture
Spark processes all types of data sets in batch mode; stream data is cut into micro-batches (bounded data sets) that are processed one after another.
Flink processes all types of data sets in stream mode. Bounded data is processed as a stream that happens to end, so batch and stream processing are unified in a single streaming engine.
Data model
Spark uses the RDD model; a Spark Streaming DStream is a sequence of small-batch RDDs.
Flink's basic data model is the data stream, that is, a sequence of events (Event).
Runtime architecture
Spark uses batch computation: it divides the DAG into stages, and the next stage can begin only after the previous one has completed.
Flink uses a standard stream-processing architecture: after an event is processed at one node, it can be sent directly to the next node for processing.
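The difference between the two runtime styles can be made concrete with a toy two-step job (map, then sink) over a few records. This plain-Java sketch only logs the order of operations; the method names are hypothetical, and it does not model Spark's or Flink's schedulers (for instance, it ignores that Spark pipelines operators within a single stage).

```java
import java.util.ArrayList;
import java.util.List;

// Toy comparison of stage-by-stage execution versus pipelined streaming execution.
public class ExecutionOrderSketch {

    // Stage-by-stage: the map stage finishes for every record before the sink stage starts.
    static List<String> stageByStage(List<String> records) {
        List<String> log = new ArrayList<>();
        List<String> mapped = new ArrayList<>();
        for (String r : records) {
            log.add("map " + r);
            mapped.add(r.toUpperCase());
        }
        for (String m : mapped) {
            log.add("sink " + m);
        }
        return log;
    }

    // Pipelined: each record is forwarded to the next operator as soon as it is processed.
    static List<String> pipelined(List<String> records) {
        List<String> log = new ArrayList<>();
        for (String r : records) {
            log.add("map " + r);
            log.add("sink " + r.toUpperCase());
        }
        return log;
    }

    public static void main(String[] args) {
        List<String> records = List.of("a", "b", "c");
        System.out.println(stageByStage(records)); // [map a, map b, map c, sink A, sink B, sink C]
        System.out.println(pipelined(records));    // [map a, sink A, map b, sink B, map c, sink C]
    }
}
```

In the pipelined order the first record reaches the sink after one map step instead of after all of them, which is where the latency advantage of the streaming architecture comes from.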
Submission modes
- Session mode
- Per-job mode
- Application mode
The main differences are the cluster's life cycle and resource allocation, and where the application's main() method runs: on the client or on the JobManager.