[Flink] Flink learning
2022-07-06 11:31:00 [kiraraLou]
Preface
What is Flink?
A stateful computation engine for unbounded and bounded data streams.
Common data architectures
- Traditional data architecture
- Microservice data architecture
- Big data architecture
- Stateful stream processing architecture
The biggest advantage of stateful stream processing is that the raw data does not have to be read back out of external storage to redo a full computation, which can be very expensive.
Users no longer need schedulers and batch tools to pull statistics out of a data warehouse and then write the results back to storage. This cuts both the time spent computing and the hardware needed for storage.
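To make the cost difference concrete, here is a minimal Python sketch (not Flink code) comparing a batch-style job that re-reads all stored data on every update with a stateful job that keeps its running total as state. The function names and the `scanned` counter are hypothetical, introduced only for illustration.

```python
def batch_totals(events):
    """Batch style: re-scan all raw data stored so far on every update."""
    stored = []      # stands in for external storage / a data warehouse
    totals = []
    scanned = 0      # total number of records read
    for value in events:
        stored.append(value)
        scanned += len(stored)          # full scan of everything stored
        totals.append(sum(stored))
    return totals, scanned


def incremental_totals(events):
    """Stateful streaming style: update the kept intermediate result."""
    state = 0        # intermediate result kept between events
    totals = []
    scanned = 0
    for value in events:
        state += value                  # only the new event is read
        scanned += 1
        totals.append(state)
    return totals, scanned


events = [5, 3, 8, 1]
print(batch_totals(events))        # ([5, 8, 16, 17], 10)
print(incremental_totals(events))  # ([5, 8, 16, 17], 4)
```

Both produce the same totals; the batch version's reads grow with the amount of stored data, while the stateful version reads each event once.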

Why Flink
Flink has the following advantages:
(1) High throughput, low latency, and high performance at the same time
Spark Streaming, with its micro-batch model, cannot achieve comparably low latency.
Storm cannot sustain comparably high throughput.
(2) Support for event time (Event Time)
Window computation is central to stream processing. Most frameworks compute windows using system time (Processing Time), i.e., the current time on the host machine when an event reaches the framework for processing.
Flink supports window computation with event-time semantics. Because windows are driven by the time each event was originally generated, the streaming system can still compute accurate results when events arrive out of order.
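The difference between the two time notions can be sketched in plain Python (this is not the Flink API). The sketch assumes hypothetical 10-second tumbling windows and a list of records in arrival order, where one event that happened at t=8 arrives late at t=13. Grouping by event time puts it in the window where it belongs; grouping by arrival time does not.

```python
from collections import defaultdict

WINDOW_SIZE = 10  # hypothetical tumbling window length, in seconds


def window_start(ts, size=WINDOW_SIZE):
    """Start of the tumbling window [start, start + size) containing ts."""
    return ts - ts % size


def sum_by_window(records, time_of):
    """Sum values per window; time_of selects which timestamp to use."""
    sums = defaultdict(int)
    for rec in records:
        sums[window_start(time_of(rec))] += rec["value"]
    return dict(sums)


# Records listed in arrival order. The last one happened at t=8 but
# reached the operator at t=13, i.e. it arrived out of order.
records = [
    {"event_time": 1, "arrival_time": 2, "value": 1},
    {"event_time": 6, "arrival_time": 7, "value": 1},
    {"event_time": 11, "arrival_time": 12, "value": 1},
    {"event_time": 8, "arrival_time": 13, "value": 1},
]

by_event_time = sum_by_window(records, lambda r: r["event_time"])
by_processing_time = sum_by_window(records, lambda r: r["arrival_time"])
print(by_event_time)       # {0: 3, 10: 1}
print(by_processing_time)  # {0: 2, 10: 2}
```

In Flink itself, watermarks decide when an event-time window is considered complete; that part is deliberately left out here.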
(3) Support for stateful computation
State is the intermediate result of an operator, saved in memory or in a file system during stream processing. When the next event reaches the operator, the operator reads the intermediate result from its state and computes the current result from it, so there is no need to recompute over all the raw data every time.
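A minimal sketch of the read, update, and store cycle described above, assuming a hypothetical operator that counts events per key. A plain dict stands in for the state that Flink would keep in memory or in a file system; the class and method names are illustrative, not Flink's API.

```python
class KeyedRunningCount:
    """Hypothetical operator that keeps one intermediate result per key."""

    def __init__(self):
        self.state = {}  # key -> count seen so far (stand-in for Flink state)

    def process(self, key):
        previous = self.state.get(key, 0)  # read the intermediate result
        current = previous + 1             # combine it with the new event
        self.state[key] = current          # store it for the next event
        return key, current                # emit the updated result


op = KeyedRunningCount()
for user in ["alice", "bob", "alice", "alice"]:
    print(op.process(user))
# ('alice', 1)
# ('bob', 1)
# ('alice', 2)
# ('alice', 3)
```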
(4) Support for highly flexible window (Window) operations
Data arrives continuously, so a window is needed to aggregate the data within a certain range, for example a time range or a number of elements.
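The sketch below shows, in plain Python, how an event might be assigned to tumbling and sliding time windows, and how a sequence can be cut into count windows. It assumes windows aligned to timestamp 0 with no offset; the function names are hypothetical and this is not Flink's implementation.

```python
def tumbling_windows(ts, size):
    """The single window [start, start + size) that contains ts."""
    start = ts - ts % size
    return [(start, start + size)]


def sliding_windows(ts, size, slide):
    """All windows of length `size`, starting every `slide`, containing ts."""
    windows = []
    start = ts - ts % slide        # latest window start not after ts
    while start > ts - size:       # window still covers ts
        windows.append((start, start + size))
        start -= slide
    return windows


def count_windows(values, n):
    """Consecutive windows of exactly n elements; a trailing partial
    group is left pending (not emitted)."""
    return [values[i:i + n] for i in range(0, len(values) - n + 1, n)]


print(tumbling_windows(7, size=5))           # [(5, 10)]
print(sliding_windows(7, size=10, slide=5))  # [(5, 15), (0, 10)]
print(count_windows([1, 2, 3, 4, 5], 2))     # [[1, 2], [3, 4]]
```

With sliding windows an event belongs to `size / slide` windows at once, which is why they are more flexible (and more expensive) than tumbling windows.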
(5) Fault tolerance through lightweight distributed snapshots (Snapshot)
Flink can automatically detect failures during event processing, such as node crashes or network transmission errors. Using checkpoints based on distributed snapshots, it persists the state of the job during execution so that it can recover from them.
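The recovery idea can be sketched in a single Python process: snapshot the source position together with the operator state, and after a failure restore both and replay from the saved position. This is a simplification under stated assumptions; Flink's real checkpoints are taken asynchronously across distributed operators using barriers in the stream, none of which is modeled here. The `run` function and its parameters are hypothetical.

```python
import copy


def run(events, checkpoint_every, crash_at=None):
    """Sum events, snapshotting (source offset, state) every few records.

    If crash_at is set, the job fails once, right before processing that
    index, then restores the latest snapshot and replays from its offset.
    """
    snapshot = {"offset": 0, "state": {"total": 0}}  # last checkpoint
    offset, state = 0, {"total": 0}
    crashed = False
    while offset < len(events):
        if offset == crash_at and not crashed:
            crashed = True
            # Failure: in-flight state is lost; restore the checkpoint.
            offset = snapshot["offset"]
            state = copy.deepcopy(snapshot["state"])
            continue
        state["total"] += events[offset]
        offset += 1
        if offset % checkpoint_every == 0:
            snapshot = {"offset": offset, "state": copy.deepcopy(state)}
    return state["total"]


events = [4, 2, 7, 1, 9, 3]
print(run(events, checkpoint_every=2))              # 26
print(run(events, checkpoint_every=2, crash_at=5))  # 26
```

Because the offset and the state are saved together, replaying after the failure neither loses nor double-counts events.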
(6) Independent memory management on the JVM
Flink manages its own memory to minimize the impact of JVM garbage collection on the system. It serializes data objects into binary form and stores them in memory, which reduces the storage size of the data.
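As a rough illustration of storing records as packed binary rather than as individual objects, here is a Python sketch using the standard `struct` module. The record layout (an int64 user id and a float64 amount) is a hypothetical assumption; Flink's own `TypeSerializer` and `MemorySegment` machinery is far more involved.

```python
import struct

# Hypothetical record layout: little-endian int64 + float64 = 16 bytes.
RECORD = struct.Struct("<qd")


def serialize(records):
    """Pack (user_id, amount) records back to back into one buffer."""
    buf = bytearray(RECORD.size * len(records))
    for i, (user_id, amount) in enumerate(records):
        RECORD.pack_into(buf, i * RECORD.size, user_id, amount)
    return bytes(buf)


def deserialize(buf):
    """Read the records back out of the binary buffer."""
    return [RECORD.unpack_from(buf, off)
            for off in range(0, len(buf), RECORD.size)]


records = [(1, 12.5), (2, 3.0)]
data = serialize(records)
print(len(data))          # 32
print(deserialize(data))  # [(1, 12.5), (2, 3.0)]
```

Each record occupies a fixed 16 bytes with no per-object header, and a contiguous byte buffer is opaque to the garbage collector, which is the general motivation for this kind of layout.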
(7) Savepoints (Save Point)
Stopping an application for a period of time, for example for a cluster upgrade or maintenance, can lead to data loss or inaccurate results. Flink uses savepoints to save a snapshot of the job's execution to storage; when the job is restarted, it can restore the original computation state directly from the previously saved savepoint.
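A minimal Python sketch of the savepoint workflow, assuming a hypothetical JSON file format: version 1 of a job writes its state and source position before being stopped, and version 2 resumes from that file instead of starting from scratch. The function names and file layout are illustrative only, not Flink's savepoint format.

```python
import json
import os
import tempfile


def take_savepoint(state, offset, directory):
    """Write operator state and source position to a file (hypothetical format)."""
    path = os.path.join(directory, "savepoint.json")
    with open(path, "w") as f:
        json.dump({"offset": offset, "state": state}, f)
    return path


def restore_savepoint(path):
    """Read back the source position and operator state."""
    with open(path) as f:
        saved = json.load(f)
    return saved["offset"], saved["state"]


events = ["a", "b", "a", "c", "a"]

with tempfile.TemporaryDirectory() as tmp:
    # Version 1 processes three events, then is stopped (e.g. for an
    # upgrade) after taking a savepoint.
    counts = {}
    for e in events[:3]:
        counts[e] = counts.get(e, 0) + 1
    path = take_savepoint(counts, 3, tmp)

    # Version 2 resumes from the savepoint rather than from scratch.
    offset, counts = restore_savepoint(path)
    for e in events[offset:]:
        counts[e] = counts.get(e, 0) + 1

print(counts)  # {'a': 3, 'b': 1, 'c': 1}
```

Unlike the automatic checkpoints in (5), a savepoint is taken deliberately and is meant to survive a planned stop and restart.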
Flink vs Spark
Both engines support stream processing as well as batch processing.
Data processing architecture
Spark processes every kind of data set in batch mode; stream data is split into micro-batches (bounded data sets) and processed batch by batch.
Flink processes every kind of data set in streaming mode. Bounded data is treated as a stream that happens to end, so batch and stream processing are unified in a single streaming engine.
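The two models can be contrasted with a small Python sketch (neither Spark nor Flink code): a per-event streaming word count, and a micro-batch version that cuts the same input into small bounded batches. The function names and `batch_size` are hypothetical. Running the streaming version over a finite list also shows bounded data being handled as a stream that ends.

```python
from itertools import islice


def stream_word_count(events):
    """Streaming style: update and emit the counts after every event."""
    counts = {}
    for word in events:
        counts[word] = counts.get(word, 0) + 1
        yield dict(counts)  # a result is visible after each event


def micro_batch_word_count(events, batch_size):
    """Micro-batch style: cut the input into small bounded batches and
    emit one result per batch."""
    counts = {}
    it = iter(events)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        for word in batch:
            counts[word] = counts.get(word, 0) + 1
        yield dict(counts)  # a result is visible only once per batch


# A bounded data set is just a stream that ends.
words = ["flink", "spark", "flink", "storm", "flink"]
stream_results = list(stream_word_count(words))
batch_results = list(micro_batch_word_count(words, batch_size=2))
print(len(stream_results), len(batch_results))  # 5 3
print(stream_results[-1] == batch_results[-1])  # True
```

The final answers agree; what differs is how often an up-to-date result becomes available.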
Data model
Spark uses the RDD model; a Spark Streaming DStream is a sequence of RDDs, each holding a small batch of data.
Flink's basic data model is the data stream, i.e., a sequence of events (Event).
Runtime architecture
Spark is batch-oriented: it splits the DAG into stages, and a stage can run only after the stages it depends on have completed.
Flink is a true stream processing architecture: after an event is processed at one node, it can be sent directly to the next node for processing.
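The execution difference can be seen by logging the order of operations in a small Python sketch. The `staged` function models a stage boundary (in Spark, such boundaries occur at shuffles; within a stage Spark also pipelines narrow transformations), while `pipelined` uses a generator so each record flows to the next operator as soon as it is produced. Both functions are hypothetical illustrations.

```python
def staged(records, log):
    """The first stage finishes all records before the next stage starts."""
    parsed = []
    for r in records:
        log.append(f"parse {r}")
        parsed.append(r * 10)
    for r in parsed:
        log.append(f"sink {r}")


def pipelined(records, log):
    """Each record is handed to the next operator as soon as it is ready."""
    def parse(source):
        for r in source:
            log.append(f"parse {r}")
            yield r * 10

    for r in parse(records):
        log.append(f"sink {r}")


staged_log, pipelined_log = [], []
staged([1, 2], staged_log)
pipelined([1, 2], pipelined_log)
print(staged_log)     # ['parse 1', 'parse 2', 'sink 10', 'sink 20']
print(pipelined_log)  # ['parse 1', 'sink 10', 'parse 2', 'sink 20']
```

In the pipelined run the first result reaches the sink before the second record is even parsed, which is where the latency advantage of per-event processing comes from.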
Submission modes
- Session mode
- Per-job mode
- Application mode
The main differences are the cluster lifecycle and resource isolation, and where the application's main() method runs: on the client or on the JobManager.
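To summarize the three modes side by side, here is a small Python data model. It is only a restatement of the differences as plain data; the `DeploymentMode` class and its field names are hypothetical and not part of any Flink API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeploymentMode:
    name: str
    cluster_lifecycle: str  # when the Flink cluster is created and torn down
    isolation: str          # whether jobs share the cluster's resources
    main_runs_on: str       # where the application's main() method executes


MODES = [
    DeploymentMode("Session",
                   "started in advance; outlives the jobs submitted to it",
                   "jobs share one cluster", "client"),
    DeploymentMode("Per-job",
                   "started for one job; torn down when that job finishes",
                   "one cluster per job", "client"),
    DeploymentMode("Application",
                   "started for one application; torn down when it finishes",
                   "one cluster per application", "JobManager"),
]


def modes_running_main_on(location):
    """Names of the modes whose main() runs at the given location."""
    return [m.name for m in MODES if m.main_runs_on == location]


print(modes_running_main_on("client"))      # ['Session', 'Per-job']
print(modes_running_main_on("JobManager"))  # ['Application']
```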