Flink introduction and runtime architecture
2022-07-27 19:49:00 【Lin Musen^~^】
What is it?
- Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams
Concepts:
Data streams
- Any kind of data can form an event stream: credit card transactions, sensor measurements, machine logs, and records of user interactions on a website or mobile app all form streams.
What is a bounded stream
- A bounded stream has a defined start and a defined end. It can be processed by ingesting all of the data before performing any computation. Because all of the data in a bounded stream can be sorted, ordered ingestion is not required. Processing bounded streams is commonly known as batch processing.
What is an unbounded stream
- An unbounded stream has a defined start but no defined end; it produces data endlessly. Unbounded streams must be processed continuously, that is, events must be handled promptly after they are ingested. It is not possible to wait for all of the input to arrive, because the input is infinite and never completes. Processing unbounded data often requires that events be ingested in a specific order, such as the order in which they occurred, so that the completeness of the results can be reasoned about.
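The difference between the two kinds of stream can be sketched in plain Python, without any Flink API. This is only an illustration under assumptions: the function names `process_bounded` and `process_unbounded` are hypothetical, a list stands in for a bounded stream, and an endless counter stands in for an unbounded one.

```python
from itertools import count, islice
from typing import Iterable, Iterator, List


def process_bounded(events: Iterable[int]) -> List[int]:
    """Batch style: ingest everything first, then sort and compute."""
    collected = list(events)   # only possible because the stream ends
    return sorted(collected)   # ordering can be fixed after ingestion


def process_unbounded(events: Iterable[int]) -> Iterator[int]:
    """Streaming style: keep state and emit a result for every event."""
    running_total = 0          # state carried from one event to the next
    for value in events:
        running_total += value
        yield running_total    # a result is available without waiting for the end


bounded = [5, 3, 8, 1]
print(process_bounded(bounded))                          # [1, 3, 5, 8]

unbounded = count(start=1)                               # 1, 2, 3, ... never ends
print(list(islice(process_unbounded(unbounded), 4)))     # [1, 3, 6, 10]
```

Calling `list()` on the unbounded input would never return, which is why the streaming version has to carry its own state (here a running total) and emit results as it goes.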
Deployment modes:
Local: local deployment that starts the process directly; suitable for debugging
Standalone Cluster: cluster deployment using Flink's built-in cluster mode
On YARN: computing resources are managed and scheduled by Hadoop YARN and allocated on demand, which improves cluster resource utilization; used in production environments
Execution flow
- The user submits a Flink program to the JobClient
- The JobClient parses and optimizes the program, then submits it to the JobManager
- The TaskManager runs the tasks and reports their status to the JobManager
- An informal analogy
- JobManager: the contractor
- TaskManager: the foreman
- Task slot: a worker (workers do things in parallel)
Runtime architecture
- Flink is a distributed system; computing resources must be allocated and managed effectively in order to execute streaming applications
- The runtime consists of two types of processes
- One JobManager
- One or more TaskManagers

What is the JobManager (the big boss, the contractor)
- It coordinates the distributed execution of Flink applications
- It decides when to schedule the next task (or group of tasks)
- It reacts to finished tasks and to execution failures
- It coordinates checkpoints and coordinates recovery from failures, among other things
What is a TaskManager (the foreman, the bricklayer)
- The worker responsible for computation; it also reports memory usage, task status, and so on to the JobManager
- There must always be at least one TaskManager. TaskManagers (also called workers) execute the tasks of a dataflow, and buffer and exchange the data streams
- The smallest unit of resource scheduling in a TaskManager is the task slot
Advanced
JobManager in depth
- The JobManager process consists of three different components
- ResourceManager: responsible for resource provisioning, de-allocation, and allocation in a Flink cluster; it manages task slots
- Dispatcher: provides a REST interface for submitting Flink applications for execution, starts a new JobMaster for each submitted job, and runs the Flink WebUI to provide information about job executions
- JobMaster: responsible for managing the execution of a single JobGraph. Multiple jobs can run simultaneously in a Flink cluster, each with its own JobMaster
- There is always at least one JobManager. A high-availability (HA) setup may have multiple JobManagers, one of which is always the leader while the others are on standby
TaskManager in depth
- The number of task slots in a TaskManager indicates how many tasks it can process concurrently
- A single task slot can execute multiple operators, using multiple threads
- Operators come in three kinds
- source
- transformation
- sink
- For distributed execution, Flink chains operator subtasks together into tasks; each task is executed by one thread
- In the figure, the source and map operators form an operator chain that runs as a single task on one thread
- Chaining operators into tasks is a useful optimization: it reduces the overhead of thread-to-thread handover and buffering, and increases overall throughput while decreasing latency
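To make the benefit of chaining concrete, here is a minimal Python sketch that does not use Flink at all. The `source` and `double` operators and both runner functions are hypothetical names chosen for illustration. The chained version passes each record with a plain function call inside one thread; the unchained version needs two threads and a buffer between them, which is the handover and buffering overhead that chaining avoids.

```python
import queue
import threading
from typing import Callable, Iterable, List


def source() -> Iterable[int]:
    """Hypothetical source operator emitting three records."""
    return [1, 2, 3]


def double(record: int) -> int:
    """Hypothetical map operator."""
    return record * 2


def run_chained(src: Callable[[], Iterable[int]],
                fn: Callable[[int], int]) -> List[int]:
    """Source and map share one thread: records are passed by a plain call."""
    return [fn(record) for record in src()]


def run_unchained(src: Callable[[], Iterable[int]],
                  fn: Callable[[int], int]) -> List[int]:
    """Source and map run in separate threads, linked by a buffer."""
    buffer: "queue.Queue" = queue.Queue()
    done = object()                # sentinel marking the end of input
    results: List[int] = []

    def source_task() -> None:
        for record in src():
            buffer.put(record)     # every record crosses a thread boundary
        buffer.put(done)

    def map_task() -> None:
        while True:
            record = buffer.get()
            if record is done:
                break
            results.append(fn(record))

    threads = [threading.Thread(target=source_task),
               threading.Thread(target=map_task)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


print(run_chained(source, double))     # [2, 4, 6]
print(run_unchained(source, double))   # [2, 4, 6]
```

Both variants compute the same result; the difference is only in how records travel between the two operators.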

Task Slots
A task slot is where Flink executes tasks. Each task slot can run multiple subtasks, and each subtask runs in its own thread
Each worker (TaskManager) is a JVM process and can execute one or more subtasks in separate threads (with 1 slot or more)
Task slots (at least one per TaskManager) exist to control how many tasks a TaskManager accepts
Each task slot represents a fixed subset of the TaskManager's resources
Note
- All task slots share the TaskManager's memory equally; task slots provide no CPU isolation
- Each task slot currently has its own reserved memory space, so jobs do not compete with each other for memory
- The number of task slots in a TaskManager process determines how many subtasks it can run concurrently
- The recommended number of task slots is the number of CPU cores; memory is exclusive, CPU is shared
With 5 subtasks executing, there are 5 parallel threads
- A Task is the encapsulation of one parallel instance of an Operator or Operator Chain.
- The term Sub-Task emphasizes that the same Operator or Operator Chain has multiple parallel Tasks

- Flink operators pass data to one another using either a one-to-one (forwarding) pattern or a redistributing pattern
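The two patterns can be sketched in plain Python as follows. This is an illustration rather than Flink's implementation: the function names are hypothetical, each inner list stands for the output of one upstream subtask, and `zlib.crc32` is used only as a stable stand-in hash, not the hashing Flink actually applies to keys.

```python
import zlib
from typing import List, Tuple

Record = Tuple[str, int]   # (key, value)


def forward(partitions: List[List[Record]]) -> List[List[Record]]:
    """One-to-one: upstream subtask i sends everything to downstream subtask i,
    so partitioning and record order are preserved."""
    return [list(partition) for partition in partitions]


def redistribute_by_key(partitions: List[List[Record]],
                        downstream_parallelism: int) -> List[List[Record]]:
    """Redistribute: each record goes to the downstream subtask chosen by a
    hash of its key, so one upstream subtask may feed many downstream ones."""
    out: List[List[Record]] = [[] for _ in range(downstream_parallelism)]
    for partition in partitions:
        for key, value in partition:
            target = zlib.crc32(key.encode("utf-8")) % downstream_parallelism
            out[target].append((key, value))
    return out


upstream = [[("a", 1), ("b", 2)], [("a", 3), ("c", 4)]]
print(forward(upstream))                  # same layout as the input
print(redistribute_by_key(upstream, 2))   # records with equal keys end up together
```

After redistribution, both records with key "a" land in the same downstream partition even though they came from different upstream subtasks, which is what a keyed aggregation needs.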

- An important distinction: task slots versus the parallelism setting
- Task slots are a static concept: they describe the concurrent execution capacity that a TaskManager has
- Parallelism is a dynamic concept: it is the concurrency that a program actually uses when it runs
- The former is capacity, for example 100 slots; the latter is what is actually used, for example a job that needs a parallelism of only 20.
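The relationship between capacity and usage can be worked through with a small Python sketch. The `Cluster` class and `can_schedule` function are hypothetical helpers, and the 25 x 4 layout is just one assumed way to reach the 100 slots in the example. The sketch also assumes Flink's default slot sharing, under which a job needs as many slots as its highest operator parallelism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """Static capacity: how many slots the TaskManagers offer in total."""
    task_managers: int
    slots_per_task_manager: int

    @property
    def total_slots(self) -> int:
        return self.task_managers * self.slots_per_task_manager


def can_schedule(cluster: Cluster, job_parallelism: int) -> bool:
    """Dynamic usage: a job fits if its parallelism does not exceed the slots.
    Assumes default slot sharing and an otherwise idle cluster."""
    return 0 < job_parallelism <= cluster.total_slots


cluster = Cluster(task_managers=25, slots_per_task_manager=4)
print(cluster.total_slots)             # 100 slots of capacity
print(can_schedule(cluster, 20))       # True: only 20 slots are used
print(cluster.total_slots - 20)        # 80 slots stay idle
print(can_schedule(cluster, 120))      # False: more than the cluster offers
```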