Flume learning notes
2022-07-03 19:14:00 【Dream touch reincarnation】
Function
Flume collects data streams from files and network ports in real time, in a distributed fashion. Data from a variety of sources can be delivered to a variety of destinations in real time.
Features
Real-time collection: monitors data sources in real time and collects data as soon as it is generated
Comprehensive functionality: the data sources and destinations commonly used in big data already have ready-made interfaces
Custom development: Flume is written in Java and exposes interfaces for user-defined development
Simple development: a collection job is a configuration file, so you only need to write the configuration
Distributed collection: Flume itself is not a distributed tool, but it can be used to implement distributed collection
Architecture
Agent: one running Flume program is one Agent
Event: Flume wraps the collected data in Event objects for transmission
Source: monitors the data source in real time and collects data as soon as the source produces it
Channel: temporarily stores the collected data; all Events are buffered here
Sink: sends the data in the Channel to the destination; the Sink actively pulls data from the Channel
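As a minimal sketch of how these components fit together, the configuration below wires one Source, one Channel, and one Sink inside a single Agent. The agent name a1, the component names r1, c1, and k1, and the port are illustrative assumptions rather than values from these notes; netcat, memory, and logger are standard Flume component types.

```properties
# Hypothetical agent "a1"; all names and the port are placeholders.
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: listens on a TCP port and turns each received line into an Event
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Channel: buffers Events in memory until a Sink takes them
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Sink: takes Events from the Channel and writes them to the agent log
a1.sinks.k1.type = logger

# Wiring: a Source may feed several Channels, a Sink reads from exactly one
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```

Assuming the file is saved as example.conf, an agent like this is typically started with flume-ng agent --conf conf --conf-file example.conf --name a1.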
Multi data source architecture

Design purpose: write a copy of the same data to several different destinations
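One common way to get a copy of the data into several destinations is a fan-out flow: one Source replicates each Event into two Channels, and each Channel has its own Sink. The sketch below assumes this is the layout the missing diagram showed; the agent name, component names, port, and output directory are hypothetical.

```properties
# Hypothetical agent "a1": one Source, two Channels, two Sinks
a1.sources = r1
a1.channels = c1 c2
a1.sinks = k1 k2

a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# "replicating" is the default selector: every Channel receives every Event
a1.sources.r1.selector.type = replicating
a1.sources.r1.channels = c1 c2

a1.channels.c1.type = memory
a1.channels.c2.type = memory

# Destination 1: the agent log
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1

# Destination 2: rolling files in a local directory (assumed to exist already)
a1.sinks.k2.type = file_roll
a1.sinks.k2.sink.directory = /tmp/flume-out
a1.sinks.k2.channel = c2
```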
Multi tier architecture

Design purpose: prevent many Flume programs from interacting with the destination directly, which would degrade the destination's performance
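A multi-tier flow is usually built by pointing the Avro Sink of each first-tier agent at the Avro Source of a collector agent, so that only the collector talks to the destination. The sketch below assumes that layout; the host names, ports, paths, and agent names a1 and a2 are hypothetical.

```properties
# Tier 1 (hypothetical agent "a1", one per data-producing machine)
a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/app.log
a1.sources.r1.channels = c1

a1.channels.c1.type = memory

# Forward Events to the collector instead of writing to the destination
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = collector.example.com
a1.sinks.k1.port = 4545
a1.sinks.k1.channel = c1

# Tier 2 (hypothetical collector agent "a2", the only one that touches HDFS)
a2.sources = r1
a2.channels = c1
a2.sinks = k1

a2.sources.r1.type = avro
a2.sources.r1.bind = 0.0.0.0
a2.sources.r1.port = 4545
a2.sources.r1.channels = c1

a2.channels.c1.type = memory

a2.sinks.k1.type = hdfs
a2.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/events
a2.sinks.k1.channel = c1
```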
Usage modes
Offline (collect to HDFS): write the configuration file for the Source and Sink, start Hive and HDFS, then submit and run the agent from the command line
Real time (collect to Kafka): write the configuration file for the Source and Sink and collect into Kafka, where the data is consumed by real-time computing programs
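The two modes differ mainly in the Sink. The sketch below shows one hypothetical agent per mode, both tailing the same log files. All paths, host names, the topic name, and the roll settings are assumptions chosen for illustration, and the Kafka property names are those used by Flume 1.7 and later.

```properties
# Offline mode: hypothetical agent "a1" writing to HDFS
a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = TAILDIR
a1.sources.r1.positionFile = /var/flume/a1_position.json
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1 = /var/log/app/.*log
a1.sources.r1.channels = c1

a1.channels.c1.type = memory

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/app/%Y-%m-%d
# The date escapes in the path need a timestamp; use the agent's local time
a1.sinks.k1.hdfs.useLocalTimeStamp = true
# Write plain text instead of the default SequenceFile
a1.sinks.k1.hdfs.fileType = DataStream
# Roll a new file every 300 seconds or 128 MB, never by event count
a1.sinks.k1.hdfs.rollInterval = 300
a1.sinks.k1.hdfs.rollSize = 134217728
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.channel = c1

# Real-time mode: hypothetical agent "a2" writing to Kafka
a2.sources = r1
a2.channels = c1
a2.sinks = k1

a2.sources.r1.type = TAILDIR
a2.sources.r1.positionFile = /var/flume/a2_position.json
a2.sources.r1.filegroups = f1
a2.sources.r1.filegroups.f1 = /var/log/app/.*log
a2.sources.r1.channels = c1

a2.channels.c1.type = memory

a2.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a2.sinks.k1.kafka.bootstrap.servers = kafka1:9092,kafka2:9092
a2.sinks.k1.kafka.topic = app-logs
a2.sinks.k1.channel = c1
```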
Comparison of similar software
Sqoop: runs on MapReduce underneath; suited to offline collection of large data volumes
Flume: suited to real-time collection from files and network ports
Canal: suited to real-time collection from MySQL databases

Advanced components
Interceptor: when the Source converts each record into an Event, an interceptor can add key-value pairs to the Event header or filter the data
Adding data:
Timestamp Interceptor
Host Interceptor
Static Interceptor
Filtering data:
Regex Filtering Interceptor
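The four interceptors listed above can be chained on one Source, as in the sketch below. The agent and source names, the static key and value, and the regular expression are hypothetical; the type aliases and property names are the ones Flume defines for these interceptors.

```properties
# Interceptors run in the order listed here
a1.sources.r1.interceptors = i1 i2 i3 i4

# Adds a "timestamp" header with the processing time in milliseconds
a1.sources.r1.interceptors.i1.type = timestamp

# Adds a "host" header; useIP = false stores the host name instead of the IP
a1.sources.r1.interceptors.i2.type = host
a1.sources.r1.interceptors.i2.useIP = false

# Adds a fixed key-value pair to every Event header
a1.sources.r1.interceptors.i3.type = static
a1.sources.r1.interceptors.i3.key = datacenter
a1.sources.r1.interceptors.i3.value = dc1

# Drops Events whose body matches the regex (excludeEvents = true)
a1.sources.r1.interceptors.i4.type = regex_filter
a1.sources.r1.interceptors.i4.regex = ^DEBUG
a1.sources.r1.interceptors.i4.excludeEvents = true
```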
Channel Selector
By default the Source sends one copy of the data to every Channel (replicating). Alternatively, Events can be routed to different Channels according to the value of a key in the Event header (multiplexing).
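Header-based routing could be configured roughly as follows. The header name logtype and its values are hypothetical; in practice such a header might be set by a Static Interceptor or by the upstream client.

```properties
a1.sources.r1.channels = c1 c2 c3

# Route each Event by the value of its "logtype" header
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = logtype
a1.sources.r1.selector.mapping.access = c1
a1.sources.r1.selector.mapping.error = c2
# Events with no matching header value go here
a1.sources.r1.selector.default = c3
```

Leaving out the selector lines gives the default replicating behaviour, where c1, c2, and c3 all receive every Event.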
Sink Processor
Functions
Load balancing
Multiple Sinks work together as a sink group; if one of them fails, collection continues normally
Failover
Of multiple Sinks, only one works while the others stand by; a standby Sink takes over only when the working Sink fails, which keeps collection running
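Both behaviours are configured through a sink group. The sketch below assumes two Sinks, k1 and k2, that are already defined and read from the same Channel; the group name g1 and the priority values are hypothetical.

```properties
a1.sinks = k1 k2
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2

# Option A: failover. The Sink with the highest priority value is used
# first; k2 takes over only while k1 is failing.
a1.sinkgroups.g1.processor.type = failover
a1.sinkgroups.g1.processor.priority.k1 = 10
a1.sinkgroups.g1.processor.priority.k2 = 5
# Upper limit, in milliseconds, on the back-off applied to a failed Sink
a1.sinkgroups.g1.processor.maxpenalty = 10000

# Option B: load balancing. Use these lines instead of Option A.
# a1.sinkgroups.g1.processor.type = load_balance
# a1.sinkgroups.g1.processor.selector = round_robin
# a1.sinkgroups.g1.processor.backoff = true
```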