Flume learning notes
2022-07-03 19:14:00 【Dream touch reincarnation】
Function
Flume is a distributed tool for collecting data streams from files and network ports in real time. It can collect data from a variety of sources and deliver it to a variety of destinations as the data arrives.
Characteristics
Real-time collection: data sources are monitored in real time, and data is collected as soon as it is generated
Comprehensive functionality: the common big-data sources and destinations already have corresponding interfaces packaged
Custom development allowed: the source code is written in Java and provides interfaces for user-defined development
Relatively simple development: you only need to write a configuration file
Distributed collection: Flume itself is not a distributed tool, but it can be used to collect data in a distributed way
Architecture
Agent: one running Flume program is one Agent
Event: the data Flume collects is wrapped in Event objects for transmission
Source: monitors the data source in real time and collects data as soon as the source produces it
Channel: temporarily stores the collected data, buffering all Events
Sink: sends the data in the Channel to the destination by actively pulling it from the Channel
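The way these components fit together can be sketched in a few lines of Python. This is only a conceptual model, not Flume's API: the class names mirror the terms above, while `on_data`, `drain`, `run_agent`, and the plain list standing in for a destination are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Event:
    """A unit of collected data: a body plus optional headers."""
    body: str
    headers: dict = field(default_factory=dict)


class Channel:
    """Buffers Events between the Source and the Sink."""

    def __init__(self):
        self._queue = deque()

    def put(self, event):
        self._queue.append(event)

    def take(self):
        # Return the oldest buffered Event, or None when the buffer is empty.
        return self._queue.popleft() if self._queue else None


class Source:
    """Wraps each incoming record in an Event and puts it on the Channel."""

    def __init__(self, channel):
        self.channel = channel

    def on_data(self, line):
        self.channel.put(Event(body=line))


class Sink:
    """Actively pulls Events from the Channel and delivers them."""

    def __init__(self, channel, destination):
        self.channel = channel
        self.destination = destination  # a list stands in for HDFS, Kafka, etc.

    def drain(self):
        while (event := self.channel.take()) is not None:
            self.destination.append(event.body)


def run_agent(lines):
    """One Agent: a Source, a Channel, and a Sink wired together."""
    channel = Channel()
    source = Source(channel)
    destination = []
    sink = Sink(channel, destination)
    for line in lines:
        source.on_data(line)
    sink.drain()
    return destination
```

The point of the Channel is decoupling: the Source only ever writes to the buffer and the Sink only ever reads from it, so neither needs to know about the other.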
Multi data source architecture

Design purpose: write a copy of the collected data to several different destinations
Multi tier architecture

Design purpose: prevent many Flume programs from interacting with the destination directly, which would hurt the destination's performance
Usage modes
Offline (collect to HDFS): configure the Source and Sink in a configuration file, start Hive and HDFS, then submit and run the agent from the command line
Real time (collect to Kafka): configure the Source and Sink in a configuration file and collect the data into Kafka, where real-time computing programs consume it
Comparison with similar software
Sqoop: runs on MapReduce underneath and is suited to offline collection of large volumes of data
Flume: suited to real-time collection from files and network ports
Canal: suited to real-time collection from MySQL databases

Advanced components
Interceptor: when the Source converts each piece of data into an Event, an interceptor can add key-value pairs to the Event header or filter the data
Adding data:
Timestamp Interceptor
Host Interceptor
Static Interceptor
Filtering data:
Regex Filtering Interceptor
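As a rough illustration of what these four interceptors do, the sketch below models an Event as a plain dictionary and each interceptor as a function that either returns the (possibly modified) Event or returns `None` to drop it. It is not Flume's Java interceptor interface: the function names, the `apply_chain` helper, the header keys, and the `exclude` flag are assumptions chosen for the example.

```python
import re
import socket
import time


def timestamp_interceptor(event):
    # Add the processing time in milliseconds to the header.
    event["headers"]["timestamp"] = str(int(time.time() * 1000))
    return event


def host_interceptor(event):
    # Add the name of the machine running the agent to the header.
    event["headers"]["host"] = socket.gethostname()
    return event


def static_interceptor(key, value):
    # Add one fixed key-value pair to every event.
    def intercept(event):
        event["headers"][key] = value
        return event
    return intercept


def regex_filter(pattern, exclude=False):
    # Keep events whose body matches the pattern, or drop them when exclude=True.
    compiled = re.compile(pattern)

    def intercept(event):
        matched = compiled.search(event["body"]) is not None
        return event if matched != exclude else None
    return intercept


def apply_chain(body, interceptors):
    # Run the interceptors in order; stop as soon as one drops the event.
    event = {"headers": {}, "body": body}
    for intercept in interceptors:
        event = intercept(event)
        if event is None:
            return None
    return event
```

For example, `apply_chain("ERROR disk full", [timestamp_interceptor, static_interceptor("datacenter", "dc1"), regex_filter("ERROR")])` returns an Event with two header entries, while a body that does not contain `ERROR` is filtered out.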
Channel Selector
By default the Source sends a copy of every Event to each Channel. Alternatively, Events can be routed to different Channels according to the value of a key in the Event header.
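The two routing behaviours can be contrasted with a small sketch. Channels are modelled here as named lists and Events as dictionaries; `replicating_selector`, `multiplexing_selector`, `dispatch`, and the `type` header used in the usage note are hypothetical names for illustration, not Flume's own API.

```python
def replicating_selector(event, channels):
    """Default behaviour: every channel receives a copy of the event."""
    return list(channels)


def multiplexing_selector(header, mapping, default):
    """Route by the value of one header key; unmapped values go to `default`."""
    def select(event, channels):
        return [mapping.get(event["headers"].get(header), default)]
    return select


def dispatch(events, channels, selector):
    """Append each event body to the channels chosen by the selector."""
    for event in events:
        for name in selector(event, channels):
            channels[name].append(event["body"])
    return channels
```

With `multiplexing_selector("type", {"order": "c1", "click": "c2"}, default="c2")`, an Event whose header has `type=order` lands only in `c1`, whereas the replicating selector would place it in both channels.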
Sink Processor
Functions
Load balancing
Multiple Sinks work together as a sink group; if one of them fails, collection continues normally
Failover
Of multiple Sinks, only one works while the others stand by; a standby Sink takes over only after the working Sink fails, which keeps collection running
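The difference between the two modes is easiest to see side by side. The sketch below is a simplified model under stated assumptions: Sinks are plain functions, a failing Sink raises a hypothetical `SinkDown` exception, load balancing is plain round-robin, and failover priority is simply list order. Flume's real processors are more elaborate.

```python
from itertools import cycle


class SinkDown(Exception):
    """Raised by a sink that cannot deliver an event."""


def make_sink(name, delivered, healthy=True):
    """Build a sink that records (name, event) pairs, or fails when unhealthy."""
    def sink(event):
        if not healthy:
            raise SinkDown(name)
        delivered.append((name, event))
    return sink


def load_balance(events, sinks):
    """Spread events round-robin; if a sink fails, try the next one."""
    order = cycle(range(len(sinks)))
    for event in events:
        for _ in range(len(sinks)):
            try:
                sinks[next(order)](event)
                break
            except SinkDown:
                continue
        else:
            raise SinkDown("all sinks failed")


def failover(events, sinks):
    """Always use the first healthy sink; later sinks are standbys."""
    for event in events:
        for sink in sinks:
            try:
                sink(event)
                break
            except SinkDown:
                continue
        else:
            raise SinkDown("all sinks failed")
```

With two healthy Sinks, `load_balance` alternates between them while `failover` sends everything to the first; in both modes, marking the first Sink unhealthy shifts all traffic to the second.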