Data Lake: Flume, a massive log collection engine
2022-07-28 03:00:00 【YoungerChina】
Series: Data Lake articles
1. Overview
Flume is a distributed, highly available, highly reliable system for collecting, aggregating, and transporting massive volumes of log data. It supports customizable data senders inside a logging system for collecting data, and it can apply simple processing to that data before writing it to a variety of data receivers.
Flume is designed around data flows: it efficiently collects, aggregates, and moves massive log data from different sources and finally stores it in a centralized data store. Flume can push data in near real time and copes with data that arrives continuously and in very large volumes. For example, it can collect the huge volume of logs produced by the web servers of a social networking site and store them in HDFS or in the HBase distributed database.
Flume official website: http://flume.apache.org/
Flume official documentation: http://flume.apache.org/FlumeUserGuide.html
2. Basic architecture

First, a Flume agent is deployed on each data source. The agent's job is to fetch the data.
An agent consists of three components: source, channel, and sink. The basic unit of data transfer in Flume is the event.
(1) source
Collects data from the data source and passes it to the channel. A source supports several collection methods, for example listening on a port, reading from a file, reading from a directory, or receiving from an HTTP service.
(2) channel
Sits between the source and the sink and acts as temporary storage for data. The rate at which data arrives from the source and the rate at which the sink writes it out usually differ, so a place is needed to hold data that has not yet been handed to the sink. The channel therefore works like a buffer or a queue.
(3) sink
Takes data from the channel and writes it to the destination. Several kinds of destination are supported, such as local files, HDFS, Kafka, or the source of the next Flume agent.
(4) event
The basic unit of transfer in Flume. An event has two parts, headers and a body: the headers carry optional metadata, and the body carries the data itself.
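The relationship between the four pieces above can be sketched as a toy in-process model. This is not Flume's API (Flume itself is written in Java and wired together through configuration). The names `Event`, `MemoryChannel`, `run_source`, and `run_sink`, the `host` header, and the sample log lines are all hypothetical and exist only for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Event:
    """Toy stand-in for a Flume event: a byte payload plus optional headers."""
    body: bytes
    headers: dict = field(default_factory=dict)


class MemoryChannel:
    """Bounded in-memory buffer sitting between a source and a sink."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._queue = deque()

    def put(self, event: Event) -> bool:
        # Refuse the event when the buffer is full so the caller can react.
        if len(self._queue) >= self.capacity:
            return False
        self._queue.append(event)
        return True

    def take(self):
        # Return the oldest buffered event, or None when the channel is empty.
        return self._queue.popleft() if self._queue else None


def run_source(lines, channel, host):
    """Wrap each input line in an Event and push it into the channel."""
    accepted = 0
    for line in lines:
        event = Event(body=line.encode("utf-8"), headers={"host": host})
        if channel.put(event):
            accepted += 1
    return accepted


def run_sink(channel, destination):
    """Drain the channel and append each event body to the destination list."""
    delivered = 0
    while (event := channel.take()) is not None:
        destination.append(event.body.decode("utf-8"))
        delivered += 1
    return delivered


channel = MemoryChannel(capacity=100)
stored = []
run_source(["GET /index.html 200", "GET /missing 404"], channel, host="web-01")
run_sink(channel, stored)
print(stored)  # ['GET /index.html 200', 'GET /missing 404']
```

The bounded `capacity` is the point of the sketch: the source and the sink run at different speeds, and the channel absorbs the difference until it fills up.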
3. Flume features
1) Reliability
When a node fails, logs can be delivered to other nodes without loss. Flume offers three levels of reliability guarantee, from strongest to weakest:
(1) End-to-end: the agent that receives the data first writes the event to disk and deletes it only after the transfer succeeds; if delivery fails, the event can be resent.
(2) Store on failure: the strategy Scribe also uses. When the data receiver crashes, the data is written locally and sending resumes once the receiver recovers.
(3) Best effort: after the data is sent to the receiver, no confirmation is performed.
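The end-to-end level described above (write to disk first, delete only after a confirmed send, resend on failure) can be illustrated with a minimal sketch. It is an assumption-laden toy, not Flume code: the class `DiskBackedSender`, the JSON spool-file layout, and the `flaky_send` receiver are all hypothetical.

```python
import json
import os
import tempfile


class DiskBackedSender:
    """Persist each event before sending; delete it only after a confirmed send."""

    def __init__(self, spool_dir, send):
        self.spool_dir = spool_dir
        self.send = send  # callable(str) -> bool; True means the receiver confirmed
        self._next_id = 0

    def submit(self, body):
        # Step 1: write the event to disk before any delivery attempt.
        path = os.path.join(self.spool_dir, f"{self._next_id:08d}.json")
        self._next_id += 1
        with open(path, "w", encoding="utf-8") as fh:
            json.dump({"body": body}, fh)
        return path

    def flush(self):
        """Try to deliver every spooled event; return how many were confirmed."""
        delivered = 0
        for name in sorted(os.listdir(self.spool_dir)):
            path = os.path.join(self.spool_dir, name)
            with open(path, encoding="utf-8") as fh:
                body = json.load(fh)["body"]
            if self.send(body):
                os.remove(path)  # Step 2: delete only after success.
                delivered += 1
            # On failure the file stays on disk and is retried by the next flush.
        return delivered


received = []
attempts = {"count": 0}


def flaky_send(body):
    # Hypothetical receiver that rejects the very first delivery attempt.
    attempts["count"] += 1
    if attempts["count"] == 1:
        return False
    received.append(body)
    return True


with tempfile.TemporaryDirectory() as spool:
    sender = DiskBackedSender(spool, flaky_send)
    sender.submit("event-1")
    sender.submit("event-2")
    first = sender.flush()   # event-1 is rejected, event-2 is confirmed
    second = sender.flush()  # event-1 is retried and confirmed

print(first, second, received)  # 1 1 ['event-2', 'event-1']
```

Best effort would correspond to calling `send` once and ignoring its result, and store on failure to spooling only when `send` fails. The sketch also shows that retrying can change delivery order.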
2) Scalability
Flume uses a three-tier architecture of agent, collector, and storage, and each tier can scale horizontally. All agents and collectors are managed centrally by a master, which makes the system easy to monitor and maintain. Multiple masters are allowed (managed and load-balanced through ZooKeeper), which avoids a single point of failure. Note that the master and collector tiers describe the older Flume OG (0.9.x) design; Flume NG (1.x) dropped the central master in favor of independently configured agents.
3) Manageability
(1) All agents and collectors are managed centrally by the master, which makes the system easy to maintain.
(2) When there are multiple masters, Flume uses ZooKeeper and gossip to keep dynamic configuration data consistent.
(3) Users can check the execution status of each data source or data flow on the master, and can configure and load data sources dynamically.
(4) Flume provides two ways to manage data flows: a web interface and shell script commands.
4) Functional extensibility
(1) Users can add their own agents, collectors, or storage as needed.
(2) Flume also ships with many built-in components, including a variety of agents (file, syslog, etc.), collectors, and storage (file, HDFS, etc.).
5) Rich documentation and an active community
Flume is a top-level Apache project and has become a standard part of the Hadoop ecosystem. Its documentation is fairly rich and its community is active, which makes it convenient to learn.
4. Other questions
Can Flume lose the data it collects?
By design, Flume rarely loses data, because it has a complete internal transaction mechanism: the hand-off from source to channel is transactional, and so is the hand-off from channel to sink, so no data is lost in those two steps. Loss is possible in two situations: the channel is a memory channel and the agent goes down, or the channel's storage is full so the source can no longer write and the unwritten data is lost. Flume can, however, produce duplicates: if the sink has sent the data successfully but receives no response, it sends the data again.
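The duplication case can be made concrete with a small simulation of a transactional channel-to-sink hand-off. This is only a sketch under stated assumptions: `TransactionalChannel`, `deliver`, and `write_with_lost_ack` are hypothetical names, and a `TimeoutError` stands in for "sent, but no response received".

```python
from collections import deque


class TransactionalChannel:
    """Toy channel in which a taken batch is removed only on commit."""

    def __init__(self, events):
        self._queue = deque(events)
        self._in_flight = []

    def take_batch(self, size):
        # Move up to `size` events into the in-flight area of the transaction.
        while self._queue and len(self._in_flight) < size:
            self._in_flight.append(self._queue.popleft())
        return list(self._in_flight)

    def commit(self):
        # Delivery confirmed: the in-flight events are removed for good.
        self._in_flight.clear()

    def rollback(self):
        # Delivery not confirmed: put the events back at the head, in order.
        self._queue.extendleft(reversed(self._in_flight))
        self._in_flight.clear()


def deliver(channel, write, batch_size=2):
    """Drain the channel, rolling back and retrying whenever `write` times out."""
    while True:
        batch = channel.take_batch(batch_size)
        if not batch:
            return
        try:
            write(batch)
            channel.commit()
        except TimeoutError:
            channel.rollback()


destination = []
calls = {"count": 0}


def write_with_lost_ack(batch):
    # Hypothetical destination: the first write lands, but its response is lost.
    destination.extend(batch)
    calls["count"] += 1
    if calls["count"] == 1:
        raise TimeoutError("no acknowledgement received")


deliver(TransactionalChannel(["e1", "e2", "e3"]), write_with_lost_ack)
print(destination)  # ['e1', 'e2', 'e1', 'e2', 'e3']
```

No event is lost, but `e1` and `e2` arrive twice. This is at-least-once delivery, so downstream consumers that cannot tolerate duplicates need their own deduplication.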