Kafka log collection
2022-07-05 13:40:00 【[email protected]】
1 Kafka architecture in depth
1.1 Kafka Workflow and file storage mechanism
Messages in Kafka are classified by topic: producers produce messages and consumers consume messages, and both are oriented toward topics.
A topic is a logical concept, while a partition is a physical one. Each partition corresponds to a log file, and that log file stores the data produced by producers. Newly produced data is continuously appended to the end of the log file, and every record has its own offset. Each consumer in a consumer group keeps track of the offset it has consumed, so that after a failure it can resume consumption from where it left off.
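To make the offset tracking concrete, here is a minimal sketch with the Kafka Java client (it assumes the kafka-clients library is on the classpath; the broker address, group id and topic name are placeholders rather than values from this article). The consumer commits the offset of the records it has processed, and that committed offset is what lets a member of the group resume from where it stopped after a failure.

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class OffsetTrackingConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // placeholder broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");                // placeholder consumer group
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");           // commit offsets explicitly
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("demo-topic"));        // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
                // The committed offset is what the group falls back to after a crash or restart.
                consumer.commitSync();
            }
        }
    }
}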
Because producers keep appending messages to the end of the log file, Kafka uses a segmenting and indexing mechanism to prevent a single log file from growing so large that locating data becomes inefficient. Each partition is divided into several segments, and each segment corresponds to two files: an ".index" file and a ".log" file. These files live in a folder whose name follows the rule topic name + partition number. For example, if the topic test has three partitions, the corresponding folders are test-0, test-1 and test-2.
The index and log files are named after the offset of the first message in the current segment; for example, a segment whose first message has offset 170410 is stored as 00000000000000170410.index and 00000000000000170410.log.
The ".index" file stores index entries and the ".log" file stores the actual data; each entry in the index file points to the physical offset of the corresponding message in the data file.
Data reliability assurance
To guarantee that the data a producer sends reliably reaches the specified topic, every partition of the topic must send an ack (acknowledgement) back to the producer after receiving the data. If the producer receives the ack, it sends the next round of messages; otherwise it resends the data.
Data consistency issues
LEO (Log End Offset): the largest offset in each replica.
HW (High Watermark): the largest offset that consumers can see, i.e., the smallest LEO among all replicas. For example, if a partition's replicas have LEOs of 10, 8 and 9, the HW is 8, and consumers can only read messages below that offset.
(1) Follower failure
After a follower fails, it is temporarily removed from the ISR (the set of followers that the leader keeps in sync with itself). When the follower recovers, it reads the last HW recorded on its local disk, truncates the part of its log file above the HW, and starts synchronizing from the leader at the HW. Once the follower's LEO is greater than or equal to the partition's HW, that is, once the follower has caught up with the leader, it can rejoin the ISR.
(2) Leader failure
After the leader fails, a new leader is elected from the ISR. To ensure data consistency among the replicas, the remaining followers then truncate the part of their log files above the HW and synchronize data from the new leader.
Note: this only ensures consistency between replicas; it does not guarantee that data is neither lost nor duplicated.
ack response mechanism
For less important data, where reliability requirements are not very high and a small amount of loss can be tolerated, there is no need to wait until every follower in the ISR has received the data successfully. Kafka therefore offers three reliability levels, and users choose the trade-off that matches their requirements for reliability and latency.
When the producer sends data to the leader, the reliability level can be set with the request.required.acks parameter:
0: the producer does not wait for an acknowledgement from the broker and keeps sending the next batch of messages. This gives the highest transmission efficiency but the lowest reliability; data may be lost when a broker fails.
1 (the default): the producer sends the next message once the leader in the ISR has successfully received and acknowledged the data. If the leader fails before the followers finish synchronizing, data will be lost.
-1 (or all): the producer waits until every follower in the ISR has confirmed receipt of the data before the send is considered complete, which gives the highest reliability. However, if the leader fails after the followers finish synchronizing but before the broker sends the ack, the data will be duplicated.
Performance decreases in that order, while data reliability increases.
Note: before Kafka 0.11 nothing could be done about this; Kafka could only guarantee that data was not lost, and downstream consumers had to deduplicate the data themselves. Kafka 0.11 and later introduce a major feature, idempotence: no matter how many duplicate copies of a record the producer sends to the server, the server persists only one.
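The three ack levels and idempotence map directly onto producer configuration. Below is a minimal sketch with the Kafka Java client (the kafka-clients library is assumed; the broker address and topic name are placeholders). In the Java client the reliability level is set through the acks property, which plays the role of the request.required.acks parameter described above.

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class ReliableProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // placeholder broker
        // "0" = do not wait for the broker, "1" = leader only, "all" (-1) = the whole ISR.
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        // Idempotence (Kafka >= 0.11): the broker persists each record only once,
        // even if retries cause the producer to send duplicates.
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("demo-topic", "key", "value"),   // placeholder topic
                    (metadata, exception) -> {
                        if (exception != null) {
                            exception.printStackTrace();                        // send failed even after retries
                        } else {
                            System.out.printf("acked: partition=%d offset=%d%n",
                                    metadata.partition(), metadata.offset());
                        }
                    });
            producer.flush();
        }
    }
}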
2 Filebeat+Kafka+ELK
1. Deploy a Zookeeper + Kafka cluster
See: Zookeeper + Kafka concepts and deployment
2. Deploy Filebeat
cd /usr/local/filebeat
vim filebeat.yml
filebeat.prospectors:
- type: log
  enabled: true
  paths:
    - /var/log/messages
    - /var/log/*.log
......
# Add the output-to-Kafka configuration
output.kafka:
  enabled: true
  hosts: ["192.168.16.10:9092","192.168.16.20:9092","192.168.16.30:9092"]    # specify the Kafka cluster brokers
  topic: "filebeat_test"    # specify the Kafka topic
# Start Filebeat
./filebeat -e -c filebeat.yml
3. Deploy ELK, and create a new Logstash configuration file on the node where the Logstash component is located
cd /etc/logstash/conf.d/
vim filebeat.conf
input {
    kafka {
        bootstrap_servers => "192.168.16.10:9092,192.168.16.20:9092,192.168.16.30:9092"
        topics => "filebeat_test"
        group_id => "test123"
        auto_offset_reset => "earliest"
    }
}
output {
    elasticsearch {
        hosts => ["192.168.80.30:9200"]
        index => "filebeat_test-%{+YYYY.MM.dd}"
    }
    stdout {
        codec => rubydebug
    }
}
# Start Logstash
logstash -f filebeat.conf
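Before moving on to Kibana, it can help to confirm that the topic exists and that Logstash's consumer group is actually reading from it. The following is a small sketch using the AdminClient from the Kafka Java client library (kafka-clients is assumed to be on the classpath; the broker addresses and the group id test123 are taken from the configurations above): it lists the topics on the cluster and prints the offsets committed by the test123 group, which should advance while Logstash is consuming.

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

import java.util.Map;
import java.util.Properties;

public class PipelineCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG,
                "192.168.16.10:9092,192.168.16.20:9092,192.168.16.30:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // The topic written by Filebeat ("filebeat_test") should appear here.
            System.out.println("Topics: " + admin.listTopics().names().get());

            // Offsets committed by the Logstash consumer group; growing values mean
            // Logstash is pulling events from Kafka.
            Map<TopicPartition, OffsetAndMetadata> offsets =
                    admin.listConsumerGroupOffsets("test123")
                         .partitionsToOffsetAndMetadata().get();
            offsets.forEach((tp, om) ->
                    System.out.println(tp + " committed offset = " + om.offset()));
        }
    }
}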
4. In a browser, visit http://192.168.16.30:5601 to log in to Kibana. Click the "Create Index Pattern" button and add the index pattern "filebeat_test-*", click "Create", then click "Discover" to view the charts and log information.
Copyright notice
This article was written by [[email protected]]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/186/202207051335384198.html