Distributed File Systems and Enterprise Applications: ELK Enterprise Log Analysis System

2022-06-11 13:57:00 【Fengjiu FJ】

ELK Components

A brief introduction to ELK

The ELK platform is a complete, centralized log-processing solution. It combines three open source tools, Elasticsearch, Logstash, and Kibana, to meet more demanding requirements for querying, sorting, and aggregating logs.

1. Elasticsearch (abbreviated ES)

1) A distributed storage and retrieval engine built on Lucene (a full-text search engine library), used to store all kinds of logs
2) Elasticsearch is written in Java and exposes a RESTful web interface through which users communicate with it
3) Elasticsearch is a real-time, distributed, scalable search and analytics engine; its strength is storing, searching, and analyzing large volumes of data in near real time
4) Elasticsearch nodes fall into three types: master nodes, data nodes, and client nodes

1. Master node:
elasticsearch.yml:
node.master: true
node.data: false
Main function: maintains metadata and manages the state of the cluster nodes; it does not handle data writes or queries
Configuration notes: memory can be relatively small, but the machine must be stable, preferably a dedicated machine
2. Data node:
elasticsearch.yml:
node.master: false
node.data: true
Main function: handles data writes and queries, so it is under heavy load
Configuration notes: large memory, preferably a dedicated machine
3. Client node:
elasticsearch.yml:
node.master: false
node.data: false
Main function: accepts requests, forwards them to the data nodes, and merges the results; it holds no data and cannot be elected master
4. Mixed node:
elasticsearch.yml:
node.master: true
node.data: true
Main function: combines the functions of the three node types above
Configuration notes: large memory, preferably a dedicated machine
Note: this configuration is not recommended, because such nodes go down easily
5. Typically 3 servers are configured as master nodes, and the ratio of data nodes to client nodes is roughly 3:1; adjust according to the actual situation

     
2. Kibana

Kibana is usually deployed together with Elasticsearch. It is a powerful data visualization dashboard for Elasticsearch, providing a graphical web interface for browsing Elasticsearch log data that can be used to summarize, analyze, and search important data.

3. Logstash

A data collection engine. It supports dynamically collecting data from a variety of sources, then filtering, parsing, enriching, and normalizing the format of the data before storing it in a user-specified location, usually Elasticsearch.
Logstash is written in JRuby and runs on the Java virtual machine (JVM). It is a powerful data processing tool that handles data transport, format processing, and formatted output. Logstash has a rich plug-in system and is commonly used for log processing.

4. Filebeat

A lightweight open source collector for log file data. Filebeat is usually installed on the clients whose data needs to be collected, with the log directories and formats specified. It collects data quickly and sends it to Logstash for parsing, or directly to Elasticsearch for storage. It has a clear performance advantage over Logstash running on the JVM and is a common replacement for it.

Summary:

Logstash collects the data, filters and parses it, and then sends it to the specified destination, usually ES;
ES is a distributed storage and retrieval engine that stores all kinds of logs and lets users communicate with it through a RESTful web interface;
Kibana provides a graphical web interface for analyzing the logs held by Logstash and ES, and can be used to summarize, analyze, and search important log data.

Benefits of combining Filebeat with Logstash

1) Logstash has a disk-based adaptive buffering system that absorbs incoming throughput, relieving the pressure of continuous writes on Elasticsearch
2) Data can be extracted from other sources (such as databases, S3 object storage, or message queues)
3) Data can be sent to multiple destinations, for example S3, HDFS (the Hadoop Distributed File System), or a file
4) Conditional data-flow logic can be used to build more complex processing pipelines

Cache / message queue (Redis, Kafka, RabbitMQ, etc.):
Smooths traffic peaks and buffers high-concurrency log data. Such a buffer protects data from loss to some extent and also decouples the components of the overall architecture.

Fluentd:

1) A popular open source data collector. Logstash is heavyweight, with relatively low performance and high resource consumption, which led to the emergence of Fluentd. Compared with Logstash, Fluentd is easier to use, consumes fewer resources, performs better, and processes data more efficiently and reliably. It is popular with enterprises as an alternative to Logstash and is often used in the EFK architecture. EFK is also commonly used in Kubernetes clusters as the log collection scheme.

2) In a Kubernetes cluster, Fluentd is generally run through a DaemonSet, so that one Pod runs on every Kubernetes worker node. It reads the container log files, filters and transforms the log data, and passes the data to the Elasticsearch cluster, where it is indexed and stored.

Why use ELK

Logs mainly include system logs, application logs, and security logs. Operations staff and developers use logs to learn about a server's software and hardware, and to find configuration errors and their causes. Analyzing logs regularly shows the server's load, performance, and security, so that problems can be corrected in time.
On a single machine, tools such as grep and awk are usually enough for simple log analysis. But when logs are scattered across different devices, say you manage hundreds of servers, logging in to each machine in turn to read its logs is cumbersome and inefficient.
That calls for centralized log management, for example using the open source syslog to gather the logs from all servers in one place. Once logs are centralized, however, statistics and retrieval become the hard part. Linux commands such as grep, awk, and wc can handle basic retrieval and statistics, but they fall short for more demanding query, sorting, and statistics requirements and for large numbers of machines.
A large system is generally deployed as a distributed architecture, with different service modules on different servers. When a problem occurs, you usually have to use the key information it exposes to locate the specific server and service module. A centralized log system makes locating problems more efficient.

Basic characteristics of a complete log system

Collection: can collect log data from multiple sources
Transmission: can reliably parse, filter, and transmit log data to the storage system
Storage: stores the log data
Analysis: supports analysis through a UI
Alerting: can provide error reporting and a monitoring mechanism

How ELK works

Deploy Logstash on every server whose logs need to be collected; alternatively, centralize the logs on a log server first and deploy Logstash there
Logstash collects the logs, formats them, and outputs them to the Elasticsearch cluster
Elasticsearch indexes and stores the formatted data
Kibana queries the data from the ES cluster, generates charts, and displays them in the front end

Summary:

Logstash acts as the log collector: it collects data from the data sources, filters and formats it, and hands it to Elasticsearch for storage; Kibana then visualizes the logs.

Deploying the ELK log analysis system

Host	IP address	Installed packages
node1 (2 cores, 4 GB)	192.168.163.11	Elasticsearch, Kibana, elasticsearch-head (makes the ES cluster easier to manage)
node2 (2 cores, 4 GB)	192.168.163.12	Elasticsearch
apache	192.168.163.13	httpd, Logstash

1. Turn off the firewall and the system security mechanism, and change the hostname

2. Configure the Elasticsearch environment

node1（192.168.163.11）
node2（192.168.163.12）

echo '192.168.163.11 node1' >> /etc/hosts
echo '192.168.163.12 node2' >> /etc/hosts

java -version    # If Java is not installed, run: yum -y install java
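
Running the two `echo ... >> /etc/hosts` lines a second time appends duplicate entries. A minimal sketch of an idempotent variant is shown below; the helper name `add_host_entry` is hypothetical and not part of the original steps.

```shell
# Hypothetical helper: append "IP NAME" to a hosts file only if that exact
# line is not already present. The file defaults to /etc/hosts.
add_host_entry() {
  ip="$1"; name="$2"; file="${3:-/etc/hosts}"
  # -x matches whole lines only, -F treats the pattern as a fixed string
  grep -qxF "$ip $name" "$file" 2>/dev/null || echo "$ip $name" >> "$file"
}

# Usage on node1 and node2 (run as root):
#   add_host_entry 192.168.163.11 node1
#   add_host_entry 192.168.163.12 node2
```

Because the check is an exact whole-line match, an existing entry with different spacing or extra aliases would not be detected; that trade-off keeps the sketch short.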

     
3. Deploy the Elasticsearch software

node1（192.168.163.11）
node2（192.168.163.12）

1) Install the Elasticsearch RPM package

2) Load the system service

3) Edit the main Elasticsearch configuration file

cp /etc/elasticsearch/elasticsearch.yml /etc/elasticsearch/elasticsearch.yml.bak

vim /etc/elasticsearch/elasticsearch.yml
# Line 17: uncomment and edit; the cluster name
cluster.name: my-elk-cluster
# Line 23: uncomment and edit; the node name (use node2 on node2)
node.name: node1
# Line 33: uncomment and edit; the data storage path
path.data: /data/elk_data
# Line 37: uncomment and edit; the log storage path
path.logs: /var/log/elasticsearch
# Line 43: uncomment and edit; do not lock memory at startup
bootstrap.memory_lock: false
# Line 55: uncomment and edit; the IP address the service binds to (0.0.0.0 means all addresses)
network.host: 0.0.0.0
# Line 59: uncomment; the listening port is 9200 (the default)
http.port: 9200
# Line 68: uncomment and edit; cluster discovery uses unicast, with node1 and node2 as the nodes to discover
discovery.zen.ping.unicast.hosts: ["node1", "node2"]

     
4) Verify the configuration
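
One common way to check the edits is to print only the active lines of the file, skipping comments and blank lines. The sketch below is an illustration rather than the author's exact command; the helper name `active_settings` is hypothetical.

```shell
# Hypothetical helper: print the lines of a config file that are neither
# comments (optionally indented "#") nor blank.
active_settings() {
  grep -Ev '^[[:space:]]*(#|$)' "$1"
}

# Usage on node1 and node2:
#   active_settings /etc/elasticsearch/elasticsearch.yml
```

After the edits in the previous step, the output should contain only the settings that were uncommented, such as cluster.name and node.name.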

5) Create the data storage path and set its ownership

6) Start Elasticsearch and check that it started successfully

7） View node information

Access from the host machine (192.168.163.1)

8） Verify cluster health status

Access from the host machine (192.168.163.1)
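
Cluster health can also be checked from the command line through Elasticsearch's `_cluster/health` REST endpoint. The sketch below assumes `curl` is installed; `health_status` is a hypothetical helper that pulls the `status` field out of the JSON response read from standard input.

```shell
# Hypothetical helper: read _cluster/health JSON on stdin and print the
# value of its "status" field (green, yellow, or red).
health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# Usage against node1 (not executed here):
#   curl -s 'http://192.168.163.11:9200/_cluster/health?pretty' | health_status
```

A status of green means all primary and replica shards are allocated, yellow means all primaries are allocated but some replicas are not, and red means at least one primary shard is unallocated.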

9） View the cluster status

Access from the host machine (192.168.163.1)

4. Install the elasticsearch-head plug-in

The elasticsearch-head plug-in is used to manage the cluster

1) Compile and install Node.js, a dependency of elasticsearch-head

node1（192.168.163.11）
node2（192.168.163.12）

yum -y install gcc gcc-c++ make

# Upload the package node-v8.2.1.tar.gz to /opt
cd /opt
tar xzvf node-v8.2.1.tar.gz
cd node-v8.2.1/
./configure && make && make install
# The build takes a long time

     
2) Install PhantomJS (a headless browser that elasticsearch-head depends on)

node1（192.168.163.11）
node2（192.168.163.12）

# Upload the package phantomjs-2.1.1-linux-x86_64.tar.bz2 to the /opt directory
cd /opt
tar jxvf phantomjs-2.1.1-linux-x86_64.tar.bz2 -C /usr/local/src/
cd /usr/local/src/phantomjs-2.1.1-linux-x86_64/bin
cp phantomjs /usr/local/bin

     
3) Install elasticsearch-head (a data visualization tool)

node1（192.168.163.11）
node2（192.168.163.12）

# Upload the package elasticsearch-head.tar.gz to /opt
cd /opt
tar zxvf elasticsearch-head.tar.gz -C /usr/local/src/
cd /usr/local/src/elasticsearch-head/
npm install

     
4) Edit the main Elasticsearch configuration file

node1（192.168.163.11）
node2（192.168.163.12）

vim /etc/elasticsearch/elasticsearch.yml
......
#------- Append the following at the end of the file --------
http.cors.enabled: true
http.cors.allow-origin: "*"

#----------- Parameter explanation -----------------------------
http.cors.enabled: true				# Enable cross-origin access support; the default is false
http.cors.allow-origin: "*"			# Allow cross-origin access from all domain names and addresses

systemctl restart elasticsearch.service

     
5) Start elasticsearch-head

node1（192.168.163.11）
node2（192.168.163.12）

# The service must be started from the extracted elasticsearch-head directory, because the process reads the gruntfile.js file there; otherwise it may fail to start.
cd /usr/local/src/elasticsearch-head/
npm run start &

> [email protected] start /usr/local/src/elasticsearch-head
> grunt server

Running "connect:server" (connect) task
Waiting forever...
Started connect web server on http://localhost:9100

# elasticsearch-head listens on port 9100
netstat -natp |grep 9100

     
6) Use the elasticsearch-head plug-in to view the cluster status

Access from the host machine (192.168.163.1)

http://192.168.163.11:9100
In the field next to "Elasticsearch", enter
http://192.168.163.11:9200

http://192.168.163.12:9100
In the field next to "Elasticsearch", enter
http://192.168.163.12:9200

     
7) Create an index

node1（192.168.163.11）
Create an index named index-demo with the type test
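
One way to create such an index is to index a single document through the REST API with `curl`; Elasticsearch creates the index automatically if it does not exist yet. The sketch assumes Elasticsearch 5.x (where mapping types still exist) reachable on node1. The document ID `1` and the JSON body are made-up sample values, and `doc_url` is a hypothetical helper.

```shell
# Hypothetical helper: build the REST URL for one document.
# $1 = base URL, $2 = index, $3 = type, $4 = document ID
doc_url() {
  printf '%s/%s/%s/%s?pretty\n' "$1" "$2" "$3" "$4"
}

# Usage (not executed here; the body is sample data, not from the article):
#   curl -XPUT "$(doc_url http://192.168.163.11:9200 index-demo test 1)" \
#        -H 'Content-Type: application/json' \
#        -d '{"user":"zhangsan","mesg":"hello world"}'
```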

8) View the index information

Back on the host machine (192.168.163.1),
open a browser and enter the address to view the index information

http://192.168.163.11:9100
By default the index is split into 5 shards, each with one replica
Click the data browsing tab and you will see the index-demo index created on node1, its type test, and the related information

     
5. Install Logstash

Collect logs and output them to Elasticsearch

1) Install the Apache service (httpd)

apache（192.168.163.13）

2) Install the Java environment

apache（192.168.163.13）

3) Install Logstash

apache（192.168.163.13）

# Upload logstash-5.5.1.rpm to the /opt directory
cd /opt
rpm -ivh logstash-5.5.1.rpm

systemctl start logstash.service
systemctl enable logstash.service

# Create a symbolic link for logstash
ln -s /usr/share/logstash/bin/logstash /usr/local/bin/

     
4) Test the logstash command

apache（192.168.163.13）

Option descriptions:
-f   Specifies a Logstash configuration file; Logstash is configured according to that file
-e   Followed by a string that is treated as the Logstash configuration (if the string is "", stdin is used as the input and stdout as the output by default)
-t   Tests whether the configuration file is valid, then exits

     
# Define the input and output streams:
# the input is standard input and the output is standard output (similar to a pipe)
logstash -e 'input { stdin{} } output { stdout{} }'

     
5） View index information

Access from the host machine (192.168.163.1)

6) Configure Logstash on the Apache host to ship logs to Elasticsearch

apache（192.168.163.13）

# A Logstash configuration file consists of three parts: input, output, and filter (as needed)
chmod o+r /var/log/messages
ll /var/log/messages

vim /etc/logstash/conf.d/system.conf
input {
  file {
    path => "/var/log/messages"
    type => "system"
    start_position => "beginning"
  }
}
output {
  elasticsearch {
    hosts => ["192.168.163.11:9200"]
    index => "system-%{+YYYY.MM.dd}"
  }
}


systemctl restart logstash.service
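
The `%{+YYYY.MM.dd}` suffix in the `index` setting makes Logstash write to one index per day, named after each event's timestamp in UTC. The sketch below shows the names this produces. It assumes GNU `date`, and `daily_index` is a hypothetical helper used only for illustration.

```shell
# Hypothetical helper: print the daily index name for a prefix.
# $1 = index prefix, $2 = optional date string (defaults to now), in UTC
daily_index() {
  date -u -d "${2:-now}" "+$1-%Y.%m.%d"
}

# Usage:
#   daily_index system              # today's index name for system.conf
#   daily_index system 2021-03-04   # index name for a specific day
```

To list the indices that actually exist, the `_cat/indices` endpoint can be queried, for example with `curl -s 'http://192.168.163.11:9200/_cat/indices?v'`.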

     
7） View index information

Access from the host machine (192.168.163.1)

6. Install Kibana

node1（192.168.163.11）

# Upload kibana-5.5.1-x86_64.rpm to the /opt directory
cd /opt
rpm -ivh kibana-5.5.1-x86_64.rpm

cd /etc/kibana/
cp kibana.yml kibana.yml.bak

vim kibana.yml
# Line 2: uncomment; the port Kibana listens on (default 5601)
server.port: 5601
# Line 7: uncomment and edit; the address Kibana listens on
server.host: "0.0.0.0"
# Line 21: uncomment and edit; the Elasticsearch instance to connect to
elasticsearch.url: "http://192.168.163.11:9200"
# Line 30: uncomment; Kibana stores its data in the .kibana index in Elasticsearch
kibana.index: ".kibana"

systemctl start kibana.service 
systemctl enable kibana.service

     
1) Access from the host machine (192.168.163.1)

192.168.163.11:5601

1. On first login, create an index pattern named system-* (this matches the system log indices)
   Then click the Create button at the bottom
2. Click the Discover button in the top-left corner to see the system-* data
3. Click add next to the host field below; the view on the right then shows only the Time and host columns, which is easier to read

     
2) Ship the Apache host's Apache log files (access log and error log)

apache（192.168.163.13）

cd /etc/logstash/conf.d/

vim apache_log.conf
input {
  file {
    path => "/etc/httpd/logs/access_log"
    type => "access"
    start_position => "beginning"
  }
  file {
    path => "/etc/httpd/logs/error_log"
    type => "error"
    start_position => "beginning"
  }
}
output {
  if [type] == "access" {
    elasticsearch {
      hosts => ["192.168.163.11:9200"]
      index => "apache_access-%{+YYYY.MM.dd}"
    }
  }
  if [type] == "error" {
    elasticsearch {
      hosts => ["192.168.163.11:9200"]
      index => "apache_error-%{+YYYY.MM.dd}"
    }
  }
}

/usr/share/logstash/bin/logstash -f apache_log.conf

     
3) Access from the host machine (192.168.163.1)

1. Open http://192.168.163.13 in a browser to generate some access records
2. Open http://192.168.163.11:9100/ in a browser to view the index information
   You should see apache_error-2021.03.04 and apache_access-2021.03.04
3. Open http://192.168.163.11:5601 in a browser
   Click Management in the bottom-left corner, then Index Patterns, then Create Index Pattern
   Create separate index patterns for apache_error-* and apache_access-*

     
Summary

1) The three ELK components and their workflow

Components: Elasticsearch (abbreviated ES), Logstash, and Kibana
Workflow:
① Logstash collects the data, filters and parses it, and then sends it to the specified destination, usually ES;
② ES is a distributed storage and retrieval engine that stores all kinds of logs and lets users communicate with it through a RESTful web interface;
③ Kibana provides a graphical web interface for analyzing the logs held by Logstash and ES, and can be used to summarize, analyze, and search important log data.

2) What commonly replaces Logstash in production, and why?

① Filebeat is generally used in place of Logstash.
② Logstash runs on the JVM and consumes a lot of resources, with high CPU and memory usage at runtime. In addition, it has no message-queue cache, so there is a risk of data loss. Filebeat is a lightweight open source log file collector that collects data quickly and sends it to Logstash for parsing, and it has a clear performance advantage over Logstash running on the JVM.

3) What are the steps to configure an ELK cluster?

1) At least three hosts are generally required
2) Set the hostname of each host and its IP mapping, and edit the main ES configuration file
3) Edit the discovery.zen.ping setting so that cluster discovery uses unicast, and specify the nodes to discover.

4) ELK processing flow

[AppServer cluster] --> [Logstash Agent collector] --> [Elasticsearch Cluster] --> [Kibana Server] --> [Browser]

① The back-end server cluster generates logs
② Logstash collects, filters, and outputs them
③ The processed logs are handed to the ES cluster for storage
④ ES connects to the Kibana front end
⑤ Kibana visualizes the logs and presents them to each client

Original site

Copyright notice
This article was written by [Fengjiu FJ]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/03/202203012045008113.html