
Log traffic monitoring and early warning, a small project | Environment and fundamentals (1)

2022-06-11 00:53:00 Smelly Nian

Environment and fundamentals:

Web cluster: the web service is implemented by many machines with different functions.

Every visit is recorded in the access log.

CDN: Content Delivery Network. It caches content: if the requested content is in the cache it is returned directly; if not, the CDN fetches it from the origin, caches it, and the next visit is fast.

The CDN's logs can be synchronized back to the central log store on a regular schedule.

Whether in code or in architecture, there is no problem that cannot be solved by adding another middle layer (middleware)!!!

Usually you buy a CDN service (CDN is billed by traffic).

95th-percentile bandwidth: the bandwidth samples for the billing period are sorted from smallest to largest, and the value at the 95% position is the one billed; in other words, the top 5% of peaks are discarded.

If a CDN node fails, traffic can go to other nodes or fall back to the origin site.

If the CDN misbehaves, monitoring the traffic in the access log makes it obvious (the volume is abnormally large or small).

Monitoring: the data comes from MySQL; the Flask framework connects the data to web pages, as in the sketch below.
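As a rough sketch (not the project's actual code), a Flask view could pull traffic statistics out of MySQL like this; the database, table, and column names (logmon, nginx_traffic, minute, bytes) are hypothetical placeholders:

# Minimal sketch: a Flask view serving traffic stats from MySQL.
# Table and column names here are hypothetical placeholders.
from flask import Flask, jsonify
import pymysql

app = Flask(__name__)

@app.route("/traffic")
def traffic():
    conn = pymysql.connect(host="127.0.0.1", user="monitor",
                           password="secret", database="logmon",
                           cursorclass=pymysql.cursors.DictCursor)
    try:
        with conn.cursor() as cur:
            # last 60 minutes of aggregated traffic
            cur.execute("SELECT minute, bytes FROM nginx_traffic "
                        "ORDER BY minute DESC LIMIT 60")
            rows = cur.fetchall()
    finally:
        conn.close()
    return jsonify(rows)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)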

The whole project:

Kafka architecture diagram:

filebeat monitors changes to the nginx access.log,

then ships them to the Kafka cluster,

and we write a Python program that consumes the logs from Kafka, cleans and aggregates them, and writes the results to MySQL.

Kafka is generally classified as message middleware. (Middleware: software that has little to do with your main business and does not create value directly; it just plays a supporting role in the middle.)

Other message middleware: Redis, RabbitMQ, NSQ.

# The producer-consumer model (a general pattern in computing)

A warehouse (the middleware) is added between the producer and the consumer, as sketched below.
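A minimal sketch of the pattern in Python, using a thread-safe queue as the "warehouse" between the two sides:

import queue
import threading

buf = queue.Queue(maxsize=10)          # the warehouse (middleware)

def producer():
    for i in range(5):
        buf.put(f"message-{i}")        # blocks when the buffer is full

def consumer():
    for _ in range(5):
        msg = buf.get()                # blocks until a message arrives
        print("consumed", msg)
        buf.task_done()

threading.Thread(target=producer).start()
threading.Thread(target=consumer).start()
buf.join()                             # wait until every message is processed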

Uses of message middleware (including Kafka):

1. Decoupling of the business

-- a failure in a peripheral service does not affect the core business

-- it becomes easy to extend the system with new components

2. Traffic peak shaving (e.g. the Double Eleven shopping festival, or grabbing red envelopes)

3. Log collection

Flask is the framework.

In the diagram below, the Flask app is the producer and the e-mail sender becomes a consumer (decoupling of the business).

 

Why does our project introduce Kafka?

1. Decoupling: prevent errors in the log processor from affecting the core business (e.g. making the web service inaccessible).

2. Centralized storage of the logs of every program, which makes them easy to view and analyze (nginx, Tomcat, MySQL, etc.; no need to search machine by machine).

3. Everybody is using it, so it is worth learning.

Message delivery models:

Point-to-point: one-to-one; producers and consumers correspond one to one, and once the consumer consumes a message it is gone from the middleware.

Publish-subscribe: a message can be consumed by multiple consumers independently of each other.

The position of each consumer is recorded. (Kafka generally uses the publish-subscribe delivery model.)

Kafka terminology:

broker: a Kafka cluster consists of one or more servers; each server node is called a broker (this project uses only three brokers).

topic: the category a message belongs to.

partition: a partition, effectively a container for messages; partitioning increases throughput and efficiency.

(Usually you configure as many partitions as you have brokers.)

Partitions support concurrent reads and writes.

With multiple partitions, the overall order of messages across partitions is generally not guaranteed, but within a single partition messages stay in order.

Each partition has one leader.

producer: the producer writes data. (When the producer writes, there is an acks flag of 0, 1, or -1 that determines what response it gets; see below.)

consumer: the consumer reads the data.

(Consumer group: multiple consumers form one large consumption group to increase throughput; the number of consumers generally matches the number of partitions.)

replicas: the copies of a partition.

(This is the most basic mechanism behind the high availability of Kafka messages.)

(Replication is per partition.)

(Specifying 2 means there is one copy in addition to the partition itself.)

(A partition is therefore itself one of the replicas. The replicas elect a leader through a replica election algorithm; producers and consumers write data to and read it from the leader, while the other replicas, called followers, are just backups. If the leader has a problem, a new leader is elected.)

(The leader and the followers are kept consistent.)

A message can only be stored after its topic has been created.

Generally speaking, partitions are distributed evenly across the brokers.

Leader election reference: "Kafka knowledge system - Kafka leader election", by aidodoo, on Blog Garden (cnblogs).

DNS: the Domain Name System (DNS) is an Internet service. It is a distributed database that maps domain names and IP addresses to each other, making the Internet easier for people to use.

Why choose this project?

  1. The lab teacher recommended it
  2. Senior classmates recommended it

Note that this flag is not actually called ack here; it is just a configuration option inside the producer:

0: the producer just sends; it does not care whether the server got the message.

1: the producer receives a response from the server once the leader alone has successfully written the message.

       If the message cannot reach the leader node (for example the leader crashed and a new leader has not yet been elected), the producer receives an error response and, to avoid losing data, resends the message.

       But if a node that never received the message becomes the new leader, the message is still lost.

-1: the producer must get confirmation that all replicas (including the followers) have written successfully before it continues sending.
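As a sketch of how the producer side chooses this setting with the pykafka module used later in this article (the broker address and topic name are placeholders for this environment):

# Sketch: choosing the acks level with pykafka.
from pykafka import KafkaClient

client = KafkaClient(hosts="192.168.0.190:9092")
topic = client.topics[b"sc"]

# required_acks=-1 waits for all in-sync replicas;
# 1 waits only for the leader; 0 does not wait at all.
with topic.get_sync_producer(required_acks=-1) as producer:
    producer.produce(b"hello kafka")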

ISR (in-sync replicas): effectively a list, the set of followers that need to stay in sync.

       For example, with 5 replicas there is 1 leader and 4 followers (all the followers are placed in the ISR).

       When a message arrives, how does the leader know which replicas to synchronize to?

It follows the ISR: if a follower goes down, it is removed from this list.

       If a follower is stuck or too slow, it is also removed from the ISR; the list can even shrink to zero followers, and followers are not recreated.

Kafka does not guarantee that data is absolutely consistent.

How does Kafka achieve high availability (keep running after something goes down)?

Multiple brokers, multiple partitions, multiple replicas.

How does Kafka approach data consistency?

Mostly through the choice of request.required.acks.

The consumer's offset: an offset is recorded for each consumption.

[root@kafka-1 ~]# cat install_kafka.sh
#!/bin/bash

# Set each server's hostname in advance, e.g. kafka-1, kafka-2, kafka-3.
# Before running this on machine 2, change the name below to kafka-2;
# on machine 3, change it to kafka-3.
hostnamectl set-hostname kafka-1

# Add the IP-to-name mappings of the 3 kafka servers to /etc/hosts.
cat >> /etc/hosts <<EOF
192.168.0.190 kafka-1
192.168.0.191 kafka-2
192.168.0.192 kafka-3
EOF

# Install software, resolve dependencies.
yum install wget lsof vim -y

# Install the time synchronization service.
yum install chrony -y
systemctl enable chronyd
systemctl start chronyd

# Set the time zone.
cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime

# Turn off the firewall.
systemctl stop firewalld
systemctl disable firewalld

# Disable SELinux.
setenforce 0
sed -i '/^SELINUX=/ s/enforcing/disabled/' /etc/selinux/config

During installation, when you see glibc being installed, do not quit halfway; otherwise the system can go down.

For now, nginx just provides the web service, serving a static page.

The client (Qin Yu) connects to the domain www.sc.com; DNS resolves the domain to a random IP, and whichever address is returned is the one the client connects to.

(If one of the servers goes down just as a client is directed to it, that client may not be able to reach the page, so the service is not highly available yet.)

Load balancer: used to schedule requests across servers.

Scheduling algorithms: round robin (abbreviated rr): all machines are treated equally.

  Weighted round robin: round robin with an added weight value, i.e. the machines are no longer treated equally.

       IP hash: hash on the client IP address.

       Least connections: the machine with the least work in hand gets more of the new requests.

All connections must pass through the middle machine, which is essentially a NAT setup (full NAT).

The middle layer we use this time is nginx.

But the machines behind it (the backend, the real servers) do not get the user's IP address.

So we have to add something to preserve the user's IP address for the logs (an optional header field is added to the HTTP request message to retain the original source IP; the middle machine performs this step).

scp: copies files to a remote machine.

Usage: scp file IP:path (add -r for a directory). This pushes the file to that path on the machine at that IP.
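For example (the directory and IP here are placeholders):

scp -r /opt/kafka/config 192.168.0.191:/opt/kafka/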

The nginx in the middle has two functions:

  1. Load balancing

Modify the configuration file: add a load-balancing group (upstream) inside the http block.

Round robin is the default.

For weighted round robin, append a weight to each server line (the default is 1).

Then comment out the server's own page and add the forwarding function instead, as sketched below.

(With this setup, the source address that the backend machines see is always the middle machine's IP.)
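A minimal sketch of what that upstream plus forwarding configuration could look like (the backend IPs and the upstream name are placeholders):

# inside the http block of nginx.conf
upstream backend {
    server 192.168.0.190 weight=1;    # weight defaults to 1
    server 192.168.0.191 weight=2;    # weighted round robin
}

server {
    listen 80;
    location / {
        proxy_pass http://backend;    # forward instead of serving a local page
    }
}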

You can also put the load balancer on the same machine as Kafka; just configure a virtual host:

add one more server block and one more upstream,

differing only in the port settings.

  2. Let the backend real servers know the frontend user's IP.

nginx takes the remote_address of the request message and puts it into a newly added header field; the name can be customized, but underscores cannot be used (e.g. x-real-ip). A sketch follows.
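A sketch of the location block with the header added ($remote_addr is the nginx variable holding the client address):

location / {
    proxy_pass http://backend;
    # pass the original client address on to the backend
    proxy_set_header X-Real-IP $remote_addr;
}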

The current architecture is still not highly available:

if the reverse-proxy machine goes down, the service still collapses. So later we will use keepalived, a high-availability tool, to achieve high availability.

The nginx architecture so far:

In reality it is best to deploy Kafka and nginx on separate machines.

Directory layout:

bin: executables

conf: configuration files (the .cfg file is the main configuration file)

docs: documentation

lib: libraries

logs: log files

ZooKeeper and Kafka are two completely different services.

Inside the ZooKeeper configuration file:

specify the data directory;

specify the port numbers;

declare the cluster: host number 1, host number 2, host number 3

(the .1/.2/.3 are each host's unique identifier within the cluster and must not be repeated);

the last two ports are: one for data transfer, and one for liveness checks and elections.

Data written to any host is synchronized to the others.
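A minimal zoo.cfg sketch for the three hosts named earlier (the data directory is a placeholder):

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/opt/zookeeper/data
clientPort=2181
# server.<id>=<host>:<data/sync port>:<election port>
server.1=kafka-1:2888:3888
server.2=kafka-2:2888:3888
server.3=kafka-3:2888:3888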

Check the log:

vim zookeeper-root-server-nginx-kafka03.out

There are also elections in the ZooKeeper cluster:

Consistency algorithm: ZAB (ZooKeeper Atomic Broadcast), similar in spirit to Raft.

(Election principle: the minority yields to the majority; only a node with more than half the votes can be elected.)

(Generally the number of machines is odd.)

More than half of the machines must be alive for the cluster to function.

The three nodes negotiate and vote, the minority yields to the majority, and a winner is chosen. (A dead machine still counts toward the total, so only one of the three machines may die; otherwise the more-than-half requirement can no longer be met.)

partition: partitioning is per topic.

tmux synchronized panes:

Ctrl+B then : opens the tmux command prompt for multi-pane synchronization.

Turn synchronization on: setw synchronize-panes on

Turn synchronization off: setw synchronize-panes off

When starting up, start zk first, then Kafka.

When shutting down, stop Kafka first, then zk.

In the Kafka configuration file:

broker.id is the broker's unique identifier.

zk's three ports:

2181: provides the service (client access)

The following two, 2888:3888, are for communication inside the cluster:

2888: data interaction (followers synchronizing with the leader)

3888: elections and liveness detection

Start Kafka as a daemon:

bin/kafka-server-start.sh -daemon config/server.properties

Stop Kafka:

bin/kafka-server-stop.sh

Create a topic (--replication-factor specifies the number of replicas):

bin/kafka-topics.sh --create --zookeeper 192.168.0.95:2181 --replication-factor 1 --partitions 1 --topic sc
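The topic can then be sanity-tested with the console tools that ship with Kafka (same address as above):

bin/kafka-console-producer.sh --broker-list 192.168.0.95:9092 --topic sc
bin/kafka-console-consumer.sh --bootstrap-server 192.168.0.95:9092 --topic sc --from-beginning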

In this architecture the producer is filebeat, and the consumer is the Python program we write on our host.

Kafka data lands on disk (under /tmp/kafka-logs).

Kafka logs can be cleaned along two dimensions:

by time (the default is 7 days)

by size

When either the time condition or the size condition is satisfied, log cleaning can start.

segment: Kafka saves its log in segments; this is what makes cleaning by time and size possible.

Segments are named in order, by the offset at which each one starts.

Suppose there are the following segments:

00.log   11.log   22.log

00.log holds log entries 0 through 10,

11.log holds log entries 11 through 21,

22.log holds the log entries from number 22 onward.

The information Kafka needs when creating producers and consumers is kept in zk.

Connect to zk:

ZooKeeper is an open-source distributed configuration management service.

(It maintains a file tree that holds a great deal of information; many hosts watch this tree.)

(The upstream information is obtained from ZooKeeper.)

To list the files (znodes) inside zk: ls <absolute path>

To look at the data in a file: get <absolute path>
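For example, with the zkCli.sh client in ZooKeeper's bin directory (the znode paths shown are the ones Kafka registers by default):

bin/zkCli.sh -server 127.0.0.1:2181
ls /brokers/ids          # list the ids of the live brokers
get /brokers/ids/1       # show the data stored for broker 1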

Topic information inside zk:

__consumer_offsets: stores the consumers' offsets.

After a consumer consumes, an offset is recorded. The offset can be saved locally by the consumer, or saved on the Kafka server side; if saved on the server side, it is stored in the __consumer_offsets topic (which is given 50 partitions by default).

What zk does for Kafka:

It keeps Kafka's metadata (topics, replicas, and so on).

It elects the controller (one broker of the whole Kafka cluster is elected controller; the controller coordinates the election of partition leaders and followers; the controller election is preemptive, first come first served, via the /controller znode).

So this version of Kafka cannot do without ZooKeeper.

When a producer sends a message it can pick any broker at random; after negotiation, that broker returns information about the partition leader, and the producer then interacts with the leader.

Detach from tmux and re-attach to the previous state:

tmux ls

tmux a -t 0

beats: a family of tools for collecting and shipping data.

The six tools of the beats family (all members of the ELK architecture):

Filebeat: a lightweight shipper for forwarding and centralizing log data. Filebeat monitors the log files or locations you specify, collects log events, and forwards them to a destination such as Elasticsearch, Redis, or Kafka.

A harvester is opened for each monitored file.

The input component tells Filebeat which files to monitor, and starts a harvester for each file.
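A minimal filebeat.yml sketch for this pipeline (the log path, broker addresses, and topic name are placeholders for this environment):

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/nginx/access.log      # the file the harvester will follow
output.kafka:
  hosts: ["192.168.0.190:9092", "192.168.0.191:9092", "192.168.0.192:9092"]
  topic: nginxlog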

Logstash (JVM-based): a tool for collection and filtering.

Filebeat:

rpm -qa lists all the software installed on this machine.

rpm is a Linux package-management command; -q means query, -a means all.

rpm -ql filebeat shows where filebeat was installed and which files belong to it.

This configuration file is in yml format, where not even the smallest mistake is tolerated!!!

It defines a data directory and a log directory.

The paths in the Filebeat configuration file support wildcards.

Kafka's underlying layer communicates via hostnames.

zk's default port is 2181.

Producer-side consistency depends on the acks setting.

Consumer-side consistency depends on the high-water mark, i.e. the bucket effect (consumption can only proceed up to what every in-sync replica has).

Filebeat is the producer.

The Python program is the consumer.

Operating Kafka from Python: the pykafka module. Write the consumer yourself, and finally print out the message information of the logs we receive.
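A minimal consumer sketch with pykafka (the broker address and topic name are placeholders):

from pykafka import KafkaClient

client = KafkaClient(hosts="192.168.0.190:9092")
topic = client.topics[b"nginxlog"]

consumer = topic.get_simple_consumer()
for message in consumer:
    if message is not None:
        # message.offset is the consumer offset,
        # message.value is the raw log line shipped by filebeat
        print(message.offset, message.value.decode("utf-8"))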

(The consumers we have used up to now were just testing tools.)

Original site: https://yzsam.com/2022/03/202203020627273343.html

Copyright notice: this article was written by [Smelly Nian]; please include the original link when reposting. Thanks.