Flannel's host-gw and calico

2022-06-26 11:13:00 Famine - Yu Xi

Switching the cluster to flannel host-gw mode

Configure the cluster to use host-gw:

Modify the ConfigMap

kubectl edit -n kube-system configmaps kube-flannel-cfg

Content to modify:

...
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": { "Type": "vxlan" }
    }
...

Change "Type": "vxlan" to "Type": "host-gw".
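
As a rough illustration of this edit, the following Python sketch rewrites the Backend.Type field of the net-conf.json text. The helper name set_backend_type is made up for this example, and the script only transforms the JSON string; applying the result to the cluster still goes through kubectl edit as described here.

```python
import json

# The net-conf.json payload from the kube-flannel-cfg ConfigMap.
NET_CONF = '{ "Network": "10.244.0.0/16", "Backend": { "Type": "vxlan" } }'


def set_backend_type(net_conf: str, backend_type: str) -> str:
    """Return the net-conf.json text with Backend.Type replaced.

    Hypothetical helper for illustration; it does not contact the cluster.
    """
    conf = json.loads(net_conf)
    conf.setdefault("Backend", {})["Type"] = backend_type
    return json.dumps(conf, indent=2)


new_conf = set_backend_type(NET_CONF, "host-gw")
print(new_conf)
```

The Network value is left untouched; only the backend type changes.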

Restart the service

kubectl rollout restart -n kube-system daemonset kube-flannel-ds

Check whether flannel started successfully in host-gw mode (replace the pod name with one from your own cluster):

kubectl logs -n kube-system kube-flannel-ds-467p2 | grep "host-gw"

Check the routing table on the master node:

route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.88.8.254     0.0.0.0         UG    100    0        0 ens192
10.88.8.0       0.0.0.0         255.255.252.0   U     100    0        0 ens192
10.244.0.0      0.0.0.0         255.255.255.0   U     0      0        0 cni0
10.244.1.0      10.88.10.182    255.255.255.0   UG    0      0        0 ens192
10.244.2.0      10.88.10.183    255.255.255.0   UG    0      0        0 ens192
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0

In host-gw mode, the gateway for each remote pod subnet in the routing table is the IP of the host that owns that subnet. In VXLAN mode, by contrast, the device for every flannel subnet in the routing table is flannel.1.
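
One way to check this from a script is to parse the route -n output and list the pod-subnet routes that have a real gateway. The sketch below is only an illustration: the function names are hypothetical, the sample text is the master node's table from this article, and the pod CIDR 10.244.0.0/16 is the Network value from net-conf.json.

```python
import ipaddress

ROUTE_N = """\
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.88.8.254     0.0.0.0         UG    100    0        0 ens192
10.88.8.0       0.0.0.0         255.255.252.0   U     100    0        0 ens192
10.244.0.0      0.0.0.0         255.255.255.0   U     0      0        0 cni0
10.244.1.0      10.88.10.182    255.255.255.0   UG    0      0        0 ens192
10.244.2.0      10.88.10.183    255.255.255.0   UG    0      0        0 ens192
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
"""

POD_CIDR = ipaddress.ip_network("10.244.0.0/16")


def remote_pod_routes(route_output: str):
    """Return (subnet, gateway, iface) for pod subnets reached via a gateway."""
    routes = []
    for line in route_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) != 8:
            continue
        dest, gateway, genmask, _flags, _metric, _ref, _use, iface = fields
        net = ipaddress.ip_network(f"{dest}/{genmask}")
        if net.subnet_of(POD_CIDR) and gateway != "0.0.0.0":
            routes.append((str(net), gateway, iface))
    return routes


def looks_like_host_gw(routes) -> bool:
    """Heuristic: remote pod subnets exist and none of them uses flannel.1."""
    return bool(routes) and all(iface != "flannel.1" for _n, _g, iface in routes)


routes = remote_pod_routes(ROUTE_N)
print(routes, looks_like_host_gw(routes))
```

The check is a heuristic based on the description in this article, not something flannel itself exposes.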

host-gw

host-gw is a layer-3 cross-host networking solution. Unlike the virtual layer-2 network of VXLAN mode, it forwards purely on IP addresses.

Communication between pods on the same host is skipped here; we go straight to communication between pods on different hosts.

The test uses the same pods as last time: a packet is sent from the pod on the master node (IP 10.244.0.5) to the pod on the worker1 node (IP 10.244.1.17).

kubectl get pod -o wide
NAME                             READY   STATUS    RESTARTS   AGE   IP            NODE      NOMINATED NODE   READINESS GATES
network-tools-66b6674fd9-pjpz6   1/1     Running   0          27h   10.244.0.5    master    <none>           <none>
network-tools-66b6674fd9-tmgcs   1/1     Running   0          27h   10.244.2.13   worker2   <none>           <none>
network-tools-66b6674fd9-zk58f   1/1     Running   0          27h   10.244.1.17   worker1   <none>           <none>

First, enter the pod on the master node and check its routing table:

route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.244.0.1      0.0.0.0         UG    0      0        0 eth0
10.244.0.0      0.0.0.0         255.255.255.0   U     0      0        0 eth0
10.244.0.0      10.244.0.1      255.255.0.0     UG    0      0        0 eth0

Nothing has changed from before. 0.0.0.0 is the default rule, and 10.244.0.0 with gateway 0.0.0.0 is the directly connected rule for this host's own pod subnet, which is skipped as well. The packet therefore follows the third route, whose gateway is 10.244.0.1. Next, look on the host for the interface that holds this IP.
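
Why the third route wins can be shown with a small longest-prefix-match sketch. This is a simplified model of the kernel's lookup (it ignores metrics and policy routing), and the lookup function is a hypothetical helper; the three routes are the ones from the pod's table.

```python
import ipaddress

# (prefix, gateway, device) taken from the pod's routing table.
POD_ROUTES = [
    ("0.0.0.0/0", "10.244.0.1", "eth0"),      # default rule
    ("10.244.0.0/24", None, "eth0"),          # directly connected pod subnet
    ("10.244.0.0/16", "10.244.0.1", "eth0"),  # rest of the pod network
]


def lookup(routes, dst):
    """Pick the matching route with the longest prefix, or None."""
    addr = ipaddress.ip_address(dst)
    matches = []
    for prefix, gateway, device in routes:
        net = ipaddress.ip_network(prefix)
        if addr in net:
            matches.append((net, gateway, device))
    if not matches:
        return None
    net, gateway, device = max(matches, key=lambda r: r[0].prefixlen)
    return str(net), gateway, device


print(lookup(POD_ROUTES, "10.244.1.17"))  # pod on worker1
print(lookup(POD_ROUTES, "10.244.0.9"))   # hypothetical pod on the same host
```

A destination on worker1 falls outside the /24 but inside the /16, so it is handed to the gateway 10.244.0.1.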

ifconfig | grep "10.244.0.1" -B 5 -A 3
cni0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 10.244.0.1  netmask 255.255.255.0  broadcast 10.244.0.255
        inet6 fe80::f03d:adff:fea2:563e  prefixlen 64  scopeid 0x20<link>
        ether f2:3d:ad:a2:56:3e  txqueuelen 1000  (Ethernet)
        RX packets 1988416  bytes 269346410 (256.8 MiB)

It is still cni0.

cni0 is the bridge device that takes the place of docker0 under Kubernetes. Whatever the network plugin, be it flannel or calico, it only has to conform to the Kubernetes (CNI) rules; here, with flannel, that means attaching to the cni0 bridge. There is no need to care which container runtime sits underneath or which network it is paired with.

Following the same routine as before, traffic leaves cni0 for the host network and is matched against the host's routes. View the routing table on the master host:

route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.88.8.254     0.0.0.0         UG    100    0        0 ens192
10.88.8.0       0.0.0.0         255.255.252.0   U     100    0        0 ens192
10.244.0.0      0.0.0.0         255.255.255.0   U     0      0        0 cni0
10.244.1.0      10.88.10.182    255.255.255.0   UG    0      0        0 ens192
10.244.2.0      10.88.10.183    255.255.255.0   UG    0      0        0 ens192
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0

This is where things differ. The interface for the target subnet is no longer the flannel.1 virtual device but the host's ens192 NIC, and the gateway is now 10.88.10.182, which is the IP of a node on the host network segment.

The ip route command gives more detailed results:

ip route
default via 10.88.8.254 dev ens192 proto static metric 100 
10.88.8.0/22 dev ens192 proto kernel scope link src 10.88.10.181 metric 100 
10.244.0.0/24 dev cni0 proto kernel scope link src 10.244.0.1 
10.244.1.0/24 via 10.88.10.182 dev ens192 
10.244.2.0/24 via 10.88.10.183 dev ens192 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown 

Each flannel subnet has a corresponding via entry, for example via 10.88.10.182. via gives the "next hop" address, and the packet is sent out of the ens192 NIC.
This is a rule the host can act on directly. When a packet leaves cni0, the host sends it as is: ens192 wraps it in a layer-2 frame addressed to the next hop's MAC address (no extra IP layer is added, because what the host sends is already an IP packet). The frame then travels over the physical network to the NIC of the peer host.
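
The point that only the layer-2 header refers to the next hop, while the IP packet stays untouched, can be modelled in a few lines. This is a toy model, not real packet handling: the class names, the send_via_next_hop helper, and both MAC addresses are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IPPacket:
    src_ip: str
    dst_ip: str
    payload: bytes


@dataclass(frozen=True)
class EthernetFrame:
    src_mac: str
    dst_mac: str
    packet: IPPacket


# Hypothetical MAC addresses; real ones come from the host's neighbor table.
ENS192_MAC = "00:50:56:00:00:b5"
NEIGHBORS = {"10.88.10.182": "00:50:56:00:00:b6"}


def send_via_next_hop(packet, next_hop, src_mac=ENS192_MAC, neighbors=NEIGHBORS):
    """Wrap the unchanged IP packet in a frame addressed to the next hop's MAC."""
    return EthernetFrame(src_mac=src_mac, dst_mac=neighbors[next_hop], packet=packet)


packet = IPPacket("10.244.0.5", "10.244.1.17", b"hello")
frame = send_via_next_hop(packet, next_hop="10.88.10.182")
print(frame.dst_mac, frame.packet.dst_ip)
```

The frame is addressed to the next hop's MAC, but the destination IP inside it is still the container IP 10.244.1.17.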

After the peer host receives the frame, it strips the layer-2 header and matches the destination IP against its own routes. On worker1:

route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.88.8.254     0.0.0.0         UG    100    0        0 ens192
10.88.8.0       0.0.0.0         255.255.252.0   U     100    0        0 ens192
10.244.0.0      10.88.10.181    255.255.255.0   UG    0      0        0 ens192
10.244.1.0      0.0.0.0         255.255.255.0   U     0      0        0 cni0
10.244.2.0      10.88.10.183    255.255.255.0   UG    0      0        0 ens192
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0

The destination address 10.244.1.17 matches the fourth rule. Its gateway is 0.0.0.0, which means it is a directly connected rule, and the corresponding device is cni0.
The packet is then delivered through cni0 to the target pod.

Because the host node's IP is used as the packet's "next hop" address, host-gw mode requires the hosts to be reachable from one another at layer 2.
The source and destination IPs of the layer-3 IP packet are still container IPs. For the frame to be delivered, its layer-2 header must therefore carry the MAC address of the worker1 node, and the next-hop address is what tells the sender which MAC address to use so that worker1 receives the packet.
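
A quick sanity check for this requirement is to confirm that every next hop falls inside the subnet of the host's own NIC. The sketch below assumes the master's ens192 address 10.88.10.181/22 from the ip route output; next_hop_is_on_link is a hypothetical helper, and sharing a subnet is only a proxy for layer-2 reachability, not proof of it.

```python
import ipaddress


def next_hop_is_on_link(interface_cidr: str, next_hop: str) -> bool:
    """True if next_hop lies in the subnet configured on the interface."""
    iface = ipaddress.ip_interface(interface_cidr)
    return ipaddress.ip_address(next_hop) in iface.network


MASTER_ENS192 = "10.88.10.181/22"  # network 10.88.8.0/22

for hop in ("10.88.10.182", "10.88.10.183"):
    print(hop, next_hop_is_on_link(MASTER_ENS192, hop))

# A hypothetical node in another segment would need a router in between,
# which host-gw cannot use as described above.
print("10.88.12.1", next_hop_is_on_link(MASTER_ENS192, "10.88.12.1"))
```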

Inside Kubernetes the packet also has to pass through iptables rules, so the forwarding path at this point is:
[Figure not available: forwarding path in host-gw mode]
Compared with VXLAN, the flannel.1 forwarding step is clearly gone, so host-gw performs better than VXLAN. Reportedly, host-gw mode loses about 10% compared with direct host-to-host transfer, while VXLAN loses between 20% and 30%.

calico

calico network switching

calico's networking goes a step further. It does not even use a bridge; it connects each container to the host directly through a Veth Pair device (the virtual devices it creates have names starting with cali).
Again, suppose a pod on the master node (IP 10.244.235.130) accesses the pod on worker1 (IP 10.244.235.129).

kubectl get pod -o wide
NAME                             READY   STATUS    RESTARTS   AGE     IP               NODE      NOMINATED NODE   READINESS GATES
network-tools-66b6674fd9-kf77w   1/1     Running   0          7m37s   10.244.235.130   worker1   <none>           <none>
network-tools-66b6674fd9-rqf8x   1/1     Running   0          7m37s   10.244.189.67    worker2   <none>           <none>
network-tools-66b6674fd9-wppm4   1/1     Running   0          7m37s   10.244.235.129   worker1   <none>           <none>

Once the packet reaches the host node, it is routed directly by the host's routing table. On the master node:

route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.88.8.254     0.0.0.0         UG    100    0        0 ens192
10.88.8.0       0.0.0.0         255.255.252.0   U     100    0        0 ens192
10.244.0.0      0.0.0.0         255.255.255.0   U     0      0        0 cni0
10.244.1.0      10.88.10.182    255.255.255.255 UGH   0      0        0 tunl0
10.244.1.0      10.88.10.182    255.255.255.0   UG    0      0        0 ens192
10.244.2.0      10.88.10.183    255.255.255.255 UGH   0      0        0 tunl0
10.244.2.0      10.88.10.183    255.255.255.0   UG    0      0        0 ens192
10.244.189.64   10.88.10.183    255.255.255.192 UG    0      0        0 tunl0
10.244.219.64   0.0.0.0         255.255.255.192 U     0      0        0 *
10.244.219.65   0.0.0.0         255.255.255.255 UH    0      0        0 cali88526110d1d
10.244.235.128  10.88.10.182    255.255.255.192 UG    0      0        0 tunl0
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0

The destination clearly matches the second-to-last rule:

10.244.235.128  10.88.10.182    255.255.255.192 UG    0      0        0 tunl0

Its gateway is 10.88.10.182, the host IP of the worker1 node. This is very similar to flannel's host-gw approach, which also points the next hop at the peer host.
But this packet has to leave through the tunl0 device. When the packet reaches tunl0 it is encapsulated again: the whole IP packet sent by the container, including its source IP, destination IP, and data, is treated as payload, and a new IP header is placed in front of it. This outer IP header hides the original one, so the packet is "disguised" as traffic from the master host to the worker1 host and is sent out of the ens192 NIC over the host network.

After worker1 receives it, the host first strips the outer layers and hands the packet to its tunl0 device, which restores the original layer-3 packet sent by the container, that is, the one whose destination IP is 10.244.235.129.
Where the packet goes next depends on the routing table of the worker1 host.
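
The encapsulation and decapsulation steps can be sketched at the byte level. This is a minimal illustration rather than what the kernel's tunl0 driver actually runs: the function names are hypothetical, the headers carry no options, and the inner payload is a placeholder. The one external fact used is that IP-in-IP has protocol number 4.

```python
import socket
import struct

IPPROTO_IPIP = 4  # IANA protocol number for IP-in-IP


def checksum(data: bytes) -> int:
    """Standard 16-bit ones' complement Internet checksum."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipv4_header(src: str, dst: str, proto: int, payload_len: int) -> bytes:
    """Build a minimal 20-byte IPv4 header (no options, TTL 64)."""
    header = struct.pack(
        "!BBHHHBBH4s4s",
        (4 << 4) | 5, 0, 20 + payload_len, 0, 0, 64, proto, 0,
        socket.inet_aton(src), socket.inet_aton(dst),
    )
    return header[:10] + struct.pack("!H", checksum(header)) + header[12:]


def encapsulate(inner: bytes, node_src: str, node_dst: str) -> bytes:
    """Prepend an outer header so the packet looks like node-to-node traffic."""
    return ipv4_header(node_src, node_dst, IPPROTO_IPIP, len(inner)) + inner


def decapsulate(outer: bytes) -> bytes:
    """Strip the outer header and return the container's original packet."""
    if outer[9] != IPPROTO_IPIP:
        raise ValueError("not an IP-in-IP packet")
    header_len = (outer[0] & 0x0F) * 4
    return outer[header_len:]


# Container packet; protocol 17 and the payload are placeholders.
payload = b"hello"
inner = ipv4_header("10.244.235.130", "10.244.235.129", 17, len(payload)) + payload
outer = encapsulate(inner, "10.88.10.181", "10.88.10.182")
print(len(inner), len(outer), socket.inet_ntoa(outer[16:20]))
```

The outer header adds 20 bytes and carries only host addresses, which is why routers between the nodes can forward the packet without knowing anything about the pod network.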

In calico mode, after each pod is created, calico creates a corresponding virtual network device connected to the host and adds a route on the host that records which device each container IP belongs to. This makes the host behave like a "border gateway". On worker1:

route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.88.8.254     0.0.0.0         UG    100    0        0 ens192
10.88.8.0       0.0.0.0         255.255.252.0   U     100    0        0 ens192
10.244.0.0      10.88.10.181    255.255.255.255 UGH   0      0        0 tunl0
10.244.0.0      10.88.10.181    255.255.255.0   UG    0      0        0 ens192
10.244.1.0      0.0.0.0         255.255.255.0   U     0      0        0 cni0
10.244.2.0      10.88.10.183    255.255.255.255 UGH   0      0        0 tunl0
10.244.2.0      10.88.10.183    255.255.255.0   UG    0      0        0 ens192
10.244.189.64   10.88.10.183    255.255.255.192 UG    0      0        0 tunl0
10.244.219.64   10.88.10.181    255.255.255.192 UG    0      0        0 tunl0
10.244.235.128  0.0.0.0         255.255.255.192 U     0      0        0 *
10.244.235.129  0.0.0.0         255.255.255.255 UH    0      0        0 calia5b07904eb8
10.244.235.130  0.0.0.0         255.255.255.255 UH    0      0        0 calif1f8fa46c64
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0

Next, the packet hits the host route for 10.244.235.129 (the third from last):

10.244.235.129  0.0.0.0         255.255.255.255 UH    0      0        0 calia5b07904eb8

The packet is forwarded directly through the calia5b07904eb8 device. This device is the host end of a Veth Pair whose other end sits inside the container, so the packet is delivered straight into the container.
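
The per-pod routes can be pictured with a toy routing table in which creating a pod adds a /32 host route pointing at that pod's cali device. The class and method names are hypothetical, and only a few of worker1's routes are reproduced.

```python
import ipaddress


class HostRouteTable:
    """Toy model of a node's routing table (longest prefix wins)."""

    def __init__(self):
        self.routes = []  # list of (network, gateway, device)

    def add(self, prefix, gateway, device):
        self.routes.append((ipaddress.ip_network(prefix), gateway, device))

    def add_pod(self, pod_ip, veth_name):
        # One host route (/32) per pod, pointing at the host end
        # of that pod's Veth Pair.
        self.add(f"{pod_ip}/32", None, veth_name)

    def lookup(self, dst):
        addr = ipaddress.ip_address(dst)
        matches = [route for route in self.routes if addr in route[0]]
        if not matches:
            return None
        return max(matches, key=lambda route: route[0].prefixlen)


worker1 = HostRouteTable()
worker1.add("0.0.0.0/0", "10.88.8.254", "ens192")
worker1.add("10.244.235.128/26", None, "*")  # the node's own pod block
worker1.add_pod("10.244.235.129", "calia5b07904eb8")
worker1.add_pod("10.244.235.130", "calif1f8fa46c64")

network, gateway, device = worker1.lookup("10.244.235.129")
print(network, gateway, device)
```

Because a /32 is the most specific prefix possible, each pod's own route always takes priority over the /26 block it belongs to.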

This mode, which relies on tunl0 to encapsulate and decapsulate packets, is called IPIP mode (a fitting name: one IP layer followed by another IP layer). Its performance is about the same as flannel's VXLAN. Its advantage is that it works in environments where traffic between nodes has to be routed (for example, when the cluster's nodes are spread across two LANs).

flannel's VXLAN is designed to overlay a layer-2 network on top of an existing layer-3 network, so it forwards packets based on MAC addresses. host-gw mode uses the peer host's IP in the routing table as the packet's "next hop" address, so it also requires unobstructed layer-2 connectivity. calico's IPIP mode, once the packet is encapsulated, disguises it as traffic from one node to another, so its packets can be forwarded through routers. Performance drops somewhat, but more nodes can be supported (as long as nodes in different network segments can reach each other through a router). calico also has another mode that needs no IPIP encapsulation; like host-gw, it configures the peer host's IP address as the "next hop".

calico's second mode

Node-to-Node Mesh mode

calico uses **BGP (Border Gateway Protocol)** to maintain the routing information of each node.

BGP runs a small program on every node that sends the node's routing table information to the other nodes. The programs on the other nodes process what they receive and add it to their own node's routing table.

In this way the BGP programs on all nodes of the cluster keep their routing table information synchronized with one another.

In this mode, a container's packets go straight to the host through the Veth Pair device and are matched against the host's routing table with no further encapsulation. BGP adds to the host routing table a route whose destination is the peer host's container subnet, whose gateway is the peer host's IP address, and whose outgoing device is the host's physical NIC. It looks like this:

10.244.1.0      10.88.10.182    255.255.255.0   UG    0      0        0 ens192

The packet matches this route directly and treats the peer host's NIC as a "router" (the next-hop address). After reaching the peer host, it is matched against that host's routing table. calico adds the IP of every container on the host, together with the name of its Veth Pair device, to the routing table. It looks like this:

10.244.235.130  0.0.0.0         255.255.255.255 UH    0      0        0 calif1f8fa46c64

The packet matches this route directly and is forwarded into the pod.
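
The end result of the mesh synchronization can be sketched as follows. This is not BGP itself (there are no sessions, messages, or timers); it only shows what each node's table should contain once every node has advertised its pod block with itself as next hop. The node IPs and pod blocks are the ones visible in the routing tables earlier in this article, and full_mesh_sync is a hypothetical name.

```python
NODES = {
    "master": {"ip": "10.88.10.181", "block": "10.244.219.64/26"},
    "worker1": {"ip": "10.88.10.182", "block": "10.244.235.128/26"},
    "worker2": {"ip": "10.88.10.183", "block": "10.244.189.64/26"},
}


def full_mesh_sync(nodes, device="ens192"):
    """Return, per node, the routes learned from every other node."""
    tables = {name: [] for name in nodes}
    for advertiser, info in nodes.items():
        for peer in nodes:
            if peer != advertiser:
                # The peer installs: <advertiser's block> via <advertiser's IP>
                tables[peer].append((info["block"], info["ip"], device))
    return tables


tables = full_mesh_sync(NODES)
for block, next_hop, device in tables["master"]:
    print(f"{block} via {next_hop} dev {device}")
```

With three nodes, each node learns two remote routes, one per peer.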

Border gateway

[Figure not available: two LANs, as1 (host 10.10.0.2) and as2 (host 172.17.0.2), connected through routers Route1 and Route2]

As the figure shows, host 10.10.0.2 can reach host 172.17.0.2. When it sends the packet, a mask operation tells it that the peer host is not in its own network segment, so it sends the packet to its gateway.
When the packet arrives at the gateway, the router unpacks it down to the layer-3 IP packet. The gateway's routing table contains an entry that maps 172.17.0.2 to Route2, meaning packets for that network segment must be sent to another router, Route2. So Route1 forwards the packet to Route2, whose routing table records the port for this IP, and Route2 forwards the packet through that port to the destination host.

The other way around, 172.17.0.2 cannot reach 10.10.0.2 at all, because its gateway has no information about the other LAN. So hosts in the as1 LAN can reach hosts in the as2 LAN without trouble, while the reverse direction is completely blocked. A router like this, which joins two network segments or has another router's subnet in its routing table, is called a "border gateway".

In the calico network above, every node acts as a "border gateway". Together they form one large network in which the border gateways synchronize routes with one another over BGP. But the more nodes there are, the more routing information has to be synchronized, so calico also offers a Route Reflector mode. In this mode calico selects a few nodes that peer with all the border gateways and synchronize routing information; the other nodes only need to synchronize with these nodes.
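
To get a feel for why the full mesh stops scaling, the sketch below counts BGP sessions under two simplified assumptions that are mine rather than the article's: in a full mesh every pair of nodes holds one session, and in Route Reflector mode every ordinary node peers with every reflector while the reflectors also peer with each other. Both function names are hypothetical.

```python
def full_mesh_sessions(nodes: int) -> int:
    """One session per pair of nodes: n * (n - 1) / 2."""
    return nodes * (nodes - 1) // 2


def route_reflector_sessions(nodes: int, reflectors: int) -> int:
    """Clients peer with every reflector; reflectors mesh among themselves."""
    if not 0 < reflectors <= nodes:
        raise ValueError("reflectors must be between 1 and the node count")
    clients = nodes - reflectors
    return clients * reflectors + reflectors * (reflectors - 1) // 2


for n in (3, 50, 100):
    print(n, full_mesh_sessions(n), route_reflector_sessions(n, 2))
```

Under these assumptions the full mesh grows quadratically with the node count, while the reflector layout grows roughly linearly.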

Copyright notice
This article was written by [Famine - Yu Xi]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/177/202206261024575397.html