Container Networking Explained in 5 Figures
2022-06-26 15:09:00 · BOGO
Working with containers always feels a bit like magic. For those who understand the underlying mechanisms it is magic in a good way; for everyone else it is a nightmare. Luckily, we have been studying container technology for quite a while and have already established that containers are just isolated and restricted Linux processes, that images are not required to run a container, and that, conversely, to build an image we may need to run some containers.
Now it's time to tackle the container networking problem, or, more precisely, the single-host container networking problem. This article answers the following questions:
- How do we virtualize network resources so that a container believes it has its own exclusive network?
- How do we let containers coexist peacefully, without interfering with each other, while still being able to communicate?
- How do we reach the outside world (e.g., the Internet) from inside a container?
- How do we reach a container running on a machine from the outside world (e.g., port publishing)?
As it turns out, single-host container networking is nothing more than a simple combination of well-known Linux facilities:
- network namespaces
- virtual Ethernet devices (veth)
- virtual network switches (bridges)
- IP routing and network address translation (NAT)
And no code whatsoever is required to make this networking magic happen...
Prerequisites
Any Linux distribution will do. All the examples in this article were executed on a vagrant CentOS 8 virtual machine:
$ vagrant init centos/8
$ vagrant up
$ vagrant ssh
[vagrant@localhost ~]$ uname -a
Linux localhost.localdomain 4.18.0-147.3.1.el8_1.x86_64
For simplicity's sake, this article does not rely on any full-fledged containerization solution (such as Docker or Podman). Instead, we focus on the basic concepts and use the most minimal tools to achieve our learning goals.
Isolating containers with network namespaces
What makes up a Linux network stack? Obviously, a set of network devices. What else? Probably a set of routing rules. And don't forget the netfilter hooks, including those defined by iptables rules.
We can quickly put together a simple script, inspect-net-stack.sh, to examine all of that:
#!/usr/bin/env bash
echo "> Network devices"
ip link
echo -e "\n> Route table"
ip route
echo -e "\n> Iptables rules"
iptables --list-rules
Before running the script, let's add a custom iptables chain so that we can easily recognize this rule set later:
$ sudo iptables -N ROOT_NS
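(If you have just saved the script to a file, remember to make it executable first: chmod +x inspect-net-stack.sh.)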
After that, execute the script on the host machine; the output is as follows:
$ sudo ./inspect-net-stack.sh
> Network devices
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
link/ether 52:54:00:e3:27:77 brd ff:ff:ff:ff:ff:ff
> Route table
default via 10.0.2.2 dev eth0 proto dhcp metric 100
10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15 metric 100
> Iptables rules
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
-N ROOT_NS
This output interests us because we need to make sure that each container we are about to create gets its own separate network stack.
As you may already know, the Linux namespace used for container isolation here is the network namespace. From man ip-netns: "A network namespace is logically another copy of the network stack, with its own routes, firewall rules, and network devices." For simplicity, this is the only namespace used in this article: rather than creating fully isolated containers, we restrict the scope to the network stack.
One way to create a network namespace is with the ip tool, part of iproute2:
$ sudo ip netns add netns0
$ ip netns
netns0
How do we use the namespace we just created? With a handy command called nsenter, which enters one or more of the specified namespaces and then executes the given program:
$ sudo nsenter --net=/var/run/netns/netns0 bash
# The new bash process is inside netns0
$ sudo ./inspect-net-stack.sh
> Network devices
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> Route table
> Iptables rules
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
The output above clearly shows that the bash process is running in the netns0 namespace and sees a completely different network stack: there are no routing rules, no custom iptables chains, and only a single loopback network device.
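By the way, iproute2 itself can run a command inside a named namespace, so the same inspection works without nsenter:
$ sudo ip netns exec netns0 ./inspect-net-stack.sh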
Connecting the host with a container using virtual Ethernet devices (veth)
A dedicated network stack is not very useful if we cannot communicate with it. Luckily, Linux provides a handy tool for this: the virtual Ethernet device. From man veth: "veth devices are virtual Ethernet devices. They can act as tunnels between network namespaces to create a bridge to a physical network device in another namespace, but can also be used as standalone network devices."
Virtual Ethernet devices always come in pairs. Let's create one:
$ sudo ip link add veth0 type veth peer name ceth0
With this single command we created a pair of interconnected virtual Ethernet devices, explicitly named veth0 and ceth0:
$ ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
link/ether 52:54:00:e3:27:77 brd ff:ff:ff:ff:ff:ff
5: ceth0@veth0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 66:2d:24:e3:49:3f brd ff:ff:ff:ff:ff:ff
6: veth0@ceth0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 96:e8:de:1d:22:e0 brd ff:ff:ff:ff:ff:ff
Both veth0 and ceth0 were created on the host's network stack (also known as the root network namespace). To connect netns0 to the root namespace, we keep one device in the root namespace and move the other into netns0:
$ sudo ip link set ceth0 netns netns0
# List all devices; ceth0 has disappeared from the root stack
$ ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
link/ether 52:54:00:e3:27:77 brd ff:ff:ff:ff:ff:ff
6: veth0@if5: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 96:e8:de:1d:22:e0 brd ff:ff:ff:ff:ff:ff link-netns netns0
As soon as we bring the devices up and assign proper IP addresses, any packet entering one device immediately pops out of its peer, connecting the two namespaces. Starting in the root namespace:
$ sudo ip link set veth0 up
$ sudo ip addr add 172.18.0.11/16 dev veth0
And then in netns0:
$ sudo nsenter --net=/var/run/netns/netns0
$ ip link set lo up
$ ip link set ceth0 up
$ ip addr add 172.18.0.10/16 dev ceth0
$ ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
5: ceth0@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether 66:2d:24:e3:49:3f brd ff:ff:ff:ff:ff:ff link-netnsid 0
Check connectivity:
# In netns0, ping the root namespace's veth0
$ ping -c 2 172.18.0.11
PING 172.18.0.11 (172.18.0.11) 56(84) bytes of data.
64 bytes from 172.18.0.11: icmp_seq=1 ttl=64 time=0.038 ms
64 bytes from 172.18.0.11: icmp_seq=2 ttl=64 time=0.040 ms
--- 172.18.0.11 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 58ms
rtt min/avg/max/mdev = 0.038/0.039/0.040/0.001 ms
# Leave netns0
$ exit
# In the root namespace, ping ceth0
$ ping -c 2 172.18.0.10
PING 172.18.0.10 (172.18.0.10) 56(84) bytes of data.
64 bytes from 172.18.0.10: icmp_seq=1 ttl=64 time=0.073 ms
64 bytes from 172.18.0.10: icmp_seq=2 ttl=64 time=0.046 ms
--- 172.18.0.10 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 3ms
rtt min/avg/max/mdev = 0.046/0.059/0.073/0.015 ms
At the same time, any attempt to reach other addresses from the netns0 namespace fails:
# In the root namespace
$ ip addr show dev eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 52:54:00:e3:27:77 brd ff:ff:ff:ff:ff:ff
inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic noprefixroute eth0
valid_lft 84057sec preferred_lft 84057sec
inet6 fe80::5054:ff:fee3:2777/64 scope link
valid_lft forever preferred_lft forever
# Note the IP here: 10.0.2.15
$ sudo nsenter --net=/var/run/netns/netns0
# Try pinging the host's eth0
$ ping 10.0.2.15
connect: Network is unreachable
# Try connecting to the Internet
$ ping 8.8.8.8
connect: Network is unreachable
That failure is easy to explain: netns0's routing table has no route for such packets. Its only entry routes to the 172.18.0.0/16 network:
# In the netns0 namespace:
$ ip route
172.18.0.0/16 dev ceth0 proto kernel scope link src 172.18.0.10
Linux has several ways of populating the routing table. One of them is deriving routes directly from network interfaces. Remember that the routing table in netns0 was empty right after the namespace was created? But then we added the ceth0 device and assigned it not just a plain IP address but an address-with-subnet-mask combination, 172.18.0.10/16, from which the network stack could derive a route: every packet destined for 172.18.0.0/16 goes through device ceth0, while all other packets are dropped. Similarly, the root namespace gained a new route:
# In the root namespace:
$ ip route
# ... Ignore irrelevant lines ...
172.18.0.0/16 dev veth0 proto kernel scope link src 172.18.0.11
At this point we can answer the first question: we have learned how to isolate, virtualize, and interconnect Linux network stacks.
Interconnecting containers with a virtual network switch (bridge)
The driving force behind containerization is efficient resource sharing, so it is uncommon to run just one container per machine; instead, the goal is to run as many isolated processes as possible in a shared environment. So what happens if we place several containers on the same host following the veth scheme above? Let's add a second container:
# From the root namespace
$ sudo ip netns add netns1
$ sudo ip link add veth1 type veth peer name ceth1
$ sudo ip link set ceth1 netns netns1
$ sudo ip link set veth1 up
$ sudo ip addr add 172.18.0.21/16 dev veth1
$ sudo nsenter --net=/var/run/netns/netns1
$ ip link set lo up
$ ip link set ceth1 up
$ ip addr add 172.18.0.20/16 dev ceth1
Check connectivity:
# From netns1, the root namespace is unreachable!
$ ping -c 2 172.18.0.21
PING 172.18.0.21 (172.18.0.21) 56(84) bytes of data.
From 172.18.0.20 icmp_seq=1 Destination Host Unreachable
From 172.18.0.20 icmp_seq=2 Destination Host Unreachable
--- 172.18.0.21 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 55ms pipe 2
# Even though the route exists!
$ ip route
172.18.0.0/16 dev ceth1 proto kernel scope link src 172.18.0.20
# Leave netns1
$ exit
# The root namespace cannot reach netns1 either
$ ping -c 2 172.18.0.20
PING 172.18.0.20 (172.18.0.20) 56(84) bytes of data.
From 172.18.0.11 icmp_seq=1 Destination Host Unreachable
From 172.18.0.11 icmp_seq=2 Destination Host Unreachable
--- 172.18.0.20 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 23ms pipe 2
# From netns0, veth1 is reachable
$ sudo nsenter --net=/var/run/netns/netns0
$ ping -c 2 172.18.0.21
PING 172.18.0.21 (172.18.0.21) 56(84) bytes of data.
64 bytes from 172.18.0.21: icmp_seq=1 ttl=64 time=0.037 ms
64 bytes from 172.18.0.21: icmp_seq=2 ttl=64 time=0.046 ms
--- 172.18.0.21 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 33ms
rtt min/avg/max/mdev = 0.037/0.041/0.046/0.007 ms
# But netns1 is still unreachable
$ ping -c 2 172.18.0.20
PING 172.18.0.20 (172.18.0.20) 56(84) bytes of data.
From 172.18.0.10 icmp_seq=1 Destination Host Unreachable
From 172.18.0.10 icmp_seq=2 Destination Host Unreachable
--- 172.18.0.20 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 63ms pipe 2
Ouch! Something went wrong... netns1 is in trouble: it cannot reach the root namespace, and the root namespace cannot reach it either. Yet, since both containers reside in the same IP subnet, 172.18.0.0/16, the netns0 container can still reach the host's veth1.
It takes a moment to find the cause, but clearly this is a routing problem. First, check the root namespace's routing table:
$ ip route
# ... Ignore irrelevant lines ...
172.18.0.0/16 dev veth0 proto kernel scope link src 172.18.0.11
172.18.0.0/16 dev veth1 proto kernel scope link src 172.18.0.21
After the second veth pair was added, the root network stack learned the new route 172.18.0.0/16 dev veth1 proto kernel scope link src 172.18.0.21, but a route for that network already existed. When the second container tries to ping veth1, the first matching route is selected, and connectivity breaks. If we deleted the first route (sudo ip route delete 172.18.0.0/16 dev veth0) and rechecked connectivity, netns1 would work, but then netns0 would not.
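To reproduce that experiment yourself (it is destructive, but we are about to tear this setup down anyway), the steps from the paragraph above look like this:
# In the root namespace: remove the route via veth0...
$ sudo ip route delete 172.18.0.0/16 dev veth0
# ...now netns1 becomes reachable:
$ ping -c 2 172.18.0.20
# ...while netns0 no longer is:
$ ping -c 2 172.18.0.10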
If we had chosen a different subnet for netns1, everything would work. However, multiple containers sitting on the same IP subnet is a perfectly reasonable use case, so we need to adjust the veth scheme.
Don't forget that there is also the Linux bridge, another network virtualization technology! A Linux bridge behaves like a network switch: it forwards packets between the interfaces connected to it. And since it is a switch, it does this forwarding at layer 2.
Let's try this tool. But first, we need to clear the existing setup, since some of the previous configuration is no longer needed. Delete the network namespaces:
$ sudo ip netns delete netns0
$ sudo ip netns delete netns1
# In case of leftovers (deleting a namespace also destroys the veth end inside it, and with it its peer):
$ sudo ip link delete veth0
$ sudo ip link delete ceth0
$ sudo ip link delete veth1
$ sudo ip link delete ceth1
Quickly recreate the two containers. Note that we do not assign any IP addresses to the new veth0 and veth1 devices:
$ sudo ip netns add netns0
$ sudo ip link add veth0 type veth peer name ceth0
$ sudo ip link set veth0 up
$ sudo ip link set ceth0 netns netns0
$ sudo nsenter --net=/var/run/netns/netns0
$ ip link set lo up
$ ip link set ceth0 up
$ ip addr add 172.18.0.10/16 dev ceth0
$ exit
$ sudo ip netns add netns1
$ sudo ip link add veth1 type veth peer name ceth1
$ sudo ip link set veth1 up
$ sudo ip link set ceth1 netns netns1
$ sudo nsenter --net=/var/run/netns/netns1
$ ip link set lo up
$ ip link set ceth1 up
$ ip addr add 172.18.0.20/16 dev ceth1
$ exit
Make sure there are no new routes on the host:
$ ip route
default via 10.0.2.2 dev eth0 proto dhcp metric 100
10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15 metric 100
Finally, create the bridge interface:
$ sudo ip link add br0 type bridge
$ sudo ip link set br0 up
Attach veth0 and veth1 to the bridge:
$ sudo ip link set veth0 master br0
$ sudo ip link set veth1 master br0
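To verify that both veth ends are indeed enslaved to the bridge, we can list the interfaces whose master is br0; veth0 and veth1 should both show up:
$ ip link show master br0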
Check connectivity between the containers:
$ sudo nsenter --net=/var/run/netns/netns0
$ ping -c 2 172.18.0.20
PING 172.18.0.20 (172.18.0.20) 56(84) bytes of data.
64 bytes from 172.18.0.20: icmp_seq=1 ttl=64 time=0.259 ms
64 bytes from 172.18.0.20: icmp_seq=2 ttl=64 time=0.051 ms
--- 172.18.0.20 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 2ms
rtt min/avg/max/mdev = 0.051/0.155/0.259/0.104 ms
$ sudo nsenter --net=/var/run/netns/netns1
$ ping -c 2 172.18.0.10
PING 172.18.0.10 (172.18.0.10) 56(84) bytes of data.
64 bytes from 172.18.0.10: icmp_seq=1 ttl=64 time=0.037 ms
64 bytes from 172.18.0.10: icmp_seq=2 ttl=64 time=0.089 ms
--- 172.18.0.10 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 36ms
rtt min/avg/max/mdev = 0.037/0.063/0.089/0.026 ms
Great! Everything works. Notice that with this new scheme we did not configure veth0 and veth1 at all; only the ceth0 and ceth1 endpoints were assigned IP addresses. But since both of them sit on the same Ethernet segment (remember, they are connected to the virtual switch), they are connected at layer 2:
$ sudo nsenter --net=/var/run/netns/netns0
$ ip neigh
172.18.0.20 dev ceth0 lladdr 6e:9c:ae:02:60:de STALE
$ exit
$ sudo nsenter --net=/var/run/netns/netns1
$ ip neigh
172.18.0.10 dev ceth1 lladdr 66:f3:8c:75:09:29 STALE
$ exit
Great, we learned how to turn containers into friendly neighbors that do not interfere with each other yet can communicate.
Reaching the outside world (IP routing and address masquerading)
Our containers can talk to each other, but can they talk to the host, i.e. the root namespace?
$ sudo nsenter --net=/var/run/netns/netns0
$ ping 10.0.2.15 # eth0 address
connect: Network is unreachable
By now it is obvious why: netns0 has no route for that destination:
$ ip route
172.18.0.0/16 dev ceth0 proto kernel scope link src 172.18.0.10
The root namespace cannot reach the containers either:
# First, leave netns0 with exit:
$ ping -c 2 172.18.0.10
PING 172.18.0.10 (172.18.0.10) 56(84) bytes of data.
From 213.51.1.123 icmp_seq=1 Destination Net Unreachable
From 213.51.1.123 icmp_seq=2 Destination Net Unreachable
--- 172.18.0.10 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 3ms
$ ping -c 2 172.18.0.20
PING 172.18.0.20 (172.18.0.20) 56(84) bytes of data.
From 213.51.1.123 icmp_seq=1 Destination Net Unreachable
From 213.51.1.123 icmp_seq=2 Destination Net Unreachable
--- 172.18.0.20 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 3ms
To establish connectivity between the root namespace and the container namespaces, we need to assign an IP address to the bridge network interface:
$ sudo ip addr add 172.18.0.1/16 dev br0
Once the bridge interface has an IP address, the host's routing table gains one more route:
$ ip route
# ... Ignore irrelevant lines ...
172.18.0.0/16 dev br0 proto kernel scope link src 172.18.0.1
$ ping -c 2 172.18.0.10
PING 172.18.0.10 (172.18.0.10) 56(84) bytes of data.
64 bytes from 172.18.0.10: icmp_seq=1 ttl=64 time=0.036 ms
64 bytes from 172.18.0.10: icmp_seq=2 ttl=64 time=0.049 ms
--- 172.18.0.10 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 11ms
rtt min/avg/max/mdev = 0.036/0.042/0.049/0.009 ms
$ ping -c 2 172.18.0.20
PING 172.18.0.20 (172.18.0.20) 56(84) bytes of data.
64 bytes from 172.18.0.20: icmp_seq=1 ttl=64 time=0.059 ms
64 bytes from 172.18.0.20: icmp_seq=2 ttl=64 time=0.056 ms
--- 172.18.0.20 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 4ms
rtt min/avg/max/mdev = 0.056/0.057/0.059/0.007 ms
The containers can now also ping the bridge interface, but they still cannot reach the host's eth0. For that, a default route needs to be added to the containers:
$ sudo nsenter --net=/var/run/netns/netns0
$ ip route add default via 172.18.0.1
$ ping -c 2 10.0.2.15
PING 10.0.2.15 (10.0.2.15) 56(84) bytes of data.
64 bytes from 10.0.2.15: icmp_seq=1 ttl=64 time=0.036 ms
64 bytes from 10.0.2.15: icmp_seq=2 ttl=64 time=0.053 ms
--- 10.0.2.15 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 14ms
rtt min/avg/max/mdev = 0.036/0.044/0.053/0.010 ms
# Repeat the above configuration for `netns1`
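For completeness, the equivalent configuration for netns1 is the same two commands in the other namespace:
$ sudo nsenter --net=/var/run/netns/netns1
$ ip route add default via 172.18.0.1
$ exit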
This change essentially turns the host into a router, with the bridge interface acting as the default gateway for the containers.
Very good, we connected the containers to the root namespace. Now let's try to connect them to the outside world. Linux has network packet forwarding (i.e. routing functionality) disabled by default, so we need to enable it first:
# In the root namespace
sudo bash -c 'echo 1 > /proc/sys/net/ipv4/ip_forward'
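The same can be done via sysctl, and to make the setting survive reboots you would normally persist it in a sysctl.d drop-in (the file name below is arbitrary):
$ sudo sysctl -w net.ipv4.ip_forward=1
$ echo 'net.ipv4.ip_forward = 1' | sudo tee /etc/sysctl.d/99-ip-forward.conf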
Check connectivity again:
$ sudo nsenter --net=/var/run/netns/netns0
$ ping 8.8.8.8
# Hangs...
Still not working. What's wrong? If the container could send packets out, the target server could not send packets back, because the container's IP address is private: routing rules for that particular IP are known only to the local network. Besides, many containers share exactly the same private IP address, 172.18.0.10. The solution to this problem is called network address translation (NAT): before reaching the external network, packets originated by the containers get their source IP address replaced with the host's external network address; the host also tracks all the existing mappings and restores the original IP addresses before forwarding reply packets back to the containers. Sounds complicated, but there is good news: the iptables module allows us to do all of this with a single command:
$ sudo iptables -t nat -A POSTROUTING -s 172.18.0.0/16 ! -o br0 -j MASQUERADE
The command is fairly simple: it adds a new rule to the nat table's POSTROUTING chain, asking it to masquerade all packets originating from the 172.18.0.0/16 network, except those going out through the bridge interface.
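You can double-check that the rule is in place (and later remove it by replacing -A with -D in the command above):
$ sudo iptables -t nat -S POSTROUTING
-P POSTROUTING ACCEPT
-A POSTROUTING -s 172.18.0.0/16 ! -o br0 -j MASQUERADE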
Check connectivity:
$ sudo nsenter --net=/var/run/netns/netns0
$ ping -c 2 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=61 time=43.2 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=61 time=36.8 ms
--- 8.8.8.8 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 2ms
rtt min/avg/max/mdev = 36.815/40.008/43.202/3.199 ms
Be aware that we are relying on an allow-everything default policy here, which is quite dangerous in a real-world setup. The host's default iptables policy is ACCEPT:
sudo iptables -S
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
Docker, by contrast, restricts all traffic by default and then enables routing only for known paths.
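As a rough sketch of that stricter stance applied to our hand-made br0 setup (an illustration of the idea, not Docker's actual rule set):
# Drop forwarded traffic unless explicitly allowed
$ sudo iptables -P FORWARD DROP
# Let container traffic leave the bridge
$ sudo iptables -A FORWARD -i br0 ! -o br0 -j ACCEPT
# Let replies to established connections back in
$ sudo iptables -A FORWARD -o br0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
# Keep container-to-container traffic working
$ sudo iptables -A FORWARD -i br0 -o br0 -j ACCEPT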
For comparison, here are the rules generated by the Docker daemon on a CentOS 8 machine with a single container publishing port 5005:
$ sudo iptables -t filter --list-rules
-P INPUT ACCEPT
-P FORWARD DROP
-P OUTPUT ACCEPT
-N DOCKER
-N DOCKER-ISOLATION-STAGE-1
-N DOCKER-ISOLATION-STAGE-2
-N DOCKER-USER
-A FORWARD -j DOCKER-USER
-A FORWARD -j DOCKER-ISOLATION-STAGE-1
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A DOCKER -d 172.17.0.2/32 ! -i docker0 -o docker0 -p tcp -m tcp --dport 5000 -j ACCEPT
-A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j RETURN
-A DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP
-A DOCKER-ISOLATION-STAGE-2 -j RETURN
-A DOCKER-USER -j RETURN
$ sudo iptables -t nat --list-rules
-P PREROUTING ACCEPT
-P INPUT ACCEPT
-P POSTROUTING ACCEPT
-P OUTPUT ACCEPT
-N DOCKER
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
-A POSTROUTING -s 172.17.0.2/32 -d 172.17.0.2/32 -p tcp -m tcp --dport 5000 -j MASQUERADE
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
-A DOCKER -i docker0 -j RETURN
-A DOCKER ! -i docker0 -p tcp -m tcp --dport 5005 -j DNAT --to-destination 172.17.0.2:5000
$ sudo iptables -t mangle --list-rules
-P PREROUTING ACCEPT
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
-P POSTROUTING ACCEPT
$ sudo iptables -t raw --list-rules
-P PREROUTING ACCEPT
-P OUTPUT ACCEPT
Making containers accessible from the outside world (port publishing)
As we all know, container ports can be published to some (or all) of the host's interfaces. But what exactly does port publishing mean?
Suppose a server is running inside a container:
$ sudo nsenter --net=/var/run/netns/netns0
$ python3 -m http.server --bind 172.18.0.10 5000
If we send an HTTP request to this server from the host, everything works (the root namespace is connected to all the container interfaces, so of course it can connect):
# From the root namespace
$ curl 172.18.0.10:5000
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
# ... Ignore irrelevant lines ...
However, to access this server from the outside, which IP address should be used? The only IP we know is the host's external interface address, on eth0:
$ curl 10.0.2.15:5000
curl: (7) Failed to connect to 10.0.2.15 port 5000: Connection refused
So we need a mechanism that forwards any packet arriving at port 5000 of the host's eth0 interface on to the destination 172.18.0.10:5000. iptables to the rescue again!
# External traffic
sudo iptables -t nat -A PREROUTING -d 10.0.2.15 -p tcp -m tcp --dport 5000 -j DNAT --to-destination 172.18.0.10:5000
# Local traffic (it does not traverse the PREROUTING chain)
sudo iptables -t nat -A OUTPUT -d 10.0.2.15 -p tcp -m tcp --dport 5000 -j DNAT --to-destination 172.18.0.10:5000
Additionally, we need to enable iptables to intercept traffic on bridged networks:
sudo modprobe br_netfilter
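To have the module loaded automatically on boot, a common approach is a modules-load.d drop-in (again, the file name is arbitrary):
$ echo br_netfilter | sudo tee /etc/modules-load.d/br_netfilter.conf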
Test it:
curl 10.0.2.15:5000
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
# ... Ignore irrelevant lines ...
Understanding Docker network drivers
How can we put this knowledge to use? For example, we can try to make sense of Docker's network drivers [1].
Let's start with --network host mode. Compare the output of ip link with that of sudo docker run -it --rm --network host alpine ip link: they are almost the same! In host mode, Docker simply does not use network namespace isolation; the container works in the root network namespace and shares the network stack with the host machine.
The next mode is --network none. The output of sudo docker run -it --rm --network none alpine ip link shows only a single loopback network interface, much like the network namespace we created earlier, before adding any veth devices.
And finally, there is --network bridge mode, the default. This is exactly the scheme we just built by hand. Play with the ip and iptables commands to look at the network stack from the host's and the container's points of view; a few starting points follow below.
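Some standard Docker and iproute2 commands for that exploration:
# Host side: the veth ends attached to Docker's default bridge
$ ip link show master docker0
# Subnet, gateway, and attached containers of the default bridge network
$ sudo docker network inspect bridge
# Container side: a default route via the bridge IP
$ sudo docker run -it --rm alpine ip route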
Rootless containers and networking
A nice feature of the Podman container manager is its focus on rootless containers. As you have probably noticed, though, this article uses sudo a lot: networking cannot be configured without root privileges. Podman's approach to networking as root [2] is very similar to Docker's, but for rootless containers Podman relies on the slirp4netns [3] project:
Starting with Linux 3.8, unprivileged users can create network_namespaces(7) along with user_namespaces(7). However, unprivileged network namespaces are not very useful, because creating veth(4) pairs across the host and the network namespace still requires root privileges.
slirp4netns can connect a network namespace to the Internet in a completely unprivileged way, via a TAP device in the network namespace connected to a user-mode TCP/IP stack (slirp).
Rootless networking is quite limited: "technically, the container itself does not have an IP address, because without root privileges network device association cannot be achieved. Moreover, pinging from a rootless container does not work, because it lacks the CAP_NET_RAW security capability that the ping command requires." But it is still better than no connectivity at all.
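For a taste of it, here is a minimal sketch of how slirp4netns gets wired up, based on its README (run the two commands in separate terminals; find the namespaced shell's PID with echo $$ inside it):
# Terminal 1: create user + network namespaces without root
$ unshare --user --map-root-user --net bash
# Terminal 2: attach a user-mode TCP/IP stack to that namespace
$ slirp4netns --configure --mtu=65520 <PID-of-namespaced-shell> tap0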
Conclusion
The approach to organizing a container network presented in this article is just one of the possible approaches (probably the most widely used one). There are many other ways, implemented by official or third-party plugins, but all of them rely heavily on Linux network virtualization technologies [4]. Thus, containerization can rightfully be regarded as a virtualization technique.
Related links:
- [1] https://docs.docker.com/network/#network-drivers
- [2] https://www.redhat.com/sysadmin/container-networking-podman
- [3] https://github.com/rootless-containers/slirp4netns
- [4] https://developers.redhat.com/blog/2018/10/22/introduction-to-linux-interfaces-for-virtual-networking/
Original article: https://iximiuz.com/en/posts/container-networking-is-simple/