5 figures illustrate the container network
2022-06-26 15:09:00 【BOGO】
Working with containers always feels a bit like magic. For those who understand the underlying mechanisms they are easy to use, but for those who don't they can be a nightmare. Luckily, we have been studying container technology for quite a while and have even discovered that containers are just isolated and restricted Linux processes, that running a container does not require an image, and that, conversely, building an image requires running some containers.
Now it is time to tackle the container networking problem, or, more precisely, the single-host container networking problem. This article answers the following questions:
- How do we virtualize network resources so that a container believes it has its own exclusive network stack?
- How do we let containers coexist peacefully, neither interfering with each other nor cut off from each other?
- How do we reach the outside world (e.g. the Internet) from inside a container?
- How do we reach a container running on a machine from the outside world (e.g. port publishing)?
The end result should be clear: single-host container networking is nothing more than a simple combination of well-known Linux facilities:
- network namespaces
- virtual Ethernet devices (veth)
- virtual network switches (bridges)
- IP routing and network address translation (NAT)
And no coding is required to make this networking magic happen...
Prerequisites
Any Linux distribution will do. All the examples in this article were executed on a vagrant CentOS 8 virtual machine:
$ vagrant init centos/8
$ vagrant up
$ vagrant ssh
[vagrant@localhost ~]$ uname -a
Linux localhost.localdomain 4.18.0-147.3.1.el8_1.x86_64
For the sake of simplicity, this article does not rely on any fully-fledged containerization solution (such as Docker or Podman). Instead, we focus on the basic concepts and use the simplest possible tools to reach the learning goal.
Isolating containers with network namespaces
What makes up the Linux network stack? Obviously, a collection of network devices. What else? Probably also a set of routing rules. And let's not forget the netfilter hooks, including those defined by iptables rules.
We can quickly put together a simple script, inspect-net-stack.sh, to examine them:
#!/usr/bin/env bash
echo "> Network devices"
ip link
echo -e "\n> Route table"
ip route
echo -e "\n> Iptables rules"
iptables --list-rules
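Since we will be invoking the script directly, remember to make it executable first:
$ chmod +x inspect-net-stack.sh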
Before running the script, let's add a custom iptables chain, so that the rules belonging to this setup are easy to recognize later:
$ sudo iptables -N ROOT_NS
After that, running the script on the host produces the following output:
$ sudo ./inspect-net-stack.sh
> Network devices
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
link/ether 52:54:00:e3:27:77 brd ff:ff:ff:ff:ff:ff
> Route table
default via 10.0.2.2 dev eth0 proto dhcp metric 100
10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15 metric 100
> Iptables rules
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
-N ROOT_NS
We are interested in this output because it lets us verify that each container we are about to create gets its own separate network stack.
You may already know that one of the Linux namespaces used for container isolation is the network namespace. From man ip-netns: "A network namespace is logically another copy of the network stack, with its own routes, firewall rules, and network devices." To keep things simple, this is the only namespace used in this article: rather than creating fully isolated containers, we restrict the scope to the network stack alone.
One way to create a network namespace is the ip tool, which is part of iproute2:
$ sudo ip netns add netns0
$ ip netns
netns0
How do we start using the namespace we just created? A handy command for that is nsenter. It enters one or more of the specified namespaces and then executes the given program:
$ sudo nsenter --net=/var/run/netns/netns0 bash
# The new bash process now lives in netns0
$ sudo ./inspect-net-stack.sh
> Network devices
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> Route table
> Iptables rules
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
The output above makes it clear that the bash process running in the netns0 namespace sees a completely different network stack: there are no routing rules, no custom iptables chains, and only a single loopback device.
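By the way, if you prefer not to start an interactive shell, the same inspection can be done with ip netns exec, which runs a single command inside a named network namespace (a minimal alternative to the nsenter invocation above):
$ sudo ip netns exec netns0 ./inspect-net-stack.sh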
Connecting the host to a container with virtual Ethernet devices (veth)
A dedicated network stack would not be very useful if we could not communicate with it. Fortunately, Linux provides a handy tool for that: virtual Ethernet devices. From man veth: "veth devices are virtual Ethernet devices. They can act as tunnels between network namespaces to create a bridge to a physical network device in another namespace, but can also be used as standalone network devices."
Virtual Ethernet devices always come in pairs. Let's create one:
$ sudo ip link add veth0 type veth peer name ceth0
With this single command we create a pair of interconnected virtual Ethernet devices, here named veth0 and ceth0:
$ ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
link/ether 52:54:00:e3:27:77 brd ff:ff:ff:ff:ff:ff
5: ceth0@veth0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 66:2d:24:e3:49:3f brd ff:ff:ff:ff:ff:ff
6: veth0@ceth0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 96:e8:de:1d:22:e0 brd ff:ff:ff:ff:ff:ff
Both veth0 and ceth0 are created on the host's network stack (also called the root network namespace). To connect the netns0 namespace to the root namespace, we keep one device in the root namespace and move the other one into netns0:
$ sudo ip link set ceth0 netns netns0
# List the devices again: ceth0 has disappeared from the root network stack
$ ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
link/ether 52:54:00:e3:27:77 brd ff:ff:ff:ff:ff:ff
6: veth0@if5: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 96:e8:de:1d:22:e0 brd ff:ff:ff:ff:ff:ff link-netns netns0
Once the devices are brought up and assigned proper IP addresses, any packet originating on one of them immediately shows up on its peer device, connecting the two namespaces. Let's start from the root namespace:
$ sudo ip link set veth0 up
$ sudo ip addr add 172.18.0.11/16 dev veth0
And then in netns0:
$ sudo nsenter --net=/var/run/netns/netns0
$ ip link set lo up
$ ip link set ceth0 up
$ ip addr add 172.18.0.10/16 dev ceth0
$ ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
5: ceth0@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether 66:2d:24:e3:49:3f brd ff:ff:ff:ff:ff:ff link-netnsid 0
Now let's check connectivity:
# From netns0, ping root's veth0
$ ping -c 2 172.18.0.11
PING 172.18.0.11 (172.18.0.11) 56(84) bytes of data.
64 bytes from 172.18.0.11: icmp_seq=1 ttl=64 time=0.038 ms
64 bytes from 172.18.0.11: icmp_seq=2 ttl=64 time=0.040 ms
--- 172.18.0.11 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 58ms
rtt min/avg/max/mdev = 0.038/0.039/0.040/0.001 ms
# Leave netns0
$ exit
# From the root namespace, ping ceth0
$ ping -c 2 172.18.0.10
PING 172.18.0.10 (172.18.0.10) 56(84) bytes of data.
64 bytes from 172.18.0.10: icmp_seq=1 ttl=64 time=0.073 ms
64 bytes from 172.18.0.10: icmp_seq=2 ttl=64 time=0.046 ms
--- 172.18.0.10 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 3ms
rtt min/avg/max/mdev = 0.046/0.059/0.073/0.015 ms
At the same time, if we try to reach any other address from the netns0 namespace, we fail:
# In the root namespace
$ ip addr show dev eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 52:54:00:e3:27:77 brd ff:ff:ff:ff:ff:ff
inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic noprefixroute eth0
valid_lft 84057sec preferred_lft 84057sec
inet6 fe80::5054:ff:fee3:2777/64 scope link
valid_lft forever preferred_lft forever
# Remember, the host IP here is 10.0.2.15
$ sudo nsenter --net=/var/run/netns/netns0
# Try to ping the host's eth0
$ ping 10.0.2.15
connect: Network is unreachable
# Try connecting to the Internet
$ ping 8.8.8.8
connect: Network is unreachable
That is easy to explain: there is simply no route for such packets in netns0's routing table. The only entry there shows how to reach the 172.18.0.0/16 network:
# In the netns0 namespace:
$ ip route
172.18.0.0/16 dev ceth0 proto kernel scope link src 172.18.0.10
There are several ways Linux populates its routing tables. One of them is deriving routes directly from the network interface addresses. Remember that the routing table in netns0 was empty right after the namespace was created. But then we added the ceth0 device and assigned it the address 172.18.0.10/16. Since we used not a plain IP address but an address-and-subnet-mask combination, the network stack could derive a routing entry from it: every packet destined for the 172.18.0.0/16 network will be sent out through the ceth0 device, while all other packets will be dropped. Similarly, the root namespace got a new route:
# In the root namespace:
$ ip route
# ... Ignore irrelevant lines ...
172.18.0.0/16 dev veth0 proto kernel scope link src 172.18.0.11
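A handy way to double-check how these derived routes are used is ip route get, which asks the kernel which route it would pick for a given destination. A minimal check from inside netns0 (the output is abbreviated and may differ slightly between iproute2 versions):
$ sudo nsenter --net=/var/run/netns/netns0
$ ip route get 172.18.0.11
172.18.0.11 dev ceth0 src 172.18.0.10
$ ip route get 8.8.8.8
RTNETLINK answers: Network is unreachable
$ exit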
At this point we can answer the first question: we have learned how to isolate, virtualize, and interconnect Linux network stacks.
Interconnecting containers with a virtual network switch (bridge)
The whole driving force behind containerization is efficient resource sharing, so it is not very common to run only a single container per machine. Instead, the goal is to run as many isolated processes as possible in a shared environment. So what happens if we place several containers on the same host using the veth approach described above? Let's try adding a second container:
# From the root namespace
$ sudo ip netns add netns1
$ sudo ip link add veth1 type veth peer name ceth1
$ sudo ip link set ceth1 netns netns1
$ sudo ip link set veth1 up
$ sudo ip addr add 172.18.0.21/16 dev veth1
$ sudo nsenter --net=/var/run/netns/netns1
$ ip link set lo up
$ ip link set ceth1 up
$ ip addr add 172.18.0.20/16 dev ceth1
Check connectivity:
# From netns1 we cannot reach the root namespace!
$ ping -c 2 172.18.0.21
PING 172.18.0.21 (172.18.0.21) 56(84) bytes of data.
From 172.18.0.20 icmp_seq=1 Destination Host Unreachable
From 172.18.0.20 icmp_seq=2 Destination Host Unreachable
--- 172.18.0.21 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 55ms pipe 2
# Even though the route is there!
$ ip route
172.18.0.0/16 dev ceth1 proto kernel scope link src 172.18.0.20
# Leave netns1
$ exit
# The root namespace cannot reach netns1 either
$ ping -c 2 172.18.0.20
PING 172.18.0.20 (172.18.0.20) 56(84) bytes of data.
From 172.18.0.11 icmp_seq=1 Destination Host Unreachable
From 172.18.0.11 icmp_seq=2 Destination Host Unreachable
--- 172.18.0.20 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 23ms pipe 2
# But netns0 can still reach veth1
$ sudo nsenter --net=/var/run/netns/netns0
$ ping -c 2 172.18.0.21
PING 172.18.0.21 (172.18.0.21) 56(84) bytes of data.
64 bytes from 172.18.0.21: icmp_seq=1 ttl=64 time=0.037 ms
64 bytes from 172.18.0.21: icmp_seq=2 ttl=64 time=0.046 ms
--- 172.18.0.21 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 33ms
rtt min/avg/max/mdev = 0.037/0.041/0.046/0.007 ms
# ...while netns1 remains unreachable
$ ping -c 2 172.18.0.20
PING 172.18.0.20 (172.18.0.20) 56(84) bytes of data.
From 172.18.0.10 icmp_seq=1 Destination Host Unreachable
From 172.18.0.10 icmp_seq=2 Destination Host Unreachable
--- 172.18.0.20 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 63ms pipe 2
Ouch! Something is wrong... netns1 is in trouble: it cannot reach the root namespace, and it cannot be reached from the root namespace either. However, since both containers sit on the same IP network, 172.18.0.0/16, the host's veth1 device is still reachable from the netns0 container.
It takes a while to find the cause, but it is clearly a routing problem. Let's first inspect the root namespace's routing table:
$ ip route
# ... Ignore irrelevant lines ...
172.18.0.0/16 dev veth0 proto kernel scope link src 172.18.0.11
172.18.0.0/16 dev veth1 proto kernel scope link src 172.18.0.21
After the second veth pair was added, the root network stack learned the new route 172.18.0.0/16 dev veth1 proto kernel scope link src 172.18.0.21, but a route to that network already existed. When the second container tries to ping the veth1 address, the first matching route is selected, and connectivity breaks. If we deleted the first route (sudo ip route delete 172.18.0.0/16 dev veth0 proto kernel scope link src 172.18.0.11) and re-checked the connectivity, netns1 would start working, but then netns0 would be broken instead.
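The same ip route get trick shows which of the two conflicting routes the kernel actually picks. From the root namespace (illustrative output):
$ ip route get 172.18.0.20
172.18.0.20 dev veth0 src 172.18.0.11
# the packet destined for netns1's address leaves through veth0, the wrong device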
If we picked a different network segment for netns1, the problem would go away. However, multiple containers sitting on the same IP network is a perfectly legitimate use case, so we need to adjust the veth approach.
Luckily, Linux has a solution for that: the Linux bridge, yet another virtualized networking facility. A Linux bridge behaves like a network switch: it forwards packets between the interfaces connected to it. And since it is a switch, it does this forwarding at the L2 (Ethernet) level.
Let's try this tool. But first we have to clear the existing setup, because some of the previous configuration is no longer needed. Remove the network namespaces (and, in case anything is left over, the veth devices as well):
$ sudo ip netns delete netns0
$ sudo ip netns delete netns1
$ sudo ip link delete veth0
$ sudo ip link delete ceth0
$ sudo ip link delete veth1
$ sudo ip link delete ceth1
Now let's quickly re-create the two containers. Note that this time we do not assign any IP addresses to the new veth0 and veth1 devices:
$ sudo ip netns add netns0
$ sudo ip link add veth0 type veth peer name ceth0
$ sudo ip link set veth0 up
$ sudo ip link set ceth0 netns netns0
$ sudo nsenter --net=/var/run/netns/netns0
$ ip link set lo up
$ ip link set ceth0 up
$ ip addr add 172.18.0.10/16 dev ceth0
$ exit
$ sudo ip netns add netns1
$ sudo ip link add veth1 type veth peer name ceth1
$ sudo ip link set veth1 up
$ sudo ip link set ceth1 netns netns1
$ sudo nsenter --net=/var/run/netns/netns1
$ ip link set lo up
$ ip link set ceth1 up
$ ip addr add 172.18.0.20/16 dev ceth1
$ exit
Make sure there are no new routes on the host:
$ ip route
default via 10.0.2.2 dev eth0 proto dhcp metric 100
10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15 metric 100
Finally, create the bridge interface:
$ sudo ip link add br0 type bridge
$ sudo ip link set br0 up
And attach both veth0 and veth1 to the bridge:
$ sudo ip link set veth0 master br0
$ sudo ip link set veth1 master br0
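Before testing, it is worth double-checking that both host-side ends are indeed ports of br0 now. Either of the following read-only commands will do (output omitted):
$ ip link show master br0
$ bridge link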
Check the connectivity between the containers:
$ sudo nsenter --net=/var/run/netns/netns0
$ ping -c 2 172.18.0.20
PING 172.18.0.20 (172.18.0.20) 56(84) bytes of data.
64 bytes from 172.18.0.20: icmp_seq=1 ttl=64 time=0.259 ms
64 bytes from 172.18.0.20: icmp_seq=2 ttl=64 time=0.051 ms
--- 172.18.0.20 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 2ms
rtt min/avg/max/mdev = 0.051/0.155/0.259/0.104 ms
$ sudo nsenter --net=/var/run/netns/netns1
$ ping -c 2 172.18.0.10
PING 172.18.0.10 (172.18.0.10) 56(84) bytes of data.
64 bytes from 172.18.0.10: icmp_seq=1 ttl=64 time=0.037 ms
64 bytes from 172.18.0.10: icmp_seq=2 ttl=64 time=0.089 ms
--- 172.18.0.10 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 36ms
rtt min/avg/max/mdev = 0.037/0.063/0.089/0.026 ms
Wonderful! With this new scheme we did not need to configure veth0 and veth1 at all; only the two endpoints, ceth0 and ceth1, got IP addresses. But since both of them sit on the same Ethernet segment (remember, we attached them to the virtual switch), there is L2 connectivity between them:
$ sudo nsenter --net=/var/run/netns/netns0
$ ip neigh
172.18.0.20 dev ceth0 lladdr 6e:9c:ae:02:60:de STALE
$ exit
$ sudo nsenter --net=/var/run/netns/netns1
$ ip neigh
172.18.0.10 dev ceth1 lladdr 66:f3:8c:75:09:29 STALE
$ exit
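On the host side, the bridge maintains its own forwarding database, just like a physical switch learning MAC addresses on its ports. Right after the pings above it should contain an entry per container (a minimal check; the output is illustrative and learned entries expire after a while):
$ bridge fdb show br br0 | grep -v permanent
66:f3:8c:75:09:29 dev veth0 master br0
6e:9c:ae:02:60:de dev veth1 master br0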
Great: we have learned how to turn containers into good neighbors that do not interfere with each other but can still communicate.
Reaching the outside world (IP routing and masquerading)
Containers can now talk to each other. But can they talk to the host machine, i.e. the root namespace?
$ sudo nsenter --net=/var/run/netns/netns0
$ ping 10.0.2.15 # eth0 address
connect: Network is unreachable
This is obvious from netns0's routing table: there is simply no suitable route.
$ ip route
172.18.0.0/16 dev ceth0 proto kernel scope link src 172.18.0.10
The root namespace cannot reach the containers either:
# Leave netns0 first with `exit`:
$ ping -c 2 172.18.0.10
PING 172.18.0.10 (172.18.0.10) 56(84) bytes of data.
From 213.51.1.123 icmp_seq=1 Destination Net Unreachable
From 213.51.1.123 icmp_seq=2 Destination Net Unreachable
--- 172.18.0.10 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 3ms
$ ping -c 2 172.18.0.20
PING 172.18.0.20 (172.18.0.20) 56(84) bytes of data.
From 213.51.1.123 icmp_seq=1 Destination Net Unreachable
From 213.51.1.123 icmp_seq=2 Destination Net Unreachable
--- 172.18.0.20 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 3ms
To establish connectivity between the root namespace and the container namespaces, we need to assign an IP address to the bridge interface:
$ sudo ip addr add 172.18.0.1/16 dev br0
Once the bridge interface gets an IP address, the host's routing table gains one more route:
$ ip route
# ... Ignore irrelevant lines ...
172.18.0.0/16 dev br0 proto kernel scope link src 172.18.0.1
$ ping -c 2 172.18.0.10
PING 172.18.0.10 (172.18.0.10) 56(84) bytes of data.
64 bytes from 172.18.0.10: icmp_seq=1 ttl=64 time=0.036 ms
64 bytes from 172.18.0.10: icmp_seq=2 ttl=64 time=0.049 ms
--- 172.18.0.10 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 11ms
rtt min/avg/max/mdev = 0.036/0.042/0.049/0.009 ms
$ ping -c 2 172.18.0.20
PING 172.18.0.20 (172.18.0.20) 56(84) bytes of data.
64 bytes from 172.18.0.20: icmp_seq=1 ttl=64 time=0.059 ms
64 bytes from 172.18.0.20: icmp_seq=2 ttl=64 time=0.056 ms
--- 172.18.0.20 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 4ms
rtt min/avg/max/mdev = 0.056/0.057/0.059/0.007 ms
The containers can now probably ping the bridge interface as well, but they still cannot reach the host's eth0. For that, a default route has to be added to the containers:
$ sudo nsenter --net=/var/run/netns/netns0
$ ip route add default via 172.18.0.1
$ ping -c 2 10.0.2.15
PING 10.0.2.15 (10.0.2.15) 56(84) bytes of data.
64 bytes from 10.0.2.15: icmp_seq=1 ttl=64 time=0.036 ms
64 bytes from 10.0.2.15: icmp_seq=2 ttl=64 time=0.053 ms
--- 10.0.2.15 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 14ms
rtt min/avg/max/mdev = 0.036/0.044/0.053/0.010 ms
# Repeat the steps above for `netns1` as well
This change essentially turns the host machine into a router, and the bridge interface becomes the default gateway for the containers.
Very good, the containers are now connected to the root namespace. Let's keep going and try to connect them to the outside world. Linux has network packet forwarding (i.e. the router functionality) disabled by default, so we need to enable it first:
# In the root namespace
sudo bash -c 'echo 1 > /proc/sys/net/ipv4/ip_forward'
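The same setting can be applied via sysctl; if you want it to survive a reboot, put net.ipv4.ip_forward = 1 into /etc/sysctl.conf or a drop-in file under /etc/sysctl.d/:
$ sudo sysctl -w net.ipv4.ip_forward=1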
Check connectivity again:
$ sudo nsenter --net=/var/run/netns/netns0
$ ping 8.8.8.8
# hangs...
Still not working. What is missing? If the container could send packets to the outside world, the target server would not be able to send packets back, because the container's IP address is private: the routing rules for that particular IP are known only to the local network. Besides, lots of containers around the world share exactly the same private address, 172.18.0.10. The solution to this problem is called network address translation (NAT): before going out to the external network, packets originating from the containers get their source IP address replaced with the host's external address; the host also keeps track of all existing mappings so that it can restore the original IP addresses before forwarding reply packets back to the containers. It sounds complicated, but there is good news: the iptables module lets us do all of that with a single command:
$ sudo iptables -t nat -A POSTROUTING -s 172.18.0.0/16 ! -o br0 -j MASQUERADE
The command is fairly simple: it adds a new rule to the POSTROUTING chain of the nat table, telling the kernel to masquerade all packets that originate from the 172.18.0.0/16 network, except those leaving through the bridge interface.
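It is easy to verify what was just added, and, if the conntrack tool from conntrack-tools happens to be installed, even to watch the translations being created for the container's traffic (both commands are optional sanity checks):
$ sudo iptables -t nat -S POSTROUTING
-P POSTROUTING ACCEPT
-A POSTROUTING -s 172.18.0.0/16 ! -o br0 -j MASQUERADE
$ sudo conntrack -L --orig-src 172.18.0.10 2>/dev/null | head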
Check connectivity:
$ sudo nsenter --net=/var/run/netns/netns0
$ ping -c 2 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=61 time=43.2 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=61 time=36.8 ms
--- 8.8.8.8 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 2ms
rtt min/avg/max/mdev = 36.815/40.008/43.202/3.199 ms
Keep in mind that we are relying on a default "allow everything" policy here, which would be quite dangerous in a real-world setup. The host's default iptables policy is ACCEPT for every chain:
sudo iptables -S
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
Docker, by contrast, restricts everything by default and then enables routing only for the known paths.
For reference, these are the rules generated by the Docker daemon on a CentOS 8 machine for a single container with container port 5000 published on host port 5005:
$ sudo iptables -t filter --list-rules
-P INPUT ACCEPT
-P FORWARD DROP
-P OUTPUT ACCEPT
-N DOCKER
-N DOCKER-ISOLATION-STAGE-1
-N DOCKER-ISOLATION-STAGE-2
-N DOCKER-USER
-A FORWARD -j DOCKER-USER
-A FORWARD -j DOCKER-ISOLATION-STAGE-1
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A DOCKER -d 172.17.0.2/32 ! -i docker0 -o docker0 -p tcp -m tcp --dport 5000 -j ACCEPT
-A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j RETURN
-A DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP
-A DOCKER-ISOLATION-STAGE-2 -j RETURN
-A DOCKER-USER -j RETURN
$ sudo iptables -t nat --list-rules
-P PREROUTING ACCEPT
-P INPUT ACCEPT
-P POSTROUTING ACCEPT
-P OUTPUT ACCEPT
-N DOCKER
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
-A POSTROUTING -s 172.17.0.2/32 -d 172.17.0.2/32 -p tcp -m tcp --dport 5000 -j MASQUERADE
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
-A DOCKER -i docker0 -j RETURN
-A DOCKER ! -i docker0 -p tcp -m tcp --dport 5005 -j DNAT --to-destination 172.17.0.2:5000
$ sudo iptables -t mangle --list-rules
-P PREROUTING ACCEPT
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
-P POSTROUTING ACCEPT
$ sudo iptables -t raw --list-rules
-P PREROUTING ACCEPT
-P OUTPUT ACCEPT
Making containers reachable from the outside world (port publishing)
It is common knowledge that container ports can be published to some (or all) of the host's interfaces. But what exactly does port publishing mean?
Suppose we have a server running inside a container:
$ sudo nsenter --net=/var/run/netns/netns0
$ python3 -m http.server --bind 172.18.0.10 5000
If we try to send an HTTP request to this server from the host, everything works (well, there is connectivity between the root namespace and all the container interfaces, so of course it succeeds):
# From the root namespace
$ curl 172.18.0.10:5000
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
# ... Ignore irrelevant lines ...
But if we wanted to access this server from the outside world, which IP address would we use? The only IP address we know is the host's external interface address on eth0:
$ curl 10.0.2.15:5000
curl: (7) Failed to connect to 10.0.2.15 port 5000: Connection refused
So, we need a way to forward any packet arriving at port 5000 on the host's eth0 interface to the destination 172.18.0.10:5000. iptables to the rescue again!
# External traffic
sudo iptables -t nat -A PREROUTING -d 10.0.2.15 -p tcp -m tcp --dport 5000 -j DNAT --to-destination 172.18.0.10:5000
# Local traffic (it does not traverse the PREROUTING chain)
sudo iptables -t nat -A OUTPUT -d 10.0.2.15 -p tcp -m tcp --dport 5000 -j DNAT --to-destination 172.18.0.10:5000
Additionally, we need to enable iptables to intercept traffic on bridged networks:
sudo modprobe br_netfilter
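Loading br_netfilter exposes a few bridge-related sysctls; it is net.bridge.bridge-nf-call-iptables (enabled by default once the module is loaded) that makes bridged traffic traverse the iptables chains:
$ sysctl net.bridge.bridge-nf-call-iptables
net.bridge.bridge-nf-call-iptables = 1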
Test it:
curl 10.0.2.15:5000
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
# ... Ignore irrelevant lines ...
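At this point the whole single-host recipe fits into a handful of shell commands. Below is a rough, unhardened sketch of a helper script that wires one more "container" namespace onto the existing br0 bridge. It assumes that br0 already carries 172.18.0.1/16 and that IP forwarding and the MASQUERADE rule from the previous sections are in place; the namespace name and address passed on the command line are made up for the example:
#!/usr/bin/env bash
# Usage (illustrative): sudo ./attach-netns.sh netns2 172.18.0.30/16
set -euo pipefail

ns="$1"                     # e.g. netns2
addr="$2"                   # e.g. 172.18.0.30/16
host_if="veth-${ns}"
cont_if="ceth-${ns}"

ip netns add "${ns}"
ip link add "${host_if}" type veth peer name "${cont_if}"
ip link set "${host_if}" up
ip link set "${host_if}" master br0       # the host end becomes a bridge port
ip link set "${cont_if}" netns "${ns}"    # the container end moves into the namespace

ip -n "${ns}" link set lo up
ip -n "${ns}" link set "${cont_if}" up
ip -n "${ns}" addr add "${addr}" dev "${cont_if}"
ip -n "${ns}" route add default via 172.18.0.1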
Understanding Docker network drivers
What can we do with all this knowledge? For example, we can try to make sense of the Docker network drivers[1].
Let's start with the --network host mode. Try comparing the output of ip link with that of sudo docker run -it --rm --network host alpine ip link: they are almost identical! In host mode, Docker simply does not use network namespace isolation; the container works in the root network namespace and shares the network stack with the host machine.
The next mode is --network none. The output of sudo docker run -it --rm --network none alpine ip link shows only a single loopback interface. This is very similar to the network namespace we created earlier, before adding any veth devices.
And last but not least, the --network bridge (default) mode. This is exactly the scheme we built by hand above. You can play with the ip and iptables commands and inspect the network stack from both the host's and the containers' points of view.
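If you have Docker installed, a few read-only commands map its default bridge driver onto the pieces we built by hand. docker0 is the name of Docker's default bridge, and the exact addresses depend on your installation:
$ ip addr show docker0                  # the bridge itself, typically 172.17.0.1/16
$ ip link show master docker0           # the host-side veth ends attached to it
$ sudo iptables -t nat -S POSTROUTING   # the MASQUERADE rule for 172.17.0.0/16
$ sudo docker network inspect bridge    # Docker's own view of the same network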
Rootless containers and networking
A nice feature of the Podman container manager is its focus on rootless containers. However, as you have probably noticed, this article uses a lot of sudo: the network cannot be configured without root privileges. Podman's approach to networking for root containers[2] is very similar to Docker's. For rootless containers, however, Podman relies on the slirp4netns[3] project:
Starting with Linux 3.8, unprivileged users can create network_namespaces(7) along with user_namespaces(7). However, unprivileged network namespaces are not very useful on their own, because creating a veth(4) pair across the host and the network namespace still requires root privileges.
slirp4netns can connect a network namespace to the Internet in a completely unprivileged way, via a TAP device in the network namespace connected to a user-mode TCP/IP stack ("slirp").
Rootless networking is quite limited: "technically, the container itself does not have an IP address, because without root privileges network device association cannot be achieved. Moreover, pinging from a rootless container does not work, because it lacks the CAP_NET_RAW security capability that the ping command requires." But it is still much better than no connectivity at all.
Conclusion
The approach to organizing a container network presented in this article is just one of the possible approaches (probably the most widely used one). There are many other ways, implemented by official or third-party plugins, but all of them rely heavily on Linux network virtualization techniques[4]. Containerization can therefore rightfully be considered a virtualization technique.
Related links:
- https://docs.docker.com/network/#network-drivers
- https://www.redhat.com/sysadmin/container-networking-podman
- https://github.com/rootless-containers/slirp4netns
- https://developers.redhat.com/blog/2018/10/22/introduction-to-linux-interfaces-for-virtual-networking/
Original article: https://iximiuz.com/en/posts/container-networking-is-simple/