Exploring LXC Networking

Daily Dilemma

I work for two startup companies and recently I’ve been finding myself in various conversations about Docker and Linux Containers (LXC). Most of the time these conversations eventually arrive at the same question:

“Should we use Docker or LXC to run our production systems?”

The grumpy and cautious system administrator inside me is screaming “NO!!”, while my other (forever young) tech-geek alter ego is craving a resolute “HELL, YES!!”. Experience has taught me to be careful when adopting new technology. Docker is still very young, though dotCloud, who have apparently pivoted to Docker, have been using it to power their PaaS offering. After playing with Docker for a while I must admit it is awesome. There is no doubt about it. In the years I’ve spent in this industry I haven’t seen as much excitement about any technology as I’m seeing now around Docker. LXC is the technology Docker builds on, and it has been around for much longer. Some believe that containers are the tool to fix the failure of operating systems.

Anyway, considering all the factors that come into play, the answer to the above question, and to any decision like it, comes down to how well you understand the technology you are going to bet your business on, so that when trouble strikes you can operate and fix the issues in an effective and timely manner without shooting in every direction like a cowboy. Though I do admit that when panic strikes, that often happens anyway.

Very often we buckle under the hype of the “new and awesome” and start deploying all kinds of technologies without really understanding them, which then leads to a lot of unnecessary frustration and hatred. Yet, as the earlier link shows, it doesn’t have to, if we think before we do. I’m NOT against technological progress. I’m all for it, but let’s do it wisely and let’s understand new technologies properly before we set out to use them.

One of the (many) things which is not yet entirely clear to me, or to the people I speak with about this topic almost daily, is how networking can be set up and configured when using LXC. Hopefully this first blog post on the topic will shed some more light on the matter, and hopefully it will inspire further posts on various other container-related topics.

Setup

Whenever I’m looking for answers to my technical questions or learning a new technology I always prefer “hands-on” blog posts, so that’s the approach I’ve decided to take in this post as well. You can follow it step by step, and feel free to experiment as you work through the examples provided in this post.

All you need to follow the guide in this post is Vagrant – I used version 1.3.2, but for the purposes of this post any version 1.2+ will do (possibly even lower versions, but I’ve encountered some issues with older versions, so I don’t recommend them). In order to run Vagrant you need to put together a Vagrantfile. The one I used for this guide looks like this:

Vagrantfile
pkg_cmd = "apt-get update -qq && apt-get install -y vim curl python-software-properties; "
pkg_cmd << "add-apt-repository -y ppa:ubuntu-lxc/daily; "
pkg_cmd << "apt-get update -qq; apt-get install -y lxc"

VAGRANTFILE_API_VERSION = "2"

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  config.vm.box = "precise64"
  config.vm.box_url = "http://files.vagrantup.com/precise64.box"

  config.vm.provision :shell, :inline => pkg_cmd

  # Create a private network
  config.vm.network :private_network, ip: "10.0.4.2"

  # Create a public network
  config.vm.network :public_network
end

I decided to use the Ubuntu Precise Pangolin release as that’s the latest Ubuntu LTS available and it’s the OS the companies I work for run on their production servers.

As for the LXC tools, I went a bit (only seemingly) “crazy” and decided to install the latest available LXC tools from the development PPA, which contains daily LXC package builds. Place your Vagrantfile into some working directory, run vagrant up and off we go.

If you used the same Vagrantfile as me, then before the Virtual Machine (VM) boots vagrant will ask which network interface on your workstation you want to bridge the VirtualBox one with. Pick any active interface on the workstation you are running this on, i.e. the one you’re using for networking (which is hopefully on a private network). vagrant up downloads the Vagrant box image (which is an OS image) if it’s not already present on your workstation, boots it and runs the commands specified in pkg_cmd once the VM is up and running.

Once the whole setup has finished, you can run vagrant ssh in the same directory you ran vagrant up in. This will log you on to the newly created VM via ssh and you should see the latest LXC packages installed and ready to be used:

vagrant@precise64:~$ dpkg -l|grep -i lxc
ii  liblxc0                         1.0.0~alpha2+master~20131108-2200-0ubuntu1~ppa1~precise1 Linux Containers userspace tools (library)
ii  lxc                             1.0.0~alpha2+master~20131108-2200-0ubuntu1~ppa1~precise1 Linux Containers userspace tools
ii  lxc-templates                   1.0.0~alpha2+master~20131108-2200-0ubuntu1~ppa1~precise1 Linux Containers userspace tools (templates)
ii  python3-lxc                     1.0.0~alpha2+master~20131108-2200-0ubuntu1~ppa1~precise1 Linux Containers userspace tools (Python 3.x bindings)
vagrant@precise64:~$

The actual version tags (such as the timestamp) of the packages above may differ for you, as you’ll be installing the latest ones available at the time you follow this guide. As for the networking part of the setup, you should end up with something like the output below, though obviously the IP and MAC addresses will be different on your workstation (for easier readability I’ve added an extra newline after each interface):

vagrant@precise64:~$ sudo ip address list
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:88:0c:a6 brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.15/24 brd 10.0.2.255 scope global eth0
    inet6 fe80::a00:27ff:fe88:ca6/64 scope link
       valid_lft forever preferred_lft forever

3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:2e:8a:7a brd ff:ff:ff:ff:ff:ff
    inet 10.0.4.2/24 brd 10.0.4.255 scope global eth1
    inet6 fe80::a00:27ff:fe2e:8a7a/64 scope link
       valid_lft forever preferred_lft forever

4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:f9:8f:2e brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.135/24 brd 192.168.1.255 scope global eth2
    inet6 fe80::a00:27ff:fef9:8f2e/64 scope link
       valid_lft forever preferred_lft forever

5: lxcbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    link/ether ee:5d:90:3b:26:d0 brd ff:ff:ff:ff:ff:ff
    inet 10.0.3.1/24 brd 10.0.3.255 scope global lxcbr0
    inet6 fe80::85f:69ff:fe8e:9df8/64 scope link
       valid_lft forever preferred_lft forever
vagrant@precise64:~$

Quick explanation of the above:

  • lo – loopback interface (more on this later)
  • eth0 – default network interface created by Vagrant
  • eth1 – private network interface which Vagrant created as instructed by our Vagrantfile configuration :private_network
  • eth2 – public network interface which Vagrant created as instructed by our Vagrantfile configuration :public_network
  • lxcbr0 – network bridge which was created automatically when the LXC tools were installed

The installation of the LXC tools hides a small subtlety you might want to be aware of: the creation of an iptables rule. The rule masquerades all the traffic leaving containers bridged to lxcbr0, i.e. those on the 10.0.3.0/24 LAN, so that you can reach the “outside” world from inside these containers:

vagrant@precise64:~$ sudo iptables -t nat -nL
Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination

Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination
MASQUERADE  all  --  10.0.3.0/24         !10.0.3.0/24
vagrant@precise64:~$

The above iptables rule is also automatically created on boot via the lxc-net upstart job, which you can easily verify by looking at the relevant upstart configuration file: /etc/init/lxc-net.conf.
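For the curious, the masquerading rule above is roughly what you would create by hand with a single iptables command. The snippet below is a simplified sketch of what lxc-net does, not a copy of the upstart job itself, and the /etc/default/lxc variables mentioned in the comments are from my install and may look different on other releases:

# Simplified sketch of the masquerading rule lxc-net sets up at boot:
sudo iptables -t nat -A POSTROUTING -s 10.0.3.0/24 ! -d 10.0.3.0/24 -j MASQUERADE
# The bridge name and addressing are driven by variables such as LXC_BRIDGE
# and LXC_ADDR in /etc/default/lxc on this Ubuntu release.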

LXC Network Types

As the official LXC Ubuntu manual pages mention, there are 5 network virtualization types available for use with LXC:

  • empty
  • veth
  • macvlan
  • vlan
  • phys

Each of them works with different concepts and requires some knowledge of Linux networking. I hope my explanation with practical examples will give you a better understanding of these networking types. Enough talking, let’s get the party started!

Empty

Let’s create our first container to explore the empty network type. We will create a new container by running the lxc-create command, which can use the ubuntu template script shipped with the lxc-templates package. LXC templates are bash scripts which make creating containers super easy. You can have a look at what’s “hiding” inside the ubuntu one (which is the one we will be using in this guide) at the following path: /usr/share/lxc/templates/lxc-ubuntu. You can specify a particular Ubuntu version via the -r command line switch passed to the template script, as you can see below:

vagrant@precise64:~$ sudo lxc-create -t ubuntu -n empty01 -- -r precise

The above command uses the debootstrap package to download and install a particular Ubuntu OS image into the /var/lib/lxc directory on the host machine. Once the command has successfully finished you should be able to see that the new image has been installed and is ready to be used:

vagrant@precise64:~$ sudo ls -1 /var/lib/lxc/
empty01
vagrant@precise64:~$ sudo ls -l /var/lib/lxc/empty01/
total 12
-rw-r--r--  1 root root 1516 Nov 10 19:37 config
-rw-r--r--  1 root root  329 Nov 10 19:29 fstab
drwxr-xr-x 22 root root 4096 Nov 10 19:38 rootfs
vagrant@precise64:~$

Every container created with the lxc-create utility gets a configuration file in the container’s path (/var/lib/lxc/CONTAINERNAME/config). The configuration file is then used by another utility called lxc-start for starting the container. We need to modify this file to use the empty network type. The new network-related configuration looks like this:

vagrant@precise64:~$ sudo grep network /var/lib/lxc/empty01/config
lxc.network.type = empty
lxc.network.hwaddr = 00:16:3e:67:4f:a5
lxc.network.flags = up
vagrant@precise64:~$

Now that we have our networking set up, let’s start the container and, after a short time (give it a moment to boot), check if it’s running (note the -d switch which starts the container in the background – otherwise you would see all the Linux boot logs streaming to your standard output):

vagrant@precise64:~$ sudo lxc-start -n empty01 -d
vagrant@precise64:~$ sudo lxc-ls --fancy
NAME     STATE    IPV4  IPV6  AUTOSTART
---------------------------------------
empty01  RUNNING  -     -     NO
vagrant@precise64:~$

As you can see in the above output, the container is up and running, but it doesn’t have any IP address assigned. This is exactly what we should expect, since we specified the empty network type, which according to the documentation “creates only the loopback interface”. The newly created loopback interface is not visible on the host when running sudo ip link list as it’s in a different network namespace.

I will talk more about Linux namespacing in another blog post, but if you are impatient and want to know more now, there is already a great blog post written by Jérôme Petazzoni which explains it very well. The short explanation is that namespacing is functionality in the Linux kernel, exposed through just 3 system calls (clone(), unshare() and setns()), which allows various kernel subsystems on the host to be isolated.
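If you want a quick taste of a network namespace without creating any containers, here is a minimal sketch using the unshare(1) wrapper from util-linux (assuming your version supports the --net flag):

# Run a single command inside a brand new network namespace:
sudo unshare --net -- ip link list
# You should only see a lone loopback interface in DOWN state, i.e. a
# completely empty, freshly created network stack.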

You can verify that a new network namespace has been created for the container by checking its init process’ namespace entry under /proc. Make sure you are checking the container’s init process PID entry (2nd column on the 2nd line in the output below) and NOT the lxc-start process one, as lxc-start is the parent process which forks the container’s init process:

vagrant@precise64:~$ ps faux | grep -A 1 "lxc-start -n empty01 -d"
root      20465  0.0  0.3  31952  1172 ?        Ss   19:38   0:00 lxc-start -n empty01 -d
root      20469  0.0  0.5  24076  2040 ?        Ss   19:38   0:00  \_ /sbin/init
vagrant@precise64:~$
vagrant@precise64:~$ sudo ls -l /proc/20469/ns/net
-r-------- 1 root root 0 Nov 10 19:39 /proc/20469/ns/net
vagrant@precise64:~$

We can log on to the container using lxc-console, which attaches to one of the container’s ttys. The user/password is ubuntu/ubuntu. We can now verify that the loopback device has been created correctly:

vagrant@precise64:~$ sudo lxc-console -n empty01 -t 2

Inside the container, simply run ip address list:

ubuntu@empty01:~$ sudo ip address list
6: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
ubuntu@empty01:~$ 

We didn’t actually even have to log on to the container to verify this. We could simply switch into the container’s network namespace directly on the host, but let’s leave that for later. Alternatively, we could run lxc-attach -- /sbin/ip address list, but due to some namespace implementation issues in the kernel I’m running this guide on, lxc-attach does not seem to work properly, hence I’m using lxc-console. Stéphane Graber, one of the LXC project core developers, recommends in the comments below using a kernel version >= 3.8, with which lxc-attach works just fine.
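If you are impatient, here is a minimal sketch of that namespace switch done directly on the host, reusing the container init PID we found above (20469 in my case, yours will differ); it simply exposes the container’s network namespace to the ip netns machinery via a symlink:

sudo mkdir -p /var/run/netns
sudo ln -s /proc/20469/ns/net /var/run/netns/empty01
sudo ip netns exec empty01 ip address list
sudo rm /var/run/netns/empty01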

Before we dive into a discussion of how useful the empty network type is in combination with a container, I’m going to talk a bit about what the loopback device actually is, as a lot of people I come in touch with simply accept its existence but don’t really know even the basic implementation details.

The loopback device is a virtual network device implemented entirely in software, inside the kernel, which the host uses to communicate with itself. The device (interface) is assigned the special non-routable IP address block 127.0.0.0/8, that is, the IP addresses 127.0.0.1 through 127.255.255.254 can be used to communicate with the host itself. Linux distributions also “alias” one of these IP addresses in /etc/hosts with an entry called localhost.

How does the actual communication work when we know that the assigned IP address block is non-routable? It works simply by creating special entries in the host’s routing table which route all packets destined to 127.0.0.0/8 back to the loopback device lo:

ubuntu@empty01:~$ sudo ip route show table local
broadcast 127.0.0.0 dev lo  proto kernel  scope link  src 127.0.0.1
local 127.0.0.0/8 dev lo  proto kernel  scope host  src 127.0.0.1
local 127.0.0.1 dev lo  proto kernel  scope host  src 127.0.0.1
broadcast 127.255.255.255 dev lo  proto kernel  scope link  src 127.0.0.1
ubuntu@empty01:~$

How is this useful? Mostly for diagnostics, troubleshooting, testing and when you don’t want your service to be exposed on any network. Imagine you want to test different PostgreSQL configurations which bind to the localhost IP address on the same TCP port. Now, you can argue that you could copy the configuration files and run each instance on a different TCP port on the same host, but why all that “hassle” when you can simply run the PostgreSQL daemon in a container and avoid the “interference” with other PostgreSQL instances bound to the same IP address and port. Obviously, this is just one of many use cases. I’m sure you will be able to come up with many others. I’d like to hear about them in the comments below.

Veth

In order to explore the veth network type we will create a new container, but with different network settings. Let’s not waste any more time and fire up another lxc-create command:

vagrant@precise64:~$ sudo lxc-create -t ubuntu -n veth01 -- -r precise
Checking cache download in /var/cache/lxc/precise/rootfs-amd64 ...
Copy /var/cache/lxc/precise/rootfs-amd64 to /var/lib/lxc/veth01/rootfs ...
Copying rootfs to /var/lib/lxc/veth01/rootfs ...

As you must have noticed, the container creation takes much less time than when we created our first container, as debootstrap caches the previously downloaded image in the directory you can see in the output above and then simply copies it over to the new container’s filesystem. Of course, if the second container were created with a different template than the first one, it would take longer, as a whole new OS image would have to be downloaded.

Now, before we look into the container’s network configuration, let’s talk a little bit about what the veth network type actually is. LXC documentation says:

a peer network device is created with one side assigned to the container and the other side is attached to a bridge specified by the lxc.network.link

The peer device can be understood as a pair of fake Ethernet devices that act as a pipe, i.e. traffic sent via one interface comes out the other one. As these devices are Ethernet devices and not point-to-point devices, you can handle broadcast traffic on them and use protocols other than IP – you are basically protocol independent on top of Ethernet. For the purpose of this guide we will only focus on the IP protocol.

Armed with this knowledge, what we should expect when running the container with the veth network type enabled, is to have one network interface created on the host and the other one in the container, where the container interface will be “hidden” in the container’s network namespace. The host’s interface will then be bridged to the bridge created on the host if so configured.

Let’s proceed with the container’s network configuration modifications so that it looks as below:

vagrant@precise64:~$ sudo grep network /var/lib/lxc/veth01/config
lxc.network.type = veth
lxc.network.hwaddr = 00:16:3e:7e:11:ac
lxc.network.flags = up
lxc.network.link = lxcbr0
vagrant@precise64:~$ 

As you’ve probably noticed, the default container configuration file generated by lxc-create is already set to use the veth network type, so you shouldn’t need to make any modifications to the veth01 container’s configuration if you followed this guide carefully. Now let’s start our new container!

vagrant@precise64:~$ sudo lxc-start -n veth01 -d
vagrant@precise64:~$ sudo lxc-ls --fancy
NAME     STATE    IPV4        IPV6  AUTOSTART
---------------------------------------------
empty01  RUNNING  -           -     NO
veth01   RUNNING  10.0.3.118  -     NO
vagrant@precise64:~$ 

Brilliant! The container is running and it has been assigned the 10.0.3.118 IP address automatically. How is the IP address assigned? In order to understand that, we need to understand how the container is actually created. In very simplified terms, this is what happens on the host in terms of networking:

  1. A pair of veth devices is created on the host. The future container’s network device is then configured via the DHCP server running on the configured bridge’s interface (in this case that’s lxcbr0). You can verify this by running sudo netstat -ntlp|grep 53 on the host and you will see dnsmasq, which provides DHCP and DNS for the bridge, listening on the lxcbr0 IP address (see the quick check after this list). The bridge’s IP address will serve as the container’s default gateway as well as its nameserver
  2. The host part of the veth device pair is bridged to the configured bridge as per the container configuration – as I said, in this case that is lxcbr0
  3. The “slave” part of the pair is then “sent” to the container (including its configuration) and renamed to eth0
  4. Once the container’s init process is started it brings up the eth0 interface in the container and we can start networking!
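As a quick aside, you can peek at that dnsmasq instance on the host before logging into any container; a small check (output omitted here) could look like this:

# dnsmasq bound to lxcbr0 answers DNS on port 53 and DHCP on UDP port 67:
sudo netstat -ntlp | grep 53
sudo netstat -nulp | grep dnsmasq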

In other words, the above 4 steps bridge the container’s network namespace with the host network stack via a veth device pair, so you should be able to communicate with the container directly from the host. Let’s send it a couple of pings from the host and see if that’s the case:

vagrant@precise64:~$ ping -c 2 10.0.3.118
PING 10.0.3.118 (10.0.3.118) 56(84) bytes of data.
64 bytes from 10.0.3.118: icmp_req=1 ttl=64 time=0.220 ms
64 bytes from 10.0.3.118: icmp_req=2 ttl=64 time=0.130 ms

--- 10.0.3.118 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.130/0.175/0.220/0.045 ms
vagrant@precise64:~$

Awesome! Life is beautiful! But because we are curious, we will log on to the veth01 container, have a poke around and verify whether the theory we discussed earlier has practical backing by checking the network configuration of the container. You can log on to the veth01 container either via ssh, which runs in the container (ssh is installed by default when you create the container by following the steps in this guide), or via lxc-console as we did when we introduced the empty networking type:

ubuntu@veth01:~$ sudo ip address list
7: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:16:3e:7e:11:ac brd ff:ff:ff:ff:ff:ff
    inet 10.0.3.118/24 brd 10.0.3.255 scope global eth0
    inet6 fe80::216:3eff:fe7e:11ac/64 scope link
       valid_lft forever preferred_lft forever

9: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
ubuntu@veth01:~$
ubuntu@veth01:~$ sudo ip route show
default via 10.0.3.1 dev eth0  metric 100
10.0.3.0/24 dev eth0  proto kernel  scope link  src 10.0.3.118
ubuntu@veth01:~$ grep nameserver /etc/resolv.conf
nameserver 10.0.3.1
nameserver 10.0.2.3
nameserver 192.168.1.254
ubuntu@veth01:~$

The container’s side of the veth pair looks as we expected: eth0 has the correct IP assigned, and its networking is configured to use the IP address of the lxcbr0 bridge as gateway and nameserver. Let’s check the host’s side:

vagrant@precise64:~$ sudo ip address list
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:88:0c:a6 brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.15/24 brd 10.0.2.255 scope global eth0
    inet6 fe80::a00:27ff:fe88:ca6/64 scope link
       valid_lft forever preferred_lft forever

3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:2e:8a:7a brd ff:ff:ff:ff:ff:ff
    inet 10.0.4.2/24 brd 10.0.4.255 scope global eth1
    inet6 fe80::a00:27ff:fe2e:8a7a/64 scope link
       valid_lft forever preferred_lft forever

4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:f9:8f:2e brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.135/24 brd 192.168.1.255 scope global eth2
    inet6 fe80::a00:27ff:fef9:8f2e/64 scope link
       valid_lft forever preferred_lft forever

5: lxcbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    link/ether ee:5d:90:3b:26:d0 brd ff:ff:ff:ff:ff:ff
    inet 10.0.3.1/24 brd 10.0.3.255 scope global lxcbr0
    inet6 fe80::85f:69ff:fe8e:9df8/64 scope link
       valid_lft forever preferred_lft forever

8: vethD9YPJ0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master lxcbr0 state UP qlen 1000
    link/ether fe:25:26:02:77:25 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc25:26ff:fe02:7725/64 scope link
       valid_lft forever preferred_lft forever
vagrant@precise64:~$

The vethD9YPJ0 device has been created as the host’s side of the veth network pair (remember, the other side of the pair is in the container).

Let’s see if the Host’s side veth device is bridged to the lxcbr0 bridge:

vagrant@precise64:~$ sudo brctl show
bridge name   bridge id       STP enabled interfaces
lxcbr0        8000.febc614cdc21   no      vethD9YPJ0
vagrant@precise64:~$ 

BINGO! Looks like all is as we expected it to be and we can sleep well at night knowing that we learnt something new. Well, at least I hope we did :-)

Now, let’s take this a bit further and explore Network Namespaces for a bit and try to simulate something similar to what is happening on the network level when the veth01 container is created. We won’t dive into actual Kernel namespacing implementation, but we will play around with the userspace tools a bit.

The task we will work on now is to create a veth pair of devices, one of which will be in a different network namespace than the other, and then bridge the host-side device to the same bridge the veth01 container is bridged to. In other words, let’s create a separate network stack in a separate network namespace from the host’s, without all the “boilerplate” which comes with the creation of a container. Let’s pretend we are network experts who need to perform some network activities and don’t need any containers – just separate network stacks on the host.

One way of completing this task is simply to perform the following steps:

  1. create new network namespace
  2. create veth pair of network devices in the new namespace
  3. configure the “host isolated” device
  4. pass the other side of the pair back to the Host’s namespace
  5. bridge the Host’s veth pair device to the lxcbr0 bridge

Simple! So let’s roll our sleeves up and start working through the above plan. The result should be a pingable IP address in a separate network namespace. All the following steps are performed on the host.

Let’s create a directory where the network namespaces are read from:

vagrant@precise64:~$ sudo mkdir -p /var/run/netns

Let’s create a new network namespace and verify it was created (the ip command has a special netns subcommand for working with network namespaces):

vagrant@precise64:~$ sudo ip netns add mynamespace
vagrant@precise64:~$ sudo ip netns list
mynamespace
vagrant@precise64:~$ sudo ls -l /var/run/netns/
total 0
-r-------- 1 root root 0 Nov 10 22:24 mynamespace
vagrant@precise64:~$ 

Awesome! We are set for a good Linux networking ride! We can check the box next to step (1) in the plan above.

Now let’s switch to the newly created namespace, create a pair of veth devices and configure one of them to use the 10.0.3.78/24 IP address:

vagrant@precise64:~$ sudo ip netns exec mynamespace bash
root@precise64:~# ip link add vethMYTEST type veth peer name eth0
root@precise64:~# ip link list
10: lo: <LOOPBACK> mtu 16436 qdisc noop state DOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

11: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 82:6b:b3:08:36:34 brd ff:ff:ff:ff:ff:ff

12: vethMYTEST: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether ea:6a:c3:f0:62:d7 brd ff:ff:ff:ff:ff:ff
root@precise64:~#

As you can see, there are 3 separate network devices in the new namespace: the lo loopback device, and the pair of veth devices we just created. None of them has been assigned an IP address yet:

root@precise64:~# ip address list
10: lo: <LOOPBACK> mtu 16436 qdisc noop state DOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

11: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 82:6b:b3:08:36:34 brd ff:ff:ff:ff:ff:ff

12: vethMYTEST: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether ea:6a:c3:f0:62:d7 brd ff:ff:ff:ff:ff:ff
root@precise64:~#

Let’s assign an IP address to the eth0 interface from the same IP range lxcbr0 is on, and bring it up:

root@precise64:~# ip address add 10.0.3.78/24 dev eth0
root@precise64:~# ip link set eth0 up
root@precise64:~# ip address list
10: lo: <LOOPBACK> mtu 16436 qdisc noop state DOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

11: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN qlen 1000
    link/ether 82:6b:b3:08:36:34 brd ff:ff:ff:ff:ff:ff
    inet 10.0.3.78/24 scope global eth0

12: vethMYTEST: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether ea:6a:c3:f0:62:d7 brd ff:ff:ff:ff:ff:ff
root@precise64:~#

Brilliant! We can now check the boxes next to steps (2) and (3). Let’s proceed with step (4) and move the vethMYTEST device to the Host’s namespace. We can do that like so:

root@precise64:~# ip link set vethMYTEST netns 1

The vethMYTEST device should now be present in the host’s network namespace, so we should be able to bring it up on the host. But first, let’s exit the shell in the “mynamespace” network namespace (I know I’ve picked a horrible name for it) and then bring the device up:

root@precise64:~# exit
exit
vagrant@precise64:~$ sudo ip link set vethMYTEST up
vagrant@precise64:~$ sudo ip link list vethMYTEST
12: vethMYTEST: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether ea:6a:c3:f0:62:d7 brd ff:ff:ff:ff:ff:ff
vagrant@precise64:~$

Until we bridge the vethMYTEST device with the lxcbr0 bridge, the “network isolated” IP address should not be accessible. We can verify that very easily:

vagrant@precise64:~$ ping -c 2 10.0.3.78
PING 10.0.3.78 (10.0.3.78) 56(84) bytes of data.
From 10.0.3.1 icmp_seq=1 Destination Host Unreachable
From 10.0.3.1 icmp_seq=2 Destination Host Unreachable

--- 10.0.3.78 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 1009ms
pipe 2
vagrant@precise64:~$ 

Now let’s do some bridging! We have an awesome brctl utility for this so let’s use it:

vagrant@precise64:~$ sudo brctl addif lxcbr0 vethMYTEST
vagrant@precise64:~$ sudo brctl show
bridge name   bridge id       STP enabled interfaces
lxcbr0        8000.3234ea7e8ace   no      vethD9YPJ0
                                                    vethMYTEST
vagrant@precise64:~$

Now that we can check the box next to step (5) we should be able to access the “mynamespace” isolated IP address. So let’s verify that claim:

vagrant@precise64:~$ ping -c 2 10.0.3.78
PING 10.0.3.78 (10.0.3.78) 56(84) bytes of data.
64 bytes from 10.0.3.78: icmp_req=1 ttl=64 time=0.094 ms
64 bytes from 10.0.3.78: icmp_req=2 ttl=64 time=0.080 ms

--- 10.0.3.78 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.080/0.087/0.094/0.007 ms
vagrant@precise64:~$

Boom! We are done! Great job everyone!
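If you want to tidy up after this experiment, a minimal clean-up sketch could look like the following; note that deleting one end of a veth pair removes its peer as well:

sudo brctl delif lxcbr0 vethMYTEST
sudo ip link delete vethMYTEST
sudo ip netns delete mynamespace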

Before we conclude this part of the post, I will list a few use cases veth interfaces can be used for (a minimal sketch of the first one follows the list):

  • create virtual networks between containers – by linking them via various different bridges
  • provide a routed link for a container – for routing packets to the “outside” world with the help of iptables, as mentioned earlier in this post
  • emulate bridged networks
  • test pretty much any network topology
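To illustrate the first use case, here is a minimal, hypothetical sketch of a private inter-container network on its own bridge; the bridge name and addresses below are made up for the example:

# Create a dedicated bridge on the host for a group of containers:
sudo brctl addbr appbr0
sudo ip link set appbr0 up
# Then point each container's config at it, e.g.:
#   lxc.network.type = veth
#   lxc.network.link = appbr0
#   lxc.network.flags = up
#   lxc.network.ipv4 = 10.0.6.2/24   # unique per container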

As always, I’m going to leave this to your imagination and creativity. Feel free to leave the suggestions in the comments. And now, let’s move on to the next network type available in containers – macvlan.

Macvlan

Another network type I will talk about in this guide is macvlan. But before we get to work, let’s get familiar with a little bit of theory first. macvlan – or, for easier understanding, MAC VLAN – is a way to take a single network interface and create multiple virtual network interfaces with different MAC addresses assigned to them.

This is a “many-to-one” mapping. Linux VLANs (i.e. not MAC VLANs), which are capable of taking a single network interface and mapping it to multiple virtual networks, provide a “one-to-many” mapping. We can obviously combine Linux VLANs with MAC VLANs if we want to.

MAC VLAN allows each configured “slave” device to be in one of three modes:

  • PRIVATE – the device never communicates with any other device on the “upper_dev” (i.e. the “master” device), which means that all incoming packets on the “slave” virtual interface are dropped if their source MAC address matches one of the MAC VLAN interfaces – i.e. the “slaves” can’t communicate with each other

  • VEPA – Virtual Ethernet Port Aggregator is a MAC VLAN mode that aggregates virtual machine packets on the server before the resulting single stream is transmitted to the switch. When using VEPA we assume that the adjacent bridge returns all frames where both source and destination are local to the macvlan port, i.e. the bridge is set up as a reflective relay. This mode of operation is called “hairpin mode” and it must be supported by the upstream switch, not the Linux kernel itself. As such, we forward all traffic out to the switch even if it is destined for us and rely on the switch at the other end to send it back, so the isolation is done on the switch, not in the Linux kernel.

  • BRIDGE – provides the behavior of a simple bridge between different macvlan interfaces on the same port. Frames from one interface to another one get delivered directly and are not sent out externally.

What do the above modes mean for LXC networking? Let’s have a look:

  • PRIVATE mode disallows any communication between LXC containers

  • VEPA mode isolates the containers from one another – UNLESS you have an upstream switch configured to work as a reflective relay, in which case you CAN address containers directly. On top of that, traffic destined for a different MAC VLAN on the same interface (i.e. on the same bridge) will travel through the physical (“master”) interface twice – once when it leaves (egresses) the interface, where it is switched and sent back, and again when it enters (ingresses) via the SAME interface – which means that this can affect the available physical bandwidth and also restricts inter-MAC-VLAN traffic to the speed of the physical connection

  • BRIDGE mode creates a special bridge (“pseudo” bridge – not the same bridge as a standard Linux bridge!) which allows the containers to talk to one another but isolates “pseudo-bridged” interfaces from the host

As VEPA requires a specially configured (reflective relay) switch to demonstrate its functionality, and I could not find a way to configure a Linux bridge in reflective relay mode, I will not deal with this mode in this guide. PRIVATE mode doesn’t provide much fun to play with, so I won’t be touching on it either. That leaves us with BRIDGE mode, so let’s hurry up and have some fun.

Bridge mode

To demonstrate how this mode works, we will create 2 LXC containers which we will bridge over a manually created bridge, using the bridge mode of the macvlan network type. Let’s go ahead and create a new bridge on the host – we don’t need to assign any IP address to it, as assigning an IP address to the bridge will not affect the results of this test, if you read the above theory carefully:

vagrant@precise64:~$ sudo brctl addbr lxcbr1
vagrant@precise64:~$ sudo ifconfig lxcbr1 up
vagrant@precise64:~$ sudo brctl show
bridge name   bridge id       STP enabled interfaces
lxcbr0        8000.3234ea7e8ace   no      vethD9YPJ0
                                          vethMYTEST
lxcbr1        8000.000000000000   no      
vagrant@precise64:~$ 

Let’s create the containers now and configure them to use macvlan network type in bridge macvlan mode. Let’s also assign each of them an IP address so they can communicate with each other:

vagrant@precise64:~$ sudo lxc-create -t ubuntu -n macvlanbridge01 -- -r precise
vagrant@precise64:~$ sudo lxc-create -t ubuntu -n macvlanbridge02 -- -r precise
vagrant@precise64:~$ sudo grep network /var/lib/lxc/macvlanbridge01/config
lxc.network.type = macvlan
lxc.network.macvlan.mode = bridge
lxc.network.hwaddr = 00:16:3e:48:35:d2
lxc.network.flags = up
lxc.network.link = lxcbr1
lxc.network.ipv4 = 10.0.5.3/24
vagrant@precise64:~$
vagrant@precise64:~$ sudo grep network /var/lib/lxc/macvlanbridge02/config
lxc.network.type = macvlan
lxc.network.macvlan.mode = bridge
lxc.network.hwaddr = 00:16:3e:69:b3:4d
lxc.network.flags = up
lxc.network.link = lxcbr1
lxc.network.ipv4 = 10.0.5.4/24
vagrant@precise64:~$

We can now fire them up and start playing with them. As you can see below, both containers are now up and running and have been assigned IP addresses as per the configuration above:

vagrant@precise64:~$ sudo lxc-start -n macvlanbridge01 -d
vagrant@precise64:~$ sudo lxc-start -n macvlanbridge02 -d
vagrant@precise64:~$ sudo lxc-ls --fancy
NAME             STATE    IPV4           IPV6  AUTOSTART
--------------------------------------------------------
empty01          RUNNING  -              -     NO
macvlanbridge01  RUNNING  10.0.5.3       -     NO
macvlanbridge02  RUNNING  10.0.5.4       -     NO
veth01           RUNNING  10.0.3.118     -     NO
vagrant@precise64:~$

If the theory discussed at the beginning of this subchapter is correct we should not be able to access any of the newly created containers from the host. So let’s go ahead and verify this by simple ping tests:

vagrant@precise64:~$ ping -c 2 10.0.5.3
PING 10.0.5.3 (10.0.5.3) 56(84) bytes of data.

--- 10.0.5.3 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1000ms

vagrant@precise64:~$ ping -c 2 10.0.5.4
PING 10.0.5.4 (10.0.5.4) 56(84) bytes of data.

--- 10.0.5.4 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1008ms

vagrant@precise64:~$

On the other hand, containers should be able to communicate with each other. So let’s go ahead and make a few tests from inside the containers.

Test performed on the first container:

ubuntu@macvlanbridge01:~$ sudo ip address list
14: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:16:3e:48:35:d2 brd ff:ff:ff:ff:ff:ff
    inet 10.0.5.3/24 brd 10.0.5.255 scope global eth0
    inet6 fe80::216:3eff:fe48:35d2/64 scope link
       valid_lft forever preferred_lft forever

15: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
ubuntu@macvlanbridge01:~$
ubuntu@macvlanbridge01:~$ ping -c 3 10.0.5.4
PING 10.0.5.4 (10.0.5.4) 56(84) bytes of data.
64 bytes from 10.0.5.4: icmp_req=1 ttl=64 time=0.106 ms
64 bytes from 10.0.5.4: icmp_req=2 ttl=64 time=0.080 ms
64 bytes from 10.0.5.4: icmp_req=3 ttl=64 time=0.118 ms

--- 10.0.5.4 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.080/0.101/0.118/0.017 ms
ubuntu@macvlanbridge01:~$

Test performed on the second container:

ubuntu@macvlanbridge02:~$ sudo ip address list
16: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:16:3e:69:b3:4d brd ff:ff:ff:ff:ff:ff
    inet 10.0.5.4/24 brd 10.0.5.255 scope global eth0
    inet6 fe80::216:3eff:fe69:b34d/64 scope link
       valid_lft forever preferred_lft forever

17: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
ubuntu@macvlanbridge02:~$
ubuntu@macvlanbridge02:~$ ping -c 3 10.0.5.3
PING 10.0.5.3 (10.0.5.3) 56(84) bytes of data.
64 bytes from 10.0.5.3: icmp_req=1 ttl=64 time=0.061 ms
64 bytes from 10.0.5.3: icmp_req=2 ttl=64 time=0.058 ms
64 bytes from 10.0.5.3: icmp_req=3 ttl=64 time=0.061 ms

--- 10.0.5.3 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1998ms
rtt min/avg/max/mdev = 0.058/0.060/0.061/0.001 ms
ubuntu@macvlanbridge02:~$

Perfect! It looks like everything is working as expected! But, let’s test one more thing.

One of the obvious things I have not mentioned until now is that Linux containers can be configured with more than just ONE network interface. In fact you can have various network configurations in a single container, each applied to a different network interface. Armed with this knowledge we can create a container which will have:

  • one VETH network interface linked to lxcbr0 bridge – we can communicate with it directly from the host on which the container is running
  • one MAC VLAN network interface in bridge mode linked to lxcbr1, so the container can communicate with the macvlanbridge01 and macvlanbridge02 containers – remember, these 2 containers are NOT accessible directly from the host (not accessible via the network, that is – you can always access containers from the host via lxc-console)

This will give us a “management” container accessible from the host via the network, through which we can access the other two containers – think DMZ. So let’s get started and create a new container with these capabilities:

vagrant@precise64:~$ sudo lxc-create -t ubuntu -n dmzmaster01 -- -r precise

The network configuration should follow the above-mentioned requirements – the container should be accessible from the host and should have access to the MAC VLAN-ed containers:

vagrant@precise64:~$ sudo grep network /var/lib/lxc/dmzmaster01/config
lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = lxcbr0
# MAC VLAN network
lxc.network.type = macvlan
lxc.network.macvlan.mode = bridge
lxc.network.flags = up
lxc.network.link = lxcbr1
lxc.network.ipv4 = 10.0.5.5/24
vagrant@precise64:~$ 

Let’s start the container and verify that the requirements we set ourselves have been satisfied. The newly created container should have 2 IP addresses assigned on two separate network interfaces. The interface linked to lxcbr0 should have an IP from the 10.0.3.0/24 range, and the interface linked to lxcbr1 should have the manually assigned IP address from our configuration above, on the same LAN as the macvlanbridge01 and macvlanbridge02 containers:

vagrant@precise64:~$ sudo lxc-start -n dmzmaster01 -d
vagrant@precise64:~$ sudo lxc-ls --fancy
NAME             STATE    IPV4                  IPV6  AUTOSTART
---------------------------------------------------------------
dmzmaster01      RUNNING  10.0.3.251, 10.0.5.5  -     NO
empty01          RUNNING  -                     -     NO
macvlanbridge01  RUNNING  10.0.5.3              -     NO
macvlanbridge02  RUNNING  10.0.5.4              -     NO
veth01           RUNNING  10.0.3.118            -     NO
vagrant@precise64:~$

Perfect! Looks like the IP assignment has worked as expected! Let’s see if 10.0.3.251 is accessible from the host and let’s confirm that 10.0.5.5 is NOT:

vagrant@precise64:~$ ping -c 3 10.0.3.251
PING 10.0.3.251 (10.0.3.251) 56(84) bytes of data.
64 bytes from 10.0.3.251: icmp_req=1 ttl=64 time=0.078 ms
64 bytes from 10.0.3.251: icmp_req=2 ttl=64 time=0.057 ms
64 bytes from 10.0.3.251: icmp_req=3 ttl=64 time=0.073 ms

--- 10.0.3.251 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.057/0.069/0.078/0.011 ms
vagrant@precise64:~$ ping -c 3 10.0.5.5
PING 10.0.5.5 (10.0.5.5) 56(84) bytes of data.

--- 10.0.5.5 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 1998ms

vagrant@precise64:~$

Brilliant! Now let’s ssh to the dmzmaster01 container, verify that the container has 2 separate network interfaces with the expected IP addresses assigned, and see if the macvlanbridge01 and macvlanbridge02 containers are accessible from it:

vagrant@precise64:~$ ssh ubuntu@10.0.3.251
The authenticity of host '10.0.3.251 (10.0.3.251)' can't be established.
ECDSA key fingerprint is 87:01:4e:04:51:3e:db:98:71:e2:3b:c5:59:fd:1b:51.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '10.0.3.251' (ECDSA) to the list of known hosts.
ubuntu@10.0.3.251's password:
ubuntu@dmzmaster01:~$ sudo ip address list
18: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether ce:b6:ff:7d:8a:23 brd ff:ff:ff:ff:ff:ff
    inet 10.0.3.251/24 brd 10.0.3.255 scope global eth0
    inet6 fe80::ccb6:ffff:fe7d:8a23/64 scope link
       valid_lft forever preferred_lft forever

20: eth1@if13: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UNKNOWN
    link/ether 2e:e2:37:06:56:b9 brd ff:ff:ff:ff:ff:ff
    inet 10.0.5.5/24 brd 10.0.5.255 scope global eth1
    inet6 fe80::2ce2:37ff:fe06:56b9/64 scope link
       valid_lft forever preferred_lft forever

21: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
ubuntu@dmzmaster01:~$ 

And the container-connectivity test:

ubuntu@dmzmaster01:~$ ping -c 2 10.0.5.3
PING 10.0.5.3 (10.0.5.3) 56(84) bytes of data.
64 bytes from 10.0.5.3: icmp_req=1 ttl=64 time=0.116 ms
64 bytes from 10.0.5.3: icmp_req=2 ttl=64 time=0.061 ms

--- 10.0.5.3 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.061/0.088/0.116/0.029 ms
ubuntu@dmzmaster01:~$ ping -c 2 10.0.5.4
PING 10.0.5.4 (10.0.5.4) 56(84) bytes of data.
64 bytes from 10.0.5.4: icmp_req=1 ttl=64 time=0.107 ms
64 bytes from 10.0.5.4: icmp_req=2 ttl=64 time=0.063 ms

--- 10.0.5.4 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.063/0.085/0.107/0.022 ms
ubuntu@dmzmaster01:~$ 

With the last test we have concluded our walk through the LXC MAC VLAN networking mode, with a practical example of how to use it to create a simple DMZ network. As with the other examples in this guide, the possibilities are almost endless. We can, for example, create a separate private PostgreSQL replication VLAN to avoid interfering with the application VLAN, we can have a private VLAN for corosync intra-cluster communication, etc. I hope these examples will inspire your creativity and I’m looking forward to hearing about the use cases you come up with – leave them in the comments.

Phys

Finally, we have reached the last available network type in the LXC configuration – phys. This network type is, in my opinion, the least complicated to understand. So what does the LXC documentation say about the phys network type?

an already existing interface specified by the lxc.network.link is assigned to the container

What this means in practice is – in very simplified terms – that you basically rip the physical network card out of the VirtualBox VM (VirtualBox is what Vagrant uses to run the virtual machines this guide is performed on), or out of whatever other host you’re running this guide on, and plug it into the container.

The interface “moves” from one network namespace to the new one, but it should remain accessible on the same network the original interface existed on, i.e. you have basically isolated a physical interface on the host (a new network stack has been created in the container), similar to what we were doing at the end of the veth subchapter when we pretended to be network experts.
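In fact, this is the same kind of “move” we performed by hand with vethMYTEST earlier, only in the opposite direction and with a physical NIC. A hypothetical one-line sketch (the PID placeholder stands for the container’s init process, found via ps as shown before):

sudo ip link set eth2 netns <container-init-pid>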

Let’s get back to the Vagrantfile – the reason why I specified the :public_network config directive in it was so that Vagrant creates a public network interface, “public” meaning that the IP address assigned to this interface is on the same network as my laptop, on which I’m running the vagrant commands.

By applying the above theory and moving the “public” interface into the container, we should be able to access the container directly from our workstations or from ANY other host on the same network on which the public interface has been created. Let’s see if we can back our theory with practice! Let’s create a new container and configure it to use the phys network type.

A couple of points about the LXC network configuration before we proceed. MAKE SURE that lxc.network.hwaddr is the same as the original MAC address created by Vagrant – I came across some ARP madness when I didn’t reuse the original MAC address, as a new one was randomly generated by the lxc-create command. You can find the MAC address of the public interface by running sudo ip link list eth2 on the host:

vagrant@precise64:~$ sudo ip link list eth2
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:f9:8f:2e brd ff:ff:ff:ff:ff:ff
vagrant@precise64:~$

Now that you know the MAC address (the link/ether field in the output above) of the physical device, you can proceed with the container’s network configuration. Notice that you MUST specify the correct gateway – it’s the default gateway of the network on which the original public interface has been created. In my case the public interface is on the same network as my laptop, so a simple check of Network Preferences gives me 192.168.1.254.
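If you prefer the command line over Network Preferences, a quick sketch of the same lookup (my laptop runs OS X; on a Linux workstation ip route show default gives the equivalent information):

route -n get default | grep gateway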

Cool, we have all the tools, now we can start doing some work! Let’s create the container and modify the network configuration as below:

vagrant@precise64:~$ sudo lxc-create -t ubuntu -n phys01 -- -r precise
vagrant@precise64:~$ sudo grep network /var/lib/lxc/phys01/config
lxc.network.type = phys
lxc.network.hwaddr = 08:00:27:f9:8f:2e
lxc.network.flags = up
lxc.network.link = eth2
lxc.network.ipv4 = 192.168.1.135/24
lxc.network.ipv4.gateway = 192.168.1.254
vagrant@precise64:~$

Now, let’s start the container and log on to it via lxc-console:

vagrant@precise64:~$ sudo lxc-start -n phys01 -d
vagrant@precise64:~$ sudo lxc-ls --fancy
NAME             STATE    IPV4                  IPV6  AUTOSTART
---------------------------------------------------------------
dmzmaster01      RUNNING  10.0.3.251, 10.0.5.5  -     NO
empty01          RUNNING  -                     -     NO
macvlanbridge01  RUNNING  10.0.5.3              -     NO
macvlanbridge02  RUNNING  10.0.5.4              -     NO
phys01           RUNNING  192.168.1.135         -     NO
veth01           RUNNING  10.0.3.118            -     NO
vagrant@precise64:~$ sudo lxc-console -n phys01 -t 2

From the above we can see that the phys01 container has been assigned the correct IP. If our theory is correct and the physical network interface has been “ripped off” and “moved” inside the container, we should no longer see it on the host. So let’s have a look and see if that’s the case:

vagrant@precise64:~$ sudo ip link list eth2
Device "eth2" does not exist.
vagrant@precise64:~$

Bingo! It looks like the interface no longer exists on the host. Let’s see if everything looks good inside the container. We need to check the routing and whether we can ping the default gateway. Let’s fire up our well-known lxc-console command and check the network setup of the container:

ubuntu@phys01:~$ sudo ip address list
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:f9:8f:2e brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.135/24 brd 192.168.1.255 scope global eth2
    inet6 fe80::a00:27ff:fef9:8f2e/64 scope link
       valid_lft forever preferred_lft forever

22: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
ubuntu@phys01:~$
ubuntu@phys01:~$ sudo ip route show
default via 192.168.1.254 dev eth2
192.168.1.0/24 dev eth2  proto kernel  scope link  src 192.168.1.135
ubuntu@phys01:~$ 
ubuntu@phys01:~$ ping -c 2 192.168.1.254
PING 192.168.1.254 (192.168.1.254) 56(84) bytes of data.
64 bytes from 192.168.1.254: icmp_req=1 ttl=64 time=260 ms
64 bytes from 192.168.1.254: icmp_req=2 ttl=64 time=1.78 ms

--- 192.168.1.254 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1003ms
rtt min/avg/max/mdev = 1.785/131.265/260.745/129.480 ms
ubuntu@phys01:~$

Brilliant! The route to the default gateway has been created and we can successfully ping it. That means our packets should be routable across the whole 192.168.1.0/24 network! Which also means – in my case – that I should be able to:

  • ping my laptop from inside the container
  • ping the container’s IP from my laptop
  • given that ssh is running in the container, listening on the publicly accessible interface we have just moved into it, and that access to the listening IP and ssh port is allowed, ssh to it straight from my laptop

So let’s verify these claims. The IP address of my laptop is:

milos@dingops:~/Vagrant/blogpost$ ifconfig |grep -A 5 en0|grep inet
  inet6 fe80::7ed1:c3ff:fef4:da13%en0 prefixlen 64 scopeid 0x5
  inet 192.168.1.116 netmask 0xffffff00 broadcast 192.168.1.255
milos@dingops:~/Vagrant/blogpost$

Let’s ping it from inside the container:

ubuntu@phys01:~$ ping -c 3 192.168.1.116
PING 192.168.1.116 (192.168.1.116) 56(84) bytes of data.
64 bytes from 192.168.1.116: icmp_req=1 ttl=64 time=0.170 ms
64 bytes from 192.168.1.116: icmp_req=2 ttl=64 time=0.245 ms
64 bytes from 192.168.1.116: icmp_req=3 ttl=64 time=0.464 ms

--- 192.168.1.116 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1998ms
rtt min/avg/max/mdev = 0.170/0.293/0.464/0.124 ms
ubuntu@phys01:~$

Perfect! Let’s try to ping the container from my laptop. As we know, the container’s IP address is 192.168.1.135:

milos@dingops:~/Vagrant/blogpost$ ping -c 3 192.168.1.135
PING 192.168.1.135 (192.168.1.135): 56 data bytes
64 bytes from 192.168.1.135: icmp_seq=0 ttl=64 time=0.535 ms
64 bytes from 192.168.1.135: icmp_seq=1 ttl=64 time=0.471 ms
64 bytes from 192.168.1.135: icmp_seq=2 ttl=64 time=0.289 ms

--- 192.168.1.135 ping statistics ---
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.289/0.432/0.535/0.104 ms
milos@dingops:~/Vagrant/blogpost$

Awesome! Now, I have installed ssh in the container and it’s listening on all the container’s interfaces. Also, there are no iptables rules present in the container, so I should be able to ssh to it from anywhere on the same network:

ubuntu@phys01:~$ sudo netstat -ntlp|grep 22
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      140/sshd
tcp6       0      0 :::22                   :::*                    LISTEN      140/sshd
ubuntu@phys01:~$ 

Let’s try to SSH into the container directly from my laptop:

milos@dingops:~/Vagrant/blogpost$ ssh ubuntu@192.168.1.135
The authenticity of host '192.168.1.135 (192.168.1.135)' can't be established.
RSA key fingerprint is de:03:8a:23:df:10:56:bc:77:1b:8e:4e:d0:13:ab:97.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.1.135' (RSA) to the list of known hosts.
ubuntu@192.168.1.135's password:
Welcome to Ubuntu 12.04.3 LTS (GNU/Linux 3.2.0-23-generic x86_64)

 * Documentation:  https://help.ubuntu.com/
Last login: Mon Nov 11 00:01:40 2013
ubuntu@phys01:~$ 

Excellent! With the last example we have now concluded our walk through all possible LXC network type configurations!

Conclusion

I hope that this blog post helped you to understand LXC networking at least a little bit more and that it will also motivate you to explore the LXC world even further, including the technologies built on top of it, such as Docker. If this blog post got you interested in the topic, I would recommend checking out pipework, which is an awesome tool for automating the creation of extra interfaces for containers (for “raw” LXC as well as Docker containers), written by Docker guru Jérôme Petazzoni.

Linux containers have been here for a long time and they are here to stay – in one form or another. The possibilities they offer are endless. Recently they have mostly been spoken about as a synonym for the future of software/application delivery. However, we should not stop there!

The grumpy ops guy in me sees containers not only as an application delivery tool but also as a missing piece of infrastructure delivery. We can test our Chef cookbooks/Puppet modules or what have you, we can model networks just like this blog post showed – we can literally model full stack infrastructure with unquestionable ease and speed on a small budget. Recently I attended DevOps Days London, where John Willis gave a presentation about Software Defined Networking (SDN). LXC can be an awesome helping piece in what John suggests is the next step in infrastructure-as-code.

So, to sum up: let’s not be afraid of containers, let’s embrace them and understand them, and I’m confident this will pay off in the long term. Hopefully this site will help us on the road to container love…