19 October 2010

IPMP and 802.1q on Linux

Described herein is the practice of using IPMP (IP MultiPathing; called
channel bonding on Linux) and VLAN tagged (802.1q trunking) interfaces
layered together for the purpose of port consolidation and HA (high
availability).  The following details of the host used in the examples
should be noted:

        OS:             Linux (CentOS 5.2)
        Kernel:         2.6.18-92.el5
        Host:           hostA
        shell prompt:   hostA [0]
        phys net ints:  eth4, eth5
        primary phys:   eth4
        logical ints:   bond0, bond0.219, bond0.348
        VLANs:          219 (10.0.1.0/24), 348 (10.4.3.0/24)
        ipaddr:         10.0.1.23 (vlan 219), 10.4.3.89 (vlan 348)
        channel mode:   active-backup

The layered layout of VLAN tagged IPMP interfaces follows:

        3:  IP addr (network access)            10.0.1.23, 10.4.3.89
        2:  vlan tagged interfaces              bond0.219, bond0.348
        1:  channel bonded interface            bond0
        0:  ethernet interfaces                 eth4, eth5

Configuration of the above setup can be done with or without a reboot;
however, it may be simpler with a reboot, as this ensures the relevant
kernel modules are loaded in the proper order.  For the purpose of this
doc, configuration will be done on the assumption of a reboot.  To begin,
/etc/modprobe.conf will need to be updated, adding the following two
lines:

        alias bond0 bonding
        options bond0 mode=1 miimon=100 primary=eth4

The above tells the Linux kernel to enable channel bonding on bootup and
create the logical bond interface of bond0.  The second line sets options
for the logical interface to mode=1 (active-backup policy), miimon=100
(check the MII link every 100 milliseconds for availability of the
active interface), and primary=eth4 (sets the primary interface to eth4).
In order to have everything represented appropriately in services like
SNMP, such as IP addr to interface association, the bonding alias must
come before the ethernet aliases in modprobe.conf.  The options line, in
turn, must come after the ethernet aliases: it names at least one
interface (primary=eth4), so the module for that interface must already
be loaded by the kernel.  The following is a sample /etc/modprobe.conf,
in full:

        hostA [0] cat /etc/modprobe.conf
        alias bond0 bonding
        alias eth0 e1000e
        alias eth1 e1000e
        alias eth2 e1000e
        alias eth3 e1000e
        alias eth4 forcedeth
        alias eth5 forcedeth
        options bond0 mode=1 miimon=100 primary=eth4
        alias scsi_hostadapter 3w-9xxx
        alias scsi_hostadapter1 sata_nv
        alias scsi_hostadapter2 usb-storage
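
The ordering rule described above can be checked mechanically.  The
following is a minimal sketch, not part of the original setup: the
function name 'check_modprobe_order' is hypothetical, and it assumes
the bond is named bond0 and the physical interfaces are aliased as ethN.

```shell
# check_modprobe_order FILE   (hypothetical helper)
# Exit status 0: "alias bond0 bonding" precedes every ethN alias and
#                "options bond0" follows every ethN alias.
# Exit status 1: the lines are present but in a different order.
# Exit status 2: one of the expected lines is missing.
check_modprobe_order() {
    awk '
        /^alias[ \t]+bond0[ \t]+bonding/ { bond = NR }
        /^alias[ \t]+eth[0-9]+[ \t]/     { if (!first) first = NR; last = NR }
        /^options[ \t]+bond0[ \t]/       { opts = NR }
        END {
            if (!bond || !opts || !first)     exit 2
            if (bond < first && opts > last)  exit 0
            exit 1
        }
    ' "$1"
}
```

Run against the sample file above, 'check_modprobe_order
/etc/modprobe.conf' should return success (exit status 0).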

Each involved network interface must now be configured.  As the
intention is a VLAN tagged IPMP connection with no further complexities,
both physical network interfaces will receive minimal configuration.
To retain this across reboots, the following two files will need to be
modified, one per interface:

        /etc/sysconfig/network-scripts/ifcfg-eth4
        /etc/sysconfig/network-scripts/ifcfg-eth5

The contents of each are as follows:

        hostA [0] cat /etc/sysconfig/network-scripts/ifcfg-eth4
        # nVidia Corporation MCP55 Ethernet
        DEVICE=eth4
        HWADDR=00:30:48:7E:ED:54
        ONBOOT=yes
        BOOTPROTO=none
        TYPE=Ethernet
        MASTER=bond0
        SLAVE=yes

        hostA [0] cat /etc/sysconfig/network-scripts/ifcfg-eth5
        # nVidia Corporation MCP55 Ethernet
        DEVICE=eth5
        HWADDR=00:30:48:7E:ED:55
        ONBOOT=yes
        BOOTPROTO=none
        TYPE=Ethernet
        MASTER=bond0
        SLAVE=yes

Of note, the interfaces are configured to be brought online at bootup,
though not configured with an IP addr or to use DHCP.  This is
necessary to support our channel bonded (IPMP) logical interface.
The two parameters necessary for bonding are 'MASTER=bond0' (specifying
which logical bonded interface the physical interface supports) and
'SLAVE=yes' (to detail the role of this physical interface relative to
the bonded interface).  The bonded interface also must be configured:

        hostA [0] cat /etc/sysconfig/network-scripts/ifcfg-bond0
        # bond0, coupling of eth4 and eth5
        DEVICE=bond0
        ONBOOT=yes
        BOOTPROTO=none
        TYPE=Ethernet

Similar to the physical interfaces, the bonded interface (bond0) is
set to online at bootup, though not configured with an IP addr or to
use DHCP.  That is because VLAN tagged interfaces will be created on
top of this.  Of note, a HWADDR statement is not included because bonded
interfaces utilize the HWADDR of one of the underlying supporting physical
interfaces, typically the first attached.  Now that the IPMP interface
(bond0) has been configured, VLAN tagged virtual interfaces can be layered
over bond0, just as they normally would be over a physical interface.
The configuration for such would appear as follows:

        hostA [0] cat /etc/sysconfig/network-scripts/ifcfg-bond0.219
        # vlan 219, 10.0.1.0/24
        DEVICE=bond0.219
        ONBOOT=yes
        TYPE=Ethernet
        IPADDR=10.0.1.23
        VLAN=yes
        NETWORK=10.0.1.0
        NETMASK=255.255.255.0
        BROADCAST=10.0.1.255
        IPV6INIT=no

        hostA [0] cat /etc/sysconfig/network-scripts/ifcfg-bond0.348
        # vlan 348, 10.4.3.0/24
        DEVICE=bond0.348
        ONBOOT=yes
        TYPE=Ethernet
        IPADDR=10.4.3.89
        VLAN=yes
        NETWORK=10.4.3.0
        NETMASK=255.255.255.0
        BROADCAST=10.4.3.255
        IPV6INIT=no
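
The NETWORK and BROADCAST values in the two files above follow directly
from IPADDR and NETMASK: the network address is the bitwise AND of the
address and the mask, and the broadcast address sets every host bit to 1.
As a small sketch of that arithmetic (the function name 'net_and_bcast'
is made up for illustration; it assumes dotted-quad IPv4 input):

```shell
# net_and_bcast IPADDR NETMASK   (hypothetical helper)
# Prints "NETWORK BROADCAST".  The body runs in a subshell, so the IFS
# change and the positional parameters do not leak into the caller.
net_and_bcast() (
    IFS=.
    set -- $1 $2    # $1..$4 = address octets, $5..$8 = netmask octets
    echo "$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))" \
         "$(( $1 | (255 - $5) )).$(( $2 | (255 - $6) )).$(( $3 | (255 - $7) )).$(( $4 | (255 - $8) ))"
)
```

For VLAN 219, 'net_and_bcast 10.0.1.23 255.255.255.0' prints
"10.0.1.0 10.0.1.255", matching the NETWORK and BROADCAST lines in
ifcfg-bond0.219.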

The above configurations relevant to VLANs 219 and 348 are no different
than those of any other VLAN tagged physical interface, aside from
specifying the DEVICE as a bond device.
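
For the no-reboot route mentioned earlier, the same layering can be
built by hand.  The following is only a sketch under assumptions: it
presumes the 'ifenslave' and 'vconfig' utilities are installed, that
the bonding module is not already loaded with other options, and that
eth4 and eth5 carry no other configuration.  The function name
'bring_up_bond' is hypothetical.  By default it only prints the
commands; nothing here survives a reboot, which is what the ifcfg
files are for.

```shell
# bring_up_bond [run]   (hypothetical helper)
# With no argument, prints each command (dry run).
# With "run", executes them; this needs root.
bring_up_bond() {
    run=echo                          # dry run by default
    [ "${1:-}" = "run" ] && run=      # empty prefix: commands execute
    # layer 1: bonding driver with the same options as modprobe.conf
    $run modprobe bonding mode=1 miimon=100 primary=eth4 &&
    $run modprobe 8021q &&
    $run ifconfig bond0 up &&
    # layer 0 -> 1: attach the physical interfaces to bond0
    $run ifenslave bond0 eth4 eth5 &&
    # layer 2: VLAN tagged interfaces named bond0.<vid>
    $run vconfig set_name_type DEV_PLUS_VID_NO_PAD &&
    $run vconfig add bond0 219 &&
    $run vconfig add bond0 348 &&
    # layer 3: IP addresses
    $run ifconfig bond0.219 10.0.1.23 netmask 255.255.255.0 broadcast 10.0.1.255 up &&
    $run ifconfig bond0.348 10.4.3.89 netmask 255.255.255.0 broadcast 10.4.3.255 up
}
```

Calling 'bring_up_bond' alone lists the nine commands for review;
'bring_up_bond run' executes them in order and stops at the first
failure.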

Now that all configuration has been done, all that remains is to reboot
the host and begin using the new VLAN tagged IPMP interfaces, bond0.219
and bond0.348.  Following the reboot, the output from 'ifconfig'
regarding the involved interfaces should appear similar to:

hostA [0] ifconfig -a
bond0     Link encap:Ethernet  HWaddr 00:30:48:7E:ED:54
          inet6 addr: fe80::230:48ff:fe7e:ed54/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:5802545 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1259 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2159497822 (2.0 GiB)  TX bytes:153407 (149.8 KiB)

bond0.219 Link encap:Ethernet  HWaddr 00:30:48:7E:ED:54
          inet addr:10.0.1.23  Bcast:10.0.1.255  Mask:255.255.255.0
          inet6 addr: fe80::230:48ff:fe7e:ed54/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:297071 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1236 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:52265326 (49.8 MiB)  TX bytes:140243 (136.9 KiB)

bond0.348 Link encap:Ethernet  HWaddr 00:30:48:7E:ED:54
          inet addr:10.4.3.89  Bcast:10.4.3.255  Mask:255.255.255.0
          inet6 addr: fe80::230:48ff:fe7e:ed54/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:297071 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1236 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:52265326 (49.8 MiB)  TX bytes:140243 (136.9 KiB)

[snip...]

eth4      Link encap:Ethernet  HWaddr 00:30:48:7E:ED:54
          inet6 addr: fe80::230:48ff:fe7e:ed54/64 Scope:Link
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:2894853 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1180 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1084599554 (1.0 GiB)  TX bytes:147569 (144.1 KiB)
          Interrupt:114 Base address:0x6000

eth5      Link encap:Ethernet  HWaddr 00:30:48:7E:ED:54
          inet6 addr: fe80::230:48ff:fe7e:ed54/64 Scope:Link
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:2907701 errors:0 dropped:0 overruns:0 frame:0
          TX packets:100 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1074898808 (1.0 GiB)  TX bytes:10246 (10.0 KiB)
          Interrupt:122 Base address:0x8000

A check of /var/log/messages after bootup should show that the kernel
had no issues with the above setup as well:

        Feb 24 16:12:49 hostA kernel: Ethernet Channel Bonding Driver: v3.2.4 (January 28, 2008)
        Feb 24 16:12:49 hostA kernel: bonding: MII link monitoring set to 100 ms
        Feb 24 16:12:49 hostA kernel: bonding: bond0: Adding slave eth4.
        Feb 24 16:12:49 hostA kernel: bonding: bond0: making interface eth4 the new active one.
        Feb 24 16:12:49 hostA kernel: bonding: bond0: first active interface up!
        Feb 24 16:12:49 hostA kernel: bonding: bond0: enslaving eth4 as an active interface with an up link.
        Feb 24 16:12:49 hostA kernel: bonding: bond0: Adding slave eth5.
        Feb 24 16:12:49 hostA kernel: bonding: bond0: enslaving eth5 as a backup interface with an up link.
        Feb 24 16:12:49 hostA kernel: 802.1Q VLAN Support v1.8 Ben Greear [greearb@candelatech.com]
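
Beyond the logs, the bonding driver reports its current state in
/proc/net/bonding/bond0, including a "Currently Active Slave:" line.
A minimal sketch for pulling out just that value follows; the function
name 'active_slave' is hypothetical, and the optional file argument
exists only so the parsing can be tried against a saved copy.

```shell
# active_slave [FILE]   (hypothetical helper)
# Prints the interface the bonding driver currently reports as active.
# FILE defaults to the driver's status file for bond0.
active_slave() {
    sed -n 's/^Currently Active Slave: *//p' "${1:-/proc/net/bonding/bond0}"
}
```

With the configuration above and both links up, 'active_slave' would
be expected to print eth4, since eth4 is set as the primary.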

Link transitions can also be seen within /var/log/messages.  The following
logs are the result of physically removing the ethernet cable from eth4,
forcing a link transition, and then reconnecting it.  The same was then
done for eth5:

        # removal of ethernet cable from eth4
                Feb 24 16:15:26 hostA kernel: eth4: link down.
                Feb 24 16:15:26 hostA kernel: bonding: bond0: link status definitely down for interface eth4, disabling it
                Feb 24 16:15:26 hostA kernel: bonding: bond0: making interface eth5 the new active one.

        # reconnecting of ethernet cable to eth4
                Feb 24 16:15:31 hostA kernel: eth4: link up.
                Feb 24 16:15:31 hostA kernel: bonding: bond0: link status definitely up for interface eth4.
                Feb 24 16:15:31 hostA kernel: bonding: bond0: making interface eth4 the new active one.

        # removal of ethernet cable from eth5
                Feb 24 16:16:37 hostA kernel: eth5: link down.
                Feb 24 16:16:37 hostA kernel: bonding: bond0: link status definitely down for interface eth5, disabling it

        # reconnecting of ethernet cable to eth5
                Feb 24 16:16:54 hostA kernel: eth5: link up.
                Feb 24 16:16:54 hostA kernel: bonding: bond0: link status definitely up for interface eth5.
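
To pick the failovers out of a busy log, the "making interface ... the
new active one" messages shown above can be filtered.  This is a small
sketch that assumes the syslog line layout seen in these examples
(timestamp in the third field); the function name 'active_changes' is
hypothetical.

```shell
# active_changes [FILE]   (hypothetical helper)
# Prints "TIME INTERFACE" for each change of bond0's active interface.
# FILE defaults to /var/log/messages.
active_changes() {
    # The interface name is the fifth field from the end of the message:
    # "... making interface ethN the new active one."
    awk '/bonding: bond0: making interface .* the new active one/ {
             print $3, $(NF-4)
         }' "${1:-/var/log/messages}"
}
```

Note that pulling the cable on eth5 produces no such line in the logs
above: eth5 was only the backup at the time, so the active interface
never changed.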

While the above described solution works well for port consolidation
and availability, one possible drawback must be noted.  Due to the nature
of Spanning Tree Protocol (STP) at the network switch, as well as the
usage of 802.1q trunking on the switch ports supporting the above,
it can take between 30 and 60 seconds before traffic resumes following a
link transition on the host.  This is because 'portfast' is not enabled
on these particular ports, owing to the potential network issues in using
it there.  Also of note, without the usage of VLAN tagging, portfast
could otherwise be enabled, allowing link transitions to appear
instantaneous.  Regardless, link transitions on the Linux host side occur
immediately.

Should instantaneous link transitions be desirable for the above
documented setup, a request needs to be made of your friendly
neighborhood network engineering (NetEng) team to enable Spanning Tree's
'portfast trunk' configuration on the particular switch ports in
question.  NetEng may be hesitant to do this, and for good reason, and
will only consider the request based upon the fact that a single server
connects to the two ports in question.  Please keep this in mind during
your request.

Also note, it is your responsibility to inform NetEng of physical changes
regarding those switch ports, such as the host being removed, the host
being moved to another rack (and thus possibly onto new network
connections), or a new host re-using the existing network connections.
This allows NetEng to appropriately configure / unconfigure 'portfast
trunk' on the switch ports so that network issues (such as loops) don't
occur.

3 comments:

Anonymous said...

Dude, it's not IPMP! It's just bonding and VLAN tagging.

troy said...

Anonymous,

You are absolutely correct. This is called channel bonding under Linux, whereas under Solaris it is called IPMP (IP MultiPathing), and under ESX it is called NIC teaming, etc. Regardless of naming, we're basically saying the same thing. Both channel bonding and IPMP combine multiple network interfaces into a logical aggregation allowing for redundancy and / or load balancing. Given my initial familiarity with this type of setup under Solaris, I still tend to refer to this as IPMP, even though in Linux it is technically called channel bonding.

-troy

chris scott said...

No, IPMP and bonding are not the same thing at all. Bonding is purely a layer 2 thing, whereas IPMP works at layer 3 as well. Link aggregation can only detect local link failures and deal with them, not link failures further up in the switch fabric, e.g. between the access and distribution switch layers. IPMP can do this. In a simple setup you would have 3 IPs: one bound to each NIC and a floating one. The ones bound to the NICs are usually deprecated (so apps don't send traffic out with them as a source). The floating one can be bound as an alias to either physical NIC. The default route (or nominated IP) is then pinged via each interface. This is to test each switching route. If it can't be reached via said interface, the floating IP(s) are moved to another interface that has passed the test. The failed interface is still tested and will be marked good in future if the ping test starts working again. This can of course be stacked on top of link aggregation and VLAN trunking.