Using Multiple Ethernet Cards

Linux port bonding

Some machines, especially servers, are equipped with dual Ethernet ports on the motherboard. In order to use both ports for increased bandwidth and/or redundancy, Linux must be configured appropriately.

You should consult this very nice overview of the Linux bonding driver and the Linux Ethernet Bonding Driver HOWTO. The kernel-doc RPM also documents port bonding in the file /usr/share/doc/kernel-doc-*/Documentation/networking/bonding.txt or in http://www.kernel.org/doc/Documentation/networking/bonding.txt.

For CentOS5 Linux this is documented in 14.2.3 Channel_Bonding_Interfaces.

Loading the bonding kernel module

Read the Channel_Bonding_Interfaces manual and bonding_Module_Directives for the parameter values. Apparently it is preferred to enter bonding parameters in the file /etc/sysconfig/network-scripts/ifcfg-bond0.
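As an illustration, the bonding parameters could be placed in ifcfg-bond0 via the BONDING_OPTS variable (supported in RHEL/CentOS 5.3 and later). This is a sketch, not our verbatim configuration; the parameter values mirror those used in the modprobe.conf example below:

```shell
# /etc/sysconfig/network-scripts/ifcfg-bond0  (sketch; adjust to your site)
DEVICE=bond0
BOOTPROTO=dhcp
ONBOOT=yes
# Same parameters as in the modprobe.conf example below
BONDING_OPTS="mode=6 miimon=100 updelay=200"
```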

For RHEL6 read `Using Channel Bonding <http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Deployment_Guide/sec-Using_Channel_Bonding.html>`_.

Our current instructions are: Add these lines to /etc/modprobe.conf (not /etc/modules.conf):

alias bond0 bonding
options bond0 mode=6 miimon=100 updelay=200

The mode=6 parameter (balance-alb) means:

Sets an Active Load Balancing (ALB) policy for fault tolerance and load balancing.
Includes transmit and receive load balancing for IPV4 traffic.
Receive load balancing is achieved through ARP negotiation.

The miimon=100 parameter means:

Specifies the MII link monitoring frequency in milliseconds.
This determines how often the link state of each slave is
inspected for link failures.  A value of zero disables MII
link monitoring.  A value of 100 is a good starting point.
The use_carrier option, below, affects how the link state is
determined.  See the High Availability section for additional
information.  The default value is 0.

The updelay=200 parameter means:

Specifies the time, in milliseconds, to wait before enabling a
slave after a link recovery has been detected.  This option is
only valid for the miimon link monitor.  The updelay value
should be a multiple of the miimon value; if not, it will be
rounded down to the nearest multiple.  The default value is 0.
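The rounding rule for updelay can be checked with simple shell arithmetic (the values here are only illustrative):

```shell
# updelay is rounded down to the nearest multiple of miimon
miimon=100
updelay=250          # not a multiple of miimon
effective=$(( updelay / miimon * miimon ))
echo "$effective"    # prints 200
```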

If you do not set the updelay parameter, the syslog may show this warning:

kernel: bonding: In ALB mode you might experience client disconnections upon reconnection of a link if the bonding module updelay parameter (0 msec) is incompatible with the forwarding delay time of the switch

We have seen a few cases where the network becomes unreliable without the updelay parameter.

Modifying network scripts

In /etc/sysconfig/network-scripts/ new script files should be created:

  1. Create a new bonding device script file ifcfg-bond0 for the bond0 master device.

  2. Modify the normal Ethernet interface scripts ifcfg-eth0 and ifcfg-eth1 so that eth0 and eth1 become slave devices of bond0.
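For illustration, minimal versions of these files might look as follows; the IP address and netmask are placeholders (with DHCP you would instead use BOOTPROTO=dhcp on bond0), and ifcfg-eth1 is analogous to ifcfg-eth0:

```shell
# /etc/sysconfig/network-scripts/ifcfg-bond0  (sketch)
DEVICE=bond0
BOOTPROTO=none
ONBOOT=yes
IPADDR=10.0.0.10        # placeholder address
NETMASK=255.255.255.0   # placeholder netmask

# /etc/sysconfig/network-scripts/ifcfg-eth0  (sketch)
DEVICE=eth0
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
```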

When using SystemImager to clone the nodes, these steps can be performed automatically using post-install scripts, e.g., the /var/lib/systemimager/scripts/post-install/20q.eth_bonding_config script for step 2:


# Get the Systemimager variables
. /tmp/post-install/variables.txt

# Name of the central server on this network

# Correct the SystemImager eth0 config, turning eth0 into an Ethernet bonding device (bond0=eth0+eth1)
cp -p /etc/sysconfig/network-scripts/ifcfg-eth0 /tmp/ifcfg-eth0.BAK
cat <<EOF > /etc/sysconfig/network-scripts/ifcfg-eth0

# Finished
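The excerpt above is abbreviated; a self-contained sketch of such a post-install script might look like the following. The here-document contents and file handling are assumptions based on the configuration described earlier on this page, not the verbatim site script:

```shell
#!/bin/sh
# Sketch of a SystemImager post-install script for Ethernet bonding
# (hypothetical reconstruction; adapt paths and parameters to your site)

SCRIPTS=/etc/sysconfig/network-scripts

# Back up the SystemImager-generated eth0 config
cp -p $SCRIPTS/ifcfg-eth0 /tmp/ifcfg-eth0.BAK

# Write the bonding master device config
cat <<EOF > $SCRIPTS/ifcfg-bond0
DEVICE=bond0
BOOTPROTO=dhcp
ONBOOT=yes
EOF

# Turn eth0 and eth1 into slaves of bond0
for dev in eth0 eth1; do
  cat <<EOF > $SCRIPTS/ifcfg-$dev
DEVICE=$dev
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
EOF
done

# Load the bonding driver at boot (modprobe.conf, as described above)
grep -q '^alias bond0 bonding' /etc/modprobe.conf 2>/dev/null || cat <<EOF >> /etc/modprobe.conf
alias bond0 bonding
options bond0 mode=6 miimon=100 updelay=200
EOF
```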

Restart network services

At this stage the network should be restarted with service network restart, or the system should be rebooted, in order to activate the bond0 device instead of the normal eth0 device.
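After the restart, the state of the bond can be inspected; for example (assuming the bond0 device name used above):

```shell
# Show bonding mode, MII status and the slave interfaces
cat /proc/net/bonding/bond0

# Confirm that bond0 is up and carries the IP address
ifconfig bond0
```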

Port bonding troubleshooting

No DHCP response for the bond0 device

If you've set up the bond0 device for DHCP with BOOTPROTO=dhcp and you don't get a DHCP response from the server, it may be because bond0 uses the MAC address of the first slave device (usually eth0) for DHCP. If your DHCP server is configured with the MAC address of another device (for example, eth1), then DHCP will fail.

This scenario happens when the Linux kernel has enumerated the Ethernet devices eth0 and eth1 in the opposite order from what the hardware (and hence the DHCP server configuration) expects. Check this by running:

ifconfig -a

to see the MAC-addresses of the network interfaces.

SystemImager can correct this problem by explicit naming of network interfaces as described in the Troubleshooting section A possible solution to fix network interface naming.

You can learn the PCI device names and their MAC addresses by, for example:

udevinfo -a -p /sys/class/net/eth0

and then add appropriate configuration lines to the file /etc/udev/rules.d/60-net.rules.

To implement this we have made a SystemImager post-install script for the SL2x170zG6 nodes in /var/lib/systemimager/scripts/post-install/15d.eth_device_names with the essential content:

# Create PCI device name to ethX names for HP SL2x170zG6:
ACTION=="add", SUBSYSTEM=="net", BUS=="pci", ID=="0000:05:00.1", NAME="eth0"
ACTION=="add", SUBSYSTEM=="net", BUS=="pci", ID=="0000:05:00.0", NAME="eth1"
# Append original device rules
# Write new device rules file (with backup)

The PCI device addresses 0000:05:00.x will vary depending on the hardware.

Niflheim: MultipleEthernetCards (last edited 2017-01-06 09:15:53 by OleHolmNielsen)