Old System administration
This page contains some older and possibly obsolete information, and it's kept here as a reference only.
- Cluster installation software
- Cloning of nodes with SystemImager
- Turning PC nodes into servers
- Troubleshooting of node hardware
- Optimizing Linux services
There are many ways to install Linux on a number of nodes, and many toolkits exist for this purpose. The NIFLHEIM cluster uses the SystemImager toolkit (see below).
Cluster installation toolkits which we have seen over the years include the following:
The NIFLHEIM cluster uses the SystemImager toolkit on a central server to create an image of a Golden Client node that has been installed in the usual way using a distribution on CD-ROM (CentOS Linux in our case). SystemImager is subsequently used to install identical images of the Golden Client on all of the nodes (changing the hostname and network parameters, of course).
We have some notes on SystemImager_Installation.
When you have downloaded the Golden Client disk image to the image server you can find a suitable Linux kernel and initial ram-disk, the so-called UYOK (Use Your Own Kernel) files in SystemImager slang, in this directory:
Copy the files kernel and initrd.img to the image server directory /tftpboot, possibly renaming these files so that they describe the type of golden client on which they were generated (you may end up with a number of such kernel and initrd.img files over time).
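The copy-and-rename step can be sketched as a small shell function. This is only an illustration: the function name install_uyok, the tag argument and the kernel.TAG / initrd.img.TAG naming scheme are assumptions, and the UYOK source directory must be supplied by the caller.

```shell
# Hypothetical helper: copy the UYOK files to the TFTP directory under
# names that identify the golden client type they came from.
install_uyok() {
    src=$1                 # directory holding the UYOK files (caller supplies it)
    tag=$2                 # label describing the golden client type (assumed scheme)
    dest=${3:-/tftpboot}   # TFTP directory on the image server
    # Keep timestamps and modes; stop if the kernel copy fails.
    cp -p "$src/kernel" "$dest/kernel.$tag" &&
    cp -p "$src/initrd.img" "$dest/initrd.img.$tag"
}
```

Calling install_uyok with the UYOK directory and a tag such as evo-d530 would leave /tftpboot/kernel.evo-d530 and /tftpboot/initrd.img.evo-d530, so files from several golden client types can coexist.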
SystemImager allows you to boot and install nodes using the nodes' Ethernet network interface. You will be using PXE, the Intel-defined Pre-Boot eXecution Environment which is implemented in all modern Ethernet chips. The following advice works correctly for Ethernet chips with built-in PXE, but for older versions of PXE you may have to install a pxe daemon RPM on the server (not discussed any further here). The pxe daemon is not necessary with modern PXE versions.
Please consult this page for detailed information about PXE:
With the above setup you're now ready to boot and install a fresh node across the network using SystemImager.
Make sure that the PC BIOS has been set up for a boot order where network/PXE boot precedes booting from hard disk. Use a screen to monitor the installation process (for the first node or two, at least). Monitor the DHCP server's /var/log/messages file to ensure that the client node actually requests and is assigned a proper IP address, and that the client downloads the kernel and initrd.img files successfully by TFTP.
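To follow the DHCP and TFTP activity while a node boots, the log can be filtered for the relevant daemons. The sketch below assumes that the syslog tags are dhcpd and in.tftpd (this depends on the server packages in use, and the TFTP daemon may only log transfers when run in verbose mode); the function name boot_events is hypothetical.

```shell
# Hypothetical filter: keep only syslog lines written by the DHCP and TFTP
# daemons (assumed tags: "dhcpd" and "in.tftpd").
boot_events() {
    grep -E 'dhcpd|tftpd'
}

# Usage on the DHCP/TFTP server while the client node boots:
#   tail -f /var/log/messages | boot_events
```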
The client node's PXE firmware will now transfer the small Linux kernel and ram-disk and begin the installation process by transferring the Golden Client disk image using rsync.
After the installation is completed, and if you don't use the Automated network installation below, you must change the boot order so that network/PXE booting no longer precedes booting from hard disk. Reboot the node, and watch it boot Linux from its own hard disk. The IP address should be assigned correctly by the DHCP server.
When you install brand new node hardware for the first time, the hardware real-time clock is probably off by a few hours. It is mandatory to have a correct system date on all compute nodes and servers, otherwise you'll see the Torque batch system having problems, and NFS may be broken as well.
Use pdsh (see the section on parallel commands below) to examine the system date on all nodes in question, for example nodes a001-a140:
pdsh -w a[001-140] 'date +%R' | dshbak -c
----------------
a[001-140]
----------------
14:20
Here the clocks are OK.
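When the clocks disagree, it helps to list only the offending nodes. The following sketch compares each node's time in seconds since the epoch with a reference time; the function name report_skew and the 60-second tolerance in the usage line are assumptions, and the input is expected in the "node: value" format that pdsh prints.

```shell
# Hypothetical helper: read "node: epoch" lines on stdin and print the nodes
# whose clock differs from the reference by more than max_skew seconds.
report_skew() {
    ref=$1        # reference time, seconds since the epoch
    max_skew=$2   # tolerated difference in seconds
    awk -v ref="$ref" -v max="$max_skew" '
        {
            node = $1; sub(/:$/, "", node)        # strip the trailing colon
            diff = $2 - ref; if (diff < 0) diff = -diff
            if (diff > max) print node, diff      # node name and skew in seconds
        }'
}

# Usage:
#   pdsh -w a[001-140] 'date +%s' | report_skew "$(date +%s)" 60
```

No output means that all nodes are within the tolerance of the local clock.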
To synchronize all hardware clocks you need an NTP time server; let's assume it is called ntpserver. Update all clocks on the desired nodes by, for example:
pdsh -w a[001-140] 'service ntpd stop; ntpdate ntpserver; service ntpd start'
If the Torque client daemon is already running, you need to restart it after fixing the system date, for example:
pdsh -w a[001-140] 'service pbs_mom restart'
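The ntpdate command sets the Linux system time. If you also want to write that time to the hardware real-time clock explicitly, hwclock --systohc can be added to the command sequence. The wrapper below is a sketch only: the function name fix_clocks and the DRY_RUN switch are assumptions made for illustration.

```shell
# Hypothetical helper: print (DRY_RUN=1) or run the clock fix on a node range.
fix_clocks() {
    nodes=$1     # pdsh node list, for example 'a[001-140]'
    server=$2    # NTP server name, for example ntpserver
    # Stop ntpd, step the system clock, save it to the hardware clock, restart ntpd.
    cmd="service ntpd stop; ntpdate $server; hwclock --systohc; service ntpd start"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf "pdsh -w %s '%s'\n" "$nodes" "$cmd"
    else
        pdsh -w "$nodes" "$cmd"
    fi
}
```

Setting DRY_RUN=1 before calling fix_clocks 'a[001-140]' ntpserver prints the pdsh command line without running it, which is a convenient way to check the node range first.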
Many PCs can be turned into compute cluster nodes by configuring their BIOS to operate without keyboard and mouse, and perform network/PXE booting at power-up time. It is also important to be able to save and restore BIOS configuration to removable media (such as diskette) for reliable replication of BIOS setup.
When selecting PCs as cluster nodes, the hardware ought to be suitable for mounting on shelves. Therefore we use these conditions for selecting appropriate PC hardware:
- Cabinet must be small enough, but not so small as to prevent efficient cooling. With modern fast and hot PCs, cooling is the single most critical factor for reliable operation! The cabinet must be able to stand on its side without a floor stand (because of the physical space required).
- Air flow must be front to back, as true servers do it, and not a mish-mash of fans blowing air at a number of places around the cabinet. The HP/Compaq EVO d530 comes to mind...
With a substantial number of nodes in a cluster, hardware failures are inevitable and must be dealt with efficiently. We list some useful tools below.
The primary disk analysis tool for HP/Compaq PCs is available from the BIOS menus (press F10 at boot) under the item Storage->IDE DPS Self-test. This built-in diagnostic can scan a disk for errors.
In addition, the various hard disk vendors have their own diagnostics tools. We refer to some of their home pages:
Seagate: SeaTools Suite 2002.
Hitachi/IBM: Drive Fitness Test (DFT).
Hint: If you want the default DFT boot to select ATA only, modify MENUDEFAULT at the top of the diskette's CONFIG.SYS to read:
[menu]
MENUCOLOR=7,1
SUBMENU=SCSI_MENU, SCSI and ATA support
MENUITEM=ATA, ATA support only
MENUDEFAULT=ATA, 5
...
Western Digital: Data Lifeguard.
Fujitsu: Hard Drives - Software Utilities.
Both memory errors and CPU errors can be detected by a very useful tool, Memtest86 - A Stand-alone Memory Diagnostic, which is available under the GNU General Public License. The most modern version of this tool is Memtest86+.
This tool is usually booted from a diskette or CD-ROM drive. The memory tester will run for a long time with numerous tests, and will loop indefinitely. Typically, serious errors show up immediately, whereas some errors have shown up only intermittently after testing for 12-24 hours. The Memtest86 tool is much better than the memory tests in the vendor's diagnostics.
It is possible to boot up a PC using PXE network-booting and immediately run the Memtest86 executable from the network. First, please refer to our SystemImager page for how to set up PXE booting and possibly automate the selection of boot-images from the central server. Second, define a new Memtest86 PXE-boot method by creating the file /tftpboot/pxelinux.cfg/default.memtest with the following content:
default memtest
label memtest
kernel memtest86
The memtest86 kernel to be booted should be copied from the Memtest86+ source tarball available at http://www.memtest.org. Unpack the tarball and copy the file precomp.bin to the file /tftpboot/memtest86. In some versions of Memtest86+ the precomp.bin file is outdated, and you need to do a make and copy the file memtest.bin instead.
Now you use the PXE-booting tools described above to let the central server determine which image will be booted when the PC does a PXE-boot. Basically, the hex-encoded IP-address of the PXE-client must be a soft-link to the file default.memtest, thus causing the PXE-client to boot into Memtest86.
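The soft-link can be created by hand as sketched below. PXELINUX looks for a configuration file named after the client's IPv4 address written as eight uppercase hexadecimal digits, so 192.168.1.10 becomes C0A8010A. The function names ip_to_hex and select_memtest, and the example address, are assumptions for illustration.

```shell
# Convert a dotted-quad IPv4 address to the 8-digit uppercase hex name
# that PXELINUX looks up. The subshell body keeps the IFS change local.
ip_to_hex() (
    IFS=.
    set -- $1                                   # split the address into four octets
    printf '%02X%02X%02X%02X\n' "$1" "$2" "$3" "$4"
)

# Hypothetical helper: make the given client boot Memtest86 at its next PXE boot.
select_memtest() {
    ip=$1
    cfg_dir=${2:-/tftpboot/pxelinux.cfg}
    # Relative link target: default.memtest lives in the same directory.
    ln -sf default.memtest "$cfg_dir/$(ip_to_hex "$ip")"
}

# Usage:
#   select_memtest 192.168.1.10    # creates /tftpboot/pxelinux.cfg/C0A8010A
```

Remove the link, or point it at the normal boot configuration again, once the memory test has finished, since otherwise the node will boot into Memtest86 every time.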
The standard Linux desktop/server services provided by your Linux installation should be pruned so that services not strictly required on a compute node are disabled. This will improve the stability of the software, and improve node performance, because unneeded daemon processes will no longer interfere with system operation and cause operating system "jitter".
On a CentOS5/RHEL5 compute node we recommend disabling the following standard services:
chkconfig hidd off
chkconfig avahi-daemon off
chkconfig haldaemon off
chkconfig bluetooth off
chkconfig cups off
chkconfig ip6tables off
chkconfig iptables off
chkconfig xfs off
chkconfig yum-updatesd off
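Note that chkconfig only changes what is started at the next boot; a daemon that is already running keeps running until it is stopped. A loop such as the following sketch handles both steps. The function name disable_services is an assumption, and the warning branch simply reports services that are not installed on a given node.

```shell
# Hypothetical helper: disable each named service at boot and stop it now.
disable_services() {
    for svc in "$@"; do
        if chkconfig "$svc" off; then
            service "$svc" stop      # stop the instance that is running now
        else
            echo "warning: could not disable $svc" >&2
        fi
    done
}

# Usage:
#   disable_services hidd avahi-daemon haldaemon bluetooth cups \
#       ip6tables iptables xfs yum-updatesd
```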
Another standard Linux service to consider is the cpuspeed daemon. cpuspeed dynamically controls CPUFreq, slowing down the CPU to conserve power and reduce heat when the system is idle, on battery power or overheating, and speeding up the CPU when the system is busy and more processing power is needed.
chkconfig cpuspeed on