This configuration has been tested with torque-2.3.6 and maui-3.2.6p21
If not yet done, go to configuring rpmbuild.
Download torque-2.3.6.tar.gz and run:
cd /tmp
tar zxf ~/torque-2.3.6.tar.gz
chown -R root.root torque-2.3.6 # security
chmod -R go-w torque-2.3.6 # security
cd torque-2.3.6
./configure --disable-rpp --disable-gui --without-tcl
Note that this will configure torque to install under /usr/local.
Build the RPMs by doing:
which will create the RPMs under /root/RPMS/i386/torque-*.
Skip this step if not installing on "dulak-server": copy the RPMs to the dulak-server:/home/dulak-server/rpm directory, and:
Install the RPMs with yum so that you get an installation log:
cd /root/RPMS/i386/
yum localinstall --nogpgcheck torque-2*.rpm torque-mom-2*.rpm torque-client-2*.rpm \
    torque-docs-2*.rpm torque-devel-2*.rpm torque-server-2*.rpm
Skip this step if not installing on "dulak-server": Make sure that /var/spool/torque/server_name contains dulak-server.dulak-cluster.fysik.dtu.dk.
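The server_name check can be scripted. This is a minimal sketch: ensure_server_name is a hypothetical helper (not part of Torque) that compares the first line of the file with the expected host name and rewrites the file only if they differ. The same helper can be reused later on the "Golden Client".

```bash
#!/bin/bash
# Hypothetical helper: make sure a Torque server_name file contains the
# expected server host name, rewriting the file only when it differs.
ensure_server_name() {
    local file="$1"       # e.g. /var/spool/torque/server_name
    local expected="$2"   # e.g. dulak-server.dulak-cluster.fysik.dtu.dk
    # Compare the first line of the file (if it exists) with the expected name
    if [ -f "$file" ] && [ "$(head -n 1 "$file")" = "$expected" ]; then
        echo "ok: $file already contains $expected"
    else
        # Overwrite the file with the single expected host name
        echo "$expected" > "$file"
        echo "updated: $file now contains $expected"
    fi
}
```

Usage: `ensure_server_name /var/spool/torque/server_name dulak-server.dulak-cluster.fysik.dtu.dk`.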
chmod go+r /etc/profile.d/torque.*
Skip this step if not installing on "dulak-server": copy to the "Golden Client":
scp /etc/profile.d/torque.* n001:/etc/profile.d
Initialize the PBS/Torque server once only with:
pbs_server -t create
qmgr < qmgr_print_server
Note: this will set up the following queues: hour (default), day, halfday, twodays, and week.
In case of a workstation installation, remove the lines with *dulak-cluster* references and replace dulak-server with localhost.
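For a workstation, both edits can be done with a single sed call. This sketch assumes the lines to edit are in qmgr_print_server before it is fed to qmgr; qmgr_for_workstation is a hypothetical helper name, and it writes a new file so the original stays untouched.

```bash
#!/bin/bash
# Hypothetical helper: derive a workstation version of the qmgr input file.
qmgr_for_workstation() {
    local src="$1" dst="$2"
    # 1) delete every line that mentions dulak-cluster
    # 2) in the remaining lines, replace dulak-server with localhost
    sed -e '/dulak-cluster/d' -e 's/dulak-server/localhost/g' "$src" > "$dst"
}
```

Usage: `qmgr_for_workstation qmgr_print_server qmgr_print_workstation`, then `qmgr < qmgr_print_workstation`.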
Download the node definitions file nodes to /var/spool/torque/server_priv/.
Note that the "Golden Client" does participate in the node pool. If you prefer to remove it from the pool, make sure that pbs_mom is not running on it:
ssh n001 "service pbs_mom stop"
The nodes have the pentium property (properties are useful if the cluster is later extended with other types of nodes, to avoid load-balancing problems).
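A Torque nodes file has one line per node: the host name, an optional np= processor count, then any properties. The sketch below writes such a file with the pentium property on every node; the node names and the np value are assumptions chosen by the caller, not the contents of the downloadable nodes file, and write_nodes_file is a hypothetical helper.

```bash
#!/bin/bash
# Hypothetical helper: write a Torque nodes file where every node carries the
# "pentium" property. Usage: write_nodes_file <file> <np> <node>...
write_nodes_file() {
    local out="$1" np="$2" node
    shift 2
    : > "$out"                              # create or truncate the file
    for node in "$@"; do
        # Format: <hostname> np=<processors> <property> ...
        echo "$node np=$np pentium" >> "$out"
    done
}
```

For example `write_nodes_file /var/spool/torque/server_priv/nodes 1 n001`. A job can then ask for nodes with that property, e.g. `qsub -l nodes=1:pentium job.sh`.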
Download maui-3.2.6p21.tar.gz (registration is required) to ~/rpmbuild/SOURCES and change the following in the maui specfile (~/rpmbuild/SPECS/maui-*.spec):
and keep it secret.
cd ~/rpmbuild/SPECS
rpmbuild -bb maui-3.2.6p21.spec
Skip this step if not installing on "dulak-server": copy the RPMs to the dulak-server:/home/dulak-server/rpm directory.
yum localinstall --nogpgcheck /root/RPMS/*/maui*.rpm
Make sure that the following files and directories exist:
cd /var/spool/maui/traces/
touch Resource.Trace1 Workload.Trace1
mkdir /var/spool/maui/log
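The same step can be made safe to re-run: mkdir -p does not fail on existing directories, and touch only updates timestamps of existing files. This is a sketch; prepare_maui_spool is a hypothetical helper whose base directory defaults to /var/spool/maui but can point to a scratch directory for a dry run.

```bash
#!/bin/bash
# Hypothetical helper: idempotent version of the maui spool preparation.
prepare_maui_spool() {
    local base="${1:-/var/spool/maui}"
    # Create the traces and log directories if they are missing
    mkdir -p "$base/traces" "$base/log"
    # Create the trace files if missing (otherwise only their mtime changes)
    touch "$base/traces/Resource.Trace1" "$base/traces/Workload.Trace1"
}
```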
Change all occurrences of localhost in /var/spool/maui/maui.cfg to dulak-server.dulak-cluster.fysik.dtu.dk (use your host name if installing a workstation):
sed -i 's/localhost/dulak-server.dulak-cluster.fysik.dtu.dk/g' /var/spool/maui/maui.cfg
Change /var/log/maui.log in /var/spool/maui/maui.cfg to /var/spool/maui/log/maui.log:
sed -i 's#/var/log/maui.log#/var/spool/maui/log/maui.log#' /var/spool/maui/maui.cfg
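Both maui.cfg edits can also be applied in one sed call while keeping a copy of the unmodified file. This is a sketch: fix_maui_cfg is a hypothetical helper, and the host name argument defaults to the cluster value (pass your own host name on a workstation).

```bash
#!/bin/bash
# Hypothetical helper: apply the localhost and log-file edits to maui.cfg,
# saving the previous version as <cfg>.orig.
fix_maui_cfg() {
    local cfg="$1"
    local host="${2:-dulak-server.dulak-cluster.fysik.dtu.dk}"
    cp -p "$cfg" "$cfg.orig"                 # keep the unmodified file
    sed -i -e "s/localhost/$host/g" \
           -e 's#/var/log/maui.log#/var/spool/maui/log/maui.log#' "$cfg"
}
```

Usage: `fix_maui_cfg /var/spool/maui/maui.cfg`. Note that running it twice overwrites the .orig copy with the already edited file.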
If needed (check the file /etc/init.d/maui first), update /etc/init.d/maui to reflect the installation under /usr/local:
sed -i 's#/usr/sbin#/usr/local/sbin#g' /etc/init.d/maui
sed -i 's#/usr/bin/schedctl#/usr/local/bin/schedctl#' /etc/init.d/maui
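The "only if needed" check can be folded into the same step: grep for the old paths first and run the two sed substitutions only if they are found. This is a sketch around the commands above; fix_maui_init_paths is a hypothetical helper. Because /usr/local/sbin does not contain the string /usr/sbin, a second run reports "unchanged".

```bash
#!/bin/bash
# Hypothetical helper: patch the maui init script only if it still refers
# to /usr/sbin or /usr/bin/schedctl.
fix_maui_init_paths() {
    local script="${1:-/etc/init.d/maui}"
    if grep -q -e '/usr/sbin' -e '/usr/bin/schedctl' "$script"; then
        sed -i -e 's#/usr/sbin#/usr/local/sbin#g' \
               -e 's#/usr/bin/schedctl#/usr/local/bin/schedctl#' "$script"
        echo "patched"
    else
        echo "unchanged"
    fi
}
```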
On Golden Client
cd /home/dulak-server/rpm
rpm -ivh torque-2*.rpm torque-mom-2*.rpm torque-client-2*.rpm
Make sure that /var/spool/torque/server_name contains dulak-server.dulak-cluster.fysik.dtu.dk.
Synchronize the clock with "dulak-server" via ntp (needed by maui) by adding:
service ntpd restart
Wait about 30 minutes (until you see a time reset message in /var/log/messages).
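The exact line to add for the ntp step is not shown above. As a sketch, assuming it is an ntp.conf "server" directive pointing at dulak-server in /etc/ntp.conf, a hypothetical helper add_ntp_server can append it only once, so the step can be repeated safely.

```bash
#!/bin/bash
# Hypothetical helper, ASSUMING the missing line is "server <host>" in ntp.conf:
# append the directive unless an equivalent "server <host>" line already exists.
add_ntp_server() {
    local conf="${1:-/etc/ntp.conf}" server="${2:-dulak-server}"
    if ! grep -Eq "^server[[:space:]]+${server}([[:space:]]|\$)" "$conf"; then
        echo "server $server" >> "$conf"
    fi
}
```

Usage: `add_ntp_server /etc/ntp.conf dulak-server && service ntpd restart`.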
chmod u+x config epilogue health_check_script
scp config epilogue health_check_script n001:/var/spool/torque/mom_priv/
Note the line
$usecp *:/home/dulak-server /home/dulak-server
in config: it is necessary for the batch system to be able to copy the jobs' output files back. If your nodes have more than one core, you must change the $ideal_load and $max_load variables.
When installing on a workstation, remove the $usecp line, use $pbsserver localhost, and change arch to match your architecture. Then copy the modified config directly to the workstation's /var/spool/torque/mom_priv/. The other two files (epilogue and health_check_script) do not need modifications.
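The workstation changes to config can be made with sed. This is a sketch that assumes config contains a line starting with "$pbsserver" and a line of the form "arch <value>"; mom_config_for_workstation is a hypothetical helper, and the arch value is supplied by you (for instance from `uname -m`, if that matches what your config expects). $ideal_load and $max_load are left untouched.

```bash
#!/bin/bash
# Hypothetical helper: turn the cluster mom_priv/config into a workstation one.
# ASSUMES "$pbsserver ..." and "arch ..." lines exist in the source file.
mom_config_for_workstation() {
    local src="$1" dst="$2" arch="$3"
    sed -e '/^\$usecp/d' \
        -e 's/^\$pbsserver.*/$pbsserver localhost/' \
        -e "s/^arch .*/arch $arch/" "$src" > "$dst"
}
```

Usage: `mom_config_for_workstation config /var/spool/torque/mom_priv/config x86_64`.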
Warning: in torque-2.3.6 the reload case of /etc/rc.d/init.d/pbs_mom contains a bug: replace SIGHUP with HUP.
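The fix is a one-line substitution. This sketch assumes SIGHUP appears only in the reload case of the init script, so replacing every occurrence is safe; fix_pbs_mom_reload is a hypothetical helper, and GNU sed's -i.bak keeps the original as pbs_mom.bak.

```bash
#!/bin/bash
# Hypothetical helper: replace SIGHUP with HUP in the pbs_mom init script,
# keeping a backup copy with a .bak suffix.
fix_pbs_mom_reload() {
    local script="${1:-/etc/rc.d/init.d/pbs_mom}"
    sed -i.bak 's/SIGHUP/HUP/g' "$script"
}
```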
Skip this step if not installing on "dulak-server": copy the fixed file to the "Golden Client":
scp /etc/rc.d/init.d/pbs_mom n001:/etc/rc.d/init.d/
Start the PBS mom on the "Golden Client" (run `service pbs_mom restart` locally when installing a workstation):
ssh n001 "service pbs_mom restart"
Note that on a production system, after changing the config file on compute nodes, pbs_mom must not be restarted with service pbs_mom restart while it is still running a batch job!
Instead, a restart can in principle be scheduled (although this does not seem to work):
momctl -q enablemomrestart=1 -h :ALL
so in practice an immediate reload needs to be performed:
service pbs_mom reload
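Before touching a mom at all, one can check whether it is running jobs. This sketch reads `pbsnodes <node>` output on standard input and assumes the Torque 2.x layout, where a busy node lists an indented "jobs = ..." attribute; mom_is_idle is a hypothetical helper that succeeds only when no such line is present.

```bash
#!/bin/bash
# Hypothetical guard: exit status 0 when the pbsnodes output on stdin shows no
# "jobs = ..." attribute, i.e. the node is not running any job.
mom_is_idle() {
    ! grep -q '^[[:space:]]*jobs = '
}
```

Usage: `pbsnodes n001 | mom_is_idle && ssh n001 "service pbs_mom reload"`.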
qterm -t quick
pbs_server
Check the node status:
Make sure that all the running nodes (on dulak-cluster this is currently n001, the "Golden Client") have the free status.
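On a larger pool the free-status check can be filtered with awk. This sketch reads `pbsnodes -a` output on standard input and assumes the Torque 2.x layout (node name on an unindented line, attributes indented as "state = ..."); nodes_not_free is a hypothetical helper that prints each node whose state is not free, so no output means all listed nodes are free.

```bash
#!/bin/bash
# Hypothetical helper: print "<node> <state>" for every node in the
# "pbsnodes -a" output on stdin whose state is not "free".
nodes_not_free() {
    awk '
        /^[^ \t]/ { node = $1 }                    # unindented line: node name
        $1 == "state" && $2 == "=" {
            if ($3 != "free") print node, $3       # e.g. down,offline
        }
    '
}
```

Usage: `pbsnodes -a | nodes_not_free`.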
service maui start
Go to installing software.