Using SGE
Our grid hardware and queues are scheduled via
Sun Grid Engine. This
software provides a robust scheduling system supporting a broad range
of architectures and hardware configurations.
How to use the Grid
To submit jobs, you need to
ssh into the machine "sge". Once there, go to
the folder
/com/sge/default/common and source either
settings.csh or
settings.sh depending on which shell you're
using. This will set up a few environment variables that SGE needs to run
correctly.
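For sh or bash users, the step above can be sketched as follows. This is a minimal sketch, not a site-provided script: the helper name sge_env_ready is hypothetical, and it assumes the settings file exports SGE_ROOT, as stock SGE settings files do. csh and tcsh users would instead run source /com/sge/default/common/settings.csh.

```shell
#!/bin/sh
# Path taken from the instructions above.
SGE_SETTINGS=/com/sge/default/common/settings.sh

# Hypothetical helper: succeeds when SGE_ROOT is set and non-empty.
# Assumption: sourcing the settings file exports SGE_ROOT.
sge_env_ready() {
    [ -n "${SGE_ROOT:-}" ]
}

# Source the settings file only if it is readable on this machine.
if [ -r "$SGE_SETTINGS" ]; then
    . "$SGE_SETTINGS"
fi

if sge_env_ready; then
    echo "SGE environment loaded: SGE_ROOT=$SGE_ROOT"
else
    echo "SGE environment not loaded; source $SGE_SETTINGS first"
fi
```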
The main way to submit jobs is through the qsub command. It takes a job
script as its argument; the script can accept command-line arguments and
pass them along to your executable. You can read the man page for qsub once
you source the appropriate settings file (see the instructions above).
You can find some examples under /com/sge/examples/jobs.
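As an illustration, a small job script might look like the sketch below. The file name hello_job.sh, the job name, and the greet function are hypothetical and are not taken from the examples directory. Lines starting with #$ are read by qsub as embedded options; the ones shown (-S, -cwd, -N) are standard qsub options.

```shell
#!/bin/sh
#$ -S /bin/sh
#$ -cwd
#$ -N hello_job

# Build a greeting from the first argument, falling back to "world"
# when the script is submitted without arguments.
greet() {
    echo "Hello, ${1:-world}"
}

# Arguments given after the script name on the qsub line arrive here.
greet "$@"
```

Submitted as qsub hello_job.sh alice, the argument alice would be passed to the script as $1.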
Useful options
- To determine requestable resources, use qconf -sc.
- To request that your job only be run on 32-bit or 64-bit machines, use
the commands qsub -l arch=lx24-x86 <script> or
qsub -l arch=lx24-amd64 <script>, respectively.
- To request a specific queue, use the command
qsub -q <queuename> <script>.
- To change the circumstances under which mail is sent to the job owner,
use the "-m" option. In particular, to find out when your jobs
are suspended or aborted, use the options "-m as". This option
works with qsub, qsh, qrsh, qlogin, and qalter only.
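These options can be combined on a single command line. The sketch below only prints the resulting command so it can be tried without touching the grid. The helper name build_qsub, the queue name all.q, and the script name myjob.sh are placeholders chosen for illustration.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints a qsub command that requests an
# architecture and a queue, and asks for mail on abort or suspend.
build_qsub() {
    arch=$1
    queue=$2
    script=$3
    echo "qsub -l arch=$arch -q $queue -m as $script"
}

# all.q and myjob.sh are placeholders, not names from this grid.
build_qsub lx24-amd64 all.q myjob.sh
```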
MPI
MPI is a library specification for interprocess message passing. It is designed
for high performance on massively parallel machines, grids, and workstation
clusters. Our grid currently supports the LAM/MPI library from Indiana University
(http://www.lam-mpi.org).
Running an MPI Job
Probably the best way to learn how to run an MPI job on the grid is through an
example.
- Download the sample mpihello.c
source code and the tester.mpi.sh
job script.
- Compile the source by running mpicc -lm -o mpihello mpihello.c. Make sure you
compile on a machine running the appropriate architecture; the majority of the
grid nodes are running a 64-bit OS (try compiling on bell or babbage).
- Make sure the job script is executable and edit it to reflect the mpihello
executable you just compiled.
- Run the job on the grid by issuing qsub -cwd -l arch=<arch> -pe lam <# nodes> tester_mpi.sh,
where <arch> is the appropriate architecture and <# nodes> is the number of nodes you
want the job to run on.
This should create two output files, tester_mpi.o and tester_mpi.sh.po.
The ".po" version should have the output from the job control and the ".o" version should
have the output from the actual program. In our example case, it should contain a
"Hello World" printout from each of the nodes.
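The submit step can be sketched as a dry run that checks the node count before printing the command. The helper name mpi_submit_cmd is hypothetical; the architecture string comes from the "Useful options" section, and the node count of 4 is only an example.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the qsub line for the MPI example.
# Rejects a node count that is empty, contains a non-digit, or is 0.
mpi_submit_cmd() {
    arch=$1
    nodes=$2
    script=$3
    case "$nodes" in
        ''|*[!0-9]*|0)
            echo "node count must be a positive integer" >&2
            return 1
            ;;
    esac
    echo "qsub -cwd -l arch=$arch -pe lam $nodes $script"
}

# Example: four nodes on 64-bit machines.
mpi_submit_cmd lx24-amd64 4 tester_mpi.sh
```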
Interactive Jobs
The commands qrsh and qlogin allow interactive programs
to be run on the grid. The use of these commands is discouraged because
an interactive job occupies a compute slot, preventing other
jobs from running. If you do need to run an interactive job, make sure you
close your session at the end of the job.
- qrsh allows scripts to be run interactively (without X forwarding).
- qlogin allows a user to log in graphically to a grid
machine with X forwarding enabled.
By default, both of these commands will try to schedule the login immediately and
fail if no slot is available. If you would rather have the command wait for the
next available slot, then add -now n to the command line. The
command qsh has not been implemented yet.
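A dry-run sketch of the two interactive forms with -now n added is shown below. The helper name interactive_cmd and the script name ./my_script.sh are hypothetical; the helper only prints the command and does not start a session.

```shell
#!/bin/sh
# Hypothetical helper: prints an interactive command that waits for a slot.
# Mode "login" selects qlogin; any other mode selects qrsh followed by
# the remaining arguments.
interactive_cmd() {
    mode=$1
    shift
    if [ "$mode" = "login" ]; then
        echo "qlogin -now n"
    else
        echo "qrsh -now n $*"
    fi
}

interactive_cmd login
interactive_cmd run ./my_script.sh
```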
Getting help
You can find the SGE reference doc at /com/sge/doc/user-guide.pdf. In
order to access the manual pages of the various SGE-related commands, you need
to source the correct settings file in /com/sge/default/common. If you
use tcsh or csh, source settings.csh. If you use bash, source
settings.sh. If you need further help, you can try subscribing to the
compute mailing
list and posting a question. If you're still having trouble, send mail to
problem.