Brown CS

Using SGE

Grid

Our grid hardware and queues are scheduled by Sun Grid Engine (SGE). This software provides a robust scheduling system that supports a broad range of architectures and hardware configurations.

How to use the Grid

To submit jobs, you need to ssh into the machine "sge". Once there, go to the folder /com/sge/default/common and source either settings.csh or settings.sh, depending on your shell (settings.csh for csh or tcsh, settings.sh for sh or bash). This sets up the environment variables SGE needs to run correctly.
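
For sh or bash users, the setup step can be wrapped in a small shell function that also checks that qsub ended up on your PATH. This is a minimal sketch: the function name load_sge_env and its optional path argument are hypothetical, and it assumes only the settings.sh location given above. csh and tcsh users should keep running source /com/sge/default/common/settings.csh directly.

```shell
# load_sge_env: hypothetical helper for sh/bash users on the "sge" machine.
# Sources the SGE settings file into the current shell, then checks that
# qsub is on the PATH. (csh/tcsh users source settings.csh instead.)
load_sge_env() {
  # An optional argument overrides the default path (handy for testing).
  sge_settings="${1:-/com/sge/default/common/settings.sh}"
  if [ ! -r "$sge_settings" ]; then
    echo "load_sge_env: cannot read $sge_settings (are you logged in to sge?)" >&2
    return 1
  fi
  # "." runs the file in the current shell, so the variables it sets
  # remain in effect after the function returns.
  . "$sge_settings"
  if ! command -v qsub >/dev/null 2>&1; then
    echo "load_sge_env: qsub is still not on PATH after sourcing settings" >&2
    return 1
  fi
}
```

You could define this in your ~/.bashrc and run load_sge_env after logging in to sge.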

The main way to submit jobs is the qsub command. It takes the name of an executable job script as its argument; any command-line arguments that follow the script name are passed to the script, which can pass them along to your executable. Once you have sourced the appropriate settings file (see above), you can read the man page for qsub. Some example job scripts are in /com/sge/examples/jobs.
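
A minimal job script might look like the following sketch. The file name hello_job.sh and the job name are hypothetical. The #$ lines are SGE's embedded-option syntax: qsub reads them as if the options had been given on its command line, while the shell treats them as comments. Setting the shell explicitly with -S is a precaution, since a queue's default shell may not be sh.

```shell
#!/bin/sh
# hello_job.sh: hypothetical minimal SGE job script (name is illustrative).
# qsub reads lines starting with "#$" as embedded options; the shell sees
# them as comments, so the script can also be run directly for a quick check.
#$ -S /bin/sh
#$ -cwd
#$ -N hello_job

main() {
  echo "host: $(uname -n)"
  # SGE sets JOB_ID for a running job; outside the grid it is unset.
  echo "job id: ${JOB_ID:-none}"
  # Arguments given after the script name on the qsub command line end up here.
  echo "arguments: $*"
}

main "$@"
```

Submitted with qsub hello_job.sh first second, the job writes its standard output to hello_job.o<job id> in the directory you submitted from (because of -cwd), and the last line should read "arguments: first second".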

Useful options

  • To determine requestable resources, use qconf -sc.
  • To request that your job only be run on 32 bit or 64 bit machines, use the commands qsub -l arch=lx24-x86 <script> or qsub -l arch=lx24-amd64 <script>, respectively.
  • To request a specific queue, use the command qsub -q <queuename> <script>.
  • To change the circumstances under which mail is sent to the job owner, use the "-m" option. In particular, to be notified when your jobs are suspended or aborted, use "-m as". This option works with qsub, qsh, qrsh, qlogin, and qalter only.
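
If you often combine the options above, a small shell wrapper saves retyping them. This is a hypothetical convenience function, not part of SGE: it maps 32 or 64 to the architecture strings listed above and adds -m as so that you are mailed if the job is aborted or suspended.

```shell
# qsub_arch: hypothetical wrapper, not part of SGE.
# Usage: qsub_arch 32|64 [more qsub options] <script> [script args...]
qsub_arch() {
  case "$1" in
    32) sge_arch=lx24-x86 ;;
    64) sge_arch=lx24-amd64 ;;
    *)  echo "usage: qsub_arch 32|64 [qsub options] <script> [args...]" >&2
        return 2 ;;
  esac
  shift
  # Restrict the job to one architecture and mail the owner on abort/suspend.
  qsub -l arch="$sge_arch" -m as "$@"
}
```

For example, qsub_arch 64 myjob.sh runs qsub -l arch=lx24-amd64 -m as myjob.sh. Further options such as -q <queuename> go before the script name: qsub_arch 64 -q <queuename> myjob.sh.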

MPI

MPI is a library specification for interprocess message passing. It is designed for high performance on massively parallel machines, grids, and workstation clusters. Our grid currently supports the LAM/MPI library from Indiana University (http://www.lam-mpi.org).

Running an MPI Job

Probably the best way to learn how to run an MPI job on the grid is through an example.
  • Download the sample mpihello.c source code and the tester_mpi.sh job script.
  • Compile the source by running mpicc -o mpihello mpihello.c -lm. Make sure you compile on a machine of the appropriate architecture; most grid nodes run a 64-bit OS (try bell or babbage).
  • Make sure the job script is executable (for example, with chmod +x) and edit it to run the mpihello executable you just compiled.
  • Run the job on the grid by issuing qsub -cwd -l arch=<arch> -pe lam <# nodes> tester_mpi.sh, where <arch> is the appropriate architecture (lx24-x86 or lx24-amd64) and <# nodes> is the number of nodes you want the job to run on.
  • This should create two output files, tester_mpi.sh.o<job id> and tester_mpi.sh.po<job id>. The ".po" file holds the output from the job control, and the ".o" file holds the output from the program itself; in this example, it should contain a "Hello World" line from each of the nodes.
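
For reference, a LAM/MPI job script could be written along the following lines. This is a sketch, not the tester_mpi.sh linked above, and the name mpi_job.sh is hypothetical. It assumes that the lam parallel environment boots LAM on the hosts SGE grants and that LAM's mpirun accepts -np <count>. SGE sets NSLOTS to the number of slots granted through -pe; when NSLOTS is unset (for example, if you run the script directly with sh to check it), the script only reports the command it would run.

```shell
#!/bin/sh
# mpi_job.sh: hypothetical LAM/MPI job script (not the tester_mpi.sh above).
# Submit with, e.g.:  qsub -l arch=lx24-amd64 -pe lam 4 mpi_job.sh
#$ -S /bin/sh
#$ -cwd

main() {
  # SGE sets NSLOTS to the number of slots granted by "-pe lam <n>".
  if [ -z "${NSLOTS:-}" ]; then
    # Not running under SGE (e.g. "sh mpi_job.sh"): only report the command.
    echo "not running under SGE; would run: mpirun -np <n> ./mpihello"
    return 0
  fi
  # Assumption: the lam parallel environment has already booted LAM on the
  # granted hosts, and this mpirun is LAM's, which accepts "-np <count>".
  mpirun -np "$NSLOTS" ./mpihello
}

main "$@"
```

Because the script carries its own #$ -cwd line, the -cwd on the qsub command line becomes optional; mpihello is then found in the directory you submitted from.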

Interactive Jobs

The commands qrsh and qlogin allow interactive programs to be run on the grid. Their use is discouraged, because an interactive job occupies a compute slot for as long as the session is open, preventing other jobs from using it. If you do need to run an interactive job, make sure you close your session when you are done.
  • qrsh allows scripts to be run interactively (without X forwarding).
  • qlogin allows a user to log in graphically to a grid machine, with X forwarding enabled.
By default, both of these commands try to schedule the session immediately and fail if they cannot. If you would rather have the command wait for the next available slot, add -now n to the command line. The qsh command has not been implemented yet.
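
For a one-off command, qrsh can also run it directly instead of opening a session, which releases the slot as soon as the command finishes. The wrapper below is a hypothetical helper, not part of SGE; it combines -now n from above with -cwd, which qrsh accepts like qsub, so that the command starts in your current directory.

```shell
# grid_run: hypothetical helper, not part of SGE.
# Runs one command on a grid node via qrsh, waiting for a free slot (-now n)
# instead of failing, and starting in the current directory (-cwd).
# The slot is released as soon as the command exits.
grid_run() {
  if [ "$#" -eq 0 ]; then
    echo "usage: grid_run <command> [args...]" >&2
    return 2
  fi
  qrsh -now n -cwd "$@"
}
```

For example, grid_run ./myscript.sh input.txt runs the script on a grid node and returns when it exits.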

Getting help

You can find the SGE reference documentation at /com/sge/doc/user-guide.pdf. To access the manual pages for the various SGE commands, you need to source the correct settings file in /com/sge/default/common: settings.csh if you use tcsh or csh, or settings.sh if you use bash. If you need further help, try subscribing to the compute mailing list and posting a question. If you're still having trouble, send mail to problem.

Page Owner: Jonathan Sailor Last Modified: Fri Jun 6 17:50:08 2008