Grid Resources

Grid

Hardware

Our grid is mainly composed of 65 Intel-based, dual-core, dual-processor Dell PowerEdge 1855 blade servers. Each blade has 8GB of memory and a 300GB local disk.

  • dblade1-60 run 64-bit Linux. Because jobs on 64-bit nodes can require additional memory, these machines also have 123GB of swap to prevent memory issues.
  • dblade65-70 run 32-bit Linux

We also have 11 AMD-based nodes reserved for timing experiments:

  • zuul, which has two Opteron processors and 3GB of RAM
  • banshee14, which has two Athlon processors and 2GB of RAM
  • banshee2, 3, 5, 9, 13, 15, 16, 20, and 21, each of which has a single Athlon processor and 2GB of RAM

Additionally, our grid contains:

  • rockyroad, an Intel-based, dual-core, four-way Dell PowerEdge 6800 server. This node has 32GB of memory and 1TB of local disk.
  • ethel, a two-way AMD Opteron node with 6.8GB of RAM and a 250GB disk
  • fred2, a two-way, dual-core AMD Opteron node with 13GB of RAM

Queues

All of this equipment is scheduled via Sun Grid Engine. SGE abstracts our hardware into a series of queues, to which jobs are submitted, and then schedules the jobs across the grid.

Our primary queues, each of which runs on our blades and ethel, are:

  • short.q

    Every compute node has two slots per processor in this queue. Jobs in this queue have a hard limit of 2 CPU hours.
  • long.q

    Every compute node has two slots per processor in this queue. Jobs in this queue have a hard limit of 24 hours and are started with a nice level of 2. This queue stops accepting jobs if short.q is full.
  • verylong.q

    Every compute node has one slot per processor in this queue. There is no hard limit on CPU time, and jobs are started with a nice level of 5. This queue is suspended if either short.q or long.q is full.
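
The limits above suggest a simple rule for choosing a primary queue: use the first queue whose hard limit covers the job's expected run time. The following is a minimal sketch of that rule, not part of SGE itself. The names `Queue`, `pick_queue`, and `slots_on_node` are hypothetical, the nice level of 0 for short.q is an assumption (the text does not state one), and the 24-hour limit on long.q is compared the same way as the 2 CPU-hour limit on short.q.

```python
from typing import NamedTuple, Optional


class Queue(NamedTuple):
    name: str
    slots_per_processor: int
    limit_hours: Optional[float]  # None means no hard limit
    nice: int


# Values transcribed from the queue descriptions above.
# The nice level for short.q is not stated; 0 is an assumption.
PRIMARY_QUEUES = [
    Queue("short.q", 2, 2.0, 0),
    Queue("long.q", 2, 24.0, 2),
    Queue("verylong.q", 1, None, 5),
]


def pick_queue(estimated_hours: float) -> str:
    """Return the first primary queue whose hard limit covers the estimate."""
    for queue in PRIMARY_QUEUES:
        if queue.limit_hours is None or estimated_hours <= queue.limit_hours:
            return queue.name
    # Not reachable while verylong.q has no hard limit.
    raise ValueError("no queue accepts this job")


def slots_on_node(queue_name: str, processors: int) -> int:
    """Slots a node contributes to a queue, given its processor count."""
    by_name = {queue.name: queue for queue in PRIMARY_QUEUES}
    return by_name[queue_name].slots_per_processor * processors
```

Under these assumptions, a dual-processor blade contributes `slots_on_node("short.q", 2)` slots to short.q, and a job expected to need ten hours would be sent to long.q. In practice it is safer to leave some margin below a hard limit, since a job that exceeds it is terminated.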

We also have four special queues:

  • exclusive.q

    Jobs submitted to the exclusive queue will get exclusive use of one of the banshees or zuul. This queue is intended for timing experiments, benchmarking, or anything else that requires exclusive use of a machine. All our exclusive nodes are 32-bit.
  • benchmark.q

    This queue is intended only for benchmarking tests that require repeatability. Jobs scheduled on this queue will be run on the 60 64-bit blades and 5 32-bit blades. Each job in the benchmark queue will get exclusive use of one blade (all four cores) for 60 minutes of wall clock time. Existing jobs on that blade (from the other queues) will be suspended. If you find your jobs are often suspended by jobs from the benchmark queue, discuss it with the owner of the benchmark jobs before escalating. You must contact problem to be added to the list of users who can run on this queue.
  • highmem.q

    This queue is intended for machines with large amounts of memory. It was created because some jobs with high memory requirements were being starved for cycles: their memory requirements were never being met. Currently, only rockyroad, ethel, and fred2 are members of this queue, but we may add more machines as the grid continues to grow. Some of the machines in this queue also have slots in other queues, but jobs from those queues will be suspended if users start running jobs in highmem.q.
  • idle.q

    Jobs submitted to idle.q will run on normal department machines or some grid nodes when those machines are idle. Currently, idle.q only contains machines in the Internet Lab, but we are working to expand idle.q to span the department.
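
To target one of the queues above, a job is normally submitted with SGE's `qsub` command and its `-q` option. The following is a minimal sketch that only builds the command line, so it can be inspected before anything is submitted. The helper name `qsub_command` and the script name `run_experiment.sh` are hypothetical, and whether your account may use a given queue (benchmark.q in particular) is not checked.

```python
from typing import List

# Queue names taken from the lists above.
KNOWN_QUEUES = {
    "short.q", "long.q", "verylong.q",
    "exclusive.q", "benchmark.q", "highmem.q", "idle.q",
}


def qsub_command(queue: str, script: str, job_name: str = "") -> List[str]:
    """Build a qsub argument list that submits `script` to `queue`."""
    if queue not in KNOWN_QUEUES:
        raise ValueError(f"unknown queue: {queue}")
    # -q selects the queue; -cwd runs the job from the current directory.
    command = ["qsub", "-q", queue, "-cwd"]
    if job_name:
        # -N sets the job name shown by qstat.
        command += ["-N", job_name]
    command.append(script)
    return command
```

The resulting list can be passed to `subprocess.run` on a machine where `qsub` is installed. For example, `qsub_command("highmem.q", "run_experiment.sh")` yields the arguments for submitting the hypothetical script to highmem.q.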

Other computing resources

The department also runs quahog, a system allowing you to run jobs on idle department machines. Quahog is deprecated, and will be replaced with SGE's idle.q.

Page Owner: Jeff Coady Last Modified: Fri Jun 6 18:04:47 2008