Talk
"Explicit Control in a Batch-Aware Distributed Filesystem"
Douglas Thain, University of Wisconsin - Madison
Thursday, March 11, 2004 at 12:00 P.M.
Lubrano Conference Room
Science and industry rely on large-scale batch computing to solve important strategic problems with brute force. However, batch computing has never sufficiently supported workloads with significant data needs. A conventional but inadequate solution is to couple a stock batch system to a stock distributed filesystem. This approach is limited in scalability and reliability and has forced users to either limit their data needs or distribute data manually.
To solve this problem, we introduce a new system structure called the Batch-Aware Distributed Filesystem (BAD-FS). Unlike traditional systems, the components of BAD-FS expose their state and policies to an external scheduler. Using this control, the scheduler allocates and orchestrates a personal, data-intensive computing system on the fly. By moving control from the core to the edge, BAD-FS is able to solve several perennial problems in distributed computing, such as space allocation, consistency management, and the safety-performance tradeoff. We demonstrate the utility of BAD-FS with a suite of five scientific workloads in a controlled local cluster and in a wide-area multi-cluster environment.
Host: Steve Reiss