Tech Report CS-94-23

Controlling Memory Access Concurrency in Efficient Fault-Tolerant Parallel Algorithms

Paris C. Kanellakis, Dimitris Michailidis and Alex A. Shvartsman

May 1994

Abstract:

The CRCW PRAM under dynamic fail-stop (no restart) processor behavior is a fault-prone multiprocessor model for which it is possible both to guarantee reliability and to preserve efficiency. To handle dynamic faults, some redundancy is necessary in the form of many processors concurrently performing a common read or write task. In this paper we show how to significantly decrease this concurrency by bounding it in terms of the number of actual processor faults. We describe a low-concurrency, efficient, and fault-tolerant algorithm for the Write-All primitive: ``using $\leq N$ processors, write 1's into $N$ locations''. This primitive can serve as the basis for efficient fault-tolerant simulations, on fault-prone PRAMs, of algorithms written for fault-free PRAMs. For any dynamic failure pattern $F$, our algorithm has total write concurrency $\leq |F|$ and total read concurrency $\leq 7\,|F|\log N$, where $|F|$ is the number of processor faults (for example, a run without failures has no concurrency), whereas previous algorithms used $\Omega(N \log N)$ concurrency even in the absence of faults. We also describe a technique for limiting the per-step concurrency and present an optimal fault-tolerant EREW PRAM algorithm for Write-All when all processor faults are initial.
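To make the concurrency measure concrete, the following Python sketch tallies total write concurrency over a trace of synchronous steps. It assumes (our reading, not a definition quoted from the report) that a step in which k >= 2 processors write the same cell contributes k-1 to the total, so a run with no redundant writes scores 0. The traces are hand-made illustrations of the Write-All goal and of the $\leq |F|$ bound, not runs of the report's algorithm; the names `write_concurrency` and `run_write_all` are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# One synchronous PRAM step: the (processor_id, location) pairs written in that step.
Step = List[Tuple[int, int]]


def write_concurrency(trace: Sequence[Step]) -> int:
    """Total write concurrency, ASSUMING each step adds (writers - 1)
    for every location written by two or more processors in that step."""
    total = 0
    for step in trace:
        writers_per_cell = Counter(loc for _, loc in step)
        total += sum(k - 1 for k in writers_per_cell.values() if k > 1)
    return total


def run_write_all(n: int, trace: Sequence[Step]) -> List[int]:
    """Apply a trace to an all-zero array of n cells; every write stores 1."""
    memory = [0] * n
    for step in trace:
        for _, loc in step:
            memory[loc] = 1
    return memory


N = 8

# Failure-free run: processor i writes cell i in a single step, so no cell
# is written twice and the concurrency is 0.
fault_free = [[(i, i) for i in range(N)]]

# Hand-made faulty run (|F| = 1): processor 3 fail-stops before writing cell 3;
# in the next step processors 2 and 4 both cover the gap, one redundant write.
faulty = [[(i, i) for i in range(N) if i != 3], [(2, 3), (4, 3)]]

for name, trace in (("fault_free", fault_free), ("faulty", faulty)):
    done = all(run_write_all(N, trace))
    print(name, "all cells written:", done, "write concurrency:", write_concurrency(trace))
```

In both traces every cell ends up holding 1, and the faulty trace's single redundant write stays within one unit per failure, matching the shape of the bound stated above under the assumed accounting.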
