Implementation Models
The performance of a multithreaded program is strongly dependent on
the underlying implementation of the threads package. For many
applications the programmer need not be concerned with the
implementation strategy; for some, however, particularly
compute-intensive applications on multiprocessors, the programmer
must take crucial aspects of the implementation into account. In
this section we summarize the commonly used implementation
strategies.
Many-to-One Model
For kernels that do not support multiple threads of control,
multithreading can be implemented entirely as a user-level
library. These libraries, without the kernel's knowledge, schedule
multiple threads of control onto the process's single kernel
thread. Thus, just as a uniprocessor provides the illusion of
parallelism by multiplexing multiple processes on a single CPU,
user-level threads packages provide the illusion of parallelism by
multiplexing multiple user threads on a single kernel thread; this is
referred to as the many-to-one model [Kleiman et
al. 1996]. There are several advantages to this model:
- Cheap synchronization. When a user thread wishes to perform
synchronization, the user-level thread library checks to see if the
thread needs to block. If it does, then the library enqueues the
thread on the synchronization primitive, dequeues a user thread from
the library's run queue, and switches the active thread. If it does
not need to block, then the active thread continues to run. No system
calls are required in either case.
- Cheap thread creation. To create a new thread, the threads library
need only create a context (i.e. a stack and registers) for the new
thread and enqueue it in the user-level run queue.
- Resource efficiency. Kernel memory isn't wasted on a
stack for each user thread. This allows as many threads as virtual
memory permits.
- Portability. Because user-level threads packages are
implemented entirely with standard UNIX and POSIX library calls
(e.g. with getcontext and setcontext), they are
often quite portable.
However, the many-to-one model does not come without a
price. Specifically:
- Single-threaded OS interface. Since there is only one
kernel thread, if a user thread executes a blocking system call, the
entire process blocks, since no other user thread can execute until
the kernel thread (which is blocked in the system call) becomes
available. The library can circumvent this problem where nonblocking
variants of system calls exist, though doing so adds significantly to
implementation complexity [Doeppner 1987].
- No parallelism. Multithreaded programs under the
many-to-one model will run no faster on multiprocessors than they run
on uniprocessors. The single kernel thread acts as a bottleneck,
preventing optimal use of the multiprocessor.
Despite substantial disadvantages, the relative ease of implementation
of many-to-one threads packages has made it the most popular model to
date. For example, the current implementations of Netscape and Java
achieve their multithreading strictly through user-level threads
packages.
One-to-One Model
An obvious alternative to the many-to-one model is that every user
thread have its own kernel thread (i.e., that there be a one-to-one
correspondence between user threads and kernel threads). This provides
several advantages:
- Scalable parallelism. Because each kernel thread is
actually a different kernel-schedulable entity, multiple threads can
run concurrently on different processors. Thus, multithreaded programs
written under the one-to-one model can achieve significant speedups
when migrated from uniprocessors to multiprocessors.
- Multithreaded OS interface. Unlike the many-to-one model,
threads blocking in the kernel do not impede process progress under
the one-to-one model. When one user thread and its kernel thread
block, the other user threads can continue to execute since their
kernel threads are unaffected.
While the one-to-one model can yield a major performance win, it too
is not without its costs. Most of the benefits of the many-to-one
model do not carry over to the one-to-one model:
- Expensive synchronization. Because kernel threads require
kernel involvement to be descheduled, kernel-thread synchronization
will require a system call if the lock is not immediately
acquired. Estimates vary, but if a trap is required, synchronization
will be from three to ten times more costly than for the many-to-one
case [Powell et al. 1991, Vahalia 1996].
- Expensive creation. Under the one-to-one model, every
thread creation requires explicit kernel involvement and consumes
kernel resources. The difference in creation cost depends on the
specific implementation, but creating a kernel thread is generally
between three and ten times more expensive than creating a user thread
[Vahalia 1996].
- Resource inefficiency. Every thread created by the user
requires kernel memory for a stack, as well as some sort of kernel
data structure to keep track of it. In many kernels, much of this
memory cannot be paged out to disk, so a large number of kernel
threads reduces the physical memory available to applications.
Many-to-Many Model
In an attempt to combine these two models, some operating systems,
notably Mach 3.0 [Golub et al. 1990], SVR4/MP, Solaris 2.x [Powell et
al. 1991], and Digital UNIX 4.0, give the programmer both user-level
threads and kernel threads. User-level threads are
multiplexed on top of kernel-level threads, which in turn are
scheduled on top of processors. The kernel knows only about the
kernel-level threads; it does not know of the multiplexing performed
by the user-level scheduler. Due to the many-to-many relationship
between user threads and kernel threads, this is called the
many-to-many model [Kleiman et al. 1996] (it is also referred to as
the two-level model [Catanzaro 1994], the split model [Vahalia 1996]
and the LWP model). By taking a hybrid approach, this model aims to
combine the advantages of both the many-to-one model and the
one-to-one model, while minimizing these models' disadvantages.
The major advantage of the many-to-many model is that large numbers of
threads can be supported relatively cheaply. As with the many-to-one
model, the creation of a user thread does not necessarily require the
(relatively expensive) creation of a kernel thread. Thus one can
create a large number of user threads, but have the overhead of
creating only a small number of kernel threads. Synchronization can
also be inexpensive: the implementation of synchronization primitives
involves primarily user-level code. A user thread that must block on a
synchronization primitive (such as a mutex) is queued on a wait queue,
and the underlying kernel thread finds and runs another user thread
from the user-level run queue. Only if no runnable user thread is available
does the kernel thread make a system call and block (or park)
in the kernel. Thus the cost of a context switch from one thread to
another can be no worse than the cost of a few subroutine calls--a
system call is often not necessary.
User threads in the many-to-many model normally "float" among kernel
threads--they may run on whatever kernel thread is available when they
become runnable. However, in some cases it may be necessary to
associate a user thread permanently with a kernel thread, i.e., to
bind the user thread to the kernel thread. Such bound threads
behave as threads do in the one-to-one model--their creation requires
the creation of a kernel thread, and synchronization operations
require system calls (to park the kernel thread in the kernel when
the user thread is blocked and to unpark it when the user thread is
released).
Scheduler Activations
The many-to-many model employs two schedulers: one in the kernel and
one in the user threads library. It is not immediately obvious how the
kernel scheduler can cooperate with the user scheduler. For example,
say the user scheduler has a high-priority thread to schedule, so it
preempts the execution of a lower-priority thread, reassigning its
kernel thread to the high-priority user thread. But, at the same time,
the kernel scheduler decides that the kernel thread's time slice has
expired and reassigns the processor to another kernel thread, perhaps
one that has been assigned by our user-level scheduler to a
lower-priority thread. Thus the thread deemed the most important by
the user-level scheduler is denied immediate use of the processor by
the kernel scheduler in favor of a less important thread.
Another problem is the number of kernel threads. How many kernel
threads should be created to support a particular process? If there
are too few, then the available concurrency will not be realized--user
threads that are ready to run will stand idle, even though there may
also be idle processors. If there are too many, then the kernel may
needlessly be multiplexing a number of kernel threads on a smaller
number of processors, wasting time doing the context switching, even
though the application has no need for such time slicing.
One might be tempted to give a process just as many kernel threads as
there are processors. But if a user thread executes a blocking system
call (such as reading from an I/O device) or suffers a page fault,
then its underlying kernel thread also blocks--another user thread may
be ready to execute, but no kernel thread is available to be assigned
to it.
An elegant solution to both problems, not yet appearing in a
commercial system, is scheduler activations, an approach
devised at the University of Washington [Anderson et al. 1992]. This
variant of the many-to-many model provides an explicit means for the
user-level and kernel schedulers to cooperate. The kernel assigns
processors to processes and the user-level scheduler assigns these
processors to user threads. The user-level scheduler keeps the kernel
apprised of how many processors it needs; the kernel scheduler
notifies the user-level scheduler of all processor-related events that
affect the user process, such as when processors become available to
it or are taken away from it. This model appears to solve many of the
problems of the many-to-many family of models; its greatest drawback
is perhaps the frequent crossings of the user-kernel boundary.
The text of this web document was taken from a paper by Bryan M. Cantrill and Thomas W. Doeppner Jr., Department
of Computer Science, Brown University, Providence, RI 02912-1910
Greg Foxman (gmf@cs.brown.edu)