Implementation Models
The performance of a multithreaded program is strongly dependent on
the underlying implementation of the threads package. For many
applications the programmer need not be concerned with the
implementation strategy; for some, however, particularly
compute-intensive applications on multiprocessors, the programmer
must take crucial aspects of the implementation into account. In
this section we summarize the commonly used implementation
strategies.
Many-to-One Model
For kernels that do not support multiple threads of control,
multithreading can be implemented entirely as a user-level
library. These libraries, without the kernel's knowledge, schedule
multiple threads of control onto the process's single kernel
thread. Thus, just as a uniprocessor provides the illusion of
parallelism by multiplexing multiple processes on a single CPU,
user-level threads packages provide the illusion of parallelism by
multiplexing multiple user threads on a single kernel thread; this is
referred to as the many-to-one model [Kleiman et
al. 1996]. There are several advantages to this model:
- Cheap synchronization. When a user thread wishes to perform
synchronization, the user-level thread library checks to see if the
thread needs to block. If it does, then the library enqueues the
thread on the synchronization primitive, dequeues a user thread from
the library's run queue, and switches the active thread. If it does
not need to block, then the active thread continues to run. No system
calls are required in either case.
- Cheap thread creation. To create a new thread, the threads library
need only create a context (i.e. a stack and registers) for the new
thread and enqueue it in the user-level run queue.
- Resource efficiency. Kernel memory isn't wasted on a
stack for each user thread. This allows as many threads as virtual
memory permits.
- Portability. Because user-level threads packages are
implemented entirely with standard UNIX and POSIX library calls
(e.g. with getcontext and setcontext), they are
often quite portable.
However, the many-to-one model does not come without a
price. Specifically:
- Single-threaded OS interface. Since there is only one
kernel thread, if a user thread executes a blocking system call, the
entire process blocks, since no other user thread can execute until
the kernel thread (which is blocked in the system call) becomes
available. The library can circumvent this problem where nonblocking
variants of system calls exist, though doing so adds significantly to
implementation complexity [Doeppner 1987].
- No parallelism. Multithreaded programs under the
many-to-one model will run no faster on multiprocessors than they run
on uniprocessors. The single kernel thread acts as a bottleneck,
preventing optimal use of the multiprocessor.
Despite substantial disadvantages, the relative ease of implementation
of many-to-one threads packages has made it the most popular model to
date. For example, the current implementations of Netscape and Java
achieve their multithreading strictly through user-level threads
packages.
One-to-One Model
An obvious alternative to the many-to-one model is that every user
thread have its own kernel thread (i.e., that there be a one-to-one
correspondence between user threads and kernel threads). This provides
several advantages:
- Scalable parallelism. Because each kernel thread is
actually a different kernel-schedulable entity, multiple threads can
run concurrently on different processors. Thus, multithreaded programs
written under the one-to-one model can achieve significant speedups
when migrated from uniprocessors to multiprocessors.
- Multithreaded OS interface. Unlike the many-to-one model,
threads blocking in the kernel do not impede process progress under
the one-to-one model. When one user thread and its kernel thread
block, the other user threads can continue to execute since their
kernel threads are unaffected.
While the one-to-one model can yield a major performance win, it too
is not without its costs. Most of the benefits of the many-to-one
model do not carry over to the one-to-one model:
- Expensive synchronization. Because kernel threads require
kernel involvement to be descheduled, kernel-thread synchronization
will require a system call if the lock is not immediately
acquired. Estimates vary, but if a trap is required, synchronization
will be from three to ten times more costly than for the many-to-one
case [Powell et al. 1991, Vahalia 1996].
- Expensive creation. Under the one-to-one model, every
thread creation requires explicit kernel involvement and consumes
kernel resources. The difference in creation cost depends on the
specific implementation, but creating a kernel thread is generally
between three and ten times more expensive than creating a user thread
[Vahalia 1996].
- Resource inefficiency. Every thread created by the user
requires kernel memory for a stack, as well as some sort of kernel
data structure to keep track of it. In many kernels, much of this
memory cannot be paged out to disk, so a large number of kernel
threads reduces the physical memory available to applications.
Many-to-Many Model
In an attempt to combine these two models, some operating systems,
notably Mach 3.0 [Golub et al. 1990], SVR4/MP, Solaris 2.x [Powell et
al. 1991], and Digital UNIX 4.0, give the programmer both user-level
threads and kernel threads. User-level threads are
multiplexed on top of kernel-level threads, which in turn are
scheduled on top of processors. The kernel knows only about the
kernel-level threads; it does not know of the multiplexing performed
by the user-level scheduler. Due to the many-to-many relationship
between user threads and kernel threads, this is called the
many-to-many model [Kleiman et al. 1996] (it is also referred to as
the two-level model [Catanzaro 1994], the split model [Vahalia 1996]
and the LWP model). By taking a hybrid approach, this model aims to
combine the advantages of both the many-to-one model and the
one-to-one model, while minimizing these models' disadvantages.
The major advantage of the many-to-many model is that large numbers of
threads can be supported relatively cheaply. As with the many-to-one
model, the creation of a user thread does not necessarily require the
(relatively expensive) creation of a kernel thread. Thus one can
create a large number of user threads, but have the overhead of
creating only a small number of kernel threads. Synchronization can
also be inexpensive: the implementation of synchronization primitives
involves primarily user-level code. A user thread that must block on a
synchronization primitive (such as a mutex) is queued on a wait queue,
and the underlying kernel thread finds and runs another user thread
from the user-level run queue. Only if no runnable user thread is available
does the kernel thread make a system call and block (or park)
in the kernel. Thus the cost of a context switch from one thread to
another can be no worse than the cost of a few subroutine calls--a
system call is often not necessary.
User threads in the many-to-many model normally "float" among kernel
threads--they may run on whatever kernel thread is available when they
become runnable. However, in some cases it may be necessary to
associate a user thread permanently with a kernel thread, i.e., to
bind the user thread to the kernel thread. Such bound threads
behave as threads do in the one-to-one model--their creation requires
the creation of a kernel thread, and synchronization operations
require system calls (to park the kernel thread in the kernel when
the user thread is blocked and to unpark it when the user thread is
released).
Scheduler Activations
The many-to-many model employs two schedulers: one in the kernel and
one in the user threads library. It is not immediately obvious how the
kernel scheduler can cooperate with the user scheduler. For example,
say the user scheduler has a high-priority thread to schedule, so it
preempts the execution of a lower-priority thread, reassigning its
kernel thread to the high-priority user thread. But, at the same time,
the kernel scheduler decides that the kernel thread's time slice has
expired and reassigns the processor to another kernel thread, perhaps
one that has been assigned by our user-level scheduler to a
lower-priority thread. Thus the thread deemed the most important by
the user-level scheduler is denied immediate use of the processor by
the kernel scheduler in favor of a less important thread.
Another problem is the number of kernel threads. How many kernel
threads should be created to support a particular process? If there
are too few, then the available concurrency will not be realized--user
threads that are ready to run will stand idle, even though there may
also be idle processors. If there are too many, then the kernel may
needlessly be multiplexing a number of kernel threads on a smaller
number of processors, wasting time doing the context switching, even
though the application has no need for such time slicing.
One might be tempted to give a process just as many kernel threads as
there are processors. But if a user thread executes a blocking system
call (such as reading from an I/O device) or suffers a page fault,
then its underlying kernel thread also blocks--another user thread may
be ready to execute, but no kernel thread is available to be assigned
to it.
An elegant solution to both problems, not yet appearing in a
commercial system, is scheduler activations, an approach
devised at the University of Washington [Anderson et al. 1992]. This
variant of the many-to-many model provides an explicit means for the
user-level and kernel schedulers to cooperate. The kernel assigns
processors to processes and the user-level scheduler assigns these
processors to user threads. The user-level scheduler keeps the kernel
apprised of how many processors it needs; the kernel scheduler
notifies the user-level scheduler of all processor-related events that
affect the user process, such as when processors become available to
it or are taken away from it. This model appears to solve many of the
problems of the many-to-many family of models; its greatest drawback
is perhaps the frequent crossings of the user-kernel boundary.
The text of this web document was taken from a paper by Bryan M. Cantrill and Thomas W. Doeppner Jr., Department
of Computer Science, Brown University, Providence, RI 02912-1910
Greg Foxman (gmf@cs.brown.edu)