Garbage Collection
In this assignment, you will implement two garbage collectors: mark & sweep, and stop & copy. As we have seen, garbage collection involves starting from a root set of references, which reside in local variables and stack frames, and searching for reachable values on the heap. We have developed support code that gives your collector access to a running program's root set. You just have to implement the garbage-collection algorithms.
For this assignment, you will no longer be using the PLAI language
levels. Instead, set the language level to (module ...)
.
Download the support code for the garbage collector. This includes several files:
memory-core.ss
You will need torequire
this file into your allocator.
It defines the following abstractions:
-
(get-root-set id ...)
(SYNTAX) : produces a list representing the current set of roots, including an entry for each additionalid
you provide (explained below) -
(root-name root)
: given a root, returns the name of the root as a symbol. If the root corresponds to a program variable, it will have the same name. Otherwise, it will betmpXXX
. This procedure is only provided to help you debug your collector. -
(read-root root)
: given a root, returns the location to which it currently refers -
(set-root! root new-loc)
: given a root and a location, updates the root to refer to the new location
allocator.ss
We provide a trivial allocator that does not perform garbage collection (i.e., it just gives up after running out of memory). You will need to provide your own allocator modules—one for each collection algorithm—that implement the same interface. You can implement both of your collectors in the same module and use a flag to select between them, or you can make separate files and create a symbolic link toallocator.ss
. The interface is as follows:
gc:set-heap-size! : num -> () | This procedure should set the size of the heap to the given number. This is the first thing any mutator must do, so no allocation should be possible until this call is made. |
gc:alloc-flat : num U sym U bool U empty ->
loc | This procedure should allocate a flat Scheme value (number, symbol, boolean, or empty list) on the heap, returning its location (a number). The value should occupy a single heap cell, though you may use additional space to store a tag, etc. You are also welcome to pre-allocate common constants (e.g., the empty list). This procedure may need to perform a garbage-collection. If there is still insufficient space, it should raise an error. |
gc:number? : loc -> loc | This procedure
should return the address of a true value iff the content
of the given location is a number. Otherwise it should return the
address of a false value.
|
gc:symbol? : loc -> boolean | This procedure
should return the address of a true value iff the content
of the given location is a symbol. Otherwise it should return the
address of a false value.
|
gc:boolean? : loc -> boolean | This
procedure should return the address of a true value iff
the content of the given location is a boolean. Otherwise it should
return the address of a false value.
|
gc:empty? : loc -> boolean | This procedure
should return the address of a true value iff the content
of the given location is an empty list. Otherwise it should
return the address of a false value.
|
gc:deref : loc -> num U sym U bool U
empty | Given the location of a flat Scheme value, this
procedure should return that value; if the location points instead to
a cons cell, this function should raise an error.
|
gc:cons : loc loc -> loc | This
procedure should allocate a cons cell on the heap, whose
first and rest are the given arguments.
These fields must be stored in separate heap cells, and you may use
additional space for a tag, etc. As with alloc-flat ,
this procedure may need to perform a garbage-collection, and if there
is still insufficient space it should raise an error.
|
gc:cons? : loc -> boolean |
This should return true iff the location refers to a cons cell
|
gc:first : loc -> loc | If the
given loc refers to a cons cell, this should
return the first field. Otherwise, it should raise an
error.
|
gc:rest : loc -> loc | If the
given loc refers to a cons cell, this should
return the rest field. Otherwise, it should raise an
error.
|
gc:loc->scheme-val : loc -> any |
For debugging: this procedure consumes a location and returns a Scheme value corresponding to the contents of the location. |
make-mutator.ss
This module defines a language in which you can implement test applications (mutators) for your collectors. A program that uses this language module will expose its roots throughget-root-set
, allocate its data on your allocator's heap,
and use the versions of primitives defined in your allocator. (When
the mutator refers to procedures like cons
or
symbol?
, the calls are automatically redirected to the
gc:cons
and gc:symbol?
defined in your
allocator.)
sample-mutator.ss
This serves as an example of a mutator. It takes the form of amodule
defined in the make-mutator.ss
language. It declares a heap size of 100 cells, then defines and
applies some simple functions, printing out the results. You will
want to test against this module, probably with smaller heap sizes,
and also to write your own mutators. When you write your own mutator,
make sure the first thing you do is to set the heap size by calling
(heap-size size)
.
Notes
During collection, you will need to useget-root-set
to
obtain the current set of roots. In gc:alloc-flat
, you
should not list any additional identifiers. However, in
gc:cons
, you must provide the arguments (the first and
rest fields), since these are roots. (If you don't list them, you
will need to treat them specially and be sure to update them with
set!
).
In the mark & sweep collector, you should maintain a free list in the heap. That is, you should not use any auxiliary data structure; instead, use the available space in the heap to keep track of the free list. You may use one extra box (“register”) to point to the start of the free list. You may need to adjust your allocation accordingly to have enough space to maintain free list pointers!
You will probably want to use numbers to represent locations, but
you may use another representation if it eases your implementation.
Also, to help see what is happening, you might want to print a
representation of the heap before and after every garbage collection.
You can use the Scheme printf
function:
(printf "Heap is ~a~n" Heap)
Some final words of advice:
- You will want to test your program with small heap sizes, so that the garbage collector runs frequently.
- You do not need to compact memory in mark-and-sweep.
- You may find it convenient to use the Scheme construct set!, which allows you to mutate a variable without using boxes. We recommend you use this feature only in one instance: when swapping the semispaces in the stop & copy collector. This will save you the trouble of unboxing the heap every time you use it.
FAQ
get-root-set
seems to return duplicate roots. Is
this a bug in the support code? No, this is the intended behavior.
Roots come from walking the stack, and several frames may refer to the
same variable. Moreover, different variables may refer to the same
heap location. You need to update all of them in your stop &
copy collector.
It would be a lot easier if I could just make all objects consume the same amount of space. Would that be acceptable? No! We've been extremely easy-going in only making you handle two sizes. To reduce it to one size would make the problem trivial and unrealistic.