BROWNIAN CODE MOTION (aka, Randomly Moving Around Tags) Team: PLT at Brown University http://www.cs.brown.edu/research/plt/ Team Members: Greg Cooper Paul Graunke Rob Hunter David Reiss David Tucker Shriram Krishnamurthi Tool: PLT Scheme (DrScheme) v103 Our program has four parts: reader, writer, local optimizer and global optimizer. The pipeline reads, globally optimizes, locally optimizes, then writes. The reader generates a traditional tree. Using the problem semantics, we convert the output of reading into a canonical "meaning" value. The meaning is represented as a vector whose every element holds two values: a maximal string whose attributes are all the same, and the attributes of that string. The global optimizer consumes the meaning. The optimizer uses a randomized algorithm to generate "parenthesizations" of the tree -- that is, an N-element meaning gets an (+ N 1)-element sequence of tags. Each element of a parenthesization consists of a sequence of closing tags followed by a sequence of opening tags, such that it does not violate the semantics of the input. The resulting string representation may, however, be longer than the input. The global optimizer uses pruning to reject potentially useless parenthesizations. Here's more on the global optimizer: The idea is to scan the meaning from left to right, and between each pair of contexts, to close some tags and open some tags as necessary. Remember that the next context can be determined given the current context and the current stack of tags. So, we go left to right, keeping a stack of tags we've added (and the context that resulted from adding each tag). We then use a function (called delta) that, given two contexts and the current stack, computes all possible close/open tag combinations that are necessary to change from the old context to the new context. Once you know these possibilities, you could do a search for the cheapest path, which will be very exponential. Instead we choose one randomly at each step. Our key function (tags-to-diff-context) takes two contexts (no stack) and determines how to get from the old context to the new context by only *opening* tags. This is a fairly ugly rule-based function. Given that function, we can generate all possible close/open tag sequences by examining every context on the stack, and calling tags-to-diff-context on that context and the new context. The program feeds the output from this phase -- a meaning and a parenthesization -- to the local optimizer. This phase performs fairly traditional tree rewrites to apply common-sense reductions. There are two parts to the local optimzer. The first part only applies contractions. The second part applies expansions (to introduce optimization opportunities) followed by contractions. Finally, the writer prints the resulting, optimized tree. A reader might wonder why we perform local optimization after global optimization, rather than the other way around. There are two reasons: 1. The global optimizer can't benefit from the local optimizer. The input to the global optimizer is a "minimal" representation of the input tree. Thus, it would waste the actions of the local optimizer. 2. The global optimizer uses a randomized algorithm. Therefore, even when it finds a tree that is better than the input, this tree is only "good", not necessarily ideal. In particular, the solution may contain some redundancy. The local optimizer is employed on clean-up duty. (In the worst case, the global optimizer may not have had enough time to produce *any* solutions better than the input, so the local optimizer becomes critical.) In terms of time, we allocate about 10% of the time to the reader, about twice as much to the writer (we had to stop work on the contest problem early, so we ran out of time to tighten the bounds), stop about five seconds early (just in case we page on a large file, we want enough time for the timer thread to kick in) and split the remaining time between the optimization phases. We clearly consumed too little of the available time -- eg, when given 180 seconds, we ran for only about 145. The reason we take 145 seconds even on trivial tasks is because we let the global optimization run forever, hoping it will produce ever-better solutions; only a timer thread aborts it. Brownian Code Motion