Analytical and Empirical Tools for Advanced Query Optimizer Engineering
 
Stan Zdonik, PI
Dept. of Computer Science
Brown University

Contact Information
Stan Zdonik
Brown University, Dept. of Computer Science
P.O. Box 1910
Providence, RI  02912
Phone:  (401) 863-7648        Fax:  (401) 863-7657
Email:  sbz@cs.brown.edu
URL:  http://www.cs.brown.edu/people/sbz/

WWW PAGE

http://www.cs.brown.edu/research/oodb/

List of Supported Students and Staff (optional)
Don Carney (PhD student), Ying Xing (PhD student )

Project Award Information

Keywords

 query optimization, term rewriting, extensibility, formal methods, correctness, profile-based data management

Project Summary
Query optimizers generate plans to retrieve the data specified by database queries.  The quality of an optimizer naturally depends on the quality of the plans that it generates (effectiveness).  But to be useful in practice, optimizers must also generate plans quickly (efficiency), generate plans that return the right answers (correctness) and be easily extended to incorporate new data retrieval techniques in generated plans (extensibility).  The goal of this project is to develop a methodology for developing and maintaining effective, efficient, correct and extensible query optimizers.  Our approach builds upon the rule-based paradigm (introduced in [Fre87] and [GrDe87]) which achieves extensibility.   Efficiency is achieved by making rules efficient to ``fire'', and correctness is achieved by formulating rules in a manner that simplifies their verification.  The craft of building optimizers is the selection of rules and rule firing strategies that make optimizers effective.

Publications and Products

Project Impact
This effort began with the invention of the query algebra, KOLA, a combinator-based (i.e.,variable-free) query algebra described in [CZ96]. KOLA permits the expression of rewrite rules without code including complex rules that express transformations of nested queries.  By eliminating code from rules, we simplify their verification. Rewrite rules are too simple to express many of the complex transformations performed by real optimizers. Therefore, we invented COKO (COKO is an acronym for [C]ontrol [O]f [K]OLA [O]ptimizations.), a language with which one can express more complex query transformations than can be expressed with individual rewrite rules. A COKO transformation consists of a a set of KOLA rules and a firing algorithm that controls their firing.  COKO preserves the correctness results achieved with KOLA; all query transformation occurs via the firing of KOLA rules, and therefore the correctness of all rewrite rules fired implies the correctness of the COKO transformation that fires them.  But COKO provides additional expressive power through a (control) language for expressing algorithms for firing rules.  This language includes such useful constructs as explicit rule firing, query (parse) tree traversal, conditional firing, and pattern matching on query trees. We have implemented a compiler for COKO (available on the web at ftp://wilma.cs.brown.edu/u/mfc/coko.tar.Z).  This compiler translates all COKO transformations into C++ classes that operate directly on KOLA parse trees.

Student Involvement:

 * Mitch Cherniack, Assistant Professor, Brandeis University (graduated, Nov. 1998)
 * four master's students graduated
 * currently supporting 2 Ph.D. students in follow-on work.

Goals, Objectives, and Targeted Activities
For the past five years, we have also been working on the problem of optimizing the end-to-end latency of data delivery in a variety of network settings.  This includes using cyclic broadcast-based techniques in a mobile setting, using push-based techniques over a satellite link or using multicast as a way to deliver answers to more customers in a pull-based environment.

Over the past year, we have begun a convergence of this work with our work on query processing.  The point of convergence is the profile.  A profile is like a continuously evaluating query that expresses a user's interests.
While queries form the basis for profiles, we have also discovered that a richly specified profile needs to include more.  For example, a user's interest in a piece of data is likely to be time-dependent.  For stock data, interest is high in the value of a given security right after its price changes by some substantial amount.  As time passes, this new value becomes less interesting.  Notice that the previous example of stock data also included a specification of what constituted a substantial amount.  This is something that would be defined in the profile as well.  A profile might also specify some notion of how useful a particular piece of data might be.  We have developed a simple,  yet powerful profile language that addresses many of these goals.  Our experience in query processing is very useful in figuring out efficient ways to process these profiles.

Project References
[Che99]  Mitch Cherniack. Building Query Optimizers with Combinators. Brown University Department of Computer Science, Dissertation,. November, 1999.

[CZ98b] Mitch Cherniack and Stan Zdonik. Inferring Function Semantics to Optimize Queries. In Proceedings of the 24th International Conference on Very Large Databases (VLDB), New York, NY, Aug., 1998.

[CZ98a] Mitch Cherniack and Stan Zdonik.  Changing the Rules: Transformations for Rule-Based Optimizers.}  In Proc. ACM SIGMOD Int'l Conference on Management of Data, Seattle, WA, June 1998.

[CZ96]  Mitch Cherniack and Stanley B. Zdonik.  Rule Languages and Internal Algebras for Rule-Based Optimizers.  In Proc. ACM SIGMOD Int'l Conference on Management of Data, June, 1996, Montreal, Quebec, Canada.

Area Background

Optimizers that are effective, efficient, correct and extensible are difficult to build.  But little methodological and software support is available to assist the optimizer implementor (OI) with the development process.  The closest approximation to a development environment for query optimizers is the rule-based optimizer generator.  First proposed by Freytag [Fre87] and Graefe and DeWitt [GrDe87], rule-based optimizer generators permit the rapid development of optimizers on the basis of their specification in terms of a set of rewrite rules.  The goal of this approach is to achieve extensibility; an optimizer generated in this manner can be simply extended by adding or modifying the rules used to generate the optimizer.  But while a rule-based optimizer is potentially effective, the OI is on his/her own in deciding which set of rules make it so.  And in practice, rules get expressed in code making them efficient to fire but hard to verify as to their correctness.
Area References
[Fre87] Johann Christoph Freytag.  A Rule-Based View of Query Optimization. In Proc. of the SIGMOD Int'l Conference on Management of Data, pp. 173-180, San Francisco, CA, May, 1987.

[GM93] Goetz Graefe and Willam J. McKenna.  The Volcano Optimizer Generator: and Efficient Search.  In Proceedings of the Ninth International Conference on Data Engineering, 1993, pp. 209-218, Vienna, Austria, April, 1993.

[GM93]  Goetz Graefe.  The Cascades Framework for Query Optimization. In Data Engineering Bulletin, 18 (3): 19-29. September, 1995.

[LLOW91] Charles W. Lamb, Gordon Landis, Jack A. Orenstein and Daniel Weinreb. The ObjectStore Database System, Communications of the ACM, vol. 34,  no. 10, October, 1991.

[PHH92]  Hamid Pirahesh, Joseph M. Hellerstein and Waqar Hasan. Extensible/Rule Based Query Rewrite Optimization in Starburst. In Proc. ACM SIGMOD Int'l Conference on Management of Data, pp. 39-48, San Diego, CA, June, 1992.
 


 
  *All award information can be found on the on the NSF on-line
Awards Abstracts system http://www.fastlane.nsf.gov/a6/A6Start.htm.
 

Back to the IDM '01 homepage