The Columbia Query Optimization Project

The Columbia Query Optimization Project is a joint research project of David Maier from the Oregon Graduate Institute and Leonard Shapiro from Portland State University . It is funded by the National Science Foundation through grants entitled "Query Optimization Engineering", to Portland State University (NSF # IRI-9610013) and to the Oregon Graduate Institute (NSF # IRI-9619977).

Publications Related to Columbia

"Pruning in the Columbia Query Optimizer", with D. Maier et al, in preparation Compressed Word97 or Compressed Postscript.

"Efficiency in the Columbia Database Query Optimizer", Master's thesis by Yongwen Xu, Portland State University, 1998

"A TPC-D Model for Database Query Optimization in Cascades", Master's thesis by Keith Billings, Portland State University, 1997.

Summary

Query optimizers are one of the main means by which modern database systems achieve their performance advantages. Given a request for data manipulation or retrieval, an optimizer will choose an optimal plan for evaluating the request from among the manifold alternative strategies.

Optimizers for commercial relational database systems appeared early in the last decade, and optimization for the basic relational model was considered a solved problem by many. However, new interest in query capabilities for knowledge discovery and on-line analytical processing, in large data warehouses, and against complex multimedia objects has kindled renewed research in optimization. Current optimizers have often proved inadequate to the needs of these new application areas.

Query optimizers were already among the largest and most complex modules of database systems, and they have proven difficult to modify and extend to accommodate these areas. The situation is further complicated by the need to handle new evaluation techniques, such as parallel and distributed evaluation. Over the past ten years, there has been help on the software engineering side for optimizer developers, in the form of optimizer generators and frameworks. Generators derive parts of the optimizer code from specifications, while frameworks delineate internal optimizer interfaces to aid extensibility. However, there are still myriad decisions to make in engineering the optimization and evaluation process itself: which logical and physical operators to use, which search strategies and heuristics to employ, which query transforms to utilize and so forth. While new ideas for operators, access methods and search techniques are continually being proposed, there is little work on how different methods interact, what works well for which classes of queries or under what circumstances a technique is effective.

The proposed work helps with this second aspect of optimizer development. It undertakes a systematic study of optimizer engineering issues, organized around query classes and applications. Among the questions addressed are

Which evaluation techniques and access methods are most useful in a given application area?
Which logical operators and rule sets work best for a particular query class?
How well do different search strategies work with a given query model?
What are the tradeoffs, relative to optimization time and plan quality, of search heuristics and limits on plan shape?
How can optimizers handle variability in the runtime evaluation environment?

Answers to these questions will help optimizer implementors make better choices in the design of their systems.

These investigations use the Columbia query optimizer framework and visualization environment developed at PSU and OGI, which has already proved an effective test environment for aggregation transforms on decision-support queries. The project involves close interaction with companies who produce database management systems and the hardware on which they run, to identify the most pressing problems, assess our proposed solutions and determine how to package technology produced by the project to aid transition into commercial optimizers.