"Efficiency in the Columbia Database Query Optimizer", Master's thesis by Yongwen Xu, Portland State University, 1998
"A TPC-D Model for Database Query Optimization in Cascades", Master's thesis by Keith Billings, Portland State University, 1997.
Optimizers for commercial relational database systems appeared early in the last decade, and optimization for the basic relational model was considered a solved problem by many. However, new interest in query capabilities for knowledge discovery and on-line analytical processing, in large data warehouses, and against complex multimedia objects has kindled renewed research in optimization. Current optimizers have often proved inadequate to the needs of these new application areas.
Query optimizers were already among the largest and most complex modules of database systems, and they have proven difficult to modify and extend to accommodate these areas. The situation is further complicated by the need to handle new evaluation techniques, such as parallel and distributed evaluation. Over the past ten years, there has been help on the software engineering side for optimizer developers, in the form of optimizer generators and frameworks. Generators derive parts of the optimizer code from specifications, while frameworks delineate internal optimizer interfaces to aid extensibility. However, there are still myriad decisions to make in engineering the optimization and evaluation process itself: which logical and physical operators to use, which search strategies and heuristics to employ, which query transforms to utilize and so forth. While new ideas for operators, access methods and search techniques are continually being proposed, there is little work on how different methods interact, what works well for which classes of queries or under what circumstances a technique is effective.
The proposed work helps with this second aspect of optimizer development. It undertakes a systematic study of optimizer engineering issues, organized around query classes and applications. Among the questions addressed are
These investigations use the Columbia query optimizer framework and visualization environment developed at PSU and OGI, which has already proved an effective test environment for aggregation transforms on decision-support queries. The project involves close interaction with companies who produce database management systems and the hardware on which they run, to identify the most pressing problems, assess our proposed solutions and determine how to package technology produced by the project to aid transition into commercial optimizers.