Because find best is actually called throughout the optimization process, and the results of its search are stored in the memo structure, much of the work in finding a best plan will already be done for calls to find best in the later stages of optimization.


Complexity of Optimization

           While the complexity of the algorithms corresponding to the physical operators is generally linear, n log (n) or at worst n2 (in n, the number of tuples), the optimization algorithms can be much more complex. For example, one aspect of query optimization is determining the optimal order in which to do the joins. This problem has been shown in [Ibaraki 84] to be NP-hard in the number of relations involved in the joins. Since the search space is represented by the number of query trees, the last column in figure 3.4 illustrates this complexity.20

           In the execution of a TPC-D query, n might be the number of LINEITEM tuples, 6,000,000. To sort these tuples would take on the order of n log(n) comparisons. (Roughly, around a billion comparisons.) In the optimization of a TPC-D query, n might be the number of relations for which a join order must be chosen. Referring back to figure 3.4, the number of expressions needed to represent the logical search space is 3n - (2n+1 - 1) + n. In TPC-D query 8, there are seven relations, so n=7, and 1939 expressions are needed. (The actual size of the logical search space, i.e. the number of logical query trees, is 665280.)





20. The number of query trees in figure 3.4 represent the search space complexity of the join order optimization problem -- that is the size of the search space. The actual running time of the optimizer and the memory required is closer to the number of expressions or the number of groups, since only a tiny fraction of all possible query trees are generated and none (except the initial query tree and the final plan) are stored. This is true for both Cascades and for the bottom-up optimizers discussed in section 4, Related Work.

3. Fundamental Concepts:    <Models> <Log> <Query Trees> <Groups> <Memo> <Phys> <Item> <Rules> <Opt Stages> <Find Best> <Complexity>
  3.1: EQJOIN     3.2a: Query Tree     3.2b: Equiv.     3.3a: Init Memo     3.3b: Memo     3.3c: Key     3.4: Complexity     3.5: Phys Memo     3.6: Plan     3.7: Pred.     3.8: Op Types     3.9: Rule  

[To Related Work]    Page 30