Because find best is actually called
throughout the optimization process,
and the results of its search are stored
in the memo structure,
much of the work in finding a best plan
will already be done for calls to find best
in the later stages of
optimization.
Complexity of Optimization
While the complexity
of the algorithms corresponding
to the physical operators
is generally linear,
n log (n)
or at worst n2 (in n, the number of tuples),
the optimization algorithms
can be much more complex.
For example,
one aspect of query optimization
is determining
the
optimal order
in which to do the joins.
This problem has been shown in [Ibaraki 84]
to be NP-hard
in the number of relations
involved in the joins.
Since the search space is
represented by the number of
query trees,
the last column in figure 3.4 illustrates
this
complexity.20
In the execution of
a TPC-D query,
n might be the number
of LINEITEM tuples,
6,000,000.
To sort these tuples
would take
on the order of
n log(n) comparisons.
(Roughly, around a billion comparisons.)
In the optimization
of a TPC-D query,
n might be
the number of relations
for which a join order
must be chosen.
Referring back to figure 3.4,
the number of expressions
needed to represent
the logical search space
is
3n - (2n+1 - 1) + n.
In TPC-D query 8,
there are seven relations,
so n=7, and
1939 expressions are needed.
(The actual size of the logical search space,
i.e. the number of logical query trees, is 665280.)
20. The number of query trees in figure 3.4 represent the
search space complexity
of the join order optimization problem -- that is
the size of the search space.
The actual running time of the optimizer
and the memory required is closer to
the number of expressions or the number of groups,
since only a tiny fraction of all possible query trees are
generated and none (except the initial query tree and the
final plan) are stored.
This is true for both Cascades
and for the bottom-up optimizers discussed
in section 4, Related Work.
