When Cartesian products
are not allowed,
the size of the resulting search space
is heavily dependent on the join topology of
the query.
A
join topology describes the join predicates in the query
and in particular,
for which pairs of relations there are join conditions.
In a typical
linear join of n relations,
there are
n-1 join conditions,
with n-2 relations participating
in two join conditions,
and two relations (the end relations)
participating in one join condition.
[Ono 90] shows that
for linear joins
and allowing no Cartesian products,
their optimizer
can optimize queries
involving up to 100 relations.
[Vance 95]
looks at
efficient algorithms
for optimizing join orders,
without heuristics.
Recall that the System R optimizer
eliminated Cartesian products
and restricted
the search space
to left-deep trees.
Vance shows
that the dynamic programming approach
used by Selinger
can be modified to
efficiently optimize join orders,
while allowing
bushy trees
and Cartesian products
in the plan.
Bushy trees are trees
that are not restricted to
being left-deep or right-deep.
By very effective use
of small data structures,
Vance is able to optimize queries
with joins of
up to 14 relations,
for any topology and
without using heuristics.
The Vance
approach
uses a
dynamic programming algorithm
to determine
the optimal join order for a query.
It starts
by evaluating
all possible 2-way joins,
and then provides
a mechanism
for generating
the least cost plans
for all n+1 way joins,
given that the cost
of all n-way joins is known.
The SQL language [Chamberlin 75] allows
for the where clause of
an SQL statement to be another SQL query.
This construct is called
a subquery
or
query block.
In Selinger,
a
query block
was represented
"by a SELECT list,
a FROM list
and a WHERE tree,
containing,
respectively,
the list of items to be retrieved,
the tables referenced,
and the boolean combination of simple predicates
specified by the user" [Selinger 79].