Explanation of the formula 3n - (2(n+1) + 1) + n for the number of expressions
Make
a table
of all n range variables,
corresponding to the n stored relations
that will be joined.
Then mark each range variable
as either -,
L or R.
There are
3n different assignments of these symbols.
Each assignment can be thought of as
a logical expression
-- except that
as we see below some assignments
would correspond to
expressions that are not WELL-FORMED (I call them illegal).
The L means that the stored relation
is on the left hand side of a join
expression,
the R means that the stored relation
is on the right hand side of a join expression
and the -
indicates that the stored relation
is not in this expression.
Then
subtract off
assignments which have only L's or only R's
(not both).
There will be 2n -1 assignments
with only L's,
and the same number with only R's.
Since a join
always takes two inputs,
these correspond to
invalid join expressions.
But,
in Cascades,
you will have
a single GET expression
for every stored relation,
add back n assignments
-- one for each of the n stored relations.
Finally
subtract off
the assignment
of all -'s
(no L's or R's)
-- this corresponds to
nothing
-- there isn't even a group for this.
So we start with
3n
assignments,
subtract off
2n - 1, 2n - 1 and 1,
and add back n.
This adds up
to
3n - (2(n+1) + 1) + n.
As we said before,
the final addition of n
is due to
logical expressions
with Model D GET operator
-- these are not be counted
in [Pellenkoft 96] because they are
only considering the number
of join expressions.
Here is a sample of a few table entries (in NO PARTICULAR ORDER )
and their corresponding logical expressions
for the number of possible expressions in the query AXBXC
table expressions or why its illegal --
A B C A B and C are tables to be joined
- - - illegal expression -- corresponds to nothing: subtract 1
- - R corresponds to GET C (Cascades only)
- R - corresponds to GET B (Cascades only)
- R R illegal - no left hand side -- there are 2 ~n of these
- - L corresponds to GET C but already counted as - - R
- L L illegal - no right hand side -- there are 2 ~n of these
L L L also illegal - no right hand side -- part of the 2 ~n
L R R A X BC
R L L BC X A
L R L AC X B
R L R B X AC
L L R AB X C
R R L C X AB
- L R B X C
- R L C X B
L R - A X B
...
^
|
3 ~n entries in table
there are 3 ~n total entries (and about 2 * 2 ~n of these are illegal --)
Note that each group will correspond to a number m
of relations
that have been joined,
where m is less than n,
the total number
of relations joined in the query.
The number of logical expressions
in a group is equal to
2m - 2
This is because the binary representation of every number
from 1 to 2m represents
a unique (binary) partition (where order matters) of
the set of m relations,
and hence an expression in a group
which represents the join of m relations.
For the query AXBXC (i.e. n=3), consider the group AXB (i.e. m = 2):
table expression
A B A B are tables to be joined in this group
0 1 A X B think of 0 as L, and 1 as R
1 0 B X A
or the group AXBXC (i.e. m = 3, the final group):
A B C A B and C are tables to be joined in this group
0 0 1 AB X C
0 1 0 AC X B
0 1 1 A X BC
1 0 0 BC X A
1 0 1 B X AC
1 1 0 C X AB
This corresponds to the 6 logical expressions in
group 4 of Figure 3.3.
So another way to compute the total number of logical expressions
(in this case not including the n GET logical expressions)
is:
sum from m=1 to n of
(n choose m) * 2m - 2
From private conversation between Len Shapiro and Bennet Vance -- [Vance 96].