Note that the cardinality reported here
is the final cardinality
of the table
produced by running the query.
Equally or even more important in the cost model
are intermediate cardinalities
-- the cardinalities
of intermediate results.
These intermediate cardinalities
have a major effect
on the cost
of the physical join operators.
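To make the point concrete, here is a minimal Python sketch under simplified assumptions that are not Model D's actual cost functions: a hypothetical nested-loop cost of |outer| * |inner| and the textbook join-size estimate |L| * |R| * selectivity. The function names, table sizes, and selectivities are all illustrative. Overestimating the selectivity of the first join by a factor of 10 leaves that join's own cost unchanged, but the inflated intermediate cardinality feeds directly into the cost of the join above it.

```python
def join_output_rows(left_rows: float, right_rows: float, selectivity: float) -> float:
    # Textbook estimate: |L join R| = |L| * |R| * join selectivity.
    return left_rows * right_rows * selectivity


def nested_loop_cost(outer_rows: float, inner_rows: float) -> float:
    # Assumed (hypothetical) cost: every outer tuple is compared with every inner tuple.
    return outer_rows * inner_rows


def plan_costs(r: float, s: float, t: float, sel_rs: float) -> tuple[float, float]:
    """Costs of the two joins in the plan (R join S) join T."""
    rs_rows = join_output_rows(r, s, sel_rs)  # the intermediate cardinality
    return nested_loop_cost(r, s), nested_loop_cost(rs_rows, t)


# Hypothetical table sizes and selectivities, for illustration only.
R, S, T = 1_000, 10_000, 100_000
actual = plan_costs(R, S, T, sel_rs=1e-4)     # R join S really yields 1,000 rows
estimated = plan_costs(R, S, T, sel_rs=1e-3)  # the estimate says 10,000 rows

print(estimated[0] / actual[0])  # 1.0: the first join's cost is unaffected
print(estimated[1] / actual[1])  # 10.0: the error carries into the second join
```

With deeper join trees the same mechanism compounds, since each mis-estimated intermediate result becomes an input to the next operator's cost.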
Many of the errors in cardinality estimates
are the result of the independence assumption
made in Model D.
For example, in query 3 (see Appendix Q.3),
Model D assumes
that L.L_SHIPDATE is independent of
O.O_ORDERDATE -- clearly a suspect assumption.
In the actual TPC-D relations
there is a dependency between the two dates,
since a lineitem ships shortly after its order is placed:
the predicate that the order date is before a certain date
and the ship date is after this same date
is much more selective
(fewer tuples satisfy it)
than it is
under the independence assumption
in Model D.
Hence the actual final cardinality (11541) is much
smaller than the optimizer's estimate (957558).
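The effect of the independence assumption can be reproduced on synthetic data. The sketch below is a hypothetical illustration, not the TPC-D data generator: each row pairs an order date drawn uniformly from 1000 day numbers with a ship date 1 to 30 days later, and the two single-column selectivities are multiplied as an independence-based estimator would do. The rows are treated as already joined, which sidesteps the L/O join in query 3; the function name and all parameters are assumptions.

```python
import random


def date_selectivities(n_rows: int, cutoff: int, seed: int = 0) -> tuple[float, float]:
    """Return (actual, independence estimate) for the selectivity of
    order_date < cutoff AND ship_date > cutoff on synthetic joined rows."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        order_date = rng.randrange(1000)             # day number, uniform
        ship_date = order_date + rng.randint(1, 30)  # ships 1..30 days later (assumed)
        rows.append((order_date, ship_date))

    # Per-attribute selectivities, as a single-column estimator would see them.
    sel_order = sum(o < cutoff for o, _ in rows) / n_rows
    sel_ship = sum(s > cutoff for _, s in rows) / n_rows
    # True selectivity of the conjunction on the correlated data.
    actual = sum(o < cutoff and s > cutoff for o, s in rows) / n_rows
    return actual, sel_order * sel_ship


actual, independent = date_selectivities(100_000, cutoff=500)
print(f"actual: {actual:.4f}  independence estimate: {independent:.4f}")
```

Only orders placed in the 30 days before the cutoff can satisfy both conditions, so the true selectivity comes out at about 0.015, while the product of the two marginal selectivities is about 0.26, an overestimate of more than an order of magnitude.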
In the case of query 2,
the use of a prior aggregation in a later join predicate
causes the Model D cardinality estimate to be off
by a large factor.
Some of the cardinality estimates
might also be improved by using
histograms to represent the distribution
of values for an attribute instead of
relying on the uniform distribution assumption.
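A minimal sketch of the difference, assuming an equi-width histogram with linear interpolation inside each bucket (one common scheme among several; equi-depth histograms are another). The skewed column, the bucket count, and the function names are hypothetical.

```python
def uniform_estimate(values: list[int], x: float) -> float:
    """Fraction of values below x, assuming values are spread uniformly over [min, max]."""
    lo, hi = min(values), max(values)
    return min(max((x - lo) / (hi - lo), 0.0), 1.0)


def build_histogram(values: list[int], buckets: int) -> tuple[float, float, list[int]]:
    """Equi-width histogram over [min, max]: returns (low bound, bucket width, counts)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / buckets
    counts = [0] * buckets
    for v in values:
        counts[min(int((v - lo) / width), buckets - 1)] += 1
    return lo, width, counts


def histogram_estimate(lo: float, width: float, counts: list[int], x: float) -> float:
    """Fraction of values below x, assuming uniformity only within each bucket."""
    below = 0.0
    for i, c in enumerate(counts):
        b_lo, b_hi = lo + i * width, lo + (i + 1) * width
        if x >= b_hi:
            below += c                          # whole bucket qualifies
        elif x > b_lo:
            below += c * (x - b_lo) / width     # part of this bucket qualifies
    return below / sum(counts)


# Hypothetical skewed column: 90% of rows hold a value in 0..9, the rest spread over 10..1000.
values = [i % 10 for i in range(9_000)] + [10 + i % 991 for i in range(1_000)]
x = 10
actual = sum(v < x for v in values) / len(values)
print(f"actual {actual:.3f}  uniform {uniform_estimate(values, x):.3f}  "
      f"histogram {histogram_estimate(*build_histogram(values, 100), x):.3f}")
# prints: actual 0.900  uniform 0.010  histogram 0.900
```

The uniform assumption spreads the rows evenly over [0, 1000] and predicts that 1% of them fall below 10; the histogram confines the uniformity assumption to 10-wide buckets and recovers the true 90%.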
Although the accuracy Model D achieves
on some queries
appears dismal,
the effect of these errors on the plans
the optimizer produces
may not be catastrophic.
If the logical property estimates
for all groups
and the physical property estimates
for all plans
are affected similarly,
the impact on the optimality relationship among plans
might not be as significant
as the magnitude
of the inaccuracies
in the logical model.
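The argument can be illustrated with a toy set of plan costs. In this sketch the plan names, costs, and error factors are hypothetical, and "affected similarly" is modeled in the simplest way: every estimate is multiplied by the same factor (any strictly increasing transformation of the costs would likewise preserve their ranking). A common factor leaves the cheapest plan unchanged, whereas plan-dependent errors can change the choice.

```python
def best_plan(costs: dict[str, float]) -> str:
    # The optimizer picks the plan with the lowest estimated cost.
    return min(costs, key=costs.get)


# Hypothetical true costs of three alternative plans.
true_costs = {"hash_join_first": 120.0, "merge_join_first": 150.0, "nested_loop_first": 400.0}

# Case 1: every estimate is off by the same factor (10 times too high).
uniform_error = {plan: cost * 10 for plan, cost in true_costs.items()}

# Case 2: the error differs by plan (hypothetical per-plan factors).
skewed_error = {"hash_join_first": 120.0 * 10,
                "merge_join_first": 150.0 * 2,
                "nested_loop_first": 400.0 * 1}

print(best_plan(true_costs), best_plan(uniform_error), best_plan(skewed_error))
# prints: hash_join_first hash_join_first merge_join_first
```

In the first case the estimates are ten times too large yet the chosen plan is still optimal; in the second, smaller but uneven errors lead the optimizer to a plan whose true cost is higher.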