Now we look at each of these steps in more detail, starting with predicates and selectivity, then data value distribution assumptions and finally cardinality and cost.

           One of the key concepts used by System R was the classification of predicates in order to estimate their selectivity. For every predicate in an SQL statement, the optimizer assigns a selectivity factor (also called a selectivity estimate or just selectivity).

           Table 1 in [Selinger 79] shows types of predicates that arise in SQL, and for each of these predicate types, a strategy (usually a formula) for estimating selectivity. A few of the types of predicates are:
in 
single range
=
>
between
A combination of ranges is achieved by evaluating boolean expressions of the above predicate types.
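
As a rough illustration of how selectivities of the basic predicate types might be combined through boolean expressions, the sketch below assumes that the individual predicates are statistically independent. That independence assumption and the function names (`sel_and`, `sel_or`, `sel_not`) are introduced here for illustration only and are not taken from the text above.

```python
# Minimal sketch: combining selectivity factors of boolean expressions.
# ASSUMPTION: the combined predicates are statistically independent.
# All names here are hypothetical and used only for illustration.

def sel_and(f1: float, f2: float) -> float:
    """Selectivity of (p1 AND p2) under independence."""
    return f1 * f2

def sel_or(f1: float, f2: float) -> float:
    """Selectivity of (p1 OR p2) under independence (inclusion-exclusion)."""
    return f1 + f2 - f1 * f2

def sel_not(f: float) -> float:
    """Selectivity of (NOT p)."""
    return 1.0 - f
```

For example, two range predicates with selectivities 0.5 and 0.2 joined by AND would be estimated to keep roughly a tenth of the rows, provided the independence assumption holds.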

           The strategy for estimating the selectivity of each predicate type depends on the distribution of the data in the database. When there is an index on an attribute, System R assumes a uniform distribution of values for that attribute. In many cases, the designers of System R had to make an arbitrary choice of selectivity factor. For example, for predicates of the form
attribute = value
when there is no index on the attribute, Selinger chooses a selectivity factor of 1/10, independent of the table to which the predicate is applied. Arbitrary numbers used as selectivity factors (e.g., 1/10 in the example above) can be called magic numbers. In general, using a magic number amounts to making an assumption about both the distribution of the data in the stored relations and the distribution of queries that will be run on the database.
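
The equality case described above can be sketched as follows. This is a minimal illustration, not System R's actual code: it assumes that the uniform-distribution assumption translates to a selectivity of one over the number of distinct key values in the index, and it falls back to the 1/10 magic number when no index exists. The names `eq_selectivity`, `estimated_rows`, and `distinct_keys` are hypothetical.

```python
from typing import Optional

# Magic number quoted in the text for "attribute = value" with no index.
DEFAULT_EQ_SELECTIVITY = 1 / 10

def eq_selectivity(distinct_keys: Optional[int]) -> float:
    """Estimate the selectivity of `attribute = value`.

    distinct_keys: number of distinct values in an index on the attribute,
    or None when no index exists (hypothetical parameter).
    ASSUMPTION: with an index, values are uniformly distributed, so each
    distinct value matches 1/distinct_keys of the rows.
    """
    if distinct_keys is not None and distinct_keys > 0:
        return 1.0 / distinct_keys
    return DEFAULT_EQ_SELECTIVITY

def estimated_rows(table_rows: int, selectivity: float) -> float:
    """Estimated number of rows that satisfy the predicate."""
    return table_rows * selectivity
```

With these assumptions, an indexed attribute with 50 distinct values in a 10,000-row table yields an estimate of about 200 matching rows, while the same predicate on an unindexed attribute yields about 1,000 rows regardless of the actual data.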


