This query will return a table of customer names and the total quantity of items they have ordered. (To understand the exact meaning of this query, refer to a description of SQL such as [Date 93].) From this point on, I will abbreviate the attribute names C_CUSTKEY to CCK, O_CUSTKEY to OCK, O_ORDERKEY to OOK and L_ORDERKEY to LOK, etc.

           After the user submits the SQL query, the database system would produce a corresponding physical plan. The first step in this process is to translate the logical query from SQL into a query tree in logical algebra. This step is the parser.

           The next step is to translate the query tree in logical algebra into a physical plan. There are generally a large number of plans that implement the query tree; the process of finding the best one is called query optimization. That is, for some query execution performance measure (e.g. execution time), we want to find the plan with the best execution performance. The goal is that the plan be optimal or near optimal within the search space of the optimizer.

           The optimizer starts by copying the relational algebra query tree into its search space. The optimizer then expands the search space and finds the best plan.

           At this level of generality, the optimizer can be viewed as the code generation2 part of a query compiler for the SQL language, that produces code to be interpreted by the query execution engine, except that the optimizer's emphasis is on producing "very efficient" code. For example, the optimizer uses the database system's catalog to get information (e.g. number of tuples) about the stored relations referenced by the query, something traditional programming language compilers normally do not do.

           Finally, the optimizer copies the optimal3 physical plan out of its memo structure and sends it to the query execution engine. The query execution engine executes the plan using the relations in the stored database as input, and produces the table of customer names and item quantities, described above, as output.

2. Code generation may include various post-optimization phases where actual machine code is generated. This thesis is only concerned with the generation of a plan.
3. In theory, optimality is the goal. However optimality is always with respect to a particular search space of possible plans -- defined in Cascades primarily by rules, and with respect to a particular cost model and catalog information -- estimates subject to potentially substantial inaccuracies. In practice (for reasons discussed in section 6) the search space may not be unambiguously defined. Also, for various reasons, most notably the huge size of the search space, the optimizer may not even attempt to search the entire search space. So in practice, we look for a "good" plan, near the optimal plan.

2. Query Optimization:      2.1: Data Independence     2.2: Query Processing  

 Page 6