Schema - A list of columns (or attributes) comprising the table. In Cascades, a schema is represented as a set of equivalence classes. Each equivalence class consists of a set of equivalent attributes, where two attributes are considered equivalent if they are made equal by the equality conditions of an EQJOIN. In the example schemas for the logical operators figures, when the equivalence class consists of a single attribute, as a shorthand, we simply list the attribute. When the equivalence class contains more than one attribute, we use the set notation. For example, in the EQJOIN logical operator figure, the schema includes the singleton equivalence class {OD}, written as simply OD, while the equivalence class created by the EQJOIN condition CCK = OCK is written as {CCK, OCK}. In fact, when the distinction is not important, we say that the schema contains an attribute when, more precisely, the schema contains an equivalence class which contains the attribute.
Candidate Key - In general, a table could have many candidate keys. In our model, however, only one candidate key is tracked. The TPC-D Benchmark Specification lists a primary key for each of the stored relations. A primary key is the combination of the logical concept of candidate key and the physical concept of ordering. Here we are concerned only with logical properties, so for stored relations (i.e. those listed in the catalog), we use the primary key in the TPC-D Benchmark specification for the candidate key logical property. This in turn will be the candidate key for the output table from the GET logical operator for that stored relation. Note that the candidate key could also be considered a functional dependency of the form:
which says that, for a given table, the attributes of the candidate key determine all the attributes of a tuple.candidate keyschema
Tracked Functional Dependencies - These are functional dependencies (other than the candidate key, described above) that we track for each table. No functional dependencies for the stored relations are listed in the TPC_D Benchmark Specification, so the functional dependencies list will be empty after the initial GET of a stored TPC_D relation. In fact, the primary source of the functional dependencies tracked by the optimizer are those computed when updating logical properties for an EQJOIN.
When the functional dependencies of a table need to be accessed, they are referred to as funcd_list. Each individual functional dependency is referred to as funcd. Each funcd has a left hand side and a right hand side, which are called defining and dependent, respectively. Both defining and dependent are sets of attributes, and the functional dependency can be written as:
That is, the attributes of the left hand side functionally determine the attributes on the right hand side.definingdependent
Cardinality - Number of tuples in the output.
Unique Cardinality - Number of distinct tuples in the output.
Unique - Indicates whether the output table is a true set (guaranteed to have no duplicate tuples), designated by 'yes', or a bag, designated by 'no'.
Column Unique Cardinality - For every attribute in the output table's schema, the number of distinct values that occur for this attribute in the output table. (Note that a column's cardinality is trivially equal to the cardinality of the output table.) The unique cardinality of an attribute in the schema is written as attr.cucard. The dot between attr and cucard could be thought of as representing a reference to an element of a structure in the C programming language which represents the attribute. Most of the Cascades data structures (e.g. the data structure for the type, attribute) are C++ classes, and the access is through a member function. So from this, and the discussion of schema above, it is easy to understand how attr.cucard could actually appear in Cascades' C++ code as eqclass.get_cucard().
bool ? value1 : value2
bool1 ? value1 : bool2 ? value2 : bool3 ? value3 : value4
|
|---|
| Type | Context | Meaning |
|---|---|---|
| schema | logical property of a table | Set of attributes comprising the table. Actually representation is a set of attribute equivalence classes -- see text above for discussion |
| attr or attribute | parameter of a operator | Column of a table: Includes the table range_variable and the attribute range variable to uniquely identify any column in any table. |
| func_d | funcd_list logical property of a table | Two sets of attributes defined on the table. The first set of attributes is the defining set and the second set of attributes is the determined set. |
| agg_op | parameter of AGG_LIST logical operator | A new attribute computed as an aggregate function of zero or more of the existing attributes of a table. |
| Parameter | Type | Context, Meaning |
|---|---|---|
| p_attrs | {attr} | For PROJECT, the project list (set) |
| f_attrs | {attr} | For FUNC_OP, the list (set) of attributes used to compute the new attribute. |
| agg_ops | {agg_op} | For AGG_LIST, the set of aggregate operators each aggregate operator consisting of a list of attributes used in an aggregate function and an attribute range_var |
| attribute range_var | STRING | In agg_op or FUNC_OP, the name that will be used to reference the attribute created by the agg_op or FUNC_OP. This corresponds to the name after the "as" keyword in the SQL SELECT clause. This is the name of a column within a table. |
| gby | {attr} | For AGG_LIST, the group_by list |
| lattrs = rattrs | 2 attr[]'s | For EQJOIN, the equality conditions |
| collection | STRING | For GET, a name identifying a stored relation |
| table range_var | STRING | For GET, the name of the table. Corresponds to the as value, specified in the SQL from clause This is a table range variable. It will be used to form the name of an attribute in the table output by the GET, as well as tables formed by applying other logical operators to this table. |
| o_attrs | attr[] | For ORDER_BY, an ordered list of attributes |
| order_func | (asc|desc)[] | For ORDER_BY, an ordered list corresponding to the o_attrs list, which indicates, for each attribute in o_attrs, whether to order in ascending or descending order. |
| Input | Type | Context |
|---|---|---|
| input table or input | stream of tuples | when a single bulk input |
| left table or left | stream of tuples | when two bulk inputs, the outer |
| right table or right | stream of tuples | when two bulk inputs, the inner |
| predicate | item operator | the predicate |
| Output | Type | Context |
|---|---|---|
| output table or output | stream of tuples | Every logical operator |
|
|---|
|
|---|