Exercise 6: Indexes and Join Algorithms  Suggested Answers

1.     

3

 
This is the same as Figure 10.16 in the book on page 354.




Circle the pages you would need to read in order to find all entries >= key value 15.
Number the circled pages in the order that you would read them.

The pages are circled and numbered above.

2.      For this question, use the figure above. Assume that the index is over an attribute called “age”.

  1. How many pages do you have to read to identify all rows that satisfy the condition “age = 16”? (Just find the rows, not retrieve them)

    3 pages: the ones marked 1, 2, and 3 above.

  2. How many pages do you have to read to identify all rows that satisfy the condition “age < 22”? (Just find the rows, not retrieve them)

    6 pages: the ones marked 1 and 2 above, plus the leftmost leaf page, the second leaf page (from the left), and the pages marked 3 and 4  above.  Note: you have to read the page marked 4 above in order to find out that there aren’t any more records with search key value less than 22.

  3. How many pages do you have to read to identify all rows that satisfy the conditions “age < 22 AND age > 5”? (Just find the rows, not retrieve them)

    5 pages: the ones marked 1 and 2, the second leaf page (from the left), and the pages marked 3 and 4 above.

Now, make the following assumptions:

  1. If the index is clustered, what is the most number of pages that you will need to read to retrieve the rows from part (c) (based on the addresses you get, e.g., 8*, from the leaf level of the index)?

    Note: we assume that each entry (such as 8*) in the leaf-level of the index points to just one data record (the record with search key value of 8).  If there were two records with a search key value of 8, then there would be two entries 8* in the leaf level of the index. 

    Answer is 2 pages – to read the data pages corresponding to the index shown above.  (If any leaf-level page in the index that we were interested in had four entries in it, we would need to read 2 data pages to get all four data records.)

    Here’s the minimum number of data pages that you would have to read:
    Page 1: Since the index is dense, we will read a data page reachable from the second leaf page (from the left).  This data page will have all three of the records that correspond to the key value 5, the key value 7, and the key value 8.
    Page 2: Since the index is dense, we will read a data page reachable from the third leaf page (from the left – the one marked as 3 above).  This data page will contain the three data records that correspond to the two key values of 14 and 16.
    We don’t need any other data pages, because we can tell that when we read the leaf-level index page marked 4 above that there are no more records with a key value less than 22.

    The maximum number of data pages that you would have to read is 4.  This would happen if the record with search keys of 5 and 7 were on one data page and the record with search key of 8 was on the next data page.  In a similar way, it could happen that the records with search keys of 14 and 16 were on separate data pages.  So, in the worst case, you would read 4 data pages.


  2. If the index is unclustered, what is the most number of pages that you will need to read to retrieve all rows from part (c) (based on the addresses you get, e.g., 8*, from the leaf level of the index)?

    We will need to read a maximum of 4 pages – one for each entry in the leaf-level for the index, since each data record is (likely) on a different data page.  We need to read a data page for each of the key values 7, 8, 14, and 16.

3.      Suppose I am joining two tables A and B using the join condition A.field1 = B.field1.

Suppose that table A is already sorted into ascending order on field1.
Suppose that table B is already sorted into ascending order on field1.

a.       Will the Page-oriented nested loops join algorithm, as shown on slide 2 of lecture 6, work any slower or any faster on these sorted tables than it would if tables A and B were not sorted? (In other words, is the Page-oriented nested loops join algorithm able to exploit the fact that one or both of its input tables are sorted on the join attribute?) Explain briefly.

No.  The page-oriented nested loops join algorithm works the same whether the data is sorted or unsorted.  You can think of it as a “brute force” join algorithm because it cycles through every combination of one row from the left table with one row from the right table.

Note: someone could write a new algorithm, maybe called “sort-aware-page-oriented nested loops join algorithm” that would expect to be told whether or not either of the input table was sorted on the join key and then it would be able to “stop early” on the scan of the right table in the inner loop.  The might be other optimizations possible, as well.


b.      Can you use the page-oriented nested loops algorithm to compute the cross product of two tables A and B?  Explain briefly.  (For the purpose of this question, you can assume that A and B are not sorted.)

Yes.  The basic page-oriented nested loops algorithm is computing the cross product – by cycling through all possible combinations of one row from the left table with one row from the right table.  You could use it by specifying a join condition of “true” or some join condition that always evaluates to true such as “6 = 6”.

c.       Can you use the index-nested loops algorithm to compute the cross product of two tables A and B?  Explain briefly.  (If you are able to use this algorithm, then explain what sort of index you would use for the table in the inner loop.)

No.  It won’t work to use the indexed-nested loops algorithm to compute the cross product.  For a cross product, you need to scan the entire table (the table on the right) in the inner loop.  The indexed nested loops algorithm expects to read data from the table on the right using some search key. 


 

4.      Challenge question (on your own, if you want to): Devise an algorithm that calculates the DIVIDE operator, and calculate the number of I/O's your algorithm takes.  (Do the DIVIDE directly – in a single operator.)

I leave this as a question for you folks to discuss using the e-mail list.  If you think you have an algorithm for the divide operator you can send it out to the class e-mail list.  Other students may look decide that they want to look at your algorithm and comment on it.