CS386/586 Introduction to Databases
Fall 2011 Quarter


Assignment 6 Disks, Indexes and Join Algorithms

Correction in Part II

Due: Thursday, 17 November 2011 at the beginning of class

You may do this assignment individually or you may work with one partner.  That is, this assignment is to be completed by individuals or by teams of two students.  If you work with a partner, then you should turn in one assignment paper, with both of your names on the paper. You should only talk to the instructor, the TA and your partner about this assignment. You may also post questions to the course mailing list, cs386-list@cs.pdx.edu.

Please turn in your completed assignments on paper. Put your last name, first name, the assignment number in that order in the first line of your assignment.  List last name and first name for your partner, if you have one, on the second line of your assignment. (If you are working with a partner, turn in one assignment paper.)

 

Part I: Disk Variations

Question 1 (10 points): Some hard disk drives have a second read-write head unit located opposite the first one.  What aspect of disk I/O time is this addition most likely to improve?  Justify your answer.

 

Part II: Evaluating Queries with Indexes

It is sometimes possible to evaluate a particular query using only indexes, without accessing the actual data records.

 

Consider a database with two tables:

Book(ISBN, title, year, publisher)

Sells(ISBN, vendor, price)

 

Assume three unclustered indexes, where the leaf entries have the form [search-key value, RID].

<year> on Book

<publisher> on Book

<price, ISBN> on Sells

 

For Questions 2-9, say which queries can be evaluated with just data from these indexes.

-  If the query can, describe how.

-  If the query can't, explain why.

Each question worth 10 points.

 

Question 2:

SELECT MAX(price)

FROM Sells;

 

Question 3:

SELECT ISBN, MIN(price)

FROM Sells

GROUP BY ISBN;

 

Question 4:

SELECT AVERAGE(price)

FROM Sells

GROUP BY vendor;

 

Question 5 (think carefully about this one!):

SELECT ISBN, AVERAGE(price)

FROM Sells

GROUP BY ISBN

HAVING COUNT(DISTINCT vendor) > 1;

 

Question 6:

SELECT COUNT(*)

FROM Book

WHERE year = 2003 AND publisher = 'Knopf';

 

Question 7:

SELECT publisher, COUNT(ISBN)

FROM Book

GROUP BY publisher;

 

Question 8:

SELECT COUNT(DISTINCT year)
FROM Book

WHERE publisher = 'Penguin';

 

Question 9:

SELECT publisher, price

FROM Book NATURAL JOIN Sells

WHERE year = 2003;

 

Part III: Join Algorithms

Questions 10 (20 points): Consider evaluating a join operation that we know is followed by a project, such as

PROJECTr.C, s.D, s.E, s.F(r JOINr.A = s.B s)

Assume we are using 8K pages, and that r has 100K pages and s has 500K pages.

Further assume each row in r is 200 bytes long (so 40 rows/page) and each row in s is 500 bytes long (so 16 rows/page).

 

Consider using the sort-merge implementation of join, with 51 page buffers (50 for input runs, 1 for output run)

 

(a) Calculate how many page I/Os it takes to merge sort each of r and s.

 

(b) Since we do not need all the columns in r and s for the final result, consider a variation of merge sort where on the first pass, we only write out the r.A and r.C columns of r rows, and the s.B, s.D, s.E and s.F columns of s rows. Calculate the number of page I/Os for sorting each of r and s with this modification, assuming each column is 10 bytes and that there are 8000 bytes of row storage per page.