CS386/586
Introduction to Databases
Fall
2011 Quarter
Assignment 6 Disks, Indexes
and Join Algorithms
Correction in Part II
Due: Thursday, 17
November 2011 at the beginning of class
You may do this assignment individually or you may
work with one partner. That is, this assignment is to be completed by
individuals or by teams of two students. If you work with a partner, then
you should turn in one assignment paper, with both of your names on the paper.
You should only talk to the instructor, the TA and your partner about this assignment. You may also post questions to the
course mailing list, cs386-list@cs.pdx.edu.
Please turn
in your completed assignments on paper. Put your last name, first name, the
assignment number in that order in the first line of your assignment.
List last name and first name for your partner, if you have one, on the second
line of your assignment. (If you are working with a partner, turn in one
assignment paper.)
Question 1 (10 points): Some
hard disk drives have a second read-write head unit located opposite the first
one. What aspect of disk I/O time is this addition most likely to
improve? Justify your answer.
It is sometimes possible to
evaluate a particular query using only indexes, without accessing the actual
data records.
Consider a database with two
tables:
Book(ISBN,
title, year, publisher)
Sells(ISBN,
vendor, price)
Assume three unclustered
indexes, where the leaf entries have the form [search-key value, RID].
<year> on Book
<publisher> on Book
<price, ISBN> on Sells
For Questions 2-9, say which
queries can be evaluated with just data from these indexes.
- If the query can, describe how.
- If the query can't, explain why.
Each question
worth 10 points.
Question 2:
SELECT MAX(price)
FROM Sells;
Question 3:
SELECT ISBN, MIN(price)
FROM Sells
GROUP BY ISBN;
Question 4:
SELECT AVERAGE(price)
FROM Sells
GROUP BY vendor;
Question 5 (think carefully
about this one!):
SELECT ISBN, AVERAGE(price)
FROM Sells
GROUP BY ISBN
HAVING COUNT(DISTINCT
vendor) > 1;
Question 6:
SELECT COUNT(*)
FROM Book
WHERE year = 2003 AND publisher
= 'Knopf';
Question 7:
SELECT publisher, COUNT(ISBN)
FROM Book
GROUP BY publisher;
Question 8:
SELECT COUNT(DISTINCT
year)
FROM Book
WHERE publisher = 'Penguin';
Question 9:
SELECT publisher, price
FROM Book NATURAL JOIN Sells
WHERE year = 2003;
Questions 10 (20 points):
Consider evaluating a join operation that we know is followed by a project,
such as
PROJECTr.C, s.D, s.E, s.F(r JOINr.A = s.B
s)
Assume we are using 8K pages,
and that r has 100K pages and s has 500K pages.
Further assume each row in r is
200 bytes long (so 40 rows/page) and each row in s is 500 bytes long (so 16
rows/page).
Consider using the sort-merge
implementation of join, with 51 page buffers (50 for input runs, 1 for output
run)
(a) Calculate how many page I/Os it takes to merge sort each of r and s.
(b) Since we do not need all the
columns in r and s for the final result, consider a variation of merge sort
where on the first pass, we only write out the r.A
and r.C columns of r rows, and the s.B, s.D, s.E
and s.F columns of s rows. Calculate the number of
page I/Os for sorting each of r and s with this
modification, assuming each column is 10 bytes and that there are 8000 bytes of
row storage per page.