CS386 Introduction to Databases
Spring 2006 Quarter
Assignment 6 ¨C Indexes
and Query Evaluation
Due: Thursday, 1 June 2006 at
the beginning of class
You may do this assignment individually or you may work with one
partner. That is, this assignment is to be completed by individuals or by
teams of two students. If you work with a partner, then you should turn
in one assignment paper, with both of your names on the paper. You should only
talk to the instructor, the TA and your partner about this assignment. You may
also post questions to the course mailing list, cs386@cs.pdx.edu.
Please turn in your completed assignments on
paper. Put your last name, first name, the assignment number in that order in
the first line of your assignment. List last name and first name for your
partner, if you have one, on the second line of your assignment. (If you are
working with a partner, turn in one assignment paper.)
(5 points) How many
different (but equivalent) join orders are there for the join of three tables
R, S and T? Two orders are considered different if they have parentheses in
different places. That is (R JOIN S) JOIN T is different from R JOIN (S JOIN
T).
a. (5 points) Determine
the min and max salary values in the agent table, and the number of rows in
that table.
b. (5 points) Give an estimate for the number of rows in agent with salary
< 65000, assuming a uniform distribution between min and max salary. Explain
how you derived your estimate.
c. (5 points) Find the 25th, 50th and 75th percentile values for salaries in
the agent table. (The 50th percentile value, for instance, is the smallest
number s such that 50% of the rows have salary value less than or equal to s.)
d. (5 points) Give an estimate of the number of rows in agent with salary
< 65000, assuming in a uniform distribution in each quartile determined in
c. Explain how you derived your estimate.
e. (5 points) How many rows in agent have salary < 65000?
a. (5 points) How many distinct cities in the
agent table?
b. (5 points) Give an estimate for the number of rows in agent that satisfy the
predicate city = C (assuming C is actually the name of one of the cities).
c. (5 points) How many distinct countries in the agent table?
d. (5 points) Give an estimate for the number of rows in agent that satisfy the
predicate country = N (assuming N is a country name).
e. (5 points) If city and country are uncorrelated, then one estimate for the
number of rows that satisfy the predicate city = C AND country= N is #rows/((# distinct cities)*(# distinct countries)). Calculate
this value.
f. (10 points) How many different city-country pairs
actually appear in agent? What does that number suggest about the independence
of city and country? Propose a better formula to estimate the number of rows that
satisfy city = C AND country = N.
For each SQL
statement below, draw the query plan the Postgres poduces (which you can obtain with the EXPLAIN command).
For each plan, suggest a reason that the particular join algorithm(s) is being
chosen.
a. (10 points) SELECT last, sc_level FROM agent, securityclearance WHERE agent_id
= sc_id (I know, the query doesn't make much sense.)
b. (10 points) SELECT last, language FROM agent NATURAL JOIN languagerel
NATURAL JOIN language WHERE language = 'Russian' and salary > 62000.
c. (10 points) SELECT
last, language FROM agent NATURAL JOIN languagerel
NATURAL JOIN language WHERE language <> 'Russian' and salary > 62000.
Note: You
can find info on the EXPLAIN command at http://www.postgresql.org/docs/8.1/interactive/performance-tips.html