CS386/586 Introduction to Databases
Fall 2011 Quarter


Assignment 7 ¨C Query Evaluation and Transactions Rev. of Question 1b.
Due: Thursday, 2 December 2011 at the beginning of class

You may do this assignment individually or you may work with one partner.  That is, this assignment is to be completed by individuals or by teams of two students.  If you work with a partner, then you should turn in one assignment paper, with both of your names on the paper. You should only talk to the instructor, the TA and your partner about this assignment. You may also post questions to the course mailing list, cs386@cs.pdx.edu.

Please turn in your completed assignments on paper. Put your last name, first name, the assignment number in that order in the first line of your assignment.  List last name and first name for your partner, if you have one, on the second line of your assignment. (If you are working with a partner, turn in one assignment paper.)

Some of this assignment is based on the Spy relational database. Information about this database and information how to access it is on the Database Info Page. Note that the database you want is called ¡®introdb_spy¡¯.
Question 1: Relational Algebra Equivalences

(10 points each) For each of the proposed equivalences below on relations r and s, give example instances where the equivalence does not hold. (Assume the expressions are syntactically valid.) Also, for each proposed equivalence, give an additional condition that will guarantee that the equivalence holds. The condition should be at the schema level, and the more general the condition, the better. X is a set of attributes and C is a single attribute. All joins are natural joins. COUNT() is a function that returns the number of tuples in a relation.

 

a. pX(r s) º pX(r) s

b. pX(r ¨C (sC=c(r))) º pX(r) ¨C pX(sC=c(r))

c. COUNT(r s) º COUNT(r) * COUNT(s)

d. COUNT(r s) º COUNT(r)

 

If the symbols are funny on your browser, see

http://web.cecs.pdx.edu/~maier/386/Assignments/hw11-07.pdf

Question 2: Statistics

a. (5 points) Determine the min and max salary values in the agent table, and the number of rows in that table.
b. (5 points) Give an estimate for the number of rows in agent with salary < 75000, assuming a uniform distribution between min and max salary. Explain how you derived your estimate.
c. (5 points) Find the 25th, 50th and 75th percentile values for salaries in the agent table. (The 50th percentile value, for instance, is the smallest number s such that 50% of the rows have salary value less than or equal to s.)
d. (5 points) Give an estimate of the number of rows in agent with salary < 75000, assuming in a uniform distribution in each quartile determined in c. Explain how you derived your estimate.
e. (5 points) How many rows in agent have salary < 75000?

Question 3: Selectivity Estimation

a. (5 points) How many distinct cities in the agent table?
b. (5 points) Give an estimate for the number of rows in agent that satisfy the predicate city = C (assuming city C exists).
c. (5 points) How many distinct countries in the agent table?
d. (5 points) Give an estimate for the number of rows in agent that satisfy the predicate country = N (assuming country M exists).
e. (5 points) If city and country are uncorrelated, then one estimate for the number of rows that satisfy the predicate city = C AND country = N is #rows/((# distinct cities)*(# distinct countries)). Calculate this value.
f. (10 points) How many different city-country pairs actually appear in agent? What does that number suggest about the independence of city and country? Propose a better formula to estimate the number of rows that satisfy city = C AND country = N.

Question 4: Query Plans

For each SQL statement below, draw the query plan the Postgres poduces (which you can obtain with the EXPLAIN command). For each plan, suggest a reason that the particular join algorithm(s) is being chosen.
a. (10 points)
SELECT A1.last, A2.last FROM agent A1, agent A2, languagerel LR, Language L WHERE A1.salary = A2.salary AND A2.agent_ID = LR.agent_ID AND LR.lang_ID = L.lang_ID AND L.language = 'German'
b. (10 points) SELECT A1.last, A2.last FROM agent A1, agent A2, languagerel LR, Language L WHERE A1.salary > A2.salary AND A2.agent_ID = LR.agent_ID AND LR.lang_ID = L.lang_ID AND L.language = 'German'
c. (10 points)
SELECT A1.last, A2.last FROM agent A1, agent A2, languagerel LR, Language L WHERE A1.salary > A2.salary AND A2.agent_ID = LR.agent_ID AND LR.lang_ID = L.lang_ID AND L.language <> 'German'

Note: You can find info on the EXPLAIN command at http://www.postgresql.org/docs/8.4/interactive/performance-tips.html

Question 5: Transactions

Here are three transactions

            T1: r(A), w(B), w(C);             T2: w(A), r(C) ;          T3: r(B), w(C)

For each partial schedule below, say whether it can be completed (all transactions finish) to give a schedule with no cycles in the precedence graph.  If so, show how. If not, say why.

 

a. (10 points)

T1:r(A)        w(B)

T2:    w(A)

T3:        r(B)

 

b. (10 points)

T1:r(A)        w(B)

T2:    w(A)        r(C)

T3:        r(B)

 

c. (10 points)

T1:r(A)    w(B)

T2:    w(A)

T3:            r(B)w(C)