CS386/586 Introduction to Databases
Fall 2011 Quarter
Assignment 7 ¨C Query Evaluation and Transactions Rev. of Question 1b.
Due: Thursday, 2 December 2011
at the beginning of class
You may do this assignment individually or you may work with one
partner. That is, this assignment is to be completed by individuals or by
teams of two students. If you work with a partner, then you should turn
in one assignment paper, with both of your names on the paper. You should only
talk to the instructor, the TA and your partner about this assignment. You may
also post questions to the course mailing list, cs386@cs.pdx.edu.
Please turn in your completed assignments on
paper. Put your last name, first name, the assignment number in that order in
the first line of your assignment. List last name and first name for your
partner, if you have one, on the second line of your assignment. (If you are
working with a partner, turn in one assignment paper.)
(10 points each) For each of the proposed
equivalences below on relations r and s, give example instances where the
equivalence does not hold. (Assume the expressions are syntactically valid.)
Also, for each proposed equivalence, give an
additional condition that will guarantee that the equivalence holds. The
condition should be at the schema level, and the more general the condition,
the better. X is a set of attributes and C is a single attribute. All joins are
natural joins. COUNT() is a function that returns the
number of tuples in a relation.
a. pX(r ⋈ s) º pX(r) ⋈ s
b. pX(r ¨C (sC=c(r))) º pX(r) ¨C pX(sC=c(r))
c. COUNT(r ⋈ s) º COUNT(r) * COUNT(s)
d. COUNT(r ⋈ s) º COUNT(r)
If the symbols are funny on your browser, see
http://web.cecs.pdx.edu/~maier/386/Assignments/hw11-07.pdf
a. (5 points) Determine
the min and max salary values in the agent table, and the number of rows in
that table.
b. (5 points) Give an estimate for the number of rows in agent with salary
< 75000, assuming a uniform distribution between min and max salary. Explain
how you derived your estimate.
c. (5 points) Find the 25th, 50th and 75th percentile values for salaries in
the agent table. (The 50th percentile value, for instance, is the smallest
number s such that 50% of the rows have salary value less than or equal to s.)
d. (5 points) Give an estimate of the number of rows in agent with salary
< 75000, assuming in a uniform distribution in each quartile determined in
c. Explain how you derived your estimate.
e. (5 points) How many rows in agent have salary < 75000?
a. (5 points) How many distinct cities in the
agent table?
b. (5 points) Give an estimate for the number of rows in agent that satisfy the
predicate city = C (assuming city C exists).
c. (5 points) How many distinct countries in the agent table?
d. (5 points) Give an estimate for the number of rows in agent that satisfy the
predicate country = N (assuming country M exists).
e. (5 points) If city and country are uncorrelated, then one estimate for the
number of rows that satisfy the predicate city = C AND country = N is #rows/((# distinct cities)*(# distinct countries)). Calculate
this value.
f. (10 points) How many different city-country pairs
actually appear in agent? What does that number suggest about the independence
of city and country? Propose a better formula to estimate the number of rows
that satisfy city = C AND country = N.
For each SQL
statement below, draw the query plan the Postgres poduces (which you can obtain with the EXPLAIN command).
For each plan, suggest a reason that the particular join algorithm(s) is being
chosen.
a. (10 points) SELECT A1.last, A2.last FROM agent A1, agent A2, languagerel
LR, Language L WHERE A1.salary = A2.salary AND A2.agent_ID = LR.agent_ID AND LR.lang_ID = L.lang_ID AND L.language =
'German'
b. (10 points) SELECT A1.last, A2.last FROM
agent A1, agent A2, languagerel LR, Language L WHERE
A1.salary > A2.salary AND A2.agent_ID = LR.agent_ID
AND LR.lang_ID = L.lang_ID
AND L.language = 'German'
c. (10 points) SELECT A1.last, A2.last FROM agent A1, agent A2, languagerel
LR, Language L WHERE A1.salary > A2.salary AND A2.agent_ID = LR.agent_ID AND LR.lang_ID = L.lang_ID AND L.language <>
'German'
Note: You
can find info on the EXPLAIN command at http://www.postgresql.org/docs/8.4/interactive/performance-tips.html
Here are three transactions
T1:
r(A), w(B), w(C); T2:
w(A), r(C) ; T3:
r(B), w(C)
For each partial schedule below, say whether it
can be completed (all transactions finish) to give a schedule with no cycles in
the precedence graph. If so, show
how. If not, say why.
a. (10 points)
T1:r(A) w(B)
T2: w(A)
T3: r(B)
b. (10 points)
T1:r(A) w(B)
T2: w(A) r(C)
T3: r(B)
c. (10 points)
T1:r(A) w(B)
T2: w(A)
T3:
r(B)w(C)