Study Sheet: Transactions, Recovery, and Concurrency

Transaction Properties

ACID Properties:

A: Atomicity
All-or-nothing. Either everything in the transaction is executed, or the effects of the transaction are ignored.

C: Consistency
A database state is "consistent" if all constraints are satisfied. During a transaction, a database may be in an inconsistent state. However, if the database was consistent to begin with, it will be consistent again once the transaction has completed.

I: Isolation
The effects of a transaction on a database are not dependent on any other transactions that may be running at the same time. It is as if each transaction were run by itself, even if several are actually running concurrently.

D: Durability
The effects of a transaction, if completed, are resilient against system crashes or reboots. Upon restart, the changes should be visible.

You will need to know these properties, and be able to give an example of how these properties work or can be violated.

Transactions only care about reads and writes to the database. Anything else, such as user interface or client-side computation, are irrelevant to transaction processing.

Transaction Definitions

Commit: marks the end of a transaction that has successfully completed, with the intent to make those changes permanent.
Abort: indicates that the effects of the transaction should be rolled back, or undone so that the database state is returned to the state it was in before the transaction started.

Schedule: the order in which the DBMS will execute the reads and writes from all transactions.
Serial schedule: a schedule where each transaction is run one at a time, where no database actions are interleaved.
Serializable schedule: a schedule where the final state of the database is equal to the final state of any serial schedule.

Aborted transactions are not considered part of the schedule when trying to determine if a schedule is serializable.

Conflict: a pair of actions from two transactions where, if the two actions were to be reversed in order, the final state of the database would be changed. This occurs when the two actions act upon the same resource, and at least one of the actions is a write.
What does it mean for two schedules to be conflict-equivalent? What is a conflict serializable schedule?

Conflict serializable schedules are serializable. The reverse is not true. (You should be able to say why the example given in lecture demonstrates this fact.)

Unrepeatable read: a read-write conflict. Occurs when a transaction reads a value that is then overwritten, and thus shows up as different if read again.
Dirty read: a write-read conflict. Occurs when one transaction reads a value that may be uncommitted and therefore undone.
Phantoms: potential problem where the specific rows returned by a database change between read actions.

You won't be asked what the isolation levels are by name, but you will need to know what effect the isolation levels have with regard to the above anomalies.

Precedence graph: directed graph where each node represents a transaction. There is an arc from node T1 to T2 if there are conflicting actions between them, and the T1 actions occur first. You should be able to draw a precedence graph for a schedule, and use it to answer questions about the schedule.

Locks and Lock Management

There are two kinds of locks. What are they, and what are they used for? In what situations can locks be allowed?

Strict Two-Phase Locking is an algorithm where all locks are acquired incrementally, but released all at once. Given a schedule, you should be able to determine if strict 2PL would work for it.

If a process cannot get the lock it needs, it waits until the previously-placed locks have been released. Lock acquisition is first come, first served.

Deadlock occurs when transactions are waiting, but the locks they are waiting for are also held by waiting transactions. To resolve deadlock, the DBMS must choose a transaction and abort it, then allow it to restart.

Livelock occurs when the same transaction is always chosen for restart. You should know why this is a problem.

Locks can exist on database objects at various resolutions: for instance, at the table level or at the row level. You should know what the advantages and disadvantages are of locking resources at each level, as well as their effect on database anomalies like phantoms.

Crash Recovery

What is the Write-Ahead Log? Why is it important? What might happen if the log entry is written after the database action instead of before it?

What does it mean to force a database action? What does it mean to steal a database page?

There are advantages and disadvantages to both the force/no-steal and the steal/no-force approaches. What are they? Which one is used in most major commercial DBMS's?

What kinds of record are found in the WAL?

There are three phases in the simple model of crash recovery: Analysts, Redo, and Undo. You know be able to (briefly) describe what these phases do.