Did you know that SQL Server’s locking has a name? It’s called two-phase locking. If we’re really getting specific about it, SQL Server uses what’s called strong strict two-phase locking or SS2PL. We’ll get there in a few minutes, right now we’re going to take a look at what makes up two-phase locking.
But First, Some History
The earliest references to two-phase locking (2PL) that I can find is in Bernstein and Goodman’s 1981 paper Concurrency Control in Distributed Database Systems. The authors examine multiple 2PL techniques for synchronizing transactions, alternatives using timestamp ordering, integrated concurrency control methods combining 2PL and timestamp ordering, before mentioning some other also-rans in an appendix. (This may also be the first use of the now tired example involving bank balances and transactions.)
Even though his paper focuses on distributed databases, it’s still valuable because it sets up a common vocabulary for things to come. There’s a lot of theory in here. It’s interesting, but it’s still mathematical theory and most people glaze over when they see that sort of thing.
The Basics of Two-Phase Locking
2PL works by being explicit about who is doing what to whom. Or, in clearer terms:
(1) Different transactions cannot simultaneously own conflicting locks; and (2) once a transaction surrenders ownership of a lock, it may never obtain additional locks
Locks conflict if they’re not compatible with each other. That is to say that if both locks are on the same thing and at least one lock is a write lock, that’s a conflict. (Astute readers will notice that readers won’t block readers.) Anyone who has used SQL Server for a while will be familiar with what happens when we try to acquire a conflicting lock: we wait. Sometimes, we’ll wait for a good long while and a lock will eventually be released. Sometimes, we would end up waiting forever (a deadlock) if the lock manager didn’t step in and kill off one of the processes.
Why Is It Called Two-Phase Locking?
This process is called two-phase locking because there are two distinct phases. The two rules above hint at them, but in effect a transaction can either be issuing locks or releasing locks, it cannot be in stasis.
In reality, this works more like the following:
- A statement is issued by an application.
- SQL Server compiles the statement and determines the types of locks that are needed to most efficiently satisfy the query.
- Once all locks are acquired, the transaction is in a ready state.
- SQL Server will begin operations and release locks as appropriate.
If you’re really playing along at home, at some point you’ll figure out that lock acquisition and release varies by isolation level and results in the various phenomena that you see in each of the isolation levels. If you noticed that on your own, give yourself a gold star. If you didn’t, you’re normal.
Deadlocks are, sadly, a byproduct of a 2PL mechanism. Most of the literature talks about things like transaction graphs (or waits-for graphs) and edges. The Great Triumvirate illustrates this perfectly.
Daniel Webster is grooming his eyebrows in a fashion that almost became known as ‘the Webster’. He’s impatiently waiting for Henry Clay to finish with the funny pages, but he won’t release his comb until he has the funny pages. Henry Clay is reading what passes for the funny pages in 1836 (hint: it’s the New Yorker). Henry is well pleased with himself, but he’s waiting on John C Calhoun to relinquish the volumizer before he will give up the funny pages. John C Calhoun is volumizing his hair to lofty heights but he really wants Daniel Webster’s eyebrow comb. Everyone is waiting on something from everyone else, but nobody will give up first. This is a deadlock. Okay, maybe that’s not really a deadlock, but it’s better than a waits-for graph. A combination of techniques can be used to determine which transactions will be killed off. SQL Server will typically use the least expensive transaction (in terms of optimizer cost). SQL Server avoids some problems, too, by not attempting to retry transactions.
Strict Two-Phase Locking
So far, we’ve only talked about 2PL, but I said SQL Server is SS2PL. Between the two is Strict Two-Phase Locking or S2PL. S2PL is like 2PL but there are some more rules.
To be considered S2PL, a transaction has to follow the rules of 2PL (sound like normalization rules?). In addition, a transaction also has to release write locks after the transaction has ended and either been rolled back or committed. Interestingly, nothing is directly said about read locks in S2PL. However, read locks can be released as they are no longer needed during the transaction.
Strong Strict Two-Phase Locking
SS2PL (called S2PL in Concurrency Control and Recovery in Database Systems) requires that the locks are only released after the transaction is finished and has been committed or rolled back. SS2PL provides serializability – database transactions appear as if they are atomic and occurring in complete isolation from one another. Serializable transactions are interesting because for a database to truly be serializable, it should be possible to process transactions in any order, s long as the effective is the same as that of some serial order (not any, just some).
Why the Devil Should I Care?
Locks are the primary way that SQL Server manages concurrency. This is a limitation of the database. New CPUs, a SAN, more disks, solid state hardware, and more RAM will not remove locking, blocking, and deadlocks from the database. New CPUs, a SAN, more disks, solid state hardware, and more RAM will make locking, blocking, and deadlocks happen faster. They may happen so fast, that you don’t really notice the problem until it’s growing out of control.
Combine a healthy knowledge of how locking operates with a working knowledge of isolation levels and some allegedly insurmountable application problems can be resolved through simple changes in the data layer.