It’s a Lock: Due Diligence, Schema Changes, and You

It’s morning standup, and someone says, “It’s no big deal, we just need to add a couple of columns. It’s already in the build, it works fine.”

The next time this happens, stop and say, “Let’s take a closer look at that.”

When you write schema changes for a relational database, take the time to investigate what your change will do at a granular level.

Schema changes matter. A change that seems simple and works perfectly in a small environment can cause big problems in a large or active environment because of locking issues: it’s happened to me. Let’s keep it from happening to you. The good news is that even though the impact may be very different on an active or large system, you can do your investigation on a pretty small scale, right at home in your test environment with sample data.

What Could Possibly Go Wrong?

If you’re remembering to ask this question when you do any schema change, you’ve got the critical part down already. In this post, I’ll show you how to find the answer.

Here’s what you’ve already done:

  • Written the code for the schema change
  • Tested the schema change within the database
  • Tested the schema change for calling applications
  • Tested the schema change for any remote procedure calls or replicated subscribers

If not, head back and do those first.

Why Does Locking Matter?

SQL Server’s engine uses locking to isolate resources and protect current transactions.

Schema changes require several different kinds of locks, and all Data Definition Language (DDL) changes require a schema modification (Sch-M) lock. As Books Online explains:

During the time that it is held, the Sch-M lock prevents concurrent access to the table. This means the Sch-M lock blocks all outside operations until the lock is released.

When you’re partying with schema, you’re partying alone. The other critical thing to know is that Sch-M locks aren’t the ONLY locks required for your change: you may need locks on objects related by constraints, or referenced by your object definition.

With multiple locks in play, they’re going to be acquired in a sequence, and this quickly introduces lots of opportunities for blocking and deadlocks in your schema change.

What to do? Make it a practice to look at the locks required by your schema change.
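For example, here’s a pair of changes that look nearly identical in a script but can behave very differently at scale on the SQL Server versions current as of this writing (the table and column names are made up for illustration). Both need that Sch-M lock, but one holds it far longer on a big table:

-- Nullable column, no default: a metadata-only change, so the Sch-M lock is brief
ALTER TABLE dbo.Orders ADD Notes VARCHAR(100) NULL;

-- NOT NULL column with a default: every existing row has to be updated to hold the
-- default value, so the Sch-M lock is held for the duration of that size-of-data work
ALTER TABLE dbo.Orders
    ADD IsArchived BIT NOT NULL
        CONSTRAINT DF_Orders_IsArchived DEFAULT (0);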

Method 1: Lock Investigation with sys.dm_tran_locks and sp_whoisactive (Quick and Dirty)

SQL Server’s dynamic management objects provide the most convenient way to check out locks.

I’m a big fan of Adam Machanic’s sp_whoisactive stored procedure. It’s incredibly quick and easy to use for grabbing information on locks; a minimal sketch of these steps follows the list:

  • Open a connection to a test system where sp_whoisactive is installed;
  • Begin a transaction;
  • Run a command to make a schema change;
  • Run the following from another connection to look at the locks being held: sp_whoisactive @get_locks=1;
  • Roll back or commit your transaction.
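Here’s a minimal sketch of those steps; the table name and change are hypothetical, and this belongs on a test database only:

-- Connection 1: hold the schema change open inside a transaction
BEGIN TRAN;

ALTER TABLE dbo.Orders ADD Notes VARCHAR(100) NULL;

-- Connection 2: while the transaction above is still open, look at the locks
EXEC sp_whoisactive @get_locks = 1;

-- Connection 1: when you're done looking, undo the change
ROLLBACK;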

Sp_whoisactive returns a nicely organized XML summary of all the locks currently being held by querying the sys.dm_tran_locks view, with all the associated IDs translated into meaningful names for you. Adam has recently blogged about this feature as well.
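If sp_whoisactive isn’t installed on your test instance, you can query sys.dm_tran_locks directly while the transaction is open. This is only a rough sketch of the same idea, doing a little of the ID translation yourself (the session_id filter is a placeholder for whichever session is holding your open transaction):

SELECT  tl.request_session_id,
        tl.resource_type,
        tl.request_mode,
        tl.request_status,
        DB_NAME(tl.resource_database_id) AS database_name,
        CASE tl.resource_type
            WHEN 'OBJECT'
                THEN OBJECT_NAME(tl.resource_associated_entity_id, tl.resource_database_id)
        END AS object_name   -- PAGE/KEY resources map to objects through sys.partitions instead
FROM    sys.dm_tran_locks AS tl
WHERE   tl.request_session_id = 53;   -- placeholder: the session running your schema change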

Strengths: This is great for convenience and an initial quick look at a situation. I’ve found lots of interesting things just using this method against a test database. This can also be incredibly useful in gathering information from a production situation where there is contention.

Warnings: Using an open transaction and sys.dm_tran_locks won’t show you all locks. Even with the transaction still open, the Database Engine releases some locks as soon as it can during a schema change, and you’re unlikely to see those in your test. Also, as Adam notes in his documentation, @get_locks can be time consuming in some situations. Keep an eye on performance when you use this option to look at active locks in production.

Method 2: Trace Your Locks (Messy)

You can look at lock granularity in SQL Server Profiler, or use Profiler to generate a script to run a server side SQL Trace.

It’s easy to set up and run Profiler against a test environment, but you’ll quickly notice that tracing locks generates LOTS of rows, and there are lots of IDs to translate before you can figure out exactly what’s being locked.

To help work with the results, you can stream your Profiler output to a database table. Even better, you can use a server-side trace to write to a file, import the file into a database with FN_TRACE_GETTABLE, then use the sys.trace_events and sys.trace_subclass_values views to translate the trace events and lock modes. Once the trace is in the database, you can resolve all those object IDs programmatically.
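Here’s a rough sketch of that import-and-translate step, assuming the server-side trace was written to a hypothetical file path:

-- Pull the trace file into a table (the path and table name are placeholders)
SELECT  *
INTO    dbo.LockTrace
FROM    sys.fn_trace_gettable(N'C:\Traces\lock_trace.trc', DEFAULT);

-- Translate event IDs and lock modes into readable names
SELECT  te.name           AS event_name,
        tsv.subclass_name AS lock_mode,
        OBJECT_NAME(lt.ObjectID, lt.DatabaseID) AS object_name,
        lt.StartTime
FROM    dbo.LockTrace AS lt
JOIN    sys.trace_events AS te
        ON te.trace_event_id = lt.EventClass
LEFT JOIN sys.trace_subclass_values AS tsv
        ON tsv.trace_event_id = lt.EventClass
       AND tsv.trace_column_id = (SELECT trace_column_id FROM sys.trace_columns WHERE name = 'Mode')
       AND tsv.subclass_value = lt.Mode
ORDER BY lt.StartTime;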

Strengths: This is the best method for looking at locks on test systems prior to SQL 2008. You can programmatically automate the tracing, import, and analysis of the data, which is great.

Warnings: This method is suitable for test systems only: profiling locks even in a test system with low activity can generate a large number of rows, and doing so on a production system can be catastrophic. You’ll also find that events happen very quickly even in a test system, and matching up pairs of starting and finishing lock events can be tricky.

Method 3: Extended Events (Difficult, but Fulfilling)

Extended Events are the best way to handle this task on SQL Server 2008 and higher. Looking at locks in a test environment is a perfect way to get to know XEvents: this is a technology you want to become familiar with!

To use this method, you’ll create an event session and add events to the session. Events to look at locks include lock_acquired, lock_released, sql_statement_starting, and sql_statement_completed.

You’ll need to select a target for your event session. I personally prefer to use an asynchronous file target if I’m looking at lock events in detail. Note that if you select the ring buffer target, its contents are cleared once you stop collecting events. (You can drop events from an active event session to stop collection, but I find it simpler to write to the file target.) Even against a test system, I prefer to collect only the data I need when looking at locks. This simplifies interpretation, limits my footprint on the instance, and lets me share and review the data at a later time.
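Here’s a rough sketch of what such a session might look like on SQL Server 2008. The session name, file paths, and session_id filter are placeholders, and it’s worth confirming the event and target names against your build:

CREATE EVENT SESSION [lock_watch] ON SERVER
ADD EVENT sqlserver.lock_acquired
    (ACTION (sqlserver.sql_text, sqlserver.session_id)
     WHERE (sqlserver.session_id = 53)),   -- placeholder: the connection running the schema change
ADD EVENT sqlserver.lock_released
    (ACTION (sqlserver.sql_text, sqlserver.session_id)
     WHERE (sqlserver.session_id = 53))
ADD TARGET package0.asynchronous_file_target
    (SET filename     = N'C:\XEvents\lock_watch.xel',
         metadatafile = N'C:\XEvents\lock_watch.xem')
WITH (MAX_DISPATCH_LATENCY = 5 SECONDS);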

Once this is configured, you simply start your session against your test system, then run your schema changes in another connection. You then stop your event session and query your results. If you used a file target, sys.fn_xe_file_target_read_file will read in the file for you so you can query the data, which is stored in an XML format.
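Continuing the sketch above with the same placeholder names, the start/stop and the read-back might look like this; the XML shredding is just one way to pull out the fields of interest, and the element names are worth verifying against your own output:

-- Start collecting, run the schema change from another connection, then stop
ALTER EVENT SESSION [lock_watch] ON SERVER STATE = START;
-- ... run your schema change in the other connection ...
ALTER EVENT SESSION [lock_watch] ON SERVER STATE = STOP;

-- Read the file target back and shred the XML
SELECT  x.theXML.value('(event/@name)[1]', 'varchar(60)')                            AS event_name,
        x.theXML.value('(event/data[@name="mode"]/text)[1]', 'varchar(20)')          AS lock_mode,
        x.theXML.value('(event/data[@name="resource_type"]/text)[1]', 'varchar(20)') AS resource_type,
        x.theXML.value('(event/action[@name="sql_text"]/value)[1]', 'nvarchar(max)') AS sql_text
FROM
(
    SELECT CAST(event_data AS XML) AS theXML
    FROM sys.fn_xe_file_target_read_file
         (N'C:\XEvents\lock_watch*.xel', N'C:\XEvents\lock_watch*.xem', NULL, NULL)
) AS x;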

Strengths: Extended Events are more flexible and perform better than SQL Trace. You can use different targets, such as the bucketizer target to count occurrences, and you can also provide filters to collect only specific events.

Warnings: Using Extended Events takes a bit of preparation and research. We don’t have a GUI application from the SQL Server team to configure or manage XEvent sessions yet. With some reading and testing, you don’t need one: just allow yourself time to explore and code the right solution. To get started, I recommend watching Jonathan Kehayias’ presentation Opening the SQL Server 2008 Toolbox – An Introduction to Extended Events from SQLBits VII.

Where to Go from Here

Pick the test solution to evaluate locks that’s going to work best for your team. Pick the solution that you can automate, make repeatable, and use consistently. Regularly run this against test systems to analyze locks needed by schema changes, and talk about what the possible repercussions of your changes might be.

Even if you decide to go with the quick sys.dm_tran_locks method, which won’t show you every single lock, checking with that tool is much safer than not checking at all. I’ve found lots of interesting gotchas this way.

Remember: making it easy to see which locks will be required isn’t the whole story. You need to analyze and interpret those results for each change.

If you take the time to think about locks, you will find out ahead of time which changes are safe to run mid-week and which changes need to be carefully scheduled or rewritten. Add this to your practices, and I promise you’ll avoid some post-mortem meetings for changes gone wrong.

As a bonus, mastering these techniques will make you a superstar at troubleshooting blocking issues and deadlocks. And that’s a great thing to be.


5 Comments

  • Wow. I’ve got a very similar post today. Except I focus on blocking instead of locking. I’ll be sure to link back here 🙂

  • Rule #3: We don’t make system changes during peak use times. Changes go in during off-hours or maintenance windows.

  • Eric Russell
    May 9, 2011 7:56 am

    In many cases adding nullable columns to even a large sized table is trivial. However, adding columns that are not nullable will certainly require a size-of-data operation for every record in the table. Fortunately I have an environment containing a weekly snapshot of production, where I can test deployment scripts to make double sure. Regardless, schema changes should definitely be scheduled for off hours.

  • @Mike and @Eric–

    Re: Scheduling changes. This varies by team/business. I’ve worked with several agile development teams where it’s routine practice to make incremental schema changes during normal to high usage times. I was very wary of this at first, but after working with the process, I’m a fan of it in many ways. There are a lot of benefits that go with the extra risk. However, evaluating changes, identifying worst case scenario, having rollback code, and making sure application code is backwards compatible are all critical to working this way. (Being able to concretely identify when to make an exception and roll something differently is also critical.)

    I’ve also seen environments where there is no real “down” time or a time when a loss of availability is OK. In those environments you need to know as much as possible about each change so you’re best prepared if things don’t go as planned.

    In general, it’s really difficult in most environments to simulate true production-like traffic in test, so these posts are a suggestion of a way to use lock analysis to help overcome that limitation, however your change processes are set up.

    I do also have a post coming up showing a detailed example of how to use one of these methods to learn more about a particular schema change. Hopefully there’ll be a few interesting surprises in there!

