Changing Statistics Cause Longer Compilation Times

Execution Plans, Statistics

I need to be up front with you, dear reader, and tell you that you’re probably never going to need to know this.

I try to blog about stuff people need to know to get their job done – things that will be genuinely useful in your day-to-day performance tuning and management of SQL Server. This, however, is not one of those things. This is fun, obscure trivia, documenting a situation you’ll probably never run into, prompted by Forrest’s post proving that compilation time includes more than parse time.

Start with a hard-to-compile query plan.

I’ll use any Stack Overflow database and the hard-to-compile query plan from my post, Bad Idea Jeans: Building Big Query Plans, modified to join to the Users table:
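If you want to play along at home, the general shape looks something like this – a table variable fed through a stack of cross-joined CTEs, then joined to Users. The CTE depth here is just a placeholder; the original post stacks them much deeper, so crank it up until compilation hurts:

```sql
/* Sketch of the hard-to-compile query: an empty table variable,
   cross-joined to itself through layered CTEs, joined to Users.
   Each CTE layer multiplies the join work the optimizer has to reason about. */
SET STATISTICS TIME ON;

DECLARE @t TABLE (Id INT);

WITH c1 AS (SELECT a.Id FROM @t a, @t b, @t c, @t d),
     c2 AS (SELECT a.Id FROM c1 a, c1 b, c1 c, c1 d),
     c3 AS (SELECT a.Id FROM c2 a, c2 b, c2 c, c2 d)
SELECT u.DisplayName
FROM c3
INNER JOIN dbo.Users u ON c3.Id = u.Id;
```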

Run the query, and check the Messages tab:

That complex query plan is hard to build, so it took some time:

The table variable doesn’t have any rows in it, though, so execution time is nearly instantaneous. The hard part here is clearly building the query plan.

But it gets much worse.
Try updating stats while it runs.

In one window, free the plan cache and start rebuilding the indexes on the Users table with online = on so the CTE query can execute:
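Something along these lines – the GO repeat count is arbitrary, just enough to keep the rebuilds (and therefore the stats updates) churning while the other window compiles:

```sql
/* Window 1: empty the plan cache, then rebuild Users' indexes online
   over and over so its statistics keep changing. */
DBCC FREEPROCCACHE;
GO
ALTER INDEX ALL ON dbo.Users REBUILD WITH (ONLINE = ON);
GO 10
```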

Then in another window, run the CTE query again. Make a pot of coffee or hit the loo, because you’re gonna have some time. Several minutes later, both queries finish, and the output of the CTE’s statistics is pretty bizarre:

What’s happening? Every time the Users rebuild finishes, it’s also updating stats. The query compilation sees those updated stats, and…decides that the plan it just got done building is invalid, so it’d better go try to build another plan! Thus, the repeated messages about parse & compile time.

Yes, I actually hit this problem.

It sounds crazy, but to recreate a client issue, I was loading a 10 billion row table, which takes a while. As it was loading, its statistics were constantly changing, because as it grew, it kept hitting the automatic stats update threshold.

As a result, simple queries with no where clause or order by could finish instantly, but add in even the simplest where clause, and the query would never finish. It was constantly recompiling as more data was loaded.

Shout out to Joe Obbish and Paul White who helped me troubleshoot this problem last year – all credit for solving the mystery goes to them:

What’s New & Undocumented in SQL Server 2019 Cumulative Update 9

SQL Server 2019

Last week’s SQL Server 2019 Cumulative Update 9 snuck in some undocumented things.

We got two new DMVs for Always On Availability Groups, sys.dm_hadr_cached_replica_states and sys.dm_hadr_cached_database_replica_states. They have a subset of data from other AG DMVs:
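You can query them like any other DMV:

```sql
/* The two new Always On AG DMVs that appeared in 2019 CU9 */
SELECT * FROM sys.dm_hadr_cached_replica_states;
SELECT * FROM sys.dm_hadr_cached_database_replica_states;
```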

I sure wish I could point you to the documentation or KB article on those.

But we have no idea what changed in CU9 here, or why, or what to do about it.

I’m gonna be honest with you, dear reader: I’m starting to tell clients to choose SQL Server 2017 over SQL Server 2019. 2019 just doesn’t seem fully baked, and 2019’s Cumulative Updates have become too large of a risk: the release notes aren’t honest about what you’re deploying. That’s not fair to organizations that depend on SQL Server for mission-critical deployments: they need to know what changed so they can be on the lookout for unusual new behaviors.

If your own developers were changing production without telling you how or why, you’d call them into the office, close the door, and have a conversation.

In lighter news, there are also a few new messages in sys.messages:

  • 898: Buffer Pool scan took %I64d seconds: database ID %d, command ‘%ls’, operation ‘%ls’, scanned buffers %I64d, total iterated buffers %I64d, wait time %I64d ms. See ‘‘ for more information.
  • 7877: Backup on secondary replica is not supported in the connection to contained availability group. Reconnect to SQL Instance and retry the operation.
  • 8659: Cannot continue columnstore index build because it requires %I64d KB misc memory per thread, while the total memory grant for query is %I64d KB, total Columnstore bulk insert memory is limited to %I64d KB in query, and the maximum memory grant is limited to %I64d KB per query in workload group id=%ld and resource pool id=%ld. Retry after modifying columnstore index to contain fewer columns, or after increasing the maximum memory grant limit with Resource Governor.
  • 35528: unable to read page due to invalid FCB
  • 46933: PolyBase configuration is invalid or corrupt. Re-run PolyBase setup.

Query Store is the Flight Data Recorder. It’s Obvious What We Need Next.


When Microsoft brought out Query Store, they wrote:

Similar to an airplane’s flight data recorder, Query Store collects and presents detailed historic information about all queries, greatly simplifying performance forensics by reducing the time to diagnose and resolve issues.


It certainly works, making it easier for you to understand when a query’s execution plan suddenly goes downhill, taking your server with it.

But it isn’t enough.

In airplanes, it’s not enough to know that the throttles were powered down and the altitude plummeted. It’s not enough to know what happened: we need to know why it happened. What problem did the team think they were facing? Was there a misunderstanding in the cockpit about a metric? A disagreement between team members about how to solve the problem?

That’s why we need Voice Store.

Just as a cockpit voice recorder helps investigators better understand why decisions were made, SQL Server Management Studio needs a Voice Store to capture what the team was trying to do in the moments leading up to the crash.

To help speed up accident investigations, we simply can’t afford delays. That’s why it’s also important that Voice Store be connected directly to YouTube, automatically uploading recordings to help the community get to the root cause analysis.

What’s a Size-of-Data Operation?

Database professionals say, “That’s slow because it’s a size-of-data operation.”

A size-of-data operation is when the database has to write all of the affected data again. What you’re asking for might seem small, but for the database, it causes a lot of work.

For example, let’s say you live in a strict household, and your parents have told you that under this roof, everyone’s hair can only be black, blond, brown, or red. (Don’t blame me. I don’t make the rules. They’re your parents, not mine. Go talk to them. And tell your mom I said hi.)

If your parents suddenly loosen up and say, “We’re now allowing you to dye your hair pink,” that doesn’t mean we have to touch everyone’s head. It’s a metadata operation: we’re just making a new note so that when someone walks into the door and we check their hair color, we have a new acceptable option.

However, if your parents freak out and say, “Everyone’s hair must be pink,” that means we’re gonna need to dye every person’s hair. We’re gonna have to gather everybody together and do some real work.

That’s a size of data operation: however many people live in the house, we’re gonna need to work on all of ’em.

How that translates into databases

I’ll use the Users table in a large Stack Overflow database to explain.

If I alter a table and add a new nullable column, it happens instantly, and SQL Server doesn’t have to touch any pages:
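A sketch of that instant, metadata-only change – HairColor being the column from our analogy, not a real Stack Overflow column:

```sql
/* Metadata-only: SQL Server just records that the column exists.
   No data pages get touched, so this completes instantly. */
ALTER TABLE dbo.Users
    ADD HairColor VARCHAR(50) NULL;
```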

Adding a column like that is just a metadata change. It tells SQL Server, “When new rows walk into the house, they’re allowed to specify their HairColor now. When rows leave the house, if they don’t already have a HairColor set, just say it’s NULL.”

However, if I populate that column and dye everyone’s hair pink, that’s gonna take some time, because we’re gonna have to touch every row:
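And the slow, size-of-data version:

```sql
/* Size-of-data: every single row has to be written again,
   plus logging, plus touching any indexes that contain HairColor. */
UPDATE dbo.Users
    SET HairColor = 'Pink';
```

On a big table, you’d typically break an update like this into batches to keep the transaction log and blocking under control – but batched or not, you’re still rewriting every row.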

That’s a size-of-data operation, so it’s slower.

How that affects development & deployment

When you’re just getting started developing an app, and all your tables are empty, you don’t really have to worry about this kind of thing – because size-of-data operations are fast when there’s, uh, no data involved.

But the larger your data becomes, and the slower your storage becomes (like if you move from nice on-premises flash storage up to the cloud), then the more you have to pay attention to this. You’ll want to run your deployment scripts on a real production-sized database, not just an empty development shell, and time the deployment scripts. The longer an operation takes, the more likely it is a size-of-data operation. On larger databases – at 100GB and higher – and at 24/7 uptime shops like online stores, you’ll likely have to tune your deployment scripts to avoid size-of-data operations.

I haven’t seen a good rundown of all the operations that are size-of-data operations for Microsoft SQL Server, but it’s changed over the years. Hey, if you’ve got a blog – there’s a potential idea for you! Not me though. The hot tub is calling.

I Sat Down for an Interview with Forrest Brazeal.

Cloud Computing

Here’s how the interview started:

Forrest Brazeal: Brent, you’re a performance specialist who’s worked with massive SQL Server deployments; how does it make you feel when you hear cloud architects saying things like “relational databases don’t scale?” What do you wish they knew?


Brent Ozar: They’re right. Nothing scales worth a damn. Even stuff that looks easy, like Amazon S3 file storage, can run into scaling problems if you design it without an analysis of how AWS partitions reads, and how you have to watch out for your object prefixes.

But the thing is, nobody reads the documentation.

Read the interview here. I had a lot of fun with it.

Partitioned Tables Cause Longer Plan Compilation Times.


Folks sometimes ask me, “When a table has more indexes, and SQL Server has more decisions to make, does that slow down execution plan generation?”

Well, maybe, but the table design choice that really screws you on compilation time is partitioning. If you choose to partition your tables, even tiny simple queries can cause dramatically higher CPU times. Even worse, as the famous philosopher once said, “Mo partitions, mo problems.”

We’ll start with any Stack Overflow database, create a numbers table, and then dynamically build a partition function that partitions our data by day:
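The general shape looks something like this – the numbers-table population trick and the boundary count (roughly a decade of days) are placeholders, not necessarily what the original demo used:

```sql
/* Build a numbers table, then use dynamic SQL to generate a partition
   function with one boundary point per day. */
CREATE TABLE dbo.Numbers (Number INT PRIMARY KEY);
INSERT dbo.Numbers (Number)
SELECT TOP 4000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_objects a CROSS JOIN sys.all_objects b;

DECLARE @sql NVARCHAR(MAX) =
    N'CREATE PARTITION FUNCTION pf_daily (DATETIME) AS RANGE RIGHT FOR VALUES (';
SELECT @sql += N'''' + CONVERT(NVARCHAR(10), DATEADD(DAY, Number, '20080101'), 112) + N''','
FROM dbo.Numbers
ORDER BY Number;
SET @sql = LEFT(@sql, LEN(@sql) - 1) + N');';
EXEC sp_executesql @sql;

/* Map every partition to the same filegroup to keep the demo simple. */
CREATE PARTITION SCHEME ps_daily AS PARTITION pf_daily ALL TO ([PRIMARY]);
```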

Then we’ll build a partitioned copy of the Users table, partitioning them by CreationDates:
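Roughly like this, assuming a partition scheme named ps_daily like the one built above – I’m abbreviating the column list, and note that a clustered key on a partitioned table has to include the partitioning column:

```sql
/* A partitioned copy of Users, clustered on (CreationDate, Id)
   so the clustered index aligns with the daily partitions. */
CREATE TABLE dbo.Users_partitioned (
    Id INT NOT NULL,
    DisplayName NVARCHAR(40) NOT NULL,
    CreationDate DATETIME NOT NULL,
    /* ...the rest of the Users columns... */
    CONSTRAINT PK_Users_partitioned
        PRIMARY KEY CLUSTERED (CreationDate, Id)
) ON ps_daily (CreationDate);

INSERT dbo.Users_partitioned (Id, DisplayName, CreationDate)
SELECT Id, DisplayName, CreationDate
FROM dbo.Users;
```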

Let’s compare partitioned and non-partitioned performance.

I’ll create a non-partitioned index on both the Users and Users_partitioned tables. Note that even if I don’t specify partitioning on the Users_partitioned index, it still ends up partitioned by default, which is kinda awesome:
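Something like this – I’m using DisplayName as the key column as a placeholder:

```sql
CREATE INDEX IX_DisplayName ON dbo.Users (DisplayName);

/* No ON clause here, so SQL Server automatically aligns this index
   to the table's partition scheme - it ends up partitioned too. */
CREATE INDEX IX_DisplayName ON dbo.Users_partitioned (DisplayName);
```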

Now we’ll run an exceedingly simple query that only returns one row:
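For example – the exact query in the original may differ, but anything this trivial shows the gap:

```sql
SET STATISTICS TIME, IO ON;

SELECT TOP 1 Id FROM dbo.Users             WHERE DisplayName = N'Brent Ozar';
SELECT TOP 1 Id FROM dbo.Users_partitioned WHERE DisplayName = N'Brent Ozar';
```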

At first glance, the plans look the same – but notice how the non-partitioned table is 0% of the cost, and the partitioned table is 100% of the cost:

That’s because the non-partitioned estimated cost was way less than 0.001, and the partitioned query’s estimated cost is over 15. What’s worse, the compilation time, execution time, and logical reads are totally different – the top one is the non-partitioned query, and the bottom is the partitioned one, both cleaned up for readability:

Hubba hubba. Just compiling the partitioned table’s plan took 27ms of CPU time. I know what you’re thinking: “Brent, who cares about 27ms of CPU time?” Well, remember, this is an overly simple query! In real-life examples, it’s not unusual to see 250+ milliseconds of CPU time spent just compiling the plan – which means you can only compile 4 queries per second, per CPU core. That’s when plan cache pollution due to unparameterized queries really screws you.

I’m writing this particular post because I had a client who chose to partition every table in their data warehouse, regardless of size, and used exactly the same partition granularity on all sizes of tables to keep their dynamic ETL queries simple. Unfortunately, even querying simple configuration tables was taking 250+ milliseconds of CPU time just to compile a plan.

Fine – but what about non-partitioned indexes?

I’ll drop that partitioned index and create a non-partitioned index on the Users_partitioned tables – note that I have to specify ON PRIMARY as the filegroup for the partitioned table, or else any nonclustered index will by default automatically be partitioned as well – then try the queries again:
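Like so, reusing my placeholder index name:

```sql
DROP INDEX IX_DisplayName ON dbo.Users_partitioned;

/* Explicitly naming a filegroup stops the index from being
   automatically aligned to the table's partition scheme. */
CREATE INDEX IX_DisplayName ON dbo.Users_partitioned (DisplayName)
    ON [PRIMARY];
```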

Now we’ll run an exceedingly simple query that only returns one row:

Now the estimated costs are neck and neck:

But that’s just an estimate – the only thing that’s actually neck and neck here is the logical reads:

Designing and executing a query plan takes longer when any of the objects involved are partitioned.

This overhead isn’t large as long as you’re comparing it to the overhead of big data warehouse queries where partitioning data can reduce reads. But if you’re comparing it to smaller objects – say, 100GB or smaller rowstore indexes – that are queried more frequently, with more varying queries that require building new execution plans – then the overhead of partitioning starts to add up. Mo partitioned objects, with mo partitions in each object, mo problems.

Partitioning is one of those features where I never hear people say, “Wow, every which way I turn, partitioning has just made performance better!” Rather, it’s a feature where people keep saying, “I had no idea partitioning was going to cause me problems over THERE, too.”

In my Mastering Index Tuning class, we spend just enough time on partitioning to explain where it’s a good fit – but much more time on indexing techniques that are more appropriate for 99.9% of workloads out there.

Unusual Parameter Sniffing: Big Problems with Small Data

Parameter Sniffing

Normally when we think about parameter sniffing, we think about a tiny data vs big data problem: the tiny data’s plan goes in first, and when we try to process big data, performance is terrible.

But sometimes, it’s the exact opposite.

I’ll start with the large Stack Overflow database (I’m using the 2018-06 version, but any large one will do) and write a stored procedure to find the most recent posts authored by folks in a specific location:
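The proc name and shape here are my placeholders, but the idea is: top 200 newest posts for a given location, sorted by creation date:

```sql
CREATE OR ALTER PROC dbo.rpt_PostsByLocation
    @Location NVARCHAR(100) AS
BEGIN
    /* Find the newest 200 posts written by users in one location. */
    SELECT TOP 200 u.DisplayName, p.Title, p.CreationDate
    FROM dbo.Users u
    INNER JOIN dbo.Posts p ON u.Id = p.OwnerUserId
    WHERE u.Location = @Location
    ORDER BY p.CreationDate DESC;
END
GO
EXEC dbo.rpt_PostsByLocation @Location = N'Iceland';   /* tiny data */
EXEC dbo.rpt_PostsByLocation @Location = N'Germany';   /* big data */
```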

To help SQL Server out – I’m a giver like that – I’ll create a few indexes to give SQL Server some choices:
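Along these lines – the exact key/include choices are placeholders, but these three give the optimizer the two competing strategies described below:

```sql
/* One index to find users by location, one to find their posts,
   and one to scan posts newest-first. */
CREATE INDEX Location     ON dbo.Users (Location);
CREATE INDEX OwnerUserId  ON dbo.Posts (OwnerUserId);
CREATE INDEX CreationDate ON dbo.Posts (CreationDate);
```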

Call it for tiny data first…

When we call the proc for a less-populated location first, SQL Server chooses to look up the people who live in Iceland (because there aren’t many), and then go find their posts. Small data finishes in under 1 second:

If we now run it for a larger location, like Germany, the query takes 24 seconds to run because we find so many people in Germany, and they’ve posted so many answers. (I’m sure they’re not asking a lot of questions. Germans are really smart.)

And the sort spills to disk because we only granted Iceland-size memory.

This is the typical parameter sniffing problem that people blog about: put the tiny data parameters in memory first, and SQL Server just isn’t equipped to deal with big data.

But if the big data plan goes in first…

Do things perform better? Let’s free the plan cache, then start with Germany:

This time around, SQL Server says, “Ah, Germany’s really common. You’re asking for these posts to be sorted by CreationDate – as a reminder, here’s your query and your indexes:”

“So since you’re asking for the top 200 by CreationDate descending, I’ll just scan the CreationDate index from newest to oldest. For each post I find, I’ll go look up the Users table to see where that user’s from. I bet I won’t have to look up too many before I’ll find 200 of ’em that were posted by Germans.”

And he’s right: he only had to look up about 20,000 of them before he found 200 that were written by Germans. Very cool.

But…when we run this for Iceland…

SQL Server has to read over a million Posts, doing over a million Posts key lookups, and over a million checks into the Users table before it’s found 200 posts that were written by Icelanders. Ten seconds isn’t so bad, but if you pass in an even rarer location, like the charming and easy-to-pronounce Hafnarfjordur, Iceland:

SQL Server again scans the index on Posts, doing a key lookup on Posts for every single row that it finds, and I bet that sounds goofy. Oh it gets worse: for all 40,700,647 Posts, it does a clustered index seek against the Users table. We do over 124 MILLION logical reads on a table that only has 143K pages in it. We read the Users table 871 times over, and the Posts table 6 times over due to all the key lookups:

That sure is a lot of work – would be nice if SQL Server parallelized that query across multiple threads. No can do, though – the query plan was designed with Germany in mind, when SQL Server thought it’d find rows quickly. The whole time this query runs, it’s just hammering one of my CPU cores while the rest sit idle:

There is no one good plan here.

I always laugh when I see demos that talk about “the good plan” and “the bad plan.” I wish the real world was so simple, so black-and-white. In reality, queries have many possible plans, and many of ’em just don’t work for other sets of parameters.

SQL Server 2017’s “Automatic Tuning” does nothing here, either: it can try throwing a different plan in thinking there’s been a regression, but it’s not smart enough to pick different execution plans for different parameters. It picks one plan and tries to make it work for all of ’em – and that simply doesn’t work here.

It’s up to us to fix it, folks.

That’s why I teach my Fundamentals of Parameter Sniffing and Mastering Parameter Sniffing classes. Folks with a Live Class Season Pass can start watching the Instant Replays whenever they want, or drop in live. I’m teaching a bunch of classes over the next few weeks:

See you in class!

My Live Streaming Setup for 2021: Iceland Edition

Home Office, Streaming

Erika and I moved to Iceland for a while. For those of you who don’t follow my personal blog, here’s why we moved, and how we got here with a teleworker visa. You can spy on our adventures via my Instagram, Facebook, or Twitter.

We’re working half the time while we live here, and sightseeing the other half. It’s easy to telecommute from Iceland because it’s one of the most connected countries in the world. Most houses have symmetric gigabit fiber available – even though we’re extremely isolated, we still have great bandwidth:

This isn’t a permanent move – probably just 6-12 months – and we’re doing some traveling through the country, so I didn’t want to pack my full home office setup with my Mac Pro and big monitors. I downsized my streaming setup, and I figured some of y’all might find it interesting. It’s closer to what most folks would use for remote presenting.

Laptop: MacBook Pro 16″ – the fastest Mac laptop available today, a 2.4GHz 8-core Intel Core i9, 64GB RAM, but…it’s a long, long way from the 16-core Mac Pro I usually use. When I live stream, I really put the CPU fans to work. I was tempted to try one of the new MacBooks with the Apple Silicon M1 processor, but I’m holding out for the higher-horsepower versions that will surely follow. I brought my Keychron K8 keyboard and an Apple Magic Touchpad at the last minute at Erika’s insistence – I was trying to pack lighter, heh.

External monitor on right: 15″ generic 1080p. Normally for monitors, you want high resolution, but when you’re streaming, you actually want a plain 1920×1080 because that’s the max size you’d stream out to your audience. The monitor has inputs for USB-C, HDMI, Mini DisplayPort, etc, but the nice thing is that it can use a single USB-C cable for both power and for the display signal.

When I’m streaming, the MacBook Pro’s high resolution display has Slack (for attendee chat & questions) and OBS’s control panel and preview. The 15″ 1080p monitor is what’s showing to the audience, so I’ve either got SSMS on there, or a PowerPoint.

I usually teach while standing up, but I’m trying the dining room table to see if I can make that work. If it turns out I just absolutely gotta stand, we’ve got a high bar table picked out at a local furniture store – if necessary, we’ll grab that and leave it in the rental house when we leave.

Cameras: two Sony A6100s each plugged into an Elgato Cam Link 4K: in my normal home studio setup, I plugged the Sonys into a Blackmagic Design HDMI recorder, but that’s a PCI Express card. The Elgato is an easier/smaller USB solution for laptops. I do prefer a two-camera setup, alternating between a head-on camera and a from-the-side camera for casual discussions. I still use my iPhone with the NDI camera app as a scenery camera during the breaks – that’s good for breaks, but not for a main camera because it has unpredictable latency, meaning your speech and moving lips can drift in & out of sync, and I can’t stand that.

Microphone: DPA In-Ear Broadcast Headset plugged into a Focusrite Clarett. I’m only reusing these because they’re part of my desktop setup, so I don’t need to buy them again, but they’re overkill for most streaming setups. I have a Rode VideoMic GO ($100) on-camera microphone as a backup, but the sound on that isn’t nearly as good since it also picks up echoes from the room, keyboard noises, laptop fan, etc.

Customizable keyboard control: Elgato Stream Deck. LCD keys so you can configure what they show & do, like switching between different cameras. Could you do this by memorizing a bunch of hotkeys? Probably, but as you start to rely on more advanced OBS functionality, like playing sound effects, this will come in super handy.

USB-C dock: CalDigit TS3 Plus: if you count the above devices, that’s 5 high-speed USB ports right there, plus Ethernet. The MacBook Pro only has 4 USB-C ports, plus it needs one for electric power, so I needed a USB-C dock. The CalDigit is by far the most stable dock I’ve found – a lot of USB-C docks flake out during sustained 4K streaming with multiple cameras.

Tripod mounting: Manfrotto Magic Arm plus Super Clamp: I could theoretically clamp these directly to the dining room table that I’m using for a desk, but they’d shake when I jostle the table. Instead, I brought a couple of tripods, plus these Magic Arms so I can mount multiple devices to the tripod, like a camera plus lights.

Green screen: collapsible with stand: at home, I used a wall-mounted pull-down green screen, but obviously that’s not gonna work on the road. This folds up, and I’ll have it behind me while I’m teaching. I expect to be teaching sitting down while I’m on the road, but I’d be completely delighted if I could put together a setup where I could continue standing & teaching. The box it shipped in was exactly the max dimensions for standard airline checked luggage. It collapses, but not small enough to fit in a regular checked bag – it’s still an odd shape.

Carrying the electronics: Pelican 1620 case: on one of our Iceland flights, the airport crew unloaded the baggage during a serious rainstorm. Our bags and their contents were all completely soaked with water. We didn’t have any electronics in ’em, thank goodness, but for this trip, I wanted something waterproof that I could check as luggage. (It won’t have any lithium batteries in it – I take those in my carryon.)

If you wanna see the results and you’ve got a Live Class Season Pass, hop into my live class page during any of my classes this month:

See you in class!

Can You Get Parameter Sniffing on Updates and Deletes?

Execution Plans

Sure you can – parameter sniffing is everywhere. Anytime that you have one piece of code that needs to handle a varying number of rows, you’re probably gonna have to worry about parameter sniffing.

I’ll start with any Stack Overflow database and run a couple of non-parameterized update statements. I’m using literals here, not parameters:
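Something like this – two literal-value updates, one matching a single rare name, one matching a very common one:

```sql
/* One Brent, lots of Alexes - same statement shape,
   very different row counts, very different plans. */
UPDATE dbo.Users
    SET LastAccessDate = GETDATE()
    WHERE DisplayName = N'Brent Ozar';

UPDATE dbo.Users
    SET LastAccessDate = GETDATE()
    WHERE DisplayName = N'Alex';
```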

These two statements have two very different actual execution plans. The first plan, updating just Brent, is very simple. Reading from right to left, we scan the clustered index to find all the Brents, and then we update the clustered index:


Except…the simplicity is a vicious lie. We’re updating LastAccessDate, and before running this update, I’d already created half a dozen indexes:
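The exact six indexes in the original post may differ – these definitions are placeholders – but the point is that every one of them contains LastAccessDate somewhere:

```sql
/* Half a dozen indexes, each carrying LastAccessDate,
   so every update to that column has to maintain all of them. */
CREATE INDEX IX_DisplayName    ON dbo.Users (DisplayName)  INCLUDE (LastAccessDate);
CREATE INDEX IX_Location       ON dbo.Users (Location)     INCLUDE (LastAccessDate);
CREATE INDEX IX_Reputation     ON dbo.Users (Reputation)   INCLUDE (LastAccessDate);
CREATE INDEX IX_CreationDate   ON dbo.Users (CreationDate) INCLUDE (LastAccessDate);
CREATE INDEX IX_Age            ON dbo.Users (Age)          INCLUDE (LastAccessDate);
CREATE INDEX IX_LastAccessDate ON dbo.Users (LastAccessDate);
```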

Every one of ’em includes LastAccessDate. SQL Server doesn’t have the concept of asynchronous indexes, so all of the indexes have to be updated whenever we change their contents. Hover your mouse over the clustered index operator to see the truth: SQL Server’s updating those, too.

The second plan is different because there are a lot of Alexes at Stack Overflow – Alex is one of the most common names in the database. Read the second plan right to left, and the first thing SQL Server does is find & update all of the Alexes in the clustered index:

But since we’re updating a lot of rows, SQL Server changes the way it approaches updating all of these nonclustered indexes. After the clustered index operator, keep reading…

For every index, SQL Server sorts all of the Alexes in that order so we can find ’em faster in the index, and then updates the index. It also does this process one index at a time, serially, in order, for Halloween protection.

These are called narrow and wide plans.


Brent’s plan, which does all of the updates in a single Clustered Index Update operator, is called a narrow plan.

Alex’s plan, which breaks out the work for each index and does them one at a time, is called a wide plan.

No, that doesn’t make any sense to me either. They should be called short and tall plans, because frankly, Alex’s plan is the wide one. I don’t make the rules, though, Paul White does. Take it up with the Kiwi. Even when his rules are wrong, they’re still right. Or maybe they just feel that way.

Put this in a stored proc, and you’ve got parameter sniffing.

Here’s our stored procedure to handle updates:
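The proc name is my placeholder; the body is just the update, parameterized:

```sql
CREATE OR ALTER PROC dbo.usp_TouchUser
    @DisplayName NVARCHAR(40) AS
BEGIN
    UPDATE dbo.Users
        SET LastAccessDate = GETDATE()
        WHERE DisplayName = @DisplayName;
END
GO
/* Whichever of these runs first gets its plan cached for both. */
EXEC dbo.usp_TouchUser @DisplayName = N'Brent Ozar';
EXEC dbo.usp_TouchUser @DisplayName = N'Alex';
```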

And now depending on which parameter runs first – Brent Ozar or Alex – we’ll get a different execution plan.

When Brent runs first, we get the narrow plan, so:

  • Brent does 143,688 logical reads and has no worktable
  • Alex does 352,260 logical reads, has no worktable, gets a 72KB memory grant, and the clustered index scan is single-threaded

And when Alex runs first, we get the wide plan, so:

  • Alex does 553,530 logical reads – more than 3 times the number of pages in the entire table! – plus another 41,339 on a worktable, gets a 5MB memory grant, and the clustered index scan goes parallel (although the rest of the plan is serial because that’s how SQL Server does it)
  • Brent does 143,730 logical reads, plus 17 on the worktable

The differences in this case aren’t all that bad, but keep in mind that I’m dealing with just a 1GB table – the smallest one in the Stack Overflow database. The larger your data becomes, the more you have to watch out for this kind of thing.

That’s why I teach my Fundamentals of Parameter Sniffing and Mastering Parameter Sniffing classes. Folks with a Live Class Season Pass can start watching the Instant Replays whenever they want, or drop in live. I’m teaching a bunch of classes over the next few weeks:

See you in class!

“But Surely NOLOCK Is Okay If No One’s Changing Data, Right?”

Some of y’all, bless your hearts, are really, really, really in love with NOLOCK.

I’ve shown you how you get incorrect results when someone’s updating the rows, and I’ve shown how you get wrong-o results when someone’s updating unrelated rows. It doesn’t matter – there’s always one of you out there who believes NOLOCK is okay in their special situation.

The latest NOLOCK defender left me an angry comment:

I challenge you to show us one example of using nolock on a static database with incorrect results.

No problemo – I assume you mean tables that aren’t getting inserts, updates, or deletes. Well, folks can do index reorganize operations during business hours – it’s a completely online operation, even in Standard Edition – and that hoses NOLOCK too.

I’ll create a table and populate it with 100K rows, but do it in a way that makes it highly fragmented:
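One way to do it – the original demo may have fragmented the table differently, but random GUID clustered keys guarantee page splits all over the place:

```sql
/* A clustered index on random GUIDs means every insert lands on a
   random page, splitting pages constantly = instant fragmentation. */
CREATE TABLE dbo.Toys (
    ToyId  UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID() PRIMARY KEY CLUSTERED,
    Filler CHAR(200)        NOT NULL DEFAULT 'x'
);

INSERT dbo.Toys DEFAULT VALUES;
GO 100000
```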

Then, in one window, I’ll reorganize the index:
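Reorganize is fully online, even in Standard Edition:

```sql
/* Window 1: shuffle pages around while the other window counts rows. */
ALTER INDEX ALL ON dbo.Toys REORGANIZE;
```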

While in another window, I’ll run a COUNT(*) repeatedly:
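Something like this – cancel it whenever you’ve seen enough:

```sql
/* Window 2: count the rows over and over with NOLOCK.
   The row count is supposed to be a constant 100,000. */
WHILE 1 = 1
BEGIN
    SELECT COUNT(*) FROM dbo.Toys WITH (NOLOCK);
END
```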

And presto: the results change, going up and down as the pages are moved around:

Even if you think no one’s doing index maintenance at your shop, the answer is still clear here: if you want to build safe, reliable code that produces accurate query results at all times, NOLOCK should not be involved.

I get why you use NOLOCK. You read a blog post somewhere once that said, “NOLOCK is faster because it does allocation scans instead of navigating those big hefty index pages.” But you never actually bothered to measure whether that’s a significant problem for you – and you never bothered to ask your users if they need accurate query results.

Sometimes, you really don’t need accurate query results, and in that case, NOLOCK can be fine. For example, one of my clients runs a monitoring query every few seconds just to see if any new carts have been started in their online store. They get the latest row, check to see whether it’s higher than the previous check, and if it is, they’re happy campers. If it isn’t, they don’t mind checking again to see whether they got bad data, and they check a couple of times before they sound an alarm. They don’t need accuracy on exactly how many carts have been started – they just want a rough idea that the latest value is going up. NOLOCK works fine for their needs.

But for the rest of y’all, when you’re ready to do things the right way, I welcome you to my SQL Server training classes.

You should probably take one of them quickly, before your boss reads this post and starts asking you why you were so vigorously defending your constant use of NOLOCK.

Free Webcast: Help! My SQL Server Maintenance is Taking Too Long!

Eyes up here, kid
“Time for a smoke break.”

You manage growing SQL Server databases with shrinking nightly maintenance windows. You just don’t have enough time left each night to do the necessary backups, corruption checking, index maintenance, and data jobs that your users and apps want to run. Cloud storage isn’t helping the problem, either.

Stop playing Tetris with your job schedules and step back for a second: are we doing the right things, at the right times, with the right SQL Server configuration?

In this session, I will give you three queries to run on your SQL Server to make sure you’re focused on the right problems. We’ll figure out how fast your backups are really going (and why), whether nightly index maintenance actually makes sense for you or not, and whether your hardware is suffering from serious performance issues.

Register now for the free webcast on Tuesday, February 16th. 

What SQL Server Feature Do You Wish Would Go Away?


I asked a two-part question on Twitter: if you could only remove one feature from SQL, why would it be cursors?

The responses were great:

Justin’s referring to MERGE’s hot mess.

But out of all of ’em, this one intrigues me the most:

How about you?

Query Hints You Can Use to Avoid Blocking

No, not NOLOCK. You get the wrong query results, even when your query results aren’t supposed to be changing.

I’m talking about when you need to do writes, but you want your query to get along with others in high concurrency environments, without creating a blocking firestorm.

SET LOCK_TIMEOUT 1000 – if you run this before your query starts, SQL Server will wait patiently for X milliseconds before giving up and aborting your own query. This is useful when a lot of short queries are running all the time, and I don’t want to give up instantly – but I do want to give up if there’s a long-running query blocking me. If you pass in a value of 0 milliseconds, SQL Server gives up instantly. For example, in the below query, I’m trying to delete all of Jon Skeet’s posts, but I want to abandon my query if anyone has locks that stop me on either the Users or Posts tables, and those locks stop me for over 1 second:
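A sketch of that – the join shape is mine, but the idea matches:

```sql
/* Give up after 1,000 ms of waiting on anyone else's locks.
   On timeout, the query fails with error 1222. */
SET LOCK_TIMEOUT 1000;

DELETE p
FROM dbo.Posts p
INNER JOIN dbo.Users u ON p.OwnerUserId = u.Id
WHERE u.DisplayName = N'Jon Skeet';
```

If the timeout hits, you get error 1222, “Lock request time out period exceeded” – which is exactly why the error-handling note below matters.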

If you use this technique, or the other techniques described in this post, your query will fail with an error if it was blocked by someone. Most of us (myself included) don’t do a great job of checking for errors and retrying our query gracefully, so we could probably all use an hour to dig into Erland Sommarskog’s series on error handling to learn how to retry automatically.


WITH (NOWAIT) – this is kinda like setting lock_timeout 0, but it’s a table-level hint like NOLOCK. This means it only affects a single table in your query. For example, in the below query, I’m trying to delete all of Jon Skeet’s posts, but I want to abandon my query if someone has a lock on the Users table – but I’m willing to wait forever for locks on the Posts table:
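A sketch of the table-level version, assuming the same schema. Note that the hint goes only on Users, so lock waits on Posts still use the default behavior of waiting forever:

```sql
DELETE p
FROM dbo.Posts p
INNER JOIN dbo.Users u WITH (NOWAIT)  /* fail instantly if Users is locked */
    ON u.Id = p.OwnerUserId
WHERE u.Id = 22656;  /* Jon Skeet */
```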


SET DEADLOCK_PRIORITY LOW – this isn’t about blocking, but specifically deadlocks. When I set this at the beginning of my batch, I’m saying, “I’m willing to wait for blocking, but if a deadlock situation comes up, go ahead and kill my query, because I don’t mind trying it again.” You call it like we did LOCK_TIMEOUT:
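It might look like this – a sketch, with the delete standing in for whatever deadlock-prone work your batch does:

```sql
SET DEADLOCK_PRIORITY LOW;  /* if a deadlock breaks out, kill me, not the other query */

DELETE p
FROM dbo.Posts p
INNER JOIN dbo.Users u ON u.Id = p.OwnerUserId
WHERE u.Id = 22656;  /* Jon Skeet */
```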


WAIT_AT_LOW_PRIORITY – this one is just for index maintenance, and sadly, it doesn’t work for creates or drops. “Online” index rebuilds aren’t completely online: they need a brief schema modification lock to swap in the new copy of the index. This hint lets your completed index lurk patiently in the background while other queries finish, and then swap in later:
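A sketch of an online rebuild using the hint – the MAX_DURATION and ABORT_AFTER_WAIT values here are just examples, not recommendations:

```sql
ALTER INDEX ALL ON dbo.Users REBUILD
    WITH (ONLINE = ON (
        WAIT_AT_LOW_PRIORITY (
            MAX_DURATION = 5 MINUTES,     /* how long to lurk at low priority */
            ABORT_AFTER_WAIT = NONE )));  /* then fall back to a normal lock wait */
```

ABORT_AFTER_WAIT also accepts SELF (give up on the rebuild) or BLOCKERS (kill the queries in your way), depending on who you’d rather sacrifice.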

If that last one is helpful, then your next step is to do similar blocking avoidance with DML operations by using Kendra Little’s post on which operations support online & resumable.

Updated First Responder Kit and Consultant Toolkit for January 2021

You know how a lot of app release notes say “bug fixes and improvements”? We’re not gonna lie: this one’s all bug fixes. Huge shout out to the community contributors for catching stuff and contributing fixes.

New behavior for SQL Server 2008 & R2 users: sp_BlitzLock has never worked on SQL Server 2008, but starting this month, its installation simply fails with an error, because one of this month’s checkins uses THROW, which isn’t available on 2008. This also means that if you use the Install-Core-Blitz-No-Query-Store.sql script to do installations, you’re going to get an error there, too. It’s okay – all of the other stored procs install successfully. This behavior wasn’t intentional by any means, and if you’d like to improve sp_BlitzLock’s installer to bail out gracefully without an error, we’d gladly accept a pull request for that.

How I Use the First Responder Kit
Watch and learn

To get the new version:

Consultant Toolkit Changes

To help troubleshoot plan cache pollution due to unparameterized queries, there’s a new tab for the top 10 queries with the most duplicated plans in cache. It’s based off this troubleshooting query. We also capture the top 3 plans for each query, sorted by duration descending.
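The linked query isn’t reproduced here, but a rough sketch of the idea – grouping the plan cache by query_hash to find statements with many cached plans – looks like this (not the exact Consultant Toolkit query):

```sql
/* Find the queries with the most duplicated plans in cache:
   unparameterized queries share a query_hash but get separate plans. */
SELECT TOP 10
       qs.query_hash,
       COUNT(*)                AS cached_plans,
       SUM(qs.execution_count) AS total_executions
FROM sys.dm_exec_query_stats qs
GROUP BY qs.query_hash
ORDER BY COUNT(*) DESC;
```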

This does mean a new querymanifest.json and spreadsheet. If you’ve customized those files, you’ll need to copy your changes into the new querymanifest.json and spreadsheet if you want to get the new tab.

sp_Blitz Changes

  • Fix: standardized the @Help = 1 behavior across this & all of the procs. When you run ’em with @Help = 1, they print the help info and immediately return. (#2717, thanks Todd Chitt.)
  • Fix: offline databases no longer show up in auto-shrink or page verification checks. (#2750, thanks Erik Darling and Ali Hacks.)

sp_BlitzFirst Changes

  • Improvement: if you turn on @Debug = 1, it now raises “Running CheckID #___” messages in the messages tab in SSMS to help you debug. (#2739, thanks Adrian Buckman.)
  • Fix: fix warning typo on queries with max memory grant and clarified that the query’s memory grant is a percentage of max workspace. (#2732, thanks Ali Hacks.)
  • Fix: forwarded fetches in table variables now show that it’s a table variable rather than a temp table. (#2742)
  • Fix: forwarded fetches in tempdb objects were reporting the cumulative number of fetches since the object was created rather than the forwarded fetches created during the sp_BlitzFirst sample. (#2743)

sp_BlitzIndex Changes

  • Fix: if a missing index recommendation included columns whose names included ampersands, sp_BlitzIndex was failing with an error. (#2734, thanks snckirkmarken.)

sp_BlitzLock Changes

  • Improvement: now runs better without sysadmin or db_owner permissions by wrapping an update statement in a try/catch block. (#2751, thanks Chad Baldwin.)

sp_ineachdb Changes:

  • Fix: case sensitivity bug introduced in last month’s version. (#2737, #2744, #2747, and #2748, thanks Jefferson Elias and sm8680.)
  • Fix: GO was removed at the end of last month’s version, which broke the Install-All-Scripts.sql installer when running on brand-new servers. (#2753, thanks TLSystemsLLC.)

sp_BlitzWho Changes

  • Fix: we’re supposed to sort queries from longest-running to shortest-running, but last month’s version introduced a bug in that. This should fix it. (#2721, thanks Erik Darling and Ali Hacks.)

For Support

When you have questions about how the tools work, talk with the community in the #FirstResponderKit Slack channel. If you need a free invite, hit the invite link. Be patient – it’s staffed with volunteers who have day jobs.

When you find a bug or want something changed, read the file.

When you have a question about what the scripts found, first make sure you read the “More Details” URL for any warning you find. We put a lot of work into documentation, and we wouldn’t want someone to yell at you to go read the fine manual. After that, when you’ve still got questions about how something works in SQL Server, post a question at and the community (that includes me!) will help. Include exact errors and any applicable screenshots, your SQL Server version number (including the build #), and the version of the tool you’re working with.

Watch Us Working On It

I live streamed while I was working on the forwarded fetches fixes for sp_BlitzFirst:

We Lost Gareth Swanepoel.


Thirteen years ago, I met Gareth Swanepoel (@GarethSwan) for the first time, interviewed him for his first real sysadmin job, and … turned him down.

Gareth Swanepoel circa 2007

Thankfully, my manager overruled me and hired him anyway.

Thank goodness – and thank goodness Gareth forgave me for my incredibly bad judgment, hahaha.

Gareth proved to be one of the hardest-working, most-determined people I’d ever met. He was endlessly curious – it wasn’t enough to solve a problem – he wanted to know how the thing worked, whatever it was. He wanted to know everything about hardware, operating systems, networking, science, and eventually, databases.

When I left Southern Wine & Spirits, Gareth had the “good fortune” of cleaning up the messes I’d left behind. He grew to love databases, moving on to Pragmatic Works, then teaching classes to share what he knew, then moved on to the big goal that most SQL Server people dream of accomplishing: he got a job at Microsoft.

Along the way, we shared so many laughs, beers, and meats. When Tim Ford & I launched SQLCruise, Gareth immediately wanted to do a SQLBBQ at his house, cooking the pre-cruise party, sharing his pool, and introducing everyone to his wonderful family.

To know Gareth was to smile and to share.

Gareth greeted everyone with incredibly wide smiles, and he wanted to know everything about you. If you knew him, you surely witnessed him nodding excitedly and saying, “Yeah?” He wanted to encourage you to go on about whatever you were interested in, and he wanted to develop the same interests too. He wanted to know what made you happy, and it made him happy, too.


I have so many great memories with Gareth. The last time we were hanging out, we were racing a Porsche around in the Nevada desert and eating a crazy dinner with Ryan & Andrea Allred. He said yes to everything, and the joy that he found everywhere was inspiring.

Gareth brought the best parts of childhood into adulthood. He was endlessly curious and excited. From his Lego collection to his love of Disney to his passion for databases, he lived life to its fullest.

Sadly, we lost Gareth last week due to COVID-19.

The data community was so lucky to have him while we did. He was an incredible blessing to those who knew him, and he touched so many of us with his excitement and generosity.

There’s a GoFundMe to help his two children get started on the right foot, and you can watch his memorial service, part 1 and part 2. Gareth’s sister begins speaking at 48 minutes into part 1.

Which Microsoft Certification Should You Get?

You work with data, and you’re thinking about getting a Microsoft certification. Which one should you get?

If you need the certification to get a job…

Then get the certification you see mentioned the most often in the descriptions of the job you’re looking for.

This is just a little tricky because certifications shown on jobs often lag behind by a couple/few years. The HR department may just have copy/pasted the job requirements from the last job ad they posted. That certification may no longer even be offered by Microsoft. (Microsoft has a tendency of reinventing their certification lineup every 2-3 years, throwing the whole thing into the SlapChop for a remix, seemingly just to make their customers spend more money on new certs while learning more about Microsoft’s cloud products.)

I gotta tell you that the certificate absolutely, positively will NOT get you the job. It’s just that when you’re applying for a publicly posted job, you’re competing with a huge, huge stack of wildly unqualified applicants. In order to cut down the stack of resumes, the hiring department will use any kind of filter they can think of, and often, that’s going to be whether someone mentions a Microsoft cert number on their resume.

I don’t agree with that strategy, but I understand why they have to do it.

Long term, though, I’d say that a better strategy for getting a job is to start building your brand online. Start writing blog posts and delivering presentations, and after a couple/few years, you’ll have people coming to you with jobs. You won’t have to worry about standing out in a stack of resumes because you’ll be the only candidate. It takes time, but it’s a much better long term gamble than trying to stand out in a stack.

If you need the certification for your current job…

Sometimes companies require their staff to show that they’re continuously improving, and one way they measure it is by seeing you pass industry tests.

In that case, get whatever your manager or HR department wants you to get.

If they don’t care which one you get, then get the oldest one that’s relevant to your job. That might sound counter-intuitive: you might think you want the latest and greatest cert. The problem is that the new certs will have less training material available. If you’re only trying to check a box for your HR team, an older cert with tons of free YouTube prep videos and cheap PDFs will be easier to pass.

If you want the certification for personal growth…

Then get the newest certification that calls to you personally. It’s going to have less established training material available, but that’s okay: you’re doing this for personal growth anyway.

The best personal growth doesn’t come from reading a book with a bunch of canned test questions and answers. The best growth comes from understanding what the test is trying to measure, and then building your own experience working with that tool. Make a list of the tools and techniques you want to experiment with, and then check out Microsoft’s SQL Workshops repository on Github. It’s full of free self-learning resources for people like you who want to grow their own skills.

No matter why you want the exam,
just take it first.

You have a serious case of impostor syndrome: you believe you’re not qualified to pass the test. Because of that, you’re going to fill yourself with self-doubt as you prep for the exam, and you’re going to think the exam is incredibly challenging. It’s not – it’s just a test. It’s not the final boss of your life’s video game. After all, I bet you’ve met plenty of people who were supposedly certified, but couldn’t solve the most basic problems at work.

Go take it first. You’re going to fail, but you’ll be comfortable with that failure – because you didn’t expect to pass. Then, you’ll have a better feeling of what the test-taking process is like, the level of detail in the questions, and how much work you need to put into studying.

Go get ’em!

Want to Avoid Deployment Downtime? Replication Probably Isn’t the Answer.


You’re tired of taking outages to deploy changes to your database schema.

Wouldn’t it be cool to have two different SQL Servers so that you could:

  • Point your customer traffic at ServerA
  • Apply your deployment scripts to ServerB (which isn’t taking any customer traffic)
  • Point your customer traffic over to the newly deployed ServerB
  • Apply your deployment scripts to ServerA (which isn’t taking any customer traffic)

Presto! All your servers are up to date and in sync, right? Well, no, not in SQL Server at least.

Illustrating the problem with a simple schema change

We’ll use the Posts table in the Stack Overflow database as an example:

The Posts table holds questions & answers. In the dark days of the original design, the Tags column held up to 5 tags to describe a question. In our new design, we’d like to break Tags out to a separate child table, one tag per row.

If we only have one SQL Server, our deployment script looks like this:

  1. Create a new PostsTags table:
    CREATE TABLE dbo.PostsTags (PostId INT, Tag NVARCHAR(40));
  2. Shut down the app
  3. Populate that table (which is going to require more work than I’m prepared to type out here)
  4. Alter the stored procedures that insert/update rows in dbo.Posts so that they now populate the new dbo.PostsTags table
  5. Start the app back up again

Why shut down the app? Because people can still be inserting & updating rows in Posts while steps 3 & 4 are happening. If I don’t shut down the application, I’ll need to do some kind of additional coding steps afterward to detect the dbo.Posts rows that changed while steps 3 & 4 ran, and make the appropriate changes in dbo.PostsTags.

There are absolutely ways around this problem if you have just one SQL Server:

  • Deploy the dbo.PostsTags table early, and modify the code to keep BOTH the dbo.Posts.Tags column and the dbo.PostsTags table in sync with new inserts/updates, or
  • Write triggers to keep both the column and the table in sync, or
  • Write another process (like SSIS, ADF, or your own custom app) to keep them in sync

But none of those are built into SQL Server. They’re all additional development work, not traditionally considered part of database administration.
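For example, the trigger approach from the list above might look roughly like this – a sketch, assuming SQL Server 2016+ (for STRING_SPLIT), the simplified PostsTags definition shown earlier, and that Tags is stored like <sql-server><performance>:

```sql
CREATE OR ALTER TRIGGER dbo.Posts_SyncTags
ON dbo.Posts
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    /* Throw away the old child rows for the posts that changed... */
    DELETE pt
    FROM dbo.PostsTags pt
    INNER JOIN inserted i ON i.Id = pt.PostId;

    /* ...and rebuild them from the current Tags string,
       turning <sql-server><performance> into one row per tag. */
    INSERT dbo.PostsTags (PostId, Tag)
    SELECT i.Id, s.value
    FROM inserted i
    CROSS APPLY STRING_SPLIT(
        REPLACE(REPLACE(REPLACE(i.Tags, N'><', N'|'), N'<', N''), N'>', N''),
        N'|') s
    WHERE i.Tags IS NOT NULL;
END;
```

Even this toy version adds write overhead to every insert & update on Posts – which is exactly the kind of extra development and testing work described above.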

Will replication solve this?

The line of thinking is that if we had TWO SQL Servers, then we could just make these changes to a server while it isn’t facing any user traffic:

  1. Create a new PostsTags table:
    CREATE TABLE dbo.PostsTags (PostId INT, Tag NVARCHAR(40));
  2. Populate that table (which is going to require more work than I’m prepared to type out here)
  3. Alter the stored procedures that insert/update rows in dbo.Posts so that they now populate the new dbo.PostsTags table

All the while that these changes are happening, some other SQL Server somewhere is handling all the application traffic – so we don’t have any data consistency problems, right?

Wrong: that other SQL Server is still getting inserts & updates to the Posts table. Replication only keeps the Posts table in sync: it doesn’t translate changes between the Posts table and PostsTags.

Replication COULD be used to translate changes between two different database schemas – but that’s up to you. It’s not as simple as checking a box inside replication. You have to write your own custom replication stored procedures, and that’s left as an exercise for the reader.

And if you’re willing to do that…you don’t need replication. Remember back in the last section how I said there were absolutely ways around this problem if you only had one SQL Server? Go back to that section – those options are simpler than introducing replication.

Replication also makes high availability way harder.

That sounds backwards for folks who grew up with other database platforms where replication is a valid solution for high availability and disaster recovery, hahaha.

However, SQL Server’s high availability and disaster recovery mechanisms rely on the transaction log. ServerA and ServerB will both have completely different transaction logs – after all, they’re taking different changes at different times. (One gets live customer data, then gets patched – the other is happening in the opposite order.)

Because they have different transaction logs, you have to protect them each individually. If you’re using Always On Availability Groups for HA/DR protection, that means you would have two separate Availability Groups, likely each in their own set of servers. (You can’t use the same servers if you want to keep just one database name: database names have to be unique per server.)

So you’d end up with a convoluted set of replicas for both ServerA and ServerB. You’ve just doubled your server count, patching, and disaster recovery work.

Note that I didn’t say it was impossible.

You could absolutely do all this – and I’ve had one client in the past who actually did. They were even able to reduce their (perceived) outage windows because:

  • Their deployment scripts were extremely disk-intensive: they liked to rewrite entire tables to delete or reorder columns
  • Their storage sucked, so big logged changes took forever
  • They didn’t mind replication getting way out of date (like hours) – as long as users could see any data on the site, they didn’t care if it was accurate or not, so the replication lag during deployments was tolerable
  • They simply didn’t bother with high availability or disaster recovery for the replication systems – they snapshotted entire servers daily, and they were willing to lose a day of data
  • They were willing to massively staff up to write their own custom replication stored procedures – they had a team of 5-6 people who just wrote replication stored procs for deployments (because they were a mature application with hundreds of tables, each of which required their own custom procs when they made schema changes, and Lord knows they loved to make schema changes, but they wanted the replication procs to be extremely well-tested)

You can do all that. Frankly, though, I just wouldn’t bother – use the single-server methods instead, or standardize on only doing additive, not destructive, changes.

Are You Underpaid? Find Out in the Data Professional Salary Survey Results.


We asked what you make, and 1,747 of you in 69 countries answered. Altogether, you made $171,879,034 this year. Hubba hubba, there’s some earning power in this audience.

Download the Data Professional Salary Survey results (XLSX).

A few things to know about it:

  • The data is public domain. The license tab makes it clear that you can use this data for any purpose, and you don’t have to credit or mention anyone.
  • The spreadsheet includes the results for all history since 2017. We’ve gradually asked different questions over time, so if a question wasn’t asked in a year, the answers are populated with Not Asked.
  • The postal code field was totally optional, and may be wildly unreliable. Folks asked to be able to put in small portions of their zip code, like the leading numbers.
  • Frankly, anytime you let human beings enter data directly, the data can be pretty questionable – people put in questionable units of measure for salary, and we discarded a few responses from people who were being obvious, inconsiderate trolls about their gender.

I did some quick slicing and dicing, focusing on SQL Server DBAs in the United States, and deleted the top & bottom 10 outliers (because they were kinda bananas.) Looks like DBA salaries are up again this year:

With the wildly turbulent year we just had, I was wondering if y’all would change your career goals. You said that in 2021, your plans are to:

And that mix almost identically mirrors last year’s responses, so it looks like the pandemic didn’t change your plans too much. 2/3 of you usually plan to stay in the same company doing the same thing.

What’s the newest version of SQL Server you have in production?

The numbers for SQL Server 2019 are way up this year, from 11% last year. That’s great! Now, on the flip side, what’s the oldest version you have in production?

Oof. About 1/3 of y’all are still supporting something that Microsoft refuses to support. You’ve got about another year and a half on SQL Server 2012, but that still means you want to start making your transition plans this year. Interestingly, the numbers for 2008 & 2008R2 didn’t drop all that much: last year, 40% of you were running those.

Hope this helps make your salary discussions with the boss a little bit more data-driven, and hope it helps justify keeping your salary competitive. If you’ve got questions about the data, go ahead and jump in – download the results in Excel. Don’t ask me to slice and dice the numbers for you – you’re a data professional, remember? Get in there and analyze it to answer your own questions. You’ve got this!

Your Ten Favorite Blog Posts from 2020

Company News

The most popular posts this year were when I ran my Fundamentals classes for free throughout the year. I ran a different class per month, and posted signups and videos here on the blog. I’m setting those aside, though, because they’re not evergreen: they’ve since disappeared because I only ran ’em free for a limited period of time.

My favorite new pic by Eric Larsen

Here are the still-present posts I wrote this year that y’all visited the most:

Over the course of the last couple of years, I’ve been writing more “how do I ___?” posts – and to understand why, here are the posts you visited most in 2020 regardless of when we wrote ’em:

“How to” posts are particularly evergreen: as long as they’re about long-lasting pain points, they’re the kinds of things you can write and then gradually get more and more viewers over time. It’s also the key to more subscribers: as readers stumble across your stuff repeatedly in Google, they’re more likely to subscribe to your work because they know you write useful stuff.

[Video] Office Hours: SQL Server Career Q&A, Part 3

Professional Development, Videos

Here’s the last video in my vacation series with the answers to your most-upvoted professional development questions. Enjoy!

I won’t be taking new questions in the comments, but I’m leaving the comments open on this post so that y’all can discuss the video together with each other. Just be mindful that your name will be publicly visible, and if you use your real email address, your Gravatar icon will show up next to your comment. If you have something you want to say anonymously, best to use a throwaway name/email on this one.