Writing Diagnostic Queries is Hard Because SQL Server Still Has (Tiny) Bugs.


When I first got started working with SQL Server, I assumed it was flawless because it came in a sealed box from a ginormous corporation. I figured they’d tested every edge case, and that there was no way things were gonna go sideways.

These days, even though I read CU KB articles and giggle about their contents over at SQLServerUpdates.com, I’m still really happy with the quality of SQL Server. There are bugs, sure, but there are a lot of people doing really crazy stuff with databases and expecting it to work – even when there’s no way someone could possibly have thought to test that particular combination of craziness.

So when I run into a problem, I start by assuming it’s a problem with my code, because let’s be honest: it usually is. Almost always. 99.999% of the time, probably. (Hey, whaddya know, I’m achieving five nines.)

But every now and then, it’s actually SQL Server.

And when it is, I try to boil the problem down to the simplest possible code that anyone can walk through in a matter of seconds on their own machine in order to see the problem. I want someone at Microsoft to be able to highlight the code, copy it, paste it in, and go, “Oh whoa, wait a minute, he’s right.”

Here are two examples.

sys.dm_db_persisted_sku_features ignores isolation level requests.

This one bit me in sp_Blitz: I had a client that was rebuilding large columnstore indexes, and whenever they did, their job to log sp_Blitz to table would get blocked, and cause cascading blocking issues in other sessions. At first I thought c’mon, no way – I am really, really religious about using dirty reads in our procs because this ain’t financial data, and I’m fine with inaccurate results here under heavy contention. But blocking? That’s bad.

Here’s how it happens. In one session, create a clustered columnstore index:

Then in another session, in the same database, try to query that DMV:

I know, those two isolation level hints are redundant – I’m just proving the point that it’s really a bug. It gets blocked, which shouldn’t happen. Other DMVs don’t work this way.

Once I had it boiled down to that simple repro script, I filed a bug report with Microsoft, and added code in sp_BlitzIndex to simply skip that DMV by default. (You can change that by setting @SkipBlockingChecks = 0.) I wasn’t sure how quickly Microsoft would fix the bug, or how far back they’d backport the fix.

It’s been a month now without a word from Microsoft on the bug report, so I don’t hold out a lot of hope for that one. (It doesn’t help that I accidentally flagged the idea as inappropriate just now, and I have no way to unflag it, HAHAHA. Man, I *really* hate the user interface for feedback.azure.com.)

sys.identity_columns has the same problem.

Similar repro:

Then in another window:

That gets blocked, and I filed a bug report for that one too, and a Github issue for sp_BlitzIndex that I’ll work on at some point, but that one’s less urgent since you can’t (currently) set up Agent jobs to log sp_BlitzIndex to table. This one’s going to be a lot more challenging though, as you can read about in the Github issue – the workaround’s going to have to be a lot more ambitious, I think.

Are these big bugs? Not at all – but if you’re writing diagnostic queries like sp_Blitz and friends, they can turn into hours of troubleshooting trying to figure out what’s going wrong – especially given how widespread these scripts are these days.

Previous Post
Announcing Weekend Versions of My Mastering Classes
Next Post
How to Think Like the Engine: Index Column Order Matters a LOT.

15 Comments. Leave new

  • You think a database snapshot doesn’t change? Guess what?

    When a table has a rowversion column, the updated rowversion values bleed through into the snapshot database.

  • These DDL statements are wrapped in a transaction and needs to be committed or rolled back for SQL to know the exact values to return from the DMV’s. DMV results are dependent on DDL statements. i don’t see this as a bug. However I might be wrong

    • Uche – the rest of the DMVs allow for dirty reads even when indexes are being created. For example, try these demos with sys.indexes, and you’ll notice that it allows for dirty reads. The whole point of dirty reads is to not need exact values to return.

      I wouldn’t have a problem if none of the DMVs allowed for dirty reads – it’s just that the vast majority do (and that’s how monitoring systems work.) Only a few DMVs, like these two, don’t. That’s why it’s a bug.

      And you’re wrong. 😉

  • A business impact statement helps Microsoft prioritize fixes, as does a Premier account’s technical account manager. Otherwise they must guess at the BI…

  • gabriele d'onufrio
    November 19, 2019 11:58 pm

    Can I buy privately my “Mastering Query Tuning” although I live in europe?

    • Reply
      • Wow, that post was from 2 years ago and things still haven’t panned out. I haven’t been follow the fall out from the GDPR. It would be interesting to know how that legislation has been enforced so far.

        I wonder if people can get around it by using bit coin and signing up anonymously. This type of regulation comes from the idea that “All businesses are bad and are out to screw the worker. And all governments are good and are out to help the people.” mentality. Of course, in reality everything is gray.

  • Michael J Swart
    November 20, 2019 5:52 am

    One bit of this post I’m going to paraphrase and steal the next time someone complains about one of our own escaped defects:
    “Hey, didn’t you guys test this stuff?”
    “We did, but there’s no way someone could possibly have thought to test that particular combination of craziness.”

  • By accidentally flagging as inappropriate you might get someone to review it more quickly 😉

  • Loved your opening sentence.


Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.