Writing Diagnostic Queries is Hard Because SQL Server Still Has (Tiny) Bugs.

Last Updated November 19, 2019

When I first got started working with SQL Server, I assumed it was flawless because it came in a sealed box from a ginormous corporation. I figured they’d tested every edge case, and that there was no way things were gonna go sideways.

These days, even though I read CU KB articles and giggle about their contents over at SQLServerUpdates.com, I’m still really happy with the quality of SQL Server. There are bugs, sure, but there are a lot of people doing really crazy stuff with databases and expecting it to work – even when there’s no way someone could possibly have thought to test that particular combination of craziness.

So when I run into a problem, I start by assuming it’s a problem with my code, because let’s be honest: it usually is. Almost always. 99.999% of the time, probably. (Hey, whaddya know, I’m achieving five nines.)

But every now and then, it’s actually SQL Server.

And when it is, I try to boil the problem down to the simplest possible code that anyone can walk through in a matter of seconds on their own machine in order to see the problem. I want someone at Microsoft to be able to highlight the code, copy it, paste it in, and go, “Oh whoa, wait a minute, he’s right.”

Here are two examples.

sys.dm_db_persisted_sku_features ignores isolation level requests.

This one bit me in sp_Blitz: I had a client that was rebuilding large columnstore indexes, and whenever they did, their job to log sp_Blitz to table would get blocked, and cause cascading blocking issues in other sessions. At first I thought c’mon, no way – I am really, really religious about using dirty reads in our procs because this ain’t financial data, and I’m fine with inaccurate results here under heavy contention. But blocking? That’s bad.

Here’s how it happens. In one session, create a clustered columnstore index:

CREATE TABLE dbo.Test (Id INT);
GO
BEGIN TRAN
CREATE CLUSTERED COLUMNSTORE INDEX CCI ON dbo.Test;
GO

CREATE TABLE dbo.Test (Id INT);

BEGIN TRAN

CREATE CLUSTERED COLUMNSTORE INDEX CCI ON dbo.Test;

Then in another session, in the same database, try to query that DMV:

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT * FROM sys.dm_db_persisted_sku_features WITH (NOLOCK);

1 2	SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; SELECT * FROM sys.dm_db_persisted_sku_features WITH (NOLOCK);

I know, those two isolation level hints are redundant – I’m just proving the point that it’s really a bug. It gets blocked, which shouldn’t happen. Other DMVs don’t work this way.

Once I had it boiled down to that simple repro script, I filed a bug report with Microsoft, and added code in sp_BlitzIndex to simply skip that DMV by default. (You can change that by setting @SkipBlockingChecks = 0.) I wasn’t sure how quickly Microsoft would fix the bug, or how far back they’d backport the fix.

It’s been a month now without a word from Microsoft on the bug report, so I don’t hold out a lot of hope for that one. (It doesn’t help that I accidentally flagged the idea as inappropriate just now, and I have no way to unflag it, HAHAHA. Man, I *really* hate the user interface for feedback.azure.com.)

sys.identity_columns has the same problem.

Similar repro:

CREATE TABLE dbo.Test (Id INT IDENTITY(1,1), CONSTRAINT PK_Id PRIMARY KEY CLUSTERED (Id));
GO
BEGIN TRAN
ALTER INDEX PK_Id ON dbo.Test REBUILD WITH (ONLINE = OFF);
GO

CREATE TABLE dbo.Test (Id INT IDENTITY(1,1), CONSTRAINT PK_Id PRIMARY KEY CLUSTERED (Id));

BEGIN TRAN

ALTER INDEX PK_Id ON dbo.Test REBUILD WITH (ONLINE = OFF);

Then in another window:

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT * FROM sys.identity_columns WITH (NOLOCK);

1 2	SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; SELECT * FROM sys.identity_columns WITH (NOLOCK);

That gets blocked, and I filed a bug report for that one too, and a Github issue for sp_BlitzIndex that I’ll work on at some point, but that one’s less urgent since you can’t (currently) set up Agent jobs to log sp_BlitzIndex to table. This one’s going to be a lot more challenging though, as you can read about in the Github issue – the workaround’s going to have to be a lot more ambitious, I think.

Are these big bugs? Not at all – but if you’re writing diagnostic queries like sp_Blitz and friends, they can turn into hours of troubleshooting trying to figure out what’s going wrong – especially given how widespread these scripts are these days.

Announcing Weekend Versions of My Mastering Classes

How to Think Like the Engine: Index Column Order Matters a LOT.

15 Comments. Leave new

Gordon Peters
November 19, 2019 9:14 am

You think a database snapshot doesn’t change? Guess what?

When a table has a rowversion column, the updated rowversion values bleed through into the snapshot database.

Reply
Uche
November 19, 2019 9:59 am

These DDL statements are wrapped in a transaction and needs to be committed or rolled back for SQL to know the exact values to return from the DMV’s. DMV results are dependent on DDL statements. i don’t see this as a bug. However I might be wrong

Reply
- Brent Ozar
  November 19, 2019 10:01 am
  
  Uche – the rest of the DMVs allow for dirty reads even when indexes are being created. For example, try these demos with sys.indexes, and you’ll notice that it allows for dirty reads. The whole point of dirty reads is to not need exact values to return.
  
  I wouldn’t have a problem if none of the DMVs allowed for dirty reads – it’s just that the vast majority do (and that’s how monitoring systems work.) Only a few DMVs, like these two, don’t. That’s why it’s a bug.
  
  And you’re wrong. 😉
  
  Reply
  - Uche
    November 19, 2019 10:08 am
    
    Makes sense. Thanks
    
    Reply
Bill
November 19, 2019 10:15 am

A business impact statement helps Microsoft prioritize fixes, as does a Premier account’s technical account manager. Otherwise they must guess at the BI…

Reply
gabriele d'onufrio
November 19, 2019 11:58 pm

Can I buy privately my “Mastering Query Tuning” although I live in europe?

Reply
- Brent Ozar
  November 20, 2019 3:16 am
  
  No: https://www.brentozar.com/go/gdpr
  
  Reply
  - Jon
    November 20, 2019 7:42 am
    
    Wow, that post was from 2 years ago and things still haven’t panned out. I haven’t been follow the fall out from the GDPR. It would be interesting to know how that legislation has been enforced so far.
    
    I wonder if people can get around it by using bit coin and signing up anonymously. This type of regulation comes from the idea that “All businesses are bad and are out to screw the worker. And all governments are good and are out to help the people.” mentality. Of course, in reality everything is gray.
    
    Reply
    - Brent Ozar
      November 20, 2019 7:43 am
      
      Jon – we don’t accept Bitcoin nor do we allow anonymous signups, so that’s a no-go.
      
      Reply
Michael J Swart
November 20, 2019 5:52 am

One bit of this post I’m going to paraphrase and steal the next time someone complains about one of our own escaped defects:
“Hey, didn’t you guys test this stuff?”
“We did, but there’s no way someone could possibly have thought to test that particular combination of craziness.”

Reply
Becky Harter
November 20, 2019 12:37 pm

By accidentally flagging as inappropriate you might get someone to review it more quickly 😉

Reply
- Brent Ozar
  November 20, 2019 1:00 pm
  
  Yeah, that’s what I thought a month ago when I wrote the post…
  
  Reply
  - Becky Harter
    November 20, 2019 1:23 pm
    
    Oh, yeah… guess not 🙁
    
    Reply
Marcia Wilson
November 23, 2019 9:09 am

Loved your opening sentence.

Reply
- Brent Ozar
  November 23, 2019 9:24 am
  
  Hahaha, thanks!
  
  Reply