Let’s Corrupt a Database Together, Part 2: Nonclustered Indexes

In part 1, I showed you how to corrupt a database – not because you’d need to do it, or because SQL Server has some kind of vulnerability. I just wanted to show you how SQL Server reacts when it finds corruption in a data file, why you should check your databases for CHECKSUM settings, and set up alerts so you can react fast.

However, in part 1, I purposely kept the data simple to teach you the basics first.

This time around, I want to teach you the concepts of what happens when you have multiple copies of the data. Let’s make our table a little more complex, and have both a clustered index and a nonclustered index:

Here’s our data:

Louis Davidson would be so proud

As we did in the first post, let’s fire up XVI32 and edit our mdf file. This time, though, when you do a search for Stan, you’ll find it in two different places in the data file – once for the clustered index, and once for the nonclustered index (which only has Guy and ID on its 8K pages.) For more details on how those are stored, check out my free How to Think Like the Engine class.

Clustered index (note that it has both the way and the guy)
Nonclustered index (which only has the guy, not the way)

Let’s do something a little tricky: let’s edit the nonclustered index (the latter screenshot) and change Stan to Flan, then save the file and bring the database online again. What’s our data look like?

Stan and Flan

His name is either Stan or Flan – depending on where you get his name from, the clustered index or the nonclustered index! SQL Server doesn’t catch this unless you have checksums turned on. (Again, really important: if you inherited somebody else’s server, or if a vendor is creating databases for you, you need to run sp_Blitz and make sure checksums are on for all of your databases. Just because you’re running current versions of SQL Server doesn’t mean you changed database settings along the way as you upgraded. Trust me, I’ve seen your databases. They’re heinous. Also, you need to floss.)

If you query msdb.dbo.suspect_pages, the kinda-sorta-system-table that tracks pages where SQL Server suspects corruption, it comes back empty:

SQL Server doesn’t suspect anything

SQL Server can’t know that there’s corruption if there’s no checksums. We’re flying blind here.

To catch this problem we have to run CHECKDB.

I’m trying to get better at calling it CHECKDB instead of just saying DBCC, because DBCC has other useful commands like SHRINKFILE and DROPCLEANBUFFERS and INPUTBUFFER and … wait, none of those are really all that useful. Anyway, here’s CHECKDB:

Whew – CHECKDB caught that.

CHECKDB has a heck of a lot of work to do: it has to read every page and identify differences between objects that are supposed to be the same. In this case, it basically joined the clustered index and the nonclustered index together, detected that not every field matched, and told us exactly what 8K pages had problems.

More data, more pages, more indexes = more time.

If you read CHECKDB’s output in that screenshot carefully, you’ll see that it says “index ID 2” – which means our nonclustered index. (Index #0 is a heap, and #1 is a clustered rowstore index.)

Oh, and by the way – even though DBCC CHECKDB realized that something is wrong with the page, and our index rebuild failed, the suspect_pages table is still empty:

Still no suspects

Muhahaha. There’s a very specific list of times when SQL Server will populate the suspect_pages table, like errors 823 or 824, but this particular issue isn’t in the list.

How do you fix this corruption?

In the real world – especially after you read this post – you’ll have checksums on all of your pages, which makes this kind of thing really simple:

  • The clustered index says his name is Stan, and the page has a correct checksum
  • The nonclustered index says he’s Flan, but the page has an incorrect checksum

By reading CHECKDB’s output, looking at index ID’s, and looking at page numbers, I can figure out whether I need to just drop & recreate the nonclustered index, or whether I have more widespread problems with clustered indexes too.

In this case, I can’t just rebuild it to make the corruption go away, because that fails:

Put the beer down

Instead, I have to actually recreate the index by dropping it and recreating it.

This was an artificially simple scenario (and I’m still coming up on a thousand words) because I need you to understand that:

  • Clustered and nonclustered indexes get corrupted independently
  • The same field can have different contents in different places
  • CHECKDB won’t do all of the work for you, and you have to do some manual work to recover from corruption even in the simplest cases

I haven’t even covered:

  • How to parse CHECKDB’s output at scale (multi-terabyte databases, thousands of corrupt pages)
  • How to recover from single-instance-storage failures (where a field’s only existence is corrupt, like when there’s not an index copy)
  • How to do page-level restores

What you need to do next:

I said it in the first post in the series, but I need to drive it home again: enable checksums. No foolin’, this is important stuff. It doesn’t stop corruption, but rather just lets you detect it faster.

You might go your whole life and never experience the sheer terror of corruption. However, if you wanna be more prepared – or if you’re like me, and you just find these kinds of stunts fascinating – check out Steve Stedman’s Corruption Challenge. He gives you ten corrupt databases, and you have to figure out how to recover with as little data loss as possible.

In the next part in the series, you’ll learn what SQL Server operations do – and don’t – detect corruption.

Previous Post
[Video] Office Hours 2017/02/22 (With Transcriptions)
Next Post
Vote Now to Pick the #GroupBy April Lineup

18 Comments. Leave new

  • Hi Brent,

    You should be able to disable the index and then rebuild it. Prefer that to delete and recreate.

    Reply
  • SQL Server Performance Tune
    March 1, 2017 4:49 am

    “In this case, it basically joined the clustered index and the nonclustered index together, detected that not every field matched, and told us exactly what 8K pages had problems.”

    Did it, really?

    “Keys out of order on page (1:264), slots 2 and 3.”

    I read this like the value in slot 2 (third row, Lee) is greater than the value in slot 3 (fourth row, Flan).
    This is a consistency problem within the index itself. There is no pairing to the clustered data here as far as I can see.

    Reply
  • Innocent Gumbo
    March 3, 2017 1:45 am

    Thanks Brent. I did try this and also using my own way. Honestly this gave me a hint why I was failing to solve a sticking issue in my life. I am beginning to open Database files and try doing all sorts of things. I will see.

    Thanks, this was very good information especially when it comes to Clustered and Non-clustered index relationships.
    Great article

    Reply
  • Good to see more people publicizing corruption and recovery techniques. However, I’m confused how this example illustrates your point: “CHECKDB won’t do all of the work for you, and you have to do some manual work to recover from corruption even in the simplest cases”. Running repair will fix this corruption – no effort required (apart from having the database essentially offline). Thanks

    Reply
  • Unable to repair the corrupted database. How to fix it?

    Reply
  • Michael MacGregor
    July 6, 2018 1:01 pm

    Where’s Part 3 and any other parts? There’s no link to the next part at the end of this one.

    Reply
  • OMG.. this is a funny discussion. haha

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.