Storage Corruption

ball-of-flames-fire-shutterstock_161205068When SQL Server writes data, it just assumes that the data will always be there when we need to query it.  Unfortunately, storage corruption happens without SQL Server knowing: a drive will go bad, the SAN will write garbage, or goblins will throw a party in your solid state drive.  SQL Server only finds out about the bad news when it needs to make sense of the data.

(Spoiler alert: doing a backup isn’t making sense of the data, either – by default, SQL Server just reads the storage and dumps it out to backup without trying to parse stuff on each page.)

Suspect Databases Found

If you hit this check, it’s too late – time to look at repairing the corruption. Start by opening a support case with Microsoft, because you’re in dangerous territory here. A $500 support case with Microsoft will be the cheapest way to get to the bottom of the problem – consultants can cost tens of thousands of dollars in scenarios like this.

SQL Server Tracks Suspect Pages in MSDB

When SQL Server reads a page to satisfy a query, and it detects corruption on that page, it logs the problems in msdb.dbo.suspect_pages.  One part of sp_Blitz® queries that table looking for any records whatsoever, and logs an alert.  You can do a simple SELECT * FROM msdb.dbo.suspect_pages to get more details about the corruption.

Corruption Can Be Repaired by Mirroring and AlwaysOn

AlwaysOn Availability Groups and Database Mirroring both have a really cool feature: they’ll automatically recover corrupt pages using the mirrors and replicas.  If a page is corrupt on one of the servers, it’ll contact the mirroring partner, get a clean copy of the page, and repair the bad one.  This is possible because storage corruption usually happens inside the data file storage (well, duh) and the storage isn’t what’s being replicated from one place to another.

We check sys.dm_db_mirroring_auto_page_repair and sys.dm_hadr_auto_page_repair looking for events from the last 30 days.  For more information, query the DMV sys.dm_db_mirroring_auto_page_repair and sys.dm_hadr_auto_page_repair, plus check out the Books Online topic on automatic page repair.

To Fix the Problem

If you’ve been experiencing corruption, it’s time to bust out DBCC CHECKDB to check the entire database – or even better, all of the databases on that server.  SQL Server only repairs the pages that it needed to read, but it may not have had to read every page.  Better catch corruption fast before you lose data!

After you’ve fixed it for good, you can delete the rows in msdb.dbo.suspect_pages. (Isn’t that weird? Rare that a system table actually allows you to do that.)