Microsoft has been resting on Paul Randal’s laurels for far too long.
From 1999 to 2007, Paul poured his heart and soul into rewriting SQL Server’s code to check for and repair database corruption. (For more about his illustrious career, read his bio and enjoy the infectious enthusiasm in his bio photo.)
Paul did great work – his baby has lived on for over a decade, and it’s an extremely rare cumulative update that fixes a bug in CHECKDB. I’d like to think it’s not because nobody’s looking, but because he wrote good, solid code that got the job done.
But Microsoft is coasting.
This is a $30 RAID controller.
Meet the LSI MegaRAID SAS 9285CV-8e, one of the most junior RAID controllers you can buy for a server. When he’s bored, he has a couple of homework tasks he likes to perform: Patrol Read and Consistency Check. Between these two, he’s checking all of the drives in the array to make sure they match each other, and that they can successfully read and write data.
This helps catch storage failures earlier with less data loss.
You don’t have to configure this or set up a schedule – he just knows to do it because that’s what he does. It’s his job. You trusted him with your data, so every now and then, he does his homework.
SQL Server needs to do that.
Some of the pieces are there – for example, SQL Server already has the ability to watch for idle CPU times and run Agent jobs when it’s bored. For starters, that’d probably be good enough to save a lot of small businesses from heartache. For the databases over, say, 100GB, it’d be really awesome to have resumable physical_only corruption checking – tracking which pages have been checked (just like how the differential bitmap tracks page changes), with page activity reset when the page is changed (again, just like the differential bitmap.) This wouldn’t count the same as a real CHECKDB, which needs to do things like compare index contents – but holy mackerel, it’d be better than what we have now.
Because I’m just so tired of seeing corruption problems, and we can’t expect admins to know how this stuff works. I know, dear reader, you think admins should know how to set up and run corruption checking because it’s just so doggone important, you say.
But if it’s so important…
Why isn’t SQL Server doing it in the background automatically like $30 RAID cards have been doing for decades?