Companies call us for performance or high availability issues, but over and over, the very first two things we find are:
- They’re not taking backups to match the business’s RPO (recovery point objective) and RTO (recovery time objective)
- They’re not doing CHECKDB weekly, or at all, and don’t understand why that’s an issue
So let’s walk through a simple scenario and see how you do.
It’s Thursday morning at 11AM, and you get an email: users are reporting corruption errors when they run SELECTs on a critical table. You run the query, and it turns out the clustered index on the table has corruption.
Here’s your maintenance schedule:
- Full backups nightly at 11PM
- Log backups every 15 minutes
- Delete log backups older than 2 days (because you only need point-in-time restore capability for recent points in time, right?)
- CHECKDB weekly on Saturdays at 9AM
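Before answering the questions, it can help to lay that schedule out on a timeline. The following is a minimal sketch, not a definitive answer key. The calendar date is hypothetical (any Thursday works), and it assumes every job runs on time, full backups are kept at least a week (the schedule doesn’t say), and the last CHECKDB came back clean.

```python
from datetime import datetime, timedelta

# Hypothetical calendar date; the scenario only fixes "Thursday, 11AM".
NOW = datetime(2023, 12, 28, 11, 0)
LOG_RETENTION = timedelta(days=2)      # from the schedule
FULL_BACKUP_HOUR = 23                  # nightly at 11PM
CHECKDB_WEEKDAY, CHECKDB_HOUR = 5, 9   # Saturday (Monday == 0) at 9AM


def last_checkdb(now):
    """Most recent scheduled CHECKDB start at or before `now`."""
    t = now.replace(hour=CHECKDB_HOUR, minute=0, second=0, microsecond=0)
    while t.weekday() != CHECKDB_WEEKDAY or t > now:
        t -= timedelta(days=1)
    return t


def fulls_between(start, end):
    """Nightly full backup times that fall within [start, end]."""
    t = start.replace(hour=FULL_BACKUP_HOUR, minute=0, second=0, microsecond=0)
    if t < start:
        t += timedelta(days=1)
    out = []
    while t <= end:
        out.append(t)
        t += timedelta(days=1)
    return out


checkdb = last_checkdb(NOW)

# Last full taken before the CHECKDB run (assumed clean, not proven clean).
full_before_checkdb = fulls_between(checkdb - timedelta(days=1), checkdb)[-1]

# Log backups older than this have already been deleted.
oldest_log = NOW - LOG_RETENTION

# Fulls that still have every log backup between them and now.
candidates = fulls_between(full_before_checkdb, NOW)
with_log_chain = [f for f in candidates if f >= oldest_log]

# Worst case: every later full is corrupt, so you fall back to the
# pre-CHECKDB full with no logs to roll forward.
worst_case_loss = NOW - full_before_checkdb

print("Last CHECKDB:", checkdb)
print("Oldest surviving log backup:", oldest_log)
print("Fulls with an unbroken log chain:", with_log_chain)
print("Worst-case data loss:", worst_case_loss)
```

Under those assumptions, only Tuesday’s and Wednesday’s full backups still have a complete log chain, and neither was taken before the last CHECKDB. Falling all the way back to Friday night’s full would mean losing 132 hours of data.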
You can’t repair the corruption (it’s a clustered index, and there aren’t enough nonclustered indexes to cover all the columns), and the business needs that data back. You’re up. Answer these questions:
- What backups do you restore, in order?
- Will they be free of corruption?
- How much data will you have lost?
- How long will that process take?
- Given that, what’s your effective RPO and RTO?
- If the business said that wasn’t good enough, what specific steps could you take to improve those numbers without spending money?
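One possible line of reasoning for that last question, sketched as arithmetic: how long would log backups need to be kept so they always reach back to a full backup taken before the last CHECKDB? The function name and the simplifications here are assumptions for illustration: CHECKDB is treated as instantaneous, and corruption is assumed to be noticed no later than the next CHECKDB run.

```python
from datetime import timedelta


def min_log_retention(checkdb_interval, full_to_checkdb_gap):
    """Log retention needed to reach the last full taken before a CHECKDB.

    Worst case is the moment just before the next CHECKDB: the logs must
    span one whole CHECKDB interval plus the gap between the preceding
    full backup and the CHECKDB that followed it.
    """
    return checkdb_interval + full_to_checkdb_gap


# 11PM full backup followed by a 9AM CHECKDB is a 10-hour gap.
gap = timedelta(hours=10)

weekly = min_log_retention(timedelta(days=7), gap)
daily = min_log_retention(timedelta(days=1), gap)

print("Weekly CHECKDB needs logs kept for:", weekly)
print("Daily CHECKDB needs logs kept for:", daily)
```

With the schedule above, the two-day log retention falls well short of what a weekly CHECKDB needs, but would cover a daily one. Keeping logs longer costs storage space, while running CHECKDB more often costs maintenance window time, so which lever fits depends on the server.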
This week, while folks are working at low speed due to the holidays, double-check those backups and corruption check jobs.