When the DBA reports to you, ask yourself two questions:
- “What would I fire the DBA for?”
- “How can I prove that they’re not doing that?”
There’s a lot of possible answers to #1, many of which involve cocaine and the office copier, but let’s focus on one I hear a lot: an inability to restore data that they claimed was backed up.
Backups are the foundation of our Hierarchy of Database Needs. They’re incredibly important, but yet they often get ignored because nobody’s banging on the DBA’s cubicle telling them to focus on backups. Instead, users are screaming about performance or too-tight security.
It’s up to you, their manager, to enable them to get the basics down. Here’s how to do it.
1. Ask for a report of most recent backups, by database.
Call in the DBA and give them a list of columns you want in the report:
- Server name
- Database name
- Database size
- Purpose (production, QA, development)
- Last full backup finish date/time
- Last log backup finish date/time
- Last successful DBCC CHECKDB finish date/time
- Last date/time the backups were restored for testing
They can produce the report in any format that works for them, but they have to be able to produce it on demand with less than five minutes of manual work. (Ideally, they’d give you a solution you can run on your own, without telling them, but let’s start with something easy.)
Tell them this first task is about getting the lay of the land first, not fixing the problems. Oh, they’re going to find problems alright – but focus on just getting a clear picture of the work ahead first.
This task may take them a day or two because they have to assemble an exact list of the servers they’re responsible for, plus build the report process. (The good DBAs reading this are thinking, “I could do this in fifteen minutes!” If you’re just saying that now, that means you haven’t done it yet, and you’re not quite as good as you think. The great DBAs are thinking, “Yeah, that’s about how long it took me to get it right the first time.”)
2. Help the DBA come up with an improvement plan.
They’re going to be ashamed/disappointed by some of these numbers. (Or if they’re not, there’s your answer about their fit for the DBA position.)
Let them build a back-of-the-napkin mini-project plan to improve the metrics for your most important server. Give them that time – typically there’s a day or two of work involved – and after they finish, review what worked about the mini-project and what didn’t. Use that knowledge to build out a plan for the rest of the servers.
Your job is to run interference for them while they get the backups and DBCCs under control. When the DBA gets “urgent” help requests from end users, here’s what I coach them to say:
“Right now, I’m focused on a project to get our backups under control. If this issue is more important than getting our production data backed up, let’s go into my manager’s office together and talk through it. If not, I hate to say this, but I’ll need you to file a help desk ticket and I’ll get to it once we’ve got the backups under control.”
Yes, this means admitting that the backups aren’t under control. It’s the only way end users will accept a “no” from the DBA.
After this project finishes, run the report again, talk through the results, and then take the results to the business users who are storing data on these servers.
3. Ask the business if this amount of data loss is acceptable.
Take the backup numbers to the business along with our free RPO/RTO Worksheet. By default, the business usually wants zero data loss and zero downtime, but as good IT managers, we have to communicate the realities of today’s databases.
Armed with the costs and timelines in that worksheet, bring written RPO goals back to the DBA. (Yes, to be a good manager takes work, and these are political problems, not technical problems – save your DBA for the technical problems.) Sometimes, this will mean bringing budget numbers back as well – if the business wants tighter RPO/RTO goals than the current infrastructure provides, the DBA will need resources to build the infrastructure improvements.
4. Measure the gap between IT’s RPO goals and the DBA’s delivery.
Remember that report the DBA built? Add two columns:
- IT’s goal for this database’s data loss (in minutes/hours) – you and the DBA come to an agreement on a realistic goal. In underfunded shops, this is sadly less ambitious than the business’s goals. (Example: “The business wants us to lose no more than 1 minute of data, but I understand that they’re not giving you space for log backups, so let’s aim for losing no more than 1 hour of data.”) In well-funded shops, the IT goal may actually be more ambitious than the business’s goal.
- The gap between the DBA’s delivery and IT’s goal – if the last successful backup was 45 minutes ago, but we’re aiming for <30 minutes of data loss, we’re in the red by 15 minutes. This metric is how we measure DBAs.
As long as this gap is in the black, the DBA is doing a good job on backups. (Note: I’m not even talking about restore tests or DBCCs here – this is just what I consider table stakes on managing a DBA.)
When this number goes into the red, the DBA should stop focusing on anything else – anything – and focus exclusively on getting the data protected again.
For bonus points, trend the total server count, database count, and total data size over time. This helps your user base understand that the infrastructure will come under more pressure as the data size grows – you can only cram so much data into a server before performance issues arise.