DBA Training Plan 3: Treat Your Servers Like Cattle, Not Like Pets

In our last post in this series, I talked about why you should be using Ola Hallengren’s maintenance script for backups and sp_DatabaseRestore for faster restores. These two tools have implications for how you build, back up, recover, and fail over your SQL Server.

There’s a saying in the cloud: treat your servers like cattle, not like pets.

What this means for you, in graduating level of difficulty through your upcoming career:

1. Server setup isn’t done until you’ve tested and timed a point-in-time rebuild & restore. When you’re done setting up the full and log backup jobs on the new server, but before you’ve gone live, restore your real production databases. Run the full and log backup jobs, producing a few sets of backup files in their soon-to-be-normal location.

Now here comes the fun part: shut the server down, and start the clock. Your mission is to bring all of the databases online, with their databases up to a specific point in time – say, 5 minutes ago. Do it, and time how long it takes you. No, you’re not allowed to turn that original server back on – he’s broken. (Say for example that he’s having problems with a Windows patch, and the business is tired of waiting for the sysadmins to figure it out – they need you to start Plan B now.)

You’ll learn a lot as you step through this. It feels overwhelming at first, because you’ve probably been building your servers manually, stepping through the SQL Server install GUI manually, setting up things like Cost Threshold for Parallelism, your favorite trace flags, and SQL Agent jobs. That was fine when you weren’t a Database Administrator – but now that you’re moving towards becoming a full time DBA, you need to think about a better long term plan. That means…

2. Servers should be built from well-documented checklists. As you go through phase 1, you’ll realize that you really need to be writing this stuff down as you go. Documentation is a gift to your future self – not just in the sense that it makes your job easier, but it lets you pass work off to others when you need help. During an emergency, when you’ve lost 19 SQL Servers at once (hoo boy, August 2007), you’re able to hand different checklists off to different coworkers and say, “You, follow these checklists and tell me when you’re done.”

This affects your daily database management, too: I’ve seen DBAs apply a trace flag or sp_configure setting during an emergency in order to fix something, but then not document what they just did. When that SQL Server goes down hard, and they build a new one, they simply don’t remember every setting they made to the old server along the way. When the new server comes up online, and has the same performance issues they had a long time ago that required the unique setting, they have to go through the same old troubleshooting process. Bottom line: don’t make undocumented changes to servers – leave a bread crumb trail behind in your server’s documentation or build checklist instead.

I’ve heard folks say, “It’s no big deal: I can just copy the settings from the old server to the new one.” Not when the main server goes down hard, you can’t, nor when you’re dealing with a disaster recovery situation because your primary data center or availability zone is down.

3. Long term, servers should be built and user databases restored with automation. As you start to manage dozens of database servers, checklists don’t scale. You need a way to turn manual steps into automated actions so that you’re not sitting around waiting for the next progress bar to move to the end. However, this is where my advice starts to get a little fuzzy: the SQL Server product has sprawled out to a huge coverage area over the last decade (SSAS, SSIS, SSRS, clustering, Availability Groups, Linux, and now Kubernetes) and there isn’t a single desired-state installation & configuration tool that covers the whole surface area.

If you want to get to automated builds & restores, the closest thing right now is DBAtools & PowerShell, but just be aware that while we call this “automation,” it’s more about automation tooling than just something you download and click Go. Think of it as a box of tools, and you still have to build out an automation solution that works well for you. Thankfully, the DBAtools community is welcoming & hard-working, and they’re constantly making the tools better.

4. Even longer term, if it’s automated, it’s testable. For decades, serious database administrators have automated their own fire drill testing. They’ve built standalone SQL Servers that restore their production backups, run CHECKDB against them, and then send reports of success or failure to the DBA. This gives you a canary-in-the-coal-mine approach to detecting backup problems early.

Your Homework

Database Reliability Engineering – good buy

This post is a little shock-and-awe.

But it’s also a fork in the road: do you want to specialize in being a production database administrator (who makes sure the SQL Server is online and the database is accessible), or do you want to specialize in database development & performance tuning, making sure the queries run quickly?

These are really two different careers: people who tell you they do both are lying, because they suck at one of them (or maybe even both of ’em.)

If you want to specialize in production database administration, your future learning path focuses on things like PowerShell, scripting builds, monitoring for server configuration changes, applying patches in a controlled and recoverable manner, and engineering reliability into your infrastructure. If you think this is the route you want to take, pick up the book Database Reliability Engineering to get a glimpse of what your next ten years will look like. I don’t think there’s ever been a more exciting time to take that leap.

If you want to specialize in performance tuning, we’ll be talking more about that over the coming weeks as well – stay tuned.

If you’re not sure yet, that’s okay. You can always come back to this fork again later. For several years, I chose to focus on production database administration because I really liked building servers and troubleshooting ’em. Then back around 2015, I came to the fork again and decided to give up on that line of work and focus on performance tuning instead. I love the feeling of making a user’s query faster, or even better, making an entire workload go faster. (Plus I was sick and tired of getting called after hours for emergencies!)

In the next post, we’ll talk about the next job duty for production DBAs: checking for corruption.

Previous Post
DBA Training Plan 2: Backups (And More Importantly, Restores)
Next Post
DBA Training Plan 4: Checking for Corruption

12 Comments. Leave new

  • Jerry Manioci
    July 22, 2019 9:26 am

    Ola’s scripts for backups are nice. However, they can fail silently if things get mis-configured. Recommend monitoring / alerting for evidence that backups have succeeded, like an alert that looks for timestamps in msdb for each backup type.

    Reply
    • This should be a standard alert anyway imo. we have on log and fulls.

      Reply
    • Commvault is a god send for that. It takes good SQL backups, stores history in a SQL table with a very nice selection of pre-constructed reports and you can tune your alerts very well. Success alerts I think are even worse have no alerts. They tend to bury all the bad alerts in even small sites and provide a false sense of security where no spot checking EVER gets done, or even if you do have someone faithfully looking through thousands of emails per day, can bury a problem that may be 700 emails down while you lose valuable time reading “XXXXXX successful…” 699 times

      Reply
  • You are so accurate with your statement “These are really two different careers: people who tell you they do both are lying, because they suck at one of them (or maybe even both of ’em.)”. I am one of those people and I will admit I do suck at both of them really. When you do both jobs, it is hard to get proficient at either of them. I read a lot of articles, learn what I can, and make mistakes along the way. Humor is what gets me through some days. I enjoy the articles. Keep them coming!

    Reply
    • Krystal – awww, thanks!

      Reply
    • Ditto to what Krystal said! I was hired five years ago to do more performance tuning but ended up being asked to do all sorts of DBA stuff. My background was in Oracle so I ended up relying a lot on Brent’s posts, videos, and scripts to learn as much as I could about SQL Server. My previous career as a professor allows me to appreciate the amount of work (and the necessity of humor) that goes into all of this. Thanks, Brent!

      Reply
  • I like to be able to restore master, then msdb, then use msdb backup history to restore user databases from full and log backups. That means I always backup the msdb database last for both full and log backups so the other databases have their backup history completely in msdb for the restores. Also, make sure msdb is set to full recovery. Don’t forget to set model to any defaults you use like we set to full recovery so new databases have full recovery be default.

    Reply
    • Rob – I hear you, but I don’t like doing that because I often need to restore a mix of some databases and not others, or leave some databases up and running on a DR server. Because of that, I can’t overwrite MSDB – I’d lose the existing history each time I do it – so restoring straight from the folders with sp_DatabaseRestore works better for my style. To each their own though!

      Reply
  • I wish more people would support the fact that Production DBA work is a different role than Database Development and query tuning. I get so many companies looking for people to fill roles and they want both. And they don’t want to pay for it. I’m a Database Developer and I love tuning, but I don’t have much DBA experience.Thank you as always!

    Reply
  • Another step on the path to “cattle, not pets” level of professionalism is to stop giving your servers cutesy names. Yeah I get it, you love Star Wars, but don’t force future co-workers to try to remember whether “Tattoine” or “Kashyyyk” is the one with SSIS.

    Reply
  • EndUser2012
    July 24, 2019 5:48 am

    Very true … I am in this cover both juggling match with not so perfect documentation I cannot retype because 1 I not helping with builds and am instead buried by over aggressive monitoring and being asked to observe three different engagement paths at once with users constantly pm’ing me…

    Some skills I had literally are wilting on the vine

    Reply
  • Not necessarily a lie – I do both. Not nearly as well as I would like to, which is why I constantly read and study things like this site, but many of us are ‘reluctant DBAs’ – usually developers who fell into the role because there wasn’t anyone else to do it. I also have a very small setup to care for – I certainly couldn’t do both in a large shop.

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.

Menu
{"cart_token":"","hash":"","cart_data":""}