Perk Up Your Career with the SQL Server Troubleshooting Checklist

10 Comments

When an application is offline or performing so badly that the users are complaining, what do you turn to?

When your cellphone says a major incident is ongoing and you can’t get to a computer, where do you send the team at the office?

When you’re putting out fires with whatever you can find, how do you record what steps you’ve taken and what information you’ve gathered?

You’re Only As Strong As Your Team

One major reason to put together a troubleshooting checklist is simple: you can’t always be working.

SQL Server Troubleshooting Checklist
What Would Brent Ozar PLF Do? (PDF)

If you’re the only person who knows how to use SQL Server in your office, you need to train someone. They don’t need to be an expert, they just need to be able to run through your checklist successfully and gather information. Find someone sensible, practical, and with a steady hand who you can practice the steps with, and tailor the checklist so it makes sense to your secondary.

If you work with a large team of SQL Server experts, you need a troubleshooting checklist just as much. As we all gain experience we all develop different ways of doing things. We each focus on different things and may interpret things differently.

The troubleshooting checklist gives your team consistency: it gives everyone a base process to make sure the major areas are covered.

You’re Only As Smart As Your Documentation

The other major reason you want a troubleshooting checklist is that all the rules change when things get really bad.

When a company is losing money, it’s hard to do things in a logical fashion. You’ve got four people at your desk asking all manner of questions from “we just bounced the Heffalamps services, is it OK now?” to “are the Wuzzles impacted by this problem?” Your boss’ boss keeps poking their head around your monitor saying, “I’m just checking in.” You’ve got 500 emails from your monitoring.

When this happens, you remember about 75% of all the basic stuff to check. It’s incredibly easy to miss some obvious things, though. After all, you’ve needed to go for the bathroom for about forty five minutes and still can’t leave your desk.

The troubleshooting checklist helps here in three ways:

  • You always hit a consistent list of things that are important to your business.
  • If someone misses a step when following up on a problem, you have a place to add a step to ensure the mistake doesn’t happen again. The troubleshooting checklist gives you a way to correct human error— that’s the secret to de-personalizing an embarrassing mistake and instead showing you’re a professional who follows up on errors and is in control of their process.
  • You have a place to record information which others can read. This helps you clear out that crowd from behind your desk! Save the checklist in a place where they can read it, and let them know they can see updates on your progress if they let you get going.

To Do: Give Our Checklist To Your Manager

There’s one person who really wants you to have a troubleshooting checklist, but they think of it in slightly different terms. They think of this as an ‘Incident Management Response Process’.

Your manager would love to have predictable response to problems. This helps them understand and explain to others what value you add to the company. It also helps them understand and justify having someone who can be your backup when you’re not available. It helps your manager show that you’re working to have an organized production environment with defined processes for keeping applications available.

Here’s how to handle this. Download our SQL Server Troubleshooting Checklist and give it a read. Think about a couple of things you’d customize for your environment.

Then show the checklist to your manager and say, “I think having a basic process like this would be helpful for our team. I’d like to lead a project customizing it for our applications. What do you think?”

According to Penelope Trunk, the path to promotion is shortest in December.

Now there’s a holiday score.

Check Out the Troubleshooting Checklist

Download the free SQL Server Troubleshooting checklist (PDF).

Previous Post
Coping with Change Control Video
Next Post
Consulting Lines: SQL Server Needs a Dog

10 Comments. Leave new

  • Hi Kendra,
    Great article! I got back from a week off with > 200 articles in my RSS reader and this was the one I flagged for follow-up.

    You mention xp_readerrorlog, what’s the difference between that and sp_readerrorlog? (btw, should @p1 start with 0?)

    Reply
    • Hi Michael! Glad I made the list, and hope you had a good week off.

      I believe sp_readerrorlog calls xp_readerrorlog. I haven’t done any tracing to test, and both are undocumented– I actually didn’t even know about the sp_ version because I’ve always used the xp_, and both are undocumented. I’m basing this claim off what I just read here on sqlblog: http://www.sqlteam.com/article/using-xp_readerrorlog-in-sql-server-2005

      Yes, @p1 should be 0. Thanks for catching that and pointing it out. I’ll get a fix for that worked up shortly.

      Reply
      • Hi All

        I hope nobody minds the late post on this.

        There is some difference between xp_ and sp_ flavors of this. If I insert the returned data into a temp table for sp_ and do a search on the text field like ‘%checkdb%’ I get data; however, when I do the same for the xp_ through the same date range, nothing is returned.

        I’ve found that dumping everything into a temp table from both works best.

        Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.