How to Pick a Monitoring Tool

Step 1: Make a list of 5 problems you’ve faced in the last couple of months that you needed alerting on. If you’ve got a help desk ticket system, look at the ticket types that occur most frequently and cause the most outage times.

For me as a DBA, that might be:

  • SQL Server service down
  • Deadlock occurs
  • A runaway query consumes high CPU
  • An Agent job is running more than 2x the time it usually takes to run, and it’s still going
  • Log shipping gets more than 15 minutes behind
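The last two items on that list are really threshold rules, so it helps to pin down exactly what "too slow" and "too far behind" mean before you test any tool against them. Here is a minimal Python sketch of those two rules. The function names and inputs are hypothetical (not from any monitoring product), and it assumes you can already get the job start time, its typical duration, and the last log restore time from somewhere.

```python
from datetime import datetime, timedelta


def job_is_overrunning(started_at, now, typical_duration, factor=2.0):
    """True when a still-running job has exceeded factor x its usual runtime."""
    return (now - started_at) > typical_duration * factor


def log_shipping_is_behind(last_restored_at, now, max_lag=timedelta(minutes=15)):
    """True when the newest restored log backup is older than max_lag."""
    return (now - last_restored_at) > max_lag


if __name__ == "__main__":
    # Made-up timestamps, purely for illustration.
    now = datetime(2024, 1, 1, 12, 0)
    # Job usually takes 25 minutes and has been running for 60.
    print(job_is_overrunning(datetime(2024, 1, 1, 11, 0), now, timedelta(minutes=25)))
    # Last restore was 10 minutes ago.
    print(log_shipping_is_behind(datetime(2024, 1, 1, 11, 50), now))
```

Writing the rules down this way gives you an unambiguous definition to check each tool's alerts against in the later steps.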

Step 2: Set up a lab to repro those problems on demand. This is actually a great way to learn about these problems, by the way – the more you understand how to create these situations, the better you’ll be at detecting and reacting to them.

Step 3: Download the eval editions of the tools you want. Most monitoring software vendors give away short-term (10-15 day) trial versions. Install & configure them to monitor your test lab.

Step 4: Actively evaluate them. Build a spreadsheet with a column for each monitoring tool, and a group of rows for each failure scenario. For each problem, when you trigger it, document:

My favorite monitoring dashboard. I'm looking for four 7's of uptime.

  • How long it takes to alert you about the correct underlying problem
  • How many false alarms you get (alarms that are unrelated to the real problem)
  • How intuitively obvious the real problem is when looking at the tool’s dashboard (are all the lights flashing red, or is there a single light flashing red exactly where the problem is?)
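If you would rather keep the scorecard in code than in a spreadsheet, the same grid can be sketched in a few lines of Python. This is only an illustration of the three measurements above: the `Observation` fields, the tool and scenario names, and the 1-5 clarity score are all assumed placeholders, not results from any real evaluation.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Observation:
    tool: str
    scenario: str
    seconds_to_alert: float  # time until the correct underlying problem was flagged
    false_alarms: int        # alarms unrelated to the real problem
    clarity: int             # assumed scale: 1 (everything red) to 5 (one light, right place)


def to_csv(observations):
    """Emit one row per (scenario, tool) so the result opens in a spreadsheet."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["scenario", "tool", "seconds_to_alert", "false_alarms", "clarity"])
    for o in sorted(observations, key=lambda o: (o.scenario, o.tool)):
        writer.writerow([o.scenario, o.tool, o.seconds_to_alert, o.false_alarms, o.clarity])
    return buf.getvalue()


def summarize(observations):
    """Per-tool averages and totals across every scenario that was triggered."""
    per_tool = {}
    for o in observations:
        per_tool.setdefault(o.tool, []).append(o)
    return {
        tool: {
            "avg_seconds_to_alert": sum(o.seconds_to_alert for o in rows) / len(rows),
            "false_alarms": sum(o.false_alarms for o in rows),
            "avg_clarity": sum(o.clarity for o in rows) / len(rows),
        }
        for tool, rows in per_tool.items()
    }


if __name__ == "__main__":
    # Placeholder data only; replace with what you actually observe in the lab.
    notes = [
        Observation("Tool A", "deadlock", 30, 1, 4),
        Observation("Tool A", "service down", 10, 0, 5),
        Observation("Tool B", "deadlock", 120, 3, 2),
    ]
    print(to_csv(notes))
    print(summarize(notes))
```

Keeping one row per scenario and tool makes it easy to see which tool was slow or noisy on which specific failure, rather than relying on an overall impression.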

Step 5: Pick winners and negotiate price. Out of the tools you evaluated, pick at least two that you’re willing to live with. Call each of the vendors and say, “I did a tool evaluation, and it was a tie between you and ___. What’s your best price? I’m going to be asking the other guys too.”

They’re going to want to start talking about value differentiators, like how they’re so much better than the other company because they do ___. Doesn’t matter – you’ve already picked the two tools you’re willing to live with. Let them talk, listen fairly, and then repeat the question: what’s your best price?

You don’t have to pick the cheapest one – there may be one tool you like much more – but at least now you’ve gotten good prices on both, and you can make an informed decision.


8 Comments

  • I chose my current monitoring tool in this way [REDACTED], picking the one that came out ahead on all parameters: quick and varied alerts, offered troubleshooting, ease of use, and enough positive reviews from users.

  • I had to do this recently, and even after putting together a convincing case for the monitoring suite that I really wanted that came packaged with a certain tool for digging into execution plans, my boss picked a product that was recently purchased by a company that we already have a relationship with for our general computer and network monitoring, even though it cost about twice as much. Bum and bummer.

  • Is this still the best approach if the environment is new? By new, I mean we just took a legacy desktop app using PostgreSQL and rebuilt it in Java and SQL Server 2012. So I don’t have any type of errors or true tickets bum rushing my inbox just yet…just maybe disk space issues. But anywho, our CIO expects such monitoring to already be in place once we go live.

    • Marcus – you’ve had issues in the past with PostgreSQL, right? Take those same types of issues and check with your new platform.

  • Hello, I’ve poked around your site a bit, and didn’t find much about Management Data Warehouse. Have you b/vlogged about it at all, and I just missed it?

    We’ve got a good third-party tool (SQLSentry), which is a nice “Swiss Army knife” thing to have in the drawer, but I’d really prefer to have something that stores stuff natively, so that it’s easy to extract and compare against other performance and log data from other sources during the same time window.
