It’s not your job.
No, seriously, hang on a second. I’m not saying that you should ever say things like “It’s not my job,” but I do want you to understand when you should avoid reinventing the wheel. Your Windows administrators should be monitoring every Windows server you have, making sure it doesn’t run out of drive space. This isn’t some wild, unique need that only SQL Server has – it’s everywhere in your data center.
(You might actually even BE a Windows admin, just here because you need to learn about working with SQL Server. It might actually BE your job to monitor this stuff. That’s cool – that’s how I got started too.)
In our journey from the base of my Hierarchy of Database Needs to the top, we do indeed need to talk about capacity – but I don’t want to monitor drive space from inside SQL Server, and I don’t want to buy SQL-specific tools in order to pull it off. Yes, you can technically use commands like xp_fixeddrives to list all of the SQL Server’s drive letters and how much free space they have, but that doesn’t work everywhere. For example, you might have mount points or databases on UNC paths, neither of which will show up in xp_fixeddrives. So leave the drive space monitoring to the pros.
Why You Shouldn’t Build a Monitoring Tool
If you want to be a professional developer, you should build things. You should learn what exceptions could happen, how to trap them, and how to fail elegantly. It’s hard work, but if you get good at it – really good – you can build amazing things and make a killer living for yourself.
But if you want to be a professional DBA, you should leave the developing to the developers.
I’m not saying you shouldn’t learn the dynamic management views (DMVs), how to dig through them to find the evidence you need, and how to turn that data into actions. Hopefully, I’m going to teach you how to do a lot of that over the course of the next six months. Take another look at the Hierarchy of Database Needs again, and think for a second about all the things we’re going to be learning over the coming posts. Just in the last five, I’ve had you build an inventory of your servers, automate your database restores, start locking down security, and enumerate your database risks. The next few are chock full of things that will help you become a hero in the eyes of your users and your developers.
Building a crappy monitoring tool in your spare time will not give you that same level of respect. (And yes, if you’ve only got your spare time at work, your monitoring tool is going to be crappy. If you’ve got so much time that you can build a great monitoring tool, you need to focus on providing results to the business fast, because you’re about to be laid off as pure overhead.)
How to Pick a Monitoring Tool
There are basically two kinds of real time monitoring tools out there for SQL Server.
Up/down Alerting – these tools make sure the SQL Server service is running, and that it has all the basic needs taken care of. If the server runs out of drive space, if the service stops, if the event log starts filling up with errors, or zombies attack, these tools will let you know. The most popular software in this category is Microsoft System Center and Red Gate SQL Monitor.
Performance Diagnostics – these tools try to derive insight from SQL Server’s dynamic management views (DMVs) to help performance tuners wring the most throughput out of their hardware. The most popular tools here are Idera SQL Diagnostic Manager, Quest Spotlight, SentryOne SQL Sentry, and to some extent, the tools in the above category too.
If I was you, I’d start by asking the Windows team if they’ve got any software that handles the up/down alerting for services, drive capacity monitoring, etc. If so, get them to start monitoring your servers. I’m being selfish here – my real reason is that I want to save my monitoring budget for tools in the Performance Diagnostics category. Surprisingly, all of these tools are around the same price – around $1,000-$2,000 USD per monitored instance.
Then, no matter which category of tool you’re buying, make a list of the last 4-5 things that have caused your phone to ring after hours. Here’s some of my personal favorites:
- Out-of-control query running for hours
- Long-running job, like a backup going too slow
- Queries that desperately need an index
Figure out how to recreate those same symptoms in your development environment, and then get a free trial of a couple of the tools I mentioned. (All of them provide free 10-14 day trials.) Reproduce the problem, and watch how the monitoring software reacts. Does it lead you to the root cause quickly, or does it just have a bunch of flashing red lights on the console? The best ones will save you time by getting you right to the solution.
After you’ve done your evaluation and picked a favorite, get a quote from them – and get a quote from the other vendors as well. Share the quotes with the competing salespeople. You’d be surprised how willing they are to negotiate, and you might be able to use the cheapest quote to get the best tool at a price that works for you.
Your Homework: Start Tracking Your Pain Points
I don’t expect you to magically get budget approval for a tool this week, but I want you to start making a list of day to day problems as they strike. If you waste four hours troubleshooting a deadlock problem, make a note of the date, the time required, and a few words about the root cause. In a month or two, you’ll be able to present this list to your boss as proof of why the right monitoring tool could save you money.
If you’re still convinced you should build your own tool, go check out the open source project Opserver. It’s a free performance monitoring tool from the people at Stack Overflow. They use SQL Server (and a lot of other cool tools), and theirs is the best open source performance dashboard I’ve ever seen.
Whew. So far, we’ve been focused a lot on protecting these valuable MDF files. Next up, what’s in ’em?