In the free 30-minute intro calls I do about our SQL Critical Care®, one of the questions I ask is, “Do you have a monitoring tool? And if so, what has it told you about the root cause of the problems you’re facing?”
I usually hear one of two things:
- They don’t have a tool, or
- They have a tool and can’t figure out what the problem is
During the engagement, I’ll sometimes ask the admins to open up the monitoring tool and then answer the following questions – not verbally, but actually do it in front of me:
- Find the last time when SQL Server was slow (not from guessing or help desk tickets – use the tool)
- Find the most resource-intensive queries that were running at that time, and show me their query plans
It turns out that most people can’t do that.
Monitoring tools aren’t intuitive.
There’s no built-in Clippy that pops up when you open the tool and gives you a guided walk-through of what you’re looking for. It’s up to you to read the manual, figure out how to accomplish those tasks, and then train the rest of your team to make sure they can do it, too.
Your monitoring vendor wants to help. Call their support desk or open a support ticket and ask if they offer a guided tour service. All of the monitoring vendors have support teams who can share your screen, take control, and walk you through the app. Have them answer the same two questions above, and take notes while they do it. Then, repeat the process yourself, and get good at it.
It’s not your fault that most monitoring systems are a little hard to use. However, it IS your fault that you’re not an expert at using your tools. You have to learn your monitoring tool the same way you learn SQL Server. Now get in there and sharpen those skills.
What’s a manual? It sounds hard.
Captain Slow? 😉
I don’t like manual stuff either. I like things automated. Manual is dreary and mind-numbing. Kittypics are better.
Many monitoring tools are also just bad in some areas. I always say that holistic network monitoring is one of the most difficult things in IT. While many monitoring tools claim to be able to monitor everything, they usually only monitor a few things well and with a reasonable amount of effort. For example, trying to monitor almost anything that isn’t a Microsoft product with SCOM should be classified as cruel and unusual punishment by the Geneva Convention. Some third-party vendors have better management packs than others (Dell, for instance), but if you end up with multiple generations of those products, you quickly end up in management pack hell, configuring monitoring separately for each generation because each may use different monitoring technologies.
Don’t get me wrong, SCOM can certainly monitor whatever you want it to, but the amount of time you need to invest in it to get there is a bit absurd.
My preferred conceptual model of monitoring, which I haven’t seen implemented anywhere, is to use SCOM to monitor Microsoft products at the OS and application level. If you are a Cisco network, use Prime to monitor the network; if not, use SolarWinds. Then use SolarWinds to monitor storage, virtualization, and printers. But most IT managers aren’t going to like having 2 or 3 monitoring solutions, and will ask, “Why can’t we just use one?”
I worked somewhere a few years ago that went through 2-3 monitoring solutions that monitored “everything.” What they all did was have all of the settings turned up to 11 and generate something like ten thousand email alerts per day. That made it worse than no monitoring at all, because triaging thousands of alerts per day took the help desk away from real problems and proactive maintenance. We eventually had someone go through one of them and tune it all for about a month and a half, and we ended up with a reasonable monitoring solution. Cool. Except his labor cost more than SCOM and Orion licensing for something like 2-3 years, which would have provided more functionality that was required in this particular environment.
People underestimate monitoring tools. It’s kind of a science, I think. What alerts do you want to get? When does something count as an alert, and when doesn’t it? Configuring and maintaining a monitoring tool is a full-time job, because metrics and baselines keep changing, and so do the monitored systems with each new version and fix. That all costs money, and many a manager will say: well, it is already pretty expensive (unless you have written the right business case, in which case it is just part of the costs), and I cannot sell this to upper management because this system does not do anything directly related to selling our office supplies, nuts & bolts, or pizzas, depending on your LOB…
Until, of course, a load of your systems go SNAFU, which you could have predicted with the right tools. Oh well, who ever said life was easy, eh?
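To make the "what is an alert, and when not?" question a bit more concrete, here is a minimal Python sketch of alerting against a moving baseline instead of a fixed threshold. It is not how any particular monitoring product works; the function name `should_alert`, the `min_samples` and `sigmas` parameters, and the sample CPU numbers are all hypothetical, chosen only for illustration.

```python
from statistics import mean, stdev


def should_alert(history, current, min_samples=5, sigmas=3.0):
    """Return True when `current` is well above the recent baseline.

    `history` is a list of recent readings for one metric (hypothetical
    example: CPU percent sampled once a minute). The baseline is simply
    the mean of that history, so it moves as the workload changes.
    """
    if len(history) < min_samples:
        # Not enough data to know what "normal" is yet: stay quiet.
        return False
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # Perfectly flat history: any increase is unusual.
        return current > baseline
    return current > baseline + sigmas * spread


# Illustrative readings only, not real measurements.
cpu_history = [40, 42, 38, 41, 39, 43, 40]
print(should_alert(cpu_history, 44))  # small wobble above normal
print(should_alert(cpu_history, 90))  # far outside the recent range
```

The idea is that a reading only pages someone when it is unusual for that system, which is one way to avoid the ten-thousand-emails-a-day problem. The trade-off is the one described above: someone still has to decide how much history counts as "normal" and revisit that choice as systems change.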