Why monitoring SQL Server is more important than ever

Last Updated 7 years ago

Moving parts

SQL Server keeps on growing. With every new edition, you get more features, feature enhancements, and uh, “feature enhancements”. As I’m writing this, SQL Server 2005 is less than a week away from support ending, and SQL Server 2016 is up to RC2. Brent’s retrospective post got me thinking a bit.

We went from Log Shipping, to Log Shipping and Mirroring, to Log Shipping and Mirroring and FCIs (yeah, I know, but Clustering 2005 was a horror show), to Log Shipping and Mirroring and FCIs and AGs, and Microsoft now keeps finding ways to add Replicas and whatnot to AGs. Simple up/down monitoring on these isn’t enough.

You need to make sure your servers are keeping up on about half a dozen different levels. Network, disks (even more if you’re on a SAN), CPU, memory, etc. If you’re virtualized as well, you have a whole extra layer of nonsense to involve in your troubleshooting.

And this is just for you infrastructure guys and gals.

For those of you in the perf tuning coven, you have to know exactly what happened and when. Or what’s killing you now.

Tiny bubbles

SQL Server has pretty limited memory when it comes to these things. Prior to 2016, with the advent of Query Store, and a ‘bug fix‘ to stop clearing out some index DMV usage data, your plan cache and index DMVs may not have all that much actionable or historical information on them.

And none of them keep a running log of what happened and when. Unless you have a team of ~~highly specialized, highly paid~~ barely cognizant familiars mashing F5 in 30 second intervals 24/7 to capture workload metrics and details, you’re not going to be able to do any really meaningful forensics on a performance hiccup or outage. Especially if some wiseguy decides the only thing that will fix it is rebooting SQL.

Monitoring is fundamental

If you have a DBA, you (hopefully) have someone who at least knows where to look during an emergency. If you don’t, it becomes even more vital to use a monitoring tool that’s looking at the right things, so you have the best set of information to work with.

There’s a learning curve on any tool, but it’s generally a lot less steep than learning how to log a Trace or Extended Events session (probably a whole mess of Extended Events sessions) to tables, and all the pertinent system DMVs, and blah blah blah. You’re already sweating and/or crying.

Because you know what’s next.

Visualizing all that data.

Time and Money

You don’t have time to do all that. You have too many servers to do all that. You need it all in once place.

SQL Sentry, Dell, and Idera all have mature monitoring tools with lots of neat features. All of them have free trials. Just make sure you only use one at a time, and that you don’t stick the monitoring database on your production instance.

The bigger SQL gets, the more you need to keep an eye on. Monitoring just makes sense when uptime and performance are important.

Thanks for reading!

[Video] Office Hours 2016 2016/05/04

Implicit vs. Explicit Conversion

14 Comments. Leave new

Jake Godi
May 9, 2016 8:47 am

‘Restoring tempdb since GETDATE()” this is an awesome tagline.

Reply
Rob
May 10, 2016 12:45 pm

Wish there was a hyper-link under “some wiseguy decides the only thing that will fix it is rebooting SQL” so that I can send it to some clients.

Reply
- Erik Darling
  May 10, 2016 5:23 pm
  
  If only all the reasons that people reboot servers could fit into one blog post. I think what may be helpful is to make sure the Remote DAC is enabled, and that people are aware of it. It’s also a good idea to have troubleshooting steps in place that happen before they’re allowed to reboot. Like, connect via the Remote DAC (if they can’t connect normally), run sp_whoisactive, look for troublesome sessions, and potentially kill them off.
  
  That being said, I’ve seen people want to reboot SQL because an Agent job failed, so, you know.
  
  Can’t help everyone.
  
  Reply
Martin Guth
May 17, 2016 1:52 am

I agree, monitoring is important while you typically don’t have time to look at everything happening. Being a DBA of just three SQL servers gives me the luxury to treat each server personally :-). I found the SQL Server Management Data Warehouse to be a great and free tool to monitor various things about SQL server. It may not be perfect but hey it’s included in the product and it’s free. I also like the top down analytic approach. I must admit, that I haven’t tried monitoring software from different vendors yet as MDWH satisfies my need. With the advent of query store it might be thrown out of the product in the future…but since I am on SQL 2008 to 2014 that’s not relevant for me today. For alerting I am using custom checks with Nagios and the error reporting of messages severity 16 to 24 which is recommended in the Brent Ozar SQL Server Setup Checklist.

Of course I also look at the error log and configuration changes. I found Michael Bourgons Overlord Auditing to come very handy for this job: http://thebakingdba.blogspot.de/2015/06/presenting-overlord-my-24-hours-of-pass.html

Reply
- Brent Ozar
  May 17, 2016 6:16 am
  
  Martin – make sure you check the impact of MDW. In every production system where I’ve seen it in place, most of the top 10 most resource-consuming queries have come from MDW itself.
  
  Reply
Mark Freeman
May 18, 2016 10:16 am

I’ve been using Solarwinds Database Performance Analyzer (formerly Confio Ignite) for about two years. It has some good monitoring features. Info at: http://www.solarwinds.com/database-performance-analyzer-sql-server

Reply
Jonathan Shields
May 29, 2016 1:58 pm

You get a situation for instance where an app using Entity Framework is generating some hideous queries. Users complain of db timeouts. Out of desperation and lack of understanding SQL server is rebooted. The first user tries to do their work and lo and behold things are better. Phew well done that guy !

Next day the situation is back to dire again…

Reply
Steve Jones
June 15, 2016 9:25 pm

*cough* Redgate *cough*

We make monitoring and alerting tools as well. I’d like to think we belong in that list 😉

Reply
- Brent Ozar
  June 16, 2016 8:06 am
  
  Steve – so, uh, about that.
  
  After this went live, I got an email from another RG person asking why they weren’t in the list. I’ll forward you the email, and let me know if you think it’s fair. (I don’t wanna air it here.) It’s not that SQL Monitor is a bad product, I just don’t think it’s at the same level as these other tools.
  
  Reply
  - Steve Jones
    June 16, 2016 9:38 am
    
    Thanks, Brent, and fair comments. I do think we do a good job of monitoring and alerting, but in the performance troubleshooting tuning area, I’d agree.
    
    Reply
Wes Crockett
January 12, 2017 5:09 pm

Hey all,

Coming in to this late. I inherited about 10 SQL Server instances at a new gig and they are using Solarwinds. It seems OK… Certainly pretty, but I’m not sure how it stacks up to the other products. Idera looks dated (read: Windows XP visuals) and SUPER deep in terms of data, but is it too complex to digest information quickly?

We are at a point where it is a good time to investigate solutions outside of Solarwinds, or renew our licenses, but I don’t really want to do all three since the amount of time and effort will be huge. Which would you recommend that I put head to head with SW?

Our issues seem to revolve around index fragmentation (easy enough to figure those out…), drive sizing (easy again), and Page Life Expectancy (dropping off the cliff temporarily).

Thank you,
Wes

Reply
- Erik Darling
  January 12, 2017 6:34 pm
  
  Hi Wes,
  
  If you have a monitoring tool that’s warning about index fragmentation, you’ve got a pretty silly monitoring tool. If you’ve got a person who’s worried about it, they should read a few articles around our site.
  
  As far as tools go, have a look at the links in the last paragraph. We’re still pretty keen on those.
  
  Thanks!
  
  Reply
  - Wes Crockett
    January 12, 2017 6:40 pm
    
    Haha. Yeah, I’m not really worried about the fragmentation. It’s spotlight that’s giving off medium level alerts. I’m going to adjust some setting on that.
    
    Thanks for the reply.
    
    Reply
    - Erik Darling
      January 12, 2017 6:43 pm
      
      Probably a good idea! Have a look at SentryOne — I think that’s the only one in the post that you haven’t had tried yet.
      
      If you’re feeling particularly endeavoring, check out Opserver.
      
      Reply