This week, Tara and Richie discuss offline file database and log backup solutions, VSS database backups, monitoring products, parameterization, page latch waits, SQL ConstantCare™, and Availability Groups on 2017 Standard Edition.
If you prefer to listen to the audio:
Enjoy the Podcast?
Office Hours 11-7-2018
Should I use VSS snapshot backups?
Tara Kizer: Alright, Gary has a question, “What are your thoughts on offline file database and log backup solutions versus VSS snapshot backups? We’ve used Redgate for years. It works great and it’s my preference. But our company switched to StorageCraft VSS snapshot backups based on our MSD’s recommendation. VSS solutions seem to be difficult to manage with the mounting of the drives to get to the MDF and LDF file.”
I am not familiar with StorageCraft. I don’t mind how your full backups are happening, especially on larger databases. It’s very challenging to do SQL native backups or Redgate backups, or any of those types of software, because of how long they take. So I do like snapshots, like a VM snapshot or a SAN snapshot. I don’t know if StorageCraft is doing that, but you have to ensure that it’s freezing the I/O for your database files so that your data files and log files are at the same point in time, you know, it’s quiescing them. And you’ll see those messages in the SQL Server error log: SQL Server will note that I/O is being frozen, and then you’ll see resume messages. That’s what you want to see if these are good backups. Otherwise, they’re, I forget what they’re called, but basically dirty backups that you may not be able to recover from. So you have to freeze I/O and resume it with the solution that you picked. I just forget what that’s called.
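One way to check whether your snapshot tool is quiescing the databases is to search the SQL Server error log for those freeze and resume messages. A minimal sketch, using the undocumented but widely used xp_readerrorlog procedure:

```sql
-- Search the current SQL Server error log (log 0, type 1 = error log)
-- for the messages a VSS-based snapshot backup should produce.
EXEC sys.xp_readerrorlog 0, 1, N'I/O is frozen';
EXEC sys.xp_readerrorlog 0, 1, N'I/O was resumed';
```

If the snapshot is quiescing the databases, you should see matching pairs of freeze and resume messages around each snapshot time.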
Log backups, I don’t know. If I’m a DBA – I don’t know if Gary’s a DBA or a sysadmin, but if I’m a DBA, I’m going to have control over those differentials and log backups – log backups at least. I need to be able to restore quickly. That’s one of my jobs in the case of if an issue happens. Sysadmins I can see wanting to not use SQL Server technology, so I don’t know. It just depends on the situation. But the full backups, I can see why people need to use other solutions if you’ve got large databases.
A database mail job has been rolling back for days…
Tara Kizer: Alright, Christopher asks, “We have a 2012 Availability Group and a database mail job was run and took over 24 hours before someone killed it. It’s been in rollback since Tuesday with wait type blah, blah, blah… According to our database tools, which is SolarWinds, it doesn’t seem to be locking or blocking.”
I’ve never seen this situation. I would recommend that you post the question out on Stack Exchange, dba.stackexchange.com – and someone may have encountered the issue. It doesn’t seem to be locking or blocking. If it’s not causing a problem, just leave it there. Are you able to still use database mail for whatever you’re using it for? And just wait for a restart to occur down the road, and hopefully you’re doing patching every now and then so the restarts aren’t that far off.
Which recovery model should I use?
Tara Kizer: Alright, is this a different Gary? Same Gary, alright, “Secondary question regarding VSS database backups – should the database be in full or simple recovery model…” Gary, I love it, you said recovery model instead of recovery mode, “Since there’s no log backups occurring? I assume the point in time recovery comes from mounting the drive…”
So, if you are using the full or bulk-logged recovery models, you are required to do log backups. And if nobody’s doing that, eventually the transaction log is going to grow and grow and grow and fill up that drive. It just depends on whether you need point in time recovery or not. Is it okay if you restore to the last full backup or differential? If it is, then go ahead and use the simple recovery model. If it’s not okay, you must use full or bulk-logged, and I’ve never even used bulk-logged because point in time recovery is so critical and you lose recovery points with bulk-logged. So use full, and then do log backups per your RPO, your data-loss tolerance. It’s very common these days to see log backup jobs running every five minutes. Sometimes they’re even every minute. You must add log backups through whatever tool you want to use for that.
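As a sketch of what that looks like in T-SQL (the database name and backup path here are placeholders, substitute your own):

```sql
-- Check each database's current recovery model
SELECT name, recovery_model_desc
FROM sys.databases;

-- If you do not need point in time recovery, switch to SIMPLE
ALTER DATABASE YourDatabase SET RECOVERY SIMPLE;

-- If you stay in FULL, schedule log backups per your RPO,
-- for example a job running every five minutes:
BACKUP LOG YourDatabase
TO DISK = N'X:\Backups\YourDatabase_log.trn';
```

In practice you would use a maintenance solution or backup tool to schedule these rather than running them by hand.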
Can you explain the offside rule in soccer?
Tara Kizer: Joseph says, “Can you explain the offside rule in soccer?”
Richie Rump: Yeah, actually I could. I could really do that.
Tara Kizer: Even our kids could.
Richie Rump: Since Brent’s not here maybe I should probably do it. It’s really simple. The offensive player, at the point of a pass, needs to be equal to or in front of the last two defenders. That’s it.
Tara Kizer: Last two defenders or last defender?
Richie Rump: Last two defenders, because the goalie’s included in that and sometimes the goalie is up. So it’s got to be the last two defenders.
Tara Kizer: I was like, wait a minute, does Florida do it different?
Richie Rump: No, and that’s why I say the last two defenders. It’s usually just the one because the goalie’s behind, but if the goalie has come up for some reason, then it’s the – so the rules say the last two defenders. There you go.
Which monitoring tool do you prefer?
Tara Kizer: Alright, Joseph asks, “What’s your latest preference on a monitoring product? Different products seem to do different things better than others. What do you like currently and why?”
A lot of the monitoring tools are really great. They’re expensive for a reason, and they leapfrog each other on features. One product will have a new release and add a new feature, and then a competing product will get that feature, plus they’ll add other new features. So they leapfrog each other. I’ve used SentryOne’s Performance Advisor and Quest’s Spotlight and Foglight tools in the past. We’ve got clients using IDERA SQL Diagnostic Manager and they seem to be happy. Some people are using SolarWinds; that doesn’t seem to provide the level of features of those other three companies I mentioned. Some people are using New Relic. I’d say it’s all over the place, and we don’t really say, here are the products we recommend. You have to see which product provides the monitoring you need for that system.
Richie Rump: Yeah, and the tools we provide are not monitoring either, so just to throw that out there.
Should I use forced parameterization or optimize for ad hoc?
Tara Kizer: Ron asks a good question, “What are your thoughts on parameterization, simple versus forced, and optimize for ad hoc workloads?” So I like optimize for ad hoc workloads to be enabled just as a best practice. Just get that out of the way. There’s only like one edge case where it may not work for you, and that’s if your queries only ever run exactly twice, since the first execution only caches a plan stub. So I enable it, basically, anytime I install SQL Server on a production box. Forced parameterization I like also, but there’s a downside to it.
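Enabling optimize for ad hoc workloads is a server-level configuration change; a minimal sketch:

```sql
-- "optimize for ad hoc workloads" is an advanced option,
-- so expose advanced options first
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Enable it: first execution of an ad hoc query caches only a small
-- plan stub; the full plan is cached on the second execution
EXEC sp_configure 'optimize for ad hoc workloads', 1;
RECONFIGURE;
```

The change takes effect immediately and does not require a restart.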
So let me describe what forced parameterization is. The default for databases is simple parameterization. Any ad hoc queries that are coming in, you’re going to get ad hoc query plans and very little plan reuse for those ad hoc queries. They’d have to be an exact match, even down to the values being passed to your search predicates: your WHERE clause and JOIN conditions. So WHERE Column1 = 10 and WHERE Column1 = 2 would generate different execution plans if they were ad hoc.
Now, forced parameterization was introduced back in SQL Server 2005. You can enable it at the database level for SQL Server to convert your ad hoc queries to be parameterized. But there are some limitations to it; there’s a Microsoft documentation article you can read about the limitations. And there’s a downside to it. The downside is that those ad hoc queries are now vulnerable to parameter sniffing, which they were not before, because SQL Server was compiling a fresh execution plan each time it needed one. But now, you’re going to be reusing query plans, just like you would with parameterized queries, such as stored procedures, prepared statements, things like that, so they are now vulnerable to parameter sniffing.
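Turning it on or off is a single database-level setting (the database name here is a placeholder):

```sql
-- Have SQL Server replace literals in ad hoc queries with parameters
-- so their plans can be reused
ALTER DATABASE YourDatabase SET PARAMETERIZATION FORCED;

-- Revert to the default behavior
ALTER DATABASE YourDatabase SET PARAMETERIZATION SIMPLE;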
I’ve had a client who had over 90% CPU utilization basically at all times during their busy hours, and they had ad hoc query plan issues. I mean, just thousands and thousands and thousands of ad hoc query plans. They had high compilations per second, and we enabled forced parameterization on the fly during the day, which we usually don’t do as part of our Critical Care; very occasionally we’ll make a change, because this one is low risk. And immediately, their CPU utilization dropped from 90% down to 60%. Now, there were other changes that they could make to the system, but that gave them some breathing room. They no longer had the extreme CPU pressure.
How do I fix PAGEIOLATCH waits?
Tara Kizer: Alright, Daryl says, “According to sp_BlitzFirst, our server is experiencing PAGEIOLATCH waits due to slow reads. If our VMware host and SSD SAN are not reporting any problems, what else should I be looking at?” Based upon my experience, the problem is not always at the disk level or somewhere in between. It can be on the Windows box itself. In one particular instance where we had severe I/O issues, I opened a case with Microsoft, the SAN team opened a case with their vendor, and we even had another vendor; I think HP was also involved. Microsoft had us update the I/O subsystem on Windows: the firmware, drivers, BIOS, and Storport driver. Anything I/O related on the Windows box itself got updated, and that resolved the performance problem.
Microsoft, if you open a case with them, that may be where they tell you to start. Now, this was about 10 years ago, the support case with them. But you may want to start there because it can be an easy win, and that may be what Microsoft tells you to start with first. They can run tools to help you diagnose where the problem is. I don’t know if they’ve renamed the tool, but there used to be a tool called an ETW trace that you can run with their help to analyze that data, and it will tell you whether the I/O problem is on the server itself or external to the server. And an SSD SAN doesn’t give you the best performance out of SSDs; there are bottlenecks in between. You’re asking about a VMware virtualized system, so update the I/O subsystem on both the guest, the virtual machine itself, and the host. There are two layers there, so make sure everything’s up to date: firmware, drivers, and BIOS. I’m actually working with a client this week that is experiencing that same alert in sp_BlitzFirst, and I’m telling them to start with the I/O subsystem on the Windows box and the host server.
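Before opening cases, it can help to measure the latency from SQL Server’s own point of view. A sketch using the sys.dm_io_virtual_file_stats DMV, which tracks cumulative I/O stalls per database file since the last restart:

```sql
-- Average read latency per database file, worst first.
-- Figures are cumulative since the last SQL Server restart.
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.physical_name,
       vfs.num_of_reads,
       CASE WHEN vfs.num_of_reads = 0 THEN 0
            ELSE vfs.io_stall_read_ms / vfs.num_of_reads
       END AS avg_read_latency_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON vfs.database_id = mf.database_id
 AND vfs.file_id = mf.file_id
ORDER BY avg_read_latency_ms DESC;
```

If these numbers are high while the SAN reports healthy latency, that points at the layers in between: the guest OS, the hypervisor, or the HBA/driver stack.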
Tara Kizer: Steve says, “Richie, thanks for the Python recommendation a few weeks ago, it’s great.” Alright.
Richie Rump: He’ll love it.
How is SQL ConstantCare licensed?
Tara Kizer: Christopher asks, “Is SQL ConstantCare licensed per user or per SQL Server? I’ve several SQL Servers and only one of me.” I’ll let Richie answer that. ConstantCare is his baby.
Richie Rump: Per user – we get lots of servers from people and they send in all their stuff and we send them all recommendations every day.
Tara Kizer: And I think that Brent hasn’t set a limit on the number of servers yet, but he may in the future.
Richie Rump: I don’t think so yet. He does a lot of the business stuff and he says, you go make this work, and then I just go off and…
Tara Kizer: Exactly, he does that type of stuff, or makes those rules.
Richie Rump: Yeah, and then I’m like, I don’t know, wait, yeah, I could do that, yeah. A week or two weeks later, it comes back.
Tara Kizer: Yeah, and I mean, if there’s no limit, if you’ve got a lot of SQL Servers, sign up. It’s a very cost effective solution to help you with your environment. When I worked at Qualcomm many years ago, not that many years ago, but we had 700 SQL Servers that we supported. There were several DBAs to support that, but I could have signed up for an account to ConstantCare and got mentoring advice on all 700 servers. Not that I’d want to…
Richie Rump: 700, [crosstalk] conversation there…
Tara Kizer: Well, that’s when Brent was like, maybe we will restrict it…
Richie Rump: Yeah, I think at that point, we’re like, well…
Tara Kizer: Yeah, I think … I don’t think anything that high…
Richie Rump: No, no, no.
Are Basic Availability Groups worth it?
Tara Kizer: Gary says, “Availability Groups on 2017 Standard Edition seem like a pain. Given the offline status of the secondaries and the one database per group limitation, is it worth it?” I mean, it’s worth it – so we’re talking about basic Availability Groups, since this is Standard Edition. Sure, there are some pains, but what’s your alternative for whatever you’re using the Availability Group for? It’s a solution for HA or disaster recovery, and both of those are available with basic Availability Groups, whereas with database mirroring, getting DR support with an async mirror was an Enterprise Edition feature.
Basic Availability Groups give you synchronous and asynchronous commit, so it is a solution for HA and DR. It is one database per group and you don’t get a readable secondary, but hardly anybody uses that anyway. Out of the people who are using Availability Groups, regardless of whether it’s Standard or Enterprise, most are not using the readable secondary replica, at least from what we’ve seen with our clients. Some people are, for sure, but not a lot. Most people are using it for HA and DR.
Alright, that’s all of the questions, so we’re going to end here. If you had questions and just didn’t realize we would run out, get your questions in earlier next week. So we are out of questions. Have a good one.
Richie Rump: And it’s lunchtime.