This week, Brent, Tara, and Erik discuss update statistics, how soon it is safe to use a new version of SQL Server, VMs vs physical servers, backups, disabling TCP Chimney Offset in VMware, SQL Server on Linux, performance troubleshooting, most valuable tools to have in your SQL Server DBA toolkit, nolock and read uncommitted, and more.
If you prefer to listen to the audio:
Enjoy the Podcast?
Office Hours Webcast – 2018-11-21
Does UPDATE STATISTICS have a cost?
Erik Darling: [Chitan] asks, “Is update statistics a free tool, or does it have performance impact? How bad is it to run update stats every hour in an OLTP environment?
Tara Kizer: If you think you need to run it hourly, you probably have a parameter sniffing issue. At my last job, I inherited a server that had update stats running every 30 minutes. And I was only there about 18 months before I came here and I never wanted to rock the boat. I mean, it was working fine. I knew that it wasn’t necessary. I normally set up my servers – the ones that I am in control of – I set them up to do, once a day, update stats. And then maybe twice a day if I have a different type of load happening. But usually, I’ll run update stats after the nightly stuff runs, but it was working on that server, so I didn’t want to touch it and the people that set it up were still there and I just didn’t want to rock the boat. But it’s probably completely unnecessary to do it every 30 minutes, every hour. I mean, maybe at the most twice a day. I don’t know, it just sounds like you’re probably dealing with a parameter sniffing issue and you should be troubleshooting that instead of update statistics.
Brent Ozar: The overhead is going to be, of course, it will scan objects, you know, do lots and lots of reads. It’s not doing that many writes, but it’s doing a lot of reads and it will slow queries down while it’s happening.
Erik Darling: The one thing I can think of, if you’re on older versions of SQL Server like pre-2014, sometimes you would hit this nasty thing with ascending keys where if you had fairly large tables and you had, like, let’s just call them identity columns – it could be any column, as long as the value keeps increasing – and you had data coming in at the end and you didn’t have stats updated enough, SQL Server makes some really bad – I want to say like a one row guess for any off-histogram value. And you could have a bunch of stuff in there and SQL Server is like, nope, just the one row, please. You’re like, no, that’s wrong, please don’t guess that.
Brent Ozar: If you want to learn more about that, search for Thomas Kejser; the ascending key problem. And it’s really interesting. Most of us don’t have this problem, but it’s really interesting to read about. It’s like car crashes, you know…
Erik Darling: Erin Stellato has a pretty good post about ascending keys and a couple of the trace flags you can use along with them over on SQL Performance.
Brent Ozar: Okay, cool.
Erik Darling: So good stuff over there too.
When is a new SQL Server version ready?
Erik Darling: So, Steve, I wish there were some way that I could physically touch you, Steve. Steve asks a two-part question. I’m going to read both parts and then I’m going to let them sink in, “With SQL Server coming out fast and furious, how soon or late after a new version comes out do you bring it into your new environment? Also, physical servers or VMs?”
Tara Kizer: Completely different questions.
Brent Ozar: Wow.
Erik Darling: By the by… So let’s start with that first one. How soon after a new version on SQL Server comes out do you think it’s safe to start using it?
Brent Ozar: I’d go like two or three cumulative updates. I’d probably feel comfortable with doing it, but I would only go if it fixes a problem that you have. If there’s a problem you identify and go, 2019’s inlining of functions, got to have that. It’s my biggest performance bottleneck, then I want to jump. But if you don’t see something in the release notes where you go, this solves one of my biggest pain problems, I don’t know that I would jump that fast. I would probably – man, what scares me is you look at – scares is the wrong word. 2017’s cumulative updates, you look at the ones that even come out now in 2018, 2017’s been out for a while, even a dozen cumulative updates in there’s some terrifying stuff inside of there. And it’s not that older versions don’t have problems too, but I want to only go when I know I’m fixing stuff.
Erik Darling: And to be fair, even service packs today for older versions, there’s some pretty terrifying stuff in them too. It’s not like old versions are problem free. Like, these are settled. I had a boss once who was petrified of getting off 2008 R2 RTM and I was like, why? What’s going to be the problem…
Tara Kizer: What? Oh my god…
Erik Darling: He’s like, you know why? Because I know what I’m getting with it. No, you don’t.
Tara Kizer: It’s more risk to stay on it…
Erik Darling: Getting me to work weekends when crap goes wrong, that’s what you know you’re getting.
Brent Ozar: There was a good Twitter thread with Pedro Lopes of Microsoft. Someone was asking – they were like, I don’t really understand why I would upgrade this often and Pedro banged out like four things that are, like, performance, supportability, compliance, and I forget what the fourth one was…
Erik Darling: So does he… So how about that second part, physical versus VMs; what do you opt for? What’s your first choice?
Brent Ozar: Man, I believe that every SQL Server should start as a VM; like a four-core 6GB of RAM VM. And then only add more when you believe you have a performance problem that you can’t fix by changing the queries or indexes. But we’re in kind of a weird business. We’re in this performance tuning business where people come in when they have serious performance problems. Nobody comes in when they’re happy and they’re like, things are great here, we just like hanging out with y’all. So there are a lot of times when we’ll see people on giant metal, 40, 60, 80-core servers. We’re dealing with a couple of 80-core VMs in Amazon the other day.
Erik Darling: The X1E, yeah, 128 cores, 2TB of RAM…
Brent Ozar: And you hit that size and you’re like, why are we not on bare metal, because there’s a couple of advantages you get with bare metal. You get different NUMA handling if the guest and host map up beautifully. You also get the ability to run local solid state for tempdb, which you can do in the cloud with stuff like an X1E. But when you’re running on-premises and VMware and Hyper-V, sysadmins are usually really hesitant to let you do local solid state for tempdb. So people are like, we can put a million dollars in the shared storage and make it perform really fast. Well yeah, I can strap a jet engine to a Honda Civic, but that doesn’t mean it’s a smart investment.
Erik Darling: And you know, for me, it all depends on where that server is coming from. If I have an existing server that’s already a big beast and I need to put it on something else, I don’t want a virtualized big server because you run into a thing where you have this one guest that dominates the host. You can’t have other servers on there with it. You can’t move it to other servers because they have a bunch of other stuff on there. Part of the beauty of VMware is the manageability stuff where you can put different VMs on different hosts, balance things out. As soon as you have that one big boy hanging around, you kind of lose a lot of that. You get that one to one guest host mapping. No one says, that’s my virtualization dream. Plus I hate VMs.
Brent Ozar: It only gets worse in the cloud.
Erik Darling: Alright, so Greg says – and clearly, they have reached a rocket science-y point of tuning this VM – he’s been recommended to disable TCP chimney offset in VMware to improve SQL Server performance.
Tara Kizer: Goodness…
Erik Darling: By god, you’ve done everything else haven’t you?
Brent Ozar: Somebody read a book from 2006. [crosstalk 0:08:33.5] 2006 rings a bell. We did that back in 2006-2006. There was a whole deal with TCP chimney offsets. I would guess that somebody just has a really old best practices. I would be like, if you want to do that, that’s great. Let’s just measure before and after, make sure it makes a difference. And when it doesn’t make any difference, you’ll identify they don’t have pants on.
Erik Darling: Yeah, I’d also press him and say, what metric do you expect to see improve when you make this setting? What’s going to get better? What’s going to get faster? It’s like that disk offset thing in 2003. It’s like, you’re off by 1MB, you bastard, you’ll never work in this industry again…
Who should do the SQL Server backups?
Erik Darling: So Thomas asks, “Have you ever heard of anyone not doing backups in an ERP system and letting another program like ArcServer take the backups?” He says, it sounds like they’re taking a backup of the computer and they’re not able to run CHECKDB on it. So he’s concerned about the recoverability of those backups, the integrity of his data, and wants to know if that’s a good way of doing things.
Tara Kizer: I’m just so confused.
Brent Ozar: I’ve seen people do flat file backups, like Symantec had a backup exec or something a while back that would purport to backup open files; involved VSS snapshots and all that. But the easy way for me is to just ask for a [inaudible]; I want to restore. I want to restore as of this time. You know, whatever, yesterday as of 5 pm, and you just fake it. You go, I dropped a table yesterday at 5pm, can you go ahead and restore the database for me to another server? I’m going to pull the contents across…
Erik Darling: [crosstalk 0:07:51.1] Just say you changed the stored procedure. Don’t say you lost it.
Tara Kizer: That would get escalated quick.
Erik Darling: This guy, Thomas…
Have you used SQL Server on Linux?
Erik Darling: Alright, so Mike asks, “Have you used SQL Server on Linux much? We’re going to start using it soon because our vendor uses Linux for some other apps.” It seems pretty decent to him, but he wants to know if we have any thoughts on it.
Tara Kizer: Have you guys installed it? I don’t think any of us have.
Erik Darling: Ask me if I can install Linux…
Tara Kizer: I know…
Brent Ozar: The release notes were enough that it scared the pants off – I’ve said pants like two questions in a row – scared the pants off of me is what’s not supported…
Tara Kizer: Everything…
Brent Ozar: Holy cow, there’s a lot of stuff that’s not supported. Yeah, I’m going to have to go back down. Replication wasn’t supported…
Erik Darling: Well thank god. That’s a feature, buddy.
Tara Kizer: [crosstalk 0:09:58.0] do an insert, update, delete, and select, you’re good to go. If you need any feature, you’re not getting it.
Brent Ozar: I’m going to have a tough time finding it now on demand, but I would just go through and make sure it actually supports the features that you want. I tried it on Docker containers and there were problems around parallelism, for example. Queries could only use one core. I was like, okay, I’m about enough here. That sounds cool, but I’ll let somebody else dip in.
Erik Darling: That’s a good time to bail out.
Brent Ozar: Yeah, it doesn’t solve a problem I need to fix.-
In which Richie trolls Erik
Erik Darling: Let’s see here. Another Erik – I can’t figure out if this is someone’s last name…
Tara Kizer: I think they just love you…
Erik Darling: Alright, Erik Lover, we’ll see what happens.
Tara Kizer: It’s got the K, I think he loves you.
Erik Darling: Yeah, because a sign of true love is that someone spelled my name right. I know that because my mom doesn’t spell my name right. But Erik Lover has an Availability Group they set up using 2014 and Windows Server 2012. There are three nodes and it’s running as a managed service account on each node. “Can I have Erik’s old desk…” You got me good there. Is that Richie?
Will clearing the plan cache fix my problem?
“We’re experiencing performance issues during a mass run at midnight?” Well, that’s the problem; no one’s there watching the server. “CPU peaks around 98%. The plans in the cache around 40,000. Single plans in the cache is around 18,000. Will clearing the plan cache solve my performance problem?”
Brent Ozar: If queries are already running and they’re using 100% CPU, you’re kind of screwed there. If there were new queries coming in and you had a bad plan, like the queries kept coming in, then you could conceivably clear the plan cache and get new plans coming in. But I wouldn’t do it by clearing the plan cache. I would go look at – if you search for parameter sniffing. Actually, just go to brentozar.com/go/sniff. This would be the first thing that I would suspect it would be. And on that link, we have a video that I did at a SQL Day in Poland explaining how to identify parameter sniffing and why freeing the plan cache works temporarily, but just not long-term.
Erik Darling: You know what – I’d say if you have that many plans coming into the cache, you might want to look into a setting called forced parameterization, because it will make parameter sniffing worse, but it will reduce the number of plans in the cache, so you’ll have fewer of those to look at, which is nice too. There’s all sorts of upsides to that. Because it’s a job security thing, because now you’ll have a parameter sniffing problem to fix, and you’ll have solved the problem of all the plans in the cache so someone will think you’re really, really smart. And you’re smart enough to fix parameter sniffing, so it’s a two-part approach to getting a good raise, I think.
Brent Ozar: So we also have a page on there, so if you search for why multiple plans for one query are bad, it talks a little bit more about how you get into forced parameterization.
Erik Darling: She also adds that page life expectancy is up and down, which yes, it tends to do that. It tends to do a little up-jump-boogie when memory gets different levels of usage.
Brent Ozar: Even when the server is fine it does that. it’s not necessarily an indicator of problems.
A Microsoft presenter recommended sp_BlitzIndex
Erik Darling: Someone sent a bunch of Os, I don’t know what that means. Here’s one from Chris Good, or Chris Bad, maybe, I don’t know, Chris Somewhere In The Middle, “I took a performance tuning course from Microsoft last week…” That’s not good, “The presenter suggested we look into sp_BlitzIndex…” Holy cow…
Tara Kizer: Was it Connor? I think it was Connor.
Erik Darling: It had to have been Connor.
Brent Ozar: Wow, that’s kind of cool.
Erik Darling: Yeah, I want to hang out with that instructor. They know their goods.
Tara Kizer: Yeah, send them some cookies.
Erik Darling: Yeah, right.
Brent Ozar: I really would love to know who that presenter is. That’s amazing. I remember the first time – we used to capture emails every time someone downloaded the First Responder Kit and when I started seeing Microsoft PSS people, like customer support people downloading, I was like, yes, we’ve made it; we’ve made it.
Tara Kizer: That’s awesome.
When does RCSI not get used?
Erik Darling: Alright, let’s see here – sort of a crazy question, “What are the cases when RCSI does not get mapped using SQL Server 2017 Enterprise Edition? Is there any doc or whitepaper?” I’m not sure what you mean by mapped.
Tara Kizer: Well because, you know, my client last week, I think it was – all the days are running together now, but I think it was last week’s client – they had RCSI and we still saw blocking and the select query that was causing the blocking was read committed. It hit none of the limitations. It had none of the isolation level changes, table hints. I mean, none of us could figure out what was going on there. So I don’t know if there’s ever a time where the mapping – you know, map it from read committed over to RCSI – we could find nothing that was escalating above read committed to cause what we were seeing.
Brent Ozar: And there are lists of things that will stop at select. If you put index hints or lock hints in your code, life if you say with serializable, you say with NOLOCK.
Tara Kizer: And I think that client was on SQL 2017 Enterprise Edition. I’m pretty sure that was the client. I know I had a recent 2017 client. So I wonder if there’s a change in 2017, you know?
Brent Ozar: Yeah, I would open a support case just because there was the 2016 service pack – I want to say it was SP1 CU4 – that broke NOLOCK. It suddenly started taking out locks, so it is possible. It’s happened before.
What are the downsides of indirect checkpoints?
Erik Darling: Alright, so let’s see, Rakesh asks, “What are the downsides of indirect checkpoint? We migrated to SQL Server 2016 and the new databases are using it but the migrated databases are using automatic checkpoints.”
Tara Kizer: What the heck are indirect checkpoints? When you’re actually issuing a checkpoint command?
Brent Ozar No, that was – I remember hearing about this from like Conor or something.
Erik Darling: Yeah, I don’t remember anything about it because it was one of those things, like yeah, okay, when will that come in handy?
Brent Ozar: I would honestly just Google. We don’t touch it either. We haven’t heard anything good, we haven’t heard anything bad. It’s just one of those. We all make the same gas-face as we’re reading that…
Tara Kizer: Like why would you ever change it?
We’re thinking about enabling Query Store…
Erik Darling: Well in the meantime, Zoe-Ann asks, “We have over 20 databases…” Holy cow, someone went on a spree, “We’re on SQL Server 2017 and we want to turn on Query Store. We understand that it should generate about 5% overhead. Are there any suggestions you have related to Query Store settings or any overhead that you’ve seen?”
Tara Kizer: I’ll tell you a little story about a recent client [crosstalk 0:16:59.2]. It might even be the same one, I don’t know. It was 2017 and they enabled it and the retention that they set up in the Query Store was one year and they had a failover occur through Availability Groups due to patching or whatever it was. And they were down for several hours. And it had to do with Query Store, because they opened up a case with Microsoft and they needed to put a trace flag in place. In the end, the whole issue was because of the many plans for one query. They had ad hoc queries; thousands of plans per query. And so they have, I don’t know, 100,000 plans hourly and they’re distinct plans, so all of that’s getting dumped into Query Store. So Query Store, I mean, tons of overhead because of their environment, so make sure you don’t have this many plans for one query issue, and primarily ad hoc plans.
Erik Darling: So for me, there’s a setting in Query Store, which I actually just wrote a blog post about that hasn’t gotten pushed out yet. But the setting is in the plan collection portion. And it lets you choose – there’s a setting that says collection something, and you can choose all or you can choose another setting called auto.
Tara Kizer: Auto, that’s right.
Erik Darling: And when you choose the auto setting – auto, not otto – when you choose the auto setting, what happens is SQL Server doesn’t go and collect every single plan that comes in. It only collects ones that meet some certain threshold, like capture mode. So if you change that to auto, then you’ll only get plans that use a certain amount of compile time, a certain amount of CPU, have a certain amount of executions. Before that, it just doesn’t’ stick any old thing in there. Whenever you turn on Query Store – I was experimenting at trying to write sp_BlitzQueryStore – I’d always find these ridiculous Microsoft queries in there. I’m like, why are you showing me this? This is not anything that I would ever want to look at in Query Store. Show me my bad [crosstalk 0:18:55.8]. Pretty crazy.
Brent Ozar: Yeah, so Erin Stellato also has a post – she just happened to publish her most recent one – about the overhead of Query Store. The other thing that I would say too is make sure you’re on the most recent service pack and cumulative update because man, this is one of those features that has had a lot of patches pushed to it.
What are the most valuable SQL Server tools?
Erik Darling: Alright, we have a question from Brandon who is a brand new DBA. He’s eight months into his gig and he wants to know what tools you think are the most valuable to have in your SQL toolkit for working with SQL Server and doing…
Tara Kizer: Has he heard of us before?
Erik Darling: Interesting question.
Tara Kizer: We have tools.
Brent Ozar: Ah yes so we needed those tools desperately. Every not and then, we talk about [inaudible 0:19:50.5] suck if we didn’t have them. If you click on scripts up at the top of our site, we have a whole barrage of those things. You can download them all in one chunk. If you made us each pick one, I think my favorite one would probably be sp_BlitzCache, just to rapidly point out what the problems are. That would probably be my favorite if you only let me live with one for the rest of my life. What about you, Tara? Which one would be your favorite?
Tara Kizer: Well, does it have to be one of our scripts?
Brent Ozar: No…
Tara Kizer: I think sp_whoisactive, because I can do a lot with that.
Brent Ozar: And so that’s over at whoisactive.com. And this is one of those things where there’s a download, you go install it and you run it and you think you know everything. Oh no, no, no. Go into the documentation and it’s insane. It does so much good stuff.
Tara Kizer: If it has to be one of ours it’d be, I think, BlitzFirst because normally, as a production DBA, I am troubleshooting an active performance issue and so BlitzFirst, you know, can really lend a hand to figure out what’s going on.
Brent Ozar: Erik, how about you?
Erik Darling: For me, you know, I’m just going to give a shout-out to a different group of people; dbatools. I don’t work with PowerShell. I thank god every day that I don’t have to work with PowerShell because they’ve written things that do things in PowerShell. You can do – we’ve talked about stuff that helps you troubleshoot performance and query plans and indexes and server liability. This is like bread and butter DBA tasks; moving databases, moving users, setting stuff up, stuff that I don’t ever never in my entire rest of my life want to do again. So thank goodness these people, A, work with a language that I just don’t get along with – I am incongruous to this language – and they have written tools that I don’t ever want to have to remember syntax for, look up, think about. So I think they’re a pretty cool bunch even though me and PowerShell have our disagreements.
Brent Ozar: Somebody said the other week that they were like, hey it’s been a while since Erik’s done a live blog of trying to use PowerShell. We kind of miss that. We should do that again.
Erik Darling: No one said that, did they?
Brent Ozar: They did, they did. It’s funny.
What’s better: NOLOCK or READ UNCOMMITTED?
Erik Darling: Alright, [Chetan 0:21:57.5] asks another question, “It might be a bit of a beginner question, but which one is better for performance; NOLOCK or set transaction isolation level read uncommitted?” Take it away, Tara.
Tara Kizer: I was going to reply just neither, you know, end of story. It is not the turbo button. They’re identical, you know. One of them is just put it at the top of your script and the whole thing is now read uncommitted, the other one’s at each table level. They’re identical. NOLOCK is read uncommitted just at the table level. Don’t use it. I mean, if you think that this is a performance improvement, you know, you need to figure out what the root cause of the blocking or whatever contention issue you’re having and fix that. Read uncommitted and NOLOCK is not safe for production systems where data accuracy matters. I had a client a few weeks ago that – the day three call where we deliver the findings – there were some people in management in the room – this is actually during day one – and everything said NOLOCK. And so I asked the question, does data accuracy matter on this system? And the management people are like, hell yeah it matters. And I was like, well you can’t do this then. This can give you completely erroneous data. It is not the turbo boost. Use RCSI instead. It’s safer. If you have to go the isolation level route…
Erik Darling: The thing about NOLOCK or read uncommitted, no matter which one you choose, is everyone thinks that it means that your query isn’t going to take locks out. What it really means is that your query isn’t going to respect locks taken by other queries that are trying to modify data. And you can read all sorts of just awful nasty stuff that Brent’s going to show you with this lovely typing demo.
Tara Kizer: Is he going to be able to do it fast enough?
Brent Ozar: I think so. So over on the left-hand side, I’m going to run an update. Over on the right-hand side, I’m going to say select count star from DBO.users with NOLOCK. And so I’m going to, on the right-hand side, run a count at the same time on the left-hand side I am updating the table. Notice that I am not, over on the left-hand side, changing the numbers of rows in the table. It’s the same number of rows in the table; I’m just changing the contents of one of the columns. So every time I run a select count star, you would think I would get exactly the same rows. So here’s a quick rundown. I’m going to make sure I don’t have any indexes too because I’ve been playing in this database all day. So count star from DBO.users – so there’s about two and a half million rows in here. If I run it 10 times…
Erik Darling: Not 24 million…
Brent Ozar: Not 24, just two and a half; small number. Now over on the left-hand side, I’m going to do the update. At the same time on the right-hand side, I’m going to go run the select star and my numbers are all over the place. That’s kind of messed up because I’m not even, in this case, going to commit. I’m going to stop while the update’s running, and yet still the count star just sees different numbers of rows. So that’s pretty hairy.
Erik Darling: Crazy, yeah, don’t like that one bit.
Tara Kizer: Fix your queries, add indexes, you know, fix your table design, maybe use RCSI if your system can support it. In my opinion, NOLOCK read uncommitted should not be used except as a DBA or developer querying production data in Management Studio. Then I use NOLOCK because I don’t really care about the results.
Erik Darling: Only use NOLOCK in incognito mode, or else. People will see and you’ll get really embarrassed.
Brent Ozar: When you really don’t care about the quality of data – like if you’re running Twitter or Facebook and you’re just showing people cat pictures, then sure, but how often do those people put that in SQL Server? You don’t put stuff like that in SQL Server. Alright, well thanks, everybody for hanging out with us this week at Office Hours and we will see y’all next week; adios.
Erik Darling: Goodbye.