This week, Erik and Richie discuss performance tuning, relativity, Always On Availability Groups, parameter sniffing, technical debt, unit testing, deadlocks, testing storage performance, and more…
Enjoy the Podcast?
Office Hours Webcast – 2017-07-19
Erik Darling: James says, “Thank you for your DBA fundamentals videos.” You’re welcome, James. It’s a pleasure recording them. I have to re-record one on setting up Ola Hallengren’s scripts this week, so there’ll be another one coming down the pipe in the near future.
Can I trust the index usage DMVs?
Erik Darling: [Tashea] wants to know, “Can I trust sys.dmdb_index_usage_stats to drop indexes which are not used? What else resets its readings other than a SQL restart?” Well, there are a whole bunch of wonky things that happened in certain versions of SQL Server 2012 and 2014 where the behavior is different between service packs and CUs. There are certain ones, like I think almost all of 2012 until SP3 and probably 2014 up until SP1, where if you rebuild an index then that will clear out all the usage stats. And up until now, on many versions and editions, if you add an index or drop an index, it will also reset some things. So be very careful with that one, especially if because even if your system has been up for a long time, you may have rebuilt indexes, you may have added or dropped indexes, and there also might be really important indexes sitting around that just don’t get used frequently.
So one thing that I always caution people about is, you know, always remember to disable your indexes, don’t drop them. Because if some query comes along that runs, you know, for the super important guy who sits in an office that runs every three months or six months at some quarterly or whatever financial report or something, and there’s a very important index for the query that you just disables because it had one use in the last three months; well, you could be in trouble. So make sure you don’t drop indexes, make sure that any indexes you choose you just disable them.
One really good way to trend that kind of usage is by using sp_BlitzIndex. It’s a free script that we write. It’s available over at FirstResponderKit.org. If you head over there, you can download it for free and run it for free on as many servers as you want, for free. And it will give you the usage stats and you can copy out to an Excel file and kind of keep track of usage over time. You can also persist it to a table, if you’re that kind of person… I don’t know, maybe you are – maybe you’re not. Anyway, I hope that answers your question.
Richie Rump: Yeah, we call those people hoarders. Or Hordor, I don’t recall…
Erik Darling: I don’t know, Mordor hoarders, The Hoarders of Mordor? That would be a good show, The Hoarders of Mordor. Just Orcs with like a pile of bones, like this is mine…
Richie Rump: Just this random weapon and armor, just throw it all over the place…
Erik Darling: It’s like my office, it’s just garbage everywhere.
Richie Rump: Yeah, it’s just body parts, you know, maybe they categorize them, you know, and they’re just in different places.
Erik Darling: Yeah right, it’d be weird.
Why is CXPACKET my top wait percentage?
Erik Darling: James asks, or says, rather – says then asks, “My prod server has 12CPUs and I have my MAXDOP set to eight and cost threshold set to 65, however, I still see CXPACKET has 49% of the overall wait stats on my server; why?” James, you should know better than to tune wait stats by percentage. You’ve been to Brent’s classes, you know these things. You don’t tune things by what percentage they are. You should revisit some of Brent’s lessons on that. For the benefit of the group though, it is absolutely natural to have parallelism on your server, unless you’re running SharePoint or something else that forces you to set MAXDOP to one.
Trying to stamp out all evidence or all percentages of parallelism is not a terribly good idea. You still have parallel queries on your server because you still have queries where the serial plan has a cost of greater than 65, and those queries go parallel because the parallel plan was cheaper. That’s why you still have CXPACKET waits. Who cares if it’s 49% of the waits on your server? It all depends on what the total waits are and other fun things like that.
Should I use RCSI for kCura Relativity?
Erik Darling: Let’s see here, Jack asks, “Do you have clients successfully using RCSI with Relativity or Tunnel Vision?” Well I’ll be dead honest with you, I’ve never heard of Tunnel Vision. For Relativity though, no, and the reason for that is because every single Relativity query, behind the scenes, is generated with NOLOCK hints. And when you have query level hints like that, they override database level settings like snapshot or recommitted snapshot isolation. So it wouldn’t do any good to turn those on and overwhelm tempdb with all the modifications that come in only to have all your read queries use NOLOCK anyway. You would really need to do a lot of app redesign to benefit from having NOLOCK on there.
A long time ago, when I was a DBA at a Discovery shop, I opened up a [table] with Relativity and I was like, “Hey, how come you guys don’t use RCSI instead of just NOLOCK hints?” And at the time when the app was designed, and it’s still true to this day, many shops that run Relativity do not have a DBA or a DBA team or anyone really paying close or good attention to those servers in any way. If you just – if you install SQL Server and click next a whole bunch of times on 202, 2008, 2008 R2, all the older versions that were around when Relativity was first getting started, you didn’t have all those cool options and features kick in that you do with 2016 and up. So you wouldn’t have the installer going and creating multiple tempdb data files for you, you didn’t have anyone adhering to server best practices by not putting tempdb on the C drive. Just, you know, regular DBA stuff.
So, just the amount of work that has to go into supporting something like an optimistic isolation level is not trivial to a shop that does not have the correct staff for it. So, no is the answer and that’s the reason why; fun stuff, right? Fun stuff right, Richie?
Richie Rump: If you say so, sir…
Erik Darling: Are you enthralled yet? Are you riveted?
Richie Rump: I actually – someone put a link out for the collector’s edition of Star Trek Adventures the RPG and it’s like $500. It’s amazing but it’s like $500, and no…
Erik Darling: Well, I don’t know [crosstalk]
Richie Rump: Can I expense it?
Erik Darling: If it’s office equipment, I don’t see why not. I mean, what’s left on your hardware budget? You might as well…
Richie Rump: Not much…
Erik Darling: I got like 500 bucks left, I’m thinking about getting one of those desk-risers so that I can have like a fake stand desk, but I just don’t know how that’s going to work with the wires. I’m kind of terrified of it. I’m like oh if I hit the button everything’s just going to tip over and unplug and it’s going to be…
Richie Rump: Well what you could do is, you could get a mount for your Hoss and mount it underneath the tabletop.
Erik Darling: For my what?
Richie Rump: For your Hossmachine, and like mount the machine underneath the desk, and that way you could raise it and lower it.
Erik Darling: I think I’m just going to get real strong and lift the desk up when I’m ready to stand.
Richie Rump: That’s too close to a table flip, man. You’re just going to be like… I just finished a job.
If a sync AG replica goes down, does the AG switch to async or go down?
Erik Darling: Steven asks a question about availability groups, which I will answer to the worst of my ability. “If Always On sync goes down, does it automatically change to async and just start filling the log, or does it go down?” There are a lot of weird things that happen when an availability group goes down that are different between 2012, 2014 and 2016. So in 2012, it’s like if one goes down, the whole thing goes down if quorum gets messed up.
On other versions, it could stay online, but either way, I think what you’re asking is if it stays online and it stays in good shape. But the answer is kind of no; you want to get things up and running as soon as possible. You also want to make sure that log backups, if it does failover – if it does failover, you want to make sure you get log backups going on the new primary. If the primary just goes down and you’re working off replicas, well you should work on getting that primary back up or making one of the replicas the new primary so that your AG is complete. That’s all I have to say about that. If Tara were here, she’d probably have a better answer. That’s my patchwork answer.
Richie Rump: Definitely more complete, sir.
Erik Darling: I don’t know what that means.
Richie Rump: I don’t know either, but it’s definitely better than my answer.
What is parameter sniffing?
Erik Darling: “What is parameter sniffing and how does it work?” God, well that’s the funny thing, is that it doesn’t if it’s parameter sniffing.
Richie Rump: No, it works then it breaks, right?
Erik Darling: Parameter sniffing is when you have some code that accepts parameters. And on first compile, that code executes and compiles with those parameters. And then on subsequent executions, that code repeats as if it’s received the same parameters, even if the parameters are different. So, that is parameter sniffing and that’s how it works. If you have a stored procedure, and let’s say it takes a date – let’s say it takes a start date and you say – let’s just say it takes a date for an equality. And you run your stored procedure and it brings back everything in the table that is equal to that date, and that date was 2017/7/19, right; so that’s today.
And then the next time that stored procedure runs, it runs and it runs as if – with that date you pass in as 2017/7/18, so that was yesterday, and all of a sudden you have to bring back a lot more rows because you had a lot more time to insert data yesterday. Let’s say you had a million rows on the 18th and you only have about 20 rows on the 19th. So that stored procedure that compiled and ran expecting to get 20 values back, all of a sudden, has to go and pull a million values back. And when it pulls that million values back, the execution plan that it uses to do that may be inadequate because a million is a lot more than 20. So there may be a case where a different execution plan would have been more appropriate for a different value. Say for a small amount of rows, you do a key lookup, you use a non-clustered index, you do a key lookup, you get some columns back from the clustered index. For the million rows, it just may make more sense to scan the whole clustered index and get the rows back that way. So it’s all out there in fun land – because it happens every single week because we talk about it every single week, I’m going to stick the link up there for it, Erland Sommarskog. So the man himself, or his self, depending on how you like to talk, has a blog post called “Slow in the Application, Fast in SSMS.” And what it does is attempt to explain to you parameter sniffing in biblical blog post fashion. If you are really interested and that is not where your question ends, I would read that entire blog post and see where it gets you. If you have more specific questions, come back and ask.
Richie Rump: So, if I did run into some parameter sniffing, what type of things would I do to fix it? See, this is what they call, in the business, a softball. Hit the softball.
Erik Darling: Okay, well if I was really crappy at my job, I’d restart SQL Server. If I was a little bit less crappy at my job, I would probably clear the entire plan cache. If I was slightly less crappy than that, I would probably clear that plan out of the plan cache. If I was a little bit less crappy, I might recompile, or throw a recompile hint on the stored procedure or the statement in the stored procedure that’s causing my parameter sniffing. If I was much less crappy, I might take a look at my indexes and try to figure out what a good index is to take bad choices away from the optimizer, regardless of the value that gets passed in. So, perhaps making that nonclustered index covering, or rearranging the key values in a non-clustered index to make it more aptly used, no matter what.
There are all sorts of things that you can do down the line including rewriting the query to need less columns to begin with and do all that sort of fun stuff.
Richie Rump: Yeah, what I found with parameter sniffing is sometimes the answer is a recompile, maybe it just doesn’t run frequently enough and, you know, you throw a recompile on it and it’s just not going to tax the system and you’re good to go. Sometimes it is the index, I mean, you just – when doing tuning, I’ve always gone, what’s my output goal? Right, you could get lost in tuning land and just get – oh I need to get it further down, further down, further down, but what’s your exit criteria? So you say, hey, the exit criteria, for me, is that it’s got to run another second and it doesn’t raise these metrics up this much. And when you hit that, you leave and you go onto the next problem.
Erik Darling: That sounds like a good plan to me.
Can I set isolation level in my connection string?
Erik Darling: So there is an Entity Framework question that I think you should take while I get a link for Kevin.
Richie Rump: Yes, so the question is, “Hi…” [crosstalk]
Erik Darling: It’s hi with an ellipsis afterwards, so it’s more like hi…
Richie Rump: “We have an Entity Framework application running on SQL Server 2014 Enterprise. I wanted to know if it’s possible to set the transaction isolation level at the connection string level.” So, I don’t believe so. I don’t believe that is the case. I’ve never tried it; I don’t recall ever seeing any parameters in the connection string for setting the isolation level. I’m pretty sure on EF6, the default isolation level is read committed snapshot; don’t quote me on that. I haven’t done much with transactions in Entity Framework proper. A lot of transactions that I’ve done in my career have been in queries themselves, and of the times I have used transactions, it’s been in code in Entity Framework. I don’t believe that is the case… [crosstalk]
Erik Darling: I think I’ve seen people use, specifically for snapshot isolation, a separate connection pool that is set up to use that isolation level rather than go in with a default…
Richie Rump: Yeah…
Erik Darling: But I don’t know how to do that because I’m dumb at Entity Framework.
Richie Rump: Yeah, and you could totally do that. Say hey, for these queries, go and run this isolation level, right, but you need to know what those queries are. It’s not going to be a [crosstalk]… Now I could totally be wrong, got to Stack Overflow, ask the question there, that’s probably a better place for it. You could go to, you know, #EFhelp. I think Julie Lerman monitors that one. So as far as I recall, I don’t think you can, and if you need that kind of stuff, you need to change code. So there is not an easy, oh just let me change a connection string and not the application, I’m pretty sure you’re going to need to change the application.
Erik Darling: Or just throw it out and start a new one; that’s what I’d do.
Richie Rump: Listen, Entity Framework, it’s not broken, the way we use it as developers could be broken.
Erik Darling: Is usually horribly broken…
Richie Rump: Yes, and it’s just – there’s nothing different than what we see with SQL Server. We see all these bad implementations and we see all the breaking of best practices and we see all this crazy stuff going on, especially here at Brent Ozar Unlimited, because we see all these different things. And maybe you’ve actually walked into a new organization and you say, why would you do this and why is your backups here and where are your logs? Well it’s the same thing in developer land, you know, we just don’t understand our tools and we just start throwing things out there – and, hey look it works on my machine, it must work out there in production.
Erik Darling: Well the worst part about it is the overwhelming pressure to ship often gets in the way of doing things correctly, designing things correctly, you know, making sure that you have the right query, saying, okay well maybe this isn’t a good idea to run this 10,000 times a minute, perhaps we should wait three or four weeks before we ship this and invest in some sort of caching layer for this query instead. But that sort of stuff all gets shoveled by the wayside because you have some jerk sitting up top who wants you to ship, ship, ship to make someone happy.
Richie Rump: Let’s get it out, we’ll mark it as technical debt, we’ll deal with that down the road, just go, go, go. And the problem may be bigger than you think because, well, we don’t think very hard on the problem and what could happen down the road. So we don’t take a look at the risks and do a risk assessment and say, hey if we don’t do this it’s going to cost us this much money down the road because we don’t equivalate risk to dollars. And that’s, you know, a lot of problems with projects that I’ve seen is that no risk assessment to dollars; but it’s hard to do, frankly.
Erik Darling: The thing about technical debt is, it is a lot like regular debt and everyone wants to spend more instead.
Richie Rump: You know, the other thing about technical debt is that it’s very hard to quantify, right. I can’t put a dollar amount to it because it’s all – ooh it’s technical debt, you know, and how much will that cost to fix and how much would that cost to throw it down the path; nobody does that. Nobody says, oh if we make this technical decision, how much will that cost us in the long run? Because we don’t equate dollars to technology outside of, I need to purchase something.
Erik Darling: It’s like, you know, how much hardware can I mortgage to pay off my technical debt? But if I buy 512 more GB of RAM, the problem goes away. Well alright, guess I’m going to buy…
Richie Rump: You know – and from a developer perspective, there are certain things that we could do to, kind of, alleviate some of that, right. I write unit tests, okay, if I change something it will let me know if that is, if I have the right test and all that other good stuff, but even writing something simple as, you know, in 2017, as writing unit tests, that gets kicked on the curb too as, oh we’ll just do that later. And well, now we’ve got 15 people testing this thing, you know, and we make one change and everything breaks, you know. Well, that’s what you’re going to get.
Erik Darling: Yeah, well I guess for me, the thing is, it’s like okay, you want to write all these unit tests, but man, with a database where you have to have so many things to come up with a condition, or to reenact a condition, to make sure – it’s tough for some things. There are some really hard conditions to replicate. I’m not saying all of them and I’m not saying there’s not some laziness behind it, but there’s this perfect storm of, okay how do I replicate this particularly low memory condition with these parallel queries doing this? And figuring out how to write a single repeatable test for that once just to unit test this one thing over. It’s like, man…
Richie Rump: Yeah, unit testing queries is exponentially harder than unit testing code. It’s because we’re dealing with data, now I got to worry about setting up my data, I got to worry about running and I got to worry about what my result is. And doing all that is so much harder than saying, I’m just going to go ahead and fake this and mock this and do this and now I’ve got a result. When I’m writing unit tests, and frankly I’ve never done it with queries because it is so stupid hard [crosstalk]… I was on one project where I did it, and we actually had to write our own unit test, kind of, module to do that because there was no tool out there to do this kind of thing. But still, you had to load the data, run it, and then you had to see what the expectation is and you had to do that, and it took forever. It’s one of those things – I still follow the 80/20 rule with running unit tests.
Hey, if I could get 80% of this stuff and the other 20% is going to take me another day or two or whatever, I’m just going to punt on that stuff and I’m only going to get the stuff that’s going to get the main course on it, right. And let’s keep moving forward and if we run into a problem, I’m going to write a unit test for it and we’re just going to keep moving forward.
Erik Darling: Right, so it’s like the main goal is to get things working, and then you can catch the edge cases on down the line.
Richie Rump: Yeah, I don’t get paid to write unit tests, I write unit tests because I get paid to verify my functionality, not because I want all these unit tests to do that. Writing a unit test is not fun.
Erik Darling: No, it doesn’t look like fun. Like I have a thing to get to in GitHub where I need to write code or stored procedures that will inflict upon a server various warnings that BlitzCache and BlitzQueryStore will flag, and I’m dreading that because there are so many things – I’m going to be like – a lot of these stored procedures are going to have to add and drop indexes or do something else awful or run for a long time to get these things to happen. Like there’s some stuff you see out in the wild where you’re like, oh I can easily write code that will catch that and I can do that on the spot because this condition exists right here where I need it, but recreating that condition sucks. Databases are hard. People should get paid more for working with them.
Richie Rump: I know, I know.
Erik Darling: You hear that, Brent? People should get paid more for working with data.
Richie Rump: I’m very happy, sir. May I have another day.
(Brent writes: get back in the data mine, peons.)
Can transactions help avoid deadlocking?
Richie Rump: We’ll wrap this thing up here. So here’s another interesting question, “Can begin and end transaction encapsulating DMLs help avoid deadlocking?”
Erik Darling: No, no it can’t. No, it won’t help with that. You will have…
Richie Rump: Why?
Erik Darling: Because you will have just have things wrapped in a BEGINTRAN and ENDTRAN, it doesn’t matter. It does not matter, it won’t help you. There is another question…
Richie Rump: That question fell flat. [crosstalk] I thought there was more of a story there, but I guess not…
Erik Darling: No, no, no BEGINTRAN and ENDTRAN, all that does – I mean, that will isolate a single transaction, it’s not going to help you with other transactions though.
I have this problem with Idera…
Erik Darling: Let’s see here, “Idera SQL monitoring tool and services…” Dude, open a ticket with Idera. Do I look like Idera tech support? You pay them money, probably…
Richie Rump: Now that you mention it…
Erik Darling: You probably pay them money; you should open a support ticket with them. I have no idea why Idera would stop working. I have never used Idera, so I couldn’t tell you there. Couldn’t tell you on that one.
How should I test storage performance?
Erik Darling: Let’s see here, Uday wants to ask, “Do you have any recommendations on how to test storage performance?” That’s a good one. CrystalDiskMark is a good tool; it’s free. And also DISKSPD is a good tool as well. Those are the ones that I would go to there.
Richie Rump: So is there – I’m going to cut you off man, just cut you off. Is there an asset management tool besides MAP that reports on SQL Server versions… [crosstalk] Oh, look at that, now you cut me off that I cut you off.
Erik Darling: Yeah. [crosstalk]
Richie Rump: Don’t cross the streams, Ray…
Erik Darling: That’s a bad idea. So I think Redgate has some stuff like that. Redgate has some tools that do that. They’re not free though, at least not for more than two weeks; or not if you want to get all sorts of stuff out of them. I don’t know the names of all the Redgate tools, because they have a new one like every week and I’m not sure what they do past, like, SQL Prompt. There was SQL Octopus and SQL Lighthouse and SQL Crack… I don’t know, SQL Titanic – but they have tools out there that will do server discovery, I want to say, so I would go and check out “el Red Gatto,” the red cat.
Richie Rump: I know Kendal Van Dyke wrote a tool in PowerShell that does some of that stuff…
Erik Darling: I want to say I tried to use that at my last job to find stuff out when I first started and I did not have good luck with that. I don’t know if he keeps it up to date.
Richie Rump: It’s called SQL Power Doc, that’s what it’s called. Try it, it’s free. I mean, it’s only going to cost you your time, right. And if it gets you what you want, then you can move onto some of the paid versions – or if it doesn’t then you go onto some paid versions. I know he put a lot of work into it and I remember him talking about that like way back, I’m doing this power thing… And I’m like…
Erik Darling: Yeah, I mean, I just wonder what SQL MAP does differently than PowerShell. Like it probably queries the same things, you would think.
Richie Rump: Yeah, I don’t know. I know that – he was a consultant at the time and he’s moved on to Microsoft since – but he was a consultant and he would go into environments and he would have no idea where any of the server are or what is actually running, so he wrote this thing to actually go and find all these servers and figure out what’s on them, where they’re running and doing all this stuff. So if you’re looking for something like that, you know, that may be something for you. Try it and let us know.
Erik Darling: I mean, especially because he went to work for Microsoft, I’d be pleasantly surprised if it was still being worked on, still in development and all that stuff.
Richie Rump: What are you trying to say?
Erik Darling: I’m trying to say that Microsoft… [crosstalk]…
Richie Rump: Ready to be updated? Yeah, I know that, thank you, thanks, Erik…
Erik Darling: Yeah, we need to get those [segment] eliminated things from Columnstore in there because there’s a massive rush to the Columnstore mall, next to the Columnstore store. Everyone’s buying into that one.
Richie Rump: I’m not. Let them go, let them go.
Erik Darling: Alright, fine. Anyway, we’re at 12:46…
Richie Rump: Yeah, baseball’s already started so we’d better [crosstalk] so we can watch the baseball…
Erik Darling: Goodbye everyone, thank you for joining us, we’ll see you next week, hopefully, if I ever get some sleep. Adios.