This week, Brent, Erik, and Richie discuss small vendor apps, “Slow in the Application, Fast in SSMS?”, virtualization, licensing, msdb backup history issues, implicit transactions, how to convince your boss to allow you to use sp_blitz, and more!
Enjoy the Podcast?
Office Hours Webcast – 2018-10-17
Should I use an instance for each 3rd party database?
Brent Ozar: Donna starts out asking, “For small vendor apps, do you recommend spinning up a SQL Server instance for them so that you end up with multiple instances per server? Or do you recommend one instance with multiple databases for different apps, assuming you don’t have to give the vendor sysadmin rights?”
Erik Darling: I mean, if they don’t have to talk to each other, I’d probably want to keep them as separate as possible. Who knows, I’ve had different needs for settings. They might have some weird stuff going on. Maintenance stuff, who knows if one vendor app might support a newer version of SQL; one might not. If you want to move out to a new version, you know – you know, if things are truly different, put them somewhere else. Especially VMs make it so easy to separate stuff out like that without having a bajillion different pieces of hardware [server acts] and everything. I would much rather just do that.
Then, you know, that makes patching SQL Server easier. You don’t have to wait for downtime across a bunch of different apps. One of the servers goes down, you don’t have a bunch of apps out. There’s a lot of different reasons that I would keep things as logically separate as possible.
Richie Rump: You said virtualize and not a separate instance, right? Those are two big different things.
Brent Ozar: Yeah, and it does increase your management work. There’s more servers that you got to watch. But boy, I tell you what, the first time that you can get a restart on one of them without affecting anybody else, that’s kind of magical. Windows patches, same exact deal.
Brent Ozar: Garret follows up and says says, “But you need to license more VMs.” Well, so Standard Edition, you could do instant stacking inside one VM, but yeah, you have to license an entire host with Enterprise Edition if you want to run multiple VMs. Usually in most shops these days, you see people running VMs where the host is licensed with Enterprise Edition. They buy separate hosts just for their SQL Servers.
Brent Ozar: Pablo follows up and he says, “Do you consider contained databases?”
Erik Darling: I’ve never considered a contained database…
Brent Ozar: Only when they come up as questions. Man, there’s so many problems. The problem is, a lot of vendor databases aren’t going to support it. The vendor databases are going to require things that are outside of the scope of the contained database. They’re going to require linked servers, agent jobs, SSIS packages, et cetera. They’re going to be striped across three or four different databases. Contained databases is one of those things that seems like it works really well in theory, and then it just seems to collapse in practice.
Why am I getting a different plan in SSMS?
Brent Ozar: Jan asks, “One of our CPU costing queries is started via exec and it seems not to use an index due to a data type difference between nvarchar and varchar. I’ve simulated it in SSMS and it doesn’t seem to be the case. Why is it that I would get one query plan is SSMS and a different query plan through the application?” Three minutes, ladies and gentlemen. Three minutes before we hit the epic question.
So the thing to go look for is Slow in the App, Fast in SSMS by Erland Sommarskog. This comes up constantly, which is why you hear us kind of joking about the length of time on here. Erland’s written this epic post about parameter sniffing that goes into the reasons why you’ll get different execution plans from different places. This isn’t the only reason though. There are many possible reasons for it.
What advice would we give them? I would say, first off, in terms of the data type difference, which one’s varchar and which one’s nvarchar? Is it the table is varchar and the variable coming in is nvarchar? In which case, you’d want to make the data types match. But if the table is nvarchar and the incoming variable is varchar, that’s not the reason why you’re not using the index.
Erik Darling: A good thing that people should go to, that user voice item, the one I posted in … earlier about the reason why – adding the reason why indexes don’t get skipped.
Brent Ozar: Let me go find it in Slack.
Erik Darling: It shouldn’t be too hard.
Brent Ozar: It wasn’t that long ago. It was a really good link. Here we go – explain the scenario behind that…
Erik Darling: So our dear friend Josh was reading a blog post about a different database engine. I think it was RavenDB or something. And part of RavenDB’s execution plans is that they’ll tell you why an index wasn’t used. Like, the optimizer considered it and then said, oh, we didn’t use it because the key columns are in this order, it was missing some other stuff. So the gist of the user voice item is to explain in SQL Server execution plans why an index wasn’t used. Now granted, on tables where there’s a bajillion indexes, this might be a little painful to sift through. But on most normally indexed tables where people aren’t sure why an index isn’t getting used for some reason, too many key lookups, key columns in the wrong order, sort something, all the different weird reasons that SQL Server might not use an index get surfaced out in the query plan so that you at least have a reason from SQL Server why it didn’t get used and you don’t have to sit there and go, why didn’t you use it, damnit?
Brent Ozar: Jan says, “To clarify, I’m looking for proof that the index is not used when the query is run by the application.”
Erik Darling: I would use sp_BlitzCache and I would look in the plan cache for that query and then I would look at the query plan for that query. And I would look at what indexes I used.
Richie Rump: Plan cache…
Brent Ozar: Richie’s spending an absurd amount of time looking at execution plans lately. And it has parameters in there where you can filter for specific stored procedure names, for databases, all kinds of stuff, and then find your most resource intensive queries from there.
Erik Darling: Who knows, that query might not even be a problem. You might have much worse other queries, who knows?
Why isn’t my backup history getting purged?
Brent Ozar: Donna asks, “Sp_blitz reports that MSDB backup history is not purged. I swear I’m running sp_delete_backuphistory. What am I missing?”
Erik Darling: I guess I would look for job failures first. I think MSDB backup history was the one that I wrote a blog post about making fun of the code in. and it’s just so spectacularly bad that I’ve seen it A, deadlock, B, get hung up. I’ve seen multiple instances of it clash with each other; just all sorts of bad things. So there’s a chance that it’s just not running, that it’s failing, that there’s deadlocking out there from something. I don’t know, maybe you’re not running it frequently enough. I know that there’s some…
Brent Ozar: Filtering on parameters…
Erik Darling: Well I just mean in Blitz, I know that there’s something on how far back we check for history. I think it’s like two weeks or something like that.
Brent Ozar: I want to say, we only throw it if it’s more than 90 day’s worth of history.
Erik Darling: Yeah, but I forget exactly what that is. But it’s terrible code. There are all sorts of reasons why it may not be working.
Brent Ozar: Plus, whenever you see stuff in sp_BlitzIndex, whenever you see stuff like the MSDB history not purged, there’s a URL next to the warning. You can copy paste that URL, go into your browser, and we tell you what we’re checking, like the MSDB backup history is older than 60 days, so you can go run the same queries that are inside sp_Blitz. You can go look at the tables and see for yourself that the data is not being deleted, and then that helps you kind of dig into what’s going on there.
What should I do about implicit transactions?
Brent Ozar: Oh, Donna has another question. She says, “Sp_Blitz is reporting implicit transactions on a vendor app. Is there anything I can do to fix or work around this?”
Erik Darling: Well first, I mean, you should figure out if it’s actually causing an issue. We warn about that up front because there have been several cases working with clients where implicit transactions, for something as simple as a select, has caused really, really bad blocking chains; like monstrously bad blocking chains for hours. This is because when implicit transactions open, no matter what the statement is, it’s basically saying begin tran, do some stuff, and then I’m going to hang out and wait for the next thing to tell me what to do. And you just sit there with these open transactions and other queries are trying to come in and do stuff. Other queries might want to insert, update, delete data, other stuff going on. And they’re all opening these transactions and then they’re all finishing. Like, that statement will finish, but the transactions are still open. Any locks they took out are still open.
So first, what I would start with is to analyze the wait stats on the server where it’s happening. I would run sp_BlitzFirst with the since startup parameter. Look for lock waits. The first results set will be the wait stats and there will be a column of wait types, and there will be the category they fall into. Look for any locking waits you see in there. Bonus points, look at the average milliseconds per wait column and see that those lock waits are happening for a very long time. And that could be a sign that implicit transactions are there and causing a problem.
Implicit transactions on their own aren’t necessarily harmful. I wouldn’t design an app that uses them and I wouldn’t encourage anyone to use them for a particular reason, but them just happening, they could be very well cared for. They could be opened and closed quite responsibly, but that’s a good thing to know offhand, especially for us as consultants when we’re coming in and looking at a brand new server because we have no idea what kind of nonsense might be going on there. And with stuff like implicit conversions which can cause really bad blocking, it’s a good thing to know about.
Brent Ozar: Yeah, so here’s sp_BlitzFirst, since startup equals one. That first results set is your wait stats. What you’re looking for is any LCK waits that are in here in this wait stat column. Now, if it’s way down near the end, it may not be that big of a deal. But when you do see them in here, you want to move across over towards the right hand side, like Erik was saying here. And it will show you, here’s the average milliseconds per wait. Wait stats can be really cryptic too, so we include URLs over to Paul Randal’s excellent wait stats library over at SQLskills, where it will explain…
Erik Darling: Which he maintains manually…
Brent Ozar: Which he maintains manually. When he’s not doing the accounts payable for his company over SQLskills…
Erik Darling: He wrote every single one of those.
Brent Ozar: Yeah, so it’s excellent stuff there. So you can see right at a glance if they’re a big deal for you or not. The reason why we throw it too is that sometimes, you’ll go and spend all of your life doing index tuning or query tuning, not understanding why things aren’t getting better. And it’s because some yoyo is leaving a transaction open by default. It’s usually me more than you, Richie.
The other thing I would say is talk to the vendor and see, did you mean to have this setting on? Because, sometimes, they want to mimic the behavior of Oracle or somebody else’s default databases. Vendor databases are sometimes built to be cross-platform and may be trying to mimic somebody else’s database, not understanding that it’s a good idea over there, it’s just a really crappy idea over here in SQL Server land.
Erik Darling: And you know, those open transactions aren’t just an effect on locking. If the vendor application also wants to use an optimistic isolation level like snapshot or read committed, you know, you’re looking at the version store being alive for the length of basically any one of those because they’re all going to be doing, effectively, begin tran. So you’re looking at bloating up the version store pretty badly too while all those transactions hang out wide open. On top of the locking, there’s that and there are other, sort of, downstream, effects that I would probably want to look at too.
My manager refuses to use third-party scripts and tools.
Brent Ozar: Chris says, “My manager is against using any third-party stuff – procedure, jobs, whatever – doesn’t want me to install the sp_BlitzProcs. Same thing with Ola Hallengren’s jobs. Have you run into any customers not installing your procs due to compliance, security issues, et cetera? How can I convince them otherwise?” it’s so funny, I was just working with a customer who had the same reaction. And I said, well unfortunately, for us to get data out of SQL Server, we need to be able to query it and get lots of metadata out quickly. Would you rather write your own scripts from scratch, learn all the hard stuff? For example, we’ve got thousands of hours in sp_Blitz, BlitzCache, BlitzIndex.
Would you rather start and work now for thousands of hours to do it, or would you rather use something free that’s open source that’s being used all over the world? If you’d like to go build your own, reinvent your own wheel and start from scratch, that’s one of those discussions I’ll just have with managers and go, okay, so I’ll start working on that now. It’s going to take me, I don’t know, six months to get to a point where I can build something good. I’ll see you back in six months… Like, no, no, no, you need to get started now.
Well, I can’t, I don’t have it, I don’t have any scripts to go through and do it. And I’ll tell you a dirty little secret; for a while there, we were having people register via email to download our stuff. Microsoft Premier Support, Premier field engineers even downloaded our First Responder Kit. We could see them going through and getting our stuff. If they’re going to download it and use it, that probably tells you something.
One of my favorites was a customer sent us a report from Microsoft where they ran sp_Blitz and copy pasted the results in, and it said, based on this, we recommend that you change these things. And I’m like, well that’s kind of cool. It’s kind of strange, but kind of cool at the same time.
Erik Darling: I think, you know, if I were in that situation, my question would be, like, okay so what should I be using to troubleshoot problems? Like, if they have a monitoring tool in place that they’d prefer you use, or if they have some stuff already in place that they prefer you use, then you might not have too much of an argument. But if you’re just handicapped, like nothing going on in there, like, nope, figure it out when it happens, that’s when I’d probably start pushing back a little bit on that.
Maybe there would be a good use to spin up a VM, go through some demonstrations of how this stuff works. Show that it’s not a security issue or anything like that, we don’t introduce any vulnerabilities into the system, things like that. that’s probably what I’d go for. Also, Brent has a good thing at brentozar.com/go/askbrent, that walks through how to give non-SA users permission to run our stored procedures so that they get the full breadth of the results back without having to have full control of the server. So to kind of take the sting out of some of the – because we do require higher privileges to look at some stuff, we don’t want everyone to have to get higher privileges to do the same thing. So there’s a way to create a certificate, grant, and users be able to run this stuff without needing to have elevated privileges on the server.
What’s the best way to troubleshoot high CPU?
Brent Ozar: And Evans asks, “What’s the best way to troubleshoot CPU spikes?”
Erik Darling: Stop spiking your CPU.
Richie Rump: Stop looking at my Aurora database.
Brent Ozar: We’ve been running into some problems there. I see that we had another failover too, and this one wasn’t my fault. So finally, it wasn’t my fault.
Richie Rump: That wasn’t yesterday, that was the other night when I was screwing with it. Oh, that was totally me. It was midnight and I was messing with it.
Brent Ozar: So the way that I would start is, speaking of third-party scripts and how easy they make it, I would start with sp_BlitzCache @SortOrder = ‘CPU’. You don’t have to turn anything on. You don’t have to change any configuration settings. Your SQL Server’s just constantly gathering which queries are the most CPU intensive on your SQL Server.
Erik Darling: Yeah, those look like tough ones.
Brent Ozar: Wow, hold on a second, this is going to take me some time to troubleshoot here. So then, you can go see which ones are the most resource intensive, the problems that are occurring with them, the query plans. You can scroll across and see how many times they’ve run, how much CPU they use on average; all kinds of stuff.
Now, it’s not perfect. For example, your server may be under very heavy memory pressure, may not be able to cache execution plans for a long period of time. You might be using option recompile on queries. I can think of a long list of times when you won’t get accurate information out of here. But, this is usually just the easiest place to go start. Go shoe me the top queries in my plan cache ordered by CPU.
Erik Darling: And if that doesn’t give you what you want, then you’re kind of stuck waiting for the next CPU spike and, you know, hopping on the server and being able to run sp_WhoIsActive or sp_BlitzWho to get a snapshot of what’s currently happening on the server and just hope that the CPU wasn’t so blown out that they made just looking at stuff impossible. So if it’s not in the plan cache or you don’t have the immediate issue in the plan cache, that’s when you have to go start looking at other ways to get information out.
Because, unfortunately, right now, unless you have a monitoring tool or unless you have your own, kind of, homegrown logging stuff going on where you have whatever queries that are running gathering at an interval and syncing to table, wait stats going into a table, it’s just going to be really hard to correlate exactly when a CPU spike happened to what was going on when it happened. That’s kind of why monitoring tools make the billions of dollars that they do, because it would be really hard to do that on your own; at least efficiently, and then make graphs.
Brent Ozar: That reminded me too, if you’re going to wait for the next time for it to strike, here’s one of the queries I would run when it strikes; sp_BlitzFirst. It takes a five second sample of what’s going on, on your SQL Server, and then gives you a prioritized list of things that that might be causing you problems right now. Like right now with mine, it says no problems are found because there’s not a lot going on, on the SQL Server. But you’ll get CPU utilization, it will tell you a little bit about where the CPU use is coming from. Like, for example, if thee use is coming from outside of SQL Server, if there’s a backup running, a query is rolling back, there’s a CHECKDB – it will tell you all kinds of things that are common that will cause high CPU usage.
It doesn’t get you to the root cause of which query is causing the problem if there are end user queries. There is a parameter you can add for that – check procedure cache – where it will look at the queries that were running at the start and then again at the end, at the last five seconds; which queries ran the most in that five second time span and tell you which ones were causing the problem. I don’t have any load going on mine right now, so…
Erik Darling: Expert mode is another way to get more visibility into what’s going on.
Brent Ozar: Yeah, it shows you which queries are running at the start. So there’s nothing in here now because there’s no workload on my system. Then it will wait five seconds, take its sample, then it will also tell you which wait stats were big during that time, which files you read and wrote to, all the perfmon counters that changed during that five-second sample. So if you’re looking for something specific like forwarded fetches per second. And then again, which queries that were running at the end of it as well, all totally free.
So that’s the end of Office Hours. Y’all ran out of questions early today, so we’ll let you go play around with the Blitz scripts. Have fun and we will see y’all next week, adios.
Erik Darling: Goodbye.