This week, Brent, Tara, Erik, and Richie discuss error log issues, issues with moving a 2-node AG to a different VM, adding a rowversion column gotchas, using indexes, Docker & CI/CD, RESOURCE_SEMAPHORE query compile, table locks, Query Store, and best places to eat in NYC.
Enjoy the Podcast?
Office Hours – 8-15-18
How should I track down login failures?
Brent Ozar: So we’ll start with Lee. Lee says, “A vendor made a change on an app server and now my error log is full of these errors; login failed for user X, could not find a name matching. Any ideas on what I should tell them to look for? I don’t have access to that app server.
Tara Kizer: I used to have to just ignore that message when I worked at Qualcomm. There was some issue with the SCOM server, something like that, and they couldn’t figure out why it was doing it. But eventually, I just stopped looking at the error log. It was just clogged because – can you turn off failed login attempts? You can turn off successful; I don’t know that you can turn off the failed ones. So at some point, the spamming of the error logs – I guess I can’t use that tool anymore, except to filter it to find what I want to look for. But start with the – you’ve got the IP address, so you know what box is doing it. It’s really hard to say what to do from there, but look for a scheduled task, and application running…
Brent Ozar: And leave it. Who cares?
Tara Kizer: Well, the only thing I care about is the spamming of the error log. The error log is supposed to be used for troubleshooting issues. My client this week has successful logins going into the error log and it’s happening multiple times per second, I think it was, or per minute maybe. I was like, this is unusable now. I can filter for the four things I filter for, but that’s not going to find anything else.
Brent Ozar: Do you know about the minus sign trick with error IDs?
Erik Darling: Well that’s only event viewer…
Brent Ozar: Oh event viewer, not the error log, that’s true.
Tara Kizer: Even event viewer’s becoming a not usable tool.
Brent Ozar: Yeah, the minus sign filter helps in there.
We’re thinking about changing a lot on an AG…
Brent Ozar: Glen says, “We have to move a two node AG to a different VM. I’ve been told that the servers have to be powered down in order to migrate…” I already have so many questions, “Any thoughts on the best way to bring down the AG with a cluster to accomplish the move without going to hell in a handbasket? Also, a new IP for the heartbeat network…” Oh come on, man.
Tara Kizer: At that point, I think I would just build new servers on this new VM. It sounds too complicated and I suspect Availability Groups is not going to be happy with the exchanges.
Erik Darling: No, I’d probably just want to build out whatever new environment I’m going to migrate to and set up AGs over there and then reset them up again. There’s just too many changes all at once. That’s a lot of moving pieces, VMs, AGs, IPs.
How do you order an autographed copy of Erik’s book?
Brent Ozar: Michael Tilly asks, “How does one go about ordering an autographed copy of the book Great Post, Erik?” So there may be a book signing event at PASS. We’re waiting to see how that goes. But even if not, what you should do is bring your book to the bar whenever Erik’s around there, and he will sign Itzik Ben-Gan.
Erik Darling: Mister, buy me a drink…
Any gotchas with adding a row version column?
Brent Ozar: Chris says, “I’m going to be adding a row version column to some of our tables for sending back to a data warehouse. We used to use a changed on UTC column and trigger on that. are there any gotchas I should be looking out for when adding a row version column?”
Erik Darling: I mean, adding any column to a table’s going to have – not like adding the column. If you add a NOT NULL column to a table, you’re not going to do too much damage. But if you have, like, a default for it or if you need to go populate that column later, obviously, you could run into some issues with locking and all that good stuff. So I would be pretty judicious in how I populate that column. I wouldn’t want to just have it all filled in at once.
Brent Ozar: Chris also says he’s going to New York City, “What are Brent’s favorite places to eat there?” Well, Erik lives in Brooklyn, so he should answer this too. Erik, what are your favorite places to eat in New York City? He says he’s staying near Times Square.
Erik Darling: So tip number one, get the hell out of Times Square. Like, just leave, avoid at all costs, don’t go to Ruby Tuesday, don’t go to Bubba Gump Shrimp or whatever the crap it is…
Richie Rump: Don’t take a picture with Spiderman…
Tara Kizer: Visit it for tourist things but leave 15 minutes later. It’s just crowded.
Richie Rump: Go to Midtown Comics and then leave.
Erik Darling: Times Square Elmo will mess you up. He has a dirty gym bag…
Brent Ozar: I would say, my huge resource – all my favorite restaurants have closed. This bums me out so much, but nyeater.com – this has their best restaurants of New York – and they do this for all kinds of cities; San Diego, all kinds of places. And out of there, I’m going to scan down and see if I’ve hit any of them. I haven’t had Ping Seafood…
Erik Darling: I’ve had Ping.
Brent Ozar: Was it any good?
Erik Darling: It’s okay…
Brent Ozar: Katz’s Deli…
Erik Darling: It’s okay. If you want a really big $20 sandwich then go to Katz’s. I don’t know that I’d want it.
Brent Ozar: Momofuku – that’s the real name, yeah. What’s the dessert one? Momofuku has a dessert one – Milk Bar, I think it’s called. I really like that one.
Richie Rump: Is that David Chang’s restaurant?
Brent Ozar: Yeah, so anyway, I would start here. Start there and run from there.
Can an index really make that big of a difference?
Brent Ozar: Sheila says, “I added an index last week. This week, my batch process is running far better. The index shouldn’t have made that big of a deal. Would a query plan change occur from the index and make it that much better?”
Erik Darling: You know how indexes work, right?
Brent Ozar: No, probably no.
Erik Darling: If I’m thinking about ways to change a query plan, an index is going to be one of the first things. If I’m looking at clogging a weird process like that, indexes are going to be one of the first things I look at; not just adding them but getting rid of ones that – if I have a write-intensive process, getting rid of a whole bunch of indexes that aren’t living up to their duty, I’m going to get rid of those too. But adding an index is, like, hands down, one of the top things that you can do, even before rewriting a query, using a temp table, doing other stuff, adding an index is, like, what I’m going to go for. That’s like my first stop. What index can I – to make this less horrible…
Tara Kizer: And maybe you didn’t think it was going to help, but maybe it got rid of an expensive key lookup and that was the whole bottleneck of the query.
Erik Darling: Sort, key lookup, improved join, improved some sort of aggregation. Why knows? It could have even helped more than one query. That’s the beauty of indexes. It’s not like you add an index and you’re like, you’re for this query; no other query can use you, you’re special. No, lots of stuff can use them. So if you found that query in your missing index DMVs or something then it’s totally possible that more than one query was able to benefit from it.
Brent Ozar: Or it might have been close enough that other queries, even if it wasn’t their ideal, it was good enough.
Tara Kizer: And to help answer this question for the future on her server, set up logging to a table via sp_WhoIsActive, and if you had that in place, you can go back in time and look at the execution plan from that process and then compare it to what it is now and you’d be able to answer it yourself.
How many includes are too many?
Brent Ozar: Speaking of indexes, Steve says, “When it comes to missing index advice from execution plans, what’s the best rule of practice or a good rule of thumb for includes when you think there are too many?”
Tara Kizer: I know Brent’s rule of thumb…
Brent Ozar: What is Brent’s rule of thumb?
Tara Kizer: Well just see, are there more than five indexes per table and no more than five columns per table, and that includes the includes, you know, the key plus includes. My restriction isn’t that low, but when it’s recommending 50, I’m like, okay, that’s enough. That is way too many. Just chop off the includes at that point and then check the execution plans, if you can figure out what query it was targeting and see is there an expensive key lookup. Do you really need to return 50 columns from this query?
Erik Darling: You know, I think my rule is probably a little bit closer to 10 in 10, but that’s because I don’t do a lot of pure – at least historically, I haven’t done a lot of purely OLTP work. My stuff is always, like, a big hunk of reporting on top of stuff. So I’m a little bit more kind to having some extra indexes around. But yeah, Tara’s right about that. I also have a session at GroupBy that’s free that you can go watch about improving select star query performance. So if you go watch that, you can learn a way to change the way a query is written so that you can take advantage of narrower indexes without having to worry about adding all 50 includes because those missing index requests are kind of idiots. They’re just like bad teenage cries for help. They’re going to ask for every single included column. There’s no filter on the kind of columns that get included. You can end up with these long string columns in there, XML columns, like any idiotic data type. Anything that the optimizer is, like, oh but it will be cheaper, it will just, yeah include it in the index, I don’t care. Like, no penalty – everything’s free. It’s just an include. Don’t worry.
Tara Kizer: I mean, some of those are going to fail, you know. Varchar max, that’s just not possible in the index.
Easiest way to reinitialize merge replication?
Brent Ozar: Paul asks a question I think we’re all going to arm-wrestle to answer. We’ll be so excited…
Tara Kizer: it’s going to be Richie for sure…
Brent Ozar: Paul says, “When I’m running merge replication, is there a way other than initialization to re-sync all the data from the publisher to the subscriber?”
Tara Kizer: If you don’t mind a full copy of your database over there, just do backup restore. Sync it up that way and you could apply transaction logs to get it more in sync. And then once it’s in sync with the publisher, then set up replication and tell it, I’m already ready to go, I’ve already manually synced it on my own. But, you know, if you’re only going to be replicating some of your tables, that might not be a good solution. But if you’re going to be replicating all of them or most of them then backup and restore is a really good solution for that. And if you need to drop tables, you can do that at that point, but it at least gets you past the point where the initialize is going to take you several hours. And it affects the publisher as it’s happening, so backup and restore wouldn’t affect the publisher.
What are your thoughts on Docker and CI/CD?
Brent Ozar: Sri asks – now we’re going to make Richie come back to the microphone…
Richie Rump: Are the bad questions on?
Brent Ozar: Only the first one. Sri asks, “What are your thoughts on the Dockers and CICD?” Richie, what do you think?
Richie Rump: I love continuous integration, I love continuous deployment. And not yesterday, as the team could tell you, as things started breaking and I couldn’t figure out why – here’s a quick hint, it was someone else’s software. It wasn’t ours. It wasn’t Brent’s either.
Brent Ozar: For once.
Richie Rump: Exactly. Continuous integration and continuous development, they’re phenomenal. They’ve been, I think, a boon to us developers as far as being able to get things out quickly and with a high level of confidence of quality in that bugs will not be infecting our code. So now that we’ve got Amazon publishing, what, a new release every 12 seconds or something crazy like that, we couldn’t do that before. So CICD is something every software team – I mean, I’m a software team of one right now and I’m using CICD, so there you go, there’s the value right there. Dockers, I haven’t used Docker; sorry. I haven’t had really that much of an opinion. So again, team of one needs to be sliding things in and out or do anything like that, haven’t really needed it. I’ve got no opinion on it or the Kubernetes or whatever the cool kids are doing these days; don’t know. I am blissfully ignorant.
Brent Ozar: It feels like there’s – if you listen to the Microsoft buzz, it seems like they latch onto any buzzword that’s flying by and they try to stick it to the SQL Server product with Velcro. Artificial intelligence, got it, let’s smear some of that on there. R and Python, yeah, smear that on there. Oh, here comes Docker; grab that…
Erik Darling: Linux, machine learning – because they’ve just missed the boat so terribly on so many things. It’s like – I’m surprised that SQL Server doesn’t have like an internet browser in it.
Richie Rump: The internet, that will never be a thing. What are you talking about?
Erik Darling: Outlook for SQL Server, I’m like, what the…
Richie Rump: I don’t know, for me, as far as Docker on the server, it still doesn’t make a lot of sense for me, right. Maybe because I’ve been playing in the cloud too long and I just let the cloud vendor handle how they want to do implementation, but if I was in-house and somebody said, hey why don’t we just throw a docker out there for SQL Server, I’m like, why? What is it really buying us? And I haven’t really been able to grok my head around that quite yet.
Brent Ozar: Especially compared to platform as a service, where they just manage everything for you.
Richie Rump: Yeah, and it’s like, well why don’t we just throw it out there? I guess then we start talking business reasons of why we would do this, Docker versus the cloud or something like that or VM in-house versus something else. But I just don’t think it’s that big of a deal, when we’re talking about a server, to run the installer.
Brent Ozar: Okay, so hopefully, Sri, there’s your answer.
What causes resource_semaphore_query_compile?
Brent Ozar: Rakesh – Rakesh experienced an issue in production with lots of queries waiting on resource semaphore query compile – wants to know if that’s the cause or the effect. What you start seeing resource semaphore query compile, what do you look at next?
Erik Darling: I look at what’s running and I look at how big that query plan is. So when resource semaphore query compile hits, to back up a little bit on that, when queries compile, there are different classes of query depending on how much memory they need. There are queries that don’t need any memory to compile. That’s very tiny low-cost plans and plans that are already in cache. So they can just go and compile immediately. Then, there’s small gateway queries, which require, I think, like 380K of memory to get the query plan compiled for them. And then they kind of step up from there. And as you step up, you can have fewer and fewer queries that go into that gateway.
So, there’s no memory, small memory, medium, and then big. Up to 2014, you could only have one of those big queries going at a time. 2014 had a trace flag. I forget what the trace flag is. But then 2016 and up had this different algorithm where they scale up the number of big queries you could have compiling at once depending on how much memory you have. So with a 768GB server, I’m really concerned, not only because you have queries coming in that need to compile all the time, and enough that you got blocked up on that. Queries that, with 768GB, your plan cache should have the queries you need in there. Maybe forced parameterization is a good idea. Maybe optimized for ad hoc workloads is a good idea to help, kind of, reduce that.
But resource semaphore query compile is when you hit one of those gateways and you just have too many queries trying to go through it at once and they sit around waiting to get additional memory to compile a query. Like, they’re not running, they’re not getting data, they’re not going out and getting locks, they’re not doing anything. They are stuck waiting just to get a query plan, so it’s definitely a big enough problem that you’re going to want to address that and you’re going to want to find a root cause on.
Does lock escalation happen with deletes?
Brent Ozar: Mark says, “So table locks happen if you update 5000 rows or more. Does the same locking happen if you have 5000 or more deletes?”
Erik Darling: Yeah.
Brent Ozar: There we go. That might be the very first time we’ve been done with a question in…
What are your thoughts on Query Store?
Brent Ozar: Keith says, “What are everyone’s thoughts on Query Store?”
Erik Darling: The new Query Store? What’s the new Query Store?
Brent Ozar: This question says the new Query Store…
Erik Darling: There’s that old Query Store that’s been around since 2016. [crosstalk 0:16:02.9] It’s nifty. I like it, but I can understand people’s reticence in using it because you can’t really choose where that data gets stored. You might be storing some crazy PII in there. It doesn’t quite have the management features, I think, that a lot of people would want in order to start using it. That, and everyone who I talk to – not everyone, but a lot of people who I talk to at conferences are like, you know – because I talk about sp_BlitzQueryStore because I wrote a whole stored procedure. I was that excited about it, I was like look, I’m going to write a stored procedure. It’s going to do the same stuff at BlitzCache but with Query Store, and no one uses it because no one has Query Store turned on.
When I talk to people about it at conferences, they’re like, we turned on Query Store and CPU use went through the roof, bad stuff happened, like the drive filled up. I’m like, man, I wasted how many hours of my life writing a stored procedure for something that makes CPU go through the roof and fills up drives. I can understand why people don’t use it. I like it in theory. I like the prospect of being able to have long-term plan data in there and be able to trend queries over time rather than just depending on that one plan that’s in the plan cache. But you know, like, a lot of pushed out the door features for things in SQL Server, I’m not sure that it was quite 100% ready, like extended events.
Brent Ozar: Oh, that’s mean…
Erik Darling: Extended events is the mobile browsing of user experiences. There’s a reason that everyone has an app instead of making people use a mobile site, because it’s just miserable and painful. No matter what you do, you’re just in for pain and suffering and disappointment.
Brent Ozar: So why is it that the people who evangelize Query Store also evangelize extended events? There’s something in common. I want to believe in Query Store. Like, I think that if I was a database administrator, I like to think that I would turn it on on all my servers and just watch out for all the CUs, because this just came out in the most recent cumulative update; database performance is bad without this cumulative update. So clearly there was a performance problem, but if it wasn’t that big of a problem, I’d sure like to enable it.
Richie Rump: Can we get you a poster of, like, SQL Server Query Store in the background and it just says, I want to believe.
Erik Darling: I mean, it’d be nice because, like, the plan cache is just so temperamental. It clears out, it doesn’t have all the plans in there, it doesn’t keep a lot of historical information, so it would be beautiful to have that kind of stuff. I’m glad that it seems to be working for the robots up in Azure or whatever. It’s helping Microsoft choose between plan A and plan B, you know. For regular people, it’s…
Richie Rump: I just was cranking on a little bit a few weeks ago…
Brent Ozar: What were you cranking on?
Richie Rump: Plan cache.
Brent Ozar: Oh yes…
Erik Darling: Yeah, tell us more…
Richie Rump: We started collecting in the new version of ConstantCare. We haven’t implemented any rules for it yet, but we’re getting there. Getting the data is the first part, and then start applying rules to those plans.
Erik Darling: So you don’t know how we’re going to go through those plans, what we’re going to do with them?
Brent Ozar: One of the users replied in because we sent out an email, hey there’s an update to is, just to, like, early access beta users. Hey, there’s an update. If you want, you can send us your query plans, you don’t have to. One of the users who emailed in copied in all his database admins and he goes, “I don’t care what it takes, you get the Ozar people everything.”
Following up on resource_semaphore_query_compile
Brent Ozar: Rakesh follows up with his resource semaphore query compile and says, “We have both optimized for ad hoc and forced parameterization on.” He also mentioned that he started a case with Microsoft. That’s awesome, it’s just that you’re not going to get an answer down to root cause analysis inside a free webcast. There’s just no way we can pull that off. However, I want to leave you with a couple places you can go for help. If you go to – let me go find the page for it – dba.stackexchange.com, you can post a multi-paragraph question. Just be cautious there because it’s going to be super-specific to your company. They’re going to want evidence that you’re going to have to be able to post publically. It’s not free consulting; it’s free help, but you’re getting to a point where Microsoft couldn’t solve the problem. What you may ask form the community might be kind of tough to do. The other thing you can do is, we have actual consulting. It just so happens that this is what we do for a living. If you go to brentozar.com and click on Critical Care up at the top, we do this three-day consulting thing where we get to the root cause of your performance problems. So it may be the point where you need that as well.
Tara Kizer: He also mentioned that the change to the latest cardinality, you know, resolved the issue. So if I wanted root cause analysis, I would switch it back and start troubleshooting what are the queries that are having this issue, what are the queries that have large unused memory grants? And maybe specifically, on problematic queries, switch those guys to the legacy cardinality estimator, but not the whole box.
Brent Ozar: And to some extent, if you flip the CE and it suddenly started working, there’s your root cause. You’ve got queries that don’t work well with that CE. You can either change the queries or you can change the CE.
Erik Darling: That is also valuable feedback for Microsoft. If flipping the CEs has that profound of an effect on a server where you go from being at a complete standstill with resource semaphore query compile waits to not having any and everything being fine and dandy, that’s valuable feedback for them that should be shared, I think. You know, as much as we poke and prod, we do like to see a good competitive product that we have to work with day in and day out…
Brent Ozar: And they want these edge cases too. They want to know when these edge cases hit that are so bad.
Erik Darling: So that’s connor.cunningham@…
Brent Ozar: And his home phone number is… So that’s it for Office Hours this week, everybody. Thanks, everybody, for hanging out and we will see y’all next week. Later.