This week, Brent, Tara, Erik, and Richie discuss load balancing, preventing auto-updates, compatibility modes, capturing deadlock chains without tracing or extended events, blocking, which user should own databases, asynchronous processing, compressing databases, reindexing on Always On AGs, and “Making a Murderer” spoilers (not really).
Enjoy the Podcast?
Office Hours Webcast – 2018-10-31
Should I use a load balancer with my AG replicas?
Brent Ozar: Jim asks, “Hey, y’all, do y’all have any tips or suggestions related to load balancing of SQL Servers? We’re currently testing putting read-only AGs behind a load balancer, not to use the built-in AG load balancer. What would you prefer?” I don’t think any of us have put an AG behind a load balancer.
Erik Darling: Nope.
Tara Kizer: I wonder if it’s even needed. Are you really needing to load balance your reads? It’s not usually – I’m not even considering load balancing the reads. I might be talking about a completely different architecture of the system if I’m having to go down that route.
Brent Ozar: The thing I’d really worry about is, does your load balancer have the ability to run DMV queries to see how overloaded each replica is and then prioritize them? If all it’s doing is spraying round robin requests, then I don’t know how useful that’s going to be. You really have to have some kind of load gage in there.
Erik Darling: I wonder, because inside some SQL – like, whatever CPU DMV, there’s the node wait and exactly how much stuff is going on, on each CPU, and it would be really interesting if someone could figure out how to get, like, timely information out of that and balance things on that. But even that, like, that would be really advanced and really difficult. So I would kind of question the wisdom there of even attempting the load balance. It’s like load balance based on what, I guess?
Brent Ozar: That’s hard. And then the other thing you’ve got to think about too is how far behind each replica is. What happens if the plumbing breaks on one of them? So y’all send in another diagnostic query to check that to know when to pull a replica out of the pool.
Erik Darling: If you’re just basing it purely on, like, number of queries, then you’re in trouble if three or four big queries end up on one and a few small queries go through on another. It’s not going to…
Tara Kizer: And why not just go for bigger hardware on the one?
Can I pick my SQL updates in Windows 2016?
Brent Ozar: And Lee asks, “We’ve got Windows Server 2016 and you have to install all of the updates; there’s no picking and choosing which ones. So if a Cumulative Update comes out for SQL Server, you cannot stop it from installing. I don’t know how to prevent it installing on my production servers. How should I get around doing that?”
Tara Kizer: Does Windows 2016 really do that, all or none?
Brent Ozar: I think it does for Windows, but not for SQL Server for the Cumulative Updates.
Erik Darling: Yeah, I mean, I don’t see mine doing that. I have to patch mine manually. I have, from my whatever Visual Basic net thing, I have a Windows Server 16 thing on my desktop and I don’t have that same issue.
Brent Ozar: You know what it is? I bet it’s his systems administrators have put it in group policy. Your sysadmin should have a list of exclusions based on server name or…
Erik Darling: Ooh, or unplug the Ethernet cable…
Tara Kizer: That’ll work. I don’t let SQL Server Service Packs or Cumulative Update patches happen on my production servers through some automated process. That is all turned off.
Brent Ozar: There’s a good reason for why too. Man, this just hit this morning. I’m going to go to SQL Server Updates. So we just recently got 2016 Cumulative Update three, and when you go to read the KB article now at the top, “Important, due to a known issue with the…”
Tara Kizer: It’s never good when there’s a yellow section [crosstalk]…
Brent Ozar: “If you use the auditing feature, do not do production deployment of the present CU. If you’ve deployed already and are hitting any issues, please contact Microsoft support.” Whoops.
Tara Kizer: I wouldn’t even install it then. I mean, I don’t ever use an auditing feature, but I’m like, nope, unsubscribe.
Brent Ozar: It’s okay. But, Tara, look it says right here, SQL Server CUs are certified to the same level of Service Packs and should be installed in the same level…
Tara Kizer: Are they really? I mean, isn’t that the whole point of them moving away from Service Packs is because it took them too much time to test them?
Brent Ozar: See, that’s what I thought. And I think what they really did is they just stopped testing service packs, and that’s how they say they’re now at the same level. Just as good…
Does SQL 2014 have 2005 compat level?
Brent Ozar: Glen asks, “Can I run a SQL Server 2005 database in compatibility mode on a 2014 or newer server?”
Tara Kizer: I mean, for sure not on 2017. I think the lowest on 2017 is 2008. But 2014 maybe, but why? What do you have in there that’s going to break – can you go up to 2012 compatibility level? Because many people on 3014 have to stick on a lower compatibility level due to cardinality estimator performance issues. I just wonder why you’re stuck on 2005. Is it a vendor thing that they’re requiring 2005? And if it is, you’re not allowed to upgrade SQL Server to 2014 anyway.
Erik Darling: Yeah, depending – if it is a cardinality estimator thing and you have any control of the code, Microsoft did just add a bunch of use hints that allow you to use the cardinality estimator, or use the cardinality estimator from a specific version. That’s not helpful to a lot of people because you’re either a vendor who controls the code and cares that much, and you have to change your code to use the hints, or you’re you and you don’t control the vendor code and that just doesn’t help you at all.
Can I capture deadlocks without tracing or Extended Events?
Brent Ozar: Ouch, next up, Matt says, “Is there a way to capture deadlock chains without tracing or Extended Events?”
Erik Darling: Yes.
Brent Ozar: It just so happens there is. I’ll fire it up and you talk about it, Erik.
Erik Darling: Yes, since, I want to say, SQL Server 2008, SQL Server has, behind the scenes, been running something called the system health extended event. So it’s not anything you have to set up and look at. It’s not as good as a prebaked Extended Events session. Specifically, they capture deadlocks because it doesn’t keep a good full history in there or doesn’t persist terribly long. So if you are the type of person who has a lot of stuff going on in that background Extended Events session, then your deadlock information could very quickly run out of there. There’s a perfmon counter that tracks deadlocks, or some counter that tracks deadlocks in there.
And there are lots of times where, with a client, we run sp_Blitz and it’s like, wow, you had 500 deadlocks a day. But then we go to run at the system health session and it’s kind of unfortunate that they’re all gone from there. There’s other stuff filling that thing up. I wrote – while I was drunk on a plane last year going to PASS – a stored procedure called sp_BlitzLock. And sp_BlitzLock will examine either the system health Extended Events session, if you have your own Extended Events session spun up to do it. It will examine that on SQL Server 2012 and up.
SQL Server before 2012, the deadlock XML was kind of garbage and not very helpful, but the 2012 and up one is a lot better, has a lot more information in it. So when you go look in there, you can get a whole lot of information about deadlocks from there, that XML…
Tara Kizer: What I like the most out of sp_BlitzLock is it gives me an easy way to check the isolation level, because sometimes, for clients, they’re running serializable and they don’t even realize it. There’s just no concurrency on that. And then second of all, you may be able to quickly see in one of the columns that it’s a select versus an update; I’m like, oh hurray, I get to recommend RCSI. You know, if it’s between two writes, it’s a little bit more challenging. It’s almost never between two writes though.
Erik Darling: Yeah, sometimes it’s tough to get stuff into that Extended Events session. I remember when I was setting up – when I was trying to set up deadlocks to test BlitzLock with I was like, wait a minute. I would have to run things like five or six times before it would accurately register and get in there. It’s kind of a pain in the butt, man.
Tara Kizer: Was it just not in the system health session yet?
Erik Darling: Yeah, it took a while to flush out to there from wherever it lives and sometimes it took a few tries before it would go in there. I don’t know, it’s a whole thing…
Tara Kizer: That’s probably what Brent’s experiencing.
Erik Darling: Yes.
Tara Kizer: You need to do it 10 more times.
Erik Darling: Either that or Brent’s most recent pull request to remove columns broke everything.
Brent Ozar: It’s very possible
Tara Kizer: Maybe he has the system health, you know, disabled.
Brent Ozar: Anything is possible on my crappy system. For y’all who are watching the video or watching it live, what I’m doing is I’m reproducing deadlocks like crazy. And then, in an effort to show sp_BlitzLock, showing it in the history, and… yes…
Tara Kizer: Finally.
Brent Ozar: Alright, thank god…
Tara Kizer: And there they all are too.
Erik Darling: Yeah, they all go in at once. It takes them a while to flush out. It’s aggravating as hell.
Tara Kizer: That makes me suspicious about Extended Events sessions. What kind of delay is happening there? Is it just due to volume, you know, because there’s no load here?
Erik Darling: No, it’s just it takes a little while for stuff to just make it into the session, for the XML to refresh or whatever. It’s sort of like how when you go and look at the CPU usage in system health, like the ring buffer. It’s like every minute, but sometimes it’s like five minutes behind and you’re like, wait a minute [crosstalk].
Brent Ozar: Yeah, so that’s really slick. You don’t have to turn anything on to get it. It’s already built in by default; 2012 and newer, so very slick stuff.
Erik Darling: So it gives you all that deadlock info up top, tells you where the transaction is. Down the bottom, we try to do some, like, metadata analysis and tell you which stored procedures, tables, indexes you have the most stuff happening in. It gives you commands with BlitzCache and BlitzIndex to take a closer look at those, maybe see, oh there’s an index missing, I need to do something else. There’s something weird in the query plan, who knows? I don’t know, I was drunk on a plane when I wrote that.
Brent Ozar: And it works pretty well.
Erik Darling: I’m going to be really helpful if I’m going to be drunk on a plane, and here we are.
Richie Rump: Those first class Brent Ozar cross-country flights.
Erik Darling: When you can really stretch that back…
Brent Ozar: We try to only fly first class because we just end up getting drunk on planes and writing T-SQL that y’all can use, so it’s kind of like a tax write-off.
Erik Darling: You’re welcome.
Brent Ozar: Mike pipes in with, he just wanted to say that, “The First Responder Kit with the PowerBI piece has been great for recording and validating app query issues. Keep up the good work.” Great, glad you like it.
Brent Ozar: Jim says, “Our blocking monitor occasionally shows system processes blocking each other…” Oh god, I’ve got to read these before I start talking, “It started shortly after we introduced Service Broker. How can I [crosstalk] to see if the blocking is coming from the activation proc we’re using in Service Broker?”
Richie Rump: Don’t use Service Broker. Problem fixed.
Erik Darling: A-plus.
Tara Kizer: Just fix the blocking monitor – the [update carrier], that two system processes are blocking each other. It’s going to resolve itself, I mean, so just fix the blocking monitor to exclude blocking between SPIDs less than 50.
Brent Ozar: Makes perfect sense.
Why don’t all my queries show clearly in the plan cache?
Brent Ozar: Pablo says, “What does it mean when I look in the plan cache and there are executions but some of them show the SQL text as empty?”
Tara Kizer: SQL text – I mean, the execution plan might be missing if you’re using BlitzCache, but the SQL text is never empty for us. Maybe he needs to use a better tool, like BlitzCache.
Erik Darling: So when I’ve seen that happen, it’s been, like, encrypted procs or sometimes what’s a real son of a gun about it is if there’s restricted stuff in there like passwords, sometimes SQL Server screens that stuff out.
Tara Kizer: But just a portion of the query, right? Not the – I mean, the entire query…
Erik Darling: Yeah, but if that’s all the query is, who knows? I don’t know what these crazy people do with their SQL Servers.
Tara Kizer: Use sp_BlitzCache and see if it happens there.
Erik Darling: That would be my vote too.
Richie Rump: They use Service Broker…
Which users should own databases?
Brent Ozar: Let’s see, next up – I swear I’m going to start reading these questions first. Alright, I’ll take this one…
Erik Darling: You say that every week…
Brent Ozar: Jeremy asks, “Is there a best practice for what users should own databases?”
Tara Kizer: SA.
Brent Ozar: And why? Why SA?
Tara Kizer: I don’t even remember the reason anymore, but I know that I’ve encountered an issue with .NET CLR objects when the owner was not SA and we moved servers and something happened there. SA fixes everything; job owners and database owners. Even if you don’t use SA, they count SA, have SA be the owner.
Erik Darling: It’s not a big deal if you have an agent job owned by someone else and that account gets disabled and it might stop working or whatever. But for me, it’s just that if someone else owns – like, if a different user owns a database, they have all the SA level privs on there, I might not want that. I might not want that person to own the database just because they happen to restore it that day. So I just like to keep everything under SA.
How should I do async processing without Service Broker?
Brent Ozar: Jim follows up with, “I know y’all don’t like Service Broker and you recommend not to use it, but do you have another suggestion to handle asynchronous processing. We’re using Service Broker to do asynchronous data processing instead of triggers.” Richie, how do you like doing asynchronous data processing?
Richie Rump: In the cloud.
Brent Ozar: In the cloud? And do you do it in the relational database, or where do you do it?
Richie Rump: I do because Brent told me I had to.
Brent Ozar: No you don’t. We use a queue… Service Broker is a queue, right, and we’re totally down with queues, we just don’t like doing them in your $7000 core relational database.
Richie Rump: Yeah, and there’s a lot of free open source queueing systems out there.
Tara Kizer: I’ve worked with a lot of Java developers who love TIBCO for cueing.
Erik Darling: What was it Jeremiah used to like, RabbitMQ?
Brent Ozar: Yeah.
Should I use row or page compression?
Brent Ozar: Let’s see, Marvel asks, “We’re looking at implementing compression on some of our largest databases, however, I can’t find information on which level of compression is best. The clearest article I’ve found is this one article that tries to tell me when to use row or when to use page.” Really, the easy thing to do there is just test. There are stored procedures you can use to test and what they do is create a compressed copy of the table over in tempdb. So then, you can check and see which one is better for you.
Tara Kizer: And it’s built-in. And there are some handy scripts that can help you interpret that data too. I forget who the person was, but it’s a female that’s got a really good script. It’s been years since I’ve looked at it…
Richie Rump: Erika Darling?
Tara Kizer: Maybe Michelle Ufford, I can’t remember though.
Erik Darling: Yeah, I know exactly what you’re talking about too. It’s a shame that SQLBlog isn’t really such a thing anymore because [Linchie Shay] used to have a bunch of great posts on compression about, like, how it affected reads versus writes. You have to go to like archive.org to try get that, it’s a pain. If you just go to the website, it’s like login failed or something.
Tara Kizer: Really?
Erik Darling: Yeah, it’s terrible. But it might not even be up there anymore. It doesn’t come up in search results.
Brent Ozar: I bet you’re right.
Erik Darling: If you went to archive.org, you can track old copies of them down, but those posts were great. And these days though, if I’m seriously thinking about tables that really extra need compression then I’m considering column store because the compression is so much better than page or row compression. But you know, there’s just as many gotchas and guidelines to follow there too.
Tara Kizer: But not for OLTP systems, right? Or at least a table that has lots of changes happening…
Erik Darling: Yeah, yeah, I mean, you know, you could make an argument for throwing a non-clustered column store index on there and just, you know, a place where maintenance actually matters with column store indexes, and then just performing fairly regular maintenance on it. But if you’re talking about, you know, a clustered column store, then it’s kind of a whole different ballgame. I would avoid that for OLTP – not be your friend…
I don’t have a lot of space left in this data file.
Brent Ozar: Thomas says, “Sometimes, when I click on the properties in my database, it shows the space available as being something I am not comfortable with, like only 4GB. However, there’s a ton of space available on the hard drive.” So it shows not a lot of space in the database, lots of space available on the hard drive, “Should I be concerned?”
Erik Darling: Leave it alone. It will grow.
Tara Kizer: I like that too. I mean, I’ve always let auto-growth just kick in, as long as I have instant file initialization in place for the data files. I know that some DBAs have scripted this out though. They’ll check for free space, and if it’s below their threshold, they automatically, later on, during a low period of the day, they will grow out that file to add additional space. But that’s too much work. I don’t care about auto-growths happening during the day, as long as I have instant file initialization enabled and the auto-growths are for the data files.
Erik Darling: Yeah, I mean, the one place where I do care about that is with log files, because they don’t get instant file initialization. So those ones, I’m okay with having empty space in and growing those out a little bit further than maybe some people would want. Some like some percentage size of the database just to make sure that I’m not waiting on those things to grow into the error log space, because who knows – you set that to a percent, you know, the bigger your file gets, the bigger that percent gets, the bigger that growth gets, the more zeros you have to write, stuff waits around. It sucks. It’s not fun. So log files, grow them out.
How can I rebuild indexes without hitting the log file?
Brent Ozar: Gus says, “What’s the best way to reindex on an Always On Availability Group database without running out of space on a transaction log?”
Tara Kizer: I thought for sure you were going to skip that question because we’re just going to make fun of the question.
Brent Ozar: It’s great. I love it.
Tara Kizer: I’m not going to answer it.
Brent Ozar: So you can’t. Rebuilding an index is a logged operation, and it’s not like you get the choice to do it in small phases. It’s a big honking transaction. So the classic example is, we had one client who had large databases, but almost all the database is one table. It was like a big, say, you’d have a 1TB database but 800GB of it would just be in one table. Man, you want to rebuild that thing. You better buckle up and take some time. It’s going to be a while and it’s going to need the whole space of the transaction log. So if something goes wrong, you’ve got to be able to rollback.
Erik Darling: Tempdb’s going to suffer. Everything’s going to suffer. A lot of people will just not do that kind of maintenance on an AG anymore just because it gets so difficult. It’s not like you have the option to switch the database into simple or bulk-logged and then run it and then not deal with that. But that doesn’t even help that much because you’re doing it all in one go. So you can’t do that with an AG anyway. I don’t know, the whole thing is just rough; not fun. And when you own an AG, you have to change the way that you think about the way you do things with data.
And it’s not just index rebuilds, like doing large modifications. You have to start batching things really smartly if you’re going to use an AG, not try to update entire tables at once or insert a bajillion rows all at once. You have to change the way that you handle and manage data because if those things fall behind, they get very, very unhappy, very unhappy.
Brent Ozar: Theoretically, there’s a new feature out in SQL Server 2019 and in Azure that will help you with that. I don’t know if it works in Always on Availability Groups or not yet, but it’s accelerated database recovery.
Erik Darling: It works with everything, according to my friend who did stuff with it.
Brent Ozar: Yes, I like that. You can sign up for the preview now. It’s available in the 2019 preview. You just need a magic secret, I would guess trace flag, in order to turn it on…
Erik Darling: Memory bit…
Brent Ozar: The debugger, yeah.
Erik Darling: Hold a magnet next to the…
Brent Ozar: Just flip the debugger, how hard can it be?
Erik Darling: Yeah, and to follow up a little bit on Thomas’s thing about the index rebuilds, it’s like, you better be solving a really, really serious problem with that index rebuild that you can’t solve with something a little bit easier on the server, like update stats or, you know, even a re-org or something, if it’s a smaller table, because re-orgs will choke on big tables. But I would make sure that I’m solving a really, really big problem if I’m taking that much of a chance with my AG falling behind or tipping over or whatever awful things happen when you throw the smallest grain of sand into the cogs of an AG.
Brent Ozar: Gus follows up with, “Do sort results in tempdb help with the space on transaction log?” Not at all, unfortunately.
Erik Darling: Nope, it does help with the space that you would need in the data file to sort the results, but it does not help with the logging of the transaction.
Does sp_BlitzCache have a divide by zero error?
Brent Ozar: And then Thomas asks, “Sometimes I run sp_BlitzCache and I get a divide by zero error. Any thoughts there?”
Erik Darling: Yes, use the latest version where that error was fixed.
Brent Ozar: We fix a lot of stuff. I mean, holy cow, we have a lot that goes through that First Responder Kit. It’s insane how fast we run through fixes in there.
Richie Rump: Well I think it’s all the unit tests you have against those things…
Brent Ozar: Yeah, you catch so many things…
Erik Darling: User tests, user tests. You people are our units and you test things in units for us. Thank you for that, we appreciate it. But yeah, stuff like this, it just makes you really kind of paranoid about when you need to start being more defensive in the way that you write code, you know. Anytime there’s division, make sure you have some protection on that divider side about zeros. If you’re using, like, substring, be really careful about when you might have a negative value passed in that will make that throw an error.
Set yourself up with some protection there so that you don’t just run into these crazy ephemeral errors that are impossible to reproduce, or even to conjure up, to think, like what bad thing could happen here? Because the error he’s talking about was when I added stuff to look at compile memory for a query plan and I was like, there’s no way anyone can have zero compile memory. It’s impossible. You can’t possibly have zero memory to compile a plan with. But you people, you people found a way…
Richie Rump: And that’s when Erik earned his senior developer wings.
Brent Ozar: God, people are creative. Alright, well thanks, everybody, for hanging out with us this week at Office Hours. And next week will be weird. Next week is the PASS Summit. We’ll see if Richie and Tara are around to see if they want to do one. I think Erik’s going to be in-flight or running around Seattle. I’ll be out in Seattle attending the Summit, so…
Erik Darling: I don’t know if I can say where I’m going to be on Wednesday. Is it Public knowledge yet?
Brent Ozar: That means I don’t know where you’re going to be on Wednesday. Oh no, yeah, sure…
Erik Darling: So lovely Claudia Coleman from Quest has sponsored a book signing. So she bought the first and only copy of my book, and she bought a bunch of those. I’m going to be sitting in the vendor hall signing those. So if you’re at PASS and you are too cheap to spend $30 on my book from Amazon, you can get a free copy with my signature, if you want it. No selfies, but you can get a signature and a copy of the book for free.
Tara Kizer: No selfies? What are you, Erika?
Richie Rump: It is a requirement that you take a selfie with Erik if you take his book. You have to take a selfie.
Erik Darling: I will punch you if you selfie in my general area.
Richie Rump: You know what you do? If he won’t take a selfie with you, then you just kind of angle your way out. As you’re walking back with the book, click-click…
Erik Darling: I’m going to wear my Richie mask so no one recognizes me.
Richie Rump: Horrifying. Straight horrifying. That is not attractive.
Erik Darling: Good times.
Brent Ozar: That will be us next week so see y’all later, adios.