This week, Tara, Erik, and Richie discuss using Docker in dev environment, in-place upgrades, disaster recovery, the minimum amount of CPU and RAM for new SQL Server VMs, troubleshooting long-running SQL agent jobs, dbcc checkdb running longer than normal, and more.
Enjoy the Podcast?
Office Hours Webcast – 2018-11-14
Have you used Docker for SQL Server yet?
Erik Darling: “Have you used Docker in SQL Server at all? What do you think about using it for dev instances? Short answer is absolutely no, I have not touched it.
Tara Kizer: Can barely spell it. I know they sell it at like Coles and all those kinds of places.
Erik Darling: Yep, you can get pleated ones, boot-cut ones, all sorts of things.
Richie Rump: Brent’s probably wearing some right now…
Tara Kizer: He might be, with a polo shirt… (Edit from Brent: worse: a buttondown.)
Erik Darling: With a nice weaved belt, something like that. [crosstalk]
Richie Rump: Oddly enough, pretty frequently, we don’t really do a ton with Docker or anything like that. The closest we get is Lambda, which is automatic. It does all that stuff for us. My opinion is, I don’t know what it buys you in production unless you’re really turning out a lot of servers all the time for different clients or whatnot. Maybe that might make sense in your environment. But for dev, maybe, but remember, just talking about the install, now you’ve got the data on top of that, which will change probably a lot more frequently than how frequently you create a dev server. It’s just easy. It’s more overhead, a little bit, so try it out, see if it works for you guys.
Erik Darling: It would almost make sense for us where we use a pretty static copy of Stack Overflow over and over again, but then I’m creating a new SQL Server and have to remember the settings and, like, the desired state with SQL Server is a crap-show. Nothing about it seems appealing to me.
When will the Summit recordings be available?
Erik Darling: Anyway, next question. There’s one from Colin, “Do you happen to know when the PASS Summit recorded sessions will be posted?” I’m not wearing a PASS shirt. I do not know. Please ask someone from PASS. They don’t tell us, mere mortals, these things.
Will in-place upgrades become more common?
Erik Darling: Joshua asks, “Do you foresee in-place upgrades becoming more common and acceptable for production systems given the shortened release cadence?” Tara, take it away…
Tara Kizer: I mean, absolutely not, for me. In-place upgrades are for a test environment. For production, I mean, I’ve supported some very critical systems and I will not take that risk of doing an in-place upgrade. I mean, it is not happening for critical production systems.
Erik Darling: Not for me either. You know, especially because with each new version now, they are making bigger and bigger changes to more and more critical parts of the system. Like, if you look at the stuff that came down the pipe for 2016 and then for 2017 with adapted joins and the more batch mode stuff. And now 2019, where you’ve got batch mode for row store and you’re getting all the stuff with UDF inline. If anything goes terribly horribly wrong, there is no back button that says, no I want it the way it was before. You are stuck if you just do an in-place upgrade. And if you are stuck with unhappy users and customers, you are most likely stuck on Monster.com for a while trying to figure out what happened.
Tara Kizer: Just think about your maintenance window that you would need for a rollback if something goes wrong there. I mean, I don’t have that kind of downtime in maintenance windows. I’m usually, you know, very small maintenance windows. In-place upgrade, not happening because of rollback reasons.
Erik Darling: No, everyone says, oh it’ll be so easy. We have a VM. We’ll just flip it back. Okay, fine, if anyone touched any data right leading up to that, good luck to you. No thanks. Not the kind of risk I’m willing to take with production SQL Servers.
Richie Rump: Yeah, and I think, you know, when you take a look at risk – I’m putting my project manager hat back on – that you need to actually write them out and say, hey this could happen, this could happen. And what’s the probability that can happen? Is it high? Is it medium? Is it low? And then how much you’re willing to swallow when you go to your management and say, hey if this, which is a medium risk happens, and we are down for five days, are you okay with that? [crosstalk] the risk assessment and then you go off and do what you need to do. So I find that, once they see it on paper, they’ll be like, oh really that’s a high risk, I’m not comfortable with that. Like, yeah, you know what, neither am I…
Erik Darling: You don’t mind if we don’t make money for a week, right? That’s cool, okay, we can just do this then.
Richie Rump: Yeah, no worries.
Does VMware SRM eliminate the need for AGs?
Erik Darling: Alright, there’s one from Marci now. She’s using SRM for VMware, that’s site recovery manager. She wants to know if that makes using Availability Groups for disaster recover redundant. How is that a different strategy form using, I’m going to guess, async AGs for DR? If anyone wants that, they can have it.
Tara Kizer: At my last company, we used Availability Groups, synchronous and asynchronous for HA and DR, and we were also virtualized. But on the systems where we didn’t need the really low RPO, we did use SRM. But that was like for an SSIS server and we were okay with – there wouldn’t be data loss in between. I see Availability Groups, database mirroring, failover cluster instances as lowering your RPO at the Windows SQL Server level. Because what happens when SQL Server crashes or Windows crashes, downtime is longer for those and, I don’t know, SRM, I certainly used, but it was not for my critical systems that had low RPO and low RTO.
Erik Darling: Yep, it was always for – so what I dealt with a lot working with relativity, which has – it’s not just like you have these SQL Servers. You have this sort of army of VMs that all do different units of work. Like some will index different things. I don’t mean like create SQL indexes – like create index files, some will hold files. Workers do all sorts of crazy different things within the relativity environment. So you end up with 200 to 500 VMs – tiny VMs, not big ones – that just do little things. And SRM was great for those because there was no, like, in-flight data on them. They were just doing a task when they got called upon. SRM was awesome for that. for SQL Servers, you know, if you have short RPO and RTO windows for SQL Servers, then I don’t think SRM is the solution for you. If it’s for dev stuff or it’s for things that are not critical where you have, like, 24-hour RPO or eight-hour RPO, then that might make more sense, because I just don’t think you’re going to be doing the snapshots. And I don’t think you’re going to be replicating data at a pace that’s fast enough to meet a short RPO window with it.
Should I set fill factor based on row versioning?
Erik Darling: Alright, next question. I don’t understand this one but I’m going to try to make sense of it. Joe has a job that’s set to change fill factor for row versioning where if most fill factors are between 75 and 90 on the same table, “What are your thoughts?” No. No is my thought.
Tara Kizer: I mean, how are they picking what number they want to use? Per index? That’s a lot of testing you would have had to have done.
Erik Darling: Yeah, it’s a lot of careful thought consideration and testing and not at all just picking numbers at random, I would say. If you could clarify that a little bit and maybe, you know, sneak in why they’re different for certain indexes, that might be helpful too.
What’s the smallest SQL Server I should build?
Erik Darling: Jeremy asks, “What do you recommend the minimum amount of CPU and RAM when standing up a new VM running SQL?”
Richie Rump: All of it.
Erik Darling: Is it on your laptop? What’s it for? It kind of depends. If I’m paying for the licenses then I want to use the minimum licenses that I paid for, CPU-wise. As far as memory goes, as much as I can possibly take.
Tara Kizer: As much as the Windows team will give me. As a production DBA who does not set up the hardware or Windows, I want as much CPU and as much RAM as you’ll give me, and I want more than that, actually. It depends on your system, what your system requires. Maybe eight – I don’t think I’d go any lower than, say, 64GB of RAM. But what happens if you have a 1GB database? It doesn’t need that much.
Erik Darling: No, so there are a lot of, I call them, local variables that might change the answer there. But you know, for me, I just want to make sure that I have enough hardware to meet the needs of the application. So it’s an application by application thing. If it’s a thing no one cares about, then I don’t know, two cores, 8GB of RAM. If it’s a thing that everyone cares about, well I’m going to have to have a bigger talk than that. someone’s going to have to get the credit card out.
Richie Rump: And those needs are going to change, and they most likely will change.
Erik Darling: Data grows. User accounts grow. Everything tends to get bigger and no one out there is archiving. It’s always just bigger and bigger.
I have this cursor…
Erik Darling: Let’s go here. Adam has a question about SQL Server 2012 Enterprise. He has a cursor based application and sometimes he runs into a problem when he’s doing stats maintenance when the cursor could not complete because the schema changed after the cursor was declared.
No, I haven’t run into that, but I don’t use a terribly large amount of cursors. Sorry about that.
Tara Kizer: I suspect that the app using cursors is completely irrelevant to what Ola’s stuff is doing. I think maybe you’ve got a concurrent job running that has changed things and – I don’t know. I don’t think that the application has anything to do with the error that they’re encountering. I think check for other jobs. See what else is running at that time.
Erik Darling: Sounds good to me.
How do I stop people from using regular tables as temp tables?
Erik Darling: Samuel asks, “How do I stop these DBAs…” he you-peopled us, “From using regular tables as temp tables, scripting out drops and adds. This feels wrong. I believe they are afraid of tempdb bottlenecks.”
Tara Kizer: Are you sure that they’re DBAs. I think they’re using that term loosely.
Erik Darling: I’m curious about that. So I guess, for me, it depends a bit on what they’re using those tables for. It’s kind of a wide open one. Okay, well it’s a staging table and they need to import some data, and maybe they want to keep that data around for longer than just the session they’re in. if you could do something like you could make them – if they’re going to do that – either use their own schema so that you know that they’re in a separate place, or prefix the table names with something to let you know that it’s not, like, a real table. And you could just have a job that cycles through, like once a month or once a week, and looks for tables that haven’t been touched in some period of time and just drop some out.
Richie Rump: I guess the regular tables and temp tables, it depends on what you’re doing, right? So if I’m loading like a large amount of data, I’m probably not going to do that in temp tables. And I’m processing that, I’m going to create my own regular table and do that. it depends on what we’re doing.
Erik Darling: It all gets weird. Tempdb is a strange place. Usually, DBAs aren’t afraid of it, though.
I need to troubleshoot long-running Agent jobs…
Erik Darling: Let’s see, Ronnie asks, “I’m suddenly encountering long-running SQL agent jobs that are executing for several hours overnight. Can you recommend a starting point for me to isolate the issue, other than SQL agent logs? Not all jobs are affected.” Do you have a monitoring tool, Ronald? I feel bad calling an adult Ronnie, so I’ll call you Ronald.
Tara Kizer: I would probably just log current activity to a table using sp_whoisactive and then see what was going on. Maybe log it every 15 to 30 seconds, just for this time – although I normally log 30 to 60 seconds in production always. But just get some monitoring in place. And if you don’t have monitoring, sp_whoisactive is a good place to start, and we also have BlitzWho that can do it too. I just like the additional detail whoisactive has.
Erik Darling: That’s a good place to start. Agent jobs are tough because, you know, if it’s a maintenance task like a DBCC CHECKDB or index and stats stuff, then there’s not really a whole heck of a lot you can do about it, so start there. And it might be something that you’re not expecting too. It might be like locking, blocking. It might be something that is not necessarily that the job is performing poorly, it’s just that the job is held up waiting on something else.
Tara Kizer: Yeah, and check what the wait stats are during that time window. Are you now waiting on memory, disk, see what’s going on.
Erik Darling: All that jazz.
Should I tweak fill factor for RCSI?
Erik Darling: We have one last question here. Oh, it’s a follow up. Joe says, “They have this job to adjust fill factor to account for additional bytes for RCSI. They swear…”
Tara Kizer: What? Who are these rocket scientists? I have been using RCSI for years. I mean, I’m talking not quite 10 years, but nine years, and I have never ever done that.
Richie Rump: I’ve never even heard of that. That’s amazing.
Tara Kizer: There is additional overhead for RCSI. I forget what it is. It’s really small though, like, I don’t know, four bytes or something, but I’m not changing fill factor. This is crazy. Oh, man…
Erik Darling: Well you know what – I would set fill factor back to 100% for a week and be like, if you can find me a change, if you can show me on the application where things got worse then we can go back to your way. But until then, we’re going to stick with this. That’s a whole lot of, like, we’re going to fix everything but what the actual problem is.
Tara Kizer: Oh my god, we have page splits…
Erik Darling: Exactly…
Tara Kizer: What year are we in?
Erik Darling: Everything but the problem.
How should I tune CHECKDB?
Erik Darling: Alright, one last one. Marci has another one, “If you have time for this, I’m wondering what to do checks for when DBCC CHECKDB runs for longer than normal.” Corruption, that’s what I would check for. Let’s see – disk I/O, and the error log. I would look for 15-second I/O warnings or messages about saturation or flush cache and just – because usually, when that starts taking a long time, it’s usually a sign that CHECKDB, reading all that stuff from disk, is not having a good time of it.
Tara Kizer: And check with your SAN admins if this is a SAN, because maybe they could see another server running something big and it was just the timing.
Erik Darling: Yeah, because most DBAs, when they set everything up, even though they’re hitting a SAN, all the backup jobs are at the same time, all the CHECKDB jobs are the same. Everything is very regimented. [crosstalk]
Tara Kizer: Of install scripts and you have to create the CHECKDB. We would just run them and then we wouldn’t go back and change the job scales. It was just a template that we would use basically. And we had 700 servers at that company, so you can imagine how many CHECKDB…
Erik Darling: Uh-oh, Tara was consumed by the flames. You know, at least she froze smiling. That’s nice. I’m going to take a picture of that for her…
Tara Kizer: Jobs we had all launching at the same time…
Erik Darling: She’s back.
Tara Kizer: Am I back? I need to call our internet company. It lasts like 30 seconds, so it’s not just Webex, because I could hear you guys still.
Erik Darling: Wild, ain’t that something?
Richie Rump: Look at that.
Erik Darling: Alright, well that’s all the questions we have this week, folks. We should get out of here before Tara’s internet apocalypse continues; before it infects all of us. Alright, thanks for joining us. We will see you next week for another wonderful edition of Office Hours. Goodbye.