This week, Brent, Tara, and Richie discuss what they think about having no more service packs for SQL Server 2017 and newer versions, shrinking log files, how many servers can one DBA manage, log backups, mirroring vs Availability Groups vs log shipping, and their next company retreat location.
Enjoy the Podcast?
Office Hours 10-4-2017
What if I have a really long question…
Brent Ozar: Doug says – Doug has a whole paragraph; my goodness. He says, “We have several SQL Servers running agent jobs and they each record their own history in tables. I’m trying to centralize everything… I don’t see anything that links job steps together in any system table or DMV; is there?” I don’t know; I’ve never tried that.
Tara Kizer: What are we linking with? Job steps to what?
Brent Ozar: I don’t know, job steps through the same job ID?
Tara Kizer: Yeah, I mean that’s in there [crosstalk]
Brent Ozar: I would see – ask what you’re going for on Stack Exchange. Go to DBA.StackExchange and then break out what information you’ve got and the DMV script you’ve got so far. Also, the other thing with Stack Exchange is, as you’re pasting in the question, or as you’re typing in the question, they fill in for you other related questions. And I know I’ve seen a bunch of questions about how to write query for job history in there.
What are your thoughts about no more service packs?
Brent Ozar: John says, “What are y’all’s thoughts about no more service packs for SQL Server 2017 and newer?”
Tara Kizer: Well I think that we’re going to find out in ten years when everybody has upgraded to 2017, finally. What’s the adoption going to be anyway? So I don’t think we’re going to see much of an impact at all? I mean, who’s going to be running 2017?
Brent Ozar: I didn’t think of that until you had said it in the company chat room, and then I went back through and looked today. If we go to – and I’ll fire up a web browser and go pull it up – if we go to SpotlightEssentials.com, Spotlight Essentials is Quest’s free monitoring software, and they post the data that they gather. If you click on Collective IQ up at the top, it shows you all kinds of stuff. Like right now, the current adoption rate for SQL Server 2016 is 9%; 9% out of all the servers that they monitor, and that’s more than a year old now. 2016 is more than a year old, so…
Tara Kizer: And I would bet that most people aren’t running – this is a free tool, right? So I would bet that a lot of people who are running SQL Servers aren’t even using free monitoring tools because a lot of people running SQL Servers don’t even have IT staff, so they wouldn’t even know to look for some free monitoring tools or what to do with these things. So this, probably, is just for people that have IT staff, and it’s probably a small percentage of the SQL Server world.
Brent Ozar: That’s true, because people with money, budget and time and all that, they’re more likely to buy monitoring tools. They’re also, I think, more likely to go early cutting edge on adoption stuff. Yeah, Richie feels the pain of constant upgrades. Richie’s in the midst of upgrading PasteThePlan and had to deal with all kinds of dependencies this week.
Richie Rump: Yeah, it’s interesting, y’all complain about an SP, you know, once every six months or whatever. Meanwhile, I have – PasteThePlan has a good 12 to 15 different packages that it’s using via Node Package Manager and every single one of them need to be updated; every single one of them. So yes, that’s what I did for the upgrade…
Brent Ozar: And it breaks everything. Like upgrading one of those breaks everything else.
Richie Rump: It could. I mean, it definitely could. I got lucky with this one; there wasn’t much breaking. But that’s why you have unit tests, to make sure that when you upgrade, the tests run and everything kind of works. But it gets kind of dicey when you’re just doing blanket updates all over the place.
Brent Ozar: My feelings about no more service packs – I love it, it’s fantastic. Cumulative updates make everything so simple, there’s just one train of updates. Before, people were like, “Should I be on this branch or this branch? Which one’s the most current?” Makes it easier for vendors; they just test their stuff on one path straight through the code. I’m just a huge fan. They’re now coming out monthly, which I’m a huge fan of if the quality is good. So cumulative updates will now come out every month for the first year. Most people don’t install them in the first year, so it’s not heartbreaking. It’s only going to be the cutting-edge people. But every month for a cumulative update – when I read through the list of what gets fixed in some cumulative updates, I think it’s kind of scary. But then you read what gets broken in some CUs, like CU 4 recently broke NOLOCK – I say recently it was about a year ago, 2016 CU 4 broke NOLOCK. I’m like, that’s not the kind of risk you want to take.
Richie Rump: Oh, and there was a bunch of stuff that happened when 2012 was released.
Brent Ozar: Service Pack 1 was a mess.
Tara Kizer: I wonder if this change in policy by Microsoft is going to even delay people further from upgrading because it’s going to be 2017 and all newer versions. So if you work for a company that only will apply Service Packs, maybe they’ll even avoid 2017 or higher for a while.
Richie Rump: Yeah, or maybe just say, “You know what, if it’s been around a year, we’ll install it.” And the one year is our, you know, “We’ll start playing with it then. Until then, we’re just going to let them get the bugs out.”
Brent Ozar: Which makes sense, I mean, if at the one year point it switches from every month cumulative updates to every quarter, so it’s just easier to manage.
Richie Rump: But that’s not us here.
Brent Ozar: Yeah, we’re rough.
Why is my transaction log so big?
Brent Ozar: Chris Fair says, “I ran sp_Blitz and I got the message that my transaction log was larger than my data file. This database is in bulk-logged recovery mode and we only do a log backup once a day at noon…” I love the next sentence that comes out of his mouth. “I’ve been here a month and I don’t know the reasons behind either decision. Backups have been only 1GB in size, yet the log file is 67GB. I know you generally frown upon shrinking log files, but in this case, is it okay? And I’m only going to shrink the log file.”
Tara Kizer: It’s okay, as long as you fix the reason why it’s so big. The reason why it’s so big is because you’re only doing one log backup per day. So fix that, run maybe hourly or even more frequent, and then see what size is needed. So shrink it down. I’d probably shrink it down to 0.5GB, and then with those hourly or more frequent log backups see what size it grows to. It might still grow a bit past your 1GB data file, but maybe you need every five minutes log backups. I’ve had now four clients that have done one log backup per day, and it just is crazy. I don’t even know what the point of this is.
Brent Ozar: If you’re only going to recover to one point in time, that’s what simple model is for, simple recovery model. If it’s like a data warehouse where you’re only loading stuff once per day, yeah, maybe simple recovery model is okay. But odds are, Chris, they brought you in because the last person was either incompetent or there was no last person. So good news, you’re asking very good questions. I bet money, the last person didn’t even know about sp_Blitz toop, because that happens all the time. Even when we come onsite with customers, sometimes they’ll be like, “sp_Blitz, tell me more…” [crosstalk]
I have over a thousand SQL Server 2008s…
Brent Ozar: Chris says, “No question – here’s my sadness. I have 39 2008 SP3 servers, I have 1326 2008 SP4 servers…” And then goes on with the rest of his version numbers. But you have enough time to sit here and watch webcasts when you’ve got over 2000 SQL Servers. You can’t be that bad off.
Tara Kizer: I imagine he’s working for a large company, and I would hope there’s a large DBA team for this amount of servers. The most I’ve had at a company was 700 servers, but we had five people at one point, I think at the lowest, and then eight people near when I left.
Brent Ozar: We got this blog post that I did a while back: How Many Servers Can One Person Manage? Where I talked about – I’m going to show a little graphic up here on the screen. If all you manage is hardware, so if you work for Facebook, Amazon, Microsoft, Google, whoever, if all you manage is hardware, you’re running through racks and data centers, one person can manage thousands of servers. When you layer on an OS, Windows, Linux, whatever, as soon as you layer on the OS and you’re responsible for installation, patching, security, configuration and requests from end users, one person can typically manage hundreds of boxes. When you layer on SQL Server, or really any database, one person can effectively manage 50 to 100 instances of SQL Server.
But you’ll notice, as you get to this layer you’re often giving up the previous layers. If you’re managing SQL Server and you want to manage, say 100, instances by yourself or per DBA, you’re probably giving up hardware, maybe even OS, to other teams, and then one person can effectively manage more instances. When you layer on high availability and disaster recover, clustering, replication, log shipping, always on availability groups, one person can really manage 10 to 50 instances of SQL Server. And then when you layer on performance, one person can effectively manage between one and five applications. This is where it gets a little weird. Sometimes, if you’re kCura Relativity DBA, one kCura relativity DBA may be able to manage performance across 50 SQL Servers, because they’re all running the same app. Performance tuning is exactly the same across all of them. But it just lays out a good idea of how many servers one person can manage. Sometimes when I see that, people will go, “Trust me, I manage more than that.” Well yeah, but you do a crappy job; I looked at your servers…
Tara Kizer: And Christopher followed up, they do have one DBA guy. Christopher’s not a DBA guy, he’s a developer. One DBA guy that also handles the hardware – he’s also the sys.admin. So I can’t even imagine what a hot mess these servers are in if there’s this many SQL Servers.
Brent Ozar: And imagine having a failover from one data center to another.
Tara Kizer: They don’t even have that in place – or it’s some kind of VM snapshot or something.
Updates just introduce more problems
Brent Ozar: Raul points out, “Updates sometimes just introduce more problems and complexity.” Yeah, I’d agree with that. I mean, a lot of times, you look at a lot of the cumulative updates, they’re fixing amazing stuff, don’t get me wrong, I’m really happy with what Microsoft’s doing out there with the product.
Richie Rump: That’s one of the things I like about sp_Blitz, is it will tell you if you’re running a dangerous build. If you apply one of those patches or whatever and it has a problem, running sp_Blitz will tell you you’re running something that’s bad; you probably should upgrade.
Brent Ozar: We even warn you if you’re on 2016 and you’re not on Service Pack 1, if you’re in Standard Edition, we’re like, “Yo, dog, get up to 2016 Service Pack 1, it’s amazing.”
Should we be in simple or full recovery model?
Brent Ozar: Steve asks, “We back up databases and logs every hour. Is there any reason for us to be in full recovery model?”
Tara Kizer: Is this the one that was in simple recovery model, same one? No?
Brent Ozar: No, it’s a different guy.
Tara Kizer: I mean, if you’re doing hourly log backups you need to be in bulk-logged or full recovery model in order to do that. You can’t do log backups in simple, so which recovery model you’re going to pick depends upon the business. It’s not for us to decide. I mean, does the business require you to do point in time recovery to, say, data loss RPO goal of say five minutes. You need full recovery model. I don’t even like bulk-logged recovery model on critical databases. I mean, it reduces my recovery point, so why would I want that?
Brent Ozar: I know what it is, and I’ve been through certification tests and I’ve seen what the purpose of it is, I’ve just never actually used it. I know there’s somebody out there who makes good sense of it, but it just has never been …
Tara threw out there the letters RPO and RTO; we’ve got a blog post on that – The Nine Letters That Get DBAs Fired. So we talk about what RPO and RTO is, we talk about how to CYA using our disaster recovery spreadsheet, where you check out what your RPO and RTO are and hand that to the business to that everybody’s on the same page. That’s our post, The Nine Letters That Get DBAs Fired.
Tara Kizer: My client this week, as well as last week, their RPO and RTO goals have been one day across the board for all eight boxes. It’s like wahoo, it’s so easy. [crosstalk] At five minutes it gets a little challenging. How much money do you have?
Brent Ozar: Everybody, usually when we show them that spreadsheet is like, “I don’t want to lose data and I don’t ever want to go down.” What’s your budget? “$5000.”
What’s the right HA/DR technology for me?
Brent Ozar: Raul asks, “If your organization has the means and budget, their RTO, recovery time objective, is 45 minutes. Would you go for AGs or mirroring?” I know what Tara’s answer is.
Tara Kizer: Well yes, I’d go with availability groups, just because it has more flexibility and is not deprecated. But as far as AG or mirroring, I don’t care one way or the other for the specific question. I want AGs if I want multiple replicas, because with mirroring you only get one mirror, and you have to pick asynchronous or synchronous; you can’t have both. So availability groups give me HA and DR and the option to have reporting as well. So, either one is fine. As far as a 45-minute RTO goal, that’s fine. [crosstalk]
Just go for log shipping at that point. I’m a fan of log shipping, but it’s harder to failover. So if your organization is going to be testing failovers, that is why we moved away from it at my organization. We moved to mirroring and availability groups because we did failover during disaster recovery sites and ran production out of there on purpose. And so we were trying to reduce our downtime for these failovers, and log shipping’s a lot harder to do.
Brent Ozar: It is. In that HA and DR worksheet that’s – on The Nine Letters That Get DBAs Fired, it’s got this three-part worksheet and down the left-hand side you find how much data you’re willing to lose, and then across the top is how long you’re willing to be down for. 45 minutes, you’re basically in the one-hour category, and if you go down through there, the methods are very different depending on how much data you’re willing to lose. If they’re not willing to lose any then it’s kind of tougher. You’re really down to either synchronous AGs or synchronous mirroring, and even those have some gotchas there. But if you’re willing to lose, say an hour, maybe if you’re willing to be down 45 minutes, maybe you’re willing to lose half an hour, an hour’s worth of data, log shipping can be pretty cool.
You mentioned that failover has to be automatic and seamless to the clients. There’s a thing called a DNS CNAME that you can use to point around the server from one place to another; it’s just way more work than availability groups. I say way more work, that’s probably a stretch.
Tara Kizer: Also, client redirection needs to transparent, that means on failover the user can keep working and doesn’t see any interruption, and neither availability groups nor mirroring give you that, or anything really.
Brent Ozar: I read that as like we can’t change the client app, but it’s going to be down for 45 minutes. It is not going to be transparent, they’re going to see right through your 45-minute delay.
Brent Ozar: Raul says – oh come on Raul… He follows up with, “Log shipping requires a lot of DBA intervention plus failover testing.”
Tara Kizer: So do the others.
Brent Ozar: If you’re saying that you think you don’t need testing for AGs or mirroring, think again, they are kind of tricky. Alright, well that’s actually all the questions that we have, folks. Y’all got done in 16 minutes. I’ll give you guys another minute or two to throw out any questions you’ve got there, otherwise, we’ll let you go early to lunch today.
Richie Rump: I realize that I have an ampersand on my shirt and so do you. It’s weird. It’s like ampersand day.
Tara Kizer: I do not.
Richie Rump: You didn’t read the memo.
Brent Ozar: Tara’s the only one in the group with taste.
Tara Kizer: Actually, this is my pajama top, so I don’t know about that…
Brent Ozar: Somebody asked when I was in the Isle of Man last week, they were like, “Do you guys actually get ready for work and get dressed?” I was like, “Well we have a webcam and we work with clients on webcam, so we have to put shirts on at least.”
Tara Kizer: It was funny when we were out in Austin to work with Dell, the Quest people now, and we all dressed in whatever we wanted to wear and a lot of the people at Dell were a little bit fancier and Claudia one look at my attire and said, “I wish I could dress like that here.” I don’t know if she was insinuating…
Brent Ozar: I wish you could dress like that here too.
Is it possible to roll back sp_changepassword?
Brent Ozar: J.H. asks, “Is it possible to rollback sp_changepassword?” No, not that I know of.
Tara Kizer: Rollback if he wrote down the password somewhere, the old one.
Brent Ozar: Michael asks, “What’s a good way to filter bad data in a big ETL job? Someone fat-fingered a date of January the first of 2107 instead of 2017.”
Tara Kizer: Well, data validation, I guess, it’s probably part of your packaging.
Brent Ozar: Richie, do you still have that sp data profile?
Richie Rump: Yeah, I haven’t touched it in a while, but that won’t tell you – it will tell you the min and max of some stuff…
Brent Ozar: Just that you have wacko edge case data.
Richie Rump: Yeah.
Brent Ozar: Alright, well thanks everybody for hanging out this week and we will see you next week on Office Hours. Adios everybody.