This week, Brent, Erik, and Richie discuss replication, SQL Source Control with Git, installing service packs in cumulative updates in SQL Server, query tuning, write errors, log backups, Azure-managed instances, 3-rd party backup software, and DBA porn.
Enjoy the Podcast?
Office Hours – 3-21-18
Erik’s speaking at SQL Saturday NYC
Erik Darling: I know that we’re due to start soon but I wanted to just let anyone watching know that I’ve been accepted to present at SQL Saturday in New York City on May 19th. So if anyone is, I guess, anyone from the tri-state area wants to come on down and watch me babble about stuff, I’ll be there. One session I know that got selected was Query Optimization Explained in 15 Demos. I might have a second one since they said that there are still some open slots, so I don’t know.
Brent Ozar: Awesome.
Erik Darling: So I’ll be doing at least one there. So I look forward to maybe seeing all of you from a distance.
Brent Ozar: And we’re also always ready – or we’re always interested – to hear what people would want to hear us talk about too. So if you’ve got ideas for sessions you’d want to hear us give, feel free to put that into the questions as well.
Erik Darling: Replication. Fixing replication disasters.
Brent Ozar: Damn it…
Erik Darling: What stinks is you’re going to say that and people are going to say, like, a lot of HADR topics are going to come up and I am just gasless when it comes to HADR topics. I have nothing to say about failover clusters anymore. They exist; I’m sorry.
I have replication and Always On Availability Groups…
Brent Ozar: Speaking of which, so RimJim starts with a question. He says, “Hello, I have a replication question. We have Always On set up and the primary is also the publisher. We don’t have an AG installed. If I want to remove the main publisher…” None of us in here know. This is the funniest thing about doing the replication. One of the slides [inaudible] talks about going to dba.stackexchange.com for posting multi-paragraph questions, and we just tend to avoid replication like the plague; really because it is the plague.
Erik Darling: It is. It is a pox upon your servers. It’s such, like, an outdated technology at this point. There’s just so many replacements for it that are easier to deal with and are less brittle and break less.
Brent Ozar: And it’s not like it’s bad. I get that people go, oh this works for me and I want to keep using it. That’s cool, just you get that, as you’ve tried to ask questions because the thing breaks all the time and you’re trying to get changes made to it and nobody’s around to answer, you go – man, go look at Amazon.com – go look at the books on replication and then look and see what the authors are doing now; it’s not replication. They’ve all gone somewhere else.
Erik Darling: If you need, like, a primer in some troubleshooting stuff, Redgate has a free PDF about replication, but it’s old. It’s from 2009 or 2010 or something. Like, no one’s keeping up to date on that. Like, replication just hasn’t changed where there’s like a constant need for replication experts to keep their skills up to date. It’s like all this is the same problems for the last 15 years.
Richie Rump: Replication does not bring to the boys to the yard.
Brent Ozar: It keeps the boys from the yard. RimJim follows up with, “The reason we have replication is – we don’t care… “Because the subscriber, we have custom scripts that do not replicate all the delete commands.” You can do that with SSIS too. You can do that with stuff like continuously copying data from one place to another. You’re right in that this is what it was made for, that’s a good thing, but it’s just painful. It sucks. Richie, I don’t think he wants to ask questions.
Richie Rump: No, no, no I was waiting for the green light here.
Brent Ozar: Green light, go for it.
How can I get started with source control?
Richie Rump: Alright, so Anon would like to hear more about SQL source control with Git for a total newbie.
Erik Darling: There you go, Richie.
Richie Rump: No, you guys are the SQL experts. Please, go ahead. Talk to me about Git; I’d love to hear what you have to say.
Erik Darling: So, like, the Venn diagram of SQL expertise, and then like Git and source control expertise – if you can picture my hands as circles and then picture the Venn diagram slowly expanding as you add these words into the equation, they are far apart.
Richie Rump: And once you get Pluto, keep going.
Brent Ozar: Pretty far. My thought was, I couldn’t get into it until Jeremiah kind of forced me into it. He’s like, look here, let me show you how it works. And I think I would go – instead of coming at it from a SQL source control angle, I would go to your developers. Talk to your developers because they’re already really comfortable with Git, GitHub, TFS, whatever kind of distributed source control that you want to go use.
Richie Rump: Yeah, and there’s different levels, right. Because you could just use Git and just throw things in there and everything’s fine, but when you start working with other people – the Git workflow and there’s different ways of doing it and branching and merging – it can get pretty confusing pretty quick. So I understand why, you know, all these DBAs who are now forced to use Git, they’re really confused and they don’t want to deal with it. I understand it.
Erik Darling: I mean, if you wanted to give someone a pillow-landing for git, you could try using GitHub Desktop, which, kind of, GUI-izes a lot of the stuff that you have to do. That’s what I use because I’m soft. So it helps in that regard because I can do stuff and I can fix stuff. I’m still a little gun-shy when it comes to fixing a conflict on a merge or something, but for the most part – for the scripts we use, I know them well enough. I have enough domain knowledge about the scripts where it’s not awful for me to use the GUI interface for it. If you wanted me to use a command line, like actually use a Git, I would run screaming. [crosstalk]
Richie Rump: Yeah, even I don’t use the command line. Visual studio code has a Git lens hook-in – plug-in – and it works great. Source Tree – I use Source Tree for visualizations so – I mean, I could use the command line just like anyone else but I don’t need it.
Erik Darling: No, good stuff.
Should I stop SQL before installing CUs?
Richie Rump: So Don asks, “What are the best steps to installing Service Packs and Cumulative Updates on SQL Server; change to single user, stop agent service, stop service?” Blah, blah, blah, go on, go on, go on.
Brent Ozar: I’m a huge fan of – like if you have any high availability – patch the secondary first. Do it during the week, when you’re sober, like between nine and five. Once you’ve done it a few times, you get used to what you’re able to do without a reboot and what actually requires a reboot. But then that way, if the thing’s all patched up and ready to go in, it’s just a secondary. Then you can failover to it during your outage window; make sure that everything’s okay.
And in that scenario, the whole thing about stopping the service and agent and all that is much less relevant. You just don’t have to worry about it as much. If you only have one server and it’s up and you have to do the patching live, oh my god, it just makes me nervous. Like, you can do it but the whole concept of stopping agent and stopping the service and all that is a giant pain in the rear. I’ll tell people, if they want me to performance tune and get it right to that level, just give me a secondary instead.
Erik Darling: The other thing you’re looking at is cutting off access from the application so that no users can try to sneak in and hook in, or whatever. You know, if it is just a standalone instance, you really have no choice but to ask for whatever maintenance window you think you need to install and then rollback if the install, something goes awfully bad with it.
How can I change this scan to a seek?
Richie Rump: Okay, so [Tishal] – if I mispronounced your name, I’m so sorry. I write code all day, I’m… “I have a TOP1 ORDER BY descending query that has an index on it, but it does an index scan and reads about eight million records. sp_BlitzCache shows parallel warnings; no missing indexes. It’s one of the most expensive procs in terms of CPU. What can be done to change an index seek? Could cast be a problem?”
Erik Darling: There is a cast function in it, like…
Brent Ozar: There’s casting UTC reads…
Erik Darling: I mean, that’s not going to change much.
Brent Ozar: Yeah, I have bad feelings. Everything about this gives me bad feelings.
Richie Rump: Why? What are the code smells?
Erik Darling: I don’t know. What I want to do is see the query plan. I want you to take the XML and put it on PasteThePlan and then, if you want like super detailed help with it – because the amount of information you’re giving me just isn’t enough to tune a query with. When you say there’s an index on it, I’m like, okay, there’s an index on lots of stuff. Post it in dba.stackexchange.com – what a nice time for that slide to pop up. I don’t have anything to do for the rest of the day, so if you post it on there, I’d be happy to take a look and see if there’s anything…
Richie Rump: Well I have some issues that you could help me with, Erik.
Erik Darling: Okay [crosstalk]. Just try it in Postgres, Richie. Write a unit test.
Brent Ozar: The one thing that does stand out in there is you’re like, why can’t I get an index seek? If SQL Server is trying to compare two different fields because you’ve got two different fields in there I-date and B-date. Two different fields and either of them could be less than some number. My guess is that SQL Server is going, this is going to match a large percentage of the table; why bother doing index seeks on it? You could try rewriting it as a pair of unions so that one searches for I-date and one searches for B-date, but it’s not going to be a small change.
Should I partition to fix blocking problems?
Richie Rump: Okay, so Pablo asked, “Ola, amigos…” Ola Dora, [speaking Spanish]…
Erik Darling: It would be racist if anyone else did that. Richie gets to do it, using his Spanish privilege against us.
Richie Rump: The guy who is third-generation Mexican American and speaks a little Spanish who lives in Miami. It’s my burden to bear. “I have a third-party app that is currently making a huge amount of transactions per second on a particular table. It is getting heavily locked and the app is giving timeouts. Do you think that some kind of partitioning in the table may help the locking problem? Or maybe another suggestion?”
Erik Darling: No, not partitioning anyway. Just because if you’re not getting adequate chunks of data eliminated with the indexes you have now, I think trying to add partition elimination into the mix is not going to make that any simpler or give you any extra guarantees that it’s going to work.
Brent Ozar: And always define what huge is, because numbers mean different things to people. Is it like 10,000 batch requests per second, 50, 100,000 batch requests per second? You know, give me a rough idea there. Plus, too, if you’re running into lock escalation, when you say huge numbers, that immediately makes me think, if you’re tripping up past 5000 rows locked – because we’ll end up doing index scans instead of seeks – partitioning it probably isn’t going to help too much because you’re still going to escalate pretty quickly when you’re locking thousands of rows.
Richie Rump: Or when you’re popping and locking.
Brent Ozar: When you’re popping and locking. You did say, “Do you have another suggestion?” I’d say, depending on what the primary – if it’s readers blocking writers or vice-versa, RCSI might be a good fit; read committed snapshot isolation. If you go to brentozar.com/go/rcsi, we have posts about that. it’s a setting that you can just flip a switch and magically, blocking starts to disappear. There’s just so many gotchas with it that that’s why we have the whole page about it.
Richie Rump: Well, like tempdb would be the biggest one, right?
Brent Ozar: Yeah, the tempdb throughput and size. If someone doesn’t BEGINTRAN, their tempdb can just start growing like crazy. Richie, you’re going to kill me but your microphone’s doing the thing again, the static-y thing. I know right. Hold on – so that’s that. We’re all playing chicky. Go ahead…
Richie Rump: Is it better now?
Erik Darling: Maybe.
We have this problem with Windows failing to write…
Brent Ozar: Mike Chubbs asks, “We’re having this error that our Windows folks have no idea where to look at. The requested resource is in use when I try to write. During a write, it offset…” Blah, blah, blah, blah, “In file.” Two words – except I guess it’s really one word – antivirus. So it’s typically a filter driver; something that’s intercepting writes. Antivirus grabbed a lock on something. A filter driver – so you’ve got some kind of filter driver that’s trying to do compression or encryption or snapshots. Those are all common causes for that.
Erik Darling: I know that Semantic, or however you pronounce it, is kind of notorious for that sort of thing. So check and see if you have Semantic running anywhere. I know it’s going to sound like the awful, awful thing to say, but if you run Xperf, you can usually find the filter drivers that are using a whole bunch of CPU and doing stuff and figure out which ones are, you know, messing with your files. Good old ETW trace.
Brent Ozar: Nice. Richie, you want to test your microphone?
Richie Rump: Am I back? Hello.
Brent Ozar: You are back. You sound great.
Richie Rump: Okay, I need to stop using Chrome. I think Chrome is my problem.
Erik Darling: You have a Chrome problem?
Richie Rump: I may – I make everything chrome…
Brent Ozar: Justin Bieber.
Can a log backup cause a stack dump?
Richie Rump: Alright, so Gordon has a question. “Is it possible for log backup to have an exception access violation and subsequent stack-dump?”
Brent Ozar: Sure, corruption. Corruption will do that. If it tries to backup corrupt data, that will totally do it. I would check CHECKDB for starters. Anytime you have any kind of error-running backup, I would probably be worried about CHECKDB.
Erik Darling: I’m trying to think, that isn’t like guaranteed to be a CHECKDB thing. It could be an underlying – is there a different disk that you can try taking the log backup to, just to make sure it’s not a pure hardware issue. That might be something to do if the CHECKDB comes back clean.
Brent Ozar: And if you’re backing up to a network, try local. If you’re backing up local, try the network.
Erik Darling: And, you know, it might be a situation where, like, you have to add a second log file and get rid of the first one because it’s on a bad bunch of storage or something, too.
Brent Ozar: I should have also thrown out too, weirdo backup software, I’ve got a rash of calls in lately going, I use such and such backup software, and when we look at their data, they’re not doing backups. I’m like, I don’t know what that thing does. But generally, if you paid less than $100 for your backup software, it’s probably not doing what you want it to do. Richie laughs – it’s true. Big customers have emailed in like three times during sales prospects and they’re like, I use blank, and I go to the website and I’ve never heard of it before. I’m like, $49 with a 90-day trial?
Erik Darling: You just can’t software that costs less than $100.
Richie Rump: My copy of DataGrip cost more than that.
Erik Darling: I was bummed that didn’t work for me. The connection string worked first time in Postgres – [crosstalk]…
Richie Rump: But there’s something going on there. You just have to dig into it a little bit. Since you’re not doing anything for today, you could…
Erik Darling: That’s not true. I have to answer that person’s question about the query.
Richie Rump: Oh okay. I mean, that’s Robert’s question, right, who wants to know how to troubleshoot an OLEDB wait type.
Erik Darling: Ooh, ole.
Brent Ozar: It’s early for Cinco de Mayo. Linked servers is one way to do it. I’d say, hold up though, what makes you think OLEDB is the problem. Because often, people will run wait stat scripts like sp_BlitzFirst – we’ll do the same thing. We list OLEDB because sometimes it actually can be bad, it’s just super-rare. If it’s one to one with clock-time – like if your server’s been up for 1000 hours and you have 1000 hours’ worth of OLEDB waits, it’s harmless. It’s a monitoring tool like SQL Sentry, Quest Spotlight, IDERA SQLDM. He says, “I get it in the monitoring script.” There’s so many. I get it in the monitoring script too.
Erik Darling: Don’t we all? Like Brent said, remote queries will cause OLEDB, DBCC CHECKDB – it will register OLEDB waits – it just might not be that bigger deal, unless you’re running CHECKDB 24/7, just in a loop.
Brent Ozar: Oh god, he follows up with, “I’m moving 180,000 rows via linked server.” Yep, well, that will do it.
Erik Darling: We found the problem.
Brent Ozar: He says, “The waits go to two seconds and then drop.” Yep, that’s exactly what it does, is reset every two seconds. It still could be waiting on OLEDB, but if you’re trying to move 180,000 rows faster, I usually keep a USB floppy drive or something like that around. You know, a CD-ROM maybe might be faster than trying to move that kind of data around.
Richie Rump: Or SSIS?
Brent Ozar: SSIS – that’s a good one.
Erik Darling: Or, you know, just not moving 180,000 rows at once, you know. Move like 1000 – 5000 rows at once; something that’s a little bit easier on your servers and on your network and all that other stuff.
Any opinions about DACPAC deployment?
Richie Rump: Yep, okay. So, Brian has a question. Somewhat of a follow up to the SQL and Git source control question, “Have any of you worked with, or have any strong opinions, about the DAC-pack/SSDT based development and deployment?”
Brent Ozar: Yeah, so I tried it when they first came out. It sounded like a really interesting thing. And I try to keep optimistic about, hey, Microsoft’s got a new technology. I should play around with it. And the problem for me with DAC-pack based deployments was, if I wanted to make a schema change to a large table, it was really weird how it worked and it generated a ton of log data. So it seemed like it worked across small databases. And I’m going to use, as a number for small, less than 5GB of data. But it totally fell apart at like 50GB of data. The deployment was just brutal. Plus dropping columns was horrific.
I found that I could accidentally drop columns during the deployment and it was way too easy to go and do. So I was burned out on it within the first six months. I kept trying it and would go, ah it’s just not working for me. I haven’t heard from anyone else that they swear by it, so I don’t know.
Erik Darling: A lot of people who have to do that sort of thing just use Redgate SQL Compare and have much better luck with that.
Richie Rump: Yeah, I prefer SQL Compare. I am not a DAC-pack person whatsoever. You just don’t have a lot of control, from what I’ve used of it, in those deployments. At least with SQL Compare, I could say, here’s A, here’s B and give me the deltas and then I have a script and I can review it and whatnot, as opposed to, I’m just throwing this thing out there and it’s going to do something and hopefully it will work. I don’t know.
Erik Darling: So I’m going to give Redgate SQL Compare the best accolade that I can give a piece of software. It was so easy, I could use it.
Brent Ozar: I’d agree. The wizards are nice, the side-by-side comparison thing.
Erik Darling: It highlights where things are different.
Richie Rump: Does that work for Postgres? Because lord knows I could have used it yesterday.
Erik Darling: Why don’t you use – what was it you told me about? KDiff3.
Richie Rump: KDiff is essentially just for get and compare text files. It’s not going to go into your database and say, okay this is this and this is that. So what my problem was, yesterday, I did a deployment to Postgres and I had some scripts. And as I usually do, I’ll go ahead and put all these updates into a different script and I’ll keep track of it that way, if it’s gone into production or not.
I didn’t update the scripts from some of the updates that I had made in source control, and so I went ahead and deployed an old version of updates. And that’s when Brent saw all of this red and started laughing at me and things like that. But I fixed it. I fixed it.
Why do we need backup software?
Richie Rump: So Steve has a question, “Why backup software? Can’t we do this with SQL Server tools?”
Brent Ozar: My thought, from working – used to work for Quest, and so I was always worried about that question when people would come and ask, what do you need LiteSpeed for? Can’t we just do a native backup? The compression options, the encryption options, the monitoring, the log shipping GUI, all that kind of thing is phenomenal with third party tools.
Now, the ability to do integration with storage snapshots – there’s all kinds of – smart differentials, or one of my favorites, object level recovery. If you’re doing backups with LiteSpeed, you can pluck an individual object out of the backup. Because, newsflash, people don’t call you to restore a 10Tb database. They call you to restore one table out of the 10TB database, and then you don’t have any space to go do it. You have to shuffle it around from one place to another. Man, third party backup tools make that kind of thing way easier.
If you don’t have – I may make a generalization – if all your databases are under 1TB and you’ve got, say, less than 100 servers, then you could probably get by with native. But as you go past 1TB, man, the third-party stuff is phenomenal.
Erik Darling: I look forward to object level restore, people in Azure start having to restore large things in Microsoft, all of a sudden realizes, oh yeah, maybe it’s not so much fun to babysit a 25TB restore when I just have this 50,000-row table that I need to get back.
Brent Ozar: And especially if they’re paying SLA refunds to people when it all starts to take three days.
Erik Darling: Another cool thing about tools like LiteSpeed, beyond what Brent said with the object level restore, is transaction reading. So again, what no one ever tells you – all sorts of like great DBA porn – no one ever tells you to restore just one table. No one tells you to restore that table just to the last transaction log backup. There’s always, like, you know, we’re taking hourly transaction log backups and something happened earlier this morning, and you’re like, cool. I don’t know what that was. You don’t know when that was and you don’t want to have to go through that crazy restore process to get things back up to here and then realize you went too far and have to start the whole process over again. You really just want to get to the point where something bad happened and stop.
Richie Rump: So is DBA porn like, look at the IOPS on that server?
Erik Darling: Yeah, stuff like that.
Brent Ozar: Scan paged, PFS pages.
Richie Rump: Look at the waits on that guy…
Brent Ozar: Or when you go into DBA cubicles, it’s always the PerfMon counter poster up on the wall.
Erik Darling: It’s so true.
Brent Ozar: It is. That’s what we put. That’s exactly it.
What’s up with Azure Managed Instances?
Richie Rump: So you guys were playing with a new service that Azure just released. Why don’t you tell everyone about that and, kind of, what you found out about it?
Brent Ozar: So, Azure managed instances – this is kind of, I’m going to use the term replacement – but I think it’s the thing that people are going to light up in excitement for over Azure SQLDB because you get cross-database queries, you get easier backups and restores, you can log ship up to it, in a weird kind of way, if you want to restore full backups up in Azure Managed Instances. It has more compatibility with more features that Azure SQLDB doesn’t allow you to support. You can query DMVs, you can go into the master database, you can put stuff in the master database for that matter, if you want.
It’s relatively affordable. It’s pretty cheap when you think about it and you can even buy it in eight cores, 16 or 24 cores. You don’t have to worry about memory. You don’t really have to worry about storage even. There’s just a slider for how much storage you want. It’s not provisioned by price or anything; it’s just a slider for storage – how much storage space you need.
Erik Darling: Does storage size tie into storage speed?
Brent Ozar: No, there’s just a size thing and that’s it. So it’s pretty impressive. Easy to set up. Some of the things that we found were that there’s all kinds of new DMVs. There’s these new feature switches where it looks like they’ll be able to turn things on and off more gradually. There were thousands of features in the feature switches table. It’s pretty neat stuff. It gives me excitement. It’s DBA porn, I suppose you could say.
There was an issue that I caught around database corruption where you could corrupt a master in a way that your entire in managed instance would be unrecoverable. I have every confidence they’re going to fix that before go-live. But it’s like anything else, all software has bugs.
Erik Darling: I mean, this is brand new when we looked at it, too. The stuff that we found isn’t stuff that is going to be there forever or stuff that you have to be eternally vigilant about. Like, there’s a whole bunch of stuff in SQL agent that doesn’t work. Like, you can’t call jobs that use command shell. You can’t call jobs that use PowerShell. You can’t call SSIS job steps from it. There’s no maintenance plans in there. You can’t use Ola Hallengren’s scripts. There’s stuff that doesn’t work yet; all stuff coming down the pipe. The T-SQL things are different between managed instances and real live T-SQL, but you know, it will get there. I have a lot of confidence in this so far. I think they’re doing a lot of things right.
Brent Ozar: Man, for a preview, it was freaking phenomenal. Just awesome.
Erik Darling: My biggest gripe was – sorry, go ahead…
Brent Ozar: Your biggest gripe was what?
Erik Darling: The memory didn’t go up when the core count went up, but again, previews.
Brent Ozar: Alright, well thanks everybody for hanging out with us this week and we’ll see y’all next week on Office Hours; Adios.