This week, Tara, Erik, and Richie discuss 3rd-party tools for object-level restore, patching, stringing multiple CTEs, Meltdown and Spectre, instance stacking, organizing DBA teams, DBCC checks, moving local databases to Amazon RDS, CTEs vs temp tables, and backups and restores.
Enjoy the Podcast?
Office Hours – 1-17-18
Erik Darling: Let’s start with this first question, “Hi, we have Idera SQL Safe, but it failed for very bid DBs when using object level restore. Redgate has expired this functionality recently. Can you please suggest other third-party tools to achieve object level restore?” I like Quest LiteSpeed. Quest LiteSpeed will do it.
Tara Kizer: I think that’s a good [crosstalk] – I think that’s the number one reason – one of the number two reasons that people like that product; object level restores, and then the log shipping is better with it than IDERA.
Erik Darling: Also, it’s super cool to – like they have the log reader built in, so you can read through your transaction logs and see which transactions did which bad thing. So, not only can you restore a particular object, but you can restore that object to the point in time to when something went awful. So that’s a really good tool. That’s like all my – that’s like everything that I’d want. Because I’m going to level with you guys; I hate doing restores. I hate it. It’s like the worst.
Tara Kizer: It’s boring.
Richie Rump: Why would you have been a DBA if you didn’t want to do backups and restores? That’s like the whole job.
Erik Darling: What I wanted was SA access so I could do all the cool stuff I wanted to do with perf tuning. I didn’t actually want to do backups and restores; that’s boring as hell. There’s just no glory in that. Like, no one is ever like, hi-five, dude; that was a great backup you did.
Richie Rump: Should have been a dev DBA, or just a database developer.
Erik Darling: Do you have any idea how hard it is to get SA when you’re that guy? It’s like impossible. No one gives that guy SA.
Richie Rump: I never wanted SA. That’s the furthest thing from what I want. I’ve dropped too many tables in production; no.
Erik Darling: That’s where all the good trace flags are; with SA. [crosstalk] trace flags – do stuff, get crazy.
Erik Darling: Let’s see. This one we’ll go right to Tara on – limited values of quality answers will occur. “Best way to do SQL Server patching with merge replication. What order should I patch my servers in?”
Tara Kizer: I don’t have any experience with merge replication, but we have some clients that are using it. As far as replication in general though, there is no order. It doesn’t really matter. And I’ve even patched all three at the same time, just because I want to get patching over with. If the publisher’s down, the data’s not flowing anyway so might as well also do the subscriber distributer at the same time. I don’t really see that there needs to be any kind of, you know, one server at a time. I care about if it’s an Availability Group, failover cluster instance – I want to make sure that the down time’s like 30 seconds and I’m bringing up the instance as quick as possible. That I pay attention to, but not as far as the replication instances go.
Erik Darling: Sounds good to me.
Erik Darling: Wes asks, “Can you string multiple CTEs together?” Wes, I can tell that you have the internet – I’m pretty sure…
Richie Rump: No, don’t…
Erik Darling: That the documentation is available to you. But since I’m not entirely sure, what I’ll do is I’ll show you how to do it. Now, I’ll open up management studio and we’ll get cracking on this conundrum; can we string multiple CTEs together? I should have done this when I saw it coming, but I was dumb.
Erik Darling: While we wait for this, there’s another question, “Do you have any reasons why someone should not or need not do Meltdown or Spectre patching?” Jeez, personal question.
Tara Kizer: I’m just glad I’m not a production DBA for this. The companies I’ve worked for would have been all over this; especially when I worked at Qualcomm, you know. Security team and patching monthly, we had this whole process and I can’t even imagine being a production DBA with that going on; especially as there are issues with performance because of it.
Erik Darling: Yeah, so my advice would be, if you’re not virtualized – like if you’re running on bare metal and you’re not using CLR – this is the advice from Microsoft. If you’re on bare metal and you’re either not using CLR or all the CLR is written by people in-house that you trust, then you’re not really all that vulnerable to Meltdown or Spectre. I’m just going to say anecdotally, I did patch everything for that. Some of my patches were against my will. Microsoft installed them while I was asleep and then I couldn’t do anything about it. They even restarted everything. And just anecdotally, creating indexes has been taking a lot longer.
Tara Kizer? Really?
Erik Darling: Like when I go to create – usually I can create indexes on the post table in about 30 to 45 seconds on my nice computer. Now it’s taking like a minute and a half, two minutes, to do some of these. I did a presentation on 2017 at the New York User Group last week and I had this great spill demo. I left my house with the laptop, and right before I left I tested everything and everything worked great and I shut down my laptop. When I restarted it at the user group, patches got installed and it, all of a sudden, wouldn’t finish.
Tara Kizer: Oh wow.
Erik Darling: It just stopped. Like, I was in high-performance power mode, my laptop was plugged in, the stuff was in memory already from doing other stuff. It, like, did a count query to make sure what I wanted would be in memory and it just failed completely. Like it ran for two and a half, three minutes – I was sitting there staring at it like, this took 15 seconds before. Oh man.
Erik Darling: Anyway, Wes had this question, “Can you string CTEs together?” And yes, Wes. You can string CTEs together all day. And we will do CTE 1 as – and I’ll pretend that I remember all the syntax off the top of my head – we’ll just do the SELECT TOP 1 from users – we should probably pick a column though, right? TOP 1 ID from users – and depending on what you want to do you can also – I’ll drop that down a line to make it look more clear – is use a comma; advanced comma technology. You can do CTE 2… [crosstalk] This is why I do all my typing before I leave the house usually. Look at this, I’m on this badly.
Tara Kizer: How much alcohol has he had today?
Erik Darling: Not enough, because usually when there’s a little booze in me I’m flying through this stuff. To run this, we can either SELECT* from CTE 1, or we could just, you know, forget that and we could just do SELECT from a completely different table, like posts. And we could do something else there. So you can string CTEs together just using commas and separating the CTE names with a comma. Then you can select either from the prior CTE or do a completely different query in there, and then eventually when you’re done you can SELECT nonsense from crap; and whatever else you want to do – whatever data you want to get.
Richie Rump: [crosstalk] CTE crap?
Erik Darling: Yes.
Richie Rump: So I didn’t even know you could do that until last week when I was going through a few things in sp_Blitz and I saw that there was a CTE that had strung three things together. And I’m like, “Wow, I didn’t know you could do that.” And the second thing that went through my head was, “Wow, I bet you Erik did that.”
Erik Darling: I’m sorry…
Richie Rump: That had an Erik code smell to it.
Erik Darling: I’m sorry, man. Which check was it?
Richie Rump: I’ve got to look it back up. I’ve gone over about 20 of them over the past two weeks, so…
Erik Darling: Because there’s one – rather there’s like a couple of them in particular that I remember having to do that with, where I was going through maintenance plan stuff. So I remember, I had to like shred the package XML and then do some other stuff, and I just remember having to do that with those. So that’s probably where it is.
Richie Rump: I think that’s exactly where it was, because I think I was bitching to Brent about having – stringing the multiple CTEs, but also having shredding the XML in there as well.
Erik Darling: Sorry about that.
Richie Rump: Am I asleep? Is this a nightmare? I can’t – it took me a while just to go through, like a half a day just to figure out, okay, well what the hell – first of all, what the hell are those system tables. And then, what the hell is all this XML crap and how does it all work?
Erik Darling: Sorry about that. I’ll try to make ity up to you somehow. I don’t know how…
Richie Rump: It’s not you, it’s Microsoft.
Erik Darling: It is. It is their fault. They did give me the ability to write code that bad, and then stuck all of their maintenance plan package details into almost completely undocumented XML schema.
Richie Rump: It worked, and I was shocked.
Erik Darling: I’m always surprised when my code works. Okay, I’m preemptively getting a link for a question that I’m going to ask.
Erik Darling: Someone named John – someone not named their actual name is saying that their boss…
Richie Rump: John Snow…
Erik Darling: Yeah, that is his actual name – is saying that they want to – their boss is buying big servers as part of a consolidation project. And they want to stack instances on this one big server and just have everything live on there on different instances. And we actually have a post about why you shouldn’t instance stack like that. And I’m going to point you to that URL, then if anyone wants to chime in and say something else about it, well, go ahead now.
Tara Kizer: And then also, you had a client that wanted to put tons of databases on a really beepy server, and even that wasn’t going to be recommended because they were all just so different. They weren’t the same application. So definitely, this IT leader – I don’t know.
Erik Darling: So whenever people do that, they really just shoot themselves in the foot because it’s almost impossible to do meaningful performance troubleshooting. And I know that you think you’ve bought your way out of having to do meaningful performance troubleshooting by buying like one big beefy server, but you’re going to hit a lot of walls really quickly. Namely, I think the biggest one I see is around CPU scheduling. So like when you license SQL Server, it’s by the core. And for each core, you get a certain number of threads. I forget what the exact calculation is. It’s like the number of cores minus eight times two, or something like that. It’s bizarre. But the bottom line of that is that your thread count doesn’t double when you double your core count.
Like, if you go from four cores to eight cores, you go from 512 threads to 576 threads. And when you run out of threads, you run out of basically workers to run your queries or perform backups or do any background synchronization tasks. Really, it’s just an added strain on the system that’s unnecessary. If you’re going to pursue something like that, I’d much rather see you virtualize and just have separate VMs for things. So at least then, if you had to dig into a problem, or if you wanted to try separating a part of your workload off, you’d have a much easier time doing that than you would if you just crammed everything into one big server. It’s really hard to move a single instance when you’re doing something like that; anyone else – anything fun?
Richie Rump: No, that was good.
Erik Darling: Alright, cool. I felt okay about that one. I felt like I didn’t completely swing and miss on that. [crosstalk]
Erik Darling: “We have five DBAs in the database team and we plan to divide up the roles and responsibilities. Do you have a source for definitions on how DBA teams are typically organized?”
Tara Kizer: So I can answer this, since I’ve been on a lot of database teams…
Erik Darling: I’ve always been so lonely.
Tara Kizer: Lonely, yeah; not me. We had ten people at my last job and five at one of my jobs. So all of these jobs I’ve been at, it’s been divided by application. So if you’ve got an application that requires a lot of a DBA’s time, performance tuning, whatever it is, hand-holding with the developers, then assign that to one person and then maybe they get some smaller applications. So you split up the applications that are using the SQL Servers to your five DBAs, and just making sure that they don’t have too many applications to support.
When I say applications, I’m talking about the applications the user is using and then the DBA supports that SQL Server, you know, that database, and provides help to the developers and maybe does all the application releases for maintenance windows and things like that. It’s always been either application based or, you know, when we would have a shared server, there’d maybe be a shared server assigned to a specific DBA. But as far as responsibilities, no; that’s not how it was split up. And I’m talking about almost 20 years of experience on database teams.
Erik Darling: yeah, another common way to do it, so I hear, is you actually split things up by responsibility. So like, you might have your infrastructure DBA who is responsible for like the backups and the patching and making sure that HA and DR is working well. Then you might have a more development-oriented person who’s like monitoring the server performance issues, tuning queries, tuning indexes. And then the actual developer DBA, who’s like designing tables and writing initial queries and maybe tinkering with .NET type stuff. So I don’t know – two approachable ways to split things up. I think that which one you choose is going to depend on the strengths of your team and kind of how they want to take things on.
If someone on your team is like, “Dude, I really want to do performance troubleshooting. This is what I’m into.” Don’t be like, “Well congratulations, you inherited this AG. Go patch it.” Then you’re going to only have four people on the DBA team pretty soon. Don’t do that. Don’t pigeonhole people. I think some of it too might depend on how granular you want them to get with things. If you want them doing performance tuning, clearly you can only set them on so many servers before they lose track of exactly what server does what, what application does what and all that stuff. So a good monitoring tool would be pretty necessary for somebody following that route.
Richie Rump: From a development perspective…
Erik Darling: A what?
Richie Rump: Development, have you heard of development? Exactly – I kind of lean towards the direction where we have a DBA per app. And the reasoning is, if I have to explain a problem to one person, and then something else happens a few days later or a week down the road, now I’ve got somebody else coming in, now I’ve got to re-explain all that stuff. I like to have my DBAs involved in my stand-ups and have them involved in all the decisions that are being made about the application which will affect the database. S so if we’re able to get that one person, or even two people if we’re super lucky, into the project team and into the application, I think things will run a lot smoother. Opposed to, hey, I’ve got five people but I’m dealing with Bob today, Mary another day and Tim another. Now we’ve got three people doing the same-ish type of thing and everybody needs to be brought up to speed every time.
Tara Kizer: Yeah, we had it like that at my last job. We were assigned to agile teams, and then production DBA work, which was not application related at all – so not talking about performance tuning in production, just production DBA stuff; looking at alerts, disk space, things like that. That got split up by whoever was on call that week. That person had to take care of patching if the maintenance window fell on their week or any kind of disk space lurch; things like that. Because usually, the production DBA work isn’t as involved as the other stuff, the application type stuff, where you’re attending meetings to, you know, get them to learn the application and know what the project timeline is and where you fit in that. But, you know, your mileage may vary, of course.
Erik Darling: Alright, there’s a question here form Colin, which is pretty interesting. “My DBCC checks take a long time…” First, congratulations on taking DBCC checks. That’s a good step you’re doing; you’re doing a remarkable thing…
Richie Rump: The remarkable 1%.
Tara Kizer: You are in the minority of the clients.
Erik Darling: You are in a very elite position since you’re doing that. But they take a long time and Colin is getting these messages in the error log saying that I/O requests are taking longer than 15 seconds to complete. And sometimes, they’re numbering in the tens to hundreds of thousands. What could possibly be done to correct this issue; anyone?
Tara Kizer: Are you going to answer it?
Erik Darling: Okay, usually when I see that, CHECKDB does a fair amount in tempdb. And usually what we find when people are running into that is that either they have bad disks; there are slow disks behind tempdb. Or, if they’re on a SAN, then they have really crappy pathing from the server to the SAN. It’s not actually the I/O requests. It’s not actually hitting the disk that’s taking 15 seconds; it’s queuing up all these I/O requests and sending them across this one connection between the server and the SAN. And that’s what’s slowing things down tremendously.
You could try running CHECKDB with physical only, because that skips over a whole bunch of checks. Obviously, it’s a less thorough check, but it might get you a reduction in those I/O requests. Other than that, you would have to talk to your storage team, figure out – if it’s a VM, even if it’s a bare metal server – what kind of path you have between the server and the SAN and if it’s all local storage, and figure out what kind of god-awful garbage bin sale your boss picked those [jives] up from, because they are not doing you well. That’s about all I have to say on that.
Tara Kizer: And sometimes when you see the 15-second warnings in the error log, it’s going to only be at night. Not always, but it’s only at night – I mean, you’re having some issues with I/O, but is it impacting your users? So if I/O is fine during the day but it’s bad when CHECKDB is running – I don’t know, a lot of times I’d ignore that. It just depends on your workload and if you’re 24/7; like truly 24/7.
Erik Darling: Or if, you know, something like that is causing your checks to run so much longer that they end up in the production day. That’s something I might look at as well. But generally, I agree with Tara. If it’s not really impacting users then it’s not really something I’d spend a lot of time messing with.
Erik Darling: Alright, there is a question from Ron, which I think we’re all dreading. “Are there any tools to move a local database to Amazon RDS?”
Tara Kizer: Gosh, I don’t remember Amazon RDS. Amazon would provide the tools, I would think. I know Azure does. I don’t know.
Richie Rump: Ouch…
Erik Darling: Yeah, right; low blow there, Tara.
Tara Kizer: I’m just saying, out of my knowledge. I know that Azure has tools, I just don’t know what anybody else offers.
Erik Darling: I know that there’s a migration tool. I don’t know of anyone who’s used it or done anything successful with it. Typically, when people are moving stuff up to RDS, databases are either really small, so it’s not really that big of a concern, or they’re like hiring one of those Snowball mechanisms to come so they can put data on it and then ship those drives off to Amazon to add.
Tara Kizer: Also, if you don’t have a DBA, it can make sense to use the non-VM solutions in the cloud.
Erik Darling: The reference I found – use the Azure tool, but the link is dead – Ron, I’m sorry, I’ve got nothing to do on that. If it’s that big of a database, call Amazon about a Snowball. If it’s not that big, I don’t know, do bulk export and import or something. I don’t know. I think you can probably use S3 buckets to stage some files if you wanted to load them in.
Richie Rump: Yes, have you ever put Stack Overflow Database on an RDS?
Erik Darling: An RDS? No. We have EC2 instances.
Richie Rump: Okay, so you’ve got the VMs.
Erik Darling: That’s what the – our AWS lab is all EC2. I am, admittedly, not up to date on migrations to either RDS…
Tara Kizer: But EC2 is just a backup and restore because it’s a VM.
Erik Darling: Yeah, exactly. [crosstalk]
Richie Rump: That sounds like an interesting thing to do would be, hey, how will we get the Stack Overflow database into AWS?
Tara Kizer: Isn’t it too big? I don’t know [crosstalk]
Erik Darling: No, size-wise it’s fine. The only limitation with RDS would be like the number of databases, 30. Usually, I would kick against all those pricks if someone was trying to get me to migrate a database that size up to RDS. I’m like, “No, I think I’m going to do this myself.”
Richie Rump: Right. And I haven’t done any RDS migrations. Everything I’ve messed with RDS has been greenfield; so sorry.
Erik Darling: Alright, “Are there basic guidelines to help you choose between using CTEs and using temp tables?” Anyone? Buelers? Buelers?
Tara Kizer: Do you actually have a blog article on this topic?
Erik Darling: I don’t know, maybe?
Tara Kizer: Is there something, like, you’ve written?
Erik Darling: Yeah, something stupid. So what I would say…
Richie Rump: I bet you he’s published more than he’s actually put in the draft queue.
Tara Kizer: Definitely.
Erik Darling: So one thing I would say is, if you need to reference a common table expression more than once – if you need to join out to it more than once, that’s when I would probably want to use a temp table. And I’ll give you a quick example of why…
Tara Kizer: Typing again…
Erik Darling: Oh shush, you. So, SELECT 1 as X. So I have this CTE and I can select from the CTE. And this works all quite well. Turn on query plans so this makes a little more sense. So if I run this with query plans turned on, I get this thing where I do a constant scan of that one value once. If I join as C1, and I join CTE as C2 on C1 X equals C2.X, I will get a slightly different query plan. Oh, no I won’t. Never mind. We’ll do this differently.
SELECT ID from – this will help it sink in a little better because it’s actual tables. Users and – why is that – okay, no it’s not…
Richie Rump: X…
Erik Darling: Your mom. We do this – that’s going to be annoying. Let’s just do the TOP 1. Let’s not waste anyone’s time. So I do the SELECT TOP 1 here, I will get that back, and I will have two scans of the users table. If I add another join in and I join CTE as C3 on C3.ID – wow, this is just bad. Someone should give me typing lessons. I’m like worse than Richie.
Richie Rump: Someone needs Mavis Beacon, like quick.
Erik Darling: I do, badly. If Mavis Beacon had a SQL course I’d be much better off. All of a sudden, my execution plan will have three scans of the users table. So every time I need to go and touch that common table expression again, I’m going to have to re-execute the syntax in there. So if I have to touch that CTE many times, then that’s when I’m going to want to look at using a temp table instead, because a temp table is persisted. When you persist data out, you don’t have to re-execute that syntax over and over again, then you can even index the temp table in a way that helps your query, rather than relying on base table indexes.
Richie Rump: I think when using your temp tables, the code’s a lot more readable than using CTEs, because a lot of people don’t use CTEs very often. I mean, even you stumbled on the syntax just a little bit. But you did pretty good there. That was pretty impressive.
Erik Darling: Well I wasn’t stumbling on the syntax; I just have dumb fat fingers that I can’t type with. There’s a disconnect about right here where things just get lost in translation.
Richie Rump: I could rip out a create table statement, but CTEs I just don’t use as frequently as I would a select or select into, or an insert into. So I think there’s something to be said about, hey, we’re only doing this once, it’s a temp table. Just go ahead and get it out because the junior DBA will be able to read it, or even probably the junior dev will be able to read it, as opposed to CTE and then explaining all the intricate reasons why you would use a CTE over a temp table.
Erik Darling: Yep, including that, you know, if tempdb is a disaster for you, then adding to the tempdb disaster may not be the best choice.
Erik Darling: We’ll finish up on this question from Hanan. He asks, “My current client…” Are we answering questions for another consultant? Is that what’s going on here? “My current client uses VM backups through VMware and quiesces applications. Would that be suitable for VMs running SQL Server? Thanks.”
Tara Kizer: Yep, as long it’s quiescing, you know, freezing the I/O, taking the snapshot and resuming I/O, that is a valid backup. It’s a good backup, especially when you end up having really large databases, because you have backups – [inaudible] backups can take hours. So if you get like 20TB of VM snapshots [crosstalk]… So that is how you back up large databases.
Erik Darling: Yeah, I did write a blog post pretty recently about some of the problems that you can run into with VSS snaps; namely around the amount of time that you are waiting to freeze and thaw I/O for them. So we’ve run into this a whole bunch of times with clients where they’ve taken VSS snaps, which I’m not like fundamentally opposed to. But some part of the process is just taking a long time, and we’ll see like 30, 60, 90-second pauses between the I/O being frozen and the I/O being thawed. And that whole time, no one can really do anything in the database. The database is also kind of frozen, so you know, if you’re doing that once at night, who cares? If you’re doing that every 15 minutes or ever four hours during the day, you might want to reconsider that, or at least keep an eye on the freeze and the thaws. Because if you’re freezing data for that long during the day, all of a sudden you’re adding the amount of time that data is frozen onto every query; that is no good.
Tara Kizer: And it also creates a backlog of work for SQL Server too, because everything’s pending until the I/O has resumed. So I had a client that was running them hourly and it would take like half an hour – it was happening – I figure the snapshot was fairly quick; not great but fairly quick. And it would take like a half an hour before the backlog would finally complete.
Erik Darling: Yeah, all the writes would start screaming to get through and, I don’t know, plug each other up or keep blocking or something. Anyway, we have miraculously somehow filled another entire session with questions and answers. And I’m amazed because I had my doubts at first. But it’s 12:45 and that means it’s time to go. Thank you all for joining us, asking questions, keeping us honest; we will see you next week.
Tara Kizer: Bye.