[Video] Office Hours 2018/02/28 (With Transcriptions)

Last Updated 6 years ago

This week, Erik, Tara, and Richie discuss cloning databases without 3rd party software, tips for upgrading from 2005 to 2017, archiving databases, partitioning databases, version control, SQL Server settings, SSRS, moving database from stand-alone server to an AG cluster, limiting memory use at the query level, and mirroring.

Here’s the video on YouTube:

You can register to attend next week’s Office Hours, or subscribe to our podcast to listen on the go.

Enjoy the Podcast?

Don’t miss an episode, subscribe via iTunes, Stitcher or RSS.
Leave us a review in iTunes

Office Hours – 2-28-18

How do I clone production with less data?

Tara Kizer: Alright, we’ll start off with [Narav’s] question. He has a production database that’s 500GB. He wants to create a clone database with limited data size for developer and testers in local environment. He wants to know how he can achieve this. “No third-party software allowed in my company.”

Erik Darling: Oh boy, no third-party software. It’s almost like they want you to recreate the wheel every day; what an awful place to work. You should get a new job. But on your way to get a new job, you should go look at this blog post by Tom LaRock.

He wrote this – it shows you how to go and script out a stats only copy of your database so that you don’t have to use a third-party tool. If you’re on a newer version of SQL Server like 2014 – or I think even 2012 with one of the service packs – you can use DBCC CLONEDATABASE to get a stats only clone of your database for developers to use. But, at least when I checked it. There were still some security issues around that – like if you’re okay with your developers seeing the contents of the histogram, there can still be personally identifiable information in there.

Richie Rump: Alright, so I think the key word there is developer and tester – and as a developer and tester, I’m probably going to be running some stored procedures and looking at data and doing that kind of stuff; so this doesn’t really help you that much. So my big thing here is that one, you should have already had your database scripted out and at least all your prime data scripted out as well. And that should be checked in into some sort of source control. So if you’re not doing that, you’re not even at, kind of, base-level yet.

You’re trying to get way up high on the database pyramid of needs and you need to get to the bottom, so get that first. And then when you get that, then you can start eyeing out the data that you want to move over, and that probably should be scripted out as well because, at some point, you’ll want to refresh all that data. Moving data over from production to development is usually a bad idea because production doesn’t really represent, a lot of times, that testing that you’re going to want to do. Testing, you want to do a lot of oddball things; a lot of strange things. And production data is usually pretty uniform. It’s usually pretty good because everything’s going through the application and everything kind of works the way it should. In test…

Erik Darling: You don’t see a lot of applications, do you?

Richie Rump: Listen, I see a lot of stuff, but you don’t really see a lot of that – you see this stuff in outliers in production, but the great majority of it is pretty good data because it’s gone through the process of cleaning and whatnot. So you’re on a long trek, and if you haven’t started your application doing it this manner, it’s pretty painful. But once you get there, then you can actually do the automated deployments and automated builds and automated database testing and things of that nature, and you’ll start getting really high up on that database pyramid; hierarchy of needs or whatever that you need to do. But it’s hard work, it’s all scripting out stuff. And that’s really the best thing we have right now, especially if you’re not using third-party tools.

Erik Darling: If you just need to, like, move chunks of data over, if you’re too lazy to script it out, you could use SSIS on another server and just move some new stuff over like every night so it was fresh for dev. You’re still going to run into the stuff he talks about, where data might be a little too sanitary and a little too easy, but you can always add stuff on top of that as well if you need to test some edge cases.

Richie Rump: Yeah, and if you need to automate any of those tests, those are going to continue to be changing as well and that’s not going to help at all. So yeah, there’s no easy answer to this. We’ve been trying to do this now for well over 15-20 years, and we still don’t have a good answer for it. There’s some third-party tools that kind of help, but there’s still no panacea and great answer for all this.

I feel your pain. I just went through this whole thing this last month of automating database testing and getting everything into scripts and doing all that stuff. So yeah, it’s just hard.

Erik Darling: Alright, let’s get that next question up.

Why don’t I see SOS_SCHEDULER_YIELD in sp_WhoIsActive?

Tara Kizer: Alright, Pablo asks, “Monitoring tool says that the principal wait on a server is SOS_SCHEDULER_YIELD with greater than 70% waits, but when I run WhoIsActive during the day I see so few activity or none. Does it mean that there’s a problem or just few activity?”

Erik Darling: Right, so just because that makes up 70% of what you’re waiting on, doesn’t mean you’re waiting on it all that much. I don’t know who your monitoring tool is; you didn’t mention the name of the company that does the monitoring in your question. But a lot of monitoring tools are really bad at showing you what your server is actually waiting on and if it’s waiting on it a lot. So really, monitoring tool dependent, try to dig into when the server is busy and see if you are actually spending a significant amount of time waiting on anything. Otherwise, you’re just kind of sitting there staring at a bored server.

SOS_SCHEDULER_YIELD is one of those waits where it’s like, that’s cooperative scheduling. That’s SQL in the OS saying queries are going to run for a little bit, they’re going to step off, they’re going to go back on the scheduler. There’s really not a whole heck of a lot you can do to fix that, aside from faster processors or having queries run less often or other goofy things.

Tara Kizer: Yeah, and also run sp_BlitzFirst during the day when you think that you have the highest load. It might be at night that you have the greater than 70%; it’s hard to say. But run sp_BlitzFirst @ExpertMode equals one, and then @Seconds, say five minutes, so 300. Then you might see the SOS_SCHEDULER_YIELD waits there. Running WhoIsActive might not capture it because that’s just a moment in time, whereas you need to be able to have something run and then run again and then do a diff between the two result sets; like sp_BlitzFirst can do.

Erik Darling: Yeah, it’s just when I see that as being 70% of the waits on the server, I don’t think that server’s doing a whole heck of a lot else.

What should I encrypt, and how?

Tara Kizer: Alright, so Rick – oh, he said he wanted to be anonymous, but just because he’s currently on SQL Server 2005. Alright, I didn’t say the whole name, so that’s alright. They’re finally upgrading to SQL 2017 from 2005. Some people are suggesting applying encryption and compression and who knows what else, “Any advice on what to stay away from and what to slowly apply?” Are you testing this stuff? I mean, why are you implementing – I like compression, but I’m not going to be implementing encryption unless I need it, because that brings up all sorts of problems.

Erik Darling: Yeah, especially around TDE – what kind of encryption? There’s like five different kinds of encryption now. Alright, we’ll come back to that one when Rick gives us some more – I mean not Rick – gives us some more information.

How can I archive data fast?

Tara Kizer: Alright, [Narav] has another question, “What is the best method to archiving databases because archiving database scripts take long time when we remove the data from the database?”

Erik Darling: Are you talking about archiving an entire database or just part of the data? I don’t know…

Tara Kizer: I imagine historical data.

Erik Darling: Okay, I like partitioning for that; anyone else?

Tara Kizer: Yeah, definitely…

Richie Rump: Yep.

Tara Kizer: Table partitioning. It’s been around a long time and you now get it with SQL 2016 Service Pack 1, right, even in Standard Edition. It’s been an Enterprise Edition feature since before that.

Erik Darling: Yeah, so like, definitions of archiving are always funny because I’m never quite sure if it’s like, we’re going to get rid of this whole database, or we’re just going to take this chunk of data and move it to a different database, or if we’re just going to delete this data. S-o partitioning, and even partition views, work out fairly well for a lot of that stuff. But if you’re just wholesale deleting data or putting it into a different database, then obviously some sort of data-moving mechanism is the best way to do it. Richie probably knows a few of those…

Richie Rump: Yeah, man. Partitioning, as much as I love it, you’ve got issues; especially when applying queries or indexes and things like that. It can definitely be an issue when you go ahead – are they coming after you? Is that me or is that you? I don’t know. But it can definitely be an issue when you’re doing some query tuning and what not and you’re trying to apply an index across the entire table, and then you do your partition swap and you can’t do it because the index is going across and they’re not aligned with your partitions. So that’s a problem with the partition swapping.

As far as why your archiving is slow, I would take a look at your indexes and try to work out why those queries are particularly slow. Is there something that you’re missing? Do some perf tuning on your archiving queries and see what’s going on there.

Can you combine a failover clustered instance and an AG?

Tara Kizer: Alright, New York City sounds like London, by the way. There were always sirens. It was never police; it was always ambulances. Alright, Eric asks, “Can you combine a failover cluster instance and an Availability Group so you have a failover instance with an Availability Group for reporting?” That answer is yes, but I’ve never done it so I don’t know how complicated it is. Do you have any info, Erik, on it?

Erik Darling: Yeah, me and all my Availability Group experience…

Tara Kizer: I’m the only one at the company that has done Availability Groups in production, and I’ve never combined the two but I’m fairly confident that this is possible.

Erik Darling: No, you can totally do it. We’ve had clients in the past who have combined them, but as far as, you know, HADR typography goes, it’s pretty complicated; especially if you’re going to stick a reporting layer on top of it.

Tara Kizer: As a matter of fact, Brent has done it on past clients that I know of – are public names. I’m not going to mention them here, just in case there’s any kind of issue. But I know, when we do the HADR stuff with clients and we talk about Availability Groups. In there, it shows some pretty complicated setups that do combine the two technologies.

How can I integrate version control with SSIS?

Tara Kizer: Alright, Anon asks – and I think that’s anonymous – “Have any of you used version control with SQL? If so, how easy do you think it is to integrate when there is no current version control? We have a lot of SSIS packages and changes.

Erik Darling: Richie, take it away…

Richie Rump: What kind of version control are we talking about? Yeah, I’ve used version control with pretty much every version of SQL Server that I’ve dealt with. Even just starting from just scripts, we go and check into VSS – visual source safe for those old guys – and then all the way up through TFS and now pretty much use GitHub.

Tara Kizer: Have you used Subversion though?

Richie Rump: I’ve used Subversion. I didn’t like it at all.

Tara Kizer: Yeah, we’ve used Subversion a few jobs ago and I wasn’t a fan.

Richie Rump: Yeah, there was a really old version when I first joined up; DVCS or something like that. That was a nightmare. But yeah, so I’ve used a lot of different source control kind of stuff. Yes, you can get it into SSIS. I think with SSIS, as I recall with TFS, there was an issue where only one person can work on it, really, at once because of – the merging stuff wasn’t as easy because it’s all one big XML file [crosstalk]. And merging it just wasn’t easy at all. It’s not like working with c# text files and doing all that stuff. And merging is fairly simple. Merging everything together when you’ve got one version here form one person, another version here and you’ve got to bring it together. That was pretty difficult, from what I recall, with SSIS.

So we had a rule that only one person can edit SSIS at one time. I’m not quite sure with the newer Git type stuff because I’ve never used Git with SSIS, if that’s still the case. But I’m assuming so because the file format still hasn’t changed and whatnot. So, I say go ahead and throw it out there. And it’s real easy to get things in, it’s just a matter of getting your team and your workflow optimized as far as getting everybody up to speed and things like that. There’s no downside to version control.

Erik Darling: Hopefully it’s not still a single-threaded process then.

Richie Rump: Oh god, TFS was, when it was first released, was the worst. It was just so bad.

How should I set Max Degree of Parallelism?

Tara Kizer: Alright, Tom asks – he’s currently on SQL 2016 Enterprise; 40 cores with 1TB of memory. Yay…

Erik Darling: jealous, right…

Richie Rump: I mean, who isn’t really, right.

Tara Kizer: Jealous…

Erik Darling: I’ll gladly take that off your hands [crosstalk].

Tara Kizer: “What are the best settings for max degree of parallelism?” He’s currently set to eight. And then cost threshold for parallelism, currently set to [five].

Erik Darling: Please, you spent all that money on a server and no money on training. What happened?

Tara Kizer: I like his settings, personally, but…

Erik Darling: Yeah, I’d stick with those. I wouldn’t change those one bit, unless I observed some sort of issue.

Tara Kizer: Yeah, and if I was going to change anything where I wanted to affect MAXDOP or cost threshold for parallelism, I would do it at the query level. I would keep your server settings as they are. At SQLBits conference last week, and somebody’s session – Erin Stellato’s query store session, the topic of cost threshold for parallelism came up and she asked the audience who sets theirs to 50, or some number around there; not five. You shouldn’t have it set to five. And half the people raised their hand. Then she asked, who sets it to a different number based upon what you see as the workload, you know, examining the plan cache, and half the people raised their hand; the other half.

Which I’m, in my head, I’m calling BS because there’s no way that many people are doing this because I don’t know that you need to do that kind of work. I know that there’s a blog article that was in the Slack, in our company chat room yesterday about setting cost threshold for parallelism to a value based upon what you see is the workload. But I’m a fan of setting it to 50 and then adjusting specific queries with the option query – whatever it is – query hint or query option. Whatever it is, you can change max degree of parallelism at the query level. So I don’t think you should change server settings because there are generally good.

Richie Rump: So you’re okay with MAXDOP at eight?

Tara Kizer: For a 40 core server.

Richie Rump: For a 40 core?

Tara Kizer: Yeah.

Erik Darling: Yeah, I’m fine with that.

Tara Kizer: I haven’t gone higher than that.

How can I make SSRS go faster?

Tara Kizer: Alright, Steve asks – he’s moving a batch of reports from 2014 to 2016. They were originally developed for 2008. He has serious performance problems with one report. What’s the best resource of SSRS performance issues? I wouldn’t even think that the issues in reporting services. I would be suspicious of the cardinality estimator since they were developed in 2008 and you’re going from 2014 to 2016. Oh, but 2014 to 2016 is going to have the same cardinality estimator, if you upgraded your compatibility levels. So I would be looking into compatibility level here if it’s suddenly slow. But from 2014 to 2016 – what is the database engine version? That’s what I need to know. Is it 2014 SSRS, database engines lower? I’d be suspicious of the cardinality estimator.

Richie Rump: Yeah, or possibly you’ve just got a bad plan in there.

Tara Kizer: Yeah. I would not be suspicious of reporting services being the culprit. I would look into performance tuning the actual query.

Erik Darling: SSRS is always a bad plan. [Git tag, low].

What’s the best way to move a 1TB database…

Tara Kizer: Alright, Dorian asks, “What is the best way to move a 1TB database form a standalone server to an Availability Group cluster to minimize downtime?”

Erik Darling: You ‘neek up on it… Oh wait, that’s not it.

Tara Kizer: What?

Richie Rump: Postgres.

Erik Darling: Ooh, SAN snapshots.

Tara Kizer: [crosstalk] restores and then apply transaction logs. So do a full backup and a restore differential as you get closer, and then apply transaction logs. And you can make that switcheroo within seconds if you’re fully scripted. So make sure you’re applying transaction logs. So you can use database mirroring or log shipping to do this. You don’t have to do it manually. But yes, you want to do some kind of method which allows you to do restoring of transaction logs. That way, you’re mostly up-to-date once the maintenance window hits, then you’ve just got to apply your final transaction log.

Erik Darling: Yeah, for stuff like that, I’ve done either log shipping or mirroring. Either one is fine. I liked mirroring, just because it was a little bit easier to manage and I could keep it in asynchronous mode until I needed to switch over, then flip it to sync and boom. I’m done. I moved about, I don’t know, 15 or so terabytes in one night with that once, and that worked out well for me. So I would do that, but what do I know?

Tara Kizer: No, it’s a good solution. It’s really easy to switch another synchronous move, failover, and boom, done; two commands. Getting from async to sync can take a little bit of time, depending on how busy your system is.

Erik Darling: Yeah, but if you wait until you’re in a lull, you’ll generally be okay. Or if you’re on a SAN and you can take SAN snapshots or you can do something else that is, you know, a pretty quick snapshot and remount of the data. That’s sometimes pretty good too, It really depends on how busy your system is and what your maintenance windows look like. I would assume that you’d get some kind of maintenance window to flip over to a brand new set of hardware like that, but again, what do I know?

It’s a crazy world out there. I love it when people ask questions on Stack Exchange and they’re like, “I have this problem with a thing that doesn’t work” And you’re like, “Here’s a pretty easy solution.” And they’re like, “I’m not allowed to use temp tables.” Like, where do you live? What’s going on in your life? Are you on Oracle? I don’t know.

How can I tell what tables someone is accessing?

Tara Kizer: Alright, Ronnie asks, “Is there a way, in SQL 2012 without running profiler in the background, to identify tables that a specific login has accessed?”

Erik Darling: Audit.

Tara Kizer: Tell us more. One word isn’t sufficient.

Erik Darling: I don’t know; that’s it. SQL Server audit, you can set it to, I think, like a single login or a single set of tables or all logins or all tables. You can just figure out who’s run a select or insert or update or delete on them. I’m not promising that it’s pain-free and everything’s going to work quickly and nicely and, you know, you’re going to make a lot of friends by turning that on, but you could do it. I think that would be kind of your best shot.

How are jobs handled in an Availability Group?

Tara Kizer: Alright, Tom asks, “With Always On, do agent jobs exist on all nodes in case the primary fails? Do they all run simultaneously?”

Erik Darling: 40 cores, 1TB, an AG…

Tara Kizer: Alright, so the first part is, Always On – you’re talking about Availability Groups, but that also covered failover cluster instances. So failover cluster instance is part of Always On, and yes, the agent jobs exist in the instance and that fails over. But you’re really asking about availability groups here. Do the agent jobs exist on all nodes? That answer is no, you are required to maintain those other replicas. Now, what we did at my last job was pretty nifty. SO we didn’t have to pay attention to all these other replicas. We had another server, which had to be licensed, of course, that was the job scheduler.

And so we used the Availability Group listener to connect to wherever the primary was. It didn’t matter. It could be in San Diego, it could be on a disaster recover site. We used Availability Group listener and that follows wherever the primary goes. So that was a nifty way of getting away from having to keep all the jobs up-to-date on all the other replicas; so having another server that’s just responsible for running jobs for Availability Group databases.

Erik Darling: I want to say that Brent has a post, that I’ll throw in the chat in a minute, that’s called Using SQL Agent as an App Server. Basically, it talks about a similar concept where you just have that one SQL Server that sits out there. All it does is run SQL agent and you can just point those jobs anywhere you want.

Tara Kizer: And then the thing to note, if you aren’t going to use this other server, like we mentioned, because you don’t have the licenses for this other server and you are going to have other jobs all in sync across all replicas – in order to ensure that they run on failover, each job needs to have an if statement. Am I the primary replica? And if so, then I run the job step. If not, I just bail out. So they all have this exact same job schedule. They’re all enabled. They all have the same job step, but you just need to do a quick check – am I the primary replica? And if not, bailout.

So yes, they all will run simultaneously, but all the other replicas, all the secondaries won’t actually do any work.

Erik Darling: Yeah, and if you want to see a pretty good example of jobs that take those kinds of precautionary steps before they run, go look at the source code on Ola Hallengren’s scripts, because all those will do certain checks to make sure that the databases either primary and not read-only and some other stuff that makes running maintenance tasks difficult.

Can I limit memory use by a query?

Tara Kizer: Alright, Mahesh asks, “Is there a way to limit the memory use by query, other than resource governor? When I do sp_WhoIsActive and I see a couple of queries use a huge chunk of memory, like 23GB, and causes resource semaphore waits for all other queries.”

Erik Darling: Ooh wee, yes. Well, if you’re on SQL Server 2012 or up – well, 2012 SP3 or up – you can use the max_grant_percent hint and you can cap, at the query level, what percentage of memory a query is allowed to use. By default, It can ask for 25% max server memory, so you can use max_grant_percent to bring it down lower. If you are on a version of SQL Server prior to 2012 or you’re uncomfortable with that hint, for some reason, you would just have to get in and do some necessary query index tuning to make that query ask to less of a memory grant.

Tara Kizer: And if you can’t modify the queries, you can add a plan guide on top of it to apply what he just said. That gets a little tricky.

Erik Darling: Or you could just stab him?

Tara Kizer: Or just use resource governor. I know the question says other than resource governor, but that is one of the reasons to use resource governor…

Richie Rump: That’s what it’s for.

Erik Darling: He might not be on enterprise edition.

Tara Kizer: Oh, that’s right; got you.

When I upgrade from a standalone to an AG…

Tara Kizer: Nestor asks, “Do you switch roles for the final move or do you just remove log shipping or mirroring?” So he’s talking about upgrading from a standalone Availability Group. The thing is, once you failover or you restore to the other server, you can’t go back. So yeah, you just drop them. But they are gone, there’s no rethinking in log shipping in place anymore.

Erik Darling: But I’ll usually leave that other server online for a bit, just in case I do need to say, like, abandoning this upgrade, whatever data is there, we’ll try to manage loss on and we’re going to go back to the old one…

Tara Kizer: Yeah, and the upgrades I’ve done, the abandoning our upgraded server occurs only in the maintenance window. So maybe, we’re four hours into our maintenance window. Maybe our maintenance window was supposed to be shorter, but we kept troubleshooting, troubleshooting, troubleshooting, and then we abandoned it and went to the old server. So there’s no data-loss because we kept the system down while we did this. And if we had to abandon it the next day, that just never was part of our plan. We would always just keep troubleshooting and fix the issue; whatever it took.

Erik Darling: Smoke tests are a beautiful thing.

Can I mirror from a cluster to something else?

Tara Kizer: Alright, Thomas asks, “I have some databases in a cluster. I’m moving them to an Availability Group. Can I set up mirroring from the cluster to one of the instances, failover when I’m ready to migrate and then add them to the availability group?” Sure.

Erik Darling: Sounds right to me.

Tara Kizer: Yes, definitely. He wasn’t sure about taking databases in the cluster and then mirroring them. I’ve done tons of mirroring with failover cluster instances and Availability Groups. They’re all separate technologies – just think, if you could do mirroring and Availability Group; probably, but that wasn’t the question anyway.

In my 2-node cluster, how do transactions fail over?

Tara Kizer: Alright, some of these I’m skipping because they’re too long. Alright, so J.H. asks, “In a two node cluster, server one is active, server two is passive. If server one gets rebooted and server two then becomes active, do all connection transactions get persisted via clustered MSDTC or other clustered process. No connections, transactions, get hard dropped…” You get hard dropped because that instance is coming down. It is severed. So you need to have mechanism in your application to retry queries once the instance comes up on the other node.

Erik Darling: Yeah, that will definitely sever whatever connections you have. If the shutdown is kind of graceful then some of them might do okay. But in general, if you have a hard stop on that server, whatever was in-flight is not going to survive.

Why doesn’t Richie like Subversion?

Tara Kizer: Alright, and I did see a question. I don’t remember who asked it. But someone wanted to know from Richie why he didn’t like Subversion.

Richie Rump: Personal preference; just didn’t like it. So I’ve…

Erik Darling: So what do you like?

Richie Rump: I was just about to get there. Thank you very much [crosstalk]. I’m a big Git guy these days; pretty much Git…

Tara Kizer: How can you like Git? Oh my goodness, I can’t stand it. It’s too complicated.

Erik Darling: What do you like, Tara?

Tara Kizer: Well, if I had to pick, I’m going to pick nothing. I’d rather not have source control. I’m kidding. I was okay with TFS at my last job. I thought that that was pretty fancy. Learning curve, I thought, was much lower than GitHub. I cannot – if I have to do anything in the Blitz stuff, it is going to be a very painful day. I’m going to need some Motrin; I may need a cocktail. And I have to go through the article again on the steps I have to do because it is just too many steps.

Richie Rump: I’m in it all day long, so it’s…

Tara Kizer: You guys use it more often, yeah.

Richie Rump: It’s easy for us. The other thing is, if Git is not an option or is something that you guys don’t want to go to, I like Mercurial as well. They’re very close to one another, but definitely, I would go mercurial or Git, I’m fine with these days. TFS, it’s okay. It doesn’t have that Git-iness where you can work everything locally. Everything’s kind of shared up top. And the server – there’s just a lot of power when I can work on something local, pull it down, merge it myself and then push it back up and rebase it; whatever I need to do, and everything just kind of works. The branching is so much easier than TFS. For the branching alone, Git wins for me.

Tara Kizer: Got you. Alright, that’s it. We’re all done for the day. We will see you guys next week. Bye.

Erik Darling: Time to go keep drinking.

Of Windowing Functions And Where Clauses

How to Get a Random Row from a Large Table

4 Comments. Leave new

Corbett
March 6, 2018 1:17 pm

You say… after you’ve removed the bad plan from the plan cache, but how do you remove a bad plan?

Reply
- Brent Ozar
  March 6, 2018 2:10 pm
  
  Corbett – the easy way is to run sp_BlitzCache @ExpertMode = 1, and then you’ll see a DBCC FREEPROCCACHE command for that particular plan over on the far right side of the results.
  
  Reply
Dave Shaw
March 7, 2018 3:46 am

TFS Version Control is fine for SQL Database Scripts. We have 1,000’s of scripts worked on by loads of developers.
Multiple checkout isn’t an issue and Merging has been a lot better for many years.

Reply
Edwin Sarmiento
March 10, 2018 4:03 pm

Allow me to add on to some of the HA/DR-related questions:

1) Can you combine a failover clustered instance and an AG?

Yes, you can. I’ve done my fair share of deploying FCI + AGs in the past due to my naive understanding about SQL Server licensing 🙂 I actually use a FCI + AG configuration for one of the troubleshooting exercises in my SQL Server Always On Availability Groups: The Senior DBA’s Field Guide training class.

Configuration is a bit different since the secondary replica can be a standalone instance. You don’t want to “Add all eligible storage to the cluster” when adding the node running the standalone instance. Also, the Possible Owners list for the FCI does not include the node running the standalone instance.

Be warned that you lose the ability to do automatic failover on the AG-level when you introduce an FCI as a replica. You still maintain automatic failover on the FCI-level.

2) How are jobs handled in an Availability Group?

One thing I don’t see being mentioned in conversations about SQL Server Agent jobs in AGs is the concept on change management process. Everyone wants to automate the process of creating the jobs on all of the replicas. I’m a big fan of automation and scripting. Heck, I was doing DevOps even before it was even called DevOps. But automation and DevOps should not replace proper change management process. Creating a SQL Server Agent job on the primary replica is considered a change and, therefore, has to be properly documented and tracked as part of change management.

3) When I upgrade from a standalone to an AG…

When you’re doing an upgrade (lower version to higher version) and converting a standalone instance to an AG, you have to properly plan for a rollback in case something goes horribly wrong. This applies to any upgrade project, including service packs and cumulative updates installation. What Tara and Erik said are spot on. Dropping both mirroring (you can’t have mirroring and AG at the same time anyway) and log shipping means you have your old server available in case the upgrade and migration process fails. I even go to the extreme of dropping the mirroring/log shipping and changing the databases on the old server into read-only mode before finally cutting over to the new server environment. This guarantees that no changes are done on the old server’s databases until we can decide to either move on with the upgrade or rollback due to a upgrade failure.

4) Can I mirror from a cluster to something else?

If the goal is to eventually configure Availability Groups, you can simply add the server running the standalone instance as a node in the failover cluster, configure AG and make the standalone instance as a secondary replica. This assumes that all of the servers run the same version and edition of the OS (you cannot combine different versions and editions of the OS in a failover cluster unless you’re doing a cluster rolling upgrade which I don’t really like)

Reply