SQL Server 2016 adds Availability Groups in Standard Edition. While the licensing details can obviously change right up to release date, here’s what was announced at the Ignite conference:
- Limited to two nodes only (a primary and a secondary)
- Like mirroring, you can’t read from the secondary, nor take backups of it
- But like database mirroring, you can take snapshots of the secondary for a static reporting copy
- Each database is in its own Availability Group
In CTP2, this is referred to as a Basic Availability Group.
Yes, a BAG. I can’t make this stuff up people. But here’s what I can make up:
- If you use HammerDB for load testing of a BAG, that’s a BAG of Hammers
- If you have ten databases, each in its own BAG, that’s a Dime BAG
- It could be worse – it could be a Database Basic Availability Group, or a D-BAG
Enough laughing – let’s implement it. For this scenario, I’m already in production with one SQL Server 2016 machine, and I’d like to add a second one to give me some high availability or disaster recovery protection.
I’m setting up a simple AG with two standalone machines:
- SQL2016A – this will be my primary instance where my databases will normally run.
- SQL2016B – this will be my secondary, with less horsepower. Could be in the same data center, or in a different one. (To keep things simple for this explanation, I’m going to stick with the same data center and the same IP subnets.)
This checklist will focus on the Availability Groups portion, not the SQL Server installation, so go ahead and set up your second SQL Server using our SQL Server Setup Checklist. Make sure the second one uses the same user database data/log drive letters and paths as your primary server. If you use different paths, then whenever you add a data or log file to an existing database, it will succeed on the primary but then fail on the replica where the drive letters are different. (Only the user database paths have to match – not TempDB, system databases, binaries, etc.)
In theory, you might be able to do this in production without a major outage.
In practice, I wouldn’t bother trying – you’re probably behind on Windows Updates, and you need to apply these prerequisite Windows patches. Because of that, I’d schedule a separate (long) maintenance window to get up to speed on Windows patches, the latest SQL Server updates, and do the first few steps. Even when you rehearse it in a lab, it’s not unusual to run into several restarts-and-check-again-for-updates in the wild, or brief outages to
Let’s get started!
ADD THE CLUSTERING FEATURE TO BOTH WINDOWS SERVERS.
“But wait, Brent – I don’t want to build a cluster!” Unfortunately, SQL Server’s AlwaysOn Availability Groups is built atop Windows Failover Clustering Services, and you have to join your AG replicas into a single Windows cluster.
Fear not – this is nowhere near as scary as it sounds, and it can be done after-the-fact to an already-in-production SQL Server. (I’d recommend you set up a lab or staging environment first to get used to it, though.)
After installing those clustering prerequisites, open Server Manager and click Manage, Add Roles and Features, select your server, and on the Features list, check the Failover Clustering box:
You can move on to the next step after this feature has been installed on all of the servers that will be involved in the Availability Group.
Validate your cluster candidates.
Open Server Manager and click Tools, Failover Cluster Manager. (I know – you thought Failover Cluster Manager would be under Manage instead of Tools, but it’s these clever switcheroos that keep us all employed as IT professionals.) Click Validate Configuration on the right hand side to start the wizard:
Put in the names of all the servers that will be participating in the Availability Group, and hit Next. It’s going to ask if you want to run all tests – this is another one of those gotchas where, in theory, you could do this in production without an outage if you chose to run specific tests. I don’t recommend that – just run all tests. If SQL Server is running on any of these nodes, it’ll likely stop while this wizard runs, so do this during a maintenance window.
The validation wizard can take several minutes depending on the number of replicas, environment complexity, and phase of the moon.
Your cluster will probably fail, and that’s okay. This isn’t like school. Nobody has to know if you fail several times. (Chemistry was hard for all of us.) Even if you think you passed, you probably failed a portion of it. (The wizard, not chemistry.)
Click the View Report button, and Internet Explorer will launch because why wouldn’t we take every opportunity to open a browser on a production server:
You’ll be tempted to scroll down through it because that’s how most web pages work. This is not most web pages. Click on the first error – in my case, Network – and then click on the first warning it takes you to – in my case, Validate Network Communication – and then read the warnings in yellow:
This is where things get a little weird.
Some errors are completely okay to ignore. The most common two are:
- Single network interface warnings – if you’re doing network teaming or virtualization, you’re probably protecting your network through means that the clustering wizard can’t detect.
- No disks found for cluster validation – if you’re building an AlwaysOn configuration with no shared disk, like failover clustering on a UNC path or a pair of Availability Group replicas with local disk, this isn’t a problem.
But otherwise, read this report really carefully and fix the warnings before you go on to the next step. For example, if the two nodes have different Windows patch levels, get the missing updates installed first. (If you’re having difficulty getting some of the patches to apply, there’s your first sign that this isn’t a good candidate for a reliable AlwaysOn implementation – more like OftenOn, or OffAndOn.)
After you fix the issues, run the validation wizard again. This is important because the last validation report is saved in Failover Cluster Manager, and you always want that report to look good. When you call Microsoft (or a team of highly skilled consultants) for Availability Groups help, they’re going to start by looking at this validation report, and if it didn’t pass last time, that’s going to be a bad start.
Create the Cluster.
After a successful validation report, leave the “Create the cluster now” checkbox checked, and click Finish – your cluster will be created:
This checkbox is particularly diabolical – it will be checked by default as long as you didn’t have any heinously ugly showstopper errors, but even with mildly serious errors, it will still let you create the cluster. Make sure you fix the errors first.
You will be prompted for a cluster name – again, a little diabolical here, because this is not how you will address SQL Server. This name is how you will control the cluster itself:
Think of your cluster like a kitchen – a restaurant can have lots of cooks, but it will have only one head chef. Your head chef knows who’s doing what at all times, and bosses people around. The head chef never actually cooks anything – he just tells other people what to do.
Your cluster can have lots of SQL Server instances in it, and they can have lots of Availability Groups. We need one controller to know what’s going on, and that’s the cluster name.
In my case, I’m using SQL2016CLUSTER. You’ll notice that in my screenshot, it says “One or more DHCP IPv4 addresses were configured automatically” – don’t do that in your production environment. Use static IP addresses. You always want to know where the head chef is.
The next screen will prompt you to add all eligible storage to the cluster – uncheck that unless you’re building an AlwaysOn Failover Clustered Instance.
In a perfect world, the wizard finishes, your cluster is completed successfully, and your Failover Cluster Manager looks something like this:
Pay particular attention to the “Witness: None” line. Just like any good preacher, your cluster needs a witness, so right-click on the cluster name on the left side of the screen, click More Actions, and Configure Cluster Quorum Settings. In the next wizard, choose “Select the quorum witness” and choose either a disk (shared drive) or file share (UNC path) witness.
For my simple two-server configuration, I need a really reliable tiebreaker, so I’m going with a file share witness. I need it hosted on my most reliable server possible – in my simple lab, that’s one of my domain controllers, so I type in the server name and then click the Show Shared Folders button:
Click the New Shared Folder button, configure your file share, and exit the wizard. What I just did there is right for my lab, but only my lab – you’ll want to make some better-informed decisions in your production environment. For example, if all of my SQL Servers lived on virtual machines backed by NetApp storage, I’d probably use a file share right on the NetApp itself, because if that thing goes down, all my SQL Servers are hosed anyway.
Enable AlwaysOn Availability Groups on Each Server
On each replica, launch SQL Server Configuration Manager, then go into SQL Server Services, right-click on the SQL Server service, and click Properties.
On the AlwaysOn High Availability tab, check the box to Enable, and click OK. This won’t take effect until the SQL Server service is restarted. Repeat this on all of the SQL Servers.
Now let’s take a moment to reflect: we’ve got a pair of Windows servers that are in the same cluster, but at this point, there are no dependencies between the SQL Server services running. They can be hosting different databases, doing log shipping or mirroring between them, or have absolutely nothing in common. If you’re planning your first AG in production, the work done up to this point is a good amount of work to tackle in one maintenance window. I would schedule all of this for the first outage, and then let things simmer down for the next week or two before assembling the Availability Group components.
Create the Availability Group
Whew! We’re finally done with the Windows portion of setup. Let’s fire up SQL Server Management Studio and have some fun.
On my SQL Server, I have a copy of the StackOverflow.com database that I want to copy over to my secondary replica. It’s already in full recovery mode, and I’ve been taking full backups, so it already meets the lengthy AlwaysOn Availability Group prerequisites.
In Object Explorer, right-click on AlwaysOn High Availability and click New Availability Group Wizard. Starting in SQL Server 2016, we’ve got a few new options:
As of this writing, the AG documentation hasn’t caught up with the product yet, so here’s an explanation of each option:
Basic Availability Group – terminology can always change leading up to release, but as of CTP2, this functionality is called a Basic AG. I would just like to take another second to point out that the acronym is BAG.
Database Level Health Detection – in Super Availability Groups (as opposed to Basic – look, if they can make up stupid acronyms, so can I) with multiple databases, the SAG doesn’t drop if one database happens to go offline, become corrupt, or take the day off. New in SQL Server 2016 is the ability to force that failover if any one database goes down, so the SAG stays up.
Per Database DTC Support – The Distributed Transaction Coordinator (DTC) enables simultaneous commits across multiple SQL Servers hosted on different hardware. Prior to SQL Server 2016, AG databases didn’t support DTC participation, but now, just check this box.*
* – You’ll probably have to do more than that, but hey, the infrastructure is here.
Since we’re doing Standard Edition, we’ll just check the Basic Availability Group box and move on. Check the box to pick which database you want in your BAG, and then it’s time for replica selections. On this screen, click the Add Replica button and connect to your new soon-to-be secondary:
Here’s what the choices mean, and I’m going to explain them a little out of order:
Synchronous Commit – say SQL2016A is the primary, and a user does a delete/update/insert there. If I check the Synchronous Commit boxes for SQL2016A and SQL2016B, then SQL2016A will write the user’s transaction locally, send the transaction over the network to SQL2016B, write it to disk there, and then wait for confirmation back before telling the user that the transaction is committed. (Most of the time. If SQL2016B drops offline, that’s another story.)
If I don’t check the Synchronous Commit boxes, then I get asynchronous transactions – the SQL2016A primary transaction will succeed right away, and SQL2016A just sends the transactions over to SQL2016B later. SQL2016B can be seconds (or minutes, or hours) behind, so I’ll lose data on failover, but the primary’s transactions won’t be subject to that killer latency.
Automatic Failover – if the Synchronous Commit boxes are checked, then you can do automatic failover between two of the replicas. (This is where the cluster witness starts to come into play – in my two-node setup, SQL Server needs a tiebreaker to know the difference between single node failures and a split-brain scenario.)
For my evil purposes, I only need a disaster recovery copy of my databases, and I don’t want to fail over to DR if my primary just happens to restart or hiccup. I’m going to leave the Automatic Failover and Synchronous Commit boxes unchecked.
Readable Secondary – in Super Availability Groups, you can offload reads and backups to your secondaries. Since I’m dealing with a BAG here, the Readable Secondary dropdown only offers the “No” option.
The tabs for Endpoints, Backup Preferences, and Listener don’t really matter here. SQL Server automatically creates the endpoints for you, and you can’t offload backups or create a listener for BAGs.
The next step in the wizard syncs the data between replicas:
The Full option performs one full and one transaction log backup on the primary, then restores them over to the secondary to start the replication. (Keep in mind that if your databases are large, you’ll want to set database backup compression on by default in order to speed this process – you don’t get to choose compression settings in the wizard.)
The other two options are useful if you performed the backups yourself manually, or if you’re using storage replication technologies to get it started.
If you choose the Full option, you can track backup and restore status with sp_WhoIsActive’s percent-completed column.
After the AG wizard finishes, you can right-click on your Availability Group in Object Explorer and click Show Dashboard:
Voila! Papa’s got a brand new BAG.