Before we get started, we need to talk about something. I’m going to be using the C word – that’s right, Clustering. Deep, calming breaths. Relax. We can do this, you and me. This isn’t the old-school clustering with shared disks, heartbeat networks, and hardware compatibility lists. This is just another feature you’re going to add to your servers, click a few buttons, and move right on to more important things.
What We’re Going To Build
My imaginary business in Chicago is running a mission-critical accounting application that need to be online at all times. We need to be able to tolerate the following problems:
- If the production server dies, I need to fail over automatically to a second server in the same Chicago datacenter. (This also means the production and secondary servers can’t be on the same SAN, because SANs do fail.)
- If the datacenter goes down, I need to fail over to a DR datacenter offsite in Portland. The business is okay with losing some data in the event of a failure that large, and they’re okay with this failover being a manual process.
- The BI team wants to run reports on the live database with as little delay as possible. They know they should be building a separate warehouse, but they’re too
lazybusy, and they want this to work right now.
- Backups need to run faster, and ideally, I don’t want to run backups on my main production server.
To solve this, I’m going to implement four SQL Servers with Availability Groups. The four servers will be:
- SQL2012PROD1 – the primary production SQL Server in Chicago. Data will be stored on a SAN.
- SQL2012PROD2 – the secondary production SQL Server in Chicago. It’s located in the next rack over, and it’s got all local storage so that we can tolerate a SAN failure without changing datacenters. We’ll be doing synchronous commits between SQL2012PROD1 and SQL2012PROD2, which means I do still need fast storage on #2 – a good fit for SSDs.
- SQL2012DR1 – an offsite SQL Server in Portland. Because the business is okay with some data loss in the event of a complete Chicago datacenter failure, we’ll be doing asynchronous commits to this server.
- SQL2012RPT1 – a read-only SQL Server in Chicago. It can be on any storage (SAN or local). We’ll be doing asynchronous commits here, which means we might be running 5-60 seconds behind depending on load, but that’s still more than current enough for the BI team’s reports. After hours, when users aren’t running reports, we’ll run full backups from this server.
Now let’s get started building these four servers.
Prerequisites: Before We Install SQL Server 2012
I’m using four Windows Server 2008 R2 Enterprise servers in this tutorial, but you can use as few as two. I strongly recommend using the same drive letters across all servers: for example, if you want to use E for apps, F for data, and G for logs, then all four servers need to use those same drive letters. If the production server has an H drive that no other server has, and we add a data file on the H drive, then that alter-database statement will fail across the rest of our servers – instantly marking their databases as suspect. We can fix that by restoring files to the other servers manually, but the whole point of our solution is to make it easy and automatic, so keep those drive letters identical from the start.
Windows Enterprise is required due to the clustering features, but they don’t have to be beefy machines – in my lab, I’m using VMs with 2GB of RAM.
Each server needs the .NET Framework 3.5.1 feature and the Failover Clustering feature already installed. To install these features, go into Server Manager, Features, and click Add Feature. In the Add Features wizard, check the boxes for .NET Framework 3.5.1 and Failover Clustering:
After those two features are installed on all of our servers, we need to configure the cluster. Go into Control Panel, Administrative Tools, Failover Cluster Manager. Click Validate a Configuration, and enter the names for your SQL Servers.
Note that all four of my servers are in the same domain. You can’t use servers in different AD forests here.
Click Next through the wizard, and you’ll be asked what tests you want to run. I run all tests every time, but technically, you could skip the storage tests if you’re only using Availability Groups. These servers will not be sharing storage, so I don’t need to test whether any of them can see each others’ drives. If you run all tests, you’ll see warnings in the Storage section:
Whether you pass or fail, click the View Report button to get more insight on your servers. A detailed HTML report pops up to give you the skinny:
The Network section of validation is giving me warnings because I’ve got a few single points of failure. Each of my servers only has one network card and no network teaming. If a network card went bad, if someone tripped over a patch cable, or if somebody assigned an IP address that duplicated my SQL Server’s, then whammo, one of my nodes would go down. In a real production environment, you’d want to address these issues.
If you got errors that you want to change, go fix them and rerun validation. Otherwise, if you don’t see any showstopper errors, close the validation report and click the “Create the cluster now using the validated nodes” link.
Creating The Cluster
The Create Cluster Wizard will ask for the new cluster’s name, IP address, and network. The cluster is like a virtual server that doesn’t really exist, but just gets passed around from server to server depending on who’s in charge. You remember that time at the restaurant when you found a hair in your bacon, and the waiter said it looked an awful lot like yours? And then you stood up and asked for the manager? You didn’t know what the manager’s exact name was, because the manager changes all the time from shift to shift over at Harry’s Bacon Shack, so you just asked for the manager. Some guy came over with a nametag that said “Manager On Duty.” Well, that’s exactly what we’re creating here – a manager-on-duty nametag.
The cluster’s name is not the name of the SQL Server. That’s a really important distinction to remember. A restaurant might have a lot of waiters, but it only has one manager. Your cluster may end up with several different instances of SQL Server down the road, but there’s only one manager resource for the cluster. The manager is the one you call when there’s a problem with your waiter. If SQL Server is serving us flies in our database, we’re going to call the manager. The rest of the time, we can pretty much forget that the manager even exists.
In my example, I’m naming my cluster SQL2012Cluster1. I’m only naming it that because all of my machines are named SQL2012*. Your naming scheme may vary, but I just wanna drive that point home that the cluster name doesn’t really have anything to do with SQL Server.
After the wizard finishes, Failover Cluster Manager will show our newly created cluster with four nodes, a network, and no services or applications.
Yes, installing the Failover Clustering feature really is as easy as checking a box and clicking next-next-next. Now, don’t get me wrong, we’re going to have a few more things to do down the road, but take a moment here to appreciate just how far Windows has come in the last ten years.
Next: The SQL Server 2012 Setup
Download SQL Server 2012 Evaluation Edition and start the setup on one of the Windows machines. The very first screen in setup is just a little tricky because it offers one link for “New SQL Server stand-alone installation” and another for “New SQL Server failover cluster installation.” The one you want is stand-alone. You can indeed do clustering with 2012, but for our solution, we’re using Availability Groups by themselves.
In the Setup Role step, choose SQL Server Feature Installation.
In the Feature Selection step, the only feature you absolutely need is Database Engine, but here’s what I choose for lab boxes:
In real life, I wouldn’t include Distributed Replay on my production servers, but in a 2012 lab environment, it’s a fun feature to start playing with. Distributed Replay holds the promise of making load testing easier.
In the Server Configuration step, specify a domain account for SQL Server to use. In my SQL Server setup checklist, I explain why you should use a different domain account per instance. (I’m not so anal-retentive as to use a different account per service per instance, though.) Using domain accounts will make it easier for us to let each SQL Server instance connect to the other instances. Note that if you’re using Kerberos, or plan to later, you should use the same service account for all instances in the same AvailabilityGroup.
The screenshot shows that I use the server name as part of the service account name. My personal naming standard is:
- ‘svc’ – all my service accounts start with this so I can quickly pick them out of AD
- ‘MachineName’ – because I use a different account per Windows machine for each service
- ‘ServiceName’ – because I might be running multiple services on the same Windows box
So in the example below, I’m using LAB\svcSQL2012PROD1_SQL for the database service accounts on SQL2012PROD1.
In the Database Engine Configuration step, make sure to click Add Current User as an administrator for the database engine. Also in this step, click on the Data Directories tab and configure the default file paths. Use the same paths across all of the SQL Server instances we’re using for AlwaysOn.
Next your way through the installation, and a few minutes later, you’ll have yourself one working SQL Server. Repeat that process for all of your 2012 machines.
Configuring SQL Server 2012
By default, SQL Server does not accept remote connections. This is because end users are the root of all our problems. While I admire Microsoft’s reluctance to let their product be used, we’re going to have to change that default setting. Click Start, Programs, Microsoft SQL Server 2012, Configuration Tools, SQL Server Configuration Manager. Drill into SQL Server Network Configuration, click on Protocols for MSSQLSERVER, and double-click on the TCP/IP entry. Change the “Enabled” option to Yes, and click OK.
Back in Configuration Manager, enable AlwaysOn by clicking SQL Server Services in the left pane, then double-click SQL Server. Click on the AlwaysOn High Availability tab. (Wait – if it’s AlwaysOn, shouldn’t it be called Complete Availability? Hmmm.)
Click OK. Our changes won’t take effect until we restart the SQL Server instance, so now’s a good time to do that. Repeat this process for all of your 2012 instances.
Configuring Security for Cross-Instance Connectivity
Like any close-knit family, our instances are going to have to talk to each other. Each of the server’s service accounts is going to need to connect to the other instances. In a real-life environment, we would be more conservative with our security, but for the purposes of this lab, we’re going to take a couple of shortcuts. You should never go live with this configuration. This is purely for training purposes. (K. Brian Kelley is going to kill me when he reads this, but I’m aiming for quick setup.)
Open SQL Server Management Studio and connect to one of the instances. Go into Security, right-click on Logins, and create a new login. Click Search, and put in one of the service account names that you used for each SQL Server. On the left side, click Server Roles, check the sysadmin box, and click OK.
Repeat this process for every service account, and repeat it on every instance. In all, you’re going to be setting up 3 accounts per server on all 4 servers.
A slightly easier way (and my personal favorite): put the service accounts in an Active Directory group instead, and then create a login for the group. If you take that approach, be aware that groups don’t show up by default when you’re searching in the new-login creation process. After you click Search, click Object Types, and check Groups.
Last, but not least, we need a file share that all four service accounts can access. We’re going to initialize our replicas with backups and restores, and to do that, all four services need to be able to read & write to the same folder. This file share can be on any file server or on one of the SQL Servers, but make sure to grant rights to read/write for all of the service accounts. (Again, made easier if we’re using AD groups.)
We Interrupt This Setup for a Talk About Your Job
There’s been a few times in this post where I’ve mentioned tasks that might be outside of the typical SQL Server DBA’s responsibility zone: creating file shares, adding AD accounts and groups, and picking names and IP addresses. If you’ve never set up a cluster before, this may seem intimidating, but fear not. If these responsibilities are outside of your domain, they’re very trivial matters for your Windows team.
If you were doing database mirroring before, then you’ve probably already dealt with file share permissions and Active Directory. AlwaysOn Availability Groups don’t change anything there. However, they do add the requirement for a virtual network name and IP address. That’s not a big requirement.
Finally! Let’s Play with Availability Groups
Back to SQL Server Management Studio. On the main production server (in my case, SQL2012PROD1), set yourself up a few lab databases – either create a few new databases or restore databases from your other servers. If you create new databases, take a full backup of each database before we go on. It’s not that I treasure your work, but rather your newly created databases aren’t really in full recovery mode until the first full backup. (And yes, the Availability Groups setup wizard checks for this.)
In SSMS, go into the main production server, click Management, right-click on Availability Groups, and click New Availability Group Wizard. You’ll be presented with a list of databases that you can include in the Availability Group. Keep in mind that all of the databases in the Availability Group will be failed over from your primary server to your secondary server together, automatically. In my case, my accounting application consists of the databases Accounting, AccountingPayroll, and AccountingReports. I have some applications that make cross-database SELECT queries for reporting purposes, so I want to make sure they all fail over together.
Click Next, and you’ll be able to specify which servers are part of the group and what their roles are. This below screen has so much of what I’ve always wanted in a database product. I’ll need just a moment by myself. Excuse me.
This is, as Martha Stewart would say, a Good Thing. Let’s take it one column at a time:
Server Instance – obviously, the server name we’re dealing with.
Initial Role – SQL2012PROD1 will be my Primary instance because it’s got the current live copy of the data.
Replica Mode – PROD1 and PROD2 will do Automatic Failover. Only two instances can be involved in Automatic Failover at a time. For my business configuration, DR1 and RPT1 are in High Performance mode, which is akin to asynchronous database mirroring. High Safety is also an option, meaning synchronous mirroring.
Connection Mode in Secondary Role – when this instance isn’t the primary instance, will it allow connections? If so, then we have to be aware that its performance will slow down. I’m not keen on allowing any connections to my server that’s acting as an automatic failover partner, because that server needs to commit writes as fast as possible. In my setup, I’ve got PROD1 and PROD2 set to disallow connections for that reason. My DR1 and RPT1 instances, however, can allow connections, thereby letting my read-only users connect there to run reports or backups.
That’s amazing. This is probably the coolest screen of any wizard I’ve ever seen. This one screen lets me protect and scale out multiple databases easily. Granted, it took us 2,800 words of explanation to get to this point, but it’s way easier than any other scale-out method I’ve seen. It’s all in the GUI, for crying out loud.
Click Next, and the next screen will set up a listener. You can just take the defaults here.
On the Select Initial Data Synchronization screen, put in the UNC path for the file share that you set up for all of the SQL Servers to access. The production server is going to take a backup to this path, and the replicas will restore that backup. You can also skip this step entirely if you’d rather initialize the replicas yourself, and in many production environments, you’ll need to do that. For example, environments with SAN snapshot capabilities or small WAN pipes will need customization.
Next your way through the wizard and validation, and next thing you know, you’ve got an Availability Group. Right-click on your new Availability Group and click Show Dashboard. Presto:
What to Play With Next
Here’s some of the cool things you can experiment with:
- Take your full and transaction log backups on a replica. (You can’t do differential backups on the replicas – long story.)
- Query the replicas
- Run DBCC on a replica
- Query the new DMVs for AlwaysOn features
- Remove a replica, and add a different server as a replica
- Automatic page repair – 2012 automatically detects corrupt pages on the replicas and repairs them by fetching a good copy from any other instance
- Replication – Availability Groups can include a replication publisher, but not the distributor
I’m really, really excited about AlwaysOn Availability Groups. Yes, it requires some compromises (like Windows Enterprise and an additional name/IP) but I believe this is a big bang for very little buck. My only concern – and this is a big concern – is that this is an Enterprise Edition only feature, and that’s where the big bucks will come in.