Blog

Before we get started, we need to talk about something. I’m going to be using the C word – that’s right, Clustering. Deep, calming breaths. Relax. We can do this, you and me. This isn’t the old-school clustering with shared disks, heartbeat networks, and hardware compatibility lists. This is just another feature you’re going to add to your servers, click a few buttons, and move right on to more important things.

What We’re Going To Build

My imaginary business in Chicago is running a mission-critical accounting application that need to be online at all times. We need to be able to tolerate the following problems:

  • If the production server dies, I need to fail over automatically to a second server in the same Chicago datacenter. (This also means the production and secondary servers can’t be on the same SAN, because SANs do fail.)
  • If the datacenter goes down, I need to fail over to a DR datacenter offsite in Portland. The business is okay with losing some data in the event of a failure that large, and they’re okay with this failover being a manual process.
  • The BI team wants to run reports on the live database with as little delay as possible. They know they should be building a separate warehouse, but they’re too lazy busy, and they want this to work right now.
  • Backups need to run faster, and ideally, I don’t want to run backups on my main production server.

To solve this, I’m going to implement four SQL Servers with Availability Groups. The four servers will be:

  • SQL2012PROD1 – the primary production SQL Server in Chicago. Data will be stored on a SAN.
  • SQL2012PROD2 – the secondary production SQL Server in Chicago. It’s located in the next rack over, and it’s got all local storage so that we can tolerate a SAN failure without changing datacenters. We’ll be doing synchronous commits between SQL2012PROD1 and SQL2012PROD2, which means I do still need fast storage on #2 – a good fit for SSDs.
  • SQL2012DR1 – an offsite SQL Server in Portland. Because the business is okay with some data loss in the event of a complete Chicago datacenter failure, we’ll be doing asynchronous commits to this server.
  • SQL2012RPT1 – a read-only SQL Server in Chicago. It can be on any storage (SAN or local). We’ll be doing asynchronous commits here, which means we might be running 5-60 seconds behind depending on load, but that’s still more than current enough for the BI team’s reports. After hours, when users aren’t running reports, we’ll run full backups from this server.

Now let’s get started building these four servers.

Prerequisites: Before We Install SQL Server 2012

I’m using four Windows Server 2008 R2 Enterprise servers in this tutorial, but you can use as few as two. I strongly recommend using the same drive letters across all servers: for example, if you want to use E for apps, F for data, and G for logs, then all four servers need to use those same drive letters. If the production server has an H drive that no other server has, and we add a data file on the H drive, then that alter-database statement will fail across the rest of our servers – instantly marking their databases as suspect. We can fix that by restoring files to the other servers manually, but the whole point of our solution is to make it easy and automatic, so keep those drive letters identical from the start.

Windows Enterprise is required due to the clustering features, but they don’t have to be beefy machines – in my lab, I’m using VMs with 2GB of RAM.

Each server needs the .NET Framework 3.5.1 feature and the Failover Clustering feature already installed. To install these features, go into Server Manager, Features, and click Add Feature. In the Add Features wizard, check the boxes for .NET Framework 3.5.1 and Failover Clustering:

Adding the Features

After those two features are installed on all of our servers, we need to configure the cluster. Go into Control Panel, Administrative Tools, Failover Cluster Manager. Click Validate a Configuration, and enter the names for your SQL Servers.

Validate a Configuration Wizard

Note that all four of my servers are in the same domain. You can’t use servers in different AD forests here.

Click Next through the wizard, and you’ll be asked what tests you want to run. I run all tests every time, but technically, you could skip the storage tests if you’re only using Availability Groups. These servers will not be sharing storage, so I don’t need to test whether any of them can see each others’ drives. If you run all tests, you’ll see warnings in the Storage section:

Of course, I *would* fail at storage. That's so me.

Whether you pass or fail, click the View Report button to get more insight on your servers. A detailed HTML report pops up to give you the skinny:

Network Warnings

The Network section of validation is giving me warnings because I’ve got a few single points of failure. Each of my servers only has one network card and no network teaming. If a network card went bad, if someone tripped over a patch cable, or if somebody assigned an IP address that duplicated my SQL Server’s, then whammo, one of my nodes would go down. In a real production environment, you’d want to address these issues.

If you got errors that you want to change, go fix them and rerun validation. Otherwise, if you don’t see any showstopper errors, close the validation report and click the “Create the cluster now using the validated nodes” link.

Creating The Cluster

The Create Cluster Wizard will ask for the new cluster’s name, IP address, and network. The cluster is like a virtual server that doesn’t really exist, but just gets passed around from server to server depending on who’s in charge. You remember that time at the restaurant when you found a hair in your bacon, and the waiter said it looked an awful lot like yours? And then you stood up and asked for the manager? You didn’t know what the manager’s exact name was, because the manager changes all the time from shift to shift over at Harry’s Bacon Shack, so you just asked for the manager. Some guy came over with a nametag that said “Manager On Duty.” Well, that’s exactly what we’re creating here – a manager-on-duty nametag.

My cluster's name is Harry

Cluster Name and IP Address

The cluster’s name is not the name of the SQL Server. That’s a really important distinction to remember. A restaurant might have a lot of waiters, but it only has one manager. Your cluster may end up with several different instances of SQL Server down the road, but there’s only one manager resource for the cluster. The manager is the one you call when there’s a problem with your waiter. If SQL Server is serving us flies in our database, we’re going to call the manager. The rest of the time, we can pretty much forget that the manager even exists.

In my example, I’m naming my cluster SQL2012Cluster1. I’m only naming it that because all of my machines are named SQL2012*. Your naming scheme may vary, but I just wanna drive that point home that the cluster name doesn’t really have anything to do with SQL Server.

After the wizard finishes, Failover Cluster Manager will show our newly created cluster with four nodes, a network, and no services or applications.

What a Cluster

Yes, installing the Failover Clustering feature really is as easy as checking a box and clicking next-next-next. Now, don’t get me wrong, we’re going to have a few more things to do down the road, but take a moment here to appreciate just how far Windows has come in the last ten years.

Next: The SQL Server 2012 Setup

Download SQL Server 2012 Evaluation Edition and start the setup on one of the Windows machines. The very first screen in setup is just a little tricky because it offers one link for “New SQL Server stand-alone installation” and another for “New SQL Server failover cluster installation.” The one you want is stand-alone. You can indeed do clustering with 2012, but for our solution, we’re using Availability Groups by themselves.

Installation Choices

In the Setup Role step, choose SQL Server Feature Installation.

In the Feature Selection step, the only feature you absolutely need is Database Engine, but here’s what I choose for lab boxes:

The Usual Suspects

In real life, I wouldn’t include Distributed Replay on my production servers, but in a 2012 lab environment, it’s a fun feature to start playing with. Distributed Replay holds the promise of making load testing easier.

In the Server Configuration step, specify a domain account for SQL Server to use. In my SQL Server setup checklist, I explain why you should use a different domain account per instance. (I’m not so anal-retentive as to use a different account per service per instance, though.) Using domain accounts will make it easier for us to let each SQL Server instance connect to the other instances.  Note that if you’re using Kerberos, or plan to later, you should use the same service account for all instances in the same AvailabilityGroup.

The screenshot shows that I use the server name as part of the service account name. My personal naming standard is:

  • ‘svc’ – all my service accounts start with this so I can quickly pick them out of AD
  • ‘MachineName’ – because I use a different account per Windows machine for each service
  • ‘ServiceName’ – because I might be running multiple services on the same Windows box

So in the example below, I’m using LAB\svcSQL2012PROD1_SQL for the database service accounts on SQL2012PROD1.

Service Accounts

In the Database Engine Configuration step, make sure to click Add Current User as an administrator for the database engine. Also in this step, click on the Data Directories tab and configure the default file paths. Use the same paths across all of the SQL Server instances we’re using for AlwaysOn.

Installation Paths

Next your way through the installation, and a few minutes later, you’ll have yourself one working SQL Server. Repeat that process for all of your 2012 machines.

Configuring SQL Server 2012

By default, SQL Server does not accept remote connections. This is because end users are the root of all our problems. While I admire Microsoft’s reluctance to let their product be used, we’re going to have to change that default setting. Click Start, Programs, Microsoft SQL Server 2012, Configuration Tools, SQL Server Configuration Manager. Drill into SQL Server Network Configuration, click on Protocols for MSSQLSERVER, and double-click on the TCP/IP entry. Change the “Enabled” option to Yes, and click OK.

Back in Configuration Manager, enable AlwaysOn by clicking SQL Server Services in the left pane, then double-click SQL Server. Click on the AlwaysOn High Availability tab. (Wait – if it’s AlwaysOn, shouldn’t it be called Complete Availability? Hmmm.)

On and On Like Donkey Kong to the Break of Dawn

Click OK. Our changes won’t take effect until we restart the SQL Server instance, so now’s a good time to do that. Repeat this process for all of your 2012 instances.

Configuring Security for Cross-Instance Connectivity

Like any close-knit family, our instances are going to have to talk to each other. Each of the server’s service accounts is going to need to connect to the other instances. In a real-life environment, we would be more conservative with our security, but for the purposes of this lab, we’re going to take a couple of shortcuts. You should never go live with this configuration. This is purely for training purposes. (K. Brian Kelley is going to kill me when he reads this, but I’m aiming for quick setup.)

Open SQL Server Management Studio and connect to one of the instances. Go into Security, right-click on Logins, and create a new login. Click Search, and put in one of the service account names that you used for each SQL Server. On the left side, click Server Roles, check the sysadmin box, and click OK.

Repeat this process for every service account, and repeat it on every instance. In all, you’re going to be setting up 3 accounts per server on all 4 servers.

A slightly easier way (and my personal favorite): put the service accounts in an Active Directory group instead, and then create a login for the group. If you take that approach, be aware that groups don’t show up by default when you’re searching in the new-login creation process. After you click Search, click Object Types, and check Groups.

Last, but not least, we need a file share that all four service accounts can access. We’re going to initialize our replicas with backups and restores, and to do that, all four services need to be able to read & write to the same folder. This file share can be on any file server or on one of the SQL Servers, but make sure to grant rights to read/write for all of the service accounts. (Again, made easier if we’re using AD groups.)

We Interrupt This Setup for a Talk About Your Job

There’s been a few times in this post where I’ve mentioned tasks that might be outside of the typical SQL Server DBA’s responsibility zone: creating file shares, adding AD accounts and groups, and picking names and IP addresses. If you’ve never set up a cluster before, this may seem intimidating, but fear not. If these responsibilities are outside of your domain, they’re very trivial matters for your Windows team.

If you were doing database mirroring before, then you’ve probably already dealt with file share permissions and Active Directory. AlwaysOn Availability Groups don’t change anything there. However, they do add the requirement for a virtual network name and IP address. That’s not a big requirement.

Finally! Let’s Play with Availability Groups

Back to SQL Server Management Studio. On the main production server (in my case, SQL2012PROD1), set yourself up a few lab databases – either create a few new databases or restore databases from your other servers. If you create new databases, take a full backup of each database before we go on. It’s not that I treasure your work, but rather your newly created databases aren’t really in full recovery mode until the first full backup. (And yes, the Availability Groups setup wizard checks for this.)

In SSMS, go into the main production server, click Management, right-click on Availability Groups, and click New Availability Group Wizard. You’ll be presented with a list of databases that you can include in the Availability Group. Keep in mind that all of the databases in the Availability Group will be failed over from your primary server to your secondary server together, automatically. In my case, my accounting application consists of the databases Accounting, AccountingPayroll, and AccountingReports. I have some applications that make cross-database SELECT queries for reporting purposes, so I want to make sure they all fail over together.

Picking Databases for the Availability Group

Click Next, and you’ll be able to specify which servers are part of the group and what their roles are. This below screen has so much of what I’ve always wanted in a database product. I’ll need just a moment by myself. Excuse me.

Was It As Good For You As It Was For Me?

This is, as Martha Stewart would say, a Good Thing. Let’s take it one column at a time:

Server Instance – obviously, the server name we’re dealing with.

Initial Role – SQL2012PROD1 will be my Primary instance because it’s got the current live copy of the data.

Replica Mode – PROD1 and PROD2 will do Automatic Failover. Only two instances can be involved in Automatic Failover at a time. For my business configuration, DR1 and RPT1 are in High Performance mode, which is akin to asynchronous database mirroring. High Safety is also an option, meaning synchronous mirroring.

Connection Mode in Secondary Role – when this instance isn’t the primary instance, will it allow connections? If so, then we have to be aware that its performance will slow down. I’m not keen on allowing any connections to my server that’s acting as an automatic failover partner, because that server needs to commit writes as fast as possible. In my setup, I’ve got PROD1 and PROD2 set to disallow connections for that reason. My DR1 and RPT1 instances, however, can allow connections, thereby letting my read-only users connect there to run reports or backups.

That’s amazing. This is probably the coolest screen of any wizard I’ve ever seen. This one screen lets me protect and scale out multiple databases easily. Granted, it took us 2,800 words of explanation to get to this point, but it’s way easier than any other scale-out method I’ve seen. It’s all in the GUI, for crying out loud.

Click Next, and the next screen will set up a listener. You can just take the defaults here.

On the Select Initial Data Synchronization screen, put in the UNC path for the file share that you set up for all of the SQL Servers to access. The production server is going to take a backup to this path, and the replicas will restore that backup. You can also skip this step entirely if you’d rather initialize the replicas yourself, and in many production environments, you’ll need to do that. For example, environments with SAN snapshot capabilities or small WAN pipes will need customization.

Next your way through the wizard and validation, and next thing you know, you’ve got an Availability Group.  Right-click on your new Availability Group and click Show Dashboard. Presto:

Voila! The Availability Group Dashboard

What to Play With Next

Here’s some of the cool things you can experiment with:

  • Take your full and transaction log backups on a replica. (You can’t do differential backups on the replicas – long story.)
  • Query the replicas
  • Run DBCC on a replica
  • Query the new DMVs for AlwaysOn features
  • Remove a replica, and add a different server as a replica
  • Automatic page repair – 2012 automatically detects corrupt pages on the replicas and repairs them by fetching a good copy from any other instance
  • Replication – Availability Groups can include a replication publisher, but not the distributor

I’m really, really excited about AlwaysOn Availability Groups. Yes, it requires some compromises (like Windows Enterprise and an additional name/IP) but it’s worth it.

Learn More in Our AlwaysOn Resources

↑ Back to top
  1. Thx for taking the time to write such a complete article. Very interesting ! (ps. I had the same feeling when looking at the wizard screen, impressive :) )

  2. Is is a great read. thank you

  3. I wish our business needs required something on this level so that i could get my hands dirty.

  4. Great article – I loved the comment about Microsoft’s reluctance to let their product be used :-).

    It will be interesting to see the rules on licensing replicas.

  5. Dang Brent, you are going to make me download my first CTP ever. Nothing new though since you are always putting out great stuff that makes me want to dig deeper into SQL Management.

  6. Thanks for all the great insight. This will be my first experience with a new release of SQL Server and you have managed to make me overly giddy for this release, of which is usually reserved for the release of a new album by random favorite band!

  7. Great article Brent. Was AlwaysOn Availability Groups available in CTP1, or did you pull and all-nighter to write this? :-)

    • James – thanks! No, this wasn’t available in CTP1, and no, I didn’t pull an all-nighter, heh. Microsoft was gracious to give me an advance meeting with the product team under NDA, but I wasn’t allowed to talk about it until CTP3 came out through official channels.

      • Brent,

        Very nice article, Win 2008 R2 core support is crutial, I look forward to try that on hyperv, thanks…

        Just to avoid confusion, availibility groups was available in CTP1, though limited to 1 replica, dashboard and multiple replicas works now in CTP3…

        Denali has great feature:)

        Prakash Heda
        OneBox Denali Hyperv setup…http://sqlfeatures.com/?p=10

  8. Brent, one point of clarification, I think this is all dependent on static IP’s, correct?

    • Dale – I wouldn’t use DHCP for a production server, so I wouldn’t know. To me, that’s kinda like saying its dependent on electric power and functioning network switches of 1Gb or higher. ;-)

  9. Pingback: Something for the Weekend – SQL Server Links 15/07/11

  10. Like:

    By default, SQL Server does not accept remote connections. This is because end users are the root of all our problems.

    :-)

  11. Thanks for the excellent write up Brent. I set this all up in my virtual environment and aside from a few issues with Windows firewall it worked like a charm.

    One question I did have was how to query the replicas that allow read-intent connections using SSMS. I was able to do it using sqlcmd and the -K parameter but could not get SSMS to work. I tried specifying additional connection parameters when opening a query (;ApplicationIntent=ReadOnly and some variations of this) but that did not work. Any suggestions?

  12. Hey Brent,

    Thank you so much for such a detailed article. I’m following along with it right now. I had one question for you. You state that we shouldn’t give sysadmin rights to all the service accounts when configuring security for cross-instance connectivity.

    What is the correct way to do this in production?

    Thanks,

    Eugene

    • Eugene – great question, but it’s beyond what I can hit here in a blog comment. You want to follow the principle of least privilege: give each account the bare minimum it needs to do its job, and nothing more. In that case, I’d want to give each service account enough permissions to connect to the other endpoints.

  13. Great article!!!Could you please let me know the transafer of data is based on transaction log or any other technique between the groups?

  14. Thank you this is a great article. BOL is a mine of information but it lacks the human touch that this article brings to the table, which makes the subject much easier to understand.

    Do you know which versions of Denali will include the availability groups feature?

  15. Hi Brent,

    Excellent post! Just a thought – wouldn’t it be better to video capture the above since you are running them on VMs and post a smaller blog with a link on youtube.com?

    RS

  16. Hi Brent,
    Thank you for the detailed write up, very useful. How do we handle the SQL logins when we failover, as they become orphaned users. I tried contained database users, but that doesn’t work well, my JDBC connections throwing up errors with contained database users with local password.

    • I was able to figure out the issue. Earlier, while was I running “sp_migrate_user_to_contained” procedure to migrate the SQL logins to contained users I chose not to disable the logins, but now once I disable the logins, it works now, the application logins now using the database user and its password.

  17. As always very helpful Brent. I have one (two really) questions? 1. Can system databases be part of an availability group? 2. If not, then does one have to do SSIS wizardry to keep the system databases sycned with the replicas so that when the primary server has a meltdown, the other server keeps on chugging along? In desperate need of some illumination.

    Thanks.

  18. Hi Brent,

    So with regards to taking backups on the secondary replicas, its a copy_only backup. This traditionally we’d only use for adhoc purposes so that the sequence is not interrupted.
    Aside from not being able to do differential backups -since the differential map is not updated – would you see any other drawbacks to this approach?

  19. Magnificent article on Denali AlwaysOn. Just a quick question on setting up a cluster since we cannot use this feature if our node is not part of the cluster. Do we need a shared storage for the witness disk for the cluster to work? How about the heartbeat do we need this also?

  20. Great post. Thanks

  21. Hi Brent,

    Is it possible to Load Balance between 2 asynchronous 2012 servers in the same data center for high performance??

    Thank you very much!!

    • Yep, that’s feasible to improve read (but not write) performance with Availability Groups.

      • so If I have 3 SQL servers 1 primary Read\Write and 2 read only all in the same group using a Virtual Network name will sql server auto detect read and writes and load balance the reads to the 2 read only servers??

        Thank you very much for your help!!!

        • Kender – well, it’s not quite as easy as automatically detecting reads and writes. You have to declare your intent in your connection string if you want SQL to automatically put reads on some servers and writes on another. That’s a little beyond the scope of this post, though – I’ll write about that at some point in the future. In the meantime, your best bet is to pick up SQL 2012 and start reading the documentation on Availability Groups.

        • Im asking all these because we dont have access to modified the client app so all logic failover and LB must be done at the server level!! :(

          • Hey Brent thank you very much for your help.

            I did my homework by reading a lot about it before posting here and was very confuse but now I know the answer.

            Thank you much again you were very very help full!!!

  22. Quick question for you on this Brent. Say you had Transactional Replication from Prod to Reporting just for that purpose – reporting. If you weren’t concerned about the latency of this solution, would this be a viable alternative to replication? What could be a potential drawback of this? Would you have the ability to create anything on the secondary DB or is it a true read-only copy?

    • Andrew – That’s not a quick question, that’s three questions, hahaha. But yes, it’s an alternative, and you can’t create anything. It’s a read-only copy of the entire database. Transactional replication lets you get a subset of the database.

  23. If you have time, I have some questions. If not, I appreciate all the info you have shared already. Totally on point.

    Since you have servers in different locations, they must be on different subnets. How do you choose what IP to pick for the access point? My concern is what happens if Chicago goes down, and the access point is there, will Portland be accessible?

    Can the primary be a Cluster?

    Thanks
    Steve

    • Sure, glad we could help.

      You can have cluster members in different IP address subnets now. You don’t have to pick an IP – you can have one IP in each subnet. You use an “OR” dependency for IPs, so as long as a node brings up either of the IPs for the cluster’s virtual name, you can go live.

      Yes, the primary one can be a shared disk failover cluster, too. In that case, automatic failover is done inside that shared disk cluster, not out to other nodes. If you want to fail over to other nodes, that’s a manual process.

  24. Is there a DMV that exposes whether a node in an Availability Group is the primary? I ask for the purpose of adding a step to my jobs that determines whether to proceed that allows for all jobs to continue in event of a failover.

  25. Pingback: Video – How to Test Availability Groups in SQL Server 2012 | Brent Ozar PLF | Brent Ozar PLF

  26. Great Article Brent,

    So is really similar to Database Availability Groups in Exchange 2010. Virtualizing at the application layer as i run both my Exchange Servers in a DAG, on separate data stores and i love it. Like you said, SAN’s do/can fail! As you portray setup looks very familiar, i was just wondering if that was the concept that tried to bring to SQL 2012.

    What’s the difference between this and SQL Clustering?

    Lastly, can i run a beefy dedicated server for my active SQL Always On Server and then run a lower end Virtual Server as my passive box. I’d dedicate a NIC for replication along with another for SQL Traffic. I just want more or less a way to failover to a less beefy server to handle some of the SQL I/O if something did happen to the primary, does that make sense?

    • Mitch – thanks. The difference between this and failover clustering is that failover clustering requires shared storage. It may be connected via fiber, iSCSI, or even just a plain old NAS.

      Yes, you can run asymmetrically sized hardware. I’m in the midst of deploying a setup just like that for a client – they’ve got a beefy failover cluster for production, and then much smaller (although not virtual) machines in DR.

  27. thanks for the quick response!

    Do you think that people will adopt this technology more times than not and go away from the SQL Clustering? One advantage with SQL Clustering is assuming you can do some sort of load balancing, would that be correct? Otherwise this Always On technology would be the way to go, agree?

    • Well, you’ve got a lot of interesting questions in there and it’s beyond what I can address in the comments. I’d recommend picking up a book on SQL Clustering like Allan Hirt’s excellent one, and that’ll help explain the pros and cons of the different high availability methods. There’s not a 2012 version out yet, but the 2008 one will help answer the questions you’re mentioning here.

  28. Great and detailed walk-through. One question for you though, which I have yet to get to a clear and dry cut answer. What version of SQL 2012 should we use for this? I have seen MS’s PDF licensing guide which states that DAG is supported in Standard, but AlwaysOn is an Enterprise feature. In another article they state certain features of AlwaysOn are available in Standard, but they leave it at that. What we are looking for is a 2 node DAG cluster, really acting as an active/passive(DR) scenario. We were thinking we can utilize Standard for this. Any thoughts?

    • Bryant – AlwaysOn is multiple features. AlwaysOn Failover Clustering (the traditional shared drives approach) is supported in Standard Edition with 2-node clusters. AlwaysOn Availability Groups is Enterprise Edition only, and for licensing, there’s no such thing as a passive node in Availability Groups.

  29. Regarding the last question about licensing: I saw it mentioned somewhere that Microsoft may not require additional licensing fees if a secondary replica is not “actively” used, ie. is inaccessible to application users, but is still used for admin tasks, such as backups.

    • Marios – do you have a link for that? We can’t exactly go by “I saw it mentioned somewhere” when the auditors come knocking. ;-)

      • In the “Microsoft SQL Server 2012 Licensing Guide” it is stated that:
        “The secondary server used for failover support does not need to be separately licensed for SQL Server as
        long as it is truly passive. If it is serving data, such as reports to clients running active SQL Server workloads,
        or performing any “work” such as additional backups being made from secondary servers, then it must be
        licensed for SQL Server.”

        So my earlier statement was indeed not accurate. :-)

        If backups are being taken on the secondary replica, that still counts as being actively used and should be licensed.

        (If I am reading the guide right).

  30. On Friday I had no idea how to setup a Windows cluster let alone how to run SQL on top of it. Its now Sunday and I have a four node, two data center AlwaysOn Availability Group setup. Your instructions were clear, easy to follow and much appreciated.

    My boss was going to pay a consultant 2K to setup a two node proof of concept cluster. Tomorrow I get to tell him to save his money.

    Thanks!

  31. Question:

    So we had the Fusion IO vendors in to discuss how we should set up their storage offerings in our new servers. They said thast the Always On High Availability reduces the requirements of redundant disks at the server level and said that we could increase our capacity by configuring the storage in a RAID 0 on each server in the failover cluster, and count on the cluster for redundancy.

    This is contrary to everything I feel! Personally, I think this would increase the chance of failure and make a failure more complicated (seamless disk failure in a RAID 1 or 10 vs. disk failure causeing a cluster failover).

    So my question is – how seamless is a failover using the failover clustering? Is it good enough that we can rely on this for redundancy? Will this effectively eliminate the need to have disk parity?

    • I agree with you completely. Lets assume AlwaysOn works perfectly with no down time or data loss and your users don’t know anything happened. When the smoke clears you have a down cluster server. Depending on how you design your cluster you may be a single drive failure or WAN link blip away from a complete SQL shutdown.

      Also, I have seen a lot of RAID sets go into a degraded state because of a disk going offline. Often the RAID can be restored by simply reseating the drive. I would hate to have to repartition and restore every time this happens.

  32. Brent, to be sure, we “must” or “can” set this up using stand-alone installations?

  33. Hi Brent,

    Great article, very easy to follow.

    I do have a question.
    We are thinking about setting up a main SQL cluster (2 node) and a secondary server. The databases in the main cluster would be in an availability group with the secondary server.
    The idea is that the cluster takes care of failures on the SQL box, and the secondary server ensures data redundancy. A two phase failover, if you will.
    My question is, is this a valid setup?
    And also, would you recommend using synchronous mirroring in this scenario?

    Thanks,
    Michel

    • Michel – Yes, that’s a valid setup. If your business needs disaster recovery – like if the datacenter catches fire – then you’ll also want a replica offsite or log shipping.

      Synchronous mirroring can add latency to transactions if the storage isn’t fast enough. If your business demands zero data loss, then you can equip the servers with solid state drives to reduce some of the latency. Otherwise, if the business doesn’t need zero loss, I’d avoid synchronous mirroring.

  34. Hi Brent,
    Can you please show us the static IP’s you’ve assigned to your machines(only if it’s okay with you) in your lab for setting up Always ON Availability Grps?

    • Sreekanth – I use IPs in the 192.168.37.x range, but I’m not sure if that’s what you’re asking. What’s the problem you’re trying to solve, and that might help you get a better answer?

  35. Brent,
    I’ve just found that with my 2 node DAG setup if the primary server’s SQL service is stopped (just default instance) the secondary server does not come up – just get continuous ‘Resolving’ on secondary server. Any idea what is going on there? In the same setup, synchronous mirrored databases automatically failover without any problems!

  36. James, I assume you are using a sync replica with auto-failover?

    This white paper discusses how to set up cluster quorum for various cluster setups:
    http://msdn.microsoft.com/en-us/library/jj191711

    I think in your case you need to add a file-share witness – outside the cluster – with each own quorum vote for a total of 3 votes. You then need to set up the quorum as node-and-file-share majority.

  37. Hi Brent, ive been searching around but cant seem to get the answer to this query:
    Using your example,
    If my reading is correct, the Availability Group Listener service is hosted in SQL2012PROD1.

    In the event where there is a failover from SQL2012PROD1 to SQL2012PROD2, the Availability Group Listener will be failover to SQL2012PROD2 as well yes? So that now all client connectivity will be managed by the Availability Group Listener on SQL2012PROD2.

    Apologies if this seem very fundamental, as AlwaysOn is pretty new where im at.

    Thanks much in advance. :)

    • Colin – it depends on whether a failover cluster is involved, or just an Availability Group. Your best bet is to set up a lab with this and play with it yourself – the concepts will become more clear then.

    • Colin thats correct – the Listener is a client access point held within the WSFC that you have to create to make ‘AlwaysOn’ possible.
      So when SQL2012PROD1 fails to SQL2012PROD2, the Windows cluster – and by deduction all services and resources beneath it – move with it…

      • The listener is not necessary for AlwaysOn Availability Groups (AGs) to work. You can set up AGs and skip the listener creation. The listener is useful when you want to seamlessly get apps to connect to the new primary replica on failover and it adds some nice functionality having to do with ReadOnly routing (getting reporting apps to connect to a secondary replica with no explicit mention of the sec. replica server name).

        One can address the issue of app failover through the use of a DNS alias for the server name of the primary replica and getting apps to connect to the DNS alias and not the actual server name. Then when failover occurs, simply switch the DNS alias to point to the new primary-replica server name. It is not as seamless as using a listener, to be sure, but it works as well.

        I just don’t want people to mistakenly think that the listener is “needed” for AGs to work.
        It is not.

  38. Hi Brent,

    Say we’re working off the assumption that we will all now be building several local instances per node instead of clustered instances, we obviously need to consider the 1433 port situation. Rather than having to listen on 1434, 1435 and having to adjust all the configs down the tree, is it feasible to present a pool of IPAddresses to a nic and force the local instances to each listen on port 1433, but with a different IP Address?

    • Hmm – lemme ask a question first. Why several instances per node? You can end up with a lot of CPU/storage/network contention, patching problems, security challenges, and there’s not much gain.

      • Sure thing; we are running on a consolidated SQL clustered environment at the moment which probably works out at 4 instances per node. We would want to implement the same line of thinking for SQL 2012 but without the SQL clustering.
        I would have thought this line of thinking would be most people’s approach, there aren’t many implementations of one sql instance per physical machine these days unless its a TB db surely?

  39. Pingback: Create Availability Group « Secret SQL

  40. Hi, im wondering what kind of quorom configuration i shall setup in a 3 server config. Two nodes in Always ON active/active and the third as DR Asynk, thus not eligable to vote, making it two servers in failover manager, i really dont want to add a shared storage in to the mix just for witness disk.

    • The disk your only option short of adding in an extra node I’m afraid, a file share witness has proven to be an unreliable method.

  41. I know this article is almost two years old now, but it was perhaps the most succinct article I found on setting up Always On. Just used it to setup a lab system between a pair of Server 2012 / SQL 2012 systems with no big issues.

    Thank Mr Ozar!
    Jason A.

  42. Pingback: Just a quick note…. « Qreed IT Consulting AB

  43. Hi Brent, awesome article! As always i dare to say :)

    I’m wondering am i missing it or there is no way to create email alerts when automated failover occurs, or if there is a problem. In general i cant seem to find anything related to alerts and AAG.
    Do you have any advice on how to setup notificatiions for AAG events?

    Thanks in advance,

    orshee

    • Thanks, glad you like it! Personally, I like having a stored procedure run when SQL Server starts, and that stored proc uses Database Mail to send an alert to the operators with basic diagnostic info (virtual name, hardware name, hardware qty, etc). That way I know even if the service restarts on the same node.

  44. Brent-This is such a nice and as simple as possible article as always! just gonna set up one now in lab … Will look forward a read on the other New features like performance wise. Thanks!

  45. Any ideas on how to test out automatic page repair?

  46. Hi Brent,

    Consider a scenario of 2 replicas: a primary and an *async* secondary.

    If the primary replica fails and we manually failover to the secondary, there will be some data loss.

    Would it be possible to restore transaction-log backups on the dbs of the secondary replica (which will now be the new primary) in an effort to minimize the data loss?
    Or are the dbs going to be in such a state that will prevent restoring log backups?

    We are reluctant to implement replicas in sync-commit mode in our environment, as that will pose a risk to the performance of our applications. At the same time, we are looking for options to reduce the data loss we would suffer in the case of primary-replica failures.

    Thank you for any input,
    Marios Philippopoulos

    • Marios – that’s not technically supported. It might be possible – I haven’t tried – but since it’s not supported, it’s not something that I’d see widely adopted in production environments. You’re free to try it as an experiment though.

      • Thank you for responding. I had a chance to look into this a bit more, and I think it can work, if, before failing over, we drop the avail group on the sec replica. At that moment the databases should be in a state that allows restoring log backups to them prior to recovering them. Of course, once the avail group is dropped, it will need to be recreated from scratch, and that can be costly in terms of time/effort for large databases. I haven’t tested this scenario yet though, so it is still a theory at this point.

  47. Hey Brent, i saw your guide and it was very useful now that SQL 2012 is RTM i have a question for you, how do you manage the the SQL availability group failover as windows cluster failover, i cant manage to do that, is there a way to acomplish that??

    Using Windows Server 2012 STD and SQL 2012 EE SP1

  48. Great work. Have setup my first SQL Cluster without having to leave it to DBAs. Great for increasing my knowledge of a SharePoint backend.
    Although, wondering…
    - previously I would use a SQL Alias – can I do this still and this will be used in AG.
    - when using multiple instances, how does this relate to the OS Cluster Failover Virtual IP and name?
    I’m not sure if my applications should be using the main OS Virtual name or I should be creating one for each SQL Instance.

    I also notice that the Windows Cluster Manager still thinks a host is vaild even though its shutting down and installing updates – it responds to ping.

    • Hi, Paul. You can still use an alias – just point it to the listener.

      When using multiple failover cluster instances, each one gets its own virtual IP and name, so it’s unrelated to the cluster access point’s IP and name. In addition, each *listener* gets its own virtual IP and name too. You’ll want to connect to the Availability Group through the listener’s name. It’s possible that the listener will fail over somewhere else (due to a single corrupt database, for example) while the instance supporting it remains online.

      When installing updates, you’ll want to be proactive – move any resources off that node first before installing updates.

      Hope that helps!

      • Thanks, thought I should be using the listener to differentiate.

        Wonder if you could help with my failover as I am pulling my hair out.
        I have 2 SQL servers and used the synchronos option.
        Used a File Share for Quorum as do not want to use another server.

        If I lose the Primary – it fails over. Does not fail back unless downing secondary. That’s fine.
        If I lose the Secondary – nothing happens as expected. Fine.
        If I lose the fileshare – the whole show stops. It also prefers that HA2 takes control even though HA1 has just as much in ‘votes’.

        I suspect this is related to some Quorum setting. Have seen references to removing votes but cannot figure out or what is best.

  49. I am setting up 2-node, non-shared storage Availabilty Group(s) with local MSDTC’s.
    In our non-production environment we do a fair number of database cross-loads from production.
    We currently have a shared storage environment now, so we only “need” to restore once.
    Would the non-shared storage mean we would need to restore to both nodes otherwise we may incur
    heavt ‘syncing’ between nodes with AG ?

    • Mike – databases that are in an Availability Group can’t be restored. You’ll need to pull them out of the Availability Group, do the restores, and add them back into the Availability Group.

      • Back the DB up on the source server.
        Restore it to the primary node DB which has to be removed from AG.
        Back up the primary node DB(which was just restored) along with transaction logs.
        Restore the second backup to any secondary node(s) DB(s) which have also been removed from AG, along with transaction logs in norecovery mode.
        Add the primary node DB and secondary node DB back to AG.
        In prod where there is no restores, no issue.
        In other environments OUCH!
        Thanks, Mike

  50. Hello.

    First of all sorry for my poor English, and thanks in advance for your help.

    I’m setting up an SQL Server 2012 AlwaysOn in my work, but the system that access to the BD has a lot of querys with “with(nolock)” that are not controled by StoredProcedures in the BD.

    This mean that inside the System’s code are a lot of Querys with(nolock) hint, How can affect that kind of querys in the System behaviour if the Query is executed by one of my secondary Async replica?.

    If anyone can help me? please.

    Regads, Jose

  51. Hi Brett,
    I’m looking for a way to measure the over head of transaction latency that is added when a database in an AG group is synchronous mode compared to asynchromous mode.

    • Dustin – sure, great question. If you use a load testing tool, you can gauge transaction throughput (like batch requests per second) and latency through your load testing tool. If you’re not using a load testing tool, you can’t really get reliable overhead numbers since no two real-life loads from time to time are exactly the same.

  52. HI Brett,

    If I want to upgrade 2005 mirroring with mesh set up like below,

    DB1 on Server 1(principal) —-> server 2(Mirror)
    DB2 on Server 3(Principal) —-> server 1(Mirror)

    how would I set that up in SQL 2012 AG?
    I believe you can only set up AG from primary server not on secondary server.

    Regards,

    Ki

  53. Hi Brent – Big fan. Attended your session in PASS11. Wonderful article as always.

    A question that has been bugging me for a while… Here’s the situation. Currently we have a read-only server for reporting that is subscribing to replication from two production servers (let’s say Prod1 and Prod2). Using AlwaysOn, we would have to set up one of the production server (Prod1) as primary and use a replica as read-only. What is the best strategy from here to keep our current setup? Is it possible to subscribe to replication to Prod2 from the read-only replica?

    • Hi, Eric. Glad you liked the session!

      Unfortunately, advice on a system like this is beyond something I can do in a blog comment. Like you’ve figured out by now, there’s no quick, sure fire solutions when multiple technologies are involved. We offer a 3-day SQL Server Critical Care engagement to get answers like this. If you’d like to schedule one, click Contact at the top of the blog, or email us at help@brentozar.com.

      • Thanks for the quick reply. It would really scratch a big itch if you can (please?) provide a yes or no answer to my last question though :)

        • Well, the best place to start for questions like that is Books Online. Here’s the page on using replication in combination with AlwaysOn Availability Groups:

          http://msdn.microsoft.com/en-us/library/hh710046.aspx

          If you’ve got questions that our blog posts and Microsoft’s Books Online can’t answer, though, feel free to contact us to set up a consulting engagement where we can give you personal advice.

        • I’ve tested this extensively.
          Both your production servers would become publishers of the publication.
          In failover the distributer contains a table called MSredirected_publishers, where you are able to specify the listener name and port no. This means that your subscriber will seamlessly be able to redirect its queries to the ‘new’ primary.
          Pls note, you have to use a remote distributer to make this work, since distribution databases are not supported by ‘Always On’, and in any case a unique distribution database to each instance would be missing the point.

  54. Thanks a lot Brent for the document !!

    I am working as a SAP Administrator, We are planning to use SQL 2012 as our database and Windows 2012 as an operating system for SAP. And we want to use the Always-On availability groups (with Async mode local replica) setup on 2-node Windows failover cluster. I have a question here, request you to help me with the answer.

    Q: How do we need to install the server for utilizing Always-on availability group technology? (I mean do we need to install the SQL server manually on two seperate nodes with same drive letters and with same number of ndf & ldf files on both the nodes)

    Thank you in advance.

    Chandan

  55. Hi Brent,

    I need a HA in file share witness.

    Do you know if i can do a secondary “file share witness” like a “Exchange Dag”?

    • Alexandre – sure, how about a clustered file share? Use the same clustering that you probably already use for a highly available company file share.

  56. Thanks Brent for the write up! I find it so comprehensive and to the point. I do share your concern that this feature is only available in enterprise edition.

  57. Nice article,
    very concise and straight to the point.

    I have a similar infrastructure to the one you described in the article, only that at my DR site, the servers will not be on the same domain as the servers on production. I will communicate with those servers via a management LAN and then the site becomes active on fail-over with exactly the same IP as on Live.
    In essence SQL2012PROD1 & SQL2012PROD2 are on the same domain, while SQL2012DR1 is at DR and passive (not on the same domain).

    How do you advise that I push data to DR and still be able to smile when disaster strikes?

    Secondly, I have a sprawl of dbservers, I instance per server. Does this mean I have to create 1 DAG per server?

    • TOsin – this is a little beyond the architecture planning we can do in a blog comment. If you’d like personalized consulting help, feel free to click Contact at the top of this page and we can talk through what a consulting engagement looks like.

  58. The databases we have are all on VMs. They donot have a Windows Cluster and do not plan on having a Windows cluster in the near future. Is there anyway I can implement Always ON in this situation ?

  59. Brilliant, Brent. Thanks for posting this. I’m a VMware and NetApp dude – definitely not a SQL dude – and this walk-through helped me a get a 3-node AAG up and running in no time. I’m using it to mimic a client environment where we’re implementing a NetApp SnapManager for SQL backup scheme. I was running into trouble not having worked with AAGs before and this helped tremendously. Thanks again!

    All the best,

    Mike Brown

    http://VirtuallyMikeBrown.com
    https://twitter.com/VirtuallyMikeB
    http://LinkedIn.com/in/michaelbbrown

  60. Awesome.. What a tutorial.. clearly explained each and every point for configuring AlwaysOn feature… Thanks Brent.. :)

  61. Hi Brent,

    I have a situation similar to this; I have two offices in two locations (15km apart). But both the offices are connected to a fiber backbone (dedicated). What I need to have same application on both sides which uses an sql server. Basically I want to have SQL Server installed on a server in Location A and B. If I do any data transaction (add, delete, edit) in one location, I need to be able see (replicate) over to the other location also. This should happen the both ways. Is this the scenario you have here between SQL2012PROD1 and SQL2012PROD2 (Primary and Secondary) or else Only Primary is usable for the transaction and the secondary is strictly reserved for the fail-over operations?. In other words In your scenario above; if I do any transaction on a DB of any server(SQL2012PROD1 or SQL2012PROD2) will that replicate on the other server (SQL2012PROD1 or SQL2012PROD2) both ways? Please clarify. And really appreciate the lesson

  62. I have two sites one mile apart connected with fiber over a 4gb backbone. I’m currently running SQL 2005 with synchronous mirroring with one node at site A and the other and the witness at site B. This has worked great for years. Now its time to upgrade. In the new environment we’ll be running vmWare over NetApp at each site with SQL 2012 as our database. Our dba’s are proposing replacing SQL mirroring with availability groups using three synchronous replica’s: 2 at site B and one at site A.
    I am concerned about transaction performance and about failover performance and that the NetApp at site B will be a single point of failure. Also, do we really need three synchronous replica’s? I could see adding an asynch replica down the road. Your thoughts would be appreciated.

    • Hi, BR. I don’t actually know if you need 3 synchronous replicas or how performance will work out for you – that’s something that you’ll need to build out and test to find out. Of course, I’d give that same recommendation no matter what technology you use – you should build out a lab for it and test it first.

  63. Hi Brent,

    I was able to find a way around the cluster setup across datacenters, but I am running active-passove which runs beautifully, however, I need to know if you can create listeners on each network, one on prod and another at DR so apps can connect to the DAG from either location, regardless of where DAG primary is running from at any time.

    • Tosin – the term DAG is an Exchange term, and doesn’t really apply here to AlwaysOn Availability Groups. In AlwaysOn AGs, the listener answers from anywhere just like a web server takes requests from anywhere.

  64. Just wondering if you could of used SQL2012PROD2 as the reporting server, can you have a read only copy of the data in synchronous mode?

    • You can indeed have a read-only copy in synchronous mode, but be aware that when you put that server under load (like heavy reporting queries), it will slow down transactions on the primary. That’s not usually a good idea.

  65. Hi Brent,

    I have the Always on Setup on my infrastructure, 3 nodes per AG, 2 in synchronous, 1 in async.
    Everything looks fine, I try to connect to the listener from SSMS, it works like a charm. I have configured automatic failover also,

    However, I try to simulate a failure on the primary node by either by stopping the SQL Service or removing the network card. (I’m virtualized)

    I suddenly see that I cannot connect to the listener again.which means there is no failover.
    Could I be missing something here by way of config or otherwise please?

    • Tosin – unfortunately troubleshooting this is way beyond the scope of a blog comment. You’ll need to open up the documentation and start digging in. For faster help, contact Microsoft support.

  66. Hi Brent, great write up, have to admit a newbie at this cluster thing, we currently have two separate SQL clusters on different subnets each with their own SAN storage and copy of their database, I am looking to cluster the two clusters, to give one front online sql instance that replicate the same data to each other, so with two working individual clusters that fail over correctly i am unable to join the databases together in SMSS under HA. Where am i going wrong? or can point me in the right direction.
    thanks

    Alex

    • Alex – if the clusters already exist, and they’re not in the same Windows cluster, you can’t. Members of an AlwaysOn Availability Group have to belong to the same Windows cluster.

  67. Great article Brent! I have used this as a go to guide for all of my AlwaysOn configuration and setup. While a shared SAN is not required for SQL 2012 AlwaysOn, would you have a preference of one of the two possible configurations suggested here? Uptime requirement is 99.99%.

    Scenario 1: 3 SQL Servers, each with it’s own dedicated storage. Node 1 and Node 2 are at the main site and Node 3 is at DR location. Synchronous data between Node 1 and Node 2 and Asynchronous data transfer at Node 3.

    Scenario 2: 3 SQL Servers – Node 1 and Node 2 are at the main site and Node 3 is at DR location. Node 1 and Node 2 are on a shared SAN, a.k.a cluster and Node 3 has its own dedicated storage. Synchronous data between Node 1 and Node 2 and Asynchronous data transfer at Node 3.

    Scenario 1 gives you auto failover to Node 2 and also does not have the issue of SAN failure.
    Scenario 2 gives you auto failover to Node 2 but in the event of a SAN failure, there would be manual failover to Node 3 (some possible downtime).

    Thanks!

  68. Hello Brent,
    I am taking on a big Always on setup with always on setup between 2 three node cluster with two SQL instances each.

    For the setup of this do we need to do something different. I was reading somewhere that we will have to add the SQL node on the Primary. your reply will be really appreciated.

  69. Pingback: Configuring SQL Server 2012 Always On Lab- Part 1 | Scott Mattie's Blog

  70. Pingback: Configuring SQL Server 2012 Always On Lab- Part 1 - SQL Server - SQL Server - Toad World

  71. We’re using 2012 Availability Group Node Failover Primary>Secondary and password data encryption (not TDE at the database level) on Master Key Encryption By Service Master Key. Secondaries are created via restore from primary node backups. On node failover from primary to secondary we encounter a an error where the master key cannot be opened automatically on the fly, requiring manual intervention to open the master key. So far Google is not our friend. Does anyone know why ?

  72. Hi Brent,

    Thanks you so much for the detailed explanation of AlwasOn. It really helped understand this new technology. I am trying to implement similar configuration, but running into some problems. I am planning to create FCI + HA solution in the same datacenter. I’d really appreciate if you can provide some guidance.

    Here is the current configuration (All nodes are in same datacenter):

    WINCluster1
    SQL1 – Part of FCI instance SQLFCI1 (S drive for data, L drive for log, Q drive for quorum)
    SQL2 – Part of FCI instance SQLFCI1 (S drive for data, L drive for log, Q drive for quorum)

    The above is a standard SQL FCI cluster and it works great.

    SQL3 – Standalone box for HA (Separate S drive for data, separate L drive for log, No Q drive)

    I added SQL3 to WINCluster1 and setup Node Majority for quorum (Removed Q drive disk witness from quorum) as I have odd number of nodes.

    I removed SQL3 from possible owner for SQLFCI1.

    Then, I setup AlwaysOn Availability Group AG1, with SQLFCI1 as Primary replica and SQL3 as secondary replica. The setup is Asynchronous with Manual failover and readable secondary.

    When I use SSMS to failover between SQLFCI1 and SQL3 (back and forth), it works fine.
    When I use FCI under WinCluster1 to failover between SQL1 and SQL2 for SQLFCI1, it works fine.

    But, when I shutdown SQL1, the SQLFCI1 fails over to SQL2, which is perfect, but SQL3 goes offline. I can remote into SQL3, click on drives, etc, but the cluster resources goes offline.

    Same thing happens when my network engineer unplugs NIC cards, simulating NIC fail scenario, from SQL1. And both cases happens when SQL1 is online but SQL2 is down or NIC unplugged.

    Here is the error message under event viewer:
    The cluster service is shutting down because quorum was lost. This could be due to the loss of network connectivity between some or all nodes in the cluster, or a failover of the witness disk.

    And another strange thing is, when I shutdown both SQL1 and SQL2, under WinCluster1, the current owner for SQLFCI1 says SQL3, even though there is nothing shared between SQL3 and SQL1, SQL2. SQL3 is just standalone box. It brings down the Windows cluster and nothing works. Once SQL1 and SQL2 come up, it starts working.

    Hope this explains the current configuration and issue. Thank you so much sharing your knowledge. It helps a lot to DBAs like me who are just starting in the DBA world.

    Have a wonderful day.

    • Hi Raj. Thanks for typing this out, but this is way beyond the scope of the free troubleshooting I can do in a blog comment. You’re welcome to bring us in for consulting help, but for free advice, you’re going to be best off posting the specifics over at http://dba.stackexchange.com.

      • Hi Brent,

        Thanks for the prompt reply. I will post the question there. I will also talk to our management team regarding the consulting help.

        Thanks and have a wonderful day.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

css.php