Before we get started, we need to talk about something. I’m going to be using the C word – that’s right, Clustering. Deep, calming breaths. Relax. We can do this, you and me. This isn’t the old-school clustering with shared disks, heartbeat networks, and hardware compatibility lists. This is just another feature you’re going to add to your servers, click a few buttons, and move right on to more important things.
What We’re Going To Build
My imaginary business in Chicago is running a mission-critical accounting application that need to be online at all times. We need to be able to tolerate the following problems:
- If the production server dies, I need to fail over automatically to a second server in the same Chicago datacenter. (This also means the production and secondary servers can’t be on the same SAN, because SANs do fail.)
- If the datacenter goes down, I need to fail over to a DR datacenter offsite in Portland. The business is okay with losing some data in the event of a failure that large, and they’re okay with this failover being a manual process.
- The BI team wants to run reports on the live database with as little delay as possible. They know they should be building a separate warehouse, but they’re too
lazybusy, and they want this to work right now.
- Backups need to run faster, and ideally, I don’t want to run backups on my main production server.
To solve this, I’m going to implement four SQL Servers with Availability Groups. The four servers will be:
- SQL2012PROD1 – the primary production SQL Server in Chicago. Data will be stored on a SAN.
- SQL2012PROD2 – the secondary production SQL Server in Chicago. It’s located in the next rack over, and it’s got all local storage so that we can tolerate a SAN failure without changing datacenters. We’ll be doing synchronous commits between SQL2012PROD1 and SQL2012PROD2, which means I do still need fast storage on #2 – a good fit for SSDs.
- SQL2012DR1 – an offsite SQL Server in Portland. Because the business is okay with some data loss in the event of a complete Chicago datacenter failure, we’ll be doing asynchronous commits to this server.
- SQL2012RPT1 – a read-only SQL Server in Chicago. It can be on any storage (SAN or local). We’ll be doing asynchronous commits here, which means we might be running 5-60 seconds behind depending on load, but that’s still more than current enough for the BI team’s reports. After hours, when users aren’t running reports, we’ll run full backups from this server.
Now let’s get started building these four servers.
Prerequisites: Before We Install SQL Server 2012
I’m using four Windows Server 2008 R2 Enterprise servers in this tutorial, but you can use as few as two. I strongly recommend using the same drive letters across all servers: for example, if you want to use E for apps, F for data, and G for logs, then all four servers need to use those same drive letters. If the production server has an H drive that no other server has, and we add a data file on the H drive, then that alter-database statement will fail across the rest of our servers – instantly marking their databases as suspect. We can fix that by restoring files to the other servers manually, but the whole point of our solution is to make it easy and automatic, so keep those drive letters identical from the start.
Windows Enterprise is required due to the clustering features, but they don’t have to be beefy machines – in my lab, I’m using VMs with 2GB of RAM.
Each server needs the .NET Framework 3.5.1 feature and the Failover Clustering feature already installed. To install these features, go into Server Manager, Features, and click Add Feature. In the Add Features wizard, check the boxes for .NET Framework 3.5.1 and Failover Clustering:
After those two features are installed on all of our servers, we need to configure the cluster. Go into Control Panel, Administrative Tools, Failover Cluster Manager. Click Validate a Configuration, and enter the names for your SQL Servers.
Note that all four of my servers are in the same domain. You can’t use servers in different AD forests here.
Click Next through the wizard, and you’ll be asked what tests you want to run. I run all tests every time, but technically, you could skip the storage tests if you’re only using Availability Groups. These servers will not be sharing storage, so I don’t need to test whether any of them can see each others’ drives. If you run all tests, you’ll see warnings in the Storage section:
Whether you pass or fail, click the View Report button to get more insight on your servers. A detailed HTML report pops up to give you the skinny:
The Network section of validation is giving me warnings because I’ve got a few single points of failure. Each of my servers only has one network card and no network teaming. If a network card went bad, if someone tripped over a patch cable, or if somebody assigned an IP address that duplicated my SQL Server’s, then whammo, one of my nodes would go down. In a real production environment, you’d want to address these issues.
If you got errors that you want to change, go fix them and rerun validation. Otherwise, if you don’t see any showstopper errors, close the validation report and click the “Create the cluster now using the validated nodes” link.
Creating The Cluster
The Create Cluster Wizard will ask for the new cluster’s name, IP address, and network. The cluster is like a virtual server that doesn’t really exist, but just gets passed around from server to server depending on who’s in charge. You remember that time at the restaurant when you found a hair in your bacon, and the waiter said it looked an awful lot like yours? And then you stood up and asked for the manager? You didn’t know what the manager’s exact name was, because the manager changes all the time from shift to shift over at Harry’s Bacon Shack, so you just asked for the manager. Some guy came over with a nametag that said “Manager On Duty.” Well, that’s exactly what we’re creating here – a manager-on-duty nametag.
The cluster’s name is not the name of the SQL Server. That’s a really important distinction to remember. A restaurant might have a lot of waiters, but it only has one manager. Your cluster may end up with several different instances of SQL Server down the road, but there’s only one manager resource for the cluster. The manager is the one you call when there’s a problem with your waiter. If SQL Server is serving us flies in our database, we’re going to call the manager. The rest of the time, we can pretty much forget that the manager even exists.
In my example, I’m naming my cluster SQL2012Cluster1. I’m only naming it that because all of my machines are named SQL2012*. Your naming scheme may vary, but I just wanna drive that point home that the cluster name doesn’t really have anything to do with SQL Server.
After the wizard finishes, Failover Cluster Manager will show our newly created cluster with four nodes, a network, and no services or applications.
Yes, installing the Failover Clustering feature really is as easy as checking a box and clicking next-next-next. Now, don’t get me wrong, we’re going to have a few more things to do down the road, but take a moment here to appreciate just how far Windows has come in the last ten years.
Next: The SQL Server 2012 Setup
Download SQL Server 2012 Evaluation Edition and start the setup on one of the Windows machines. The very first screen in setup is just a little tricky because it offers one link for “New SQL Server stand-alone installation” and another for “New SQL Server failover cluster installation.” The one you want is stand-alone. You can indeed do clustering with 2012, but for our solution, we’re using Availability Groups by themselves.
In the Setup Role step, choose SQL Server Feature Installation.
In the Feature Selection step, the only feature you absolutely need is Database Engine, but here’s what I choose for lab boxes:
In real life, I wouldn’t include Distributed Replay on my production servers, but in a 2012 lab environment, it’s a fun feature to start playing with. Distributed Replay holds the promise of making load testing easier.
In the Server Configuration step, specify a domain account for SQL Server to use. In my SQL Server setup checklist, I explain why you should use a different domain account per instance. (I’m not so anal-retentive as to use a different account per service per instance, though.) Using domain accounts will make it easier for us to let each SQL Server instance connect to the other instances. Note that if you’re using Kerberos, or plan to later, you should use the same service account for all instances in the same AvailabilityGroup.
The screenshot shows that I use the server name as part of the service account name. My personal naming standard is:
- ‘svc’ – all my service accounts start with this so I can quickly pick them out of AD
- ‘MachineName’ – because I use a different account per Windows machine for each service
- ‘ServiceName’ – because I might be running multiple services on the same Windows box
So in the example below, I’m using LAB\svcSQL2012PROD1_SQL for the database service accounts on SQL2012PROD1.
In the Database Engine Configuration step, make sure to click Add Current User as an administrator for the database engine. Also in this step, click on the Data Directories tab and configure the default file paths. Use the same paths across all of the SQL Server instances we’re using for AlwaysOn.
Next your way through the installation, and a few minutes later, you’ll have yourself one working SQL Server. Repeat that process for all of your 2012 machines.
Configuring SQL Server 2012
By default, SQL Server does not accept remote connections. This is because end users are the root of all our problems. While I admire Microsoft’s reluctance to let their product be used, we’re going to have to change that default setting. Click Start, Programs, Microsoft SQL Server 2012, Configuration Tools, SQL Server Configuration Manager. Drill into SQL Server Network Configuration, click on Protocols for MSSQLSERVER, and double-click on the TCP/IP entry. Change the “Enabled” option to Yes, and click OK.
Back in Configuration Manager, enable AlwaysOn by clicking SQL Server Services in the left pane, then double-click SQL Server. Click on the AlwaysOn High Availability tab. (Wait – if it’s AlwaysOn, shouldn’t it be called Complete Availability? Hmmm.)
Click OK. Our changes won’t take effect until we restart the SQL Server instance, so now’s a good time to do that. Repeat this process for all of your 2012 instances.
Configuring Security for Cross-Instance Connectivity
Like any close-knit family, our instances are going to have to talk to each other. Each of the server’s service accounts is going to need to connect to the other instances. In a real-life environment, we would be more conservative with our security, but for the purposes of this lab, we’re going to take a couple of shortcuts. You should never go live with this configuration. This is purely for training purposes. (K. Brian Kelley is going to kill me when he reads this, but I’m aiming for quick setup.)
Open SQL Server Management Studio and connect to one of the instances. Go into Security, right-click on Logins, and create a new login. Click Search, and put in one of the service account names that you used for each SQL Server. On the left side, click Server Roles, check the sysadmin box, and click OK.
Repeat this process for every service account, and repeat it on every instance. In all, you’re going to be setting up 3 accounts per server on all 4 servers.
A slightly easier way (and my personal favorite): put the service accounts in an Active Directory group instead, and then create a login for the group. If you take that approach, be aware that groups don’t show up by default when you’re searching in the new-login creation process. After you click Search, click Object Types, and check Groups.
Last, but not least, we need a file share that all four service accounts can access. We’re going to initialize our replicas with backups and restores, and to do that, all four services need to be able to read & write to the same folder. This file share can be on any file server or on one of the SQL Servers, but make sure to grant rights to read/write for all of the service accounts. (Again, made easier if we’re using AD groups.)
We Interrupt This Setup for a Talk About Your Job
There’s been a few times in this post where I’ve mentioned tasks that might be outside of the typical SQL Server DBA’s responsibility zone: creating file shares, adding AD accounts and groups, and picking names and IP addresses. If you’ve never set up a cluster before, this may seem intimidating, but fear not. If these responsibilities are outside of your domain, they’re very trivial matters for your Windows team.
If you were doing database mirroring before, then you’ve probably already dealt with file share permissions and Active Directory. AlwaysOn Availability Groups don’t change anything there. However, they do add the requirement for a virtual network name and IP address. That’s not a big requirement.
Finally! Let’s Play with Availability Groups
Back to SQL Server Management Studio. On the main production server (in my case, SQL2012PROD1), set yourself up a few lab databases – either create a few new databases or restore databases from your other servers. If you create new databases, take a full backup of each database before we go on. It’s not that I treasure your work, but rather your newly created databases aren’t really in full recovery mode until the first full backup. (And yes, the Availability Groups setup wizard checks for this.)
In SSMS, go into the main production server, click Management, right-click on Availability Groups, and click New Availability Group Wizard. You’ll be presented with a list of databases that you can include in the Availability Group. Keep in mind that all of the databases in the Availability Group will be failed over from your primary server to your secondary server together, automatically. In my case, my accounting application consists of the databases Accounting, AccountingPayroll, and AccountingReports. I have some applications that make cross-database SELECT queries for reporting purposes, so I want to make sure they all fail over together.
Click Next, and you’ll be able to specify which servers are part of the group and what their roles are. This below screen has so much of what I’ve always wanted in a database product. I’ll need just a moment by myself. Excuse me.
This is, as Martha Stewart would say, a Good Thing. Let’s take it one column at a time:
Server Instance – obviously, the server name we’re dealing with.
Initial Role – SQL2012PROD1 will be my Primary instance because it’s got the current live copy of the data.
Replica Mode – PROD1 and PROD2 will do Automatic Failover. Only two instances can be involved in Automatic Failover at a time. For my business configuration, DR1 and RPT1 are in High Performance mode, which is akin to asynchronous database mirroring. High Safety is also an option, meaning synchronous mirroring.
Connection Mode in Secondary Role – when this instance isn’t the primary instance, will it allow connections? If so, then we have to be aware that its performance will slow down. I’m not keen on allowing any connections to my server that’s acting as an automatic failover partner, because that server needs to commit writes as fast as possible. In my setup, I’ve got PROD1 and PROD2 set to disallow connections for that reason. My DR1 and RPT1 instances, however, can allow connections, thereby letting my read-only users connect there to run reports or backups.
That’s amazing. This is probably the coolest screen of any wizard I’ve ever seen. This one screen lets me protect and scale out multiple databases easily. Granted, it took us 2,800 words of explanation to get to this point, but it’s way easier than any other scale-out method I’ve seen. It’s all in the GUI, for crying out loud.
Click Next, and the next screen will set up a listener. You can just take the defaults here.
On the Select Initial Data Synchronization screen, put in the UNC path for the file share that you set up for all of the SQL Servers to access. The production server is going to take a backup to this path, and the replicas will restore that backup. You can also skip this step entirely if you’d rather initialize the replicas yourself, and in many production environments, you’ll need to do that. For example, environments with SAN snapshot capabilities or small WAN pipes will need customization.
Next your way through the wizard and validation, and next thing you know, you’ve got an Availability Group. Right-click on your new Availability Group and click Show Dashboard. Presto:
What to Play With Next
Here’s some of the cool things you can experiment with:
- Take your full and transaction log backups on a replica. (You can’t do differential backups on the replicas – long story.)
- Query the replicas
- Run DBCC on a replica
- Query the new DMVs for AlwaysOn features
- Remove a replica, and add a different server as a replica
- Automatic page repair – 2012 automatically detects corrupt pages on the replicas and repairs them by fetching a good copy from any other instance
- Replication – Availability Groups can include a replication publisher, but not the distributor
I’m really, really excited about AlwaysOn Availability Groups. Yes, it requires some compromises (like Windows Enterprise and an additional name/IP) but it’s worth it.
Learn More in Our AlwaysOn Resources
Want help? Talk to Brent for free.
The problem probably isn’t fragmentation – you keep defragmenting and the problems keep coming back.
Our 3-day SQL Critical Care® is a quick, easy process that gets to the root cause of your database health and performance pains.
Learn more, see sample deliverables, and book a free 30-minute call with Brent.
Thx for taking the time to write such a complete article. Very interesting ! (ps. I had the same feeling when looking at the wizard screen, impressive 🙂 )
Is is a great read. thank you
I wish our business needs required something on this level so that i could get my hands dirty.
Great article – I loved the comment about Microsoft’s reluctance to let their product be used :-).
It will be interesting to see the rules on licensing replicas.
Dang Brent, you are going to make me download my first CTP ever. Nothing new though since you are always putting out great stuff that makes me want to dig deeper into SQL Management.
Jason – wow, thanks, man! Have fun with the CTP.
Everybody else – thanks for the compliments!
Thanks for all the great insight. This will be my first experience with a new release of SQL Server and you have managed to make me overly giddy for this release, of which is usually reserved for the release of a new album by random favorite band!
Great article Brent. Was AlwaysOn Availability Groups available in CTP1, or did you pull and all-nighter to write this? 🙂
James – thanks! No, this wasn’t available in CTP1, and no, I didn’t pull an all-nighter, heh. Microsoft was gracious to give me an advance meeting with the product team under NDA, but I wasn’t allowed to talk about it until CTP3 came out through official channels.
Very nice article, Win 2008 R2 core support is crutial, I look forward to try that on hyperv, thanks…
Just to avoid confusion, availibility groups was available in CTP1, though limited to 1 replica, dashboard and multiple replicas works now in CTP3…
Denali has great feature:)
OneBox Denali Hyperv setup…http://sqlfeatures.com/?p=10
Prakash – Availability Groups were really only in CTP1 by name only. You couldn’t do any of the cool things you can do in CTP3.
Brent, one point of clarification, I think this is all dependent on static IP’s, correct?
Dale – I wouldn’t use DHCP for a production server, so I wouldn’t know. To me, that’s kinda like saying its dependent on electric power and functioning network switches of 1Gb or higher. 😉
By default, SQL Server does not accept remote connections. This is because end users are the root of all our problems.
Thanks for the excellent write up Brent. I set this all up in my virtual environment and aside from a few issues with Windows firewall it worked like a charm.
One question I did have was how to query the replicas that allow read-intent connections using SSMS. I was able to do it using sqlcmd and the -K parameter but could not get SSMS to work. I tried specifying additional connection parameters when opening a query (;ApplicationIntent=ReadOnly and some variations of this) but that did not work. Any suggestions?
Darcy – not off the top of my head, unfortunately. I’m onsite at clients this week so I’ll have limited ability to help with troubleshooting.
Thank you so much for such a detailed article. I’m following along with it right now. I had one question for you. You state that we shouldn’t give sysadmin rights to all the service accounts when configuring security for cross-instance connectivity.
What is the correct way to do this in production?
Eugene – great question, but it’s beyond what I can hit here in a blog comment. You want to follow the principle of least privilege: give each account the bare minimum it needs to do its job, and nothing more. In that case, I’d want to give each service account enough permissions to connect to the other endpoints.
Possibly then, the good old grant connect on endpoint : Always on to [joe\bloggs]
Brent , could you please clarify which role would be good to have enough permissions to connect to the other endpoints ?
Ellen – you know, I have absolutely no idea. I just use the service accounts on each SQL Server.
Great article!!!Could you please let me know the transafer of data is based on transaction log or any other technique between the groups?
SQLZealot – thanks, glad you liked it! Yep, the transfer is based on the transaction log.
Thank you this is a great article. BOL is a mine of information but it lacks the human touch that this article brings to the table, which makes the subject much easier to understand.
Do you know which versions of Denali will include the availability groups feature?
Thanks Rob! Unfortunately no, Microsoft hasn’t said yet which editions will include Availability Groups.
Excellent post! Just a thought – wouldn’t it be better to video capture the above since you are running them on VMs and post a smaller blog with a link on youtube.com?
RS – yep, sometimes I do video posts, and sometimes I do pictures. I like to mix things up. 😀
Could you post an article on how one should go by identifying performance bottlenecks in a particular order? This performance issue identification could be with the engine, SQL queries, T-SQL code, Index, and other resources.
Thanks in advance!
RS – actually, that’s way beyond what we could explain in a post, so we wrote an entire book about it:
Professional SQL Server 2008 Internals and Troubleshooting shows you how to use free tools to identify performance bottlenecks and how to solve ’em.
Thank you for the detailed write up, very useful. How do we handle the SQL logins when we failover, as they become orphaned users. I tried contained database users, but that doesn’t work well, my JDBC connections throwing up errors with contained database users with local password.
I was able to figure out the issue. Earlier, while was I running “sp_migrate_user_to_contained” procedure to migrate the SQL logins to contained users I chose not to disable the logins, but now once I disable the logins, it works now, the application logins now using the database user and its password.
As always very helpful Brent. I have one (two really) questions? 1. Can system databases be part of an availability group? 2. If not, then does one have to do SSIS wizardry to keep the system databases sycned with the replicas so that when the primary server has a meltdown, the other server keeps on chugging along? In desperate need of some illumination.
This is a great question. If I can chime in– availability groups are for *user* databases. You may be interested in two things:
1) The new partially contained databases feature helps resolve some of the issues of syncing system databases: http://msdn.microsoft.com/en-us/library/ff929071(v=SQL.110).aspx. Also, “Management of Logins and Jobs for the Databases of an Availability Group (SQL Server)” is at http://msdn.microsoft.com/en-us/library/hh270282(v=sql.110).aspx
2) Using availability groups in combination with failover clustering proper. These two can mix, so you can still have a clustered instance fail over (with system and user databases) to an alternate node, just like a normal cluster. More details are here: http://msdn.microsoft.com/en-us/library/ff929171(v=sql.110).aspx
So with regards to taking backups on the secondary replicas, its a copy_only backup. This traditionally we’d only use for adhoc purposes so that the sequence is not interrupted.
Aside from not being able to do differential backups -since the differential map is not updated – would you see any other drawbacks to this approach?
Magnificent article on Denali AlwaysOn. Just a quick question on setting up a cluster since we cannot use this feature if our node is not part of the cluster. Do we need a shared storage for the witness disk for the cluster to work? How about the heartbeat do we need this also?
Rafael – nope! There’s new quorum options available.
Thanks for the info Brent. Looking forward to more article on Denali.
Can I configure always on AG group with San replication on SQL 2014.Can I
if yes as alwayson does not helps in data file corruption.
How many servers I need to involve.
Mukki – for questions unrelated to the blog post, your best bet is a Q&A site like http://dba.stackexchange.com.
Great post. Thanks
Is it possible to Load Balance between 2 asynchronous 2012 servers in the same data center for high performance??
Thank you very much!!
Yep, that’s feasible to improve read (but not write) performance with Availability Groups.
so If I have 3 SQL servers 1 primary Read\Write and 2 read only all in the same group using a Virtual Network name will sql server auto detect read and writes and load balance the reads to the 2 read only servers??
Thank you very much for your help!!!
Kender – well, it’s not quite as easy as automatically detecting reads and writes. You have to declare your intent in your connection string if you want SQL to automatically put reads on some servers and writes on another. That’s a little beyond the scope of this post, though – I’ll write about that at some point in the future. In the meantime, your best bet is to pick up SQL 2012 and start reading the documentation on Availability Groups.
Im asking all these because we dont have access to modified the client app so all logic failover and LB must be done at the server level!! 🙁
Hey Brent thank you very much for your help.
I did my homework by reading a lot about it before posting here and was very confuse but now I know the answer.
Thank you much again you were very very help full!!!
Quick question for you on this Brent. Say you had Transactional Replication from Prod to Reporting just for that purpose – reporting. If you weren’t concerned about the latency of this solution, would this be a viable alternative to replication? What could be a potential drawback of this? Would you have the ability to create anything on the secondary DB or is it a true read-only copy?
Andrew – That’s not a quick question, that’s three questions, hahaha. But yes, it’s an alternative, and you can’t create anything. It’s a read-only copy of the entire database. Transactional replication lets you get a subset of the database.
OK, so I snuck a few in for good measure. Thanks for the info Brent.
If you have time, I have some questions. If not, I appreciate all the info you have shared already. Totally on point.
Since you have servers in different locations, they must be on different subnets. How do you choose what IP to pick for the access point? My concern is what happens if Chicago goes down, and the access point is there, will Portland be accessible?
Can the primary be a Cluster?
Sure, glad we could help.
You can have cluster members in different IP address subnets now. You don’t have to pick an IP – you can have one IP in each subnet. You use an “OR” dependency for IPs, so as long as a node brings up either of the IPs for the cluster’s virtual name, you can go live.
Yes, the primary one can be a shared disk failover cluster, too. In that case, automatic failover is done inside that shared disk cluster, not out to other nodes. If you want to fail over to other nodes, that’s a manual process.
Is there a DMV that exposes whether a node in an Availability Group is the primary? I ask for the purpose of adding a step to my jobs that determines whether to proceed that allows for all jobs to continue in event of a failover.
Great Article Brent,
So is really similar to Database Availability Groups in Exchange 2010. Virtualizing at the application layer as i run both my Exchange Servers in a DAG, on separate data stores and i love it. Like you said, SAN’s do/can fail! As you portray setup looks very familiar, i was just wondering if that was the concept that tried to bring to SQL 2012.
What’s the difference between this and SQL Clustering?
Lastly, can i run a beefy dedicated server for my active SQL Always On Server and then run a lower end Virtual Server as my passive box. I’d dedicate a NIC for replication along with another for SQL Traffic. I just want more or less a way to failover to a less beefy server to handle some of the SQL I/O if something did happen to the primary, does that make sense?
Mitch – thanks. The difference between this and failover clustering is that failover clustering requires shared storage. It may be connected via fiber, iSCSI, or even just a plain old NAS.
Yes, you can run asymmetrically sized hardware. I’m in the midst of deploying a setup just like that for a client – they’ve got a beefy failover cluster for production, and then much smaller (although not virtual) machines in DR.
thanks for the quick response!
Do you think that people will adopt this technology more times than not and go away from the SQL Clustering? One advantage with SQL Clustering is assuming you can do some sort of load balancing, would that be correct? Otherwise this Always On technology would be the way to go, agree?
Well, you’ve got a lot of interesting questions in there and it’s beyond what I can address in the comments. I’d recommend picking up a book on SQL Clustering like Allan Hirt’s excellent one, and that’ll help explain the pros and cons of the different high availability methods. There’s not a 2012 version out yet, but the 2008 one will help answer the questions you’re mentioning here.
Great and detailed walk-through. One question for you though, which I have yet to get to a clear and dry cut answer. What version of SQL 2012 should we use for this? I have seen MS’s PDF licensing guide which states that DAG is supported in Standard, but AlwaysOn is an Enterprise feature. In another article they state certain features of AlwaysOn are available in Standard, but they leave it at that. What we are looking for is a 2 node DAG cluster, really acting as an active/passive(DR) scenario. We were thinking we can utilize Standard for this. Any thoughts?
Bryant – AlwaysOn is multiple features. AlwaysOn Failover Clustering (the traditional shared drives approach) is supported in Standard Edition with 2-node clusters. AlwaysOn Availability Groups is Enterprise Edition only, and for licensing, there’s no such thing as a passive node in Availability Groups.
Regarding the last question about licensing: I saw it mentioned somewhere that Microsoft may not require additional licensing fees if a secondary replica is not “actively” used, ie. is inaccessible to application users, but is still used for admin tasks, such as backups.
Marios – do you have a link for that? We can’t exactly go by “I saw it mentioned somewhere” when the auditors come knocking. 😉
In the “Microsoft SQL Server 2012 Licensing Guide” it is stated that:
“The secondary server used for failover support does not need to be separately licensed for SQL Server as
long as it is truly passive. If it is serving data, such as reports to clients running active SQL Server workloads,
or performing any “work” such as additional backups being made from secondary servers, then it must be
licensed for SQL Server.”
So my earlier statement was indeed not accurate. 🙂
If backups are being taken on the secondary replica, that still counts as being actively used and should be licensed.
(If I am reading the guide right).
On Friday I had no idea how to setup a Windows cluster let alone how to run SQL on top of it. Its now Sunday and I have a four node, two data center AlwaysOn Availability Group setup. Your instructions were clear, easy to follow and much appreciated.
My boss was going to pay a consultant 2K to setup a two node proof of concept cluster. Tomorrow I get to tell him to save his money.
So we had the Fusion IO vendors in to discuss how we should set up their storage offerings in our new servers. They said thast the Always On High Availability reduces the requirements of redundant disks at the server level and said that we could increase our capacity by configuring the storage in a RAID 0 on each server in the failover cluster, and count on the cluster for redundancy.
This is contrary to everything I feel! Personally, I think this would increase the chance of failure and make a failure more complicated (seamless disk failure in a RAID 1 or 10 vs. disk failure causeing a cluster failover).
So my question is – how seamless is a failover using the failover clustering? Is it good enough that we can rely on this for redundancy? Will this effectively eliminate the need to have disk parity?
I agree with you completely. Lets assume AlwaysOn works perfectly with no down time or data loss and your users don’t know anything happened. When the smoke clears you have a down cluster server. Depending on how you design your cluster you may be a single drive failure or WAN link blip away from a complete SQL shutdown.
Also, I have seen a lot of RAID sets go into a degraded state because of a disk going offline. Often the RAID can be restored by simply reseating the drive. I would hate to have to repartition and restore every time this happens.
Brent, to be sure, we “must” or “can” set this up using stand-alone installations?
Tony – you CAN set it up with standalone installations, or you can use failover clusters as well.
Great article, very easy to follow.
I do have a question.
We are thinking about setting up a main SQL cluster (2 node) and a secondary server. The databases in the main cluster would be in an availability group with the secondary server.
The idea is that the cluster takes care of failures on the SQL box, and the secondary server ensures data redundancy. A two phase failover, if you will.
My question is, is this a valid setup?
And also, would you recommend using synchronous mirroring in this scenario?
Michel – Yes, that’s a valid setup. If your business needs disaster recovery – like if the datacenter catches fire – then you’ll also want a replica offsite or log shipping.
Synchronous mirroring can add latency to transactions if the storage isn’t fast enough. If your business demands zero data loss, then you can equip the servers with solid state drives to reduce some of the latency. Otherwise, if the business doesn’t need zero loss, I’d avoid synchronous mirroring.
Can you please show us the static IP’s you’ve assigned to your machines(only if it’s okay with you) in your lab for setting up Always ON Availability Grps?
Sreekanth – I use IPs in the 192.168.37.x range, but I’m not sure if that’s what you’re asking. What’s the problem you’re trying to solve, and that might help you get a better answer?
I’ve just found that with my 2 node DAG setup if the primary server’s SQL service is stopped (just default instance) the secondary server does not come up – just get continuous ‘Resolving’ on secondary server. Any idea what is going on there? In the same setup, synchronous mirrored databases automatically failover without any problems!
James – well, this is a little beyond what I can troubleshoot in blog comments, but for starters, I’d look at how quorum is set up and look in the cluster logs.
James, what version are you on? In CTP3 failover can be hit and miss but I haven’t seen a blip in this regard in RTM
James, I assume you are using a sync replica with auto-failover?
This white paper discusses how to set up cluster quorum for various cluster setups:
I think in your case you need to add a file-share witness – outside the cluster – with each own quorum vote for a total of 3 votes. You then need to set up the quorum as node-and-file-share majority.
Hi Brent, ive been searching around but cant seem to get the answer to this query:
Using your example,
If my reading is correct, the Availability Group Listener service is hosted in SQL2012PROD1.
In the event where there is a failover from SQL2012PROD1 to SQL2012PROD2, the Availability Group Listener will be failover to SQL2012PROD2 as well yes? So that now all client connectivity will be managed by the Availability Group Listener on SQL2012PROD2.
Apologies if this seem very fundamental, as AlwaysOn is pretty new where im at.
Thanks much in advance. 🙂
Colin – it depends on whether a failover cluster is involved, or just an Availability Group. Your best bet is to set up a lab with this and play with it yourself – the concepts will become more clear then.
Colin thats correct – the Listener is a client access point held within the WSFC that you have to create to make ‘AlwaysOn’ possible.
So when SQL2012PROD1 fails to SQL2012PROD2, the Windows cluster – and by deduction all services and resources beneath it – move with it…
The listener is not necessary for AlwaysOn Availability Groups (AGs) to work. You can set up AGs and skip the listener creation. The listener is useful when you want to seamlessly get apps to connect to the new primary replica on failover and it adds some nice functionality having to do with ReadOnly routing (getting reporting apps to connect to a secondary replica with no explicit mention of the sec. replica server name).
One can address the issue of app failover through the use of a DNS alias for the server name of the primary replica and getting apps to connect to the DNS alias and not the actual server name. Then when failover occurs, simply switch the DNS alias to point to the new primary-replica server name. It is not as seamless as using a listener, to be sure, but it works as well.
I just don’t want people to mistakenly think that the listener is “needed” for AGs to work.
It is not.
Then when failover occurs, simply switch the DNS alias to point to the new primary-replica server name
Q: Do we need to manually point new primary to DNS alias ? or will there be any automatic process ?
Say we’re working off the assumption that we will all now be building several local instances per node instead of clustered instances, we obviously need to consider the 1433 port situation. Rather than having to listen on 1434, 1435 and having to adjust all the configs down the tree, is it feasible to present a pool of IPAddresses to a nic and force the local instances to each listen on port 1433, but with a different IP Address?
Hmm – lemme ask a question first. Why several instances per node? You can end up with a lot of CPU/storage/network contention, patching problems, security challenges, and there’s not much gain.
Sure thing; we are running on a consolidated SQL clustered environment at the moment which probably works out at 4 instances per node. We would want to implement the same line of thinking for SQL 2012 but without the SQL clustering.
I would have thought this line of thinking would be most people’s approach, there aren’t many implementations of one sql instance per physical machine these days unless its a TB db surely?
Hi, im wondering what kind of quorom configuration i shall setup in a 3 server config. Two nodes in Always ON active/active and the third as DR Asynk, thus not eligable to vote, making it two servers in failover manager, i really dont want to add a shared storage in to the mix just for witness disk.
The disk your only option short of adding in an extra node I’m afraid, a file share witness has proven to be an unreliable method.
I know this article is almost two years old now, but it was perhaps the most succinct article I found on setting up Always On. Just used it to setup a lab system between a pair of Server 2012 / SQL 2012 systems with no big issues.
Thank Mr Ozar!
Hi Brent, awesome article! As always i dare to say 🙂
I’m wondering am i missing it or there is no way to create email alerts when automated failover occurs, or if there is a problem. In general i cant seem to find anything related to alerts and AAG.
Do you have any advice on how to setup notificatiions for AAG events?
Thanks in advance,
Thanks, glad you like it! Personally, I like having a stored procedure run when SQL Server starts, and that stored proc uses Database Mail to send an alert to the operators with basic diagnostic info (virtual name, hardware name, hardware qty, etc). That way I know even if the service restarts on the same node.
Brent-This is such a nice and as simple as possible article as always! just gonna set up one now in lab … Will look forward a read on the other New features like performance wise. Thanks!
Any ideas on how to test out automatic page repair?
Consider a scenario of 2 replicas: a primary and an *async* secondary.
If the primary replica fails and we manually failover to the secondary, there will be some data loss.
Would it be possible to restore transaction-log backups on the dbs of the secondary replica (which will now be the new primary) in an effort to minimize the data loss?
Or are the dbs going to be in such a state that will prevent restoring log backups?
We are reluctant to implement replicas in sync-commit mode in our environment, as that will pose a risk to the performance of our applications. At the same time, we are looking for options to reduce the data loss we would suffer in the case of primary-replica failures.
Thank you for any input,
Marios – that’s not technically supported. It might be possible – I haven’t tried – but since it’s not supported, it’s not something that I’d see widely adopted in production environments. You’re free to try it as an experiment though.
Thank you for responding. I had a chance to look into this a bit more, and I think it can work, if, before failing over, we drop the avail group on the sec replica. At that moment the databases should be in a state that allows restoring log backups to them prior to recovering them. Of course, once the avail group is dropped, it will need to be recreated from scratch, and that can be costly in terms of time/effort for large databases. I haven’t tested this scenario yet though, so it is still a theory at this point.
Hey Brent, i saw your guide and it was very useful now that SQL 2012 is RTM i have a question for you, how do you manage the the SQL availability group failover as windows cluster failover, i cant manage to do that, is there a way to acomplish that??
Using Windows Server 2012 STD and SQL 2012 EE SP1
Javier – thanks, glad you liked it. I’m not sure what you mean in the question – can you rephrase it?
Great work. Have setup my first SQL Cluster without having to leave it to DBAs. Great for increasing my knowledge of a SharePoint backend.
– previously I would use a SQL Alias – can I do this still and this will be used in AG.
– when using multiple instances, how does this relate to the OS Cluster Failover Virtual IP and name?
I’m not sure if my applications should be using the main OS Virtual name or I should be creating one for each SQL Instance.
I also notice that the Windows Cluster Manager still thinks a host is vaild even though its shutting down and installing updates – it responds to ping.
Hi, Paul. You can still use an alias – just point it to the listener.
When using multiple failover cluster instances, each one gets its own virtual IP and name, so it’s unrelated to the cluster access point’s IP and name. In addition, each *listener* gets its own virtual IP and name too. You’ll want to connect to the Availability Group through the listener’s name. It’s possible that the listener will fail over somewhere else (due to a single corrupt database, for example) while the instance supporting it remains online.
When installing updates, you’ll want to be proactive – move any resources off that node first before installing updates.
Hope that helps!
Thanks, thought I should be using the listener to differentiate.
Wonder if you could help with my failover as I am pulling my hair out.
I have 2 SQL servers and used the synchronos option.
Used a File Share for Quorum as do not want to use another server.
If I lose the Primary – it fails over. Does not fail back unless downing secondary. That’s fine.
If I lose the Secondary – nothing happens as expected. Fine.
If I lose the fileshare – the whole show stops. It also prefers that HA2 takes control even though HA1 has just as much in ‘votes’.
I suspect this is related to some Quorum setting. Have seen references to removing votes but cannot figure out or what is best.
Hi Brent and Paul,
I am stuck at the same step! I configured file share witness in nfs but everyday midnight due to some filer process which I am trying to identify discussing with storage team active OS cluster node goes for a reboot!
It would be great if you were able to come up with the best configuration which will not restart the cluster node!
WER dump is as below,
EventType=Failover clustering resource deadlock
Sig.Value=File Share Witness
Sig.Value=File Share Witness
FriendlyEventName=Failover clustering resource deadlock
ConsentKey=Failover clustering Resource Host Monitor
AppName=Failover Cluster Resource Host Subsystem
ReportDescription=Failover clustering resource deadlock
Hi Sreejith. If your storage reboots every day, then it’s not a good volume to use as quorum.
Sorry I didn’t elaborate on the behavior, the nfs does go down every night. Its just timeout from cluster we are trying to identify if its a clean up process/script killing active sessions.
If witness goes down I would expect the cluster resource to go offline and don’t disturb the AG or nodes. Yes we will lose continuous fail over and for that I was planning to configure a custom monitoring through our monitoring solution Idera checking => select member_state from sys.dm_hadr_cluster_members where member_name = ‘File Share Witness’ which will alert us and we will manually fix it.
We are still in testing phase and trying the best configuration and we will find root cause of the time-out; but file share witness time-out bouncing the cluster makes me uncomfortable!
Sreejith – agreed, losing your file share and bouncing the cluster would make me pretty uncomfortable too. Good luck with that!
Do you use Windows 2008 R2 or below? I think your Quorum configuration is not right, how many quorum members (votes) you have, do you loose majority when the Fileshare witness goes down, if that is the case, the cluster will go down? If you use Windows 2012, this not an issue any more, with dynamic quorum configuration.
Below is the Configuration,
Windows Server 2012 R2 (VMs) – two node, file share witness (nfs)
Sql Server 2012 SP2
Looks like DFS is the issue! We removed the DFS name and gave the actual file share name and it didn’t record any error pertaining to file share witness time-out and no reboot yesterday night. When we had the errors till a day before it didn’t explicitly record any DFS related errors but just the file share error.
We have another AlwaysOn AG which is using server file share and no issues, it’s the ditto configuration in terms of hardware and software. When we pointed that to nfs file share same issue same window and getting rebooted! the cluster validation shows everything fine for File Share witness Quorum configuration. So doesn’t looks like cluster configuration issue.
Sreejith – correct, DFS isn’t supported as a file share quorum. Probably want to check out the documentation as you’re setting up clusters.
I am setting up 2-node, non-shared storage Availabilty Group(s) with local MSDTC’s.
In our non-production environment we do a fair number of database cross-loads from production.
We currently have a shared storage environment now, so we only “need” to restore once.
Would the non-shared storage mean we would need to restore to both nodes otherwise we may incur
heavt ‘syncing’ between nodes with AG ?
Mike – databases that are in an Availability Group can’t be restored. You’ll need to pull them out of the Availability Group, do the restores, and add them back into the Availability Group.
Back the DB up on the source server.
Restore it to the primary node DB which has to be removed from AG.
Back up the primary node DB(which was just restored) along with transaction logs.
Restore the second backup to any secondary node(s) DB(s) which have also been removed from AG, along with transaction logs in norecovery mode.
Add the primary node DB and secondary node DB back to AG.
In prod where there is no restores, no issue.
In other environments OUCH!
First of all sorry for my poor English, and thanks in advance for your help.
I’m setting up an SQL Server 2012 AlwaysOn in my work, but the system that access to the BD has a lot of querys with “with(nolock)” that are not controled by StoredProcedures in the BD.
This mean that inside the System’s code are a lot of Querys with(nolock) hint, How can affect that kind of querys in the System behaviour if the Query is executed by one of my secondary Async replica?.
If anyone can help me? please.
Jose – every situation really is different, and you’ll need to do performance testing in order to find out what works best for you.
I’m looking for a way to measure the over head of transaction latency that is added when a database in an AG group is synchronous mode compared to asynchromous mode.
Dustin – sure, great question. If you use a load testing tool, you can gauge transaction throughput (like batch requests per second) and latency through your load testing tool. If you’re not using a load testing tool, you can’t really get reliable overhead numbers since no two real-life loads from time to time are exactly the same.
If I want to upgrade 2005 mirroring with mesh set up like below,
DB1 on Server 1(principal) —-> server 2(Mirror)
DB2 on Server 3(Principal) —-> server 1(Mirror)
how would I set that up in SQL 2012 AG?
I believe you can only set up AG from primary server not on secondary server.
Hi, Ki. You can indeed set that up the same way you’d set it up in mirroring. A server can be a primary for one AG, but a secondary for a different AG.
Last time I tired I was not able to create AG from secondary server databases. Maybe I missed something.
OK, may want to go through it again and give it another shot. You have to set it up from the primary for that database.
Hi Brent – Big fan. Attended your session in PASS11. Wonderful article as always.
A question that has been bugging me for a while… Here’s the situation. Currently we have a read-only server for reporting that is subscribing to replication from two production servers (let’s say Prod1 and Prod2). Using AlwaysOn, we would have to set up one of the production server (Prod1) as primary and use a replica as read-only. What is the best strategy from here to keep our current setup? Is it possible to subscribe to replication to Prod2 from the read-only replica?
Hi, Eric. Glad you liked the session!
Unfortunately, advice on a system like this is beyond something I can do in a blog comment. Like you’ve figured out by now, there’s no quick, sure fire solutions when multiple technologies are involved. We offer a 3-day SQL Server Critical Care engagement to get answers like this. If you’d like to schedule one, click Contact at the top of the blog, or email us at email@example.com.
Thanks for the quick reply. It would really scratch a big itch if you can (please?) provide a yes or no answer to my last question though 🙂
Well, the best place to start for questions like that is Books Online. Here’s the page on using replication in combination with AlwaysOn Availability Groups:
If you’ve got questions that our blog posts and Microsoft’s Books Online can’t answer, though, feel free to contact us to set up a consulting engagement where we can give you personal advice.
Thank you Brent.
I’ve tested this extensively.
Both your production servers would become publishers of the publication.
In failover the distributer contains a table called MSredirected_publishers, where you are able to specify the listener name and port no. This means that your subscriber will seamlessly be able to redirect its queries to the ‘new’ primary.
Pls note, you have to use a remote distributer to make this work, since distribution databases are not supported by ‘Always On’, and in any case a unique distribution database to each instance would be missing the point.
Does anyone know what the license implications are here? We have a ‘passive’ secondary to keep license cost down.
I want to set everything up to get Transactional Replication to work if we have a failover I need both Primary and Secondary set up as Publishers and a downstream Distributor.
Does this affect the licensing of the Secondary?
John – as long as the replication is going from the primary to some other machine, and doesn’t involve the secondary, you’re fine.
Thanks a lot Brent for the document !!
I am working as a SAP Administrator, We are planning to use SQL 2012 as our database and Windows 2012 as an operating system for SAP. And we want to use the Always-On availability groups (with Async mode local replica) setup on 2-node Windows failover cluster. I have a question here, request you to help me with the answer.
Q: How do we need to install the server for utilizing Always-on availability group technology? (I mean do we need to install the SQL server manually on two seperate nodes with same drive letters and with same number of ndf & ldf files on both the nodes)
Thank you in advance.
Chandan – this is a little beyond what I can help you with quickly in a blog comment. Your best bet is to read the documentation on this one.
I need a HA in file share witness.
Do you know if i can do a secondary “file share witness” like a “Exchange Dag”?
Alexandre – sure, how about a clustered file share? Use the same clustering that you probably already use for a highly available company file share.
Thanks Brent for the write up! I find it so comprehensive and to the point. I do share your concern that this feature is only available in enterprise edition.
very concise and straight to the point.
I have a similar infrastructure to the one you described in the article, only that at my DR site, the servers will not be on the same domain as the servers on production. I will communicate with those servers via a management LAN and then the site becomes active on fail-over with exactly the same IP as on Live.
In essence SQL2012PROD1 & SQL2012PROD2 are on the same domain, while SQL2012DR1 is at DR and passive (not on the same domain).
How do you advise that I push data to DR and still be able to smile when disaster strikes?
Secondly, I have a sprawl of dbservers, I instance per server. Does this mean I have to create 1 DAG per server?
TOsin – this is a little beyond the architecture planning we can do in a blog comment. If you’d like personalized consulting help, feel free to click Contact at the top of this page and we can talk through what a consulting engagement looks like.
The databases we have are all on VMs. They donot have a Windows Cluster and do not plan on having a Windows cluster in the near future. Is there anyway I can implement Always ON in this situation ?
No. But just to be clear – we’re talking about Windows clusters, which you can actually do inside VMs. You don’t have to have shared drives.
Brilliant, Brent. Thanks for posting this. I’m a VMware and NetApp dude – definitely not a SQL dude – and this walk-through helped me a get a 3-node AAG up and running in no time. I’m using it to mimic a client environment where we’re implementing a NetApp SnapManager for SQL backup scheme. I was running into trouble not having worked with AAGs before and this helped tremendously. Thanks again!
All the best,
Awesome.. What a tutorial.. clearly explained each and every point for configuring AlwaysOn feature… Thanks Brent.. 🙂
Gnana – you’re welcome, glad you liked it!
I have a situation similar to this; I have two offices in two locations (15km apart). But both the offices are connected to a fiber backbone (dedicated). What I need to have same application on both sides which uses an sql server. Basically I want to have SQL Server installed on a server in Location A and B. If I do any data transaction (add, delete, edit) in one location, I need to be able see (replicate) over to the other location also. This should happen the both ways. Is this the scenario you have here between SQL2012PROD1 and SQL2012PROD2 (Primary and Secondary) or else Only Primary is usable for the transaction and the secondary is strictly reserved for the fail-over operations?. In other words In your scenario above; if I do any transaction on a DB of any server(SQL2012PROD1 or SQL2012PROD2) will that replicate on the other server (SQL2012PROD1 or SQL2012PROD2) both ways? Please clarify. And really appreciate the lesson
Buddhika – Hi. You’re looking for a different set of technologies, multi-master replication: https://www.brentozar.com/archive/2013/07/many-masters-one-truth/
I have two sites one mile apart connected with fiber over a 4gb backbone. I’m currently running SQL 2005 with synchronous mirroring with one node at site A and the other and the witness at site B. This has worked great for years. Now its time to upgrade. In the new environment we’ll be running vmWare over NetApp at each site with SQL 2012 as our database. Our dba’s are proposing replacing SQL mirroring with availability groups using three synchronous replica’s: 2 at site B and one at site A.
I am concerned about transaction performance and about failover performance and that the NetApp at site B will be a single point of failure. Also, do we really need three synchronous replica’s? I could see adding an asynch replica down the road. Your thoughts would be appreciated.
Hi, BR. I don’t actually know if you need 3 synchronous replicas or how performance will work out for you – that’s something that you’ll need to build out and test to find out. Of course, I’d give that same recommendation no matter what technology you use – you should build out a lab for it and test it first.
I was able to find a way around the cluster setup across datacenters, but I am running active-passove which runs beautifully, however, I need to know if you can create listeners on each network, one on prod and another at DR so apps can connect to the DAG from either location, regardless of where DAG primary is running from at any time.
Tosin – the term DAG is an Exchange term, and doesn’t really apply here to AlwaysOn Availability Groups. In AlwaysOn AGs, the listener answers from anywhere just like a web server takes requests from anywhere.
Just wondering if you could of used SQL2012PROD2 as the reporting server, can you have a read only copy of the data in synchronous mode?
You can indeed have a read-only copy in synchronous mode, but be aware that when you put that server under load (like heavy reporting queries), it will slow down transactions on the primary. That’s not usually a good idea.
I have the Always on Setup on my infrastructure, 3 nodes per AG, 2 in synchronous, 1 in async.
Everything looks fine, I try to connect to the listener from SSMS, it works like a charm. I have configured automatic failover also,
However, I try to simulate a failure on the primary node by either by stopping the SQL Service or removing the network card. (I’m virtualized)
I suddenly see that I cannot connect to the listener again.which means there is no failover.
Could I be missing something here by way of config or otherwise please?
Tosin – unfortunately troubleshooting this is way beyond the scope of a blog comment. You’ll need to open up the documentation and start digging in. For faster help, contact Microsoft support.
Hi Brent, great write up, have to admit a newbie at this cluster thing, we currently have two separate SQL clusters on different subnets each with their own SAN storage and copy of their database, I am looking to cluster the two clusters, to give one front online sql instance that replicate the same data to each other, so with two working individual clusters that fail over correctly i am unable to join the databases together in SMSS under HA. Where am i going wrong? or can point me in the right direction.
Alex – if the clusters already exist, and they’re not in the same Windows cluster, you can’t. Members of an AlwaysOn Availability Group have to belong to the same Windows cluster.
Great article Brent! I have used this as a go to guide for all of my AlwaysOn configuration and setup. While a shared SAN is not required for SQL 2012 AlwaysOn, would you have a preference of one of the two possible configurations suggested here? Uptime requirement is 99.99%.
Scenario 1: 3 SQL Servers, each with it’s own dedicated storage. Node 1 and Node 2 are at the main site and Node 3 is at DR location. Synchronous data between Node 1 and Node 2 and Asynchronous data transfer at Node 3.
Scenario 2: 3 SQL Servers – Node 1 and Node 2 are at the main site and Node 3 is at DR location. Node 1 and Node 2 are on a shared SAN, a.k.a cluster and Node 3 has its own dedicated storage. Synchronous data between Node 1 and Node 2 and Asynchronous data transfer at Node 3.
Scenario 1 gives you auto failover to Node 2 and also does not have the issue of SAN failure.
Scenario 2 gives you auto failover to Node 2 but in the event of a SAN failure, there would be manual failover to Node 3 (some possible downtime).
Vishal – unfortunately architecture design is a little outside of the scope of a blog comment. If you’d like architecture help, feel free to email us at firstname.lastname@example.org for consulting. Thanks!
Makes sense to contact for consulting hours 🙂
I am taking on a big Always on setup with always on setup between 2 three node cluster with two SQL instances each.
For the setup of this do we need to do something different. I was reading somewhere that we will have to add the SQL node on the Primary. your reply will be really appreciated.
Ravi – they all need to be in the same Windows cluster.
We’re using 2012 Availability Group Node Failover Primary>Secondary and password data encryption (not TDE at the database level) on Master Key Encryption By Service Master Key. Secondaries are created via restore from primary node backups. On node failover from primary to secondary we encounter a an error where the master key cannot be opened automatically on the fly, requiring manual intervention to open the master key. So far Google is not our friend. Does anyone know why ?
John – your best bet on a complex issue like this is to post the details on http://DBA.StackExchange.com, or if it’s happening in production, call Microsoft for support.
Thanks you so much for the detailed explanation of AlwasOn. It really helped understand this new technology. I am trying to implement similar configuration, but running into some problems. I am planning to create FCI + HA solution in the same datacenter. I’d really appreciate if you can provide some guidance.
Here is the current configuration (All nodes are in same datacenter):
SQL1 – Part of FCI instance SQLFCI1 (S drive for data, L drive for log, Q drive for quorum)
SQL2 – Part of FCI instance SQLFCI1 (S drive for data, L drive for log, Q drive for quorum)
The above is a standard SQL FCI cluster and it works great.
SQL3 – Standalone box for HA (Separate S drive for data, separate L drive for log, No Q drive)
I added SQL3 to WINCluster1 and setup Node Majority for quorum (Removed Q drive disk witness from quorum) as I have odd number of nodes.
I removed SQL3 from possible owner for SQLFCI1.
Then, I setup AlwaysOn Availability Group AG1, with SQLFCI1 as Primary replica and SQL3 as secondary replica. The setup is Asynchronous with Manual failover and readable secondary.
When I use SSMS to failover between SQLFCI1 and SQL3 (back and forth), it works fine.
When I use FCI under WinCluster1 to failover between SQL1 and SQL2 for SQLFCI1, it works fine.
But, when I shutdown SQL1, the SQLFCI1 fails over to SQL2, which is perfect, but SQL3 goes offline. I can remote into SQL3, click on drives, etc, but the cluster resources goes offline.
Same thing happens when my network engineer unplugs NIC cards, simulating NIC fail scenario, from SQL1. And both cases happens when SQL1 is online but SQL2 is down or NIC unplugged.
Here is the error message under event viewer:
The cluster service is shutting down because quorum was lost. This could be due to the loss of network connectivity between some or all nodes in the cluster, or a failover of the witness disk.
And another strange thing is, when I shutdown both SQL1 and SQL2, under WinCluster1, the current owner for SQLFCI1 says SQL3, even though there is nothing shared between SQL3 and SQL1, SQL2. SQL3 is just standalone box. It brings down the Windows cluster and nothing works. Once SQL1 and SQL2 come up, it starts working.
Hope this explains the current configuration and issue. Thank you so much sharing your knowledge. It helps a lot to DBAs like me who are just starting in the DBA world.
Have a wonderful day.
Hi Raj. Thanks for typing this out, but this is way beyond the scope of the free troubleshooting I can do in a blog comment. You’re welcome to bring us in for consulting help, but for free advice, you’re going to be best off posting the specifics over at http://dba.stackexchange.com.
Thanks for the prompt reply. I will post the question there. I will also talk to our management team regarding the consulting help.
Thanks and have a wonderful day.
Did you get solution or work around for the scenario you mentioned? We also want to setup the same and have issue if primary site cluster is down.
when i am creating and configure a AG the following error shown ??
the following folder locations do not exist on the server instance that hosts secondary replica
Sounds like the folders don’t exist on the secondary. Make sure your drive letters and paths are the same on all your servers.
ok can you Plz clarify how can i do this ?
i am installing the 2 SQL instance as standalone installation ” SQLServer1 on VM1 installed on Driver S and SQLServer2 on VM2 installed on Driver S “
We do have similar setup across data centers. What happen if cluster goes down? Cluster is offline. Still availability group is up and running?
Thanks for your help in advance!
Ashish – if the whole cluster goes down, the AG is not up and running. The AG is part of the cluster.
Thanks Brent. This is really a draw back for cluster across the data centers.
As so many others have commented, great post!
I have a brief design question for you.
If we have two nodes in the cluster, Node1 and Node2, and we create a SQL Availability Group (AG1) across both nodes with Node1 owning the primary replica and Node2 owning the secondary, can we create a second Availability Group (AG2) across the same two nodes of the cluster, but Node2 owning the primary replica of AG2, and Node1 owning the secondary replica of AG2?
So Node1 will be primary for AG1 and secondary for AG2, while Node2 will be secondary for AG1 and primary for AG2.
In addition to whether it is possible, and assuming that one server can handle the load of both Availability Groups, can both groups reside on the same node as the “primary” replica during a failover event?
Thanks for your comments!
As others have stated prior to me, great post! I have a hypothetical design question for you.
If we set up a two-node cluster for SQL Availability Groups. Is it possible to create two groups such that AG1 has the primary replica on Node1 and the secondary replica on Node2, while AG2 has its primary replica on Node2 and its secondary replica on Node1?
My follow-up question is: Assuming that there is enough resources for one server to handle the load, can two primary replicas (from two different Availability Groups) reside on the same node?
Finally, how bad of an idea is this? I look forward to your comments.
Hi! Yes, that’s possible and comes down to the specifics of your implementation.
Thanks for clear easy step notes on setting up Avaiability Group in SQL 2012.
Some guides I am reading say setting up endpoints are a requirement, however, you do not cover these in your post. Are they required for an AG to work properly?
Ashley – yep, and they’re automatically configured for you in these steps.
Brilliant piece. I have a question though.
Replicas / Secondaries are initialised through backup and restore. When dealing with Very Large Databases (size of about 3.5TB), is there some other way of initialising replicas without the normal backup and restore route?
Philip – thanks! Hmm, when you say some other way, I’m curious – how would you attempt to get 3.5TB of data from one place to another in a more efficient way than a backup and restore?
Nice article – just gave me a foot into the door of MSSQL! Thank you very much!
After installing the Servers as mentioned in the article I’ve tried to play around a bit.
My environment is running 2k12R2 with MSSQL2014 which is a bit different, but also very similar.
I have a smaller environment which contains 2 SQL Servers, a DC, a Administration PC (for status and blablabla) and a Win7 pro to simulate a normal computer connection.
When I now try to figure out what happens if the “facility manager” puts out a cable of SQL01 (my primary ofc) it automatically fails over to SQL02(Mr. Secondary/Replicate) (while connected from Win7). That’s awesome – I love that!
But why on earth my whole cluster gets shut down, when I fail in anyway on SQL02?
I don’t even see my Failovercluster in the Failovercluster-Manager on SQL01 – it’s just gone.
I have that problem since about .. hmm..two weeks. I’ve made a reset the whole scenario. Installed it again, tried stuff, different roles, different configurations; but it’s always the same.
How do I handle that problem? I’m going insane!
Kerim – unfortunately, doing one-on-one troubleshooting is beyond the scope of what we can do in a blog comment. Your best bet is to hit http://dba.stackexchange.com.
thanks for the fast reply!
Unfortunately I was to stupid to see that I need at least 3 DB Servers or 2 and a Quorumlistener. (tested both ways)
After I understood that everything works perfect! Thank you so much!
But Failover cluster works on quorum and if your Chicago center get down, it will not form the Cluster, as Quorum works on voting and half of the cluster should be up. So this configuration is incomplete. You need third server on different network as the complete cluster should be work on Node majority and file sharing quorum . Chicago and Poland data center should have even number of nodes + third server as file sharing to form a failover cluster.
Martin – so if you put the third server somewhere else, and your primary datacenter loses its network connection for just one millisecond, your entire primary data center will go down. Generally, I don’t see Internet connections quite that reliable.
Brent & Team,
Your site has been a savior to me, the wealth of information you provide the community is AWESOME!
I’m considering a move into a VM world in the process of upgrading to SQL Server 2014. With the changes in licensing at Microsoft we’re trying to hang on to all the performance we can in this upgrade. An approach I’m considering is to allocate an 8CPU 48GB ram VM as the Availability group primary replica. And only allocate a 4CPU 16GB ram VM as the Synchronous secondary replica. My concern of course, would be whether it could have any impact on performance on the primary server while its synchronizing synchronously to that lower CPU/memory machine. My hopes would be that the lower CPU and memory server would not have a bottleneck, since it’s just keeping up with the updates to the databases; this is assuming they both have very fast (separate)SANs attached for storage. And that our environment is at least 70% read 30% write, if not even more read. I realize that in the event of failing over to the secondary we will be in a degraded performance situation. But that would be ok, the key here is zero downtime and zero data loss, and the easy ability to patch the systems with no actual “complete” downtime during off business hours. And if it turns out that the primary VM’s going to be down for a significant amount of time (SAN crashed) then we could reallocate the CPUs and memory to the secondary server…accepting the fact that there will be an outage for the reboot.
Does this seem like a sound approach, or do you think it will have a noticeable impact on that primary servers performance?
Thanks again for the information you share,
Jeremiah – thanks, glad you like our site! Unfortunately, personalized architecture design is kinda beyond the scope of something we can do in a blog comment. Your best bet there would be to contact us (or another firm) for consulting advice. You don’t want to base a system costing hundreds of thousands of dollars based on a blog comment. 😉
If I am setting up an AG but would like to use different network path to connect for Heart beat and Availability Sync traffic how can I configure my AG that way.
We are attempting to copy this concept for our lab environment. We are moving forward with either AG and FCI or just AG on our network.
We have a SAN and expect to use local storage.
How do I go about having the SAME drive letters when there is a SAN involved and using local storage?
Chris – you can set your drive letters for a volume at the time you format them. You can make any volume be any letter, whether it’s local or on shared storage.
I was overthinking that… a lot! 🙂
Hello Mr. Brent. For the testing purpose can I setup Alwayson on a single server (test server) like mirroring? or it has to be 2 different servers?
Dan – it has to be different ones.
How we can setup AAG with SAN to SAN level replication?
Mustafa – very carefully.
When adding a secondary server, you state above to “Use the same paths across all of the SQL Server instances we’re using for AlwaysOn”…
Is this absolutely necessary? I have inherited an AAG that I don’t particularly like the setup of (which I wont go into). I was going to try to change it a little by incrementally adding replicas in (to my specifications), then decommissioning the old ones until only my new boxes are left. They will eventually all be using the dame paths. As I understand it, the system db’s (master, temp et al) dont actively sync anyway so I don’t see how it *could* be a problem, but i dont *know* 100% if it will. Anyone here added replicas with differing data paths and had serious issues?
Matt – when you add a file to the database, if all of the replicas don’t have the same paths, the replication will break until you go in and manually fix it.
That’s a shame. Out of interest, this 2 server AAG does actually have a couple of databases with different paths. They must have ran out of drive space in the past and put some DB’s on network shares, and the mdf/ldf path on the primary differs in the secondary (by UNC path). Yet replication occurs, dashboard shows healthy and failover works. is this an anomoly?
Matt – as I said, if all of the replicas don’t have the same paths, replication will break until you go in and fix it. Once you’ve fixed it, it works until you add another file. Read carefully. 😉
My windows cluster was created successfully. When add replica using the New Availability group wizard, I am getting “The specified instance of SQL Server is hosted by a system that is not a Windows Server Failover Cluster (WSFC) node.” Any advise?
Daisy – all of the replicas have to be a member of the same Windows cluster.
I got the same error, and for me the issue was that the user I was connected as wasn’t admin. I somehow misread that in your post.
Another reason can be this: http://www.sqlpanda.com/2014/10/how-to-fix-the-specified-instance-of.html
To be clear I mean the user was not sysadmin on SQL Server. BTW. I was using SQL 2014
how we can mix SQL AAG with “San to San Replication”,
They’re not designed to be mixed together.
i have two SQL Servers with AAG in my main site and i want to use san to san replication to have a DR on my secondary site
how i can accomplish this scenario??
Those two technologies are not complimentary.
I have a 2-node Cluster with an AAG set up across the default instances on both nodes. I have a dozen databases in the AAG. Can I install a SQL named instance on both of these nodes, and set up another AAG with it’s databases in this new AAG?
FYI… I did try this, and instead of the single database that I added to the AAG from this named instance, I see all of the same databases that are in the default instances AAG.
For unrelated questions, check out Dba.stackexchange.com.
I’m using standard edition of SQL Server 2014. What is the best option for high availability if my production server down?
Mahendran – that’s a great question. Start with the High Availability and Disaster Recovery Worksheet in our First Responder Kit. Click First Aid at the top of our site to download it.
Thank you very much @Brent Ozar. Now I can plan based on my budget.
Thanks for this.
Any thought on having primary db on a single SSD (without RAID) NVMe SSD, like Intel P3608 series? If it fails replica is still ok without data loss?
Trubar – yes, but it would mean a failover – you would lose all running transactions.
I am wondering to know why you created different user for each machine and then put them in a group and give all access to all machine , can’t we just create one domain account called sqlservice and create a login on all sql servers for that and also run all sql service and agent service on all servers with that one user ?
Ellen – generally I like having different accounts for each SQL Server because if someone messes up typing a password on a new SQL Server installation, it doesn’t lock out all of the SQL Servers’ service accounts.
In an AG, though, I use the same service account for all servers in that AG. Makes Kerberos authentication easier.
Brent , thanks for your reply
I think I couldn’t get my question across. I want to know what role we should set for domain account that we created for sql service , you mentioned that sysadmin is not good for production but when I checked the NT SERVICE\MSSQLSERVER , it is sysadmin I mean if I would go with default setting and do not create a separate domain account it would be sysadmin , and now when I separated I don’t know which role is minimum required and aslo what is the point of not using NT SERVICE\MSSQLSERVER ?
Ellen – I don’t think I understand the question. You may want to post this on http://DBA.stackexchange.com.
Nice article !- We have successfully setup the Alwayson with different subnets and it was working fine.
But unfortunately the existing storage had a problem. Now our storage team provide new storage IP.
We need to replace the existing storage configuration with new on. Is there any way without removing the existing setup due we already configured around 500 database.
Please do the needful.
Sure! Here’s the needful: when you have questions about your own personal projects, head over to http://DBA.StackExchange.com rather than leaving a comment on a random blog post you found on the Internet. DBA.se is a great place to get answers for your questions – but make sure you boil down the question as clearly and concisely as possible, including enough details for someone to reproduce your problem. In your case, it might be storage brand names, and how you’re accessing the storage (like is it iSCSI, FC, SMB, etc.)
Thank you Brent, i will do that
hope someone can help me.
Currently we work on a desing for AlwaysON Highavailibility Groups, and i have especially a question for a file share witness in a 2 datacenter situation (DC1 & DC2).
Ok, so imaging the following situation:
DC1: Primary Node1 & File Share Witness (2Votes)
DC2: Secondary Node2 (1Vote)
If DC1 is shutting down in case of power outage, the wohle cluster will be offline as only 1 Vote (Node2) is remaining. But what we want to have is a automatically start up on DC2
Hence, Microsoft suggest a 3rd Datacenter for the File Share Witness. But we don’t have a 3rd Datacenter.
So what can i do? Personally, i have 2 ideas:
1. place the file share witness on a redundant Netapp Filer (if DC1 shut down, Netapp Filer is start up on DC2)
2. Windows File Share Cluster, means a redundant file share Cluster with shared Highavailibility SAN
Is there any other option?
Thanks a lot.
Andy – yep, put the file share witness on the most reliable thing you have. If it’s the NetApp, use that, if it’s a cluster, use that.
Thanks a lot. Just one more doubt:
Imagine In case of power outage of DC1 (primary node1), the netapp filer bring the file share witness online in DC2. (Secondary node2)
So far so good, but imagine that the netapp need some time to do the same. Is the cluster / alwayson starting up automatically when file share witness is back online in DC2?
Andy – we’re getting pretty far beyond what I can answer quickly/clearly in blog post comments, unfortunately.
Is it Sql Server Always on Highavailability is active – passive ? or active – active? Can I write data to both Server?. Please reply.
Mahendran – you can only write data to one server.
@brent ozar : Thanks for your immediate reply. if I want to go for active – active, other than Sql ServerMerge Replication, What are the other options available?
Brent- installation fails With Error 35250?
OK, this post wasn’t about installing SQL Server. For general Q&A, feel free to head on over to http://dba.stackexchange.com.
Just a simple question about the Lab setup here. How and where did you set up the quorum? From reading the article a few times, I can’t seem to figure it out?
Russ – we didn’t. Quorum (like a lot of additional features in SQL Server and clustering) is above and beyond the scope of what I covered in this post.
Thanks for this Brent, great write-up. question for you though: we have setup an AO cluster here in a three node 2012R2 failover cluster. Node1 and 2 are clustered using shared storage, node3 (our AO node) has its own storage that isn’t zoned or presented to the other two nodes. Yet, the failover cluster sees all the AO storage in the “available storage” resource group. Where did we go wrong and more importantly how do we fix it? I’ve seen a few different techniques suggested – some say evict the AO node and re-add it.. some say remove ALL storage and then re-add everything but the AO storage.. both seem overly drastic to me. Any thoughts? Thanks!
Hank – unfortunately, I really can’t troubleshoot AGs via blog post comments. This is the kind of thing we do in consulting, though – if you’re interested, click Consulting at the top of the site and learn more. Thanks!
lol, can’t blame me for trying. Thanks Brent!
I am looking to install SQL Always On with SQL 2014 across 2 data centers. We use Nutanix cluster which uses non-shared disks across sites. So, I got one VM in each Data Center for cross-site fail over. Now, if I want to add local site fail over as well, can I do the setup with 3 servers. So, VM 1 and VM 2 in DC-1 (for local fail over) and VM 3 will be in DC-2 (So, if VM 1 and VM 2 fails, then the cluster should automatically fail to VM 3). Can this be handled for automatic failover ?
SUHAS – generally, you don’t want to fail over automatically across data centers. The only way SQL Server will fail over automatically is if there’s zero data loss, which means you have to do synchronous replication. While that’s doable across data centers, it’s generally done with very fast network connectivity. I’m guessing if you’re using Nutanix and VMs, you probably don’t have the low level of network latency required.
I have a big confusion between Always On FCI/AG. I know that for FCI you need to have shared disks and in-case of Hardware failure or OS failure or System reboot the instance with db, logins etc will fail-over to another node. Plus instance is constant through the setup. For AG i am not clear when db fail-over occur ? what if hardware fails or OS fails or System reboots ? What instance names i must configure for all nodes ? What about logins we need to take care about ? I have the below requirement and i am confused what setup to use; Kindly Suggest.
I want to have SQL Server Multi Site DB Availability. I have experience with single site 2 node cluster but i have a scenario where i have a secondary site. In case of wan network failure on Site1 My ISP is redirecting my wan traffic to Site2. So i need to deploy Node1 at Site1 and Node2 at Site2 or Node1 & 2 at Site1 and Node3 at Site2. I really don’t have specific read requirements like how Availability Groups describes for reporting or backup purposes. My main requirement is that the database must be accessible from Site1 node and in-case of Site1 wan fails the fail-over must happen to Site2 node. So Kindly Suggest what kind of deployment will be best for this case.
Danish – for architecture design, that’s what we do in our consulting projects. Click Consulting at the top of the page, and you can read our sample findings – check out Client C for an example of an HA/DR project design. If you’re interested, you can schedule a consulting sales call from that page too. Thanks!
Sure i would initiate that ASAP but could you at-least answer my questions about differences between FCI/AG as i asked above, I would really appreciate that.
I’m working with clients today, but for free help, head to DBA.StackExchange.com. Thanks!
Very good article with well explained. I have read your article. I have one query that once I try to add Failover Cluster node 2 on Failover Cluster node 1, I am getting error that
you do not have administrative privileges on the server Node2
Shyam – for support questions, head on over to http://dba.stackexchange.com.
Really useful guide, thanks. I set this up using 4 virtual machines, each running SQL Server 2016. I also created a VM to act as the domain controller as all the machines need to be on the same domain (and it was a useful learning experience).
One question – I had always assumed (perhaps incorrectly) that AG was about both high availability and load-balancing. However, based on what I’ve learnt from the setup, it appears that the server hosting the primary database will still take all the load (whilst acknowledging that reporting etc could be targeted at a readable secondary). Is it possible to have multiple ‘primaries’ across multiple servers and load balance between them? I expect they’d need to share the same data files.
Shaun – no, no load balancing for writes here. Just one primary.
Brent, first off THANK YOU SO MUCH! It’s sharing professionals like you that make this a better world. I do have a question that is not clear in my mind. You write:
“The very first screen in setup is just a little tricky because it offers one link for “New SQL Server stand-alone installation” and another for “New SQL Server failover cluster installation.” The one you want is stand-alone. You can indeed do clustering with 2012, but for our solution, we’re using Availability Groups by themselves.”
So, how do the two options differ? I was under the impression one had to choose “New SQL Server failover cluster installation.” Otherwise, it would not be a SQL Server Cluster. But it seems that by leveraging AG you “make” the two notes become a cluster when you create the AG?
More importantly: How do the two installation options vary then? What each installation option can or cannot do? When is one recommended over the other?
Thank you so much in advance, (and I hope you are still reading and answering questions here.) 🙂
Raphael – this is a little beyond what I can cover in a blog post comment. Your best bet there is to pick up a book on AGs or attend one of our AG training classes.
Thank you for your reply. Yes, I am interested in doing BOTH. With that in mind, what are the two most comprehensive and easy to follow (if one can achieve that in a book.. 🙂 ) books that you would recommend currently specifically for SQL Server Clustering?
Also, could you please discourse how your training classes work? I imagine there is a fee, and the training is online? Is it official formal training, or only a free workshop? If not a free workshop, what are the options and the fee(s) please? Dates and times? Etc?
FYI: I work remotely from home (have done so for 5 years) for a medium-sized university based off of Chicago (approximately 5,000 students and another thousand staff and faculty roughly). I do have 10+ experience being a DBA, and I am their Senior DBA, but, although my knowledge of SQL Clusters is fair, my understanding of the Windows/Networking side is not as strong. I am looking to being able to vivisect Windows and SQL Server clusters from the ground up, atop a VMWare infrastructure. I do have some of the basic concepts down already, but the networking, AD, Windows Server side of a Windows/SQL Cluster are more than I was able to retain in a solid manner for the moment. I’m sure I’d learn it by myself If I had two months to dedicate to practicing, but who has that kind of time nowadays? Too many urgent projects with looming deadlines (mostly ETL stuff.) 🙂
So, again, please let me know 1 or 2 books you would recommend. And also details on you AG training classes. If you prefer, you can e-mail me directly at my personal, professional e-mail: email@example.com
Thank you in advance,
Raphael – sure, click Training at the top of the site. Enjoy!
Lol. Yes, thanks again for the quick response.
Dah. If I had taken the time to peruse through the site better I would have seen you already covered both my questions throughout the site. Thank you.
I downloaded your FirstResponderKit and I already see the value in your work. I’ll start with the “Black Friday Recorded Class Season Pass” out of my own pocket, and build from there to get enough evidence to propose more formal training/consulting to the CIO, if/when necessary. Thank you very much for putting all of this together. Best, Raphael.
Will the user get automatically replicated in the secondary database when it is created in the primary database?
The user (inside the database) will, but the login (in the system database) will not.
how to drop the user from the secondary database when it is in read only mode and synchronizing?
Allan – for questions, head on over to a Q&A site like https://dba.stackexchange.com.
Great to get connected to you. I have a question: If I give a user a right to read on my secondary replica which is in sync (not async)with primary replica ; and if that user runs a query with a lot of Select into ##Temptable statement on Database (that creates a lock on a table) … will that slow down the Primary Replica….. (i was thinking if the deadlock and locks on secondary replica will cause the slowness on the Primary replica) since they are synchronous and the primary should wait for the secondary to commit the transaction ? Thanks
Samuel – for questions, head on over to a Q&A site like https://dba.stackexchange.com.