Always On Availability Groups FAQ

Here are some of the Always On Availability Groups questions I get the most often:

Q: How much network bandwidth will I need?

For a really rough estimate, sum up the amount of uncompressed transaction log backups that you generate in a 24-hour period. You’ll need to push that amount of data per day across the wire. Things get trickier when you have multiple replicas – the primary pushes changes out to all replicas, so if you’ve got 3 replicas in your DR site, you’ll need 3x the network throughput. Calculating burst requirements is much more difficult – but at least this helps you get started.

Q: What’s the performance overhead of a synchronous replica?

From the primary replica, ping the secondary, and see how long (in milliseconds) the response takes. Then run load tests on the secondary’s transaction log drive and see how long writes take. That’s the minimum additional time that will be added to each transaction on the primary. To reduce the impact, make sure your network is low-latency and your transaction log drive writes are fast.

Q: How far behind will my asynchronous replica be?

The faster your network and your servers are, and the less transactional activity you have, the more up-to-date each replica will be. I’ve seen setups where the replicas are indistinguishable from the primary. However, I’ve also seen cases with underpowered replicas, slow wide area network connections, and heavy log activity (like index maintenance) where the replicas were several minutes behind.

Q: What’s the difference between AGs in SQL 2012 and SQL 2014?

SQL Server 2014’s biggest improvement is that the replica’s databases stay visible when the primary drops offline – as long as the underlying cluster is still up and running. If I have one primary and four secondary replicas, and I lose just my primary, the secondaries are still online servicing read-only queries. (Now, you may have difficulties connecting to them unless you’re using the secondary’s name, but that’s another story.) Back in SQL 2012, when the primary dropped offline, all of the secondaries’ copies immediately dropped offline – breaking all read-only reporting queries.

Q: How do I monitor AlwaysOn Availability Groups?

That’s rather challenging right now. Uptime monitoring means knowing if the listener is accepting writeable connections, if it’s correctly routing read-only requests to other servers, if all read-only replicas are up and running, if load is distributed between replicas the way you want, and how far each replica is running behind. Performance monitoring is even tougher – each replica has its own statistics and execution plans, so queries can run at totally different speeds on identical replicas. There’s not really a good answer to this question right now.

Q: How does licensing work with AlwaysOn Availability Groups in SQL 2012 and 2014?

All replicas have to have Enterprise Edition. If you run queries, backups, or DBCCs on a replica, you have to license it. For every server licensed with Software Assurance, you get one standby replica for free – but only as long as it’s truly standby, and you’re not doing queries, backups, or DBCCs on it.

Q: Can I use AlwaysOn Availability Groups with Standard Edition?

Not at this time, but it’s certainly something folks have been asking for since database mirroring has been deprecated.

Q: Do AlwaysOn AGs require shared storage or a SAN?

No, you can use local storage, like cheap SSDs.

Q: Do Availability Groups require a Windows cluster?

Yes, they’re built atop Windows failover clustering. This is the same Windows feature that also enables failover clustered instances of SQL Server, but you don’t have to run a failover clustered instance in order to use AlwaysOn Availability Groups.

Q: Do I need a shared quorum disk for my cluster?

No, and check out Kendra’s video on failover cluster quorum to learn your options.

Q: What version of Windows do I need for AlwaysOn AGs?

We highly recommend Windows Server 2012R2, and here’s why.

Q: Can I have different indexes or tables on my replicas?

No, the replica database contents will be exactly the same as the primary.

Q: If I fail over to an asynchronous replica, and it’s behind, how do I sync up changes after the original primary comes back online?

That is left as an exercise for the reader. When I go through an AG design with a team, we talk about the work required to merge the two databases together. If it’s complex (like lots of parent/child tables with identity fields, and no update datestamp field on the tables), then management agrees to a certain amount of data loss upon failover. For example, “If we’re under fifteen minutes of data is involved, we’re just going to walk away from it.” Then we build a project plan for what it would take to actually recover >15 minutes of data, and management decides whether they want to build that tool ahead of time, or wait until disaster strikes.

Kendra says:  I wish I’d had this list when I started learning about AGs. I was also surprised to learn that crazy things can happen in an AG when you create indexes. Check out more on that here.

Previous Post
Woohoo! We Won a Red Gate Tribal Award: Best Blog of 2014 (Again)
Next Post
Quiz: Are You the Next Brent Ozar Unlimited Consultant?

34 Comments. Leave new

  • Overall the Q & A is very valuable.
    However for this question “Q: What’s the difference between AGs in SQL 2012 and SQL 2014?”, I had different results. Maybe we tested a different scenario than your answer refers too.
    “when the primary dropped offline, all of the secondaries’ copies immediately dropped offline – breaking all read-only reporting queries”
    We tested what I believe is this scenario in SQL2012 and had no issues. Within a 3 node AG, we use an AG listener for both primary (Srv A) and secondary (Srv B) read-only connections. We don’t have any application connections that specify a secondary node explicitly. The secondary ro connections are from business objects and utilize a read-intent connection through an ODBC definition. When I stopped the SQL service on the primary (Srv A), the report continued run as new connections came in to the new primary (Srv B). After we disconnected from business objects and restarted another report a read-only connection was now started on our remaining secondary (Srv C) as Svr B is now my primary, Srv C is secondary, and Srv A is unavailable.

    Reply
    • Dave – I’d be interested in seeing more about your testing methodology and how the cluster is set up – I bet there’s something we’re not seeing in the testing. It’s beyond something I can troubleshoot in a blog comment though.

      Reply
  • Further Inquiry to this

    Q: If I fail over to an asynchronous replica, and it’s behind, how do I sync up changes after the original primary comes back online?

    Is it possible to take a backup of your asynchronous replica and copy it back to one of your original synchronous servers to re-establish one of the synchronous servers as the primary again? I realize you would need to prevent any connections to your asynchronous server so you do not lose any transactions though. This is the scenario we may be looking at as we only have a few databases we are using AGs with, the databases are less then 5 GB.

    Reply
    • Corey – once you fail over to an async replica, you’re basically walking away from all changes on the former primary.

      Say you’ve got a dbo.Products table. On the primary, you issue an UPDATE dbo.Products SET Color = ‘White’ WHERE ProductID = 100. Then, your primary dies before it can get that change over to your secondary – which still shows the color of Black for ProductID 100.

      If you fail over to the async secondary, and now you want to merge changes between the databases, how do you know which one is correct? Especially if there have been subsequent changes to ProductID 100 on the secondary?

      Or even worse, consider a table with identities where you’ve got conflicts – the same ProductID can have two wildly different things on the two different servers.

      Reply
  • Brent,
    Let’s say you have one AG; a local sync secondary replica and an async secondary replica on DR site.
    And you have not-so-reliable network connection to DR.

    What happens if you lose the connection for 3mins, or several hours when no one is actively monitoring the system.

    Reply
  • I have an Availability Group with a single primary and a single secondary in synchronous mode. During times of peak load, my primary completely locks up because it is waiting on the secondary to get synced up – I get hundreds of HADR_SYNC_COMMITs causing crazy blocking on my primary replica. I have really great hardware and network connectivity and SSD SAN storage already. Are their any settings within SQL Server I can tweak that controls the performance of the HADR SYNC operations?

    Reply
    • Jeff – yeah, there’s lots, but this is kinda beyond what I can address fast in a blog post. You have to zoom out and ask, “Why aren’t these operations completing fast enough?” Is it a network problem, a disk write problem, a thread problem, etc, and then address it from there. The HADR_SYNC_COMMIT waits are just the start of your journey, unfortunately, and we don’t have a blog post on that troubleshooting process. (It’s fresh in my mind because I just did it with a client last week, and it took us a few hours to narrow it down on their particular system.)

      Reply
  • Does my database need to be in native mode if I want to use AlwaysOn with SQL Server 2012, or can I configure my database with SQL Server 2008 R2 BC mode?

    Reply
  • Brent,

    If i have a 2 node SQL Cluster built on top of Windows Cluster on my primary site and a stand alone SQL server on DR site, can i configure AGs between primary and dr with automatic failover ?

    Reply
    • Vijay – if you have a failover clustered instance in your Availability Group, the only automatic failover you get is in the failover clustered instance.

      You don’t really want automatic failover to DR anyway – that would require synchronous writes to both your primary and DR site, and performance on that will be pretty rough.

      Reply
      • Brent,

        Thank you., Appreciate your quick response. Currenlty we have a stand alone SQL server on Primary site and stand alone SQL server on DR site and we have Transactional replication between Primary and DR.. which gives us a readable db ( with minimum data loss) at the DR site if and when the Primary server was offline.

        Now, I am exploring alternatives to our existing scenario where I was hoping to create a 2 node clusrer on the primary site and a stand alone server on the DR site with AG configured between Primary site and DR site.

        I understand that sync between Primary site and replica at the DR with require enough bandwidth , but do you see any other pit falls in this approach ?

        Reply
  • Brent, hi!

    What you can talk about receive and redo speed on replica side in AAG on SQL2012R2?
    I heard about maximum 50 Mb/sec for receive and 50Mb/sec for redo. But really I have only 20Mb/sec for redo and as I know – redo operation use only one core. I think it’s awful for situation when I can generate a lot of logs. For ex. when I must reorganize a lot of indexes in big tables.
    Do you know about any KB or advise for improving performance in my case?

    Reply
  • Hi Brent,

    For, “Q: What’s the difference between AGs in SQL 2012 and SQL 2014?”, what do you mean when you say the “when the primary drops offline”?

    I’m assuming you are referring to the SQL Server Instance (primary replica) becoming unavailable, not a database(s). Shouldn’t the secondary replica come online as the primary replica if it is configured for synchronous automatic failover?

    Thank you!

    Reply
  • a new question on an old thread. Sql Server AG(12.0.2000) sitting on top of VMWare. The sysadmin hotadds memory on the machines, application loses the ability to connect to the listener. Looking in the logs it appears service broker stopped and we had a DNS lookup timeout for the listener name.

    It’s really easy for me to say, “That’s a bug, let’s not do that,” but I would like to know an underlying cause to see if I need to push harder for VMWare updates or sql server updates.

    Reply
    • Tim – I don’t usually recommend hot-adding memory to VMs, period, due to stability issues I’ve seen when doing that. If you want to troubleshoot your setup deeper, you’ll want to open a support ticket with Microsoft and/or VMware.

      Reply
  • Jason Carter
    May 16, 2016 11:37 am

    Brent,

    For the question: “Q: How far behind will my asynchronous replica be?”, Is there a go-to query to get real-time stats for MBs or time estimate it is behind? I have looked online and was surprised that I was unable to find one.

    Reply
  • Can SQL Server FCI and availability groups be implemented in same environment, how does that work? What are the pros\cons? I would like to setup a 4 node cluster\ag where 1 node is the primary, 1 node for DR in another data center that has a slower link, 1 node for reporting and the other node as the primary fail over node.

    Reply
  • rahul yerrabelli
    September 2, 2017 4:34 am

    Hi Brent.. I had a doubt. Please helpme out. How to avoid or get back the data lost in asynchronous always on availability group? Is there anyway to avoid data loss? My scenario is Server A is in synchronous mode in a datacenter called Eastgate and server B in asynchronous in other data centre IOP. Let’s say Eastgate data centre got crashed , now I will failover the availability group manually to IOP data centre availability group which is in asynchronous mode.Now my questions are
    how can I avoid the potential data loss?
    How can i script the lost data?
    how can I failover back to original data centre?
    Can you help me out on this..

    Reply
  • Luis Agustin Azario
    November 9, 2022 4:58 am

    Regarding AGs, If i have a primary and secondary servers configured with AO and one Listener, can I create 2 or more AG with diffent databases and configure all AG with same listener ? All databases are related to same aplication and must failover together.
    I am using Standard edition 2019.

    Reply
  • Luis Agustin Azario
    November 9, 2022 6:38 am

    Regarding AG I found this information:
    “You just have to have a 1:1 mapping of AG to DB. And if you want to have a listener to connect to and not have to deal with changing connection strings at failover or doing something like my kludgey tweet that mentioned using one listener and scripts to automate the failover of the other DBs on the same instance, you have to also have a 1:1:1 mapping of DB:AG:Listener”

    I am using Sql-Server 2019 Standard, I read about new features in AO like Cluster less i.e.
    I don’t understand why is mandatory to have DB:AG:Listener for each DB:AG, can’t I use the same listener for all AG:DB ?

    Reply
  • Hi Brent,
    I hope you are doing great. We have always on server with different data centers. Server A located in VA and Server B located in TX datacenter. When we test failover from primary, VA to secondary TX, it took for ever. Still waiting to see if it work. Spend more than 3 hours. But we saw that failover from SSMS works fine. Any way to to resolve the latency on this?
    Thanks

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.