Want a jump-start on testing the hottest scale-out feature in SQL Server 2012? Kendra Little will tell you why it’s critical that you configure your own lab, and what you need to get your test environment set up. She’ll also detail how to avoid some gotchas that could cost you hours of frustration. She’ll demonstrate the steps you need to enable the Availability Group feature in SQL Server 2012 and how to create and test your first Availability Group. This half-hour session is appropriate for DBAs with one or more years of experience with SQL Server:

Links mentioned in the video:

For more videos like this:

Kendra Little
My goal is for you to understand your SQL Server’s behavior– and learn how to change it. When I’m not figuring out the solutions to your database problems, you’ll find me at user group meetings in Portland, Oregon. I also love to draw.
Kendra Little on sabtwitterKendra Little on sablinkedin
↑ Back to top
  1. Perfect! I am getting this setup at home and you have made my life much, much easier! Thank you Kendra. You rock solid.

  2. For those of us that don’t have access to hardware beefy enough, there is a cheesy alternative. Microsoft’s TechNet ( has setup virtual labs you can connect to. That’s what I’ve been using. Be wary though! These are based on Denali CTP0 so some of the code base is a bit out of date. Also, sadly it only has 3 instances to play around with (and the 150 minute time limit leaves something to be desired)

    – Brian

  3. Great webcast. I´m installing and cloning machines like crazy to get my lab environment up and running. you´re a real inspiration and source for learning all of you.

  4. I have setup this up several times. I can do manual failover all day but automatic failover does not always work. More often than not, after the first simulated failover test, which is usually successful, I end up with the AG in a “Resolving” mode and have to restart the original primary instance to get the DB and AG online.

    Have you seen this problem? And if so, how did you fix it? This happens regardless of how I test — by shutting down the primary instance or disabling the NIC.

    Peter Trast

    • Peter – ooo, that’s a bit beyond what we can cover in a blog comment. Check out Allan Hirt’s book Pro SQL Server 2008 Clustering for more basic information on troubleshooting Windows clustering in general. For example, Windows will prevent rapid automatic failover back-and-forth between the same two nodes, so you might be hitting that. In order to get to the bottom of it, we would need to go through the error logs – I’d start there.

    • I like what Brent said about the error logs. The question that comes to mind for me is whether it’s a quorum issue preventing it from coming online, and how exactly you have the quorum configured in the cluster.

      I personally haven’t seen that error, and we have a client running AGs in production who has had automatic fail overs kick in in the real world and hasn’t had the issue– so it definitely doesn’t have to be that way!

    • Hi! Peter, I’ve got the same problem. Please let me know if you’ve already find a solution. At the moment I’ve no idea about this bug 🙁

      • For automatic failover, you need to have both partners in synchronous commit mode. I noticed once that it was easy to set this up and not notice that the primary was not in syncrhonous commit mode, although the secondary was. I believe the symptoms were much like you are describing, so I would check that first.

    • I was testing this and found that after 3 attempts of automatic failover it sits on resolving. I had to change the failover properties in cluster manager. I’m using Windows 2012. Under roles I see role with the same name as my listener. Go into the properties, failover tab and change the maximum failures….

  5. I made some changes. When I use Node and Fileshare majority (on 3rd server) and disable the AG Primary nic, I get failover and the disconnected node goes into “Unknown”, and stays there after begin reconnected until I go to manage the cluster which cause it to come online as secondary. This seems to be a proper behaviour. And from what I have read, stopping the primary instance service should NOT trigger a failure to prevent a split brain scenario because the secondary can still contact the box…? At least that is how it is acting.

    So shutting off or disconnecting the Primary now causes failover, but stopping the Primary service does not. Did this act differently for you?

  6. Kendra,

    I was able to create the cluster on Vlab Manager. I had to disable the virus scan applications as well…Besides disableing the windows firewalls. I think you should add this to your tips….I have added my comments to your blog.

    You rock!!!!
    Thanks so much for all your help.


  7. Excellent topic!

    I went through setting up in a lab and evrything worked quickly and easily. When I got to setting up an N environment in the real world, however, there were some gotchas that I wasn’t ready for. Through my travels in AlwaysOnHighAvailability testing, I have found a few important considerations:

    – Make sure to test setup using real-world permissions and access. To set up the Windows Serivices Failover Clustering components and listeners, you need to have domain privileges to create new domain computer objects. In my lab, I was given admin rights on the servers, but I was not allowed to have create domain rights. To work around this, I had to invoke the help of a domain admin to create the WSFC and to stage the listener entries in the domain.

    – Size the quorum disk appropriately. When the cluster is set to use file share for quorum, the Availability Group setup will actually use the file share to hold backups of your database while initializing. Maybe not a big deal in most labs, but, in my lab, I am working with a production sized database (2 TB) and only sized the file share to 10 GB.

    – If your lab doesn’t allow you carte blanche, be aware that each node in your cluster will need three IPs – one physical, one for the admin console, and one for the listener.

    Just wanted to put this out there to help anyone else that is working through getting a test environment set up!

  8. Question: In your Demo, how did you make it ‘synchronized’ on the database on your secondary replica asynchronous-commit mode?

    • Hey– great question.

      The demo was six months ago so I had to go back and look at exactly what I did. 🙂

      I didn’t show the setup for the whole AG. The Squeals12 instance was using async and shows as synchroniZING. The Squeals13 instance was using synchronous commit with manual failover.

  9. Once again, while researching something, I come to find the best information available is here. Thanks once again Kendra (and Brent, Jeremiah, and Jes). Thank you thank you thank you!

  10. On Failover ,when I issue query @@servername on Node1 it returns Node1 instead of Node2
    any Ideas

  11. Hi Kendra,
    Enjoyed your video!
    We seem to be stuck after failing over to a db which is supposed to come back online as R/W but when we attempt to write to the db with an applicaiton we’re getting a msg indicated that DB is in a read-only mode.

    The strange thing is that the dba’s can still go in with an admin account and insert successfully against the same DB stating it’s in read-only mode!

    Any thoughts or suggestions?

    We’re wondering if the connect string in the application may not have proper settings.

    Appreciate any help/suggestions!


    • Hi Mark,

      I think you’re on the right track in suspecting the connection string. In order to fail over, the connection string will need to point to the Listener in the SQL Server Availability group– that needs to be configured as part of the setup with a name and IP address.

      Have you confirmed that the listener is set up and that it is failing over properly in the Windows Failover cluster?


  12. So when your original primary comes back online, (obviously it’s based on data load right?) does it let you know when it’s ready to become the primary again? Do you switch over when you can? or just wait until the new primary fails?

    • Hi there–

      I’m not sure what you mean by “obviously based on data load”– the primary could sometimes go offline for hardware failure or other reasons. I may just be missing something in the context of the video, which I haven’t watched myself in a while– so apologies if I’m just misunderstanding.

      If the old primary comes up and get synchronized, some people choose to do some smoketesting and then fail back. Some people choose to just run on the secondary if it’s an equally powerful machine. Failing back requires a downtime, and if all your applications are using the listener for access, then you can stay on the secondary.

      Because of this, part of the trickiness of setup is to make sure all your jobs will work whomever is the primary (backups, maintenance, etc.). The jobs need to be on every server and check who is primary and what they should do.

      • So what I mean is, if we are having thousands of transactions a minute, and the motherboard fails, and we don’t get a replacement for say 24 hours, once the server comes back online it will have some catching up to do correct? Is there anything we would manually have to do to make sure it’s synced? Would it notify us when it has caught up?

        • Got it. You want to run through some scenarios for this and test carefully, because there’s decisions you want to make if / when this happens and you don’t want them to take them by surprise.

          If you leave a replica offline for a period, the primary server could absolutely be impacted — the transaction log is going to grow and grow because it can’t clear out information.

          This is more complex than I can answer in a single comment, but you want to produce the scenario in a test environment in any case to document the steps that should be taken to minimize risk given your RPO/RTO and the environment.

  13. Want to take snapshot of production DB every so often at various times to see what expected traffic would cross network if moved to AO AGs, is there a tool for tis?

    • YNW79 – not that I’m aware of, but instead, I’d just look at your transaction log backup sizes.

      • DB is in Simple mode, trying to find way to determine traffic LAN if moved to AO, RTO/RPO changed on app

        • Agree with Brent about the log backup sizes– and the first step to meeting that RPO and RTO is mostly getting into full recovery and getting log backups in place, in any case.

          • Long story but app team will use other technologies (Symantec) if cannot provide some data on AlwaysOn; I would think MS would have a utility to measure what would be written to a secondary DB that could give an idea of the traffic over network to secondary. Taking on a big project with little knowledge

Leave a Reply

Your email address will not be published. Required fields are marked *