How Would You Change Always On Availability Groups?

Last Updated February 9, 2017

Always On Availability Groups, SQL Server

SQL Server 2012 introduced AlwaysOn Availability Groups, a way to achieve high availability, disaster recovery, and scale-out reads. SQL 2014 brought some improvements around higher uptime and more scale-out, and all signs point to continued improvements in the next version of SQL Server, too. (I love it when Microsoft brings out features like this and continues to invest in them over time.)

A lot of the emails I get start with, “I’d like you to help me implement AlwaysOn AGs,” but it’s funny – most of the projects don’t end up actually deploying AGs. There’s a few barriers to adoption, and even when you’ve built an Availability Group, management can be a little tricky. Don’t get me wrong – I love the feature – but it comes with some surprises.

Rather than me prejudicing you, I’ll just put it out there as a question:

How would you change AlwaysOn Availability Groups?

Leave your answer in the comments. (And yes, Microsoft is watching.) Bonus points if you link to your Connect request.

Oracle HA & DR Basics [With Video]

How many CPUs is my parallel query using in SQL Server?

93 Comments. Leave new

Steven Ormrod
November 25, 2014 8:35 am

Bring in a few features from replication:

Be able to have a subset of tables or rows in your replica.

Be able to have different non-clustered indexes.

Reply
- Brent Ozar
  November 25, 2014 9:15 am
  
  Steven – ah, that’s interesting. So at that point, the data files aren’t the same. Would you also lose the ability to fail over to a replica, do automatic page repair, reinitialize from backup, etc?
  
  Reply
  - Steven Ormrod
    November 25, 2014 10:25 am
    
    Yep, I guess we would lose it for those implementations. But that’s OK. When you need those features, then make sure the replica is your entire database.
    
    Isn’t replication marked for deprecation? There are times when we need subsets of databases to be ‘mirrored’ so I would like to see those features rolled into Availability Groups.
    
    Reply
    - Brent Ozar
      November 25, 2014 10:27 am
      
      Steven – nah, replication will keep soldiering on. (For example, you can’t do peer to peer or merge type stuff with AlwaysOn AGs, or do AGs without a domain.)
      
      Reply
Richard Kut
November 25, 2014 8:42 am

Change the SQL Server license mobility for the core licensing so that a VM may be failed over instead of an entire physical core. And yes, I bought the SA to allow for license mobility.

Reply
- Brent Ozar
  November 25, 2014 9:14 am
  
  Richard – hmm, I’m a little lost. What’s this have to do with AlwaysOn Availability Groups?
  
  Reply
  - Richard Kut
    November 25, 2014 9:32 am
    
    I am saying that the licensing model is restrictive. One has the choice of licensing physical or virtual cores. I believe the minimum number of virtual cores to license for a VM is 4. At the cost per core this becomes expensive very quickly if you have mutliple VMs running. The physical core option licensed with Enterprise Edition and SA allows for an unlimited number of virtual cores to be used. However, when failing over one must failover the physical cores. This is somewhat impractical, and hence my comment asking Microsoft to be a bit more creative in providing additional licensing options that better address this scenario.
    
    Reply
    - Brent Ozar
      November 25, 2014 9:36 am
      
      Richard – got it, but can you explain to me how this is related to AlwaysOn Availability Groups? Are you using AGs in this scenario, or is this a change you just want for all licensing?
      
      Reply
Tammy Cravit
November 25, 2014 9:35 am

In an AG with multiple read-only secondaries, have better logic for routing queries with “Application Intent = ReadOnly” than “send it to the first secondary that’s alive”. Either round-robin or some sort of smarter load-balancing would be perfect.

Reply
- Adam Belebczuk
  November 25, 2014 2:30 pm
  
  Tammy,
  
  It’s funny that you should mention this. I mentioned the same thing during my Intro to AlwaysOn AGs session at the PASS Summit, and a Microsoft employee in the audience said that round-robin load balancing between readable replicas should be in the next version. I’m hoping that this holds true.
  
  — Adam @SQLDiablo Belebczuk
  
  Reply
- Steven Barnard
  November 25, 2014 2:34 pm
  
  How about setting up MS NLB or any other 3rd Party NLB in front? You could couple this with a monitor written in powershell to determine which is the primary role and make sure it is kept offline in the NLB and keep your reads routed to the secondaries.
  
  Reply
  - Adam Belebczuk
    November 25, 2014 4:14 pm
    
    You could, but that feels clunky to me. If you’re using Windows Server Failover Clustering and AlwaysOn Availability Groups, you’re looking for reliability. To have a PS script and a 3rd party load balance in the mix just adds complexity and uncertainty. Will it work? Possibly. Maybe even probably, but I wouldn’t want to rely on it for critical workloads.
    
    Reply
    - 7gartner
      November 25, 2014 9:59 pm
      
      Check out ScaleArc for this. no I do not work for them.
      
      Reply
    - Brent Ozar
      November 26, 2014 6:29 am
      
      Adam – 3rd party load balancers are actually really successful in this type of environment. You can do things like automatically take servers out of the rotation as they’re being patched, as they fail, or as they get too far behind. They’re really well-tested (I’m thinking of F5s in particular) and they have tons of features.
      
      I don’t want my DBAs managing them, though – this is a function of the network team. In mission critical workloads, you probably already have load balancers and load balancer expertise on staff to handle your web servers, so this is only a small extension of what they’re already doing.
      
      Reply
pmpjr
November 25, 2014 9:37 am

Bring AG to Standard in vnext to replace mirroring. On paper with mirroring going away seems like the next step in the process. It would be nice for doing a warm/dr site for small businesses who can’t afford enterprise licenses.

Reply
- Nathan Jolly
  November 26, 2014 11:03 pm
  
  I completely agree – once mirroring goes away, Microsoft is really leaving Standard Edition customers (especially ones who’ve been paying for Software Assurance for years) in the lurch.
  
  Reply
JuanELBaptisto
November 25, 2014 9:44 am

I’d like to see it that if you connect to a SQL HAG Listener via SSMS you are only presented with the databases that are part of that SQL HAG group rather than all databases on that instance. When you have multiple Availabilty groups and listeners I think this would be quite useful when troubleshooting HAG issues.

Reply
- Brent Ozar
  November 25, 2014 12:19 pm
  
  Juan – hmm, but don’t you sometimes need the ability to join to other databases on the server? For example, at some shops, we use reporting databases on each instance that aren’t part of the AG.
  
  Reply
  - John Q Martin
    November 26, 2014 10:44 am
    
    The only caveat with this is that cross database queries are not a supported configuration, but I guess that comes back to the suggestion/request for DTC support elsewhere in the comments section 🙂
    
    Reply
Michael Proko
November 25, 2014 10:04 am

Make backups work on the secondary work like they do on a primary, or single instance for that matter. IE Full, differential, and transaction logs working together in harmony.

Reply
- Brent Ozar
  November 25, 2014 12:21 pm
  
  Michael – hmm, interesting, so what about backups isn’t working for you now? Can you describe the scenario that’s causing you problems?
  
  Reply
  - Ray
    November 25, 2014 12:53 pm
    
    Right now, AO only supports full backups with copy_only on the secondary replica with no support for differerntial backups. I can’t implement my backup strategy; weekly full, daily diff., and 15min transa logs from the secondary.
    
    Reply
    - Josh
      April 13, 2015 11:44 am
      
      I am in the same boat as Ray. I would love for AG to be able to provide a restore to point in time from backups on the secondary Server.
      
      Reply
  - Michael
    November 25, 2014 1:11 pm
    
    Not necessarily a problem by any means, but for different concerns such as time it takes for backups to finish, in the case of doing updates at night; or storage space concerns, it would be nice to have the full functionality of backups available on the secondary.
    
    Reply
  - David Potter
    November 25, 2014 1:52 pm
    
    Brent, we are still on 08..and my AG knowledge is very limited, so take this with a grain of salt. During backups the server/database is hammered and all writes suffer. We have an app that reads a lot of data, and when the user runs the forecast or model it also writes a lot of data. Our company is global, and currently this greatly effects Asia, since that is when backups and the main data loads take place. It would be nice to have some way to push the backups (and query reads) off to other servers. (in a cost effective manner)
    
    Reply
    - David Potter
      November 25, 2014 1:56 pm
      
      Brent.. BTW the above reply is related to Michael’s comment .. but we are different companies… so I am not replying to your comment to him. (just in case there was some confusion)
      
      Reply
      - Nic
        November 26, 2014 11:03 am
        
        The trouble with differential and full backups is that they require certain information to be available.
        For example, a full (non differential) actually updates pages to show that they have been backed up. This is a change to the database, which you can’t have on a secondary.
        A differential looks at a bitmap to get which pages have changed since the last full. Given that it’s log changes that come across rather than data changes I don’t know that this is viable (although I would love for someone in MS to be answer that one).
John
November 25, 2014 10:59 am

Build in load balancing. I know it wouldn’t be trivial, but AGs are so much easier to configure than replication and mirroring.

Reply
- Brent Ozar
  November 25, 2014 12:22 pm
  
  John – that’s an interesting thought exercise. How would you balance load between replicas? Would you need to know their relative horsepower, what types of queries you want on each replica, where the replicas are located?
  
  Reply
  - John
    November 25, 2014 12:56 pm
    
    Good points- as of now, all of our AlwaysOn setups are virtualized (yes, I know), so the machines are uniform, and those things are a bit easier to control. Our only physical cluster (well, the only one that can wake me up in the middle of the night, anyway) is still on 2008 R2 and probably isn’t leaving anytime soon, not to mention the distance between the datacenters it’s kept in is a whopping 2 miles.
    
    It’d work better for me than others, I suppose. It’d be a rare case of business requirements actually making things easier.
    
    Reply
- John Tamburo (@SQLBlimp)
  November 25, 2014 9:18 pm
  
  This would be a good feature for high-load plus high-availability. However, just like peer-to-peer replication (I have a four-node PPTR running in prod now and it is not fun unless all is stable), you might introduce all sorts of conflicts. I can’t see a way to do this practically unless you extend locks to all nodes, effectively making every replica not just a replica but a fully active database. Alternatively you could load-balance select statements, but then don’t you have latency issues between read-only replicas and the read-write instance? Therefore I certainly agree that the issue is not trivial! I’d love to see this solved effectively.
  
  Reply
Jerrod
November 25, 2014 11:00 am

gMSA support for FCIs and AlwaysOn AGs. The concept is such a cool one from a security perspective and it would be nice to have it supported. I see no technical reason sMSAs or separate gMSAs for each node wouldn’t work for AGs. Unfortunately, I have seen fuzzy information around official MS support for sMSAs (and don’t see anyone who has tried the separate gMSA route), plus the computer, not the DCs, still manage the password in sMSAs. It’s not my connect request, but I fully support it!

https://connect.microsoft.com/SQLServer/feedback/details/767211/gmsa-for-sql-server-failover-clusters

Reply
Ben Thul
November 25, 2014 11:08 am

In a multi-subnet setup, route clients based on their IP.

Reply
- Brent Ozar
  November 25, 2014 12:22 pm
  
  Ben – you can actually already do that if you’re using Windows AD and DNS.
  
  Reply
  - Ben Thul
    November 25, 2014 1:04 pm
    
    Cool! Is it anywhere in the AG documentation or would the DBA implementing it need to be an AD and DNS expert o know that that was even a possibility?
    
    Reply
    - Brent Ozar
      November 26, 2014 6:37 am
      
      Ben – yeah, this isn’t something DBAs do, but something your network/AD/DNS team does. Explain that you want people to be routed to the nearest SQL Server replica in a list of IPs, and they handle subnet prioritization for you.
      
      Reply
Paul Mathur
November 25, 2014 11:11 am

Available on Standard edition, cost in $$ is too high for most companies if using 3 nodes, ie HA + DR.

Reply
- Brent Ozar
  November 25, 2014 12:23 pm
  
  Paul – hmm, so how are you accomplishing that today with other technologies like mirroring or clustering? None of them offer 2 free standby instances in either Standard Edition or Enterprise Edition.
  
  Reply
  - Paul Mathur
    November 26, 2014 4:04 am
    
    Good point Brent. Most places use two standard licenses as 1 standard FCI and 1 instance at DR site to host the mirrors.
    
    Reply
    - Brent Ozar
      November 26, 2014 6:23 am
      
      Paul – exactly. And that still works in 2014, but of course that’s with the same mechanisms, FCI and mirroring.
      
      Reply
Matt
November 25, 2014 11:49 am

Remove the licensing cost for replicas that are read. They are already limited to read-only.

Reply
EricCSinger
November 25, 2014 11:53 am

I would second some of what other folks have mentioned, and some of that is basically copying what’s available in Microsoft Exchange Server:

1. Have a “SQL front end” that proxies requests to a “database backend”.
1a. the proxy servers would be AD site aware and could potentially redirect clients based on whether its a read/write to either a local database or to a remote database.
1b. If you have a proxy, there’s probably a lot of things you could do (not already listed below).
2. How about a sharding type of model? Rather than a database being a 1:1 per SQL server, have a database that spans multiple database servers.
3. Have jobs linked as part of a database availability group, not part of a SQL server so when fail over occurs, so do the jobs
4. Load balancing (multiple readers).
4a. If you have a SQL server with 1,000’s of DB’s and 10+ AAG’s, perhaps recommending the best placement for active load based on performance stats (think VMware’s DRS for SQL)
5. Lagged replication. Continually replicate, but only replace logs older than 30 minutes. Might save you the need of having to restore a database.
6. Play well with snapshot. MS made some some major changes in active directory so that it could support snapshot based rollbacks, would be nice to see a similar implementation.
7. Allow full backups from a secondary node.
8. For those still not using contained DB’s, automatically fix SQL login SIDS on the destination DB, or at least throw a warning of a SID mis-match.

Reply
- Michael Williams
  November 25, 2014 12:15 pm
  
  +1 on #8 above.
  
  Reply
- Josh
  April 13, 2015 11:49 am
  
  +1 on #3 and #8
  
  Reply
Kevin
November 25, 2014 12:03 pm

Update SSMS to allow the creation/modification of AGs containing databases encrypted with Transparent Data Encryption. Consider adding steps to the wizards to check for the existence of the certificates on each instance hosting the AG and to add them if they are missing.

http://msdn.microsoft.com/en-us/library/hh510178.aspx

Reply
Vinay P
November 25, 2014 12:05 pm

It would helpful if the RegisterAllProvidersIP is set to to 0 when the listener is created from within SSMS/Availability group instead of being set to 1 and the Validation reporting alerting about it to be changed to 0.

Reply
- Mihir
  November 25, 2014 1:02 pm
  
  I would take this even further, requesting easier overall configuration at the cluster level. After a few tries it’s second nature, but one’s initial setup can be a bit rocky with all the config items required.
  
  Reply
tobi
November 25, 2014 12:30 pm

– Give me a next-next-next wizard that does everything required. Give me the simplicity of mirroring.
– Support MSDTC. How are we supposed to do highly-available DTC transactions? I believe shared storage is the only option right now.

Reply
Elijah W. Gagne
November 25, 2014 12:41 pm

I would second support for MSDTC.

I would also like AGs to support updating BIND DNS and not just MS DNS.

Reply
- Mihir
  November 25, 2014 12:52 pm
  
  +1 on the MSDTC support. It’s a major disappointment that a pervasive MS technology like DTC is not supported under their flagship SQL HA/DR solution.
  
  Reply
  - Nic
    November 26, 2014 11:11 am
    
    The problem with DTC is that you cannot guarantee that the transactions for different databases will all commit on a secondary in the event of a failure of the primary. It is possible that database A will have everything committed but database B have only a third committed on the secondary.
    Transactions live interdependently for each database, and are moved as a part of AGs for each. It’s not great, but the alternative is worse, where all transactions would be greatly slowed down as you would have to wait for all commits to complete before sending to a secondary, and also have to force order across all databases. This would lead to massive queuing issues and sync replicas just would not work.
    
    Reply
Nic Neufeld
November 25, 2014 1:01 pm

It would be nice if there was a way to make the readable secondary concept more useable…with a listener more intelligently filtering than the connection string ApplicationIntent bit. Not sure how that would be possible, just a general sort of wish…since it gets routed at connection time to one or the other server I can’t see how it would divine it without being able to read the future and verifying that it will not, in the future, issue an update statement. So, like I said, maybe a simple declaration of application intent is all that can be done, unless you do a sweeping rebuild with some kind of front end service being the interface for all traffic and redirecting based on specific query.

I’m also a bit dissuaded from the readable secondary model by the prospect of versioning tags getting appended on the primary…perhaps a minor impact but still, the secondary is affecting the primary in a way that log shipping for example doesn’t. If there’s a way to make that have zero impact on the primary, that would be cool.

Simplifying the “domain admin” side of things would be helpful…for those of us who don’t have domain admin access, setting up all the DNS / Active Directory stuff can be complicated when we don’t have access to do it, and we have to enlist a sys admin who may or may not be interested enough to work hard to figure out the setup for us. So the easier we can make it on them (“run this Microsoft wizard and enter in these values”) the better for us. Maybe if I was setting it up regularly this wouldn’t be an issue, but it was a pain, the first time anyway, with lots of lost time going back and forth between me and a network engineer.

We’re only using AGs on one instance…there just isn’t a big enough NewBenefits / NewProblems ratio for us to dump mirroring and log shipping for it, but once our vendor apps start to get upgraded to 2012 and beyond we’ll be looking much harder at it….

Reply
Jeff
November 25, 2014 1:34 pm

Can we call it something simpler, like, multiple mirroring? Or readable replication? Distributed replication? Something that actually tells you what it does? The name right now makes it a more difficult sell because nobody (who hasn’t already researched it) is really sure what that means.

Reply
- Brent Ozar
  November 26, 2014 6:33 am
  
  Jeff – hmm, that’s interesting. The terms you’re listing already have meanings – do we really want to confuse people by making them think AGs are related to mirroring or replication?
  
  Reply
Matthew Harkins
November 25, 2014 1:48 pm

I’m not a DBA, I’m a SharePoint admin that has found himself in a DBA role.
I’d like Microsoft to publish a troubleshooting guide or wizard; Something that I can run or even just follow after configuration to learn or be told that my DNS or AD isn’t setup properly for this multi-subnet AG and here is how you can correct this. Or, my AG listener isn’t accessible from this host – try x, y, z to fix this. Or permissions are incorrect on this DNS/Cluster/Virtual resource – I see you have used service account x for y, please add permission z to object q to complete this configuration.

I want SQL Management Studio to remember that I set the Extended Option for MultiSubnetFailover=True the next time I open SQLMS and try connecting to my AG.

Put more of the configurations in one place. I want to be able to adjust my cluster hearbeat timeout from within SQL Management Studio options where I am making all the other changes for the AG – the dashboard seems like a logical place to me.

I second Kevin’s suggestion of better support for TDE databases.

When I add a DB to an AG, I’d like the Database logins to be kept in sync between hosts.

Reply
- Brent Ozar
  November 26, 2014 6:32 am
  
  Matthew – the troubleshooting wizard is interesting. Take Windows Clustering as an example – it took Microsoft nearly a decade to really make a phenomenal Cluster Validation Wizard, and even now it’s rough around the edges (but it’s wonderful). There’s still no Windows clustering troubleshooting wizard – you have to drop down into PowerShell and look at logs. Given how much more popular Windows clustering is, and how much easier it is to make investments there, I wouldn’t expect to see troubleshooting wizards for a more niche technology like AlwaysOn AGs.
  
  Reply
C Jewel
November 25, 2014 3:29 pm

Our AG set up has encountered issues I wish were resolved. Some of them are have already been mentioned in other comments.

1. Sharing of system dbs or a way of automatically synching logins and jobs across nodes. Not having jobs synced is a PITA. Having to devise a scheme of checking node condition (primary? then proceed, otherwise not) isn’t very efficient either. If we failover there are jobs that should continue to run seamlessly. I don’t understand why there isn’t an out-of-the-box method of dealing with this reality. This applies as well to SSIS packages.

2. It is possible, sadly, to configure an AG set up such that no backups take place. One of our junior DBAs did that, and it happened again with our COLO facility administrator who should have known better. This happens when the primary is set to prefer secondary for backups and the secondary is set to prefer primary. MS needs to log this as a bug and at least warn that will happen, but a no-backups condition shouldn’t be permitted — because in this state, even the Hallengren backup jobs run without error — without actually doing a backup.

3. Service Broker adds additional risks and dangers because if it’s not on and you need it on and/or someone is unaware that you must re-enable SB after a restore, that DB must be pulled out of the AG group, SB enabled, and then returned to the AG group.

I often wish we were in a clustered environment instead of AG. I feel there are too many undocumented gotchas at the moment that make it really hard to design a set up that requires the least possible manual intervention.

Reply
- Brent Ozar
  November 26, 2014 6:30 am
  
  C Jewel – heh heh heh, yep. The backup story makes me chuckle because I see that a lot – not just with the built-in functions, but people getting fancy with their own backup logic and job alerting.
  
  This feature isn’t the easy button – it’s a button, and it’s a really powerful one, but it ain’t the easy button.
  
  Reply
Ian
November 25, 2014 5:38 pm

I would like to see an expansion of the contained database concept – specifically SQL Agent jobs. Being able to pull jobs along with the databases when moving an availability group from one server to another would be extremely helpful for DR planning. While not impossible to manage but being able to keep jobs synchronized between servers and automatically enabled when the primary database role moves from one server to another. Additional instance level objects (linked servers) would be helpful to have stored within the database rather then system databases.

Reply
- Brent Ozar
  November 26, 2014 6:27 am
  
  Ian – agreed, that’s something I hear a lot too.
  
  Reply
- John Q Martin
  November 26, 2014 10:47 am
  
  Thanks for the boot up the rear end, been speaking to people about this and meaning to post a connect suggestion on at least getting jobs into contained databases.
  
  Just got round to doing it 🙂
  
  https://connect.microsoft.com/SQLServer/feedback/details/1040183
  
  Reply
  - Rob Nicholson
    February 8, 2015 7:47 pm
    
    Been wanting this for ages. I went to vote on the Connect Item but noticed I already did this a couple months back, lol.
    
    The implementation of Contained Databases with Agent jobs would have to be quite restrictive e.g would dynamic SQL be allowed? What happens if the job accesses another (db outside the AG/not contained), What happens if an external reference (SSIS file path, batch file) are referenced? What happens when the job is updated (does that get propagated to the secondaries straight away?)
    
    Reply
Chad Tether
November 25, 2014 7:17 pm

I know it sounds small and stupid but I’d love a trace flag to be able to stop the error message ‘The target database, ‘xxxx’, is participating in an availability group and is currently not accessible for queries’ being written to the Error Log.

Reply
- Brent Ozar
  November 26, 2014 6:27 am
  
  Chad – that’s actually probably totally doable! Have you filed a Connect request for it? Sometimes these small/easy changes are the easiest ones to get in – but make sure you explain the business need for it in the Connect item. They’re not likely to do it unless they understand why you want it.
  
  Reply
  - 7gartner
    December 5, 2014 12:05 am
    
    Connect is a giant waste of time
    
    Reply
    - Sam Jones
      December 5, 2014 12:26 am
      
      +1.
      
      I have found no effective way to interact with MS for years. Even when we get SRX numbers, and clear regressions (thinking of an issue related to Win OS just now), it is a black hole.
      
      Reply
Richard Slim
November 25, 2014 9:34 pm

I would like to see the reliance on Failover Clustering removed and for AG’s to exist purely in an instance level and not at a server level. I would love to have an instance where all of my replica db’s go but due to a server having to be added to each cluster this makes it near impossible to consolidate and instead have to go to small disparate vm’s for my readable replica’s.

Also fix AG’s in Azure, the current limitations make it impossible for us to have automatic failovers in Azure.

Reply
- Brent Ozar
  November 26, 2014 6:26 am
  
  Richard – yeah, unfortunately Microsoft employees have repeatedly said at conferences during Q&A that the reliance on Windows clustering is never going to go away.
  
  Reply
  - Nic
    November 26, 2014 11:14 am
    
    If you take away the reliance on Windows Clustering you’ve basically got mirroring again.
    It’s at least a more robust quorum model than a witness server in mirroring, although it does come with added issues, like requirements for a domain, AD being a single point of failure etc.
    That’s why I preach that AGs can not be your only technology for business continuity and why you’ll have to pry log shipping from my cold dead hands.
    
    Reply
Nathan Jolly
November 26, 2014 12:00 am

I’d like to see containment offered the AG level as well as the database level, with (server-level, preferably) logins and jobs that are part of the AG.

I’d also love to see a better method of restoring a database to an AG. On a clustered SQL server, doing a RESTORE WITH REPLACE takes a single line of T-SQL and can hardly go wrong. An AlwaysOn server required a dozen lines of PowerShell and can very easily result in a broken AG.

Reply
- Brent Ozar
  November 26, 2014 6:25 am
  
  Nathan – great points, I agree!
  
  Reply
Pini Krisher
November 26, 2014 3:49 am

Hi
When the secondary is disconnected from the Master in term of networking (the secondary is geo replicated), the the DB in the Secondary become ‘Recovery Pending’.
i want it to be online all time
pini

Reply
- Brent Ozar
  November 26, 2014 6:24 am
  
  Pini – that’s already fixed in 2014. (By “online” I assume you mean readable.)
  
  Reply
Nic
November 26, 2014 11:17 am

One thing that nobody seems to have mentioned yet, and maybe they just aren’t encountering, is scale.

Right now AGs do not scale with high transaction throughput and high numbers of replicas. This is a huge issue that only kills you when you hit it (and it’s not something you want to hit). I’ve seen a million ms of sync_commit waits in a second with AGs, for machines with single millisecond network latency on flash storage.

Reply
captrench
November 27, 2014 7:32 am

The Log_Send_Rate column in the sys.dm_hadr_database_replica_states is incorrectly defined as the “Rate at which log records are being sent to the secondary databases, in kilobytes (KB)/second”.

It should actually read as “the rate at which log records are cleared from the send queue”.

Log records can only be cleared from this queue when they have already been hardened on all secondary’s, and that can only happen when they have already been sent, irrespective of how long it took those records to arrive, and how long it took them to be hardened, and how long it took for the secondary to send the acks back to the primary.

That’s a very different definition from “Rate at which log records are being sent to the secondary databases”. Cosmetically, they are easily confused when your primary and secondary are on the same subnet/datacentre with multiple gigabit lan connections. But the inaccuracy gets easier to notice with multi regional/cross continental/subnet setups, where your latency between regions is never going to equate or relate to the speed at which a local in memory queue can be cleared (the send_queue).

Reply
Sam Jones
December 3, 2014 11:06 pm

(I work at a Gold ISV partner and have worked with SQL Server since 4.21 on NT 3.whatever)

Issue: Setting up db replicas in SQL Server (mirroring, log shipping,
always on) is simply too hard and is no longer state of the art (from an ease of setup/admin/monitoring perspective).

Background: SQL Replication options:

Mirroring: Awesome. The only way to fly is high performance mirroring, but that is now deprecated
(Yes, we are now on SQL 2014)

Always On: Awesome, but so complex, expensive and so dependent on Win
OS stuff that I will never be able to actually do it in my lifetime (I
am 47). More on this below.

Log Shipping: Works. I have used it extensively. Hassle to set up. Too
many moving parts. Too kludgy to monitor.

(RE: I am a part part time DBA and always will be.)

** What is Missing in SQL/Always On: Replication as easy as mongodb: **

I want it to be as easy to set up log shipping / async mirror as it is
in mongodb ( http://docs.mongodb.org/manual/tutorial/deploy-replica-set/ )

(pseudeo code commands to create a replica on mongodb…. I have done
it, and wow, it makes me say “why can’t I do this in SQL Server?):

Steps:
1) Install SQL Server (any edition) on ip: BOX1 // master
2) Install SQL Server (any edition) on ip: BOX2 // read replica
3) on box1, run commands:

replica.AddTarget BOX2
replica.CheckTarget BOX2, myhugedatabase // do some testing that
myhugedatabase will fit on BOX2, that there is connectivity, etc.
replica.StartReplica BOX2, myhugedatabase // make it happen
replica.ShowStatus // tell me what is going on

I can do the above on mongodb. It works. It is like magic.

Really need it on SQL Server….

In summary:

1) Make mirror setup as brainless to set up as mongodb and redis
2) Make it easy to monitor
3) Make it work with minimal OS dependence. (My servers are stand alone, not in a domain. One is a vmware VM in one DC, hot backup is in the same rack, a VM in a domain, but I do not admin the domain at all; off site server is an Amazon AWS instance, on site replica in our office is a vmware vm)

Example: Redis is simple to set up one to many replication (for read only replicas). It is so so easy. We can have backup instances in the rack and in AWS set up in minutes (no kidding, it is amazing). SQL Server should be just as good.

Long Time SQL dude
SQL Server dba/project lead/ops lead/currently managing a
600gb SQL 2008/2014 app behind a big .NET app

Reply
Howard
December 30, 2014 3:42 pm

I would like to see three targets instead of two for automatic failover of AG. Local data center failover between nodes when possible but if the data center is down, failover to the DR data center. Currently I have HA through FCI in data center 1 and manual failover through AG to data center 2 for DR.

Reply
- Brent Ozar
  December 31, 2014 8:00 am
  
  Howard – your bigger concern there will be automatically failing over outside of the FCI. I don’t see that happening anytime soon. (Kevin Farlee was asked about this at the Intersection conference and he pointed out there’s just not a way to do that.)
  
  Reply
  - Howard
    January 5, 2015 5:02 pm
    
    I would be happy to remove FCI from the configuration and do it with pure AG but currently I can only have two automatic failover replicas even though I can have three synchronous replicas. I’m hoping in the future for three auto failover replicas with sameSubnet preferred over crossSubnet.
    
    Reply
Marvin Magdaluyo
January 16, 2015 2:58 pm

BTW, TDE will be used to encrypt all databases in DR Node “node 3”

Again, thanks!

/Marvin

Reply
Marvin Magdaluyo
January 16, 2015 3:09 pm

Hi Brent,

Hope you’re having a good day!

Just want to ask some query. Recently, I setup an AGs with 3 nodes, the 2 nodes “synchronous” are on the same domain “domain 1″, and the 3rd node”asynch — DR node” is on different domain “domain 2”. I have a new requirement, this is to encrypt all databases using TDE on 3rd node “DR Node only”, will this be possible?

I’m assuming that it is possible but I want to ask you’re expert opinion and what will be the impact of this setup?

Thanks in advance!

Regards,
Marvin

Reply
- Brent Ozar
  January 17, 2015 2:17 pm
  
  Marvin – as much as I’d love to do personal support here on the blog comments, I can’t really do that. Your best bet will be to post questions on Q&A sites like http://dba.stackexchange.com. Thanks!
  
  Reply
Rob Nicholson
February 8, 2015 8:33 pm

Just a really small change. I would like an easier way (through sys.databases) to identify whether a secondary database is readable/accessible. Currently these databases are reported as ONLINE in the state_desc. Though this is technically correct a lot of 3rd party tools rely on ONLINE meaning ‘accessible’, when the tools proceed with running a query the SQL error log is flooded with “an availability group and is currently not accessible for queries.” messages for each database.

The argument against this enchantment will most likely be “update your scripts and 3rd party tools to cater for the new AG dmvs”. My reply to this is, it is not practical for the future. Getting vendors to update their tools can at times become a bit of a b.(ch. There will most likely be other features introduced (beyond AGs) that will require other dmvs, the custom scripts/tools would need to be updated every time (not that this happens too frequently anyway). Having a consistent column to determine whether a db is accessible would be handy 🙂

Connect Item:
https://connect.microsoft.com/SQLServer/feedback/details/903367

Reply
Andy @AMtwo
April 14, 2015 2:18 pm

For me, there are four things that easily top the wish list.:

1) Easy out-of-the-box automation for keeping Logins (and server-level permissions) in sync across nodes. Building homemade automation to do this, or manually keeping changes in sync is error-prone.

2) Easy out-of-the-box automation for keeping Jobs in sync, and running certain jobs only on the primary replica. We’re using an MSX to keep the jobs in sync (so that’s not too bad), and there are easy work arounds to make jobs run in the right place–but they are work-arounds. Having this be easier to manage would be…well…easier to manage.

3) More robust GUI for adding databases or nodes to an AG.
* If a regularly-scheduled log backup occurs while the GUI is adding a DB to the AG, the addition fails because the GUI doesn’t know about the other log backup.
* For a large DB, the backup/restore uses only a single backup file–backing up to multiple streams can make the process more efficient.
* Restoring the DB to the additional replicas using the existing Full/Log backups (ie, by looking up backup history in MSDB)

4) Having backup history in MSDB be aware of backups taken on a secondary replica. In the simplest case, you might need to hunt for the node performing backups to find the backup history.
In a more complicated case, it’s not difficult (especially during patching or rolling restarts) to have the backup preferences rotate between servers, and have your backup history spread across multiple servers. In order to generate a point-in-time restore script, you need to gather the backup history from multiple servers.

Reply
Chuck
April 6, 2016 8:26 am

Don’t take databases offline on the primary server in a cluster where all secondary replicas are asynchronous and the cluster loses quorum or connectivity breaks between the nodes.

Reply
Mark
June 23, 2016 11:35 pm

I think virtually all improvement areas have been covered off in this list, with some making there way in to the SQL 2016 RTM release. However I’d like to add the ability to add the distributor system db in to an AG. Presently the only way to provide HA for the single-point-of-failure distributor db is via FCI, which requires shared storage to be synched between data centers when you have more than 1 site. Yes there are 3rd party products that do this (from the storage vendors themselves through to products like Zerto and Veeam, all at extra cost) but it’d be nice to let AG do the replication work as it does today with user db’s (so I can maximize my significant investment in AG).

Reply
ManoShankar
July 22, 2016 5:59 am

Hi,

As per the business requirement, i need to copy datas from AlwaysOn secondary replica AG database to non AG database through agent job. By default agent account cannot use the parameter “ApplicationIntent=Readonly”. Anyone please advice how to read the datas from secondary replica.

Thanks,
ManoShankar

Reply
Cody
February 12, 2017 9:36 pm

SqlClient and other connection libraries should support retrieving multiple read-only replicas and selecting the fastest one like MultiSubnetFailover=True would for multiple IPs.

This is so that you can specify one connection string for applications which are load balanced across regions, and have regionally located AG replicas.

The example scenario is you have AGs in Australia for HA and DR. There are applications writing in Australia. However, you also need to read a lot of the same data in America so you set up an async replica there.

Now you have a problem. If applications are using read-only routing then they may start querying across the LAN. The alternative is that you can use the listener in Australia but a direct async server name in America but then you have no HA capability in America.

Almost all of the background information is there for SqlClient to hook all this functionality up for geo-located read-only replica routing, but, it doesn’t get done.

Reply
Ian
February 2, 2021 5:44 am

I’m suprised not to see this one on the list. If a CREATE DATABASE operation takes place via the SQL Listener, automatically create the database on all nodes and add it to the AG. If done in one transaction, it will leave it ready to replicate any subsequent operations.

Reply
- Chuck
  February 2, 2021 5:57 am
  
  I’d rather see it as an option on the CREATE DATABASE statement. Maybe add an availability group clause. But then again with automatic seeding how difficult is it to run “alter availability group…add database…”?
  
  Reply
  - Ian
    February 2, 2021 6:04 am
    
    I am working with a Sharepoint environment in which databases are constantly added and remove by Sharepoint at the users’ whim. In such a dynamic environment it is hard to keep up with all the changes. We find automatic seeding flakey. And I’ve not been able to make any of the various AG add scripts work reliably.
    
    It just strikes me that if SQL Server were to do the job for you behind the scenes, it would be so much better, rather than rely on manual intervention or home brew solutions.
    
    Reply
Ronald van Rijn
April 1, 2022 2:06 pm

We ‘re setting up various 4 cluster nodes with 1 synced readable copy that we would like to failover to last as it is the readable copy. So, we ‘ve set the preferred node list, but during failover, the AG software decides (resets) the preferred node order. What is the point in being able to set this (set-clusterownernode), if it is reset by the AG ? I know about the workaround, but we do not build this ourselves. It is ordered and preinstalled and configured. If we have the oppertunity to set the order, this should be leading.

Reply
- Ronald van Rijn
  April 1, 2022 2:07 pm
  
  eh, 4 node clusters
  
  Reply