Availability Groups: More Planned Downtime for Less Unplanned Downtime

Last Updated September 21, 2017

Always On Availability Groups, Breaking News, SQL Server

I often hear companies say, “We can never ever go down, so we’d like to implement Always On Availability Groups.”

Let’s say on January 1, 2016, you rolled out a new Availability Group on SQL Server 2014. It’s the most current version available at the time, and you deploy Service Pack 1, Cumulative Update 4 (released 2015/12/22). You’re fully current, and it’s a stable engine from 2014 – how many more bugs can they find, right?

Here’s what your patching schedule would look like:

2016/02/22 – Cumulative Update 5 – corrupted columnstore indexes when AG fails over, stack dumps on AG secondaries.

2016/04/19 – Cumulative Update 6 – non-yielding schedulers during AG version cleanup, FileTables unavailable after AG failover, canceling a backup causes the server to crash (not related, but cringeworthy) – whew! This one has a lot of big fixes. We should definitely apply this.

2016/05/31 – OH SNAP! CU6 broke NOLOCK. Sure hope you didn’t apply that. Time to take another outage to apply the revised version.

2016/06/21 – Cumulative Update 7 – SQLDiag fails in AGs. You could probably skip this one if you don’t use SQLDiag, and most shops don’t.

2016/07/11 – Service Pack 2 – improved lease timeout to prevent outages, filestream directory not visible after a replica is restarted (wait I thought we fixed that in CU6? no wait that was FileTables), missing error numbers in XE.

2016/08/26 – Cumulative Update 1 (for Service Pack 2) – memory leak on AGs with change tracking, error 1478 when you add a database back into an AlwaysOn availability group (sic).

2016/10/18 – Cumulative Update 2 (for Service Pack 2) – no AG fixes, woohoo!

What do you mean there’s only one engine?

That’s 5-7 patch outages in 11 months (and I’m not even listing all of the fixes in these, which include things like incorrect results bugs, plus awesome new DMV diagnostic features that you definitely want.)
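If you want to check that arithmetic, here's a minimal Python sketch that tallies the schedule above. The dates come straight from the list; the short labels (like "SP2 CU1") and the `skippable` flags are my own shorthand, assuming CU7 and the last CU are the only ones a shop might reasonably skip. `GO_LIVE`, `RELEASES`, `outage_range`, and `gaps_in_days` are made-up names for illustration.

```python
from datetime import date

# Go-live date from the scenario above.
GO_LIVE = date(2016, 1, 1)

# (release date, shorthand label, skippable?)
# The skippable flag is an assumption: CU7 only matters if you use SQLDiag,
# and the last CU lists no AG fixes.
RELEASES = [
    (date(2016, 2, 22), "SP1 CU5", False),
    (date(2016, 4, 19), "SP1 CU6", False),
    (date(2016, 5, 31), "SP1 CU6 (revised)", False),
    (date(2016, 6, 21), "SP1 CU7", True),
    (date(2016, 7, 11), "SP2", False),
    (date(2016, 8, 26), "SP2 CU1", False),
    (date(2016, 10, 18), "SP2 CU2", True),
]


def outage_range(releases):
    """Return (minimum, maximum) patch outages for the schedule."""
    required = sum(1 for _, _, skippable in releases if not skippable)
    return required, len(releases)


def gaps_in_days(start, releases):
    """Days between consecutive releases, starting from the go-live date."""
    dates = [start] + [released for released, _, _ in releases]
    return [(later - earlier).days for earlier, later in zip(dates, dates[1:])]


min_outages, max_outages = outage_range(RELEASES)
gaps = gaps_in_days(GO_LIVE, RELEASES)

print(f"Patch outages: {min_outages}-{max_outages}")
print(f"Days between releases: {gaps}")
print(f"Average gap: {sum(gaps) / len(gaps):.1f} days")
```

Swap in your own version's release dates and the same tally gives you a rough idea of how often you'd need a maintenance window.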

Here’s the way I like to explain it to companies: if you have an airplane, it’s absolutely imperative that its engines not fail mid-flight. In order to accomplish that, you have to have regular downtime for mechanics to examine and replace parts – and that doesn’t happen up in the air. With Availability Groups, we’re lucky enough to be able to transfer our ~~passengers~~ databases from one airplane to another quickly – but we still have to have those other airplanes getting constant examinations and patches from mechanics.


10 Comments

Brandon M.
December 7, 2016 9:12 am

Hi Brent,

Great post as always. I can barely handle being on an airplane when everything goes as planned. I can’t imagine hearing “uh… hi guys… this is your captain speaking… we uh… we’re having some problems up here and we… uh… well … do you see that other plane over there flying dangerously close to us? Well… we uh… we’re gonna have to get you all over there like.. like right away… I uh… Women and children first, I guess. Smoke if you got ’em.”

You sure have been posting a lot of pics of you in that Oracle jacket lately. How has Microsoft not revoked your MVP card?

- Brent Ozar
  December 7, 2016 3:34 pm
  
  Heh heh heh – I gave up my MVP card, actually. Nothing against the program, was just time for a change.
  
Simon
December 7, 2016 11:14 am

I seem to be spending more and more time coming up with analogies to try and explain basic process concepts to people recently, but your airplane one there’s a thing of pure beauty. Have a drink.

Peggy
December 7, 2016 11:42 am

Liked the article. Good analogy.

Amanda
December 7, 2016 11:48 am

Brent,
Is it advisable to hold off on setting up AGs, since there seem to be so many issues?
Cluster environments can help us achieve Always On, right?

By the way, I always love to read your posts!

- Brent Ozar
  December 7, 2016 3:35 pm
  
  Amanda – I would just generally advise folks to find the simplest solution that meets their RPO and RTO goals. Always On Availability Groups is a fantastic feature – you just have to be armed with the right people and processes to tackle it.
  
Brian Knoblauch
December 7, 2016 12:14 pm

…and the more engines you have, the more likely you’ll have some kind of engine failure at some point.

Klaus Aschenbrenner
December 7, 2016 3:32 pm

Hello Brent,

Reminds me about my flight simulator: runs on Windows 7, and is *never ever* patched/updated. I don’t want to introduce *any* side effect through an update.

-Klaus

- Brent Ozar
  December 7, 2016 3:35 pm
  
  Klaus – makes sense! After all, if it works, that’s good enough! The OS is only there to provide services.
  
Lance Gabreil Zamora Villacrusis
June 24, 2021 2:37 pm

I like your analogy. Thank you so much for simplifying it.
