When Always On Isn’t: Handling Outages in Your Application

Last Updated February 9, 2017

Today’s brief Stack Overflow outage reminded me of something I’ve always wanted to blog about:

There’s a gray bar across the top that says, “This site is currently in read-only mode; we’ll return with full functionality soon.”

That’s not a hidden feature of Always On Availability Groups. Rather, it’s a hidden feature of really dedicated developers whose application:

Tries to connect to the SQL Server, and if that fails, shows a total-site-down page
Checks to see if the database is writeable via DMVs like sys.dm_hadr_availability_group_states and sys.dm_hadr_availability_replica_states
If it’s not writeable, shows a polite banner across the top of the site, and still gracefully runs readable queries only

This is no small feat. For example, if you’re calling stored procedures to render a page, those stored procedures cannot do any writes. You can’t log page views inside a stored proc, or log execution times. You have to assume that the database may not be writeable. (That’s great practices anyway, since you can then use replicas to scale out all page rendering.)

For bonus points, your application can still do logging and writes – but just not to SQL Server. You might cache writes temporarily to a NoSQL store.

This is all left as an exercise for the reader. Always On Availability Groups give you a limp-home capability for your web site, but it’s up to you to make your web site as functional as possible when the database is degraded.

Greg
January 24, 2017 10:56 pm

Is the replica for Stack Overflow is running in async replication (or in sync without auto-failover) and therefore can’t failover the writes? I’m trying to work out how they have a read-only node available but no writable node.

Reply
- Brent Ozar
  January 25, 2017 6:34 am
  
  Greg – yeah, all async replicas. You can learn more about that here:
  
  https://groupby.org/2016/12/keep-it-simple-make-it-fast-the-sql-server-strategy-at-stack-overflow/
  http://nickcraver.com/blog/2016/02/17/stack-overflow-the-architecture-2016-edition/
  
  Reply
Alex Friedman
January 25, 2017 4:12 am

Did they write their apps this way from the start? Or did they at some point stop and refactor everything? It really is impressive.

Reply
- Brent Ozar
  January 25, 2017 6:36 am
  
  Alex – when Stack Overflow was first built, it was back on SQL 2008. We didn’t have high availability back then – we used just a single SQL Server, with another server restored from backups nightly.
  
  You can see a blast from the past about that infrastructure here:
  https://stackoverflow.blog/2009/01/new-stack-overflow-server-glamour-shots/
  
  Reply
  - Alex Friedman
    January 26, 2017 7:58 am
    
    Oh yeah, now I remember reading that post way back. It makes it all the more impressive, stopping the business and saying — hey, we’re not developing new features, we’re refactoring our apps so we can do this thing. The business must see the availability and scaling as a top priority feature, and that’s not trivial at all.
    
    Reply
wqw
January 26, 2017 11:04 am

I’m curious what does DATABASEPROPERTYEX(DB_NAME(), ‘Updateability’) returns for these read-only replicas — hope it’s not READ_WRITE.

Reply