Today’s brief Stack Overflow outage reminded me of something I’ve always wanted to blog about:
There’s a gray bar across the top that says, “This site is currently in read-only mode; we’ll return with full functionality soon.”
That’s not a hidden feature of Always On Availability Groups. Rather, it’s a hidden feature of really dedicated developers whose application:
- Tries to connect to the SQL Server, and if that fails, shows a total-site-down page
- Checks to see if the database is writeable via DMVs like sys.dm_hadr_availability_group_states and sys.dm_hadr_availability_replica_states
- If it’s not writeable, shows a polite banner across the top of the site, and still gracefully runs readable queries only
This is no small feat. For example, if you’re calling stored procedures to render a page, those stored procedures cannot do any writes. You can’t log page views inside a stored proc, or log execution times. You have to assume that the database may not be writeable. (That’s great practices anyway, since you can then use replicas to scale out all page rendering.)
For bonus points, your application can still do logging and writes – but just not to SQL Server. You might cache writes temporarily to a NoSQL store.
This is all left as an exercise for the reader. Always On Availability Groups give you a limp-home capability for your web site, but it’s up to you to make your web site as functional as possible when the database is degraded.
Is the replica for Stack Overflow is running in async replication (or in sync without auto-failover) and therefore can’t failover the writes? I’m trying to work out how they have a read-only node available but no writable node.
Greg – yeah, all async replicas. You can learn more about that here:
Did they write their apps this way from the start? Or did they at some point stop and refactor everything? It really is impressive.
Alex – when Stack Overflow was first built, it was back on SQL 2008. We didn’t have high availability back then – we used just a single SQL Server, with another server restored from backups nightly.
You can see a blast from the past about that infrastructure here:
Oh yeah, now I remember reading that post way back. It makes it all the more impressive, stopping the business and saying — hey, we’re not developing new features, we’re refactoring our apps so we can do this thing. The business must see the availability and scaling as a top priority feature, and that’s not trivial at all.
I’m curious what does DATABASEPROPERTYEX(DB_NAME(), ‘Updateability’) returns for these read-only replicas — hope it’s not