If you’re using SQL Server 2012’s hot new AlwaysOn Availability Groups feature, your databases will go offline when your network connection does – even if you’re using asynchronous replication.
This is not a bug. This is working as designed – and it’s important to understand the underlying concepts.
AlwaysOn Availability Group functionality relies on Windows Failover Clustering technology to know when things are going well, or when the poop has hit the fan. A core concept of failover clustering is quorum – the voting mechanism that lets each individual node understand whether it’s online or isolated. Windows Server 2008 has a variety of quorum methods, but in the vast majority of configurations, each server needs to be able to see the network in order to reserve its IP address and network name, and to see the other nodes in the cluster.
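To make the voting idea concrete, here is a minimal sketch of a strict-majority check. It is a simplification, not how Windows Failover Clustering is actually implemented: the function name `has_quorum` and the plain vote counts are assumptions for illustration, and the real quorum methods differ in which members get a vote.

```python
def has_quorum(votes_seen: int, total_votes: int) -> bool:
    """Return True if this side of the cluster holds a strict majority.

    votes_seen  -- votes this node can currently reach, counting its own
    total_votes -- all votes configured in the cluster (hypothetical model)
    """
    if total_votes <= 0 or not 0 <= votes_seen <= total_votes:
        raise ValueError("votes_seen must be between 0 and total_votes")
    # A strict majority means more than half of all configured votes.
    return 2 * votes_seen > total_votes


# A node cut off from a two-vote cluster only sees its own vote.
print(has_quorum(1, 2))
# With a third vote (for example a witness), the side that still sees
# two of the three votes keeps quorum; the isolated node does not.
print(has_quorum(2, 3))
print(has_quorum(1, 3))
```

In this toy model, a node that loses its network connection can only count itself, so it can never conclude that it is the healthy side of the cluster.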
Take the following scenario:
- SQL2012PROD1 – primary active node. All read/write connections are going here.
- SQL2012PROD2 – secondary node with asynchronous replication. I could let users query this server, or just let it run in standby – it doesn’t matter for this scenario.
If I disconnect the network connection for SQL2012PROD1 – even for a brief moment – all of the databases in my availability groups roll back all open transactions and then go offline. The informational messages in the SQL Server event log are shown at right for humor purposes.
The SQL Server itself is still up – but the databases in the availability group aren’t, because they have a dependency on the availability group’s listener and IP address. Since those aren’t available without a valid network connection, the databases are taken offline.
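The dependency chain described here can be sketched as a tiny graph walk. This is only a toy model of the behavior in this post: the resource names and the `DEPENDS_ON` map are made up for illustration and do not reflect the real cluster resource names or any actual API.

```python
# Hypothetical dependency map: each resource lists what it needs online.
DEPENDS_ON = {
    "ip_address": ["network"],
    "listener": ["ip_address"],
    "availability_group": ["listener"],
    "database": ["availability_group"],
    "sql_server_instance": [],  # the instance itself has no network dependency here
}


def is_online(resource, failed, depends_on=DEPENDS_ON):
    """A resource is online only if it and everything it depends on are up."""
    if resource in failed:
        return False
    return all(is_online(dep, failed, depends_on)
               for dep in depends_on.get(resource, []))


# Pull the network cable on the primary:
failed = {"network"}
print(is_online("sql_server_instance", failed))  # instance stays up
print(is_online("database", failed))             # AG database goes offline
```

The point of the sketch is that the failure propagates upward: nothing is wrong with the database itself, but one missing link at the bottom of the chain takes it offline.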
This is a dramatic departure from database mirroring or replication on a standalone (not clustered) database server. Both of those technologies leave the primary SQL Server’s databases up and running when the network drops.
Bottom line – when deploying AlwaysOn Availability Groups, make sure you understand the risks of everything presented in the cluster validation wizard. Some of the alerts (like some storage alerts) can be skipped for shared-nothing AlwaysOn Availability Group deployments, but others (like the network redundancy alert) definitely can’t. Just a general tip: mission-critical SQL Servers should be connected to two separate network switches, as should all other mission-critical servers. That way when one network switch fails, the mission-critical servers can still all talk to each other and nothing will go down.