Amazon Relational Database Service (RDS) now supports multi-availability-zone SQL Servers. This means you can have a primary database in one data center, and a secondary replica in another data center.
When your primary server goes down, OR when the entire AZ goes down, you can fail over to the secondary with zero data loss and keep right on going. When the primary comes back online, you can fail right back, again with no data loss.
Amazon uses replication for their MySQL, Oracle, and PostgreSQL multi-AZ feature, but they chose a different database technology when it came time for SQL Server. They’re using database mirroring, and there’s a few things you need to know before you rely on it.
Technically, database mirroring is deprecated. This means support will be removed in a future version of SQL Server. I’m here to say you shouldn’t care – if it meets your needs today, great. Keep right on using it, because it’s still around in SQL Server 2014, and that’s going to be supported for at least a decade. If the next version of SQL Server actually did remove mirroring support, you’d still have years to make Plan B.
Cross-database transactions are not supported. Distributed transactions and multi-db commits mean your application changes data in multiple databases inside a single transaction. Books Online explains the gotcha. Same thing here – if you’re already using Amazon RDS for SQL Server, and you’re doing cross-database transactions, and you don’t have a Plan B yet, you could start using multi-AZ RDS just to get yourself the beginnings of a safety net. However, you need to start working on
Plan B Plan C.
Amazon’s implementation picks safety over performance. They’re using synchronous mirroring, which means whenever you do a delete/update/insert:
- The data is written to the primary’s log file
- The data is sent across the network wire to the secondary
- The data is written to the secondary’s log file
- The confirmation is sent back to the primary
- The transaction is considered committed, and the app is informed
This is going to add latency to your existing application because steps 2-4 aren’t being done today. If you’re worried about performance, this means you have to pay particular attention to each of those steps.
SQL Server also offers asynchronous mirroring where steps 2 through 4 are done in the background without impacting throughput. However, you don’t get automatic failover with async – after all, you wouldn’t want to automatically fail over to a replica that has less data than your primary. Also, async mirroring is an Enterprise Edition feature, and that’s only available in RDS if you bring your own licensing.
Mirroring protects you from some storage corruption. When the primary reads a page off disk and finds corruption, it requests a clean copy of the page from the mirror. This works because mirroring sends log transactions, not data file pages, over to the replica. The secondary responds (hopefully) with a clean copy of the page, and SQL Server fixes the corruption with nary an error message or outage. It’s called automatic page repair, and it gives you another measure of insurance for database reliability.
And finally, mirroring is actually pretty cool for this. You’re probably using Amazon RDS to reduce your systems administration overhead. You want to get more availability and performance without having to run your own SQL Server, and that means you probably don’t have your own full time DBA. Database mirroring meets those needs pretty well, and I can see why Amazon picked it. You’re definitely going to see slower transactional throughput – but hey, that’s where provisioned IOPs comes in. The faster you want to go, the more quarters you put in the front.
I wouldn’t necessarily recommend database mirroring if you were going to manage this infrastructure yourself. Mirroring has some challenges around cryptic error messages and scaling to large numbers of databases, but neither of those are weaknesses in this particular implementation. But with Amazon managing it, and having a relatively small number of databases involved (less than 100), it makes perfect sense.