We love to help people plan for disasters
We’re not pessimists, we’ve just seen one too many servers go belly up in the middle of the night to think that having only one is a good idea. When people ask us to help them, we have to take a lot into consideration. The first words out of the gate are almost always “we’ve been thinking about Availability Groups”, or some bastardized acronym thereof.
Don’t get me wrong, they’re a fine thing to think about, but the problem is that usually people’s only exposure to them, if they have any exposure to them outside of thinking about them, is just in setting them up.
Usually with VMs.
On their laptop.
With a few small test databases that they just created.
And they’re really easy to set up! Heck, even I can do it.
But this is the worst way to judge how well a solution fits your team’s abilities.
Everything’s easy when it’s working
When things get hard, and when most people figure out they’re in way over their heads, is when something goes wrong. Things always wait to go wrong until you’re in production. In other words, driving a car is a lot easier than fixing a car.
If you don’t have 2-3 people who are invested mainly in the health and well-being of your Availability Groups, practicing disaster scenarios, failing over, failing back, and everything in between, you’re going to really start hating the choice you made when something goes bump in the night.
And stuff goes wrong all the time. That’s why you wanted HA/DR in the first place, right?
Play to your strengths
If you’re a team of developers, a lone accidental DBA, or simply a few infrastructure folks who don’t spend their time reading KB articles on SQL and Windows patches, testing those patches in a staging environment, and then pushing your app workload on those patches, you’re going to have a tough time.
Things that are still great:
No, you don’t get all the readable replica glamour, and the databases failing over together glitz, but you also don’t get to find out you’re the first person to Google the error your Availability Group started throwing at 2am, shortly before you, your app, and all your users stopped being able to connect to it.
Try scaling up first
If part of your move to Availability Groups is Enterprise licensing, get some really fast CPUs, and enough RAM to cache all or most of your data. You may not need to offload the stuff that’s currently a headache.
Try some optimism
Optimistic isolation levels like RCSI and SI can help relieve some of the burden from large reporting queries running over your OLTP tables.
Get your script on
No, I don’t mean getting your baby mama’s name tattooed on your neck. I mean scripting out parts of failing over Mirroring or Log Shipping so that it’s not such a bleary-eyed, manual process. Availability Groups don’t keep agent jobs, users, and other custom settings synced from server to server, so you’re going to have to figure that part out anyway.
Still interested in coming up with a HA/DR solution that works for you? Drop us a line!
Thanks for reading!
Doug says: And remember folks, HA/DR solutions sometimes differ between on-premises and cloud. Make sure the cloud features you want to use are fully supported.