Maybe You Shouldn’t Even Be Using Clustering or AGs.
Sandra Delany (LinkedIn) wrote a well-thought-out blog post called, “Should a SQL Server DBA Know Windows Clustering?” She’s got about 20 years of DBA experience, and she works for Straight Path (a firm I respect) as a consultant. You can probably guess based on her background that yes, she believes you should know how to set up, configure, and troubleshoot Windows clustering. It’s a good post, and you should read it.
But… I don’t agree.
Two things. First, I have a problem with any blog post (even my own) that says, “If you call yourself an X, you should definitely know Y.” The term “DBA” encompasses a huge variety of jobs, held by people with a huge variety of seniority levels. If someone’s in their first year in a DBA job, maybe even their fifth, I don’t necessarily expect them to know Windows clustering.
For example, I once (briefly) worked at a global company where the DBAs weren’t even given permissions to glance at the cluster. If the SQL Server service wouldn’t start, the DBAs had to transfer the issue to the Windows team, who handled clustering. The company’s logic was that clustering is a foundational part of Windows, and it doesn’t really have anything to do with SQL Server – and in fact, it’s reused by other cluster-aware applications. Cluster troubleshooting is a giant pain in the ass that involves Windows logs, DNS, IP addresses, etc., all things that DBAs aren’t good at – but the Windows team is (or at least should be).
Which brings me to another blog post…
Chrissy LeMaire (LinkedIn – Bluesky), the creator of the powerful and popular dbatools PowerShell module, wrote a solid blog post called Have You Considered Not Using SQL Server High Availability? You should read that post too.
A lot of you are full time database administrators, and you’re already taking a deep breath in anticipation of yelling back at the screen, but hang on a second.
First, your SQL Servers still all need disaster recovery. That’s different. When we say disaster recovery, we’re usually talking about things like native SQL Server backups, log shipping, storage replication, etc. These are techniques that you can use to rebuild the SQL Server in a different place, like after a ransomware attack or a natural disaster. Nobody’s suggesting you get rid of that.
We’re specifically talking HA features here: failover clustered instances (FCIs), Always On Availability Groups, and database mirroring. Chrissy writes about the operational challenge of those features.
HA features require 2 kinds of labor from 4 kinds of people. Chrissy’s post points out that the teams who manage the databases, the storage, Active Directory/DNS, and networking all have to get involved. I’d add that it requires 2 kinds of labor from all of these people: both planned, and unplanned/chaotic. When there’s a production database outage, there’s a lot of finger pointing, and management demands that everybody drop their work and jump into conference calls. Everybody starts stabbing at various switches and dials, groping blindly and wasting time, until things go back to normal – and the next server has its next emergency.
Small companies don’t have 4 kinds of people. They just have a core handful of IT people who do everything. They’re experts in how the entire stack is configured at this shop, but they’re not experts in all of the underlying technologies. When things go wrong, the work is usually single-threaded, dependent on the one person unlucky enough to be on call. That person is even more likely to stab at various switches and dials, making unplanned, chaotic changes that end up making the environment even less stable over time.
Virtualization provides pretty dang good HA for many failures, in many shops. Properly configured, it protects you from individual host hardware failures and single network cable problems. No, it doesn’t protect you from a bad Windows or SQL Server patch, but in small shops, they don’t do a lot of patching anyway. (Ease up on the outrage – I’ve seen your SQL ConstantCare® data, I know you’re several CUs behind, and I know you’ve muted that recommendation to patch.)
Virtualization HA is easier to manage. It’s just one technology that works to protect all of your VMs. That’s less learning that the overworked staff have to do, and besides, they have to learn it anyway to protect the rest of your servers. As long as they’re using it for everything else, they might as well lean on it to protect SQL Server as well.
So when clients are talking to me about easy ways they can improve their uptime, and they’re already running SQL Server in a VM, we take a step back and look at their virtualization high availability setup. If that’s working well, I explain the part about the net-new planned & unplanned work for all the different roles, and then I ask about their on-call rotations, and their plans to hire more staff in order to handle this net-new work.
If they haven’t been adding staff, and don’t plan to, then I’d rather have the staff focus on improving their ability to rapidly restore SQL Server backups, provision & configure new servers, and just generally automate their environment. That’ll come in handy more often, and it helps with disaster recovery, ransomware rebuilding, recovering from “oops” delete queries, and reacting to day-to-day issues in general.
In summary:
- If you don’t know how to troubleshoot DNS, file shares, Windows cluster validation, or PowerShell, then Chrissy’s blog post is right for you, and you should probably try virtualization for high availability.
- If your company’s HA/DR needs require Availability Groups and/or failover clustered instances, then Sandra’s blog post is right for you, and you probably have a big learning journey ahead.