When people buy my virtualization training video, one of the followup questions I get most often via email is, “Can I build SQL Server clusters in VMware and Hyper-V?”
In theory, yes. Microsoft’s knowledge base article on SQL Server virtualization support says they’ll support you as long as you’re using configurations listed in the Server Virtualization Validation Program (SVVP).
But the real question for me isn’t whether or not Microsoft supports virtual SQL Server clusters.
The question is about whether you can support it.
Us geeks usually implement clusters because the business wants higher availability. Higher availability means faster troubleshooting when the system is down. We need to be able to get the system back up and running as quickly as possible. Getting there usually means reducing the amount of complexity; complex systems take longer to troubleshoot.
Adding virtualization (which also means shared storage) makes things much tougher to troubleshoot. If the business wants a highly available SQL Server, ask yourself these questions before virtualizing a SQL Server cluster:
- Do you have a great relationship between the SQL Server, storage, and network teams?
- Do all of the teams have read-only access to each others’ tools to speed up troubleshooting?
- Do all of the teams have access to the on-call list for all other teams, and feel comfortable calling them?
- Do you have a well-practiced, well-documented troubleshooting checklist for SQL Server outages?
- Does your company have a good change control process to avoid surprises?
- Do you have an identical environment to test configuration changes and patches before going live?
If the answer to any of those questions is no, consider honing your processes before adding complexity.
But the Business is Making Me Do It!
They’re making you do it because you haven’t clearly laid out your concerns about the business risk. Show the business managers this same list of questions. Talk to them about what each answer means for the business. Was there a recent outage with a lot of finger-pointing between teams? Bring that up, and remind the business about how painful that troubleshooting session was. Things will only get worse under virtualization.
To really drive the point home, I like whiteboarding out the troubleshooting process for a physical cluster versus a virtual cluster. Show all of the parts involved in the infrastructure, and designate which teams own which parts. Every additional team involved means longer troubleshooting time.
Once the business signs off on that increased risk, then everyone’s on the same page. They’re comfortable with the additional risk you’re taking, and you’re comfortable that you’re not to blame when things go wrong. And when they do go wrong – and they will – do a post-mortem meeting explaining the outage and the time spent on troubleshooting. If the finger-pointing between the app team, SQL Server DBAs, network admins, virtualization admins, and sysadmins was a problem, document it and share it (in a friendly way) with management. They might change their mind when it’s time to deploy the next SQL Server cluster.