When people buy my virtualization training video, one of the followup questions I get most often via email is, “Can I build SQL Server clusters in VMware and Hyper-V?”
In theory, yes. Microsoft’s knowledge base article on SQL Server virtualization support says they’ll support you as long as you’re using configurations listed in the Server Virtualization Validation Program (SVVP).
But the real question for me isn’t whether or not Microsoft supports virtual SQL Server clusters.
The question is about whether you can support it.
Us geeks usually implement clusters because the business wants higher availability. Higher availability means faster troubleshooting when the system is down. We need to be able to get the system back up and running as quickly as possible. Getting there usually means reducing the amount of complexity; complex systems take longer to troubleshoot.

If this is how your SAN team, VMware team, and DBA team hang out, you’re good with virtual clusters.
Adding virtualization (which also means shared storage) makes things much tougher to troubleshoot. If the business wants a highly available SQL Server, ask yourself these questions before virtualizing a SQL Server cluster:
- Do you have a great relationship between the SQL Server, storage, and network teams?
- Do all of the teams have read-only access to each other’s tools to speed up troubleshooting?
- Do all of the teams have access to the on-call list for all other teams, and feel comfortable calling them?
- Do you have a well-practiced, well-documented troubleshooting checklist for SQL Server outages?
- Does your company have a good change control process to avoid surprises?
- Do you have an identical environment to test configuration changes and patches before going live?
If the answer to any of those questions is no, consider honing your processes before adding complexity.
But the Business is Making Me Do It!
They’re making you do it because you haven’t clearly laid out your concerns about the business risk. Show the business managers this same list of questions. Talk to them about what each answer means for the business. Was there a recent outage with a lot of finger-pointing between teams? Bring that up, and remind the business about how painful that troubleshooting session was. Things will only get worse under virtualization.
To really drive the point home, I like whiteboarding out the troubleshooting process for a physical cluster versus a virtual cluster. Show all of the parts involved in the infrastructure, and designate which teams own which parts. Every additional team involved means longer troubleshooting time.
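Here is a rough Python sketch of that whiteboard exercise. The component lists and team assignments are hypothetical, invented for illustration only, so substitute your own infrastructure and org chart. It counts how many teams own a piece of each design and how many team-to-team handoffs are possible during an outage.

```python
# Hypothetical ownership maps (component -> owning team); adjust to your shop.
PHYSICAL_CLUSTER = {
    "application": "app team",
    "SQL Server instance": "DBAs",
    "Windows cluster": "sysadmins",
    "network": "network admins",
    "shared storage": "SAN team",
}

# Assumption: virtualizing adds hypervisor-layer components owned by one more team.
VIRTUAL_CLUSTER = {
    **PHYSICAL_CLUSTER,
    "hypervisor hosts": "virtualization admins",
    "virtual disks / RDMs": "virtualization admins",
}


def teams_involved(ownership):
    """Return the sorted list of distinct teams owning at least one component."""
    return sorted(set(ownership.values()))


def handoffs(ownership):
    """Count the unordered pairs of teams that might have to talk to each other."""
    n = len(teams_involved(ownership))
    return n * (n - 1) // 2


for name, ownership in [("physical", PHYSICAL_CLUSTER), ("virtual", VIRTUAL_CLUSTER)]:
    print(f"{name}: {len(teams_involved(ownership))} teams, "
          f"{handoffs(ownership)} possible handoffs")
```

The pair count is only a crude proxy for troubleshooting time, but it makes the point visually: one extra team adds a handoff with every team already in the room.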
Once the business signs off on that increased risk, then everyone’s on the same page. They’re comfortable with the additional risk you’re taking, and you’re comfortable that you’re not to blame when things go wrong. And when they do go wrong – and they will – do a post-mortem meeting explaining the outage and the time spent on troubleshooting. If the finger-pointing between the app team, SQL Server DBAs, network admins, virtualization admins, and sysadmins was a problem, document it and share it (in a friendly way) with management. They might change their mind when it’s time to deploy the next SQL Server cluster.

Ayman El-Ghazali September 27, 2012 | 11:02 am
I’ve implemented a cross training program between the DB team and the Ops team. I’ve found it very useful to explain to the Ops team how DBs, logs, DB security, etc work. They’ve also done some sessions on network security and other stuff. I’m hoping to get to Clustering and VMWare soon so we can all be on the same page with many of the technologies that both of our teams are involved with. We’ve managed to get a good DR process using DB Mirroring to a remote location working well; both teams worked very well on this effort.
I want to highlight Brent’s first question “Do you have a great relationship between the SQL Server, storage, and network teams?” Without a doubt, this has been the most important part of breaking the Silos and working towards a successful solution that works out for everyone.
After all DBAs are good at relations since we work with relational data all the time, right?
Andy Galbraith (@DBA_ANDY) September 27, 2012 | 11:29 am
We spend a lot of time talking about how the technology *can* and *does* work, but this underlines the point that many people miss – with bad (or no) processes in your infrastructure group, it doesn’t matter what the technology can do – you won’t be able to support it effectively.
Thanks Brent!
Robin H. September 28, 2012 | 11:58 am
Good points. The relationships between people and teams are often overlooked when implementing technology. Having the knowledge and a wallet doesn’t always mean I should buy into something.
This summer I wrestled with the idea of virtualizing SQL Server in a farm vs. Clustering SQL Server on Virtual Servers. I considered both as High Availability options. The bottom line as I saw it came down to this: if the virtual host dies, your down time is the time it takes for your VM to start on another host; if your clustering node fails, your down time is the length it takes for SQL Server to start on the secondary server.
The relationships are something to ponder when evaluating the difference between the two. How important is that extra minute or two, and how much effort do I invest in this relationship to make that work in my environment?
Thanks for the post Brent.
Chuck Rummel October 2, 2012 | 7:13 pm
Your bottom line is exactly what a network admin and I tried to help a technical manager understand when they called a meeting yesterday to discuss how soon they could set up a SQL cluster on a db that was already virtualized. They still want some additional HA beyond VM reboot because they think those few minutes of downtime are still too much, but at least I think we helped them see the other options.
David Eaton September 28, 2012 | 12:49 pm
What I think is funny is that clustering isn’t really needed in the virtual environment. VMware has the vMotion technology that will move the server if needed. Hyper-V is nowhere near as elegant but has a brute force way to do the same thing.
When you have these in place, clustering only causes more issues.
One time I was contracting for a public entity that had virtualized and clustered the servers, and there was nothing but problems that led to the complete de-virtualization of the SQL Server infrastructure.
Respectfully,
David
Brent Ozar September 28, 2012 | 12:56 pm
David – so how do you do SQL Server service packs in your environments?
Ayman El-Ghazali September 28, 2012 | 12:59 pm
Aww man, Brent beat me to it.
We were going that route at my last job and I argued for clustering since we only have 5 days of downtime a year (our production servers were 24/7). Clustering would help with Windows and SQL Patching.
David K February 19, 2013 | 2:19 pm
Good points. I was wondering the same thing: why would anyone consider a cluster if your SQL box was a VM that could be vMotioned? However, I did not consider Windows/SQL updates. What about a hybrid solution for those that already have a VM environment: a physical SQL Server clustered with a virtual server?
Brent Ozar February 19, 2013 | 2:21 pm
David – well, keeping in mind the points mentioned in the posts, how do you think the experience would be? Would a mixed physical & virtual cluster be easy to troubleshoot?
Allen McGuire October 2, 2012 | 7:39 am
I agree – I’m not a fan of this approach at all. Not too many DBAs are experts in VMware, so would they really put their names and careers on the line for a technology that someone else supports – assuming you have a VMware/vMotion expert in-house? I would not roll those dice personally, even if I did have a great relationship with that individual. And that’s only considering the unknown – as Brent noted, you know you have to apply patches. I could not implement a solution in a 24×7 environment knowing I’m going to be having planned downtime – that is counter-intuitive.
Luis Figueroa October 2, 2012 | 8:27 am
Great blog post. I have been in that situation and no, the DBA team did not have a great relationship with the storage team or the virtualization team. The DBA team shared their tools but did not have any kind of access into the other teams’ tools. And the change management process was there, but the risks were often dismissed as unlikely. Infrastructure had the green light to call the shots as to what would be virtualized in the enterprise, and they decided all SQL Servers had to be virtualized. All I can say is the DBAs just could not do their jobs, and they kept quitting after only a few months. It is not always about how much can be saved with virtualization. Organizations need to be mature enough to support solutions that depend on so many different pieces owned by different teams.
Ron Sexton October 2, 2012 | 1:19 pm
At one point I was the VMware admin and the DBA. However, things change, and I now no longer even have visibility into all the virtualization aspects, and the ‘new’ VMware guy isn’t a SQL guy at all. I have my disks being changed, CPU assignments being discussed, and overcommitment (CPU, memory, and number of CPUs) happening as people try to save money and look good. Now some users are complaining about performance. It really annoys me when people are saving so much by virtualizing in the first place and then don’t want to spend the money to ensure good performance for virtualized SQL Server. The fact is that virtualization, while great in theory (and it can be in practice), does add a lot of complexity technically and also organizationally. I would avoid a virtual SQL cluster and would be careful with virtualizing production due to these considerations.
Kendra Little October 2, 2012 | 1:21 pm
I like your point, Ron– and that’s totally true that what’s supportable one day may become harder down the line when duties get separated out and an organization grows!
Ron Sexton October 3, 2012 | 1:21 pm
Thanks!
Here is one scenario that occurred. Somehow the RDMs assigned to a virtual cluster for disks such as the quorum showed up as available to be assigned in vCenter for the VMware cluster. So naturally the VMware admin requested they be reclaimed by the storage team, which they were. However, they were actually still in use by the Microsoft application cluster when they were somewhat rudely yanked away. Well, the Microsoft cluster wasn’t very happy with this. Fortunately it was not a Microsoft SQL Server cluster.
However, since the Microsoft SQL Server cluster guy usually has the most experience with cluster issues, I sometimes get asked to assist with such issues. The point is that this created a company-wide service outage, and it could easily have been a SQL Server cluster issue if I had virtualized one. If it was a physical cluster, this would have been even more unlikely to have ever occurred as an issue in the first place.
If someone wants to virtualize a SQL cluster, I would recommend possibly devoting standalone hosts for separate cluster nodes in VMware, just like you would have to in Hyper-V, as this could help avoid issues.
David Eaton October 5, 2012 | 4:10 pm
I am now where you were at, Ron. Hosts are way overloaded and performance is fading fast. And to top that off, data growth will be doubling in the next 90 days.
We have a new higher performance SAN in place, and new dedicated servers ordered for the SQL infrastructure. I just hope they arrive in time. And in this case we are sending the SQL Servers back to physical machines and will use clustering for high availability.
One of the things I have picked up over the years is there is a critical point where the data has outgrown the infrastructure supporting it. And I am definitely there right now.
In our case
David Eaton October 5, 2012 | 4:11 pm
We are working hard to extend the critical point for at least a 5 fold increase in size.
Ron Sexton October 5, 2012 | 4:57 pm
Resource intensive SQL Servers are definitely a special use case for virtualization. Virtualization definitely has its advantages, but you lose a lot of control over the resources supporting SQL. And I don’t see that getting any better. To guarantee any level of service, the resources have to be tightly controlled and devoted to the SQL Server. If it’s a small SQL Server not really doing that much, then it could probably ‘play well with others’. The heavy use SQL Servers are trickier to manage. VMware/Microsoft does give guidelines/recommendations, such as build your virtual SQL Server the same way you would your physical SQL Servers, with dedicated resources and disks, and also put in a reservation for all the memory assigned to the server. Split out the drives across all the virtual SCSI controllers (helps with multi-tasking). Also don’t have it competing with a lot of lower vCPU count virtual machines. These rules aren’t usually followed by VMware admins. Over time the host will get overcommitted, the LUNs will get consolidated, the memory reservations will get reversed by someone, and they may even want to ‘standardize’ on one virtual SCSI controller per VM. They may even load a lot of high I/O VMs on one host. This creates a management headache for the DBA trying to manage performance on the SQL Servers. Say the SQL Server needs a lot of CPU for 2 hours a day to complete a load within a time frame. But someone says ‘vKernel says the CPU is largely unused so I am reducing it’. Or maybe doesn’t even tell you. Suddenly your load may be taking longer.
Organizationally I don’t see how this situation can get better. Great benefits, but management complexity and inefficiencies are introduced.
I am not saying this is always going to be the case, but it is too easy for it to happen.
Some great benefits though. High availability (at the server level). Can easily migrate to more powerful hardware. Can easily add storage and increase CPU. Can easily change storage or even have it automatically use the appropriate storage as needed (SSD, FC, SATA..). DR is easy with replicated LUNs for the VMs or even VMware SRM for a more true DR solution. These are just some of the advantages. For many this can work out well. But sometimes it can become a real PITA.
Brian McElroy May 2, 2013 | 11:35 am
very helpful discussion