Why Your SQL Server Cluster Shouldn’t Be Virtualized

By Brent Ozar · September 27, 2012 · 66 comments

When people buy my virtualization training video, one of the follow-up questions I get most often via email is, “Can I build SQL Server clusters in VMware and Hyper-V?”

In theory, yes. Microsoft’s knowledge base article on SQL Server virtualization support says they’ll support you as long as you’re using configurations listed in the Server Virtualization Validation Program (SVVP).

But the real question for me isn’t whether or not Microsoft supports virtual SQL Server clusters.

The question is about whether you can support it.

Us geeks usually implement clusters because the business wants higher availability. Higher availability means faster troubleshooting when the system is down. We need to be able to get the system back up and running as quickly as possible. Getting there usually means reducing the amount of complexity; complex systems take longer to troubleshoot.

Campfire on Honeymoon Beach, Isla Danzante — If this is how your SAN team, VMware team, and DBA team hang out, you’re good with virtual clusters.

Adding virtualization (which also means shared storage) makes things much tougher to troubleshoot. If the business wants a highly available SQL Server, ask yourself these questions before virtualizing a SQL Server cluster:

Do you have a great relationship between the SQL Server, storage, and network teams?
Do all of the teams have read-only access to each other’s tools to speed up troubleshooting?
Do all of the teams have access to the on-call list for all other teams, and feel comfortable calling them?
Do you have a well-practiced, well-documented troubleshooting checklist for SQL Server outages?
Does your company have a good change control process to avoid surprises?
Do you have an identical environment to test configuration changes and patches before going live?

If the answer to any of those questions is no, consider honing your processes before adding complexity.

But the Business is Making Me Do It!

They’re making you do it because you haven’t clearly laid out your concerns about the business risk. Show the business managers this same list of questions. Talk to them about what each answer means for the business. Was there a recent outage with a lot of finger-pointing between teams? Bring that up, and remind the business about how painful that troubleshooting session was. Things will only get worse under virtualization.

To really drive the point home, I like whiteboarding out the troubleshooting process for a physical cluster versus a virtual cluster. Show all of the parts involved in the infrastructure, and designate which teams own which parts. Every additional team involved means longer troubleshooting time.
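If you'd rather bring something more concrete than a whiteboard photo, here's a minimal sketch of that ownership comparison in Python. Everything in it is an assumption for illustration: the component names, the team names, and the idea of counting team pairs as a rough proxy for how many conversations an outage can require. Swap in your own infrastructure and org chart.

```python
from itertools import combinations

# Hypothetical ownership maps (component -> owning team).
# These names are illustrative, not taken from any real environment.
PHYSICAL_CLUSTER = {
    "SQL Server instance": "DBA",
    "Windows failover cluster": "sysadmin",
    "Server hardware": "sysadmin",
    "Network": "network",
    "Shared storage": "storage",
}

# The virtual cluster keeps every physical component and adds a hypervisor layer.
VIRTUAL_CLUSTER = {
    **PHYSICAL_CLUSTER,
    "Hypervisor hosts": "virtualization",
    "Virtual machine configuration": "virtualization",
    "Virtual disks": "virtualization",
}


def teams_involved(ownership):
    """Return the distinct teams that own at least one component, sorted."""
    return sorted(set(ownership.values()))


def team_pairs(ownership):
    """Count the team-to-team pairs that may need to talk during an outage."""
    return len(list(combinations(teams_involved(ownership), 2)))


if __name__ == "__main__":
    for name, ownership in [("physical", PHYSICAL_CLUSTER), ("virtual", VIRTUAL_CLUSTER)]:
        teams = teams_involved(ownership)
        print(f"{name}: {len(teams)} teams, {team_pairs(ownership)} team pairs: {teams}")
```

With these made-up maps, the physical cluster involves 4 teams (6 possible pairs) and the virtual cluster involves 5 teams (10 possible pairs). The exact numbers don't matter; the point for the business is that one extra team adds several extra conversations.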

Once the business signs off on that increased risk, then everyone’s on the same page. They’re comfortable with the additional risk you’re taking, and you’re comfortable that you’re not to blame when things go wrong. And when they do go wrong – and they will – do a post-mortem meeting explaining the outage and the time spent on troubleshooting. If the finger-pointing between the app team, SQL Server DBAs, network admins, virtualization admins, and sysadmins was a problem, document it and share it (in a friendly way) with management. They might change their mind when it’s time to deploy the next SQL Server cluster.

More Microsoft SQL Server Clustering Resources

Whether you want help choosing between an active/passive and an active/active cluster, or if you’re the kind of DBA who knows that’s not even the right name for failover clustered instances anymore, check out our SQL Server clustering training page.

66 comments

Ayman El-Ghazali

September 27, 2012 at 11:02 am

I’ve implemented a cross training program between the DB team and the Ops team. I’ve found it very useful to explain to the Ops team how DBs, logs, DB security, etc work. They’ve also done some sessions on network security and other stuff. I’m hoping to get to Clustering and VMWare soon so we can all be on the same page with many of the technologies that both of our teams are involved with. We’ve managed to get a good DR process using DB Mirroring to a remote location working well; both teams worked very well on this effort.
I want to highlight Brent’s first question “Do you have a great relationship between the SQL Server, storage, and network teams?” Without a doubt, this has been the most important part of breaking the Silos and working towards a successful solution that works out for everyone.

After all DBAs are good at relations since we work with relational data all the time, right? 🙂

Andy Galbraith (@DBA_ANDY)

September 27, 2012 at 11:29 am

We spend a lot of time talking about how the technology *can* and *does* work, but this underlines the point that many people miss – with bad (or no) processes in your infrastructure group, it doesn’t matter what the technology can do – you won’t be able to support it effectively.

Thanks Brent!

Robin H.

September 28, 2012 at 11:58 am

Good points. The relationships between people and teams are often overlooked when implementing technology. Having the knowledge and a wallet doesn’t always mean I should buy into something.

This summer I wrestled with the idea of virtualizing SQL Server in a farm vs. Clustering SQL Server on Virtual Servers. I considered both as High Availability options. The bottom line as I saw it came down to this: if the virtual host dies, your down time is the time it takes for your VM to start on another host; if your clustering node fails, your down time is the length it takes for SQL Server to start on the secondary server.

The relationships are something to ponder when evaluating the difference between the two. How important is that extra minute or two, and how much effort do I invest in this relationship to make that work in my environment?

Thanks for the post Brent.

1. Chuck Rummel
  
  October 2, 2012 at 7:13 pm
  
  Your bottom line is exactly what a network admin and I tried to help a technical manager understand when they called a meeting yesterday to discuss how soon they could set up a SQL cluster on a DB that was already virtualized. They still want some additional HA beyond VM reboot because they think those few minutes of downtime are still too much, but at least I think we helped them see the other options.
  
David Eaton

September 28, 2012 at 12:49 pm

What I think is funny is that clustering isn’t really needed in the virtual environment. VMware has the vMotion technology that will move the server if needed. Hyper-V is nowhere near as elegant but has a brute-force way to do the same thing.

When you have these in place, clustering only causes more issues.

One time I was contracting for a public entity that had virtualized and clustered the servers, and there was nothing but problems that led to the complete de-virtualization of the SQL Server infrastructure.

Respectfully,

David

1. Brent Ozar
  
  September 28, 2012 at 12:56 pm
  
  David – so how do you do SQL Server service packs in your environments?
  
  1. Ayman El-Ghazali
    
    September 28, 2012 at 12:59 pm
    
    Aww man, Brent beat me to it.
    We were going that route at my last job and I argued for clustering since we only have 5 days of downtime a year (our production servers were 24/7). Clustering would help with Windows and SQL Patching.
    
    1. David K
      
      February 19, 2013 at 2:19 pm
      
      Good points. I was wondering the same thing: why would anyone consider a cluster if your SQL box was a VM that could be vMotioned? However, I did not consider Windows/SQL updates. What about a hybrid solution for those that already have a VM environment? A physical SQL Server clustered with a virtual server?
      
      1. Brent Ozar
        
        February 19, 2013 at 2:21 pm
        
        David – well, keeping in mind the points mentioned in the posts, how do you think the experience would be? Would a mixed physical & virtual cluster be easy to troubleshoot?
  2. david craig
    
    July 29, 2013 at 6:57 am
    
    test on an exact replica of the db, then just take a snapshot before the update; if it works correctly, apply the snapshot
    
    1. Brent Ozar
      
      July 29, 2013 at 7:37 am
      
      David – hmm, can you define a little bit more about your suggestion? What do you mean by an exact replica of the DB? Are you talking about database snapshots, VMware snapshots, or SAN snapshots?
      
2. Allen McGuire
  
  October 2, 2012 at 7:39 am
  
  I agree – I’m not a fan of this approach at all. Not too many DBAs are experts in VMware, so would they really put their names and careers on the line for a technology that someone else supports – assuming you have a VMware/vMotion expert in-house? I would not roll those dice personally, even if I did have a great relationship with that individual. And that’s only considering the unknown – as Brent noted, you know you have to apply patches. I could not implement a solution in a 24×7 environment knowing I’m going to be having planned downtime – that is counter-intuitive.
  
Luis Figueroa

October 2, 2012 at 8:27 am

Great blog post. I have been in that situation and no, the DBA team did not have a great relationship with the storage team or the virtualization team. The DBA team shared their tools but did not have any kind of access into the other teams’ tools. And the change management process was there, but the risks were often dismissed as unlikely. Infrastructure had the green light to call the shots as to what would be virtualized in the enterprise, and they decided all SQL Servers had to be virtualized. All I can say is DBAs just could not do their jobs, and the DBAs kept quitting after only a few months. It is not always about how much can be saved with virtualization. Organizations need to be mature enough to support solutions that depend on so many different pieces owned by different teams.

Ron Sexton

October 2, 2012 at 1:19 pm

At one point I was the VMware admin and the DBA. However, things change, and I no longer even have visibility into all the virtualization aspects, and the ‘new’ VMware guy isn’t a SQL guy at all. I have my disks being changed, CPU assignments being discussed, and overcommitment (CPU, memory, and number of CPUs) happening as people try to save money and look good. Now some users are complaining about performance. It really annoys me when people save so much by virtualizing in the first place and then don’t want to spend the money to ensure good performance for virtualized SQL Server. The fact is that virtualization, while great in theory (and it can be in practice), does add a lot of complexity technically and also organizationally. I would avoid a virtual SQL cluster and would be careful with virtualizing production due to these considerations.

1. Kendra Little
  
  October 2, 2012 at 1:21 pm
  
  I like your point, Ron– and that’s totally true that what’s supportable one day may become harder down the line when duties get separated out and an organization grows!
  
Ron Sexton

October 3, 2012 at 1:21 pm

Thanks!

Here is one scenario that occurred. Somehow the RDMs assigned to a virtual cluster for disks such as the quorum showed up as available to be assigned in vCenter for the VMware cluster. So naturally the VMware admin requested they be reclaimed by the storage team, which they were. However, they were actually still in use by the Microsoft application cluster when they were somewhat rudely yanked away. Well, the Microsoft cluster wasn’t very happy with this. Fortunately it was not a Microsoft SQL Server cluster. 🙂 However, since the Microsoft SQL Server cluster guy usually has the most experience with cluster issues, I sometimes get asked to assist with such issues. The point is that this created a company-wide service outage, and it could easily have been a SQL Server cluster issue if I had virtualized one. If it was a physical cluster, this would have been even more unlikely to have ever occurred as an issue in the first place.
If someone wants to virtualize a SQL cluster, I would recommend possibly devoting standalone hosts to separate cluster nodes in VMware, just like you would have to in Hyper-V, as this could help avoid issues.

David Eaton

October 5, 2012 at 4:10 pm

I am now where you were, Ron. Hosts are way overloaded and performance is fading fast. And to top that off, data growth will be doubling in the next 90 days.

We have a new higher-performance SAN in place, and new dedicated servers ordered for the SQL infrastructure. I just hope they arrive in time. And in this case we are sending the SQL Servers back to physical machines and will use clustering for high availability.

One of the things I have picked up over the years is that there is a critical point where the data has outgrown the infrastructure supporting it. And I am definitely there right now.

In our case

David Eaton

October 5, 2012 at 4:11 pm

We are working hard to extend the critical point for at least a 5 fold increase in size.

Ron Sexton

October 5, 2012 at 4:57 pm

Resource-intensive SQL Servers are definitely a special use case for virtualization. Virtualization definitely has its advantages, but you lose a lot of control over the resources supporting SQL. And I don’t see that getting any better. To guarantee any level of service, the resources have to be tightly controlled and devoted to the SQL Server. If it’s a small SQL Server not really doing that much, then it could probably ‘play well with others’. The heavy-use SQL Servers are trickier to manage. VMware/Microsoft does give guidelines/recommendations, such as: build your virtual SQL Server the same way you would your physical SQL Servers, with dedicated resources and disks, and also put in a reservation for all the memory assigned to the server. Split out the drives across all the virtual SCSI controllers (helps with multi-tasking). These rules aren’t usually followed by VMware admins. Also, don’t have it competing with a lot of lower vCPU count virtual machines either. But over time the host will get overcommitted, the LUNs will get consolidated, and the memory reservations will get reversed by someone, and they may even want to ‘standardize’ on one virtual SCSI controller per VM. They may even load a lot of high I/O VMs on one host. This creates a management headache for the DBA trying to manage performance on the SQL Servers. Say the SQL Server needs a lot of CPU for 2 hours a day to complete a load within a time frame. But someone says ‘vKernel says the CPU is largely unused so I am reducing it’. Or maybe doesn’t even tell you. Suddenly your load may be taking longer.
Organizationally I don’t see how this situation can get better. Great benefits, but management complexity and inefficiencies are introduced.
I am not saying this is always going to be the case, but it is too easy for it to happen.
Some great benefits though. High availability (at the server level). Can easily migrate to more powerful hardware. Can easily add storage and increase CPU. Can easily change storage or even have it automatically use the appropriate storage as needed (SSD, FC, SATA..). DR is easy with replicated LUNs for the VMs or even VMware SRM for a more true DR solution. These are just some of the advantages. For many this can work out well. But sometimes it can become a real PITA. 🙂

Brian McElroy

May 2, 2013 at 11:35 am

very helpful discussion

Michael Webster

June 3, 2013 at 7:02 pm

This seems to be a case of avoiding something that is beneficial because processes are broken. Should we not be looking to address the process gaps, as they will fundamentally break any environment regardless of technology choices? There doesn’t have to be an increase in risk just because you virtualize SQL clusters if you approach the management and operations in a disciplined and methodical way, as you should approach the design and implementation. It is the approach, design and process that is important when supporting any large SQL database, regardless of whether it is virtual or physical. If the VMware admins simply throw the database into an already overloaded cluster and everyone expects it to work, you’re all kidding yourselves and you should get some new admins. This is not the way to treat business-critical applications. The production environments shouldn’t be getting into an overcommitted state where performance suffers in the first place, and again this points back to point 1 regarding broken processes. Broken processes and lack of discipline can make any technology look bad. I’ve designed, implemented and seen many an environment that is well run and where clustered SQL databases work great. But the dependent teams have good working relationships and the processes aren’t broken. I’ve seen just as many environments where this isn’t the case. An investment in good inter-team relationships and good processes will pay you back in spades, regardless of whether you are virtualizing or not.

1. Brent Ozar
  
  June 3, 2013 at 7:03 pm
  
  Michael – yep, we both agree. If you do make the investment in good inter-team relationships and processes, then you can leverage that investment to do cool stuff like virtual clusters. However, if you try to run before you learn to walk, you’ll end up in a body cast. 😉 The investment has to come first.
  
Ron Sexton

June 4, 2013 at 11:17 am

Investing in good inter-team relationships and processes is worthwhile. I am in a situation where we consolidated 4 IT teams down to one bigger team. There was ‘right sizing’ and many changed responsibilities, roles and management. In a situation such as this most of those relationships are changed completely and previous agreements are pretty much gone.
In such a time a debate about whether to consolidate those SQL disk LUNs or not may not be the best start of a new relationship. It can get back to waiting until there is a reason to review the current setup based on performance and try and work out the ‘new’ solution with new processes and agreements if this is open to discussion.
Everything is always changing and any planning should be done with this in mind. And for SQL Clusters planning is key to the configuration holding up over time for performance and reliability.
And the team members adapting to change is key to them being around to work on the SQL Clusters.
Case in point: Patching SQL Cluster nodes. The ‘old’ patching would migrate the instance back to its original planned primary node after patching. The ‘new’ patching does not, and this is considered a possible enhancement not planned on being looked at until 2014. (So time to look into Failback! 🙂 )

Andrew

August 28, 2013 at 9:38 am

Great article, Brent! Do you feel that VMware HA can replace a failover cluster for many systems, assuming they have planned maintenance windows? You could take a snapshot, patch, reboot, and go right back to the snapshot if there are problems. This would be a “stability through simplicity” approach and would require not much more time to restart than a failover cluster would take to fail over.

1. Brent Ozar
  
  August 28, 2013 at 3:13 pm
  
  Andrew – you’re not just assuming planned maintenance windows, but you’re also assuming that all changes happen inside those windows. If you work in a shop where everyone pre-announces every change, coordinates it with the sysadmins for good snapshots ahead of time, and then does user acceptance testing afterward to make sure the change had the planned effect, then sure, VMware HA can take the place of clustering. Back here in reality, though… 😉
  
Jose

September 10, 2013 at 12:39 pm

I started working with VMware technology back in 2004, I think. I was still working for HP, and I remember we were among the DBAs in the USA to deploy SQL 2000 failover instances (lab environment and for testing). It was a real pain! It was cool, because we were the VMware admins but we were also the DBAs, so we had total and complete control, no black boxes.

Fast forward to 2013: VMware has improved a lot, and configuring a Windows 2008 R2 cluster is easier than it was before. However, there is still a problem: how to troubleshoot SQL performance issues when SQL runs on a virtual machine.

If the DBA has no rights on the VMware Server (which is usually the case) it is really hard, if not impossible, to troubleshoot I/O, CPU, or RAM issues.

Another thing that concerns me is the use of DMVs (and actually a question to Brent, how can a DBA troubleshoot performance issues?). Inside a virtual machine, those DMVs won’t provide actual I/O or RAM usage; those parameters are just a mere abstraction of the real work or SAN.

But, seeing the industry’s trend and the desire of management teams to consolidate and save money, I honestly believe that professional DBAs (Oracle as well) won’t have any option other than to learn VMware too, or even get certified.

I think VMware is fantastic, but it is not the “one size fits all” solution that some VMware experts believe it is.

Waged Nasser

September 23, 2013 at 12:53 pm

excuse me if my questions sound stupid but i am a ‘noob’ to this technology. however, with the intent of mastering it.
the general consensus seems to steer away from clustering in virtualisation (also on other sources over the web).
would you still be of the same opinion if you had a stand alone server connected to the SAN? i currently use it for Mirroring as part of my HA strategy. Mirroring saved my beef too often to give away 🙂

1. Brent Ozar
  
  September 24, 2013 at 6:48 am
  
  Waged – every situation is different, and it boils down to your business’s RPO and RTO. What are they?
  
  1. Waged Nasser
    
    September 24, 2013 at 7:05 am
    
    hi Brent,
    
    thanks for getting back to me. in a nutshell my application is extremely sensitive to any data loss and down time.
    RPO = 30min
    RTO = 5-10min max.
    so far Mirroring has served me well in the fact i mirrored to a stand alone machine outside my SAN. when a storm arrives, server failure, storage etc, i have a procedure to shift application interfacing to the mirrored database, we achieve that in minutes and my application is online again.
    
    by the way, the material you and your team provide on youtube, website and books is invaluable. thank you very much, you’ve been great influence. 🙂
    
    1. Brent Ozar
      
      September 24, 2013 at 7:10 am
      
      If your recovery time objective is 5-10 minutes, it needs to be a fully automatic process. If the server goes down while you’re in an elevator or in the bathroom, you won’t be able to manually fail over in time. You’re looking at a cluster or AlwaysOn Availability Groups in sync mode, or database mirroring in sync with a witness.
      
      1. Waged Nasser
        
        September 24, 2013 at 7:29 am
        
        agreed, I automated the procedure with powershell scripts.
        i think my solution lies with AlwaysOn Availability Groups, i could motivate the licensing.
        and thanks to AlwaysOn i can finally get rid of replication and its frustrations :/
        
        thank you again
NeilM

February 18, 2014 at 5:40 am

Very helpful post and follow-up comments. We currently have mirrored SQL Server 2008 instances in our primary data centre and a single SQL Server 2008 instance in our disaster recovery data centre with merge replication between the primary and disaster recovery instances.

We have a remote mirroring witness server which is also the merge replication distributor. This represents a single point of failure in our high availability architecture (!). We are looking to eliminate this and have a number of options:

SQL Server clustering on physical servers.
SQL Server clustering on VMWare virtual servers with VMware HA.
VMware HA.

We don’t face some of the challenges outlined above as we have a small operations team responsible for the server, database and VMware environment.

As the DBA my inclination is to go down the SQL Server clustering route. I have taken a look at http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1037959 which highlights some of the limitations of running MSCS on VMware: vMotion migration, storage vMotion, hot adding memory and CPU ….

Would welcome any thoughts.

1. Brent Ozar
  
  February 19, 2014 at 7:35 am
  
  Neil – as much as I’d love to be able to do customized infrastructure planning in each blog comment, that’s kinda beyond what I have the bandwidth to do. 😀
  
  1. Neil Macehiter
    
    February 19, 2014 at 7:39 am
    
    Understand completely. I was really hoping for a steer on the factors that I need to consider when evaluating SQL Server Clustering on physical servers; SQL Server Clustering on VMWare HA and VMware HA.
    
    1. Brent Ozar
      
      February 19, 2014 at 7:47 am
      
      Yeah, that’s what the post is about, plus our other virtualization posts here. Hope that helps!
      
Ron Sexton

February 18, 2014 at 1:33 pm

if the server is big enough to take all the VMware host resources then maybe the extra VMware licensing costs are not worth it.
The RDM’s are a little unfriendly so maybe only use a couple of hosts with these so the start up timeouts will not be much of an issue. (WSFC does SCSI locks on them so others cannot get to them.)
Clustering is pretty good. But VMware HA may be enough depending on your needs.
(WSFC = MSCS now)
Anyway , some thoughts…

Neil Macehiter

February 18, 2014 at 2:20 pm

Thanks for the comments. If we do virtualise then it would be hosted on an ESXi server cluster with HA and iSCSI SAN.

The service that depends on the mirroring/merge replication is 7*24. We do not take the service down for routine maintenance and have a custom-built application failover solution for both intra- and inter-site availability. As such, we really can’t afford for the witness/distributor to be unavailable for a prolonged period as it leaves us vulnerable in the event that another failure requires us to failover.
Would VMware HA be enough given these requirements? What other factors would influence SQL Server Cluster? If we were to adopt SQL Server Cluster would deployment on VMware HA make sense?

Rob Silver

December 1, 2014 at 3:56 am

The Microsoft Support Policy has the following exception which makes answering the question easier:

http://support2.microsoft.com/?id=956893

Exceptions:
If multiple SQL VMs are tightly coupled with one another, individual VMs can failover to the disaster recovery (DR) site but SQL high availability (HA) features inside the VM need to be removed and re-configured after VM failover. For this reason the following SQL Server features are not supported on Hyper-V Replica:
Availability Groups
Database mirroring
Failover Cluster instances
Log shipping
Replication

Parvinder Nijjar

December 16, 2014 at 9:26 am

After having managed 100+ virtual clusters on Vmware and Hyper-V I am so glad we have moved back to more physical clusters and slowly decommissioning the virtual servers. It was the added troubleshooting required which was a huge and unnecessary layer of complexity.

Olivier D

February 22, 2015 at 6:07 pm

Hello,
We have a failover system with 3 nodes over the primary site and 2 nodes in the secondary as DR solution. The business wants to access the real data and so to avoid more problems, I would like to add another node to support readonly features of availability groups.
We are considering adding this as a virtualised node. The point is just to use async communications and its own disk system. The node will not participate in the quorum and will not be used as a failover solution. It is just there as a solution for reporting.
Do you think in that case that a virtualised server can be OK?
I tried other solutions, but as our databases are part of the cluster and we are using a third-party backup solution (backupexec), there is only one option left: participate in the AlwaysOn setup to get the mirroring solution.
Thanks

1. Brent Ozar
  
  February 23, 2015 at 4:27 am
  
  Olivier – sure, a virtual server might work fine, but of course the answer will involve the performance requirements of the read-only queries.
  
  1. Olivier D
    
    February 23, 2015 at 5:50 am
    
    The purpose of this server is to provide read-only access to OLTP data without slowing down production. They can access a copy of the data, but as is and without any support, as we have a BI layer that they can use and the operational data reports…
    So basically, performance is not a real issue.
    Thank you.
    
Kris

July 23, 2015 at 1:31 am

Hi, I have been reading this blog with great interest. I just wonder, as the blog started way back in 2012 and we are now in 2015 with MS SQL Server 2016 and new versions of, for instance, VMware: apart from getting the support teams working together, is a physical cluster still the preferred solution over a virtual one? I also do not see anybody mentioning that old hardware needs to be replaced at a certain point in time, which also costs money and takes some work. You need to arrange HW support from the vendor, and if things break down you need to arrange access and probably someone to guide the HW engineer. Unfortunately that cannot be automated. Hope to hear from you.
Regards
Kris

1. Brent Ozar
  
  July 23, 2015 at 6:49 am
  
  Kris – what’s your question for us?
  
  1. Kris
    
    July 23, 2015 at 9:16 am
    
    I just want to know if a physical solution is still the preferred solution for SQL clustering/AlwaysOn Availability Groups instead of virtual?
    
    1. Brent Ozar
      
      July 23, 2015 at 9:28 am
      
      Kris – yes, because of what I explained in the post.
      
Mike duby

July 28, 2015 at 10:19 pm

In my case we have all servers virtualized and a lot of active/passive SQL clusters for SQL 2008/2012 for the past 3+ years. I have to say that after 3 years of compilations, it was the SQL clusters themselves that broke the most and had the most downtime because of troubleshooting. The reasons were never the same: SAN updates, Windows patches, network blips, etc… For me the best part was being able to patch easily, but that is not a must; there are not that many SP releases to justify the cluster need. So, still debating. Our VMware environment has matured nicely over the years and it is also set up in HA. I think that the SQL cluster is overkill in my case and I could save a lot of resources by just using single nodes. I’m not a guru at VMware, but I think if there is a serious issue with my nodes’ resources, VMware HA can take care of it. Any changes in my case need to be planned, so it’s always possible to get downtime. True, a failover is faster than a reboot, but again that is not needed that often, and I’m always wondering if the failover will be successful as opposed to the reboot.
Still trying to find a good reason to keep the SQL cluster on vmware….

Kevin Ho

June 29, 2016 at 9:02 am

Having experienced a major VMware vSphere outage due to SAN issues, I can say that businesses fail to realize how critical your backup and recovery strategy is in a virtualized world. When failure happens, people panic and your thought process is clouded. The recovery effort can take longer or go awry if there is lots of complexity or interdependency on the failed system. Way too many times I see eggs all thrown into one basket, and data loss and business downtime occur due to cost-saving measures beating logic.

Simplification and having a well-documented recovery strategy make system failure just an exercise rather than a heart transplant operation.

Reply
Derek

February 14, 2017 at 6:50 am

Confused – we intend to use SQL clustering in SQL Server 2016 as mirroring is being deprecated. We have 2 physical servers in separate server rooms hosting Windows Server 2016 VMs with SQL Server 2016 installed.
Do you not recommend this configuration, SQL Server 2016 on Windows Server 2016, even though these are two physically separate machines, albeit with VMs running on them?

Reply
1. Brent Ozar
  
  February 14, 2017 at 6:51 am
  
  It’s totally okay to use mirroring if it works for your needs today. It’s fully supported still. Microsoft is just warning you that it’ll go away in a future version, but it’s not any worse than it was a release or two ago. We use mirroring all the time.
  
  Reply
Chris Lanzi

March 22, 2017 at 1:06 pm

This was written in 2012. Have the more recent versions of Windows Server, Hyper-V, and SQL Server given you any reason to modify your opinion regarding the advisability of virtualizing SQL Server clusters?

Reply
1. Brent Ozar
  
  March 22, 2017 at 1:46 pm
  
  Chris – I’ll turn that around with a question: what specifically has changed that would change the situation, in your eyes?
  
  Reply
  1. Chris Lanzi
    
    March 22, 2017 at 2:50 pm
    
    We have three concerns regarding virtualizing SQL Server 2014 clusters:
    
    (1) Recently, we migrated a single SQL Server instance from a physical server that was part of a cluster to a virtual server that was not part of a cluster. After we did so, an I/O intensive job slowed down dramatically. So, we moved the database back to its original home, which we had not uninstalled yet. This concerned us.
    
    (2) What concerned us even more is that the storage team and the server team pointed fingers at each other and that it was not entirely clear how to diagnose the problem, which your article above addresses.
    
    (3) We had a dilemma regarding the type of disk volumes to use for the cluster groups in a virtual environment. If we used direct attach volumes, Veeam could not back them up. But if we used CSVs, we could not use Always On Availability Groups (since the target servers are in another data center with a different SAN and since the cluster seems to want each node to “see” the volumes).
    
    With 2016, we think that shared VHDXs might address (3). But (2) would seem to be intrinsic to virtual environments.
    
    Reply
Kal

July 19, 2017 at 3:22 am

Dear Brent
My company wants to implement SQL Server 2016 SP1 Enterprise Edition on WSFC using Windows 2012 R2 Standard Edition. Your article above is for SQL Server 2012, so has virtualization become more stable on SQL Server 2016?
Do let me know so I can convince the management otherwise.
KY.

Reply
1. Flemming Didriksen
  
  October 1, 2017 at 12:23 pm
  
  I have been using WSFC for 11 years and have always been happy with it. My only problems used to be at the SAN level (firmware updates, etc.). My last clusters were based on SQL Server 2012 and 2014 on top of Windows Server 2012 R2 with no problems. Recently we wanted to consolidate all our SQL Server clusters in a Windows Server 2016 Hyper-V environment using the new shared VHDX files. Unfortunately we see a lot of failures on the 2016 hypervisors, most likely in combination with shared VHDX files, so we are moving to AlwaysOn Availability Groups to avoid shared VHDX files. But using Windows Server 2012 R2 you should be fine using WSFC.
  
  Reply
Jivan

October 23, 2017 at 8:15 pm

Hi, I’m planning to install MSSQL 2016 on vSphere 6.5. I understand that vMotion only protects the VM, not the SQL instance. Any suggestions on how to trigger vMotion automatically when the SQL instance becomes non-responsive or hangs?

Reply
1. Brent Ozar
  
  October 24, 2017 at 5:04 am
  
  Jivan – for Q&A, head on over to https://dba.stackexchange.com.
  
  Reply
Mel Chandler

February 26, 2019 at 2:47 pm

Brent,
Do you think this is still true today?

Reply
1. Brent Ozar
  
  February 26, 2019 at 3:50 pm
  
  Mel, what do you think has changed?
  
  Reply
Eric Swiggum

March 30, 2019 at 11:25 am

Brent – I was brought late into the game on an implementation & I feel I’m running into the “The Business is Making Me Do It!” scenario right now! They are saying “We want AlwaysOn.”, but I feel they are thinking of the traditional fail-over clustering flavor and are unaware of the offerings that AGs have in SQL Server 2017 EE. They want a reporting db (via t-repl) & also have t-repl going to a “public” db. Maybe FCI is ok here, but what questions/points can I throw back at them!?!

Reply
1. Brent Ozar
  
  March 30, 2019 at 11:27 am
  
  Eric – sure, that’s exactly the kind of thing I tackle during my SQL Critical Care. Architecture advice like this is beyond the scope of a blog post comment, though. Sorry about that!
  
  Reply
  1. Eric Swiggum
    
    March 30, 2019 at 12:03 pm
    
    OK, is this a pre-recorded video that can be purchased where you cover these types of scenarios? It seems like things have changed quite a bit recently as far as your pre-recorded inventory…
    
    Reply
    1. Brent Ozar
      
      March 30, 2019 at 12:04 pm
      
      No, I stopped teaching clustering and HA/DR – the technologies just change faster than I can record videos, heh. I just do consulting on ’em.
      
      Reply
      1. Eric Swiggum
        
        March 30, 2019 at 12:22 pm
        
        Yeah I hear ya… and it’s probably too late now to make any recommendations as the prod environment has to be built today… Thanks Brent!
swarren

April 29, 2019 at 9:28 am

Hi,

I’m gonna poke this bear again…

I’ve seen a few messages asking if this article is still relevant 6 years after it was posted; in each, Brent asks what’s changed on our side.

For my org, it’s all about OS patching. We have an instance, shared by many applications, and we want to keep the OS on the SQL Server patched as per our security person, which more often than not requires a reboot. So, it’s easier to fail over and keep a number of applications up vs. bringing down a number of applications to reboot the DB server back end.

I understand adding complexity (and I hate it) and I’ve lived through all the questions Brent has brought up. The boss still wants it.

Reply
swarren

April 29, 2019 at 9:32 am

To answer the question of what’s changed on our end: the number of patches to deal with. We’re dealing with multiple patches monthly. Yes, we can leverage change management, but there’s an appetite for more options.

Reply
