I’ve blogged about why you should virtualize SQL Server, but it’s not all unicorns and rainbows. Today we’re going to talk about some of the pitfalls and problems.
It’s Tougher to Get More Storage Throughput
Servers connect to Storage Area Networks (SANs) with Host Bus Adapters (HBAs). They’re like fancypants network cards, and they come in either Fibre Channel (FC) or iSCSI varieties. These components are the place to focus when thinking about virtualization.
If your SQL Server:
- Has 2 or more HBAs connected to the SAN
- Uses active/active load balancing software like EMC’s PowerPath to get lots of storage throughput
- Actually takes advantage of that throughput
Then you’ll be dissatisfied with the current state of storage access in virtualization. Generally speaking, without doing some serious voodoo, you’re only going to get one HBA worth of throughput to each virtual machine, and that’s the best case scenario.
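To put rough numbers on that, here is a minimal Python sketch of the ceiling described above. The 400 MB/s per-HBA figure is an illustrative assumption, not a measurement, and the function name is made up for this example.

```python
def storage_throughput_mb_s(hba_count, per_hba_mb_s, virtualized):
    """Rough upper bound on storage throughput for one SQL Server.

    Assumption taken from the text: a virtual machine gets at most
    one HBA's worth of throughput, and that is the best case.
    """
    usable_hbas = 1 if virtualized else hba_count
    return usable_hbas * per_hba_mb_s


# Hypothetical server: 2 HBAs, each assumed good for 400 MB/s.
physical = storage_throughput_mb_s(2, 400, virtualized=False)
virtual = storage_throughput_mb_s(2, 400, virtualized=True)
print(f"physical: {physical} MB/s, virtualized: {virtual} MB/s")
```

If the physical box really does push more than one HBA's worth of IO, the gap between those two numbers is throughput you give up by virtualizing it.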
If you’re running multiple servers on the same virtual host, the IO situation gets worse: it becomes even more important to carefully manage how many SQL Servers end up on a single physical host, and more difficult to balance the IO requirements of each server.
Never mind how much more complex this whole thing gets when we throw in shared storage: a single RAID array might hold virtual server drives for several different servers, and they can all compete for performance at the same time. Think about what happens on Friday nights when the antivirus software kicks off a scheduled scan across every server in the shop – goodbye, performance.

It’s Tougher to Get Good Performance Reporting
Let’s look at the very simplest performance indicator: Task Manager. On a virtual server, Task Manager doesn’t really show how busy the CPUs are. The CPU percentages are a function of several things, and none of them are transparent or detectable to the database administrator.
- Other virtual servers might be using up all of the CPU.
- The virtualization admin might have throttled your virtual server. They can set limits on how much CPU power you actually get.
- Your host’s CPU can change. Your server can get moved from a 2 GHz box to a 3 GHz box without warning.
And even if you dig into the underlying causes to find out what’s going on, there’s no reporting system that will give you a dashboard view of this activity over time. You can’t look at a report and say, “Well, last Thursday my production SQL Server was hitting 100% CPU, but it’s because it was on a slow shared box, and on Thursday night at 5:00 PM it was migrated live over to a faster box, and that’s why pressure eased off.”
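A toy Python model shows why the same Task Manager number can mean very different things. This is a deliberate simplification: it assumes the guest's percentage scales linearly with the host clock speed and with any CPU cap the admin has set, and it ignores contention from other guests entirely. The function and parameter names are hypothetical.

```python
def effective_ghz(guest_cpu_pct, host_clock_ghz, cpu_cap_fraction=1.0):
    """Approximate GHz of real work behind a guest's CPU percentage.

    guest_cpu_pct    -- what Task Manager shows inside the guest (0-100)
    host_clock_ghz   -- clock speed of the physical host (assumed known)
    cpu_cap_fraction -- share of a core the admin allows (assumed, 0-1)
    """
    return (guest_cpu_pct / 100) * host_clock_ghz * cpu_cap_fraction


# The guest reports 100% busy in both cases.
slow_capped = effective_ghz(100, 2.0, cpu_cap_fraction=0.5)
fast_uncapped = effective_ghz(100, 3.0)
print(slow_capped, fast_uncapped)
```

Under these assumptions, "100% CPU" covers a threefold difference in actual work, and nothing inside the guest tells you which case you are in.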
Not Everything Works As Advertised
Virtualization vendors have some amazing features. We talked about vMotion and Live Migration, the ability to move virtual servers from one physical host to another on the fly without downtime. While that does indeed work great, it doesn’t necessarily work great for every server in every shop. If you’ve got a heavily saturated network, and your SQL Server’s memory is changing very fast (like in high-transaction environments or doing huge queries), these features may not be able to copy data over the network as fast as it’s changing in memory. In situations like this, the live migration will fail. I’ve never seen it bring the virtual server down, but I’ve seen it slow performance while it attempted the migration.
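The race between memory changes and the network can be sketched with a generic pre-copy model in Python. This is not VMware's or Microsoft's actual algorithm. It assumes each round copies whatever is still dirty, that memory keeps changing at a steady rate while the copy runs, and that the migration can finish once the leftover dirty memory drops below a small threshold. Every number below is hypothetical.

```python
def precopy_rounds(memory_mb, dirty_mb_s, link_mb_s,
                   stop_copy_mb=64, max_rounds=30):
    """Count copy rounds until the remaining dirty memory is small
    enough for the final switchover. Returns None if the migration
    does not converge within max_rounds."""
    remaining = memory_mb
    for rounds in range(max_rounds + 1):
        if remaining <= stop_copy_mb:
            return rounds
        copy_seconds = remaining / link_mb_s
        # Memory dirtied during this round must be sent next round,
        # but it can never exceed the guest's total memory.
        remaining = min(memory_mb, dirty_mb_s * copy_seconds)
    return None


# 16 GB guest over a link assumed to move about 100 MB/s.
quiet_server = precopy_rounds(16384, dirty_mb_s=20, link_mb_s=100)
busy_server = precopy_rounds(16384, dirty_mb_s=120, link_mb_s=100)
print(quiet_server, busy_server)
```

In this model the quiet server converges after a handful of rounds, while the busy server dirties memory faster than the link can carry it and never gets there, which matches the failed migrations described above.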
New features and new versions of virtualization software come out at a breakneck pace, and like any other software, it’s got bugs. A particularly nasty bug surfaced in VMware ESX v3.5 Update 2: on a certain date, VMware users couldn’t power on their servers because the software decided the license had expired, even when it hadn’t. Imagine shutting down a server to perform maintenance, then trying to turn it back on and getting denied. “Sorry, boss, I can’t turn the server back on. I just can’t.” It took VMware days to deploy a fixed version, and in that time span, those servers just couldn’t come back on.
That’s an extreme case, but whenever more complexity is introduced into the environment, risk is introduced too. Injecting virtualization between the hardware and the OS is a risk.
It’s Not Always Cost-Effective
All of the virtualization vendors have a free version of their software, but the free version lacks the management tools and/or performance features that I touted in my earlier articles about why sysadmins want to virtualize your servers. The management tools and power-up editions cost money, typically on a per-CPU basis, and there are maintenance costs involved as well. If your virtualization strategy requires isolating each SQL Server on its own physical host server, then you’ll be facing a cost increase, not a cost savings.
Combining multiple guest servers onto fewer physical servers still doesn’t always pay off: run the numbers for all of your virtualization tool licenses, and you may end up being better served by a SQL Server consolidation project. I did a webcast last year with Kevin Kline and Ron Talmage about choosing between consolidation and virtualization. That information is still relevant today.
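Running the numbers can be as simple as the Python sketch below. Every price, socket count, and maintenance percentage here is a made-up placeholder, and the cost model (hardware plus per-CPU licenses plus yearly maintenance on those licenses) is an assumption for illustration. Plug in your own vendor quotes.

```python
def total_cost(hosts, hardware_per_host, sockets_per_host=0,
               license_per_socket=0, maintenance_pct=0, years=3):
    """Hardware plus per-CPU virtualization licensing plus maintenance.

    maintenance_pct is the yearly maintenance charge as a percentage
    of the license cost (a hypothetical pricing model).
    """
    licenses = hosts * sockets_per_host * license_per_socket
    maintenance = licenses * maintenance_pct * years / 100
    return hosts * hardware_per_host + licenses + maintenance


# Four SQL Servers, three scenarios, all prices hypothetical.
physical = total_cost(hosts=4, hardware_per_host=10000)
isolated_vms = total_cost(hosts=4, hardware_per_host=10000,
                          sockets_per_host=2, license_per_socket=3000,
                          maintenance_pct=20)
two_per_host = total_cost(hosts=2, hardware_per_host=10000,
                          sockets_per_host=2, license_per_socket=3000,
                          maintenance_pct=20)
print(physical, isolated_vms, two_per_host)
```

With these placeholder figures, one virtual SQL Server per host costs nearly double the plain physical setup, and even doubling up on hosts barely breaks even once licensing and maintenance are counted.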
My Virtualization Recommendations for SQL Server
My recommendations are:
- Virtualize only when it’s going to solve a problem, and you don’t have a better solution for that problem.
- Get good at performance monitoring before you virtualize, because it’s much tougher afterwards.
- Start by virtualizing the oldest, slowest boxes with local storage because they’ll likely see a performance gain instead of a penalty.
- Avoid virtualizing servers that have (and utilize) 2 or more HBAs.
If you’ve virtualized production SQL Servers, I’d love to hear about your experiences.
