I’ve blogged about why you should virtualize SQL Server, but it’s not all unicorns and rainbows. Today we’re going to talk about some of the pitfalls and problems.
It’s Tougher to Get More Storage Throughput
Servers connect to Storage Area Networks (SANs) with Host Bus Adapters (HBAs). They’re like fancypants network cards, and they come in either fiberoptic (FC) or iSCSI varieties. These components are the place to focus when thinking about virtualization.
If your SQL Server:
- Has 2 or more HBAs connected to the SAN
- Uses active/active load balancing software like EMC’s PowerPath to get lots of storage throughput
- Actually takes advantage of that throughput
Then you’ll be dissatisfied with the current state of storage access in virtualization. Generally speaking, without doing some serious voodoo, you’re only going to get one HBA worth of throughput to each virtual machine, and that’s the best case scenario.
If you’re running multiple servers on the same virtual host, the IO situation gets worse: it becomes even more important to carefully manage how many SQL Servers end up on a single physical host, and more difficult to balance the IO requirements of each server.
Never mind how much more complex this whole thing gets when we throw in shared storage: a single raid array might have virtual server drives for several different servers, and they can all compete for performance at the same time. Think about what happens on Friday nights when the antivirus software kicks off a scheduled scan across every server in the shop – goodbye, performance.

No-Good Liar
It’s Tougher to Get Good Performance Reporting
Let’s look at the very simplest performance indicator: Task Manager. On a virtual server, Task Manager doesn’t really show how busy the CPUs are. The CPU percentages are a function of several things, and none of them are transparent or detectable to the database administrator.
Other virtual servers might be using up all of the CPU.
The virtualization admin might have throttled your virtual server. They can set limits on how much CPU power you actually get.
Your host’s CPU can change. Your server can get moved from a 2ghz box to a 3ghz box without warning.
And even if you dig into the underlying causes to find out what’s going on, there’s no reporting system that will give you a dashboard view of this activity over time. You can’t look at a report and say, “Well, last Thursday my production SQL Server was hitting 100% CPU, but it’s because it was on a slow shared box, and on Thursday night at 5:00 PM it was migrated live over to a faster box, and that’s why pressure eased off.”
Not Everything Works As Advertised
Virtualization vendors have some amazing features. We talked about vMotion and Live Migration, the ability to move virtual servers from one physical host to another on the fly without downtime. While that does indeed work great, it doesn’t necessarily work great for every server in every shop. If you’ve got a heavily saturated network, and your SQL Server’s memory is changing very fast (like in high-transaction environments or doing huge queries), these features may not be able to copy data over the network as fast as it’s changing in memory. In situations like this, the live migration will fail. I’ve never seen it bring the virtual server down, but I’ve seen it slow performance while it attempted the migration.
New features and new versions of virtualization software come out at a breakneck pace, and like any other software, it’s got bugs. A particularly nasty bug surfaced in VMware ESX v3.5 Update 2 – on a certain date, VMware users couldn’t power on their servers because the licensing was expired – even if it wasn’t. Imagine shutting down a server to perform maintenance, then trying to turn it back on and getting denied. “Sorry, boss, I can’t turn the server back on. I just can’t.” It took VMware days to deploy a fixed version, and in that time span, those servers just couldn’t come back on.
That’s an extreme case, but whenever more complexity is introduced into the environment, risk is introduced too. Injecting virtualization between the hardware and the OS is a risk.
It’s Not Always Cost-Effective
All of the virtualization vendors have a free version of their software, but the free version lacks the management tools and/or performance features that I touted in my earlier articles about why sysadmins want to virtualize your servers. The management tools and power-up editions cost money, typically on a per-CPU basis, and there’s maintenance costs involved as well. If your virtualization strategy requires isolating each SQL Server on its own physical host server, then you’ll be facing a cost increase, not a cost savings.
Combining multiple guest servers onto less physical servers still doesn’t always pay off: run the numbers for all of your virtualization tool licenses, and you may end up being better served by a SQL Server consolidation project. I did a webcast last year with Kevin Kline and Ron Talmage about choosing between consolidation and virtualization. That information is still relevant today.
My Virtualization Recommendations for SQL Server
My recommendations are:
- Virtualize only when it’s going to solve a problem, and you don’t have a better solution for that problem.
- Get good at performance monitoring before you virtualize, because it’s much tougher afterwards.
- Start by virtualizing the oldest, slowest boxes with local storage because they’ll likely see a performance gain instead of a penalty.
- Avoid virtualizing servers that have (and utilize) more than 2 HBAs.
If you’ve virtualized production SQL Servers, I’d love to hear about your experiences.


I have friends that have successfully virtualized SQL Servers, and clusters, and don’t enable vMotion for those instances, precisely because of some of your issues. You don’t want HBA pathing changing, CPUs changing, etc.
These are good reasons to not do it, and skip these instances. Go for the easy wins first.
Agreed. I’m a big fan of VMware’s dynamic resource scheduling, but not necessarily for SQL Servers. I want to finely tune resources and set specific guests on specific hardware, and then keep ‘em there. I know DRS does a great job, but I’d rather have predictable hardware performance for SQL than have to guesstimate what the bottleneck was when I’m looking at historical reporting.
Awesome. My sentiments exactly. You inspired me to write a post about what this means for SharePoint: http://www.sharepointmadscientist.com/Lists/Posts/Post.aspx?ID=28
First of all I would like to say that these are some great articles!
Storage throughput:
When utilizing SQL you have to make a decent design. Have database and tranaction logs on seperate LUNs (which in turn have the required underlaying RAID level), but I don’t have to explain it here, since you all know about this better than I do. Bottom line is you are not stuck to 1 LUN per vm.
In a current VMware ESX environment there are 2 options available for active/active loadbalancing.
1. You can set a fixed loadbalancing on a per LUN basis.
2. you can even utilize round robin load balancing policy. It is yet not suppored (experimental), but I’ve heared some good things about it.
Bottom line is that you are not stuck to 1 HBA per VM.
Offcourse it is nothing compared to a decent smart load balancing software like EMC’s PowerPath, but there’s a light at the end of the tunnel. In the upcoming release of VMware ESX, there is PSA (pluggable storage architecture), and EMC has created a powerpath module for ESX on it, so decent loadbalancing will be hitting ESX very shortly.
Using DRS/VMotion:
When you keep your hardware similar in your cluster, which I think you should do, it doesn’t matter if DRS moves your SQL server around. DRS only moves your server if it can yield a performance increase.
Cost-Effectiveness:
“If your virtualization strategy requires isolating each SQL Server on its own physical host server, then you’ll be facing a cost increase, not a cost savings.”
This is not necessarily true. IMHO you do not virtualize SQL from a consolidation point of view. Take a look at all the flexibility you get when your SQL server is virtualized.
- You can utilize HA without any complex clustering setup
- You can do hardware maintenance without any downtime
- and what about your DR scenario? If you replicate storage you just have to power up the server and your done.
So while initially a cost increase, your management is simplified and more flexible and you will get your ROI on the long run.
but hey that’s just my 2 cents.
-Arnim van Lieshout
Arnim – about storage throughput via unsupported round robin load balancing – if we’re talking about a database server, “unsupported” is about the last thing a DBA will consider implementing. There’s all kinds of things you can do if you want to roll your own and avoid support, but that’s not the kind of thing you usually see in the enterprise database server market.
When you say that “decent loadbalancing will be hitting ESX very shortly”, that depends on two things: first, the next version of VMware has to hit the streets, and second, it has to be installed in the user’s datacenter. Until it’s actually available for the database administrator to use, then it doesn’t matter to them. We can talk about all kinds of future technologies, but we need to see it in action before we bank on it.
About the cost-effectiveness – you note that “If you replicate storage,” but that’s also not something I typically see in the DBA market. I often ask DBAs if they’re using SAN-based replication, and the answer is almost always no. SQL Server database administrators tend to rely on the replication (log shipping, mirroring, etc) options built into SQL Server instead of relying on something external to their application. They’ve got an aggressive trust issue.
Thanks for the points though!
The neverending battle between the DBA and the System Administrator. LOL
I agree on your points considering a production environment. However there are many test/dev environments where some experimenting and third party technologies are allowable. This is the Demilitarized Zone where the DBA and the system administrator come together.
I just wanted to point out the options you have.
When the “DBA camp” does not want to rely on storage replication, you could use SQL replication techniques and still benefit from the other flexibilities VMware provides you with.
However for those “daredevils” that want to virtualize their SQL servers should read this VMware document.
http://www.vmware.com/files/pdf/solutions/sql_server_virtual_bp.pdf
-Arnim van Lieshout
First of all, I want to say that I find your posts very informative and I appreciate the fact that you are sharing your time and knowledge with the less experienced like me.
This post is a comment/question so forgive me if I’m taking advantage of your post.
My Alice-in-Wonderland experience with our recent SQL Server virtualization:
It all started as I ventured in the rabbit’s hole of Microsoft Hyper-V virtualization. We just completed an all-in-one-shot migration, OS, SQL & Virtual, following Microsoft’s Best Practices White Paper (http://download.microsoft.com/download/d/9/4/d948f981-926e-40fa-a026-5bfcf076d9b9/SQL2008inHyperV2008.docx).
After the upgrade, users are complaining that our custom ASP based application (developed by a vendor) is running slow. I’ve been trying to pinpoint what’s causing the “slowliness” and so far I can’t really come to a conclusion.
To isolate the issue for this particular server, only one virtual SQL Server 2008 is running on the host with more CPUs and RAM assigned to it than what it had on the physical server. We have an average of 50-70 users accessing the application at one time with a single instance and one database 6GBs in size. CPU sometimes spikes to 75%-90% for periods of 3-4 seconds, then fluctuates between 20%-50%.
Scenarios Tested & Possible culprits on my list:
1) Disk I/O (discarded, files separated by
Bad code / query optimization (most likely, how can I prove this and put this nicely without agitating the ant’s nest…programmers ? )
A) RAID 1 – Host and VM C: drives,
B) RAID 1+0 for database files and
C)RAID 1 for log files
2) Processor (discarded, no CPU oversubscibing)
3) Memory (discarded, bumped from 8GB to 16 GB with no noticeable performance gain, 36 GB overall available)
4) Maintenance (discarded, index rebuilt, reorganized, statistic updated nightly, log file backup and shrink every 48 hours)
5) Network (discarded, traffic never spikes more then 50%, 1 GB throughput)
6) Transactional replication (discarded, stopped/deleted to test impact…no major performance gain after it was stopped)
7) Indexes (possible, according to DB Tuning Advisor 70 indexes missing/needed)
I have done SQL Profiler traces, disk I/O stress tests (SQLIO, SQLIOSIM) and the response times seem ok according to several blogs.
Any other ideas? Have I overlooked anything on my config or maintenance? Anything specific I should look on my traces?
I appreciate your comments on this. SQLJoe.
SQLJoe – sounds like you’ve got some interesting challenges there. I see a lot of things that concern me, but frankly, there’s so many that you probably want to get a local consultant involved. Have you got someone you can bring in to help? There’s so many things that I can’t do justice to it in the time & space that I’ve got here.
Interesting comments.
We placed some small production SQL servers in the virtual environment but I’m finding that placing SQL servers into a large server with multiple instance is better. Currently we have 3 large HP servers (DL585). 2 with 10 instances of SQL 2005 on each and 1 with 3 instances of 2008. We are also using CA’s XOSoft to provide High Availability (HA). I am very happy with the performance and manageability too.
With this configuration I have now removed 25 physical server from our data center with plans to eventually go from 110 SQL server to may 4 – 6 in this multi-instance configuration. We are using the virtual environment for all our development/test SQL servers.
Thanks,
Rudy
Rudy – cool, glad you’re liking virtualization. There are several challenges with using one large server with multiple instances. First, if you have any one or two databases that need to go to a newer version of SQL Server, you can have challenges running different versions on different instances on the same OS. Second, if any third party vendor or consultant wants sysadmin rights on the OS in order to do troubleshooting, there’s a chance they can mess up other instances. Finally, coordinating downtime becomes much more complex when you have to do OS patches, since now you have to bring everybody down at exactly the same time. It does work in the right environments, though.
Thanks for your quick reply. I must be lucky as we don’t seem to have any issues (knock on wood).
All SQL server instance are the same version. New databases/instances would be moved to our newer server. I’ve setup these servers to be purely data only. Any/all applications that talk the SQL servers have been virtualized so if a vendor/consultant needs higher rights, we can give it to them on the VM and just provide a SQL server ID with rights they need.
Our solution may not be for everyone, but I recommend you give it a try. Using both as a mix works great.
Keep up the great work Brent and see you (virtually) on July 21 on your virtual training v conference.