Reasons Why You Shouldn’t Virtualize SQL Server

I’ve blogged about why you should virtualize SQL Server, but it’s not all unicorns and rainbows.  Today we’re going to talk about some of the pitfalls and problems.

When Virtualization Goes Right

It’s Tougher to Get More Storage Throughput

Servers connect to Storage Area Networks (SANs) with Host Bus Adapters (HBAs).  They’re like fancypants network cards, and they come in Fibre Channel (FC) and iSCSI varieties.  These components are the place to focus when thinking about virtualization.

If your SQL Server:

  • Has 2 or more HBAs connected to the SAN
  • Uses active/active load balancing software like EMC’s PowerPath to get lots of storage throughput
  • Actually takes advantage of that throughput

Then you’ll be dissatisfied with the current state of storage access in virtualization.  Generally speaking, without doing some serious voodoo, you’re only going to get one HBA’s worth of throughput to each virtual machine, and that’s the best-case scenario.
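
Not sure whether your server actually uses that bandwidth?  Here’s a rough sketch using SQL Server’s file-stats DMV – the numbers are cumulative since the last restart, so treat them as a starting point, not a verdict:

    -- Cumulative I/O per database file since the last SQL Server restart.
    SELECT DB_NAME(vfs.database_id) AS database_name,
           mf.physical_name,
           vfs.num_of_bytes_read / 1048576 AS mb_read,
           vfs.num_of_bytes_written / 1048576 AS mb_written,
           vfs.io_stall_read_ms + vfs.io_stall_write_ms AS total_stall_ms
    FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
    JOIN sys.master_files AS mf
      ON mf.database_id = vfs.database_id AND mf.file_id = vfs.file_id
    ORDER BY vfs.num_of_bytes_read + vfs.num_of_bytes_written DESC;

If the busiest files together sustain more than a single HBA’s worth of throughput, a one-HBA virtual machine will be a step backwards.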

If you’re running multiple servers on the same virtual host, the IO situation gets worse: it becomes even more important to carefully manage how many SQL Servers end up on a single physical host, and more difficult to balance the IO requirements of each server.

Never mind how much more complex this whole thing gets when we throw in shared storage: a single RAID array might have virtual server drives for several different servers, and they can all compete for performance at the same time.  Think about what happens on Friday nights when the antivirus software kicks off a scheduled scan across every server in the shop – goodbye, performance.

No-Good Liar

It’s Tougher to Get Good Performance Reporting

Let’s look at the very simplest performance indicator: Task Manager.  On a virtual server, Task Manager doesn’t really show how busy the CPUs are.  The CPU percentages are a function of several things, and none of them are transparent or detectable to the database administrator.

Other virtual servers might be using up all of the CPU.

The virtualization admin might have throttled your virtual server.  They can set limits on how much CPU power you actually get.

Your host’s CPU can change.  Your server can get moved from a 2GHz box to a 3GHz box without warning.

And even if you dig into the underlying causes to find out what’s going on, there’s no reporting system that will give you a dashboard view of this activity over time.  You can’t look at a report and say, “Well, last Thursday my production SQL Server was hitting 100% CPU, but it’s because it was on a slow shared box, and on Thursday night at 5:00 PM it was migrated live over to a faster box, and that’s why pressure eased off.”
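
You can at least capture the guest’s side of the story.  Here’s a rough sketch that pulls SQL Server’s own CPU history – roughly the last four hours, one sample per minute – from the scheduler monitor ring buffer (an undocumented DMV, so no promises).  Just remember this is still the guest’s view; it can’t see host throttling or migrations:

    -- SQL Server's view of CPU over roughly the last 4 hours.
    SELECT DATEADD(ms, CAST(rb.[timestamp] - si.ms_ticks AS INT), GETDATE()) AS sample_time,
           rb.record.value('(//ProcessUtilization)[1]', 'int') AS sql_cpu_pct,
           rb.record.value('(//SystemIdle)[1]', 'int') AS os_idle_pct
    FROM (SELECT [timestamp], CONVERT(XML, record) AS record
          FROM sys.dm_os_ring_buffers
          WHERE ring_buffer_type = 'RING_BUFFER_SCHEDULER_MONITOR'
            AND record LIKE '%<SystemHealth>%') AS rb
    CROSS JOIN sys.dm_os_sys_info AS si
    ORDER BY sample_time DESC;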

Not Everything Works As Advertised

Virtualization vendors have some amazing features.  We talked about vMotion and Live Migration, the ability to move virtual servers from one physical host to another on the fly without downtime.  While that does indeed work great, it doesn’t necessarily work great for every server in every shop.  If you’ve got a heavily saturated network, and your SQL Server’s memory is changing very fast (like in high-transaction environments or during huge queries), these features may not be able to copy data over the network as fast as it’s changing in memory.  In situations like this, the live migration will fail.  I’ve never seen it bring the virtual server down, but I’ve seen it slow performance while it attempted the migration.

New features and new versions of virtualization software come out at a breakneck pace, and like any other software, it’s got bugs.  A particularly nasty bug surfaced in VMware ESX v3.5 Update 2 – on a certain date, VMware users couldn’t power on their servers because the licensing had expired – even when it hadn’t.  Imagine shutting down a server to perform maintenance, then trying to turn it back on and getting denied.  “Sorry, boss, I can’t turn the server back on. I just can’t.”  It took VMware days to deploy a fixed version, and in that time span, those servers just couldn’t come back on.

That’s an extreme case, but whenever more complexity is introduced into the environment, risk is introduced too.  Injecting virtualization between the hardware and the OS is a risk.

It’s Not Always Cost-Effective

All of the virtualization vendors have a free version of their software, but the free version lacks the management tools and/or performance features that I touted in my earlier articles about why sysadmins want to virtualize your servers.  The management tools and power-up editions cost money, typically on a per-CPU basis, and there are maintenance costs involved as well.  If your virtualization strategy requires isolating each SQL Server on its own physical host server, then you’ll be facing a cost increase, not a cost savings.

Combining multiple guest servers onto fewer physical servers still doesn’t always pay off: run the numbers for all of your virtualization tool licenses, and you may end up being better served by a SQL Server consolidation project.  I did a webcast last year with Kevin Kline and Ron Talmage about choosing between consolidation and virtualization.  That information is still relevant today.

My Virtualization Recommendations for SQL Server

My recommendations are:

  • Virtualize only when it’s going to solve a problem, and you don’t have a better solution for that problem.
  • Get good at performance monitoring before you virtualize, because it’s much tougher afterwards (there’s a baseline sketch after this list).
  • Start by virtualizing the oldest, slowest boxes with local storage because they’ll likely see a performance gain instead of a penalty.
  • Avoid virtualizing servers that have (and utilize) more than 2 HBAs.
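
On that second recommendation, here’s a minimal baseline sketch: snapshot your wait stats into a table on a schedule before the migration, so “it got slower” becomes measurable afterwards.  The table name is just illustrative:

    -- One-time setup plus a snapshot you can schedule (e.g., daily).
    IF OBJECT_ID('dbo.WaitStatsBaseline') IS NULL
        CREATE TABLE dbo.WaitStatsBaseline (
            capture_time DATETIME NOT NULL DEFAULT GETDATE(),
            wait_type NVARCHAR(60) NOT NULL,
            wait_time_ms BIGINT NOT NULL,
            signal_wait_time_ms BIGINT NOT NULL,
            waiting_tasks_count BIGINT NOT NULL);

    INSERT dbo.WaitStatsBaseline (wait_type, wait_time_ms, signal_wait_time_ms, waiting_tasks_count)
    SELECT wait_type, wait_time_ms, signal_wait_time_ms, waiting_tasks_count
    FROM sys.dm_os_wait_stats
    WHERE wait_time_ms > 0;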

100 Comments

  • I have friends that have successfully virtualized SQL Servers, and clusters, and don’t enable vMotion for those instances, precisely because of some of your issues. You don’t want HBA pathing changing, CPUs changing, etc.

    These are good reasons to not do it, and skip these instances. Go for the easy wins first.

  • Agreed. I’m a big fan of VMware’s dynamic resource scheduling, but not necessarily for SQL Servers. I want to finely tune resources and set specific guests on specific hardware, and then keep ’em there. I know DRS does a great job, but I’d rather have predictable hardware performance for SQL than have to guesstimate what the bottleneck was when I’m looking at historical reporting.

  • Awesome. My sentiments exactly. You inspired me to write a post about what this means for SharePoint: http://www.sharepointmadscientist.com/Lists/Posts/Post.aspx?ID=28

  • First of all I would like to say that these are some great articles!

    Storage throughput:
    When utilizing SQL you have to make a decent design. Have the database and transaction logs on separate LUNs (which in turn have the required underlying RAID level), but I don’t have to explain it here, since you all know about this better than I do. Bottom line is you are not stuck with 1 LUN per VM.

    In a current VMware ESX environment there are 2 options available for active/active load balancing:
    1. You can set fixed load balancing on a per-LUN basis.
    2. You can even utilize the round robin load balancing policy. It is not yet supported (experimental), but I’ve heard some good things about it.
    Bottom line is that you are not stuck with 1 HBA per VM.

    Of course it is nothing compared to decent smart load balancing software like EMC’s PowerPath, but there’s a light at the end of the tunnel. In the upcoming release of VMware ESX there is PSA (pluggable storage architecture), and EMC has created a PowerPath module for ESX on it, so decent load balancing will be hitting ESX very shortly.

    Using DRS/VMotion:
    When you keep your hardware similar in your cluster, which I think you should do, it doesn’t matter if DRS moves your SQL server around. DRS only moves your server if it can yield a performance increase.

    Cost-Effectiveness:
    “If your virtualization strategy requires isolating each SQL Server on its own physical host server, then you’ll be facing a cost increase, not a cost savings.”
    This is not necessarily true. IMHO you do not virtualize SQL from a consolidation point of view. Take a look at all the flexibility you get when your SQL server is virtualized.
    – You can utilize HA without any complex clustering setup
    – You can do hardware maintenance without any downtime
    – and what about your DR scenario? If you replicate storage you just have to power up the server and you’re done.
    So while it’s initially a cost increase, your management is simplified and more flexible, and you will get your ROI in the long run.

    but hey that’s just my 2 cents.

    -Arnim van Lieshout

  • Arnim – about storage throughput via unsupported round robin load balancing – if we’re talking about a database server, “unsupported” is about the last thing a DBA will consider implementing. There’s all kinds of things you can do if you want to roll your own and avoid support, but that’s not the kind of thing you usually see in the enterprise database server market.

    When you say that “decent loadbalancing will be hitting ESX very shortly”, that depends on two things: first, the next version of VMware has to hit the streets, and second, it has to be installed in the user’s datacenter. Until it’s actually available for the database administrator to use, then it doesn’t matter to them. We can talk about all kinds of future technologies, but we need to see it in action before we bank on it.

    About the cost-effectiveness – you note that “If you replicate storage,” but that’s also not something I typically see in the DBA market. I often ask DBAs if they’re using SAN-based replication, and the answer is almost always no. SQL Server database administrators tend to rely on the replication (log shipping, mirroring, etc) options built into SQL Server instead of relying on something external to their application. They’ve got an aggressive trust issue. 😀

    Thanks for the points though!

  • The neverending battle between the DBA and the System Administrator. LOL

    I agree on your points considering a production environment. However there are many test/dev environments where some experimenting and third party technologies are allowable. This is the Demilitarized Zone where the DBA and the system administrator come together. 😀 I just wanted to point out the options you have.

    When the “DBA camp” does not want to rely on storage replication, you could use SQL replication techniques and still benefit from the other flexibilities VMware provides you with.

    However, those “daredevils” that want to virtualize their SQL Servers should read this VMware document.
    http://www.vmware.com/files/pdf/solutions/sql_server_virtual_bp.pdf

    -Arnim van Lieshout

  • First of all, I want to say that I find your posts very informative and I appreciate the fact that you are sharing your time and knowledge with the less experienced like me.

    This post is a comment/question so forgive me if I’m taking advantage of your post.

    My Alice-in-Wonderland experience with our recent SQL Server virtualization:

    It all started as I ventured in the rabbit’s hole of Microsoft Hyper-V virtualization. We just completed an all-in-one-shot migration, OS, SQL & Virtual, following Microsoft’s Best Practices White Paper (http://download.microsoft.com/download/d/9/4/d948f981-926e-40fa-a026-5bfcf076d9b9/SQL2008inHyperV2008.docx).

    After the upgrade, users are complaining that our custom ASP-based application (developed by a vendor) is running slowly. I’ve been trying to pinpoint what’s causing the “slowliness” and so far I can’t really come to a conclusion.

    To isolate the issue for this particular server, only one virtual SQL Server 2008 is running on the host, with more CPUs and RAM assigned to it than what it had on the physical server. We have an average of 50-70 users accessing the application at one time, with a single instance and one database 6 GB in size. CPU sometimes spikes to 75%-90% for periods of 3-4 seconds, then fluctuates between 20%-50%.

    Scenarios Tested & Possible culprits on my list:

    1) Disk I/O (discarded, files separated by
    A) RAID 1 – Host and VM C: drives,
    B) RAID 1+0 for database files and
    C) RAID 1 for log files)
    2) Processor (discarded, no CPU oversubscribing)
    3) Memory (discarded, bumped from 8 GB to 16 GB with no noticeable performance gain, 36 GB overall available)
    4) Maintenance (discarded, indexes rebuilt, reorganized, statistics updated nightly, log file backup and shrink every 48 hours)
    5) Network (discarded, traffic never spikes more than 50%, 1 GB throughput)
    6) Transactional replication (discarded, stopped/deleted to test impact…no major performance gain after it was stopped)
    7) Indexes (possible, according to DB Tuning Advisor 70 indexes missing/needed)
    8) Bad code / query optimization (most likely, how can I prove this and put it nicely without agitating the ant’s nest…programmers?)

    I have done SQL Profiler traces, disk I/O stress tests (SQLIO, SQLIOSIM) and the response times seem ok according to several blogs.

    Any other ideas? Have I overlooked anything on my config or maintenance? Anything specific I should look on my traces?

    I appreciate your comments on this. SQLJoe.

    • I wondered if you got all this tuned up? A few things I saw were –
      Where do you have TempDB? I hope on its own LUN?
      I wouldn’t do shrinking – ever. It fragments the indexes after you defragged them earlier.
      Can you pinpoint whether performance was fine when connected directly to the SQL Server and running a query from Management Studio? Is the query only slow when the application runs the query (from the web page)?
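
      If you want to check index fragmentation after that maintenance cycle, here’s a minimal sketch – run it in the application database; the 30% threshold is just a common rule of thumb:

          -- Rough sketch: list the most fragmented indexes in the current database.
          SELECT OBJECT_NAME(ips.object_id) AS table_name,
                 i.name AS index_name,
                 ips.avg_fragmentation_in_percent,
                 ips.page_count
          FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
          JOIN sys.indexes AS i
            ON i.object_id = ips.object_id AND i.index_id = ips.index_id
          WHERE ips.avg_fragmentation_in_percent > 30
          ORDER BY ips.avg_fragmentation_in_percent DESC;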

    • Are you running the MSDTC? Turn it off unless essential!

      • Richard – I don’t say this often, but that is VERY BAD ADVICE.

        I’m sure you didn’t come up with it yourself, and I want to get to the root of that so we can fix wherever it’s coming from. Can you elaborate on where you got that advice from?

        • Brent – I have seen developers build solutions that use the DTC for all SQL transactions… even when there is only a single database involved in the unit of work. So why use a 2PC protocol if you don’t need it? Systems that fall into this category have SQL performance that can be an order of magnitude slower than if you used a direct connection.

          Please explain why you feel the above is bad advice.

          • Richard – I’ve also seen developers building bad/slow solutions that use triggers. Should we disable triggers?

            I’ve seen developers building bad solutions using functions. Should we disable functions?

            Come to think of it, I’ve even seen them build bad tables. Let’s disable tables!

            Whew. Our job is so much easier now. Yay, us!

            The reality is that sometimes you need distributed transactions to do commits across several servers. The answer isn’t to disable features – especially when you might need DTC to do an emergency update via linked servers, for example.

        • This is a reply to my 2nd post:

          As I made quite clear in my original post, Brent: don’t use MSDTC **unless it’s essential**. I.e., if you have a need for it then use it, but if you don’t have a need to coordinate transactions over multiple dispensers then you’d be mad to create an overhead for yourself.

          • Richard – then if you’re determined to do that, turn it off in development. Don’t ever turn it off in production, because the DBA may need to make an emergency update via linked servers. You’ll be pretty sad when that kind of thing fails.

            What you’re suggesting isn’t anywhere near a performance fix, and can make existing applications fail if they’re already coded to use DTC.

          • Ahh, I see the problem here. I said “turn the DTC off”. I meant figuratively. I.e., don’t disable the service, just don’t use it in your coding (**unless you have to**).

          • Richard – bingo.

            Plus, this isn’t a post about coding – it’s a post about virtualization. If you’re going to give (seriously dangerous) advice like that, you’ve gotta pay a little more attention to where you’re giving it. Thanks for the dialog!

          • Back on topic and watching my Ps and Qs!
            My investigation has shown that a solution that uses the DTC in a virtualised configuration degrades at a greater rate than the same solution using DTC in a physical SQL configuration. Your mileage may vary, but it may be worth considering.

          • There you go! That’s a better way of saying it. Great job!

          • Not really a virtualization issue. If you have a sub-optimal virtual server configuration it may have more effect, but it’s really more of a performance issue. And performance issues across linked servers, which tend to be involved in DTC, can be painful. One example: a user who had datareader on the linked server didn’t have permission to see the statistics and get a good query plan. Quite unexpected, and it was considered a bug. For more on this example see: https://support.microsoft.com/en-us/kb/2683304
            A trace would also show the missing permission before the fix.

  • SQLJoe – sounds like you’ve got some interesting challenges there. I see a lot of things that concern me, but frankly, there’s so many that you probably want to get a local consultant involved. Have you got someone you can bring in to help? There’s so many things that I can’t do justice to it in the time & space that I’ve got here.

  • Interesting comments.

    We placed some small production SQL Servers in the virtual environment, but I’m finding that placing SQL Servers into a large server with multiple instances is better. Currently we have 3 large HP servers (DL585): 2 with 10 instances of SQL 2005 each, and 1 with 3 instances of 2008. We are also using CA’s XOSoft to provide High Availability (HA). I am very happy with the performance and manageability too.

    With this configuration I have now removed 25 physical servers from our data center, with plans to eventually go from 110 SQL Servers to maybe 4-6 in this multi-instance configuration. We are using the virtual environment for all our development/test SQL Servers.

    Thanks,

    Rudy

    • Rudy – cool, glad you’re liking virtualization. There are several challenges with using one large server with multiple instances. First, if you have any one or two databases that need to go to a newer version of SQL Server, you can have challenges running different versions on different instances on the same OS. Second, if any third party vendor or consultant wants sysadmin rights on the OS in order to do troubleshooting, there’s a chance they can mess up other instances. Finally, coordinating downtime becomes much more complex when you have to do OS patches, since now you have to bring everybody down at exactly the same time. It does work in the right environments, though.

  • Thanks for your quick reply. I must be lucky as we don’t seem to have any issues (knock on wood).
    All SQL Server instances are the same version. New databases/instances would be moved to our newer server. I’ve set up these servers to be purely data only. Any/all applications that talk to the SQL Servers have been virtualized, so if a vendor/consultant needs higher rights, we can give it to them on the VM and just provide a SQL Server ID with the rights they need.

    Our solution may not be for everyone, but I recommend you give it a try. Using both as a mix works great.

    Keep up the great work Brent and see you (virtually) on July 21 on your virtual training v conference.

  • aleksey.fomchenko
    August 5, 2010 5:05 pm

    Hi Ozar,

    Do you know anything about the behavior of the SQL Server cost-based optimizer in a virtual environment? How well does it work? It should be aware of all the physical resources available to SQL Server to create good execution plans, but if we run several SQL Servers within a virtual environment sharing resources, that can be a problem. Unfortunately, I have found nothing about this issue.

    • Aleksey – right now there’s not any good resources online for that. SQL Server works fine with shared resources – it’s been heavily used in shared SAN environments and active/active clustering for years. You may want to investigate resources for active/active clustering which will teach you how to do things like affinity masking, although that’s not something I usually recommend for virtual environments.
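
      If you’re curious what affinity masking looks like, here’s a minimal sketch – SQL 2005/2008-era syntax, with bitmask 15 (0x0F) binding SQL Server’s schedulers to the first four CPUs. Illustration only, since it’s not something I usually recommend for virtual environments:

          -- Illustration only: bind SQL Server's schedulers to CPUs 0-3.
          EXEC sp_configure 'show advanced options', 1;
          RECONFIGURE;
          EXEC sp_configure 'affinity mask', 15;  -- 0x0F = first four CPUs
          RECONFIGURE;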

  • Hey guys…

    Good point of view. From now on I’m totally comfortable with the IBM pSeries POWER virtualization technology. The critical point: the anti-virus software will never blow up my server, service or even the hardware, because I don’t need AV software running and eating everything…..

    Ooops, sorry my fault. This is a MS MSQ blog. Take care guys.

    😉

    Cheers, Davi.

    • Davi – I’m curious about a few things.

      1. If you use the pSeries, why are you reading my blog? I’ve never discussed anything about those kinds of systems.

      2. What’s MS MSQ?

      3. You do realize that leaving snide comments under your real name, with your real address, is not such a good idea? If anyone searches for things like “Davi Baldin resume” or “Davi Baldin job experience”, they might run across your comments. Just because the web site doesn’t normally show your email address, baldin@gmail.com, doesn’t mean it’s private. You want to be careful when doing things like this, lest someone who’s looking for the Davi Baldin find more than just your Eclipse postings in the newsgroups from 2006. You were a developer back then, I understand? You definitely appear to have made some progress in your comments since those question & answer days, and that’s great.

      http://www.eclipse.org/forums/index.php?t=rview&goto=453381&th=144005

      I look forward to hearing your answers. Have a great day!
      Brent

  • Hi

    Everybody seems to be missing the crucial issue here: licensing!!
    DRS and VMotion are great, but having a virtual SQL Server move to an unlicensed host can get you into a lot of trouble with uncle Bill and his cronies. Might even lumber you with a tasty lawsuit 🙂

    The answer here is clear: if you do virtualise SQL Servers, understand which hosts are licensed for virtual SQL Servers and take steps to ensure they remain within their boundaries!

    I am a big fan of virtualisation and have used both VMware Server (freebie) and ESX\VI3 to virtualise production\dev\test SQL Servers. I do have a rule of thumb though: if you get to the point where you start to assign ridiculous amounts of resources to a VM, it probably shouldn’t be a VM. I also have a series of wikis detailing how to build a virtual SQL Server cluster, which many people have found useful for testing\educational purposes (how popular would you be if you hosed the corporate SQL Server cluster whilst testing 🙂 ).

    regards

    Perry

    • Perry – the licensing issue is the same regardless of whether you’re in a physical or virtual environment. You need to license it where you use it. Pretty straightforward there. 😉

  • Perry there is some useful information with examples of VM SQL Server license calculations here:

    http://www.microsoft.com/sqlserver/2008/en/us/licensing.aspx

    Check out the license guide – very nice and easy to follow (better than the previous efforts from Microsoft)

  • Hi

    If you license the processors on the ESX host at Enterprise level, this affords unlimited SQL Server VMs. Any other edition doesn’t offer the same flexibility; it depends how many SQL VMs you intend to have.

    Regards

    Perry

  • Hi Brent

    I take your point, and it’s obvious to see that MS have seen the gap in the market there!! Rather than freely give away VM licensing, Datacenter (and not Enterprise) now offers unlimited virtualisation (I bet it costs a lot more too – double, in fact). But surely you can’t now be saying that you’re basing this whole topic upon SQL 2008 R2.
    What about the real world? To go back to SQL 2005 and SQL 2008, which most shops are still using in production today: they offer unlimited virtualisation when licensing the physical host at Enterprise level – do you agree?

    Even taking Datacenter into consideration, the point I am making is that licensing the hosts at Enterprise (2005 or 2008) or Datacenter (2008R2) affords the unlimited licence scenario. When this is done you need to control VMs so they do not move to an unlicensed host!

    Regards

    Perry

    • Perry – typically I don’t see shops deploying new SQL Servers using older versions. Some still do, but most plan their deployments around the current version.

      If we’re talking about licensing existing servers, the licensing on that is usually already done – they don’t replan licensing. Interesting thought though.

  • We are considering virtualising our SQL Server. Any pointers and/or pitfalls would be appreciated.

    • Rob – I’ve written extensively about it here. If you click on Articles at the top of this page and then virtualization, you’ll find a lot of posts. Thanks.

  • Brent,

    We have a core banking solution here that has a browser front end and is running on a virtualized SQL Server 2005. Major issues with the app freezing and slowness. We had the issues prior to the virtualization, but things are getting worse. Would it make sense to upgrade to 2008, or should we bring in a SQL DBA to troubleshoot? Not sure it’s because we virtualized. vSphere here… Enterprise Edition.

    • MW – generally speaking, once things start going wrong, I don’t want to make any more *new* changes to a system unless I’ve got a proven reason as to why that will help. Adding more variables makes it that much tougher to determine the root cause. For example, if you’ve got full text queries, there are significant changes between 2005 and 2008 that could make your performance less predictable.

      If I was in your shoes, yes, I’d definitely bring in someone to help. I’ll send you an email to find out more.

  • Perry whittle
    December 2, 2010 2:44 pm

    Answered your own question really. If issues existed before virtualisation, it’s unfair to blame this technology at this point. Interestingly, what made you think virtual would make things better?
    Hire yourself a good consultant to check it out for you.

  • I think we have lost track of the original purpose of the article. One thing I had never worried about until we started to virtualize our SQL Servers is performance monitoring. We use VMware vCenter and have no access to it. So now I am at a loss in my SharePoint environment to make sense of weird readings like 130% CPU usage and SAN thrashing on our drives. We are looking at VMM (Virtual Machine Manager) or vFogLight. Are there any other third-party tools that can come to the DBA’s rescue??

  • Jim

    Which version of VMware are you using?
    How many vCPUs are assigned to the VMs?
    How are the VMs’ virtual disks set up?

    Regards

    Perry

  • Has anyone tried setting up a SQL 2008 R2 cluster (active/passive) with one node physical and the other node virtual? Is this a good idea?

  • Jeffry Altamar
    February 22, 2011 4:29 pm

    Hello

    It is my understanding that a VM performs at more than 90% of an equivalent physical machine. Is this correct? Or is a physical server the best choice? What do you think?

  • Racim BOUDJAKDJI
    March 22, 2012 4:02 pm

    Hi Brent,

    Though it has improved over the last few years, virtualized hosts for databases have proved to be more of a fad than anything. A hypervizor IO hog and a lack of transactional fault tolerance won’t be compensated for by a better salesman’s pitch. Virtualization is nothing but a buzz that is already beginning to be replaced by the new cloud hype.

    Conducted several above-a-terabyte virtualized POCs with VMware’s best experts on the tuning. Verdict: max throughput = 120MB/s, up to 5% of transactions lost during failover when > 1000 transactions/sec. Most whitepapers/benchmarks by VMware rarely go above 10 transactions/sec.

    There is pretty much nothing you can do with virtualization that you can’t do better without. Virtualized database hosts are like XML. As a good friend of mine puts it: a solution looking for a problem.

    The superiority of physical hosting will become even more obvious now that MS is moving to storage/subnet independent clustering and physical consolidation becomes more central.

    Finally it is good to be a DBA again…

    Regards…

    • Racim – hmm, I wouldn’t agree, but as long as your company’s able to continue to do physical hosting of your SQL Servers, that’s great. I’m having much better luck with virtualization than it sounds like you’re having, though – I’ve got clients large and small running pretty big servers under VMware. In fact, the storage/subnet independent clustering stuff actually works even better with virtualization.

  • Racim BOUDJAKDJI
    March 22, 2012 4:14 pm

    Again it is a matter of limitations..

    Since most databases don’t need more than 100MB/s of bandwidth, and people no longer care about data quality – hence transactional fault tolerance during failover – virtualization can work for low to medium RPO/RTO, but when things get serious, forget it.

    But again, my point is that there is nothing you can do with a virtualized host that you can’t do better without. Simpler, hence more robust.

    I gave up virtualization 3 years ago. My life has gotten much better since 🙂

    • Racim – that’s interesting. When you say there’s nothing you can do with virtualization that you can’t do better without, I’d like to hear more about that, specifically these scenarios.

      Say you have a SQL Server 2000 or 2005 application that needs high availability. The server name can’t change, and you need to be able to withstand hardware or datacenter failure. How would you accomplish that in a physical infrastructure? With virtualization, it’s fairly easy with VMware SRM, but I haven’t found an easy way to do that in a physical-only environment.

      Say you have dozens of SQL Servers that regularly need hardware replacement because the company leases hardware on a 2-3 year basis. How do you accomplish regular rotation of hardware in a physical-only environment with a minimum amount of work?

  • Racim BOUDJAKDJI
    March 22, 2012 4:27 pm

    <>
    > Efficient Routing Scheme is the key issue to effective DSR on any version. Applicative redirection is less efficient.
    > The rest is Log Shipping/multisite clustering/usual stuff
    > On SQL 2012, that simply becomes a non issue with AlwaysOn topology. Not sure the AlwaysOn will actually port Instance Name Referencing for legacy systems though…

    <>
    > Backup/Copy/Restore
    > Add Node/Failover/Evict Old Node/Dump Old Node
    > Use OS Virtualization (Ex: Parallels Virtuozzo, which supports Failover)

  • Racim BOUDJAKDJI
    March 22, 2012 4:41 pm

    Please define Minimum Amount of Work…

    With Asynchronous Database Mirroring and a good network layer, I can pretty much fail over entire farms in less than an hour.

    • Minimum amount of work under virtualization means right click, Migrate. You can move dozens of servers at a time between hosts or between data centers while they’re online, no downtime, no lost data, no SQL knowledge required, no DBA required, during business hours. That’s a pretty tall order for us SQL guys to beat.

  • Racim BOUDJAKDJI
    March 22, 2012 5:04 pm

    <>
    Last time I tried that with VMWare HA, I simply lost 5% of transactions at the time of failure, at a transactional throughput above 1,000 transactions/sec… So it looks like a trade-off.

    As long as the move is planned, VMotion will kick in; in the unplanned case it won’t, because no buffer means no transactional fault tolerance.

    Most of the time a disaster is unplanned, hence virtualization’s buffering capabilities won’t be available…

    I think they thought about associating VMWare HA (unplanned) and VMotion (planned buffer), but there is still a limit to the number of cores associated… Forgot the name of the product…

    Well, I had to do a little scripting (I am not a good clicker), but I do not lose transactions during a failover… AlwaysOn simply will kill, once and for all, the need for virtualization platforms through storage/subnet independent role switching (which is done through Right Click too).

  • Racim BOUDJAKDJI
    March 22, 2012 5:07 pm

    I am also dumping SAN storage… SAN simply does not cope with today’s volume explosion anymore… Returning to good ol’ DAS with SQL 2012…

  • Timothy Batts
    April 3, 2012 1:47 pm

    My entire production environment is virtualized, and I can’t get the sysadmin to give me more than one virtual processor per SQL VM, because he says it will increase processor waits on the host due to VMware not beginning the CPU transaction until all virtual cores are available. Sounds like a bunch of BS to me, but is he right?

    I know for a fact the host runs a consistent 15-20% CPU right now; my SQL VM holds steady at 40-45% with regular spikes to 80-100%, and frequent processor queues of 15-20+, I assume because there is only one processor to schedule. I’m assuming performance will greatly improve on 4 virtual processors; he says it won’t. Thoughts?

    • Timothy – it depends on what you mean by performance improving. You’re both right – the VMware hosts will perform worse because they’ll have more CPU load to handle. Your SQL guests may perform better depending on the type of query.

      • Timothy Batts
        April 3, 2012 2:21 pm

        I suppose since the host is only using 15-20% of available processing power, I’m personally not too concerned about host performance. I’m concerned only about SQL performance. It’s primarily transactional, with perhaps 100-150 concurrent users at any given time. I understand the type of queries won’t use/benefit from parallelism, but wouldn’t additional virtual processors eliminate the processor queue, since those other transactions can be scheduled on another processor?

        • Yep, or another way to do it might be to tune those queries to reduce the CPU load. I’d probably start by going to 2 vCPUs for SQL Server and then tune the queries.
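
            For a rough sanity check on whether CPU scheduling really is the bottleneck, this sketch shows what share of wait time is spent waiting for a CPU – a high signal-wait percentage points at runnable-queue pressure rather than disk or locking:

                -- Rough sketch: signal waits as a percentage of all waits since restart.
                SELECT CAST(100.0 * SUM(signal_wait_time_ms) / SUM(wait_time_ms)
                            AS DECIMAL(5,2)) AS signal_wait_pct
                FROM sys.dm_os_wait_stats
                WHERE wait_time_ms > 0;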

          • Racim BOUDJAKDJI
            April 3, 2012 3:29 pm

            Hey Brent…Thanks for the nice thread 🙂

            What is the point of gaining time in punctual failovers if one wastes it in recurring, time-costly applicative tuning? Nowadays developers simply don’t have a clue on how to do proper database application design. As an analogy, subsystem-level tuning is much more time-efficient, but the hypervizor overhead layer simply makes that type of tuning moot…

          • Racim – you’ve got a few things thrown in there together, and I’m not quite sure where to begin.

            Do virtual SQL Servers have costly application tuning? Yes, just as physical ones do.

            “Developers don’t have a clue on how to do database application design” – I’d disagree with that. I know a lot of really, really good developers.

            “The hypervizor overhead layer simply makes that type of tuning moot” – I’m not sure what you mean there. If you’re saying you can’t tune virtual servers, I’d strongly disagree. Hardware tuning is still very much an option – I make a living doing it. 😀

  • Racim BOUDJAKDJI
    April 4, 2012 11:47 am

    There is a misunderstanding.. (online exchange sometimes sucks..)

    <>
    I believe you miss my point. I did not say there was no need for application tuning on physical. I simply said that subsystem tuning on a virtual platform is less effective than on a physical platform, because that is what happens when you have more layers. And I said that doing application tuning is more time-consuming than doing subsystem tuning. If I move to virtual just to spend more time fixing code, that is a waste of time.

    <>
    Guess my audience differs. Most developers I know simply don’t have a clue how to do relational modeling and database design. For instance, they don’t even know what a logical primary key is, and often mistake the logical and physical layers in relational design. I guess you are lucky.

    <>
    See above. We do the same job but my perspective/experience seems different from yours and drives me to different conclusions.

    Regards…

    • Racim – I simply don’t agree that subsystem tuning is less effective on virtual platforms. I do a lot of that type of tuning – in fact, I spent Monday doing it for a SQL Server in VMware, working on improving TempDB response time. I disagree that fixing code is a waste of time, too.

      Your experience is valuable, though, and I respect your viewpoint. Take care!

      • Racim BOUDJAKDJI
        December 11, 2012 7:24 am

        Brent,

        Coming to think about it, perhaps we simply are not talking about the same thing. I can easily imagine that there is a VM-specific way of tuning SQL Server systems, but my point is that I have not seen any situation where virtualization is a better physical consolidation solution than a mutualized SQL Server pool. Sure, you can tune a virtualized instance in a different way than physical instances, but what exactly would be the point of doing that?

        You pay extra licensing costs for VMware, you do not have full transactional fault tolerance (unless you clusterize the VMs or use a VMware failover buffer), you get inherently poor IO subsystem throughput and IOPS (due to an additional layer), and you pay extra consulting costs for VM-specific tuning for SQL Server.

        In conclusion, I do not believe we are disagreeing on anything since we are not discussing the same subject in the first place.

        Thanks for your input.

        • Racim – hmm, I actually don’t agree with what you’re talking about there either, but if you’re not having pains that virtualization can solve, it probably doesn’t make sense to discuss it. Take care!

          • Racim BOUDJAKDJI
            December 11, 2012 6:23 pm

            Quite frankly I don’t see any situation where virtualization can solve anything without creating bad side effects on performance or limiting physical consolidation, especially on SQL 2012.

            On AlwaysOn, on a single quad-socket platform, all I have to do to set up a failover replica on a different server is 3 clicks for setup and 1 click to fail over. Plus MS does not play ping-pong in support with VMWare with my customers when they realize transactional fault tolerance is not guaranteed in the event of a system-level failover.

            How many SQL database applications have you consolidated on a single quad-socket server pair using VMWare while ensuring decent IO throughput and IOPS for each application?

            The cost of database hosting on a *per application* basis is much higher on a hypervizor than mutualized/consolidated physical SQL hosting because of the following:

            > VMWare licenses
            > You will consolidate fewer database applications because of the hypervizor’s limitations on IO subsystem throughput and IOPS
            > You will pay more for specialized tuning and additional VMWare expertise

            If the basic premise of this enlightening thread, named

            “Reasons Why You Shouldn’t Virtualize SQL Server”

            is “How to live better with something you don’t really need”, then perhaps I could actually comprehend your viewpoint and even agree with it.

            But if the premise of the thread is “Virtualization is cool because there is a way to use it and a way to tune it”, then I can understand the source of our current misunderstanding.

            Regards

  • Racim BOUDJAKDJI
    April 4, 2012 12:04 pm

    <>
    I have an interest in your experience. I have had several VMware Certified engineers fail to resolve simple IO contention issues above a terabyte. But again, the last time I had to do a SQL Server consolidation POC was 2 years ago, and perhaps things have changed since. The IO subsystem simply could not cope when concurrency increased.

    <>
    Ah OK. We can agree to disagree…:)

    <>
    Yep. Always good to look at it from a different viewpoint.

  • Brent,
    I wonder how your opinion has been informed/impacted by the advent of Microsoft’s enhanced design of Windows Server 2012 and SQL Server 2012? MS touts the “improvements” repeatedly at different product seminars with statements similar to, “With these new enhancements, the old idea that you shouldn’t virtualize SQL server for things like SharePoint is no longer relevant.”
    I have no ax to grind with MS, and have followed your writings for some time with a ton of respect and admiration of your experience/expertise. I am trying to update my own opinion, but have had too few real opportunities with the new versions of Windows Server and SQL Server. Have you had enough opportunity to comment much on it?

  • Brent,

    You da man. Your SQLIO article with Quest has helped me get metrics that are golden right now.

    I’m considering migrating a SQL Server to ESX 5. We currently use multipathing of several (8 – 4 per SAN) gigabit connections through software iSCSI to a pair of Dell MD3220i SANs. Is this considered active/active load balancing?

    If so, how can I achieve the same level of performance virtualizing?

    Thanks!

    Chris

    • Thanks, glad I could help.

      The term “active/active load balancing” can cover so many different things – it’s just too tough to cover here in a blog post comment. Your best bet is to do your own performance testing using tools like SQLIO and CrystalDiskMark to discover how much throughput you’re getting. Compare your numbers to Kendra Little’s bandwidth poster here: https://www.brentozar.com/archive/2012/03/how-big-your-pipe-check-out-bandwidth-reference-poster/ and you’ll see if you’re getting more throughput than a single cable would provide.
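
      If you’d rather measure what the live server is actually pushing (as opposed to its ceiling), here’s a rough sketch using the file-stats DMV over a one-minute window – run it during a representative busy period:

          -- Rough sketch: average MB/s of total I/O over a 60-second sample.
          DECLARE @before BIGINT;
          SELECT @before = SUM(num_of_bytes_read + num_of_bytes_written)
          FROM sys.dm_io_virtual_file_stats(NULL, NULL);

          WAITFOR DELAY '00:01:00';  -- the sampling window

          SELECT (SUM(num_of_bytes_read + num_of_bytes_written) - @before)
                     / 1048576.0 / 60 AS avg_mb_per_sec
          FROM sys.dm_io_virtual_file_stats(NULL, NULL);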

      If you’re not getting the performance you want, feel free to contact us and we can help with the performance configuration. Again, though, this is just way beyond something I can address quickly in a blog post comment.

      Good luck with the project!

      • Brent,

        Thanks again. The bandwidth chart puts it into perspective with regards to how much data we can in fact transfer.

        So only count on one single connection? If I decide to drop the advanced features in favor of bandwidth, I could technically just use software iSCSI at the host level and pass through as many physical NICs to vNICs as I need for the bandwidth, correct? This would eliminate the ability to use VMotion and the rest of the awesome tools, but would give me a virtualized machine and the bandwidth I need?

        I can feel your judgement already……

  • mehdoini Abdallah
    November 9, 2014 8:07 am

    Brent Ozar, excellent list and thanks for sharing this consolidated view!

  • Hello,
    What is your opinion about creating a DR plan that replicates SQL Server from physical to virtual?
    Is that feasible? What would be your suggestions/recommendations for that scenario?
    Regards,
    Tom

    • Tom – generally, we say to back up data, not servers. Copy the data instead.

      • Hello Brent,
        Transactional replication from a physical SQL server to a virtual SQL server as a DR solution. Would that be possible to implement or is it a bad idea to even think about?

        • Kendra Little
          March 10, 2015 1:14 pm

          Ooo, for an instant, detailed answer check out Brent’s video on HA/DR basics. He covers transactional replication and compares it to other options: https://www.brentozar.com/archive/2011/12/sql-server-high-availability-disaster-recovery-basics-webcast/

          • Thank you Kendra, I have watched the webcast and it is really very informative. I was just wondering if virtualisation technology can support a scenario like this. I think that moving the physical server to a virtual server (SQL Server) and then using another virtual machine in a remote location for DR is possible; I just do not know if replication from the physical SQL Server to the remote virtual SQL Server would be possible.
            Best Regards,
            Tom

          • Kendra Little
            March 11, 2015 9:42 am

            Ah, OK, I thought you meant replication in terms of SQL Server transactional replication. It sounds like you mean it more generally in terms of data replication or copying. Totally understandable, that term is super confusing.

            There are some tools from multiple vendors which can take snapshots of a physical server that can be recovered to a virtual guest. For each of them you need to evaluate if they can meet your RPO and RTO, because typically they’ll either take unsafe copies of the data or need to quiesce the writes on the production server, and they aren’t things you can run constantly. Having those tools running also adds complexity to your troubleshooting process if performance is important. So while these tools exist, they aren’t a great fit for most folks.

  • Hi Brent – I have a question around how VMware manages memory after a vMotion.

    Imagine an ESX cluster with 2 hosts: A with guest L and B with guest M.

    Due to some memory pressure guest L had to be vmotioned to Host B.

    When this happens, will the buffer cache for guest L’s instance be flushed, so that it has to re-read the data as it continues operation?

    • SN – when you say “Due to some memory pressure guest L had to be vmotioned to Host B” – let’s examine that statement. If the VMware *host* is under memory pressure, odds are it’ll see that coming from quite a ways away, and it’ll use DRS to proactively vMotion guests around to balance load. You shouldn’t see this happening very frequently unless you’re dramatically starting up lots of guests and stopping them all the time – something I don’t usually see – and even then, you just want to do a better job of picking which hosts you’re starting them on.

      Lemme zoom back a little and ask, what are the symptoms you’re seeing? Describe the practical situation rather than a theoretical one.
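
      If you want to see what the guest’s buffer cache is doing around a migration window, here’s a rough sketch – it counts cached pages per database, and fair warning, it can run slowly on big-memory servers:

          -- Rough sketch: how much of each database sits in the buffer pool right now.
          SELECT DB_NAME(database_id) AS database_name,
                 COUNT(*) * 8 / 1024 AS cached_mb
          FROM sys.dm_os_buffer_descriptors
          GROUP BY database_id
          ORDER BY cached_mb DESC;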

      • Thank you for responding, Brent!

        We are in the process of revisiting our infrastructure. Several times in the past when we had performance issues on the SQL instance (guest), we traced them back to vMotion. Hence my question: “will the buffer cache get flushed when the guest is vMotioned to another ESX host?”

        We had the same thing happening for our Oracle guests too. I am not sure what the reason was, but there was a LOT of vMotioning happening. We found that, reduced the DRS setting level so that vMotion doesn’t happen frequently, and now the guests are performing ~15% better.

  • Does anyone have any advice or guidelines for what percentage of usage is a good time to add CPUs to a SQL 2008 R2 VM? My sysadmin says 75%.

  • It seems like on a SQL Server you want to perform as fast as possible, it is hard with VMware to get TempDB put on a very fast device. It seems no one even wants to discuss it, let alone do it, so… nothing. They’re also stingy on memory. What do you find with this? What do you do for a fast TempDB solution with SQL Server on VMware?

    • Ron – generally speaking, if your primary bottleneck is TempDB IO speed, then it’s usually time to take a step back and see what the VM’s overall growth plans and development plans are. That’s a point where it often makes sense to go physical.

      If you’re just trying to be proactive and go as fast as you can inside VMware, then just stick with putting your data on the SAN. Keep VMotion easy and cheap.
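
      If the goal is just to put TempDB on the fastest volume you’ve got, the move itself is simple – a minimal sketch, assuming the default logical file names (the T: path is made up, and the change takes effect at the next service restart):

          ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, FILENAME = 'T:\TempDB\tempdb.mdf');
          ALTER DATABASE tempdb MODIFY FILE (NAME = templog, FILENAME = 'T:\TempDB\templog.ldf');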

      • I have a SQL Server being used to convert a lot of data from different geographical locations for going to SAP. For many things virtualization can be fine, but when you get to a certain point, a rack-mount physical is better. Only getting part of a half-height blade in an enclosure for a VM is challenging for high performance. And as I am sure you are aware, the configuration of the I/O for the enclosure can vary a lot and provide tuning challenges also. As you pointed out in your video, memory does help a lot – if you can get someone to provision it! That is another challenge of virtualization: having to spend time going back and forth negotiating and proving and communicating. It can be very time-consuming, rack up a lot of hours for everyone, and impact timelines severely. Anyway, it seems that in this case a physical may have been nice to work with.

  • Lazaro Fernandez
    July 28, 2016 12:41 pm

    Brent, is this article still valid today? Any changes in your recommendations at this time?

    Thanks for the great work.

  • Michael Mancusi
    January 31, 2017 9:16 pm

    Hey, now that it’s 2017, would you change your opinion?

  • All our customers use SQL on VMWare running on ESX hosts, and we are now migrating to AWS EC2 servers. No issues. I think your article is outdated.

    • OK, cool – so did you follow the section with recommendations? For example, have you identified SQL Servers actively using multipathing on 2 or more HBAs?

      If that answer is no, then don’t worry – your customers never COULD get enough storage throughput anyway, because you didn’t have active/active multipathing set up, so they’re going to get the same performance on EC2 that they got on VMware.

      And hopefully (for you) they don’t benchmark their systems with any of your competitors. 😉

  • This is year 2023 and this article is very old. And still, it is not outdated.
    I was the designer and pusher to go virtual. My only condition was to pin each cluster node (WFCI) to a host, and that no virtual movement (vMotion in VMware) should be applied to the active node. Last night, it was done to the active node…
    I was lucky I had to deal with only 4 corrupted DBs out of 100s.
    Please, think it through and be vigilant when you are designing your environment.
    Friendly DBA,
    Vlad

  • Todd Pettijohn
    August 7, 2023 11:32 pm

    Our shop recently moved to Nutanix. I have noticed a performance change. Or it may be my imagination. Has anyone else moved from VMWare to Nutanix and what was your experience?

  • Going through this blog post, I wonder if these considerations about keeping SQL Server on a physical server vs. a VM are still as true today.

