Blog

The most successful deployments of virtual SQL Servers have a few things in common.  When a company is doing all of these things, odds are they’re going to be very happy with their virtual SQL Servers:

10. Use vSphere 4.1, not earlier versions. vSphere 4.1 introduced major improvements in multi-CPU scheduling that vastly improve performance.  SQL Server will break queries out across multiple virtual CPUs, and it expects all of those vCPUs to perform identically.  vSphere 4.1 helps make that happen.  You can read more about the vSphere CPU scheduler here.

Lacie IAmAKey Drive9. When using blades, avoid overcommitting switch uplinks. Virtualization works great with blades – but only when the SAN and network teams monitor utilization rates between the blade chassis switches and the rest of the networks.  All too frequently, the SQL Server is thrown into a blade chassis with dozens of other servers all competing for the same small amount of storage and network bandwidth leaving the blade chassis.  Everyone looks at their monitoring systems and can’t understand why the backups, big queries, and maintenance jobs run so slow, and the culprit is the hidden small uplink.

8. Avoid 1Gb iSCSI for storage. While this cheap and easy-to-use storage works great for most servers, it doesn’t work as well for SQL Server.  If you’re happy with the speed of SQL Server running on local storage, you can be in for a rude surprise with 1Gb iSCSI.

7. Test storage performance before deployment. I’m partial to SQLIO, which might be the worst-named tool in history.  It has absolutely nothing to do with SQL Server – it doesn’t require SQL Server, and it doesn’t mimic SQL Server.  It just flat out hammers IO using whatever parameters you pass in.  Using a test file of the same size as the expected database, the storage should consistently exceed 100MB/sec whether it’s random or sequential, reads or writes.  Higher numbers than that may be required due to the application design or end user expectations, but that’s a good foot-in-the-door number.

6. Compensate for slow storage with memory. If you can’t get the storage performance you need, you can help by caching as much of the database as possible in memory.  If the projected database size is 50GB but the virtual server only has 16GB of memory, then it’s not going to be able to cache the entire database.  Perhaps the users won’t be querying old data, in which case you might be able to get by with less.

5. Ensure CPU power saving is turned off. While Intel’s latest Xeon processors provide impressive power savings, they won’t ramp up to full processor speed unless the CPU is under heavy load.  SQL Server will rarely push the processors that hard, which means they stay slow – sounds good in theory, but in reality, every query takes 70-100% longer than it did pre-virtualization.  If you care about query performance, turn this setting off in the hardware BIOS.  You can read more about the power-saving CPU issue here.

4. Coordinate VMware reservations and shares with SQL Server settings. VMware vSphere has a great set of options to ensure that a guest OS gets the right amount of resources for its needs.  These settings need to be set in concert with SQL Server’s min and max memory settings.  There’s plenty of bad advice out on the web saying things like, “Just disable the balloon driver and give SQL Server all the memory” – that’s not right either.  There’s a happy medium in letting SQL Server and VMware cooperate to balance resources across multiple guests, but it only works when these settings are aware of each other.

3. Use Resource Pools to track SQL Server licensing. Microsoft’s virtualization licensing for SQL Server is notoriously complex.  Starting with SQL Server 2008R2, only Datacenter Edition ($60k/cpu socket) provides unlimited virtualization rights.  Enterprise Edition ($30k/cpu socket) provides just four VMs.  Tracking these closely with VMware Resource Pools can result in huge cost savings.  If SQL Servers are allowed to move to any host in the VMware cluster, then a licensing audit can produce staggering costs.

2. Use VMware HA for high availability. If your users can tolerate a SQL Server outage of 2-3 minutes, VMware HA is much easier to manage than a SQL Server cluster.  If your users require less than 30 seconds of downtime, consider implementing a physical SQL Server cluster instead.  SQL Server clusters are tricky enough to manage on their own, and doing them inside VMware adds an impractical level of management on servers that can’t be down for more than 30 seconds.

1. Virtualize small SQL Servers first. Start by gaining experience with 1-2 vCPU servers with under 16GB of memory.  As the company’s sysadmins grow accustomed to how SQL Server uses CPU and memory in a virtual environment, they’ll be more confident virtualizing larger servers.  If the DBAs and sysadmins don’t get along when trying to pin down performance problems on smaller servers, they’re going to be very adversarial when dealing with larger servers.

Wow – That’s a Demanding List!

I know what you’re thinking: “This outside consultant is greedy and expects everybody to dump tons of money into their SQL Servers.  He’s asking for the moon.  Nobody does this in real life.”

Let’s rewind back to the beginning of the recommendations where I said, “When a company is doing all of these things, odds are they’re going to be very happy with their virtual SQL Servers.” You can certainly skimp on some of these items and still stand a pretty good chance of being happy with your SQL Server performance.

The more items you skimp on, the worse your chances become.  If you implement a virtual SQL Server with 32GB of memory trying to cache 250GB of databases on RAID 5 storage, hooked up via 1GB iSCSI, with no prior experience virtualizing SQL Servers of that size, odds are you’re going to be miserable.  Users will scream and complain, and you’ll bring in an outsider who will track the problems back to these types of recommendations.  Keep my contact info handy.

Our Training Videos: VMware, SANs, and Hardware for SQL Server DBAs – our 5-hour training video series explaining how to buy the right hardware, configure it, and set up SQL Server for the best performance and reliability. Here’s a preview:

Buy it now.

↑ Back to top
  1. Good article Brent.
    Some really clever thinking in there.
    I have one query. We looked at 10GB vs 1GB iSCSI there is a considerable cost difference. With VMWare 4.1 we were able to use the multipathing module to get up to 4 GB connectivity to our storage. Running SQLIO against this we were able to satisfy ourselves that it would meet our performance requirements, we now have it in production and are quite happy with it, the perfmon counters back this up! The only issue is all the extra cables in server room!

    • Tony – Thanks! I’m a little confused though – you said you had a query, but is there a question in here for me?

      • Brent,
        I was just wondering if we overlooked something. Is there a particular reason you recommend avoiding 1GB iSCSI, with our configuration we get 4GB throughput, how does this stack up against 4GB FC, in your experience?
        Tony

        • Tony – ah, gotcha. You’re most likely not getting 4GB read and write throughput to just one SQL Server data file unless you’re doing connection bonding at the switch level, and if you’re doing that, all 4 ports are going through one switch – which means you have no switch failover capabilities.

          • Thanks Brent,
            I am going to put a post together with my analysis and configuration.
            Tony

          • Brent,

            Some of today’s switches, namely the Cisco Nexus models, allow you to aggregate ports across multiple physical switches. May be a good note for the toolbox.

            Also, many folks tend to think that multiple aggregated links equal the same as one link of the same speed (i.e. 10 x 1Gb bonded vs 1 x 10Gb port). That’s frankly not true, as there’s overhead in managing the bond and you will find that the load balancing algorithm may only put one connection on one of the physical links. It’s always better to go with the faster link speed, if the budget will support it.

            Tom

          • Tom – thanks for the heads-up, but I haven’t seen anyone successfully managing port aggregation in the wild. It’s one of those things that yes, the hard-core guys can do it, but the rest of us (well, “them”, hahaha) don’t go to that level of detail.

  2. I’ve got a question about #7: “Test storage performance before deployment”.
    My guess is that you frequently look at systems *after* deployment. Are there ways (some perf counter over time maybe?) to look at performance of the disk subsystems after deployment?

    • Michael – unfortunately, no, there’s not a good way to see after deployment other than monitoring how much throughput you’re getting with SQL Server, but then, it can be a SQL problem instead of a SAN problem.

  3. Nice list Brent but quick comment on #7. If you’re going to run SQLIO, be very careful ESPECIALLY if that LUN is being shared by other VM’s. Your tests could potentially drag down a ton of other VMs not at all related to your application and you’re going to get a flood of calls with “my application is running like crap!”.

  4. “Avoid 1Gb iSCSI for storage”… is 10 GB iSCSI the way to go instead?

  5. Another great VM article. You do a wonderful job of pulling back the curtain on VM technology. This is a world many DBAs in large organization rarely get to see and have little control over. It is especially difficult to find a one-shop stop for all the best practices for virtualizing SQL Server. Thanks for your contributions!

  6. Brent, this is a great list! Thanks for taking the time to compile it.

    One comment regarding #3 (using resource pools to track licensing): I might be misunderstanding you here, but if you want to control which physical systems run SQL Server for the purposes of controlling license costs, resource pools are not the right tool. Instead, look at DRS host affinity rules, which will allow you to define rules to constrain the SQL Server VMs to a subset of ESX/ESXi hosts.

    Otherwise, great work!

  7. Content like this is knowledge gold to the DBA looking to get their learn on or has been thrown in the deep in end with Virtualization.

    I’ve got a lot of projects involving virtualization coming my way and there’s info in here that’s new to me right now. If it’s new to the Server Admins too then we’re in trouble :-)

    Thanks for sharing!

  8. Thanks for a great checklist Brent! I was at a SQL Server on Vmware session at EMC World and the presenter mentioned using the “Lock pages in memory” permission for the SQL service user. Would you suggest this for 64bit systems as well?

  9. Brent, thank you for this awesome checklist.
    I’m a dba and virtualized servers & db servers were “pushed” down our throats, and since then, performance has been slow as heck.
    At this stage we are incorrectly set to #10,9,8,7,6 & 4 as starters, even #1.
    Now I finally have something I can go to our vm/sys admin about and say, hey you have these set wrong!!!!!
    THANK YOU!
    He always thought we were being whiny dba’s!, But I KNEW I was right and our infrastructure wasn’t built properly for all that was virtualized.

  10. Great article. One other point that we have found important when migrating an existing database to vmware is to really baseline the performance of the existing server using perfmon with specific scenerios and even over specific length of time. When a user comes in and says the app seems slower. Having a target (real numbers) for acceptable performance makes fixing performance issues much more straight-forward. Using your suggestion above becomes more about picking one trying it out and checking to see if fixes the performance issue. For us it was #5, 70%-100% performance hit is very very true.

  11. Pingback: Something for the Weekend – SQL Server Links 03/06/11

  12. Awesome article and just in time as we are starting a plan to virtualize most of our sql servers when we upgrade from 2005 to 2008R2.

  13. Thank you for this list, much appreciated!! Can I get a little more direction on #4 “These settings need to be set in concert with SQL Server’s min and max memory settings … but it only works when these settings are aware of each other.” Do you really mean “aware” or do you mean that the settings are manually configured to same values?

    • Eve – they have to be manually configured in concert with each other – definitely not the same, though. SQL Server’s memory settings are only the amount of memory SQL uses by itself, not including the OS. It requires some SQL expertise to set these correctly in relation to the VMware reservation.

  14. Great. Now I have to research our current 1GB iSCSI to see where we stand and how we could get to 10GB iSCSI. If anyone has any research they’ve done or good links feel free to send them to me by email: arsqldba at gmail.

  15. Great timing, we are just looking at virtualizing our first SQL Server, so this will help.

    The first thing I asked about was storage I/O and they told me the blade chassis has 2) 10 Gb connections shared across 4 blades. Given that the SQL Server in question is currently generating up to 11 Gb of DAS storage I/O, I don’t think things are going to go well with it sharing 20 Gb with 20-50 other virtual servers.

    • Michael – wow, how are you generating 11Gb of DAS storage IO? That’s enough to saturate not one but two of the latest generation 6Gb SAS controllers. What kind of server is that, and what kind of storage are you using?

      • This is an old HP ProLiant ML570G4, and as things went we eliminated bottlenecks. The current bottleneck is the CPUs, and there is room to add 2. (We were just about to replace the server when they decided they wanted to go virtual.)

        The storage is setup this way:

        * P400 SmartArray controller with 512MB cache hooked to 18) 3Gb 10k SAS drives
        * P800 SmartArray controller with 512MB cache hooked to 9) 3Gb 15k SAS drives and 3) 1.5Gb 7.2k SATA drives
        * P812 SmartArray controller with 1GB cache hooked to 12) 3Gb 15k SAS drives

        Each controller is connected to the drive cage with a SAS 4x cable, 4) 3Gb physical links, and is capable of up to 1.2GB/sec of throughput if the drives can handle it. (9.6 Gb/sec if my calculations are correct.)

        The 11Gb measurement is during actual production load; if I run SQLIO on all of the drives at the same time I come to an aggregate of just over 2GB/sec of sequential reads which I calculate as 16Gb/sec. (The current storage bottlneck is the old PCIe bus in the server and the CPUs.

        The P812 controller is 6Gb SAS and is capable of 2.4GB/sec or about 19Gb/sec if we had gotten a 6Gb cage and drives. (And the PCIe bus could handle it.) Are you looking at SAS controllers that only have one 6Gb physical link?

        Needless to say I was shocked that you mentioned 100MB/sec as a minimum target for storage. (Our random writes drop to as low as 50 MB/sec on one of the arrays, but it is used for almost exclusively reads.)

        • Michael – Yep, absolutely, if you’re using three dedicated RAID controllers in a single server, you shouldn’t virtualize that box on anything less than 10Gb iSCSI.

          Also, just as a sanity check – you are WAY outside the norm there. Very, very, *very* few folks hook up multiple DAS RAID controllers in their servers. Usually when a server is that performance-critical, they’ve moved on to SAN storage.

          • Yeah, when talking to an HP support rep once he mentioned that “we sure have that server tricked out.”

            We only have 10 servers, all using DAS, and 2 of them have multiple RAID controllers. (The main SQL and the Exchange servers.)

            We looked at getting a SAN but the startup cost was out of this world. (Not to mention the FTEs and training to manage the SAN.)

            For just over $10k I can get a P812 controller, a drive cage that holds 12 drives, the SAS 4x cable, and 12) 450GB 3G 15k LFF SAS drives. For about $12.3k I can get the P812, a drive cage that holds 25 drives, the SAS 4x cable, and 25) 146GB 6G 15k SFF SAS drives. (Both come with three years of warranty/support.)

            How much would it cost to get 2.4GB/sec of throughput and 3.6TB of raw drive space from a SAN?

            Going back to my problem where they plan to “outsource” my server they currently have people complaining about the performance on a physical Oracle database server. (They are getting a maximum of 384MB/sec from the SAN with dual 4Gb HBAs.) Their answer to me was that they were moving away from the SAN and going to Network Appliances for the storage needs. I’m not getting any warm fuzzy feelings about this. (It will be way cheaper, but I suspect management will be very unhappy with the performance.)

        • Correction: the P812 is capable of 8GB/sec (4 GB in each direction) if it is hooked to at least 2) 6Gb SAS cages with drives fast enough to support the speed.

  16. Hi Brent, Regarding #5, you say to turn this off in the hardware BIOS. This may be a naive question, but the VM also has BIOS, right? I assume you’re referring to the BIOS of the ESX host or hosts that the SQL VM is living on?

  17. We are reviewing virtulizing our SQL 2005 server into VMWare 4.1 using a iSCSI 1gb SAN. Our databases are mostly under 1GB in size and the number of users hitting the database at the same time is usually under 10.

    #8 recommends not using iSCSI but under small loads would we notice much delay?

  18. Brent –

    I’m in the process of researching mirgrations to a VM environment and I keep getting led back to your list here. I have just two questions.

    1. HA – I currently support a few A/A clusters and haven’t had many issues with them and the downtime when failing over is usually less than 10 seconds. Is it that it takes that long to move the VM and spin it up on a different part of the VM cluster that takes the 2 – 3 minutes?

    2. I’ve researched and have seen several people on the ‘Net that have had numerous issues with IO performance. From your experience (just looking for general guidance), would you feel comfortable putting highly transactional OLTP enviornments on a shared ESX cluster?

    • Andrew:

      1. The 2-3 minutes is the time it takes to boot Windows from scratch and fully start SQL Server to the point where it can accept incoming queries. In active/active clustering, you can achieve sub-10 second failover when you manually fail nodes over, but it takes longer during real-life failures. SQL Server’s LooksAlive and IsAlive checks need to be run, for starters, and LooksAlive only runs every 10 seconds to begin with – and that’s before the failover process even starts. Be careful when you compare manual failover to a real emergency.

      2. I’ll answer this with another question: have you also found people on the net who have had numerous issues with IO performance on physical boxes?

      • Thank you Brent. You are right about the clustering. Our manual failovers take about 10 seconds. When we simulated power failure to one of the boxes, it took about 45 seconds for it to fully fail over, recover, and come back online.

        Also, with your question, you have answered mine.

  19. Great article..Quick question. How do you go about allocating CPUs to Virtual SQL Servers. I am in the process of Virtualizing some physical servers and VMWare’s best practices say dont allocate more CPUs than needed, but the servers originally have 2 quad core processors. I would like to start with 1 or 2 and add more as needed but that may not work for our DBAs. I know that the way VMWare works is if you assign a machine a processor the machine has access to all the cores of the physical processor however in Windows and in SQL only 1 processor is seen. Our DBA was curious about a setting called ” Degree of Parallelism” which he said may needed to be changed in SQL. Have you had to deal with this before.

  20. I would like to understand from you why iSCSI is not a good idea when using VMware?

    DO you also beleive iSCSI is not a good idea without Vmware for SQL?

    thanks

  21. Hi Brent

    This is good information, but with vsphere 5.1 and SQL 2012 AG’s, do you have an updated article that takes the new features into account? I’m looking at rolling out a new SQL enviroment with vsphere 5.1 and server 2012 – i’d be interested in hearing some real world experience with that. We’re still only using 1GB uplinks to the SAN storage (Equalogic) but we have 4 per ESXi host, so i’m hoping it won’t be so bad.

    Thanks. Loving the site so far

    • Steven – thanks, glad you like the site! Right now I don’t have an updated version of the post for SQL 2012 AGs, though. If you’re interested in consulting help, let me know. I’ve been too busy doing the implementations to blog much lately with the details – they’ve been too fluid with a lot of the recent changes.

  22. Pingback: Microsoft – SQLServer – Frenemy | Daniel Adeniji's – Learning in the Open

  23. Brent,
    Regarding item #9 in your listing, what would be the best way to monitor this? What tools and metrics would best cover uplink switches. We have both SAN and network teams, and I would like to approach both with a performance troubleshooting template that includes their view of performance for these vague issues where each team cannot find the problem. Lately, a lot of SQL VM performance troubleshooting is starting to resemble the old story about 3 blind men and an elephant…

    • Jeff – you actually can’t cover #9 from a SQL Server perspective. You have to use network monitoring tools for that, and whiteboard out the switch infrastructure so you know where the bottlenecks may be.

  24. Hello,

    I am just in the process of planning a SQL 2012 Cluster with 2 nodes in an active-active setup.
    I read some articles but most articles refer to the Enterprise Version for the AlwaysOn functionality.
    So if I want to use a simple 2 node cluster setup with SQL 2102, what do I have to keep in mind ? I have already planned for a FC SAN, that will also run some VMs from other hosts.
    Would it be wise to create a virtual SQL Cluster ? The background is that my developers want to make copies of the VMs to use them in testing environment, but I think that it is not a good idea, mainly because I don’t know if a snapshot of a complete cluster is possible in a consistent state.
    There are a lot of ressources, that a virtual cluster is possibel and works great, but I have my doubts. Unfortunately the bad sides of virtual clustering SQL are not mentioned often. Are the cons still valiud with Windows and SQL 2012 ( using VMware 5.1 ) ?
    Many thanks for your thoughts on that.

    Thomas

  25. Brent – great article as always. I found a few things we weren’t doing the best way. Since this article is 2 years old now, do you still stand by these as the top 10? Or are there different things to be concerned about with newer hypervisors and the upcoming release of SQL 2014? Keep up the good work!

  26. Hi Ozar
    I have question on the sizing :
    1)MSSQL 2008R2 64 bit and 2)number of Databases =250 and 250 users and 400 Transaction per DB /per Day
    how do i size the server a)Memory b) HardDisk
    Thank you in advance
    Day.

    • Hi, Day. Sizing a SQL Server is much trickier than we can address in a blog comment. If you’d like personalized consulting help, feel free to email us by clicking on the Contact link at the top of this site. Thanks!

  27. Brent, regarding your advice to start servers with 1 or 2 vCPUs, do you recommend any metrics to watch to tell if I have too many vCPUs assigned? For example, if I have 4 vCPUs, is it just a matter of watching total CPU usage, and if I’m well below 50% to lower it to 2 vCPUs to avoid CPU contention? Or do I need to dig deeper into CPU ready time, and similar counters? Thank you!

    • Andrew – unfortunately this kind of tuning advice is beyond something I can address in a blog comment – that’s why I’ve got the Storage, Hardware, and Virtualization training videos. Hope that helps!

    • Andrew there are few metrics in vmware that will tell you if you have over provisioned vCpus to a VM.

      Always start with single vcpu and work the way up. One of the important metric to watch is % RDY. If it’s going above value of 10 then it’s very likely you have over provisioned. I would also recommend having couple of SCSI contollers rather than just one for SQL vm’s to ensure there is no bottle neck. Hope this helps.

  28. Thanks Brant for another great article.

    ~Leon

  29. Hello Brent,

    we are in the process of SQL-VM configuration right now, and I just read your article. I can see that it is almost 3 years old, would you add something, change or emphasize even more in your text?

    Thanks.

  30. Dear Brent Ozar,
    I install Windows 2012 Datacenter + SqlServer 2008 R2 Enterprise ( + 20 vCPU, 128 GB RAM), VMware 5.5 iSCSI 10GB (Switch Force 10) Blade DELL + Squalogic PS6210XS (7 SSD 400 + 13 SAS 10K) Raid 6 (high performance).
    When you run the query the first time, I see the disk access throughput of up to 100MB / S, is ok, but .. when running the second time there is no disk access because everything is in HD SSD (CACHE) but the time implementation is similar … have no idea what it is?

    Thanks,
    Marcio Borghi

    • Marcio – I wish I could personally help everyone troubleshoot their issues in blog comments, but I just can’t do justice to this kind of detailed troubleshooting here. If you’d like personalized help, feel free to check out our SQL Critical Care at the top of the page.

  31. Hey Brent, I have another scenario for you. Do you mind if I upload the visio diagrams of our network?

    but no seriously, great article! will bookmark the site and check back often.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

css.php