What Do You Wanna Know About Storage?

The Great Hardware Robbery of 2016

During Dell DBA Days 2016, we got to hang out with Doug Bernhardt of Dell’s storage team. He’s the guy who publishes Dell’s SQL Server documentation, builds their Fast Track reference architectures, and gets to play with all kinds of cool toys.

Doug emailed us and asked:

Is there any solutions type of info (or any info in general) on either Dell or EMC SAN and SQL Server that you think would be helpful to you, your customers, or the SQL community in general?  Anything that would help make better decisions about architecture, design, purchasing, or optimization of any of the Dell EMC gear?

And I figured there’s only one person I should ask – and that’s you, dear reader. So what documentation on storage would you like to see?

Previous Post
When Always On Isn’t: Handling Outages in Your Application
Next Post
High Compiles and Multi-Statement Table Valued Functions

35 Comments. Leave new

  • I-think-Brent-Ozar-Is-A-National-Treasure
    January 25, 2017 11:57 am

    If the hardware vendor could provide a dashboard of sorts that would allow the SAN Admin and the Database Admin to agree, “Yeah, we have a storage problem.” That would be most helpful. I am watching that fight right now, no gloves, not pretty.

  • Denise Crabtree
    January 25, 2017 12:02 pm

    Measuring/finding latency. If I run perfmon and see latency in the read/sec or write/sec, what are some ways to check similar counter from the storage side. Acknowledging that there are usually layers between: vm and network. Any insight into evaluating the entire picture would be helpful.

  • Along the same lines as Mr. Treasure. If I think I have a storage issue, what information should I be collecting and how should I present it to my storage team?

  • Stephen Archbold
    January 25, 2017 12:55 pm

    Apart from quickstart/common troubleshooting guides, I’d like to see something formal around discrepancies between what Windows/SQL sees, and what the storage sees.

    My use case here is that we would see 200ms latency on the OS, and 20ms on the LUN. It was a virtualized environment, using Paravirtual adapter etc.

    The hidden component here was queue depth. It was 200+ in Windows, but only 64 on the LUN. VMWare has a default value of a queue depth of 64 for guests, which was throttling our workload.

    We got to the bottom of that one, but some troubleshooting when there are layers between your workload and the drives would also be really useful (Filter drivers, virtualization, MDS switches etc.)

    • This is great, and I think I might be in the same situation. Perfmon shows >200ms latency spikes on our SQL boxes (ESXi 6 VMs) but our SAN admin says things look fine from his end. How did you know (and prove) your storage adapter queue depth was overwhelmed? Is there a way to tell what the controller’s QD is from the guest OS?

      • I’m in this camp too. Figuring out whether the problem is in the SAN (Dell), the server NIC/drivers (Dell), VMWare, or the network in the middle somewhere (Some Dell, some other vendors) seems very difficult without specific guidance.

        My favorites are intermittent 833 errors (“SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete…”)

  • Back In the bad old days when I was an Oracle DBA we spent a lot of time arguing about raid levels and spreading files over magnetic Frisbees. These days I’m quite removed from the hardware. We run VM’s on VMware and storage is on a SAN which I have never ever seen let alone touched. How much does Raid and disk arrangement really matter to SQL these days? Windows just sees it as a volume. Should I just leave it to IT and treat it as a black box? Or is there scope for me to make significant improvements through tuning. configuration.

  • Running multi-Tbyte databases in a VM is a challenge. I wish Dell/EMC would publish something I can use when I talk to the SAN Admin that will convince them that provisioning me one, “the SAN’s really fast” 20TB partition is not really a solution. Either that or show me why it is a good solution.

  • From the architecture perspective:
    I’m the virtualization\storage guy, so I can give you my perspective.
    If your systems aren’t architected correctly, then you will have performance issues. We are in the middle of re-architecting our SQL VMs, not due to horrible I/O or complaints but due to “it could be better” and learning as we all adapt to SQL in a virtual environment.
    If DBAs say they see latency on the DB pointing to a disk, then it’s our job to make sure the systems are architected correctly and minimize the latency not eliminate it. The DBAs just see the numbers, it is our job to find an explanation and work together to find out where compromises can be made.
    Here are some resources.
    SQL on VMware.
    Figure 6.16, 6.17 and 6.18 in the Pearson link below is the best example we have put together and have been building our architecture based on these. Not strictly on these designs because every environment is different, especially with server side caching (vFRC) on the data files and temp files, our I/O testing yielded far better results.
    Also there is a great video from vmworld 2015 on the SQL virtualized topic. At 18:02 there is a slide that says “Strict Best Practice” then a few minutes later there is a “Realistic Disk layout”. This is the compromise I am referring to, the design and speeds will all be based on your design, environment factors, and frankly compromises.
    Keep in mind not every SQL server in your environment has to be architected exactly the same, again compromises based on workload, environment and feasibility.
    Let me know if you would like to see my own diagrams of the SQL servers we are working on.

  • It would definitely be worth doing a series on the differences between new SSD all-flash SAN’s and old skinny types.
    A lot of the old rules about different raid types and separating log and data go out the window. We use all flash DELL Compellents and the rules go out the window with 1-3ms latency at 100000 64k IOPS.

    Snapshot and de duplication at the SAN level are also really interesting and evolving with solid state.

  • This feels like asking what I don’t know. It depends on the scope that you want to write for, But start with a general over view of what you need to consider when scoping/provisioning/ordering a system and then start zooming in from there.

    Others on this thread obviously have more experience with this than I, and may find the first one or two parts to be old rope, but to others up to date information on storage options and setup, shriven of the usual big brand marketing speak that gets in the way of comprehension, would be very valuable.

  • The DW reference architectures that Dell publishes are great, but I’d like to see ’em for OLTP, too. “Best OLTP box you can buy from Dell for $10k and under 1Tb”, best for $20k & 4Tb, etc.

    • Doug Bernhardt
      August 6, 2019 10:58 am

      I like your thinking. the DWFT program is a Microsoft program. I’ve been trying to get them to do an OLTP one, but they haven’t been willing to fund it.

  • Patrick Glennon
    January 27, 2017 11:30 am

    Long time listener, first time caller. Would be interested to hear about SQL performance on hyperconverged platforms like VxRail / VSAN and tuning strategies where virtualised storage is making many of the decisions for you in terms of block optimisation, hotspotting, etc

  • There’s lots of great sounding suggestions in here ( and several that I don’t even understand). It would be really cool if we could up vote/like the suggestions like on FB or SE,

  • Thanks for all the great feedback so far. I am listening! Look for follow-up blog posts, webinars, or white papers to address these comments in the near future.

    • Matthew Reed
      August 6, 2019 9:56 am

      Hi Doug,
      I did some digging on here, the Dell site, & Google for the follow-ups & I may have missed them, but I haven’t hit on anything yet. Where would be the best place to start looking?

      Thanks for your time!

  • Let the storage team tell us the real story. There should be AUTO logging of storage related issues we should be able to get for any database, by hour, day, weekly basis. Make is easily available . We don’t have to HUNT for this info. Hate watching perfmon to find storage issues.

    This way, you get to know what you have built is helping the customer or not.

  • Hello Brent, let’s say that SQL will have a high I/O > 500 IOPS, what do you think could be the best option ? – attached SAN storage or SSD?

  • ok, Sorry for bother, but how can I justify that fact ? what tests do you advise me to do ?, in thise case , supos we are talking about a new db design and I dont have servers etc..


Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.