What Do You Wanna Know About Storage?

Last Updated 7 years ago

During Dell DBA Days 2016, we got to hang out with Doug Bernhardt of Dell’s storage team. He’s the guy who publishes Dell’s SQL Server documentation, builds their Fast Track reference architectures, and gets to play with all kinds of cool toys.

Doug emailed us and asked:

Is there any solutions type of info (or any info in general) on either Dell or EMC SAN and SQL Server that you think would be helpful to you, your customers, or the SQL community in general? Anything that would help make better decisions about architecture, design, purchasing, or optimization of any of the Dell EMC gear?

And I figured there’s only one person I should ask – and that’s you, dear reader. So what documentation on storage would you like to see?

When Always On Isn’t: Handling Outages in Your Application

High Compiles and Multi-Statement Table Valued Functions

35 Comments. Leave new

I-think-Brent-Ozar-Is-A-National-Treasure
January 25, 2017 11:57 am

If the hardware vendor could provide a dashboard of sorts that would allow the SAN Admin and the Database Admin to agree, “Yeah, we have a storage problem.” That would be most helpful. I am watching that fight right now, no gloves, not pretty.

Reply
- Wes Crockett
  January 25, 2017 4:50 pm
  
  This x 1000.
  
  Reply
  - Aniruddha Shahasane
    January 27, 2017 10:41 pm
    
    Second that!
    
    Reply
Denise Crabtree
January 25, 2017 12:02 pm

Measuring/finding latency. If I run perfmon and see latency in the read/sec or write/sec, what are some ways to check similar counter from the storage side. Acknowledging that there are usually layers between: vm and network. Any insight into evaluating the entire picture would be helpful.

Reply
Kenneth Fisher
January 25, 2017 12:49 pm

Along the same lines as Mr. Treasure. If I think I have a storage issue, what information should I be collecting and how should I present it to my storage team?

Reply
Stephen Archbold
January 25, 2017 12:55 pm

Apart from quickstart/common troubleshooting guides, I’d like to see something formal around discrepancies between what Windows/SQL sees, and what the storage sees.

My use case here is that we would see 200ms latency on the OS, and 20ms on the LUN. It was a virtualized environment, using Paravirtual adapter etc.

The hidden component here was queue depth. It was 200+ in Windows, but only 64 on the LUN. VMWare has a default value of a queue depth of 64 for guests, which was throttling our workload.

We got to the bottom of that one, but some troubleshooting when there are layers between your workload and the drives would also be really useful (Filter drivers, virtualization, MDS switches etc.)

Reply
- Nate Arnold
  January 26, 2017 9:39 am
  
  This is great, and I think I might be in the same situation. Perfmon shows >200ms latency spikes on our SQL boxes (ESXi 6 VMs) but our SAN admin says things look fine from his end. How did you know (and prove) your storage adapter queue depth was overwhelmed? Is there a way to tell what the controller’s QD is from the guest OS?
  
  Reply
  - Mark Freeman
    January 26, 2017 7:13 pm
    
    I’m in this camp too. Figuring out whether the problem is in the SAN (Dell), the server NIC/drivers (Dell), VMWare, or the network in the middle somewhere (Some Dell, some other vendors) seems very difficult without specific guidance.
    
    My favorites are intermittent 833 errors (“SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete…”)
    
    Reply
    - Anthony
      January 30, 2017 8:46 am
      
      Agreed. I see this error constantly, san admins refuse to acknowledge the issue. Would be nice to have.
      
      Reply
    - Ray Herring
      January 30, 2017 11:19 am
      
      Second that 🙂
      
      Reply
Pete
January 25, 2017 1:57 pm

Back In the bad old days when I was an Oracle DBA we spent a lot of time arguing about raid levels and spreading files over magnetic Frisbees. These days I’m quite removed from the hardware. We run VM’s on VMware and storage is on a SAN which I have never ever seen let alone touched. How much does Raid and disk arrangement really matter to SQL these days? Windows just sees it as a volume. Should I just leave it to IT and treat it as a black box? Or is there scope for me to make significant improvements through tuning. configuration.

Reply
Ray Herring
January 25, 2017 3:44 pm

Running multi-Tbyte databases in a VM is a challenge. I wish Dell/EMC would publish something I can use when I talk to the SAN Admin that will convince them that provisioning me one, “the SAN’s really fast” 20TB partition is not really a solution. Either that or show me why it is a good solution.

Reply
Tony
January 25, 2017 4:52 pm

From the architecture perspective:
I’m the virtualization\storage guy, so I can give you my perspective.
If your systems aren’t architected correctly, then you will have performance issues. We are in the middle of re-architecting our SQL VMs, not due to horrible I/O or complaints but due to “it could be better” and learning as we all adapt to SQL in a virtual environment.
If DBAs say they see latency on the DB pointing to a disk, then it’s our job to make sure the systems are architected correctly and minimize the latency not eliminate it. The DBAs just see the numbers, it is our job to find an explanation and work together to find out where compromises can be made.
Here are some resources.
SQL on VMware.
http://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/solutions/sql-server-on-vmware-best-practices-guide.pdf
Figure 6.16, 6.17 and 6.18 in the Pearson link below is the best example we have put together and have been building our architecture based on these. Not strictly on these designs because every environment is different, especially with server side caching (vFRC) on the data files and temp files, our I/O testing yielded far better results.
http://www.pearsonitcertification.com/articles/article.aspx?p=2240989&seqNum=3
vFRC
http://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/techpaper/vmware-vfrc-performance-vsphere55-white-paper.pdf
Also there is a great video from vmworld 2015 on the SQL virtualized topic. At 18:02 there is a slide that says “Strict Best Practice” then a few minutes later there is a “Realistic Disk layout”. This is the compromise I am referring to, the design and speeds will all be based on your design, environment factors, and frankly compromises.
https://www.youtube.com/watch?v=yqJXA1aL4YI
Keep in mind not every SQL server in your environment has to be architected exactly the same, again compromises based on workload, environment and feasibility.
Let me know if you would like to see my own diagrams of the SQL servers we are working on.

Reply
Ed K
January 25, 2017 5:42 pm

It would definitely be worth doing a series on the differences between new SSD all-flash SAN’s and old skinny types.
A lot of the old rules about different raid types and separating log and data go out the window. We use all flash DELL Compellents and the rules go out the window with 1-3ms latency at 100000 64k IOPS.

Snapshot and de duplication at the SAN level are also really interesting and evolving with solid state.

Reply
AlexG
January 26, 2017 3:08 am

This feels like asking what I don’t know. It depends on the scope that you want to write for, But start with a general over view of what you need to consider when scoping/provisioning/ordering a system and then start zooming in from there.

Others on this thread obviously have more experience with this than I, and may find the first one or two parts to be old rope, but to others up to date information on storage options and setup, shriven of the usual big brand marketing speak that gets in the way of comprehension, would be very valuable.

Reply
Ken
January 27, 2017 9:39 am

The DW reference architectures that Dell publishes are great, but I’d like to see ’em for OLTP, too. “Best OLTP box you can buy from Dell for $10k and under 1Tb”, best for $20k & 4Tb, etc.

Reply
- Doug Bernhardt
  August 6, 2019 10:58 am
  
  I like your thinking. the DWFT program is a Microsoft program. I’ve been trying to get them to do an OLTP one, but they haven’t been willing to fund it.
  
  Reply
Patrick Glennon
January 27, 2017 11:30 am

Long time listener, first time caller. Would be interested to hear about SQL performance on hyperconverged platforms like VxRail / VSAN and tuning strategies where virtualised storage is making many of the decisions for you in terms of block optimisation, hotspotting, etc

Reply
- Brent Ozar
  January 27, 2017 11:31 am
  
  Howdy sir! Good to see you here. I happened to be at my email when this came in and went, “Heeeeeey, I know that guy….”
  
  Reply
  - Patrick Glennon
    January 27, 2017 1:36 pm
    
    Hah! Good to see you hear also ;). If you come to town at some point let’s grab Charaka and get some beers.
    
    Reply
Pete
January 29, 2017 7:24 pm

There’s lots of great sounding suggestions in here ( and several that I don’t even understand). It would be really cool if we could up vote/like the suggestions like on FB or SE,

Reply
- Brent Ozar
  January 30, 2017 6:25 am
  
  Pete – come on, now. It’s our blog. All of the comments are likable. 😉
  
  Reply
  - Pete
    January 30, 2017 1:22 pm
    
    touché 😉
    
    Reply
  - A diffferent Pete
    September 19, 2017 4:31 pm
    
    I wanted to “like that”… but couldn’t…
    
    Reply
    - Actaully the same Pete
      September 19, 2017 4:32 pm
      
      Oh no wait.. that was me..
      
      Reply
Doug Bernhardt
January 30, 2017 3:22 pm

Thanks for all the great feedback so far. I am listening! Look for follow-up blog posts, webinars, or white papers to address these comments in the near future.

Reply
- Matthew Reed
  August 6, 2019 9:56 am
  
  Hi Doug,
  I did some digging on here, the Dell site, & Google for the follow-ups & I may have missed them, but I haven’t hit on anything yet. Where would be the best place to start looking?
  
  Thanks for your time!
  
  Reply
  - Doug Bernhardt
    August 6, 2019 10:55 am
    
    There wasn’t much follow-up because I didn’t receive many questions or requests. Feel free to contact me directly for any questions. Doug.Bernhardt@dell.com
    
    Reply
Ranga N
January 30, 2017 3:27 pm

Let the storage team tell us the real story. There should be AUTO logging of storage related issues we should be able to get for any database, by hour, day, weekly basis. Make is easily available . We don’t have to HUNT for this info. Hate watching perfmon to find storage issues.

This way, you get to know what you have built is helping the customer or not.

Reply
Alex Rodríguez
September 19, 2017 9:14 am

Hello Brent, let’s say that SQL will have a high I/O > 500 IOPS, what do you think could be the best option ? – attached SAN storage or SSD?

Reply
- Brent Ozar
  September 19, 2017 9:22 am
  
  My laptop does over 500 IOPS. It does over 5000 for that matter.
  
  Reply
  - Alex Rodríguez
    September 19, 2017 9:30 am
    
    So, basically we can say that when we talk about 500 IOPS SSD is not needed at all, we are
    killing flies with cannonballs … ?
    
    Reply
    - Brent Ozar
      September 19, 2017 10:14 am
      
      Yep.
      
      Reply
Alex Rodríguez
September 19, 2017 12:25 pm

ok, Sorry for bother, but how can I justify that fact ? what tests do you advise me to do ?, in thise case , supos we are talking about a new db design and I dont have servers etc..

Reply
- Brent Ozar
  September 19, 2017 2:02 pm
  
  Alex – unfortunately, personal system design is outside of the scope of what we can do fast in a blog post comment. If you’d like to hire us for consulting, click Consulting at the top of the page, or head to https://dba.stackexchange.com if you’d like free advice from the public. Thanks!
  
  Reply