Blitz Result: Slow Storage Reads or Writes

SQL Server feels the need – the need for speed.  Storage speed, to be more specific.  One of the most common bottlenecks is underperforming storage subsystems.

SQL Server tracks read and write speeds for each database file – both data and log files.  This part of our SQL Server sp_Blitz script checks sys.dm_io_virtual_file_stats looking for average read stalls (latency) over 200 milliseconds and average write stalls over 100 milliseconds. Yes, those thresholds are horrifically high – but that’s the point. We only want to alert you when the numbers are awful.

This script breaks it down by file:
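Here's a minimal sketch of that per-file breakdown, built directly on the same DMV (the exact query sp_Blitz links to may differ; the math is simply cumulative stall milliseconds divided by the number of I/Os since startup):

    -- A sketch of a per-file latency breakdown from sys.dm_io_virtual_file_stats.
    -- Average stall = cumulative stall milliseconds / number of I/Os since startup.
    SELECT  DB_NAME(fs.database_id) AS database_name ,
            mf.physical_name ,
            mf.type_desc ,  -- ROWS = data file, LOG = log file
            fs.num_of_reads ,
            fs.io_stall_read_ms  / ( 1.0 + fs.num_of_reads )  AS avg_read_stall_ms ,
            fs.num_of_writes ,
            fs.io_stall_write_ms / ( 1.0 + fs.num_of_writes ) AS avg_write_stall_ms
    FROM    sys.dm_io_virtual_file_stats(NULL, NULL) AS fs
    INNER JOIN sys.master_files AS mf
            ON fs.database_id = mf.database_id
            AND fs.[file_id] = mf.[file_id]
    ORDER BY avg_read_stall_ms DESC;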

What exactly is a bad number here?  That’s a really good question, and it’s open for debate.  The Microsoft SQL Customer Advisory Team, in point #4 of their Top 10 OLTP Issues, suggests that reads over 15ms are a bottleneck and that log file writes should take 1ms or less.  Keep in mind that this team usually deals with high-performance environments, though – not every SQL Server can afford to be quite this fast.
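If you'd rather measure yourself against the CAT numbers, here's a rough sketch along the same lines (only the 15ms and 1ms thresholds come from their guidance; everything else is my assumption):

    -- Rough sketch applying the CAT thresholds: data file reads over 15ms,
    -- log file writes over 1ms. Thresholds are from the CAT guidance above.
    SELECT  DB_NAME(fs.database_id) AS database_name ,
            mf.physical_name ,
            mf.type_desc ,
            fs.io_stall_read_ms  / ( 1.0 + fs.num_of_reads )  AS avg_read_stall_ms ,
            fs.io_stall_write_ms / ( 1.0 + fs.num_of_writes ) AS avg_write_stall_ms
    FROM    sys.dm_io_virtual_file_stats(NULL, NULL) AS fs
    INNER JOIN sys.master_files AS mf
            ON fs.database_id = mf.database_id
            AND fs.[file_id] = mf.[file_id]
    WHERE   ( mf.type_desc = 'ROWS'
              AND fs.io_stall_read_ms / ( 1.0 + fs.num_of_reads ) > 15 )
         OR ( mf.type_desc = 'LOG'
              AND fs.io_stall_write_ms / ( 1.0 + fs.num_of_writes ) > 1 );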

No matter whose guidelines you pick, I’ve got bad news: you’re already doing worse.

To Fix the Problem

There are two ways to fix slow storage: make the storage go faster, or ask the storage to do less work.

The second option – asking storage to do less work – is usually the easiest way to start.  By dropping unused indexes, we can insert/update/delete data faster, run backups faster, do DBCCs faster, and even do index rebuild jobs faster.  By using the right indexes, we can avoid storage-intensive table scans and do less data juggling in TempDB.
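As a starting point on the "do less work" route, here's a minimal sketch for spotting nonclustered indexes that take writes but haven't been read since the last restart. It assumes sys.dm_db_index_usage_stats has been accumulating long enough to be representative (it resets on restart), and it produces a candidate list, not a drop list:

    -- A sketch for spotting nonclustered indexes with no reads since the last
    -- restart. Usage stats reset on restart, so a short uptime makes this unreliable.
    SELECT  OBJECT_NAME(i.object_id) AS table_name ,
            i.name AS index_name ,
            us.user_updates  -- writes we paid for even though nobody read the index
    FROM    sys.indexes AS i
    LEFT JOIN sys.dm_db_index_usage_stats AS us
            ON us.object_id = i.object_id
            AND us.index_id = i.index_id
            AND us.database_id = DB_ID()
    WHERE   i.type_desc = 'NONCLUSTERED'
            AND i.is_primary_key = 0
            AND i.is_unique_constraint = 0
            AND OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
            AND ISNULL(us.user_seeks + us.user_scans + us.user_lookups, 0) = 0
    ORDER BY us.user_updates DESC;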

Return to sp_Blitz or Ask Us Questions

15 Comments.

  • First, I’d like to thank you for all the great work and tips you offer to guys like me who are just trying to learn. Your Blitz sp provided me with more information than I could have imagined and has helped me catch a few misconfigurations on some of our production databases. However, I have a question I’m hoping you can help me with, or point me in the right direction, regarding one of the results. One of the findings was Slow Storage Writes on Drive T, which is where my TempDB files are; your Details column says that for specific database file speeds, I should run the query from the information link. Following the URL you provide in the results, https://www.brentozar.com/blitz/slow-storage-reads-writes/?VersionNumber=1, which leads me to this page, I don’t see a query I can run to get more specific info. I ran the queries to find heap tables and to find missing or unused indexes, but I’m not sure if I’m missing another query.

  • How about this: (I ran into this and read over the sp code, and pulled this out…)

    -- From the sp_Blitz source: files whose average read stall exceeds 100ms,
    -- with the computed average added so you can see how bad each file is.
    SELECT  mf.name ,
            mf.physical_name ,
            ( io_stall_read_ms / ( 1.0 + num_of_reads ) ) AS avg_read_stall_ms
    FROM    sys.dm_io_virtual_file_stats(NULL, NULL) AS fs
    INNER JOIN sys.master_files AS mf ON fs.database_id = mf.database_id
            AND fs.[file_id] = mf.[file_id]
    WHERE   ( io_stall_read_ms / ( 1.0 + num_of_reads ) ) > 100;

  • Daniel Liuzzi
    March 13, 2012 4:29 pm

    Hi Brent. Is it normal to get this message on an SSD? I’m setting up a brand new server which has an HDD system drive (C), and then two SSDs that I’ll use for SQL Server (D and E). Currently, the D drive has TempDB and transaction logs, and the E drive has the user DBs. I’m baffled to see sp_Blitz report Slow Storage Writes on Drive E (the D drive seems OK). Both SSDs are Intel X25-E Extreme, which are supposed to be blazing fast. Do you think one of the drives might be faulty? Thanks!

    • Daniel – no, it’s definitely not normal to see that on SSDs. Are you using the SSDs without any RAID controller or mirroring? That’d make me a little nervous. Assuming that you’re using a RAID controller or software mirroring, that might be where your bottleneck is. I’d try running a test with CrystalDiskMark on the SSDs when SQL Server is not running, and see what kind of results you get. Be sure to use 5 passes and the largest test file size (I wanna say it’s 4000MB) – these are dropdowns in the CrystalDiskMark tool. You can email me the screenshot if you want.

      • Daniel Liuzzi
        March 13, 2012 5:05 pm

        Brent,

        First of all, thanks for such a prompt reply, much appreciated. I actually had to go into the IPMI console to find out how the drives are set up, since I don’t have physical access to this box (it’s a dedicated server in SoftLayer).

        Anyway, the server has an Adaptec RAID 5405Z controller set up as Non-RAID; when I go into the controller configuration, I see three “arrays” JBOD-A, JBOD-B and JBOD-C, each with a single drive in it, and it doesn’t give me much information. It just says Array Status: OPTIMAL.

        CrystalDiskMark comes back with a “Drive Not Found” message, so it won’t let me run any benchmarks. I’m wondering if I should try and get rid of this controller in favor of a standard SATA controller. Since RAID is not used here, maybe it’s creating overhead somewhere.

        Thanks again.

        • Hmm, so stepping back a little – I wouldn’t run SQL Server without redundant storage. It’s pretty risky even without solid state drives.

          • Daniel Liuzzi
            March 13, 2012 7:07 pm

            I see what you mean. I was trying to get as much performance as possible out of those two drives, so I was putting data files and transaction logs in different volumes. But what you say about redundancy is way more reasonable thinking.

            I will follow your advice and set up the two SSDs in RAID 1. It is a compromise, as putting everything back in the same volume will have some impact on performance, and I will also lose half the space, but I’ll gain the extra peace of mind of knowing the data is a little safer.

            Thanks once again.

  • I have a client that is having performance problems (SQL Server 2008 R2) on a very random basis. Is there a way (without having to buy a tool) to somehow monitor the database to know what the bottlenecks are? They recently had a timeout problem and narrowed it down to a particular stored procedure, but when the stored procedure was run at a different time, it ran very fast compared to timing out earlier in the day. We need to know what is going on with the DB environment when performance is bad.

    Any advice?

    Thank you for your time

    Tim G

  • Hi Brent,

    I ran your script today and it’s just amazing – very handy and helpful for a quick health check. First, thank you very much for providing this. I have a quick question about the slow storage output in the script: is it possible to get the results for a specific mount volume if it’s slow? My company uses mount volumes all over the place. Your script gave me a result saying drive G is slow. This is actually a mount point that has several mount volumes in it – can you please advise? I would like to know which mount volume is having I/O issues.

    • Abdul – thanks, glad we could help. In the article above, click on the link that says “Find out which data and log files have the most reads/writes/stalls” and it’ll show you exactly which files (and drives) are the slowest.

  • I needed to drill down to the database level for slow read/write I/O. I modified the Blitz query to isolate the actual values for a particular database.

    -- The sp_Blitz read-latency check, with DB_NAME() added so each row
    -- identifies the specific database rather than just the drive letter.
    SELECT DISTINCT
        36 AS CheckID ,
        DB_NAME(fs.database_id) AS DatabaseName ,
        ( io_stall_read_ms / ( 1.0 + num_of_reads ) ) AS tReadValue ,
        'Performance' AS FindingsGroup ,
        'Slow Storage Reads on Drive '
            + UPPER(LEFT(mf.physical_name, 1)) AS Finding ,
        'https://www.brentozar.com/go/slow' AS URL ,
        'Reads are averaging longer than 100ms for at least one database on this drive. For specific database file speeds, run the query from the information link.' AS Details
    FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS fs
    INNER JOIN sys.master_files AS mf ON fs.database_id = mf.database_id
        AND fs.[file_id] = mf.[file_id]
    WHERE ( io_stall_read_ms / ( 1.0 + num_of_reads ) ) > 100;

    -- The matching write-latency check (CheckID 37), again with DB_NAME() added.
    SELECT DISTINCT
        37 AS CheckID ,
        DB_NAME(fs.database_id) AS DatabaseName ,
        ( io_stall_write_ms / ( 1.0 + num_of_writes ) ) AS tWriteValue ,
        'Performance' AS FindingsGroup ,
        'Slow Storage Writes on Drive '
            + UPPER(LEFT(mf.physical_name, 1)) AS Finding ,
        'https://www.brentozar.com/go/slow' AS URL ,
        'Writes are averaging longer than 20ms for at least one database on this drive. For specific database file speeds, run the query from the information link.' AS Details
    FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS fs
    INNER JOIN sys.master_files AS mf ON fs.database_id = mf.database_id
        AND fs.[file_id] = mf.[file_id]
    WHERE ( io_stall_write_ms / ( 1.0 + num_of_writes ) ) > 20;

  • About the preceding post: I meant to ask if anyone has found a better way to get the read/write info to identify a particular database’s performance.

  • Brent,
    First off, you are a great resource.

    In regards to querying sys.dm_io_virtual_file_stats alone and interpreting the latency, I think this can be misleading, since the numbers are cumulative since startup. If your server has been up a while and, for example, you have large DW imports or a full-blown nightly maintenance plan (integrity checks and index rebuilds with sort in TempDB), this can drastically skew your results, especially for TempDB.

    I ran across the following script, and it yields more accurate latency results when compared to perfmon disk latency counters over the same monitored period: http://blogs.msdn.com/b/mhouse/archive/2012/02/14/5-minute-io-script.aspx.
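    The idea behind it, as a minimal sketch of my own (the temp table name and the one-minute window are just illustration; the linked script does more):

    -- Snapshot-and-diff: capture the cumulative counters, wait a sample
    -- window, then subtract, so the math covers only that window.
    SELECT  database_id , [file_id] ,
            num_of_reads , io_stall_read_ms ,
            num_of_writes , io_stall_write_ms
    INTO    #io_snap
    FROM    sys.dm_io_virtual_file_stats(NULL, NULL);

    WAITFOR DELAY '00:01:00';  -- illustrative sample window

    SELECT  DB_NAME(fs.database_id) AS database_name ,
            fs.[file_id] ,
            ( fs.io_stall_read_ms - s.io_stall_read_ms )
                / ( 1.0 + fs.num_of_reads  - s.num_of_reads )  AS avg_read_stall_ms ,
            ( fs.io_stall_write_ms - s.io_stall_write_ms )
                / ( 1.0 + fs.num_of_writes - s.num_of_writes ) AS avg_write_stall_ms
    FROM    sys.dm_io_virtual_file_stats(NULL, NULL) AS fs
    INNER JOIN #io_snap AS s
            ON fs.database_id = s.database_id
            AND fs.[file_id] = s.[file_id];

    DROP TABLE #io_snap;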

    I think this DMV may be widely misinterpreted in regards to disk latency when not digested correctly. I think it is more valuable for overall virtual file activity.

    Kind Regards!

    Thanks,
    Steve

    • Steve – thanks, yep, I use that query myself from time to time. Thing is, it requires 5 minutes to get numbers – and the point of sp_Blitz is rapid diagnostics right away. They both have great uses. Thanks!