Slow Storage Reads or Writes

Blitz Result: Slow Storage Reads or Writes

SQL Server feels the need – the need for speed. Storage speed, to be more specific. One of the most common bottlenecks is underperforming storage subsystems.

SQL Server tracks read and write speeds for each database file – both data and log files. This part of our SQL Server sp_Blitz script checks sys.dm_io_virtual_file_stats looking for average read stalls (latency) over 200 milliseconds and average write stalls over 100 milliseconds. Yes, those thresholds are horrifically high – but that’s the point. We only want to alert you when the numbers are awful.

This script breaks it down by file:

	SELECT  DB_NAME(a.database_id) AS [Database Name] ,
        b.name + N' [' + b.type_desc COLLATE SQL_Latin1_General_CP1_CI_AS + N']' AS [Logical File Name] ,
        UPPER(SUBSTRING(b.physical_name, 1, 2)) AS [Drive] ,
        CAST(( ( a.size_on_disk_bytes / 1024.0 ) / (1024.0*1024.0) ) AS DECIMAL(9,2)) AS [Size (GB)] ,
        a.io_stall_read_ms AS [Total IO Read Stall] ,
        a.num_of_reads AS [Total Reads] ,
        CASE WHEN a.num_of_bytes_read > 0 
            THEN CAST(a.num_of_bytes_read/1024.0/1024.0/1024.0 AS NUMERIC(23,1))
            ELSE 0 
        END AS [GB Read],
        CAST(a.io_stall_read_ms / ( 1.0 * a.num_of_reads ) AS INT) AS [Avg Read Stall (ms)] ,
        CASE 
            WHEN b.type = 0 THEN 30 /* data files */
            WHEN b.type = 1 THEN 5 /* log files */
            ELSE 0
        END AS [Max Rec Read Stall Avg],
        a.io_stall_write_ms AS [Total IO Write Stall] ,
        a.num_of_writes [Total Writes] ,
        CASE WHEN a.num_of_bytes_written > 0 
            THEN CAST(a.num_of_bytes_written/1024.0/1024.0/1024.0 AS NUMERIC(23,1))
            ELSE 0 
        END AS [GB Written],
        CAST(a.io_stall_write_ms / ( 1.0 * a.num_of_writes ) AS INT) AS [Avg Write Stall (ms)] ,
        CASE 
            WHEN b.type = 0 THEN 30 /* data files */
            WHEN b.type = 1 THEN 2 /* log files */
            ELSE 0
        END AS [Max Rec Write Stall Avg] ,
        b.physical_name AS [Physical File Name],
        CASE
            WHEN b.name = 'tempdb' THEN 'N/A'
            WHEN b.type = 1 THEN 'N/A' /* log files */
            ELSE 'PAGEIOLATCH*'
        END AS [Read-Related Wait Stat],
        CASE
            WHEN b.type = 1 THEN 'WRITELOG' /* log files */
            WHEN b.name = 'tempdb' THEN 'xxx' /* tempdb data files */
            WHEN b.type = 0 THEN 'ASYNC_IO_COMPLETION' /* data files */
            ELSE 'xxx'
        END AS [Write-Related Wait Stat],
        GETDATE() AS [Sample Time],
        b.type_desc
	FROM    sys.dm_io_virtual_file_stats(NULL, NULL) AS a
			INNER JOIN sys.master_files AS b ON a.file_id = b.file_id
												AND a.database_id = b.database_id
	WHERE   a.num_of_reads > 0
			AND a.num_of_writes > 0
	ORDER BY  CAST(a.io_stall_read_ms / ( 1.0 * a.num_of_reads ) AS INT) DESC;

SELECT DB_NAME(a.database_id) AS [Database Name] ,

b.name + N' [' + b.type_desc COLLATE SQL_Latin1_General_CP1_CI_AS + N']' AS [Logical File Name] ,

UPPER(SUBSTRING(b.physical_name, 1, 2)) AS [Drive] ,

CAST(( ( a.size_on_disk_bytes / 1024.0 ) / (1024.0*1024.0) ) AS DECIMAL(9,2)) AS [Size (GB)] ,

a.io_stall_read_ms AS [Total IO Read Stall] ,

a.num_of_reads AS [Total Reads] ,

CASE WHEN a.num_of_bytes_read > 0

THEN CAST(a.num_of_bytes_read/1024.0/1024.0/1024.0 AS NUMERIC(23,1))

ELSE 0

END AS [GB Read],

CAST(a.io_stall_read_ms / ( 1.0 * a.num_of_reads ) AS INT) AS [Avg Read Stall (ms)] ,

CASE

WHEN b.type = 0 THEN 30 /* data files */

WHEN b.type = 1 THEN 5 /* log files */

ELSE 0

END AS [Max Rec Read Stall Avg],

a.io_stall_write_ms AS [Total IO Write Stall] ,

a.num_of_writes [Total Writes] ,

CASE WHEN a.num_of_bytes_written > 0

THEN CAST(a.num_of_bytes_written/1024.0/1024.0/1024.0 AS NUMERIC(23,1))

ELSE 0

END AS [GB Written],

CAST(a.io_stall_write_ms / ( 1.0 * a.num_of_writes ) AS INT) AS [Avg Write Stall (ms)] ,

CASE

WHEN b.type = 0 THEN 30 /* data files */

WHEN b.type = 1 THEN 2 /* log files */

ELSE 0

END AS [Max Rec Write Stall Avg] ,

b.physical_name AS [Physical File Name],

CASE

WHEN b.name = 'tempdb' THEN 'N/A'

WHEN b.type = 1 THEN 'N/A' /* log files */

ELSE 'PAGEIOLATCH*'

END AS [Read-Related Wait Stat],

CASE

WHEN b.type = 1 THEN 'WRITELOG' /* log files */

WHEN b.name = 'tempdb' THEN 'xxx' /* tempdb data files */

WHEN b.type = 0 THEN 'ASYNC_IO_COMPLETION' /* data files */

ELSE 'xxx'

END AS [Write-Related Wait Stat],

GETDATE() AS [Sample Time],

b.type_desc

FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS a

INNER JOIN sys.master_files AS b ON a.file_id = b.file_id

AND a.database_id = b.database_id

WHERE a.num_of_reads > 0

AND a.num_of_writes > 0

ORDER BY CAST(a.io_stall_read_ms / ( 1.0 * a.num_of_reads ) AS INT) DESC;

What exactly is a bad number here? That’s a really good question, and it’s open for debate. At Microsoft SQL Customer Advisory Team – in point #4 in their Top 10 OLTP Issues, they suggest that reads over 15ms are a bottleneck and that log file writes should be 1ms or less. Keep in mind that this team usually deals with high-performance environments, though – not every SQL Server can afford to be quite this fast.

No matter whose guidelines you pick, I’ve got bad news: you’re already doing worse.

To Fix the Problem

There’s two ways to fix slow storage: make the storage go faster, or ask the storage do to less work.

The second option – asking storage to do less work – is usually the easiest way to start. By dropping unused indexes, we can insert/update/delete data faster, run backups faster, do DBCCs faster, and even do index rebuild jobs faster. By using the right indexes, we can avoid storage-intensive table scans and do less data juggling in TempDB.

Return to sp_Blitz or Ask Us Questions

15 Comments.

Ayanes Apolinar
October 25, 2011 12:47 pm

First I’d like to thank you for all the great work and tips you offer to the guys like me just trying to learn. Your blitz sp provided me with more information than I could have imagined and has helped me catch few misconfigurations on some of our production databases. However I have a question I’m hoping you can help me with or point me in the right direction regarding one of the results. One of the findings I got was that there is Slow Storage Writes on Drive T which is where my TempDB files are, in your details column it says for specific database file speeds, run the query from the information link. Following the URL you provide in the results https://www.brentozar.com/blitz/slow-storage-reads-writes/?VersionNumber=1 which leads me to this page I don’t see a query I can run to get more specific info. I ran the queries to find heap tables, and find either missing indexes or indexes not in use but I am not sure if I missing another query or not?
Lisa Bohm
October 26, 2011 12:49 pm

How about this: (I ran into this and read over the sp code, and pulled this out…)

SELECT mf.Name
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS fs
INNER JOIN sys.master_files AS mf ON fs.database_id = mf.database_id
AND fs.[file_id] = mf.[file_id]
WHERE ( io_stall_read_ms / ( 1.0 + num_of_reads ) ) > 100 ;
Daniel Liuzzi
March 13, 2012 4:29 pm

Hi Brent. Is it normal to get this message on an SSD? I’m setting up a brand new server which has an HDD system drive (C), and then two SSDs that I’ll use for SQL Server (D and E). Currently, D drive has TempDB and transaction logs, and E drive has the user DBs. I’m baffled to see sp_Blitz report Slow Storage Writes on Drive E (D drives seems OK). Both SSDs are Intel X25-E Extreme, which are supposed to be blazing fast. Do you think one of the drives might be faulty? Thanks!
- Brent Ozar
  March 13, 2012 4:32 pm
  
  Daniel – no, it’s definitely not normal to see that on SSDs. Are you using the SSDs without any RAID controller or mirroring? That’d make me a little nervous. Assuming that you’re using a RAID controller or software mirroring, that might be where your bottleneck is. I’d try running a test with CrystalDiskMark on the SSDs when SQL Server is not running, and see what kind of results you get. Be sure to use 5 passes and the largest test file size (I wanna say it’s 4000MB) – these are dropdowns on the CrystalDiskMark tool You can email me the screenshot if you want.
  - Daniel Liuzzi
    March 13, 2012 5:05 pm
    
    Brent,
    
    First of all thanks for such a prompt reply, much appreciated. I actually had to go into the IPMI console to find out how the drives are setup, since I don’t have physical access to this box (it’s a dedicated server in SoftLayer).
    
    Anyway, the server has an Adaptec RAID 5405Z controller setup as Non-RAID; when I go into the controller configuration, I see three “arrays” JBOD-A, JBOD-B and JBOD-C, each having a single drive in them, and doesn’t give me much information. It just says Array Status: OPTIMAL.
    
    CrystalDiskMark comes back with a “Drive Not Found” message, so I won’t let me run any benchmarks. I’m wondering if I should try and get rid of this controller, in favor of a standard SATA controller. Since RAID is not used here, maybe it’s creating overheard somewhere.
    
    Thanks again.
    - Brent Ozar
      March 13, 2012 5:06 pm
      
      Hmm, so stepping back a little – I wouldn’t run SQL Server without redundant storage. It’s pretty risky even without solid state drives.
      - Daniel Liuzzi
        March 13, 2012 7:07 pm
        
        I see what you mean. I was trying to get as much performance as possible out of those two drives, so I was putting data files and transaction logs in different volumes. But what you say about redundancy is way more reasonable thinking.
        
        I will follow your advice and setup the two SSDs in RAID1. It is a compromise, as putting everything back in the same volume will have some impact in the performance, and I will also lose half the space, but I’ll gain the extra peace of mind of knowing that data is a little safer.
        
        Thanks once again.
Tim G
May 23, 2012 10:36 am

I have a client that is having performance problems (sql server 2008 r2) on a very random basis. is there a way (without having to buy a tool) to some how monitor the database to know what the bottlenecks are? they recently had a timeout problem and narrowed it down to a particular stored procedure but when the stored procedure was run at a different time it ran very fast compared to timing out earlier in the day. We need to know what is going on with the db env when performance is bad.

Any advice?

Thank you for your time

Tim G
- Brent Ozar
  May 25, 2012 7:09 am
  
  Tim – sure, absolutely. Explaining this is a little beyond what I could go into in a comment – I’d suggest picking up the book Troubleshooting SQL Server. I reviewed it here: https://www.brentozar.com/archive/2011/11/sql-server-book-review-short-takes/
Abdul
June 10, 2012 3:41 pm

Hi Brent,

I ran your script today and its just amazing this is very handy and helpfull for a quick health check. I firstly thank you very much for providing this. I have a quick question for the slow storage out put in the script, is it possible to get the results for a specific mount volume if its slow. My company uses Mount volumes all over the place. You script gave me the result saying the drive G is slow. This is actually a Mount point and have several Mount volumes in it can you pelase advice. I would like to know which Mount Volume is having I/O issues.
- Brent Ozar
  June 10, 2012 6:57 pm
  
  Abdul – thanks, glad we could help. In the article above, click on the link that says “Find out which data and log files have the most reads/writes/stalls” and it’ll show you exactly which files (and drives) are the slowest.
RR
October 9, 2012 10:32 am

I needed to drill down to the database level for a slow read/write I/O. I modified the Blitz query to isolate the actual values of an actual database.

SELECT DISTINCT
36 AS CheckID ,
( io_stall_read_ms / ( 1.0 + num_of_reads ) ) AS tReadValue ,
‘Performance’ AS FindingsGroup ,
‘Slow Storage Reads on Drive ‘
+ UPPER(LEFT(mf.physical_name, 1)) AS Finding ,
‘https://www.brentozar.com/go/slow’ AS URL ,
‘Reads are averaging longer than 100ms for at least one database on this drive. For specific database file speeds, run the query from the information link.’ AS Details
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS fs
INNER JOIN sys.master_files AS mf ON fs.database_id = mf.database_id
AND fs.[file_id] = mf.[file_id]
WHERE ( io_stall_read_ms / ( 1.0 + num_of_reads ) ) > 100;

SELECT DISTINCT
37 AS CheckID ,
( io_stall_write_ms / ( 1.0 + num_of_writes ) ) AS tWriteValue ,
‘Performance’ AS FindingsGroup ,
‘Slow Storage Writes on Drive ‘
+ UPPER(LEFT(mf.physical_name, 1)) AS Finding ,
‘https://www.brentozar.com/go/slow’ AS URL ,
‘Writes are averaging longer than 20ms for at least one database on this drive. For specific database file speeds, run the query from the information link.’ AS Details
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS fs
INNER JOIN sys.master_files AS mf ON fs.database_id = mf.database_id
AND fs.[file_id] = mf.[file_id]
WHERE ( io_stall_write_ms / ( 1.0 + num_of_writes ) ) > 20;
RR
October 9, 2012 10:35 am

About the preceeding post, I meant to ask if anyone has found a better way to get the Read/Write info to identify a particular’s database performance.
digital
October 11, 2012 1:37 pm

Brent,
First off, you are a great resource.

In regards to querying sys.dm_io_virtual_file_stats alone and interpreting the latency, I think this can be misleading since it is cumulative. If your server has been up a while and for example, you have large DW imports or have a full blown nightly maintenance plan (integrity checks and index rebuilds with sort in TempDB), this drastically skew your results especially for TempDB.

I ran across the following script and it is yielding more accurate results for latency when compared to perfmon counters for disk latency when run for the same time period being monitored: http://blogs.msdn.com/b/mhouse/archive/2012/02/14/5-minute-io-script.aspx.

I think this DMV maybe widely misinterpreted in regards to disk latency when not digested correctly. I think it is more valuable for overall vf activity.

Kind Regards!

Thanks,
Steve
- Brent Ozar
  October 11, 2012 4:39 pm
  
  Steve – thanks, yep, I use that query myself from time to time. Thing is, it requires 5 minutes to get numbers – and the point of sp_Blitz is rapid diagnostics right away. They both have great uses. Thanks!