Chicago #SQLPASS Meeting Recap

John Jones and Ray LaMarca of NetApp came to the Chicago PASS Chapter tonight, and John did a presentation about storage performance.  One of my measures of a good presentation is the number of questions asked during it, and John’s was definitely a winner there.  Lots of good questions.

John Jones of NetApp

John recommended that everyone use an IO stress tool to validate performance.  That gives you a baseline, so you can understand what kind of performance to expect, analyze trends, and troubleshoot issues.  He listed several metrics, but he focused most on I/Os per second (IOPs) and IO size (the amount of data transferred per IO).
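
To illustrate why both numbers matter (my math here, not John’s): throughput is roughly IOPs multiplied by IO size.  2,000 IOPs at an 8KB IO size is only about 16MB/sec, while the same 2,000 IOPs at a 64KB IO size is about 125MB/sec – so a system can look healthy on one metric and still be the bottleneck on the other.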

Workload affects performance because different workloads access data differently.  OLTP databases are generally random in nature for both reads and writes, whereas OLAP (or Decision Support System) workloads tend to be sequential – table scans, index scans, and bulk inserts.  For a RAID array, the worst-case scenario is random writes, and the best case is sustained sequential reads.  An attendee asked how SANs are affected in shared environments where multiple servers share the same disk drives, and John agreed that those systems will be mostly random.  For more on this topic, check out my Steel Cage Blogmatch with Jason Massie.

John pointed out that all SAN vendors get their disk drives from the same hard drive vendors.  A drive in a NetApp SAN doesn’t spin any faster than a drive in an HP SAN – it just boils down to what the SAN vendor does between the server and the drives.  Drive throughput averages are:

  • SATA 7200 rpm – 40 IOPs at 20ms latency
  • FC 10k rpm – 120 IOPs at 20ms
  • FC 15k rpm – 220 IOPs at 20ms
  • SAS 15k rpm – 220 IOPs at 20ms

SAS 15k drives are becoming more popular because they’re packaged smaller – NetApp can fit 24 drives in 4U of rack space.
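
As a rough back-of-the-envelope (again, my math, not John’s): a 24-drive shelf of those SAS 15k drives gives you about 24 x 220 ≈ 5,300 random read IOPs.  Writes cost more – on RAID 10 every logical write turns into two physical writes, so the same shelf tops out around 2,600 random write IOPs, and RAID 5’s four-IO write penalty cuts that roughly in half again.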

You can use Performance Monitor (Perfmon.exe) to measure your storage performance.  Here are the metrics John focuses on:

  • Average Disk Queue Length – the number by itself doesn’t tell you much; you have to work with your SAN team to find out the number of spindles behind your array.
  • Avg Disk Sec/Read should average under 20ms for OLTP systems (30ms for DSS)
  • Avg Disk Sec/Write should average under 10ms.
  • Disk Read Bytes/Sec (Throughput) – divide this by 1024 twice to get megabytes.
  • Disk Write Bytes/Sec (Throughput) – divide this by 1024 twice to get megabytes.
  • Avg Disk Bytes/Transfer (IO Size) – divide this by 1024 once to get kilobytes.

I’ve got more about capturing and analyzing these statistics in my Perfmon tutorial for SQL Server DBAs.  John also likes using the DMV sys.dm_io_virtual_file_stats to get database-by-database IO statistics.
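
Here’s a rough sketch of that kind of query – my own illustration, not John’s exact demo – turning the DMV’s raw counters into the same per-file latency, throughput, and IO-size numbers as the Perfmon counters above.  Keep in mind the DMV is cumulative since the last SQL Server restart, so for trending you’d want to sample it on a schedule and diff the snapshots.

-- Per-file IO stats, cumulative since the last SQL Server restart
SELECT  DB_NAME(vfs.database_id) AS database_name,
        mf.physical_name,
        vfs.num_of_reads,
        vfs.num_of_writes,
        -- average latency per IO in milliseconds (compare to Avg Disk Sec/Read and /Write)
        vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_latency_ms,
        vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_latency_ms,
        -- total throughput in megabytes (bytes divided by 1024 twice)
        vfs.num_of_bytes_read    / 1024 / 1024 AS total_mb_read,
        vfs.num_of_bytes_written / 1024 / 1024 AS total_mb_written,
        -- average IO size in kilobytes (bytes divided by 1024 once)
        (vfs.num_of_bytes_read + vfs.num_of_bytes_written)
            / NULLIF(vfs.num_of_reads + vfs.num_of_writes, 0) / 1024 AS avg_io_size_kb
FROM    sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN    sys.master_files AS mf
    ON  mf.database_id = vfs.database_id
    AND mf.file_id     = vfs.file_id
ORDER BY vfs.io_stall_read_ms + vfs.io_stall_write_ms DESC;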

John talked about NetApp’s FlexVols, which sound like plain shared-spindle configurations with some extra goodies like “automatic load shifting,” which means “tuning is no longer necessary.”  I have mixed feelings about this.  If you don’t have the time to do a really good SAN configuration and you don’t need to wring every bit of performance out of the SAN, it works well – but done poorly, shared spindles can screw you.  Try putting SQL Server on the same spindles as a bunch of file servers doing antivirus scans.

He also covered SAN snapshot backups, which can back up huge volumes of data instantaneously.  I like snapshot backups for a few use cases.  If you’ve got a multi-terabyte data warehouse, for example, and you need to quickly refresh your dev and QA environments, SAN snapshots are a neat way to do it.  He kept looking nervously over at me, knowing I work for the company that makes Quest LiteSpeed – poor guy.  If I wanted to shoot holes in snapshot backups, I’d ask how they help for disaster recovery or log shipping, and how they manage to save space in environments that do index defrags, but I kept my mouth shut, heh.

NetApp has a Performance Acceleration Module (PAM) card that acts as a 256-512MB cache.  It only caches reads, but it gets you faster writes too, because your drives aren’t burdened with servicing so many reads.  He showed some statistics suggesting that these cache cards deliver the same read performance benefits as adding a bunch of hard drives, but without the cost or space problems.

Great presentation, lots of good information & questions.  Big thanks to NetApp and to John!
