How Fast Does Your SAN Need to Be for SQL Server?


Let’s oversimplify the bejeezus out of this complex problem.  Suspend your disbelief for a second and work with me:

We have a database server hosting just one 100GB table.  Sure, in reality, we’ve got lots of databases and lots of tables, but we’re going to keep this simple.  We’ve got a simple sales table that stores a row for each sale we’ve ever had.  We don’t have any indexes: this is just 100GB of raw data in our clustered index.

Our database server has 32GB of memory.  Some of that is going to be used by the operating system, drivers, the database software, and that bozo who keeps remoting into the server and playing Angry Birds, but again, we’re going to keep this really simple and pretend that all 32GB of memory is actually used for caching data.  We don’t have enough to cache the entire 100GB table, though.

A user runs a query that needs to scan the entire table.  They want sales numbers grouped by year, by region.  In decision support systems, users run all kinds of wacko queries, and we can’t build indexes to support all of them, but even if we could, we’re keeping this scenario simple and assuming that we have 100GB of raw data and no indexes whatsoever.  To satisfy this query, we have to read all 100GB of data.

Before our query can finish, we have to read 68GB of data from disk.  That’s our 100GB table minus 32GB of it that happens to be cached in memory.  I’m assuming that we’ve got a warm cache here with some 32GB of the data in memory, although I don’t know which 32GB, and it doesn’t really matter.  We can’t fit 100GB of data in a 32GB bag.

The user wants the query to finish in 10 seconds or less – preferably much less.  Presto: now we know how fast storage needs to be.  We need to be able to read 68GB of data in less than 10 seconds.  We can test our storage to see whether it meets that number using my recent post on how to check your SAN speed with CrystalDiskMark.

The Magic SAN Speed Formula

The final formula is beautifully simple: how much time do we have, and how much data do we need to read?  The business is responsible for telling us that first number, but the second number is a heck of a lot harder to gather.  We have to put ourselves into the above scenario and boil things down to the simplest possible illustration of the worst case scenario.
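
In the scenario above, that works out to (100GB table minus the 32GB we can cache) divided by 10 seconds, or about 6.8GB/sec – roughly 6,800MB/sec of sequential read throughput.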

How much memory is available for caching data?  Use these simple DMV queries to find out how much memory each database is using, and even better, how much each object in each database is using.  You might be surprised at how little memory is available for caching because your server needs so much memory for other tasks like keeping the OS’s lights on and sorting your query data.  This is why I’m so emphatic that you should never remote desktop into a SQL Server – by launching programs there, you’re consuming very valuable memory.
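
The linked post has the full queries, but a minimal sketch of the idea looks something like this – counting the 8KB pages each database has sitting in the buffer pool:

-- Rough sketch: how much buffer pool memory each database is using.
-- Each row in sys.dm_os_buffer_descriptors is one 8KB page cached in memory.
SELECT DB_NAME(database_id) AS database_name,
       COUNT(*) * 8 / 1024  AS cached_mb
FROM sys.dm_os_buffer_descriptors
GROUP BY database_id
ORDER BY cached_mb DESC;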

How big is the biggest table we need to query?  Use this DMV query to calculate the size of all the tables in your database – both with and without indexes.  The results help explain why more indexes aren’t necessarily better: they’re all competing for the same memory.  When I’ve got two overlapping indexes that are both getting used, I’m cutting my cache capabilities.
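
Again, the linked query is more thorough, but a bare-bones sketch using sys.dm_db_partition_stats shows space per table and per index, which makes the cost of overlapping indexes easy to see:

-- Rough sketch: space used by each table and index, in MB (8KB pages).
SELECT OBJECT_SCHEMA_NAME(ps.object_id)    AS schema_name,
       OBJECT_NAME(ps.object_id)           AS table_name,
       i.name                              AS index_name,
       SUM(ps.used_page_count) * 8 / 1024  AS used_mb
FROM sys.dm_db_partition_stats ps
JOIN sys.indexes i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id
WHERE OBJECTPROPERTY(ps.object_id, 'IsUserTable') = 1
GROUP BY ps.object_id, i.name
ORDER BY used_mb DESC;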

Can we use an index to satisfy the query?  Sometimes the answer to faster storage is writing better queries that can leverage indexes rather than doing table scans.  This is why it’s important to understand sargability and implicit conversions.
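
As a made-up example (the dbo.Sales table and its columns are hypothetical), the first query below wraps the column in a function and forces a scan of everything, while the second can seek on an index on SaleDate:

-- Not sargable: the function on SaleDate means an index on it can't be seeked.
SELECT SUM(SaleAmount)
FROM dbo.Sales
WHERE YEAR(SaleDate) = 2011;

-- Sargable: a range on the bare column lets SQL Server seek instead of scanning.
SELECT SUM(SaleAmount)
FROM dbo.Sales
WHERE SaleDate >= '20110101'
  AND SaleDate <  '20120101';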

How much of this data can we guarantee will be in cache?  Think worst case scenario: other queries may be running, or other databases on the system might be more active and taking over the cache.  The more memory I put in the server, and the more I isolate performance-critical databases away from the rest, the more I can guarantee fast queries by caching data.
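
One quick sanity check on this is the Page Life Expectancy counter – roughly how many seconds a page survives in the buffer pool before something else pushes it out. A sketch of the query:

-- How long (in seconds) a page is expected to survive in the buffer pool.
-- Low or wildly swinging numbers mean other workloads are churning your cache.
SELECT [object_name], counter_name, cntr_value AS page_life_expectancy_sec
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Page life expectancy';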

Microsoft’s Reference Architecture Specs for SAN Speeds

Microsoft’s Fast Track Data Warehouse systems are purpose-built database servers that ship with everything you need to get fast performance.  They’re available from hardware partners like Dell, HP, and IBM, and Microsoft works with ’em to make sure you’ll get the speed you need.

The Fast Track reference architectures assume that we can’t satisfy queries via indexes, and they don’t even try to cache the data in memory.  They just flat out assume queries will be performed using table scans, so they require very high speed storage performance:

“…this system architecture is called the Core-Balanced Architecture. This balanced approach begins with what is called the CPU core consumption rate, which is the input capacity that each CPU core can handle as data is fed to it.”

This is a really different approach, and it starts to explain SQL Server 2012’s licensing of around $7k per core for Enterprise Edition.  If you’re going to pay big money for 40 cores of that licensing, wouldn’t it make sense to ensure that those CPUs can actually do work?  By specifying a minimum IO throughput per core, Microsoft guarantees that the server could actually get busy.  Otherwise, we’re harnessing expensive thoroughbred racehorses to a crappy chariot.  The Fast Track Configuration Guide even goes so far as showing you how to calculate a Maximum Consumption Rate and a Benchmark Consumption Rate for your system before going live. (I love Microsoft.)
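
The exact methodology is in the guide, but the rough idea behind the Maximum Consumption Rate is to scan a table that's already sitting in memory on a single core and see how many MB that core can chew through per second of CPU time. A very loose sketch, assuming a hypothetical dbo.Sales table small enough to fit entirely in the buffer pool:

-- Very rough sketch of the per-core consumption rate idea; see the Fast Track
-- Configuration Guide for the real methodology. Assumes dbo.Sales fits in memory.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT COUNT_BIG(*)
FROM dbo.Sales
OPTION (MAXDOP 1);   -- one core, data served from cache

-- MCR per core (MB/sec) is roughly: (logical reads * 8 / 1024) / CPU seconds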

In a typical customer environment I worked with recently, their current IO subsystem was able to deliver 300-400MB/sec.  By using the questions above and looking at Microsoft’s Fast Track reference architectures, we calculated that they needed closer to 4,000MB/sec in order to satisfy their end user requirements for query times.  Put another way, if we didn’t change any of the other variables, we needed to make the storage ten times faster.   Obviously, making that kind of improvement ain’t easy or cheap – and suddenly we got buy-in from management to change some of the other variables.

When you see the whole picture – licensing, storage throughput, query design, and end user requirements – it’s much easier to find the right way to get faster performance.  Sometimes it’s insanely fast IO throughput like Microsoft’s Fast Track solution, and sometimes it’s rewriting queries to improve index utilization.  Showing the real cost of storage throughput helps justify why query writers need to step back and rewrite troublesome parts of the app.


9 Comments

  • Good post, Brent. For your simplified scenario (and in real life), using data compression on that clustered index would likely reduce the query execution time by a factor of two or three. Of course you need Enterprise Edition for that.

    DDR3 ECC RAM is extremely affordable right now, with the price decreasing almost on a weekly basis. Having lots of RAM is a good and relatively cheap band-aid.

    But in the end, having very good sequential I/O performance is so very useful for a lot of different things!

    Reply
  • The answer of course is FASTER!

    Reply
  • Brent, hey!

    It’s cool to understand some things that aren’t usually explained in much detail. Here I can see many ways to optimize my environment. Thanks so much for sharing these scripts with us.

    And I think that in this scenario it would also be useful not to use compressed database backups, no?

    Thanks!

    Reply
  • David Forck
    April 5, 2012 3:32 pm

    Neat stuff. Does it mean anything if tempdb has the most buffer pages on a server?

    Reply
  • Garret Black
    April 30, 2012 2:22 pm

    I’m coming from the SAN side, and one important factor that’s missing – and is usually the unknown when I ask an application owner – is the size of the I/O. For SAS it’s configurable, but for SQL it seems I/O could come in 4k, 8k, or even 512k chunks. So from your example of 68GB in 10 sec, you’d need the I/O size to convert that down to IOPS to get a good idea of how many disks you would need.

    Reply
    • Garret – SQL Server only does 8KB minimum, because we store data in 8KB pages. That would be the minimum size. If you’re doing bulk loads of lots of data (like a data warehouse would do overnight) then writes may be in larger operations, but we don’t have the ability to control that in day-to-day OLTP databases. It has to do with the way SQL Server stores data in clustered indexes. Just generally speaking, for OLTP databases, I’d expect 8KB writes for SQL Server, and 8K-64K reads.

      Reply
      • Garret Black
        April 30, 2012 4:21 pm

        Thanks for the clarification Brent. We are primarily data warehouse for our high I/O servers. Good to know I can use 8k for a worst case scenario and not 4k.

        Reply
