A few years ago, Erika and I went to the VW dealership to trade in her old Jetta and get a new one. The choice of a Jetta was already a foregone conclusion – she loved VWs at the time – and it was just a matter of picking out colors and options.
We took a test drive of the base model, and then the sales guy asked, “Do you wanna also test drive the 1.8L Turbo version?”
Me (immediately): “No.”
Erika: “Sure, why not?”
Me: “You were just saying how this one is so much peppier than yours. You – and by you I mean we – don’t need a turbo.”
Erika: “Come on, let’s try it.”
As soon as she accelerated onto a highway on-ramp, felt the power surge, and heard the turbo whistle, that was the end of that. Suddenly, I felt like a SAN administrator and Erika was the DBA. “No no no,” I was saying, “You don’t need that. You’re never going to go that fast, and I know because you yell at me when I take highway on-ramps that fast. Let’s not spend the extra money if we’re not getting extra capacity. Besides, let’s bring it back to numbers – let’s measure how fast the base version is to 60mph, and then measure the turbo version.”
She couldn’t hear me because I was in the back seat and she was negotiating price with the sales guy. The seat-of-the-pants feeling of speed was enough for her, and often it’s good enough for us DBAs too.
Measuring Your SAN the Easy Way
I’ve written about how to test your SAN’s performance with SQLIO, but I’ll be honest with you: that’s the hard way. It takes knowledge and time, and you only have one of those. (I’ll be charitable and not tell you which one.)
Instead, let’s get seat-of-the-pants numbers for your storage. Go download the portable edition of CrystalDiskMark (NOT CrystalDiskInfo) and put it on a network share. Run it on an idle server (not your live SQL Server, because it’ll slow things down while it runs.) It’ll look like this:
Across the top, there’s three dropdowns:
- 5 – the number of test passes you want to run. If you want a fast seat-of-the-pants guess, do 1, but keep in mind it can be wildly variant between passes if something else happens to be going on in the SAN.
- 4000MB – the test file size. I like using 4000MB to reduce the chances that I’m just hitting cache and getting artificially fast numbers. Smaller test file sizes may look fast but don’t really reflect how a large database will work.
- E: – the drive letter to test. Keep an eye on the free space there – you don’t want to create a test file that can run your server out of drive space.
After making your choices, click the All button. While it runs, here’s an explanation of each row’s results:
- Seq – long, sequential operations. For SQL Server, this is somewhat akin to doing backups or doing table scans of perfectly defragmented data, like a data warehouse.
- 512K – random large operations one at a time. This doesn’t really match up to how SQL Server works.
- 4K – random tiny operations one at a time. This is somewhat akin to a lightly loaded OLTP server.
- 4K QD32 – random tiny operations, but many done at a time. This is somewhat akin to an active OLTP server.
The more astute readers (and by that I mean you, you good-looking charmer) will notice that 4K operations don’t really measure SQL Server’s IO. SQL Server stores stuff on disk in 8K pages, and zooming out a little, groups of 8 8KB pages (64K extents). We’re not looking to get an exact representation of SQL Server’s IO patterns here – we’re just trying to get a fast, one-button-click-easy measurement of how storage performs. Usually I find that during the first round of storage tests, it’s not performing well period – and it doesn’t make sense to bring SQL Server into the game just yet.
Interpreting CrystalDiskMark Results
For magnetic hard drives (individually or in RAID arrays), sequential operations (the top column) are often 10x-100x the rest of the results. This metric is often limited by how the computer is connected to the storage, and you can get those numbers from the bandwidth rates in Kendra Little’s “How Big Is Your Pipe?” bandwidth reference poster. Keep in mind that the MB/sec numbers on the poster are theoretical limits, and in practice, we’ve got 5%-20% overhead involved.
For solid state drives, the difference between sequential and random operations isn’t always as dramatic, but it can still be 2-3x. If there’s no difference, then I’d look even closer at the connectivity method – the SSDs are probably outperforming the connection method (like 3Gb SATA, 1Gb iSCSI, or 2/4Gb FC.)
So what’s a good or bad number? If your server boots from a mirrored pair of local drives, and stores its SQL Server data somewhere else (like on a larger array or on a SAN), then test the local mirrored pair too. Compare the numbers for where you’re storing the valuable, high-performance data to where you’re storing the OS, and you might be surprised. Often I find that the OS’s drives perform even better because we just haven’t configured and tuned our storage.
Keep these original CrystalDiskMark screenshots in a shared folder for the group to access, and then challenge everyone involved to do better. Simple tuning techniques like tweaking the read/write bias on the RAID controller’s cache, right-sizing the NTFS allocation units, and working with different stripe sizes can usually yield double the storage performance without spending a dime.