Brent Ozar - SQL Server DBA Rotating Header Image

Steel Cage Blogmatch: How to Configure a SAN

Jason Massie and Brent Ozar work with SQL Server for a living and write blogs for fun, but that’s where the similarities end.  Jason’s a fan of shared storage SANs: putting SQL Server data and logs on the same set of physical hard drives.  Brent’s a cheerleader for dedicated configurations where the arrays are separated by purpose.  We locked the two of them in a virtual room to see what would happen.  No database administrators were harmed in the making of this article.

Brent Says: You’re crazy.  Well, I mean, you’re crazy just in general – anybody who writes a comic about SQL Server needs some medication – but I mean you’re wrong on this issue.  Since the dawn of time, we’ve been telling SQL Server DBAs to put their data, logs and TempDB on different arrays for a reason: they get better performance.

Jason Says: We you say “need medication” I take it you mean more medication. Anyway…. Yes, that SQL Server 6.5 book we read a decade ago is, in fact, outdated. We once thought the earth was flat and we were the center of the universe. You can now file “dedicated raid 1 for logs” along with those busted myths.  That book was written back when servers had a small number of locally attached drives, and we were lucky if we got two hard drives in a mirrored pair for the logs.  Today, SQL Servers are attached to huge SANs with large numbers of hard drives.  Are you saying a shared piece of fifty or a hundred drives isn’t enough?

Brent Says: For small servers, you’re completely right, but let’s talk about an extreme example: established multi-terabyte data warehouses need very high I/O for long periods of time, several hours a night, during their nightly load windows.  They need to load as much data as possible, as fast as possible, and these windows often coincide with the nightly backup windows of other systems on that same storage.   They have dedicated DBAs who monitor performance closely, and they want to get every last dollar out of their SAN.  They carefully partition the data to balance load across different arrays and make it easier to do simultaneous ETL and reporting.

Jason Says: In situations like this when money for hardware and manpower is not an object, you can just throw enough resources at it that it is hard to mess up. However, you should spend some resources on testing and make sure you’re not flushing money down the toilet.  Separate drives for data, logs and ETL load files might best but having a higher total spindle count might still be better. The proof is in the test numbers, and blanket statements are not safe.  Not to mention by designing for the ETL, you will have idle drives when you are not loading – and you don’t want to have expensive SAN hardware sitting idle, do you?

Brent Says: You have a point there.  It’s easy to recommend dedicated drives in situations where there’s a consistently high utilization rate, like for 24×7 OLTP systems, email archiving databases or very heavily used data warehouses.  It’s harder to recommend them when the database servers just aren’t being that heavily used around the clock.  But talking about virtualization reminds me – I don’t want any SQL Servers sharing spindles with virtual servers, ever.  Virtual servers have such a high I/O utilization already that it’s like playing with fire.

Jason Says: It’s like buying a new house – you want to check out the neighborhood first and find out what your neighbors are like.  You wouldn’t build a brand new mansion in a sketchy part of town, and you wouldn’t put a data warehouse and a bunch of VMware guests on the same spindles.  I would suggest going to the Mayor (SAN Admin) and demanding a performance SLA. I have found by doing that, with executive sign off, that the Mayor makes sure the streets are clean and the riff raff out of sight.

Brent Says: Getting a good performance SLA is the first step to peace and harmony in the datacenter.  Data warehouses are a special case: they have lots of dedicated staff, generally experienced people, who can focus on things like IO throughput and latency.  But what about the rest of the databases, like your typical OLTP or application databases?  A single DBA has to manage dozens, sometimes a hundred instances.  They don’t have time to benchmark.

Jason Says: Hey, just because they’re a DBA doesn’t mean they spend every weekend in the datacenter, alone with their servers.  Not all of them are single.

Brent Says: Don’t be so sure.  My girlfriend refused to even think about marrying me until after I got out of the production DBA on-call rotation.   She says it’s a coincidence, but I dunno.

Jason Says: If you don’t have the time to benchmark, then you don’t have the time to micromanage storage performance, and you really don’t have the time to build a good estimate of how many individual drives you need for TempDB, logs, and so on to begin with.  Using a shared set of drives is the easiest way to achieve high performance.  If you’re not getting enough performance, add more drives, and it’ll even out.  Don’t try this with, say, 6 hard drives and expect to get good results.

Brent Says: That’s the same with dedicated drive setups, too.  I’ve been at shops where the developers swore SQL Server couldn’t scale – only to find out they had the data, logs and TempDB all on a handful of drives.  Whether you’re using dedicated drives or shared drives, don’t expect to scale a database server on any software platform without adding storage.  If you want a real eye-opener, go read the TPC benchmark records and look at the number of hard drives that were used.  It’s not unusual to see drive utilization rates of 1-5% - meaning, on a 146gb drive, they’re only storing a few gigs of data.  More spindles equals more speed, period.

Jason Says: Another problem is changing app requirements like adding audit trails or replication. These adding read IO to you dedicated log drives. There is also reporting mayhem(large sequential scans) being added to your OLTP app. These are the bad elements but once they are in your ‘hood, how do you move them out?

Brent Says: That’s actually a problem with some dedicated spindle SAN configurations.  When each array is configured on just a few disks, management can be a nightmare.  I’ve worked with one major SAN vendor who could only add drives two at a time to arrays, no matter how large the array was, and all drive management had to stop while that array was restriped – sometimes taking hours.  Adding more drives for more performance to handle new load was a real pain point, and I would have gladly traded off some performance in exchange for easier management.

Jason Says: Easier management is a big selling point for shared drive configurations.  When multiple servers draw out of a common pool, it’s easy to allocate space in smaller increments.  You can build multiple pools with different tiers: a Solid State Disk tier, a 15k RAID 10 tier, a SATA RAID 5 tier, and so on.

Brent Says: Speaking of Solid State Disk, when SSDs go mainstream, all SANs will just use shared drive configurations.  With SSDs, the random access problem goes away, because there’s no heads to move around, no platters to spin.  We’ll be able to gauge SAN utilization strictly on the number of queued-up requests instead of worrying about drive latency, and then it’ll be easy to tell when we need to add more drives.

Jason Says: Are you saying that maybe then you’ll get around to admitting the earth isn’t flat and shared drive configurations are always the best?

Brent Says: Get off my lawn, whippersnapper.

Jason Says: This is me getting in the last word. ?