Tag Archive: storage

SQL Server Can Run Databases from Network Shares & NAS

Geeks love duct tape.  Sure, we pride ourselves on building a rock-solid solution the right way, but when the brown stuff hits the moving metal stuff, we love to show off our ingenuity.  We carry our money in duct tape wallets, wear duct tape shirts, practice ductigami, and hang our duct tape from custom metal brackets.

I’m about to discuss a feature of SQL Server that’s probably going to horrify you at first, but take a step back and think of it as duct tape.  I don’t want you to build a solution from scratch with it, but it’s a killer tool that deserves a prominent spot in your bag of tricks.  Since I’ve started using SQL 2012 in my lab, not a week has gone by that I haven’t used this feature to do something faster and easier.

A Brief History Lesson on Trace Flag 1807

In the past, Microsoft SQL Server required its database files to be stored on local drives or SAN storage.  The SQL Server 2000 I/O Basics whitepaper explains in stern terms why not just any storage will do – we have to be sure our data will hit the disk in a calm, orderly fashion.  Server vendors got the point, and everybody’s local storage and SAN storage followed the guidelines.  Over time, we stopped buying hardware from a Windows Hardware Compatibility List because, well, everything worked.  If you could buy it off the shelf from a major vendor, you were fine.

For ambitious people who wanted to gamble with their data, trace flag 1807 let you store your database files on network shares or mapped drives.  This was a pretty bone-headed idea – heck, it was hard enough to get our local and SAN storage to be reliable enough – so few people used it.  Database administrators are usually way too paranoid to let their databases get corrupted just because somebody tripped over a network cable or a $200 NAS box failed.  Even if you didn’t mind unreliability, performance was a major problem – network file shares just weren’t fast enough to handle database access.

Network attached storage gained mainstream credibility over the last few years, and it’s gained widespread use thanks to virtualization and the Network File System protocol (NFS).  Under virtualization, each drive on your virtual server is really just one big file (VMDK or VHD), and it’s easier to manage accessing big files on a file share rather than mapping a bunch of LUNs to VMware hosts.

With tuning, NFS performance is fine for virtualization, and this has a bit of a hidden meaning: if we’re running our entire SQL Server in VMware, then we’re already doing database access over NFS.  So why can’t we let physical SQL Servers access their own databases via NFS too?

SQL Server 2008R2 and 2012 <3 Your NAS

SQL Server 2008R2 did away with the trace flag and lets you put your data & log files anywhere.  This statement works fine:

CREATE DATABASE [DestinedForGreatness]
ON  PRIMARY
( NAME = N'DestinedForGreatness', FILENAME = N'\\LittleBlackBox\SQL2012data\MSSQL\Data\DestinedForGreatness.mdf' )
LOG ON
( NAME = N'DestinedForGreatness_log', FILENAME = N'\\LittleBlackBox\SQL2012data\MSSQL\Data\DestinedForGreatness_log.ldf')
GO

And just to make things really clear, that statement doesn’t just work fine – it doesn’t give you a single warning.  Just like that, you’ve got a database relying on a network share, along with all the gotchas that entails.  Its database icon looks just like the rest in SSMS, and there’s nothing to suggest that your hosting strategy just got a lot riskier.

The Six Million Dollar Man's children had smaller budgets to work with.

File share access goes over the same network paths as your client communications (queries, results, RDP sessions, etc).  A heavily accessed database could saturate your network cards, thereby slowing down everything else – or vice versa.  An unrelated file copy or backup could bring your network-attached database to its knees.

You also have to start monitoring the remote file share’s performance.  It can get slowed down by CPU or memory issues, or just by other activity happening on that same file share.  This means we have to capture Perfmon data on that remote file server, including the physical disk counters.  I don’t see this as a drawback – after all, I need to do this same level of performance monitoring on my SAN, too.

Use Cases for Databases on NAS

When I first read about this SQL Server feature, I had the same reaction you’re probably having: nobody in their right mind should use this, right?  As it turns out, though, I keep using it to make my life easier.

Low log file space emergency?  Add a file on a NAS.  With a few mouse clicks, disaster is averted.  Let your transactions continue (albeit slowly), then go back later to remove that extra log file.

Need to restore a database fast?  Attach it from anywhere.  I’ve often needed to restore just one table from a big database backup, but I haven’t had the room to restore the entire database in production.  No problem – just restore the database on another server (like development), but use a network share name like \\MyFileServer\TempFolder\DatabaseName.mdf as the target.  All of the disk-intensive and network-intensive work happens on the dev server.  When the restore finishes, detach the database in development, and the production server can attach the database while it sits on the network share.  Copy out the tables you need, detach the database, and you’re off.

Hello Kitty Duck Tape

Hate drive sizing hassles?  Use NFS.  I run a lot of SQL Servers in virtualization, and I move databases around from server to server.  Rather than worrying about whether a server has enough free drive space, I’ve just created a file share on my NAS, and all my SQL Server 2012 instances attach their databases from that share.  I can manage free space as a pool, and I never have to worry about an individual server running out of free space again.

Should You Run Databases Over the Network?

The amateur question is, “Do you feel lucky, punk?”

The professional question is, “Do the performance and reliability of your network file shares match up with the performance and reliability needs of the database?”

When I’ve asked myself the professional question, I’ve been surprised at some of my answers.  Network storage probably shouldn’t be your first default choice for new databases and files, but give it a chance.

The key to success is getting everybody to agree on both the capabilities of the NAS and the requirements of the database.

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

Website - Twitter - Facebook - More Posts

SQL Server Storage: Files and Filegroups Video

You’re getting ready to buy more drives, but what should you use them for? If you’re a developer designing a database schema or a production DBA configuring storage, maybe you’ve thought about putting indexes on a separate filegroup.

I’ll explain when it’s right to add more files, the difference between filegroups and files, and why you probably shouldn’t separate out indexes into their own filegroup — but what should go into those new filegroups?  You’ll learn in this 33 minute video:

Liked that video?  Here’s some of our past videos that you can watch immediately:

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

Website - Twitter - Facebook - More Posts

Virtualization and SAN Basics for DBAs Video

These two technologies can make a very big — and very bad — difference in how your SQL Server performs. Wouldn’t it be great if you could get the real, honest lowdown from a virtualization administrator, a SAN administrator, and a DBA? Wouldn’t it be even better if one person had done all three, and could give you the pros and cons of each point of view?

In this one-hour session, I explain how virtualization changes CPU, memory, and monitoring, and I show how to get specific recommendations for your make & model of SAN:

The links I discuss in the video are BrentOzar.com/go/san and BrentOzar.com/go/virtual.

If you like the one-hour free session and you’d like to learn more, I’ve got upcoming four-hour webcasts on virtualization and on storage.

Storage Area Networks (SANs) for DBAs

Learn what’s happening inside the black box!  Brent Ozar is a Microsoft Certified Master of SQL Server who got fed up with his SAN admin saying, “Everything’s fine – it must be a SQL Server problem.”  When the SAN admin quit, Brent took over, learned how SANs worked, and started putting the pieces back together.

In this four-hour session, production DBAs will learn what happens inside the SAN including:

  • The differences between RAID levels (5, 6, 10, DP)
  • How pools of disks are shared between servers, and why you might like it
  • Pathing: the route between your server and your data
  • How solid state PCI Express drives like Fusion-IO work differently

After that, you’ll understand how SQL Server interacts with storage.  We’ll cover:

  • Why the SAN admin always thinks you’re not pushing the SAN hard enough
  • When you don’t need to separate your data and log files – and when you do
  • Why pathing may determine the number of data files you need for speed
  • When you might actually need two log files in your database – despite what the “experts” say
  • And a start-to-finish storage setup and testing checklist

This session is for DBAs who are frustrated with slow or unreliable SAN performance, or DBAs who are about to embark on buying a new SAN.  Register today.

Tuning SQL Server on VMware vSphere

SQL Servers can run faster and more reliably under VMware vSphere, but they can also run craptastically.  You can’t just use the defaults and expect databases to fly, but thankfully, it’s not that hard to get high performance and uptime with just a few important tweaks.

In this four-hour session, production DBAs will learn why:

  • Virtual CPUs are different – and sometimes less is more
  • Virtual memory is different – and how to set SQL Server max memory under VMware
  • Virtual storage is different – and whether we should use VMDKs or raw LUNs
  • Virtual networking is different – and why we have to do our backups differently
  • VIrtual monitoring is different – and why Task Manager is a dirty, filthy liar

This session is for DBAs who are frustrated with slow or unpredictable SQL Server performance inside VMware, or DBAs who are about to embark on virtualizing their first SQL Server.  Register today.

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

Website - Twitter - Facebook - More Posts

Automated Tiered Storage for Databases

Database administrators want just one kind of storage: fast.

Companies want just one kind of storage: cheap.

Since fast storage isn’t cheap, and we can’t put everything on slow storage either, we end up buying different tiers of storage. Let’s start by pretending that we’re buying a new HP DL370 G6 to act as a data warehouse server.  It can hold up to 14 3.5″ hard drives – so what drives should we buy if we want to use tiers?

All the better to store you with

HP DL370 G6: Our local tiered storage candidate

Tier 1: Mirrored pair of Solid State Drives – SSDs are still pretty expensive, so we only buy two of these.  We’re going to store mission-critical data on our SQL Server, so we buy two of these drives and mirror them.  That way, if either drive fails (and SSDs certainly do fail), our server won’t go down.  However, SSDs have limited capacity, so if we buy two 256GB drives in a mirrored pair, we’re only going to get 256GB of usable very-fast capacity.  SSDs are blazin’ fast at random reads and writes because they have no moving parts – they can jump to any portion of the drive at any time.  In our sample server, we’re going to use these drives for TempDB because we’ve determined that our application hammers TempDB hard.

Tier B: Four 15K drives in RAID 10 – 15k refers to how fast the drives spin.  At 15,000 revolutions per minute, the platters are flying, and lots of data is passing under the hard drive heads.  These drives can read or write a lot of data very fast when they’re dealing with sequential data, because the drive heads don’t have to move around.  They’re not as good with random reads and writes as SSDs, which have no moving parts.  Our 15K drives are larger than SSDs – we can afford to buy the 450GB models, and with four of those in a RAID 10, that gives us about 900GB of unformatted capacity.  (You can learn more about calculating RAID sizes in the Wikipedia RAID article.)  In our sample server, we’ll use these drives for our transaction logs.

Tier III: Eight cheap 7200RPM SATA drives in RAID 5 – These only spin at 7,200 RPM, roughly half the speed of the above drives, which means data is passing under the hard drive heads more slowly.  Just as slower cars are cheaper than faster cars, slower hard drives are cheaper too.  We’re going to buy the 2TB models, and with 8 of those in a RAID 5, that gives us 14TB of unformatted capacity.  RAID 5 can be much slower for writes, but it can be fast for large reads.  Our database stores a lot of wide data because we need to store XML and files in the database, but we don’t update that data very often.  It’s mostly kept for archive purposes.  We really do need that much space for our user databases.

This is manual tiering with local storage: carefully determining your application’s needs and then crafting a storage solution to fit.  It requires picking the right server, the right drives, the right RAID configuration, and then finally, configuring the application (SQL in this case) to store the right data in the right places.

This is a pain in the ass.

Very few people do this because there’s so much labor involved if you want to do it right.  If you just make some guesses about loads, buy any old drives, and slap ‘em into any old RAID configuration, then you’re locked into bad performance.  Six months down the road when you realize the application isn’t really hammering TempDB, and that the real speed bottleneck is one particular set of tables in a user database, you’re going to have a rough time reconfiguring all this.  Worst case scenario, we’re talking about backing up all of the databases, reformatting everything, and restoring it all from backup.  The bigger your databases get, the more painful this becomes.

In-Drive Storage Tiering

Whenever there’s a pain, vendors step in and offer to take your money help.  One of the early successful hybrids was Seagate’s Momentus XT – a 2.5″ laptop drive that partnered 4GB of SSD memory with a 500GB hard drive in one package.  The drive learned which data you accessed the most frequently, then cached that data on the SSD.  The hybrid drive sped up routine actions like reboots and application launches.

If it was shipping, I'd show a picture of it in my hand.

OCZ RevoDrive Hybrid

The newest contender in the market, the OCZ RevoDrive Hybrid, is a PCI Express drive like Fusion-IO, but it combines a 100GB SSD with a 1TB 2.5″ laptop drive.  These aren’t shipping yet, but the press release holds an interesting nugget:

“Advanced caching algorithms learn user behavior and adapt storage policies to ensure optimal performance for each individual user, maximizing productivity for the most demanded programs and applications.”

The word caching implies that like the Momentus XT, the RevoDrive Hybrid is writing everything to the magnetic drive, but just caching frequently accessed data on the SSD for faster reads.  However, further down in the release, they brag:

“In addition, the drive not only eliminates the SATA bottleneck unleashing ground-breaking bandwidth up to 910MB/s, but also features up to 120,000 IOPS (4K random write) for high transactional workloads delivering true SSD-level performance.”

The phrase random write implies that writes will hit the SSD first, and then later be pushed down to the hard drive.  You can’t get 120,000 IOPs of 4K random writes on a 2.5″ magnetic drive.  This also means that if you tried to continuously do 4k random writes, you might be able to fill the available space on the 100GB SSD, and then performance would slow down as the RevoDrive was forced to migrate data down to the hard drive.

I’m not suggesting you use either of these solutions on a production SQL Server, but I’m showing them to introduce you to the concept of tiered storage.  In a single drive, some of your data might reside on fast solid state memory, and the rest would live on slower, more capacious magnetic Frisbees.

SQL Server Thinks Drives is Drives

Tiered storage can be much cheaper, easier, and more effective than partitioning.The good news is that your database server doesn’t need to be configured for anything in order to use this storage.  In our data warehouse example, we could use several of these RevoDrive Hybrids to store our sales table, which has hundreds of millions of rows of history.  It documents our company’s history going back for the last ten years, and our executives have told us we’re not allowed to archive any data.  They want to be able to run reports to compare this Christmas season’s sales against the last ten Christmases.

Before automated storage tiering, this type of table was painful for DBAs.  How do we decide which data to put on which drives?  If we only keep the most recent year of data on fast storage and the rest on slow storage, then our users are going to be screaming during every holiday season.  They’ll all try to run reports comparing this year against the last ten years, and the reports will run dead slow.  We scream, “WE CAN’T STORE ALL THAT DATA ON FAST DRIVES!” but actually…we can.

With automated storage tiering, the storage gear constantly watches which parts of the drive are being accessed, and moves them to faster/pricier storage.  It also watches which parts aren’t being frequently accessed, and moves them to slower/cheaper storage.  Depending on your environment, this tiering management might be done in different places:

DBAs need to know about this technology because it can help you avoid complicated setups like table partitioning or guessing what needs a fast TempDB versus what needs fast transaction logs.  When you know how tiered storage works and when you know how to test it, you’ll be able to adapt to changing performance needs faster.

I’ll be discussing this and more in my 4-hour Storage Area Networks for DBAs webcast.  Come join me!

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

Website - Twitter - Facebook - More Posts

Iomega StorCenter PX4-300d NAS Review: iSCSI Monster

I wanted to love this monster from the moment I read the spec sheets:

Iomega StorCenter PX4-300d NAS

Iomega StorCenter PX4-300d NAS

  • Network attached storage device with 4 or 6 hot-swappable drive bays
  • Available empty (so you can bring your own drives, including SSDs)
  • iSCSI server with 2 network ports, jumbo frame support, VLANs
  • Official VMware ESX/ESXi, Hyper-V, Windows DFS compatibility
  • RAID 0/1/5/10
  • Cloud support – automatically copy files to Amazon S3, Mozy, or other Iomegas over the Internet
  • External USB3 drive support (for leftover USB hard drives)
  • Insane number of home media support options including BitTorrent downloading, Time Machine backups for Apples, Bluetooth, automatic uploads to Facebook/YouTube/Flickr, recording server for Axis/Panasonic/D-Link webcams
  • Quiet (31dBa max) and low power (based on an Intel Atom CPU and 2GB of RAM)

The full feature list is ridiculous – it’s absolutely everything I needed for my VMware lab and my backups.  At around $700 from Amazon for the diskless version, I couldn’t resist.  I wouldn’t recommend buying the versions that come populated with hard drives – if you ever need to replace the drives, you’re in for a rough time, and oddly, the bring-your-own-disk version doesn’t suffer from that issue.

Choosing Drives and RAIDs for the PX4 and PX6

As of this writing, the list of officially supported hard drives is pretty short:

Storage pools in the PX4/PX6 have to all be the same size and speed, and I didn’t really need performance, so I chose to go with four of the Hitachi 3TB drives.  With those drives, RAID 5 would give me 9TB of usable storage, and the possibly-faster RAID 10 would give me 6TB.  I wouldn’t recommend the 2TB drives for a reason that’ll be clear shortly.

PX6 users have 6 drives to choose from, so in that environment, it might make sense to go with four magnetic hard drives and two solid state drives.  This would allow two tiers of storage: a blazin’ fast SSD mirror, and a bigger/slower RAID 5 pool.  With VMware ESX/ESXi’s Storage vMotion, we can move virtual machines back and forth between the two RAID pools without taking the VM down for a reboot.  It’s a fairly inexpensive way to get tiered storage, but with just 128GB in the fast pool, I’m not sure how useful this would be in practice.

Pluggin’ in the N: Network Attached Storage

iomega PX4-300d Control Panel

iomega PX4-300d Control Panel

The PX4-300d comes with the basics: power brick, one Ethernet cable, and the management tools CD.  After adding in drives and plugging it in electricity and Ethernet, the control panel displays the IP address it fetched from DHCP.  Fire up a web browser, go to that address, and you’re going to be impressed with the user interface.  You can even test drive the Iomega StorCenter control panel online.  (No, that’s not my Iomega, and no, you can’t delete data.  I tried.)

For the first minute or two, Iomega impressed the heck out of me.  The device could fetch its own firmware upgrades over the Internet, and yes, it was up-to-date.  I went into Drive Management, and sure enough, it offered a really easy GUI to configure RAID levels.  I picked RAID 10 and clicked Apply, but Iomega warned me that I hadn’t chosen any drives yet.  I’d missed the subtle little visual checkboxes inside each hard drive in the GUI, so…

I need to stop here and tell you a little about myself: I like to break things.

I didn’t say I like to take things apart and put them back together – oh no.  I just like to find new ways to break ‘em.  I take an evil satisfaction out of knowing somebody, somewhere didn’t test their code.

It's web scale!

RAID 10 with 3 Drives?

So I did what I do – I checked three boxes, chose RAID 10, and clicked Apply.

As far as I know, there’s not a way to perform RAID 10 with 3 hard drives (it requires an even number of drives) but the PX4 happily started initializing the array.  I sat there absolutely dumbfounded – this wasn’t a small bug, but a giant, ugly, monstrous bug.  I had no idea what the PX4 might be doing to the disks behind the scenes or whether I’d have any redundancy whatsoever.

I wouldn’t be so freaked out about this if Iomega wasn’t owned by EMC, one of the biggest, most reliable companies in the storage industry.  I had instant flashbacks to the click of death, the nasty sound Zip drives made before they crashed, resulting in a class action lawsuit against Iomega.  Was Iomega back to its old tricks of cutting costs at our peril, or were they up to EMC’s quality standards?  I took a deep breath, pretended I never saw it, and deleted the array.  I set up a four-drive RAID 10 array, decided to let one bug slide, and pressed on.

While the array is initialized, the Iomega’s LCD display and banners across the top of the web UI both warn the user that performance is degraded – very nice touch.  As the array is initialized, you can continue to use the storage array, and even write data to it – again, nice touch.  However, the initialization process takes hours – I left mine running overnight – which means I wouldn’t recommend the 2TB drives.  A PX4 with 2TB drives is $1,165, and with the 3TB drives it’s $1,405 – just $240 more for 50% more storage.  Yes, you could upgrade the drives later, but it’s going to take a loooong time to swap out each drive one by one and let the array rebuild.

Some Server Changes Require Reboots

The only way to win is not to play.

Restart Required

As the array initialized, I started wandering through the web control panel setting things up.  I changed the name to LittleBlackBox, and I was taken aback when the PX4 asked to reboot itself.  Really?  Just for a name change?  Okay, well, alright, nobody was on the array yet anyway.

Next up – configure remote access.  The PX4 will connect to a dynamic domain name service so you can use a simple DNS name to access your data from anywhere.  Sounds good, so I set it up and – you guessed it – reboot required.

Next up – configure iSCSI and jumbo frames.  This one at least didn’t require a complete NAS reboot, but it did warn that “This may take a few minutes to complete.  Anyone currently accessing this device will be disconnected.  Do you want to continue?”

The Problem with Rebooting an iSCSI NAS

NAS reboots aren’t a problem for some home users.  If iTunes stops working for a few moments, or my pictures stop uploading to the cloud, life goes on.

In a business or virtualization lab environment, it’s a massive problem.  VMware will be storing the virtual hard drives for several running machines on it.  In order to reconfigure many settings on the PX4/PX6, I have to:

  1. Shut down all of my virtual servers
  2. Shut down VMware
  3. Restart the Iomega and wait for it to come up
  4. Start up VMware
  5. Start up my virtual servers

This is a total showstopper in a corporate lab environment where multiple people might be playing with the device.  I simply wouldn’t be able to count on junior sysadmins not changing settings on the Iomega, triggering a reboot, and losing all of my virtual servers.  I would have to tightly control security settings on the NAS, but the only security option is a checkbox for administrative privileges: either you’re in, or you’re not.

That NAS is a Monster, M-M-M-Monster

Or was that philanderer?

Famous Philosopher

Philosopher L. Gaga once said in a poem entitled “Monster”:

I’ve never seen one like that before
Don’t look at me like that
You amaze me

While she was referring to crazy yet well-endowed gentlemen, I’m sure she would agree that the PX4-300d is just such a monster.  Its feature list is the size of a baby’s arm, but sometimes it’s got the business logic of a weiner.

The good news is that it’s got killer qualities for a virtualization lab – or perhaps even a small business’ virtualization needs.  The bad news is that it doesn’t have the stability or testing to back it up.

I was heartbroken.  I’d really wanted to recommend this storage device to small businesses for their staff to learn virtualization, storage vMotion, and perhaps even as a backup target.  With the current version of firmware, though, I just couldn’t do it.  With that in mind, I gave up on testing the network card failovers (which do work at first glance), multipathing (which doesn’t – more on that later), the speed differences with jumbo frames, and anything reliability-related.  For now, it’s just a good device for geeks to use in their labs and homes, so I’ll focus the rest of the review on that.

Device Features: Shares, Apps, Time Machine, BitTorrent, More

Onboard storage is set up in a hierarchy:

  • Drives are grouped together in pools (like 2 drives in a RAID 1 pair or 4 drives in a RAID 5).  Pools are a fixed size.
  • Pools are carved up into multiple volumes.  Volumes are a fixed size.
  • Volumes are carved up into multiple shares.  Shares are thin provisioned.
  • Applications are configured at the share level.

So in my PX4-300d, I have:

  • Storage pool – 4 drives in a RAID 10 for 6TB of usable space
  • “Media” volume – a 1TB volume that lives in my storage pool
  • “Torrents” share – a thin provisioned share in the Backups folder.  It starts at zero size, and can grow to whatever size of files I dump in there – up to 1TB.  I can access the Torrents share as a folder in Windows or on the Mac.
  • The Torrents onboard application added a Download folder in the Torrents share.  I can copy any torrent file into this folder, and the Iomega begins downloading it.
I'm in ur tubez, downloadin ur questions.

Torrent Download Details

The application user interfaces are designed by geeks, for geeks: they’re just good enough to be point-and-click usable, but not good enough that I’d want to walk family members through configuring or troubleshooting them.  For example, in the Torrent configurations screen, there’s a Port box.  You can manually configure a port or use the automatically supplied one, but you can’t tell if it’s actually working.  There’s no “test the router” functionality built in, so I have no idea if uploads are working or not until I start seeding a torrent.  There’s edit boxes for Maximum Download Speed and Maximum Upload Speed, but there’s no scale – is it KB/sec or MB/sec?

The apps have just enough capabilities to say they work, but not much more than that.  Modern BitTorrent clients all have scheduling setups so that you can transfer more at night and throttle back the sharing during the day – not here.

The Amazon S3 sync app will upload your files to S3, but it won’t delete the ones that get deleted on the NAS, thereby making your storage bill steadily increase.  If you use it for, say, SQL Server offsite backups, you would want to reuse your backup file names in order to get a 7-day rotation in the cloud.  You’d run into a separate issue anyway – S3′s max file upload size is 5GB.  Any larger files just don’t get uploaded, and you don’t get an alert.  No upload/download throttling or time-of-day scheduling here either – your bandwidth will just slow to a crawl as new files are added to S3.

Size Matters

Time Machine File Sizes Not Updated

Time Machine works wonderfully as a backup target, but the user is presented with two cryptic fields: Apple Network Hostname and Ethernet ID.  The network hostname comes from your Mac’s System Preferences, Sharing, click Edit, and it’s the part you can edit.  The Ethernet ID comes from going into Terminal, type ifconfig, hit enter, and look for en0 – the next line says ether, and copy/paste the characters to the right of that.  The Iomega doesn’t tell you any of this.  The only thing it does tell you is the Time Machine backup folder’s size, a constant 173MB, and even that is wrong.  After several days of backups, the folder sizes still showed incorrectly on the Time Machine panel of the Iomega.

But it works.

At least, I think it works.

And that’s what makes this review so hard.  When the control panel tells me it’s configuring a RAID 10 array with 3 drives, I know it’s wrong.  A basic error like that shakes my confidence in the entire unit.  Same thing with the incorrect Time Machine folder sizes, and the wacko configuration screens.  If I can’t trust what it’s showing me, how can I trust what’s happening behind the scenes?  I’m not sure.  This doesn’t bother me too much for home media backups, but it would bother me a lot for running my virtual machines.

Is the Iomega PX4-300d the Best VMware Storage Server?

Like MJ, I keep it in the closet.

My home VMware lab: HP Microservers and the Iomega PX4-300d

Most business iSCSI network implementations involve a completely separate logical network for iSCSI – separate IP addresses on a separate subnet.  For example, my home network is on 192.168.37.x, so I might use 192.168.47.x for my iSCSI network.  The iSCSI traffic is kept away from the regular network.

Most business iSCSI setups also involve redundancy: the storage device is plugged into the network with at least two network patch cables plugged into two different network cards.  If either one fails, you’re still able to talk to the storage.

At first glance, the PX4 can handle either one of those requirements, but not both.  If you want two separate logical networks, then you probably want to isolate the iSCSI traffic completely.  You can configure one of the PX4′s network ports into your management network (like 192.168.37.x) and plug the other into an iSCSI network.  When you do that, you just lost redundancy.  If you choose instead to have both of the PX4-300d’s network jacks plugged into the same network, that network is going to need to see the outside world if you want to use any of the PX4′s home-media-savvy features like BitTorrent.

This is where the good monster comes in again.

This Good Monster Has VLAN Support

This NAS supports VLANs – multiple virtual network subnets attached to the same network card.  Imagine an incoming network packet that finds its way to your server, and it’s got a tiny part at the beginning saying, “I’m coming from Virtual LAN #1.”  Your operating system would need to understand that it’s attached to a certain subnet and route it to Virtual Network Card #1.  The next packet might come in saying, “I’m coming from Virtual LAN #2,” and your computer would understand that it’s part of your Virtual Network Card #2 – the one for iSCSI.

Granted, your one network cable is only so fast, and we can’t stuff 2Gb of traffic into a 1Gb bag patch cable.  However, this does let us segregate network traffic at the switch level so that your chatty iSCSI traffic doesn’t overwhelm your regular network, and your chatty BitTorrent client traffic doesn’t overwhelm your iSCSI storage network.  While it is indeed delicious to get your chocolate in my peanut butter, it’s not nearly as delicious to get your BitTorrent in my iSCSI.

VLANs require a savvy operating system like VMware ESX/ESXi and a VLAN-capable switch.  I chose the $125 Cisco SLM2008-T because it supports VLANs, jumbo frames, has 8 ports, and it’s cheap.  Cisco switches have a reputation for being difficult to administer, but this one comes with a pretty good web control panel.  The setup process of putting in VLANs, jumbo frames, and ESXi configurations is a good idea for a future blog post, but suffice it to say that between Iomega’s web UI, Cisco’s web UI, and the vSphere Client user interface, you can be up and running with a very real-world-ish lab setup in under an hour – for some values of “running.”

Here comes the bad monster again – users can’t exclude network interfaces from iSCSI use.  Every time VMware rescanned the iSCSI adapters, the Iomega happily reported back that iSCSI services were available from every single IP address it owned.  I wanted to just do iSCSI over one network subnet, not both – no can do.  After pulling my hair out for a few hours, I threw in the towel and switched to NFS.  It worked the very first time I tried it, and it came with a nice perk: the StorCenter’s LCD display panel actually reflects the right amount of used space used on NFS volumes.  iSCSI volumes show up as completely full even if there’s not a single file on ‘em.  (That’s not the device’s fault, but users won’t know that.)

That’s probably a good lesson for StorCenter users: the features that require zero parameters are easy, and they work like a champ.

How Fast is This Cheap NAS?

They WILL vary.

Your mileage may vary. (Just one of dozens of test passes.)

Geeks love to tweak parameters.  I want to set the stripe size, caching, NIC load balancing, and flux capacitor voltage for every storage device.  That’s not what the StorCenter PX4/PX6 is about: it’s just, uh, storage.  Just as the other LifeLine apps don’t have much in the way of parameters, the iSCSI app checks just enough boxes to work – but that’s where the fun ends.

Most of my Windows-based iSCSI tests were able to saturate a 1Gbps Ethernet link with reads or with big sequential writes, and small or random writes ran around 10-60 MB/sec.  I was surprised, though, that no matter what tricks I played with multipathing or multiple iSCSI volumes striped together, I just couldn’t get this thing to blow past a single NIC’s throughput.  The lack of network card performance metrics on the Iomega made it a little tricky to troubleshoot, but after I checked the Cisco switch metrics, I got proof: the iSCSI traffic was only sent back through one network card on the Iomega – specifically, NIC #1.

I can write iSCSI data to both of the PX4′s network cards simultaneously.  They just won’t send data back from both simultaneously – at least, not when they’re both on the same subnet.  Maybe by getting even fancier with the multipathing setup and putting the two network cards on two different VLANs, in two subnets, I might be able to break past the 100 MB/sec read speed limit.  (For more about multipathing, check out my SAN Multipathing series.)

But you know what?  100MB/sec is fine for me.  See, if I really needed speed, I would have filled this thing with SSDs to begin with.  Solid state drives excel at random access, and that’s what every Iomega StorCenter NAS is going to end up doing – lots of random access.  Multiple VMs accessing the same LUN, BitTorrents being transferred, MP3s being played, these little devices are a party in a box.  The drives are going to be worked hard, and I bet for most people, the real speed limit will be the random access speed of the drives anyway.

But while troubleshooting storage performance multipathing, I ran into the bad side of the monster again.  I unplugged one of the Iomega’s two network cables, and I didn’t get any warnings whatsoever.  The PX4 didn’t show any errors on the LCD display or the web control panel, and it didn’t even notify me via the built-in email alerts.  I had to dig into the network screen to even figure out that one of the cables was unplugged.  Upon reconnecting the cable, the Iomega autonegotiated to 10Mbps (not 100, not gig, but 10) half duplex.  No warnings were shown on the control panel for that, either, and to make matters worse, you can’t set the Ethernet speeds.  It’s autonegotiate or nothin’.

Bottom Line: It’s a Little Monster Alright.

So close, but yet, so far...

Iomega PX4-300d: 95% There

The Iomega PX4-300d NAS seduced me with its long, strong…feature list, and I’m willing to overlook the wild infestations of crabs bugs for my lab at home.  It’s the first NAS I haven’t returned back to the shop in less than 72 hours.  Like the philosopher said:

Look at him, look at me
That boy is bad, and honestly
He’s a wolf in disguise
But I can’t stop staring in those evil eyes

Who should buy the StorCenter PX4/PX6? Home users that want one quiet, capable box to sit in a closet handling backups, media, BitTorrents, uploading photos to the cloud, and other menial chores.  Yes, you could build your own more-capable solution for less money, but it won’t be as point-and-click easy as the Iomega.  IT, DevOps, and developer managers should also pick up one of these as a reward for a well-performing team.  Throw one of these under a cubicle and for less than the price of a good workstation, the entire team now has an iSCSI sandbox and shared MP3 storage off the domain and away from the corporate IT overlords.

Who shouldn’t buy it? I wouldn’t recommend the PX4/PX6 in a business environment for anything other than a lab – at least, not until versions of the firmware come out that fix the glaring UI bugs and stop users from rebooting the NAS.  I would be very uncomfortable to go into a small business as a consultant, sell them a shared storage setup using a StorCenter PX4/PX6, and walk away.

Can they fix these issues with firmware updates?  Yes, but only if they test the new firmware updates better than they’ve tested so far, and as of August 2011, it’s not looking good.  After applying a firmware update to fix the VMware multipathing issue, I received this disturbing email from Iomega:

Thank you for downloading the recent firmware update for your Iomega StorCenter px. After release, we identified an issue with growing or expanding storage pools with the v 3.1.10. 45882 update.  If you have not yet applied the update to your system, please wait. We will be releasing a new version of the firmware that resolves the issue soon.  If you applied the update, you should not experience any issues unless you expand or modify the size of a storage pool.   If you do experience any issues with a storage pool, please reply to Iomega Technical Support using this incident number….

M-m-m-monster…

If you decide to buy it, you can throw me a few coins by buying the Iomega PX4-300d via my Amazon link or the 6-drive version, the PX6.  And hey, it’s in stock for Prime members, so you could have it tomorrow – plenty of time to play with it over the holiday weekend. I’m not sayin’, I’m just sayin’.

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

Website - Twitter - Facebook - More Posts

Chicago #SQLPASS Meeting Recap

John Jones and Ray LaMarca of NetApp came to the Chicago PASS Chapter tonight, and John did a presentation about storage performance.  One of my measures of a good performance is the number of questions asked during the presentation, and John’s presentation was definitely a winner there.  Lots of good questions.

John Jones of NetApp

John Jones of NetApp

John recommended that everyone use an IO stress tool to validate performance.  This way you can use your metrics as a baseline so you can understand what kind of performance you can expect, analyze trends, and troubleshoot issues.  He listed several metrics, but he focused most on I/Os per second (IOPs) and IO size (the amount of data being transferred.)

Workload affects performance because different workloads access data differently.  OLTP databases are generally random in nature for both reads and writes, whereas OLAP (or Decision Support Systems) tend to be sequential, like table scans, index scans, and bulk inserts.  For a RAID array, the worst performance scenario is random writes, and the best case scenario is sustained sequential reads.  An attendee asked how SANs are affected in shared environments where multiple servers share the same disk drives, and John agreed that those systems will be mostly random.  For more on this topic, check out my Steel Cage Blogmatch with Jason Massie.

John pointed out that all SAN vendors get their disk drives from the same hard drive vendors.  A drive in a NetApp SAN doesn’t spin any faster than a drive in an HP SAN – it just boils down to what the SAN vendor does between the server and the drives.  Drive throughput averages are:

  • SATA 7200 rpm – 40 IOPs at 20ms latency
  • FC 10k rpm – 120 at 20ms
  • FC 15k rpm – 220 IOPs at 20ms
  • SAS 15k rpm – 220 IOPS at 20ms

SAS 15k drives are becoming more popular because they’re packaged smaller – NetApp can fit 24 drives in a 4u rack space.

You can use Performance Monitor (Perfmon.exe) to measure your storage performance.  Here’s the metrics John focuses on:

  • Average Disk Queue Length number, but that you have to work with your SAN team to find out the number of spindles behind your array.
  • Avg Disk Sec/Read should average under 20ms for OLTP systems (30ms for DSS)
  • Avg Disk Sec/Write should average under 10ms.
  • Disk Read Bytes/Sec (Throughput) – divide this by 1024 twice to get megabytes.
  • Disk Write Bytes/Sec (Throughput) – divide this by 1024 twice to get megabytes.
  • Avg Disk Bytes/Transfer (IO Size) – divide this by 1024 once to get megabytes.

I’ve got more about capturing and analyzing these statistics in my Perfmon tutorial for SQL Server DBAs.  John also likes using the DMV sys.dm_io_virtual_file_stats to get database-by-database IO statistics.

John talked about NetApp’s FlexVols, which sound like plain shared spindle configurations with some extra goodies like “automatic load shifting” which means “tuning is no longer necessary.”  I have mixed feelings about this.  If you don’t have the time to do a really good SAN configuration and you don’t need to wring every bit of performance out of the SAN, this works well, but if you do it poorly, you can screw yourself.  Try putting SQL Server on the same spindles as a bunch of file servers doing antivirus scans.

He also covered SAN snapshot backups, which can back up huge volumes of data instantaneously.  I like snapshot backups for a few use cases.  If you’ve got a multi-terabyte data warehouse, for example, and you need to quickly refresh your dev and QA environments, SAN snapshots are a neat way to do it.  He kept looking nervously over at me knowing I work for the company that makes Quest LiteSpeed – poor guy.  If I wanted to shoot holes in snapshot backups, I’d ask how they help for disaster recovery or log shipping, and how they manage to save space in environments that do index defrags, but I kept my mouth shut, heh.

NetApp has a Performance Acceleration Module (PAM) card that acts as a 256-512mb cache card.  It’s only for reads, but it gets you faster writes because your drives aren’t burdened with doing so many reads.  He showed some statistics suggesting that these cache cards get the same read performance benefits as adding a bunch of hard drives, but without the cost or space problems.

Great presentation, lots of good information & questions.  Big thanks to NetApp and to John!

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

Website - Twitter - Facebook - More Posts

SQL University: Scary Storage Scenarios

Merry Christmas, students! I very much appreciate that so many of you showed up in your bathrobes, because now I don’t feel so guilty teaching this class in mine. My Christmas gift to you is a shiny, new blog post. Try to contain your excitement.

This week, we’ve covered the basics of what SQL Server needs to store, and places we can store it. Today we’ll discuss what happens when storage goes bad – and it definitely goes bad.

Drives Go Bad When You’re Not Looking

After you’ve written data to storage, you probably assume it just stays there, staying exactly the same, waiting to be read.

Death Spiral

Death Spiral

If only that were true.

When drives begin their death spiral down the toilet bowl, sometimes it’s a long journey. At first, just a few blocks might go bad – but we won’t know about it if we’re not actively reading those blocks. Right now, in your servers, a hard drive might be in the process of self-destruction, taking your precious data with it.

The best defense is a good offense: actively checking your database content. SQL Server’s DBCC commands (yes, that’s an RAS) will go out and check the contents of your database to make sure everything’s still hunky-dory. The bigger your databases are, the more time it takes to check them, so DBCC has options that will let you just check the most important parts. For example, if a non-clustered index goes bad, we’re not too concerned because we can just drop that index and recreate it again. All of the index’s data is stored in the underlying table anyway, right? We should be able to recover in a matter of minutes.

If only that were true.

Where there’s smoke, there’s fire. If a component in your storage subsystem has started going bad, your first signs might be damage to a non-clustered index. Your proactive DBCC checks will catch that, and you’ll repair it, but you’re only fixing the symptom – not the disease. Storage corruption rarely just happens once and disappears. When your DBCC checks return problems, you do need to fix the symptom, but you also need to jump in and do root cause analysis. If you’re using a SAN with storage shared between several servers, start by checking other servers on that same SAN to see if they’re also having corruption problems.

Want to learn more about DBCC? Paul Randal’s blog posts on DBCC will teach you more than you wanted to know. When he was at Microsoft, he wrote the code behind DBCC.

SAN Drives Change When You’re Not Looking

Storage Area Networks are engineered for extreme reliability; if any one component in a properly configured SAN goes down, the servers won’t go down with it. They’ll be able to reroute their traffic automatically between multiple networks, multiple controllers, and multiple drive enclosures. SAN admins can even do some routine maintenance during business hours without anyone even noticing, like recabling switches, flashing firmware versions, adding additional drives to RAID arrays, or changing cache settings. Well – in theory, no one notices, but the performance changes can impact your servers. Do you know if your SQL Server’s transaction log array is 10% slower than yesterday? 20%? 30%?

I See What You Did There

I See What You Did There

The most advanced environments have even started using storage virtualization, which works just like our server virtualization technologies. Server virtualization abstracts the hardware away from the operating system, letting us move servers around. Storage virtualization lets SAN admins move your data from one array to another, like from RAID 10 to RAID 5, from SSD to SATA, or even from an old SAN to a new SAN. The benefit is that they can decommission old hardware without taking huge outages to rebuild servers from scratch. The drawback is that they can do this without telling you, on the fly. You might have fantastic performance on Monday, abysmal performance on Tuesday while they move your data, and then middling performance on Wednesday.

The solution to all of this? Constant performance monitoring. Learn how to use Perfmon to baseline your storage performance, and check your storage metrics regularly. Even if you don’t have the time to use Perfmon, you can start by monitoring your full backup times. If they jump by 20% overnight, there’s a good chance your SAN administrator just pulled a fast one – or rather, a slow one – on you. Talk to your storage folks to find out why performance is suddenly suffering, and use your Perfmon and backup times to prove it.

Virtualization Makes It Harder to Look

Virtualization abstracts your hardware away from the operating system. It’s much harder for you to know whether your CPU performance issues are caused by:

  • Your virtual server’s load
  • The host’s hardware capabilities
  • Load from other virtual servers

Or some combination of the above. Storage performance is even tougher to troubleshoot and measure, because the bottlenecks might be:

  • Your virtual server’s needs
  • The host hardware’s connection to the SAN (iSCSI, FC speeds)
  • The host hardware’s multipathing configuration (active/active, round robin, Russian Roulette, etc)
  • Other virtual servers sharing the same SAN arrays as you
  • Other virtual servers sharing the same connection to the SAN arrays as you (competition for paths)
  • The storage controllers and arrays

I don’t have a simple answer to solve your virtualization storage performance problems, but I do have one important recommendation: get good at performance monitoring before you start virtualization projects. If you can’t accurately identify the performance bottlenecks of non-virtual machines, you’re not going to get any better when they’re virtualized.

Summing Up the Series: Storin’ Ain’t Easy

SQL Server DBAs think they hold a patent on the answer, “It depends,” but storage administrators were using that phrase before DBAs were even a glimmer in…wait, let’s draw the line here before I start talking about your mothers. I could go on and on for dozens of pages about storage and how it affects SQL Server.

sql-server-2008-internals-and-troubleshooting-bookOh, wait – I did!

Want to learn more about SQL Server, raid levels, SAN storage, Windows storage configuration, and more? Check out Professional SQL Server 2008 Internals and Troubleshooting. My 40-page chapter on storage goes into more detail, but frankly, the other chapters are even better! There’s in-depth details on memory, processors, tracing, and more, and it’s got a ton of information you won’t find in other SQL Server books – especially the storage, latching, and Extended Events chapters.

It’s not for beginners – it’s targeted at senior DBAs, and people who want to become senior DBAs. If you liked this series, I know you’ll like the book.

Happy holidays!

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

Website - Twitter - Facebook - More Posts

SQL University: Where We Store Stuff

Welcome back, class.  In Monday’s session, we discussed the basics of SQL Server storage – things like the data file, log file, and tempdb.  We talked about why we might need to separate those things from each other in order to get more performance.  In today’s lecture, I’ll be covering where we store this stuff and why it matters.

In The Beginning, God Made Magnetic Hard Drives

You kids are living in a magical time; you’re old enough that you’ll remember when data was stored on spinning metal platters, yet young enough that you’ll watch the transition to stashing it in memory chips.  (Some of us are even old enough to remember computers that booted off floppy drives, but I digress.)

Hard Drive Platters

Hard Drive Platters

If you pop the hood on a 3.5″ magnetic hard drive, you’ll see something like the picture shown here.  The round silver platter is like a vinyl record, and the metal pointer-looking thing is like a the record needle.  Wait, what?  You don’t know what a phonograph is?  Well, I suppose that’s what I get for trying to use 19th century technology as a simile in the 21st century.

The pointer-looking thing (known as the “head”) changes the magnetic state of small areas of the drive.  If the area is charged one way, it represents a zero, and if it’s charged the other way, it represents a one.  As the platter spins around, the head moves over different areas of the drive where different data is stored.  The head can also move left or right to access closer in to the center of the platter, or further out toward the edges.  Over time, hard drive technology has improved by making the platter spin faster (thereby covering more data in less time) and packing more magnetic on/off areas on each platter (thereby letting us store more stuff in the same physical space.)

There’s a few problems with these drives.  First off, there’s a lot of moving going on; the platter spins at speeds of up to 15,000 RPM.  To put that in perspective, it’s faster than the engine spins in your car.  If any dust gets into the drive, or if a part comes loose inside, things will crash around inside the drive and tear it apart.  We might not find out about this damage until it’s too late – our data will get damaged as more parts of the platter start to fail, but unless we’re constantly re-reading everything on the drive, we won’t know about it.  The drive’s moving parts themselves can also wear out, such as the motor that spins the platter or the motor that moves the head around.  In order to fix those parts, we would have to crack open the drive, which would introduce dust and cause a failure.

While these speeds are fast in the mechanical world, they’re downright tortoise-like compared to the rest of our computer.  In the milliseconds that it takes to move the head from one point to another on the drive, our computer’s processors sit around waiting for what feels like eons.  This is why computers use memory to cache things that are also stored on the drives – it’s dramatically faster for us to read this data out of memory than it is to fetch it from spinning metal platters.

Want to learn more about relative speeds inside computers?  Peruse the Build-a-PC Bandwidth Chart, which compares bus speeds of processors, memory, networks, and storage.

Two Heads Are Better Than One

Redundant Array of Inexpensive Phonographs makes sure the beats go on and on to the break of dawn

Redundant Array of Inexpensive Phonographs ensures the beats go on and on to the break of dawn

To mitigate the failure and speed problems of these drives, sysadmins combined multiple hard drives together into Redundant Arrays of Inexpensive Drives (RAID).  There are several ways to configure groups of drives, and the simplest method is called RAID 1.  It uses two hard drives that always contain exactly the same set of data.  Whenever the computer writes data to one drive, it also writes that same data to the other drive.  The duplicate-write process can either be managed in hardware or at the operating system level.  When one of the drives fails, the other drive is unaffected, so our data is still available and our server stays online.

When both drives are working fine, then we can do our reads from either drive.  If we need to read a 10 megabyte file from storage, we can read 5 megabytes from one drive and 5 megabytes from the other drive, which could make our reads twice as fast.  If two drives still don’t give us the performance we need, there are other RAID levels like RAID 5 and RAID 10 that can handle more drives for more speed.  (The level names don’t have anything to do with the number of drives involved.)

Want to learn more about RAID levels? The Wikipedia articles on RAID and on standard RAID levels are very well-written.

The more performance we want, the more hard drives we have to use – but most servers only come with a few hard drive bays.  The solution is to move to a Storage Area Network (SAN), a system with hundreds of hard drives, a very smart computer to track what they’re doing, and a dedicated network to connect the servers to their hard drives.  SANs are big (typically fridge-sized or bigger), expensive (at least $100k, but often $1m or more), and they can be very fast.  Can be, but may not be – more on that in a minute.

Even dozens or hundreds of drives in a RAID configuration aren’t always fast enough, and that’s where the newest drive technology comes in.

Solid State Drives (SSDs)

USB thumb drives and memory cards for cameras represent a new kind of storage that uses transistors instead of magnetic charges to store data.  This flash memory uses an electric charge to change the state of a transistor much like our magnetic hard drives used power to flip magnetic bits, and the on/off state of each bit of flash memory represents 0s and 1s.  Just as magnetic hard drives don’t need power in order to maintain their state, flash drives don’t need electricity to maintain the state of the transistors.  This means that unlike your computer’s RAM memory that gets erased whenever you power off, the flash memory saves its state over time.  Eventually, as this technology got faster and cheaper, manufacturers could build entire hard drives out of these chips.

Early generations of solid state drives performed worse than conventional magnetic hard drives, but subsequent generations have dramatically improved.  SSDs can read any stored data immediately without waiting for platters to spin or heads to move, so they can really shine at random access of data.  Compare these two charts from Tom’s Hardware covering average read speeds (higher is better):

So we’ve established that they’re faster (in part because they don’t have moving parts), but just because they don’t have moving parts doesn’t mean they don’t wear out.  Each bit of storage can only be electronically flipped so many times before it breaks.  To account for that, SSDs try to balance out which parts of the drive are written to, and will move your data around in order to wear it out evenly.  “Hot” areas that have been written many times may be moved to another part of the drive, thereby leveling out the wear.  Drive manufacturers also build in extra blank space on the drive to give it extra space to take over when other areas are worn out, thereby extending the drive’s life.

Want to learn more about wear leveling?  Anandtech’s article on choosing the best SSD covers the subject in exhaustive detail and explains that the OS and drivers also matter.

We can protect ourselves from drive failure by using SSDs in RAID configurations the same way we did with magnetic hard drives.  In fact, SSDs may be simple drop-in replacements for some servers, since SSDs are available with SATA interfaces just like your desktop’s hard drive.  However, when we start talking RAID, we usually talk about connecting to our hard drives with something faster than SATA.  As we start to throw more and more drives at the problem, whether they’re magnetic or solid state, we run into a new problem.  We have to make sure that the connection between our drives and our computer is fast enough to handle all of the drives at once.

Homework Time: Studying Multipathing & Shared Drives

For those of you who are using a SAN, thinking of using a SAN, or hoping to get a job somewhere with a SAN, there’s a lot to learn.  Thankfully for me, I’ve already written a ton of it, and here comes your homework:

Stop whining – just because it’s a holiday week doesn’t mean you can stop learning.  After consuming all that information, you should have a basic grasp of what makes storage fast or slow, and how to find out your own storage performance.

Putting It All Together: How Fast Is My Storage?

Not So Fast, It Turns Out

Not So Fast, It Turns Out

Your storage speed is a combination of the drive type, the RAID level, and the connection between the servers and the drives.  (It depends on even more than that, like whether the drives are shared or dedicated, but let’s focus on the basics for now.)

Drive types, from fastest to slowest:

  • Good solid state drives
  • 15k magnetic drives (meaning they’re spinning at 15,000 RPM)
  • 7200rpm drives
  • 5400rpm drives
  • Bad solid state drives

There’s some variations in here – for example, there are really good 5400rpm drives that outperform bad 7200rpm drives, and so on, but again, let’s focus on the big picture.

RAID levels, from fastest to slowest:

  • RAID 10 or RAID 5 (depending on whether you’re reading or writing)
  • RAID 1 (a pair of drives in a mirror)
  • No RAID at all (a single drive, and yes, people do still do this – especially with solid state drives like Fusion-io)

You in the back – sit back down.  Yes, I understand that there are dramatic performance differences between RAID 5 and RAID 10 in many scenarios, but this is a beginner’s class.  If you want to teach, get your own course.

Connection methods, from fastest to slowest:

  • Properly configured SANs with true active/active multipathing
  • Properly configured SANs without active/active multipathing
  • Locally attached storage
  • Poorly configured SANs

The bean counters in the class will notice that these variables are also listed in another order: from most expensive to cheapest.  Good storage ain’t cheap.

You, dear student, probably don’t have the responsibility of architecting your entire SQL Server storage solution from scratch.  You’re probably stuck with a few existing options for your servers, like a couple of arrays from your SAN admin or asking for different RAID configurations.  Your simple answer is to use SQLIO to benchmark your storage performance and find out how fast each of your options are.

If you have a fixed number of drives, but you have the ability to carve them up into different groups, work with your SAN administrator to try different combinations.  For example, if you had 20 hard drives, you could measure the read/write performance of each of the following:

  • 14 drives in a RAID 10 plus 6 drives in a RAID 10
  • 14 drives in a RAID 5 plus 6 drives in a RAID 10
  • 2 sets of 7 drives each in a RAID 5, plus 6 drives in a RAID 10

Why would you try these different combinations?  Because your next decision – and this is where “it depends” becomes an art form – is where to place your SQL Server files.

Where SQL Server Meets Storage

Assuming you have multiple storage options with different performance profiles available to a given SQL Server, the simplest guideline is:

  • Transaction logs go on the fastest writing storage (typically your fastest drives in a RAID 10 array)
  • Data files go on the fastest reading storage

The problem is that both of those lines say “fastest.” When DBAs approach SAN admins in a dark alley with a storage requirement, we always say the same thing: “Give us the fastest storage you have.  All of it.”  Since SAN admins aren’t prone to emptying their wallets for us, we have to learn how to compromise.  We need to identify as many of our stored things as possible that don’t need speed, and then peel those out into separate requirements.  In the case of a multi-terabyte data warehouse, those requirements might look like this:

  • Transaction log files – extremely fast writes, needs to be on a RAID 10 array.
  • Primary data files – tables with frequently changing dimensions, last 12 months of sales data.  Needs to be on a RAID 10 array.
  • Archive data files – historical sales data >1 year old that makes up 75% of our data volume, but is rarely changed.  Can go on a slower RAID 5 array.

Then inside each of those needs, the DBA must decide whether to use a single file on a single array, or multiple files on multiple arrays.  That decision is determined by the number of cores involved, the types of queries, and even the multipathing configuration on the database server.  For example, if a server is connected to its SAN with several 1 gig iSCSI links that don’t have true active/active multipathing, the DBA and SAN admin may determine that they need multiple data files on multiple volumes just to get more throughput – regardless of CPU count or query type.

This is a lot of work to do on a server-by-server, database-by-database basis.  That’s why in the vast majority of cases, it’s best to start simple: just separate the data and log files onto the fastest drives possible, and then over time, monitor your storage performance.  I recommend using Perfmon to periodically check your Sec/Reads and Sec/Write numbers as discussed in my Perfmon article.  When your queries are experiencing IO bottlenecks and you trace the problem back to slow Sec/Reads and Sec/Writes metrics, then it’s time to tune storage.  Beef up that particular storage array, separate out what’s being stored on that array onto different arrays, or figure out how to make that array go faster (like with multipathing).

Hopefully I’ve started to open your eyes about the ways to tweak your storage options and what you put on that storage.  The options are staggering, and the more you play with it, the more you can eke out every last bit of performance.

Our final post on Friday will cover some scary storage scenarios.  (Yes, it’ll go live on Christmas Day, but only because I’m scheduling it in advance.)

Continue to Part 3: Scary Storage Scenarios

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

Website - Twitter - Facebook - More Posts

SQL University: The Basics of Storage

A while back, when I was working as a DBA (before I became a consultant), my Storage Area Networking (SAN) admin decided to leave the company.  I volunteered to take on his job duties because I wanted the money wanted to further our educations.  We learned a lot through some very hard lessons, and I consider myself very lucky to have had that opportunity.  It’s really hard to get your foot in the door as a SAN admin, managing millions of dollars in hardware holding priceless data, and it’s just as hard to get quality SAN training.

This week, I’m going to try to bring you up to speed on how SQL Server and storage interact with each other.  This isn’t meant to be a doctorate-level course in SANs, but rather, a simple series that will teach you what you need to know in order to do your job better.   If you like what you see here, you might want to pick up Professional SQL Server 2008 Internals and Troubleshooting.  The book is targeted at DBAs who need to know more about what’s happening in and around SQL Server.  I wrote the storage chapter.

Click to Look Inside

Click to Look Inside

Today’s session will be about the very basics of storage.  I’m going to simplify things in order to help you see the big picture.  The experts in the crowd will notice that I’m leaving out things like snapshot isolation, multiple data files, and parallelism, but hang in there – we’ll get to some of those issues before the end of the week.  Alright, let’s get started!

What SQL Server Stores

At their simplest, SQL Server databases consist of two things: a data file and a log file.  The data file holds our tables, stored procedures, views, and so forth.  The log file holds a running play-by-play analysis of what SQL Server is trying to do at any given time.

When one user issues a SELECT statement against the database to read data, the SQL Server engine’s response is fairly straightforward.  It reads the data it needs from the data file.

When users INSERT, UPDATE, or DELETE data, however, things get a little more complicated.  SQL Server needs to make sure that two people can both access the data simultaneously and safely.  It also needs to guarantee that if the power plug gets pulled in the middle of an UPDATE statement, our data will be safe when the server restarts.  The engine has to be able to roll back (or roll forward) transactions that had not yet completed.

Want to learn more about this?  Drop ACID.  The concept of Atomicity, Consistency, Isolation, and Durability is the foundation for heavy-duty relational databases.

This is where the log file comes in.  In simple terms, when a user inserts a record, the SQL Server engine:

  1. Writes an entry to the log file recapping what it’s about to do
  2. Writes the new record into the data file
  3. Writes an entry into the log file saying it’s done mucking around in the data file

As often happens, let’s assume that a drunken sysadmin held down the power switch on our database server at the exact moment our SQL Server was busy performing step #2.  After he hit it again to power the server back on, the SQL Server engine would start up and review the log.  It would notice the logged entry from step #1, notice that there wasn’t a matching log entry for #3, and then go poke around in the data file to see whether or not the change had been made.  I don’t want to advocate drinking in the datacenter (if you see your sysadmin drinking, make sure to take his rack keys away), but I do have to say that SQL Server is amazingly resilient.  I’ve seen servers lose power or lose connections to their storage in the middle of very heavy activity, and yet they’ve powered back up fine.  (If your server gets into a situation like this, though, there are precautions you’ll want to take, and we’ll address those later in the week.)

A Fill Factor of .10 or Above is Unsafe

A Fill Factor of .10 or Above is Unsafe

The drawback of this safe system is a that it’s constantly writing to the log file.  Looking at our three-step example, two of the three writes involve the log file.  The more often we do DUIs (Deletes/Updates/Inserts), the faster our log file drives need to react. When we’re configuring hardware for SQL Server, we want to consider two things:

  • Put the data and log files on separate drives
  • Put the log files on the fastest-writing drives we can afford

In a perfect world, we’d put everything on the fastest drives we can afford, but in this perfect world I would be eating bacon instead of teaching you bozos, and bacon would prevent heart attacks.  Back here in the real world, though, we have to take Lipitor and cut back on our budgets.  We can often get away with a little less speed on the data drives if we can use the SQL Server’s memory to cache queried data and avoid hitting the drives for every read, but we can’t use that memory to cache log file writes.  Furthermore, the log file isn’t the only thing that wants its own storage.

Son, Your Server is Writing Data Your Memory Can’t Cache

SQL Server does a pretty good job of using memory to avoid hitting storage.  Unfortunately, in addition to log files, two other parts of SQL Server also like to do a whole lot of writing at the exact same time we’re accessing the data file: TempDB and indexes.  When these things need to write data, they need to write it NOW, and we can’t throw memory at the problem to make it faster.

TempDB is a system database that ends up a little like your kitchen junk drawer; it’s got all kinds of random stuff in there that isn’t really related to each other.  Users write queries that create temporary tables in here, SQL Server uses it for big sorting operations, and the engine even uses it for some special transaction modes.  In the case of temp tables, for example, users will write code like:

CREATE TABLE #MyReport (PilotID INT, DogfightsWon INT, DogfightsLost INT, TowersBuzzed INT)
INSERT INTO #MyReport (PilotID, DogfightsWon, DogfightsLost, TowersBuzzed)
  SELECT p.PilotID, SUM(p.DogfightsWon), SUM(p.DogfightsLost), SUM(p.TowersBuzzed)
  FROM dbo.Pilots p
  INNER JOIN dbo.Instructors i ON ....

The important thing to note here is that SQL Server will be doing two things simultaneously:

  • Reading from the data file
  • Writing to the TempDB data file

This is a fairly typical use for TempDB, so when managing a SQL Server with TempDB-intensive queries, it makes sense to isolate TempDB onto its own separate hard drives.

Indexes present a similar problem.  If we’ve got a table with five indexes, then every time we add a record to our table, SQL Server also has to add five other records – one to each of our indexes.  That’s a lot of writing, but rather than adapting our storage design, I recommend that DBAs control their index designs.  Use this SQLServerPedia query to find unused indexes, because dropping them will instantly result in faster DUIs – and who doesn’t want that?  Dropping indexes is cheaper and easier than micromanaging multiple arrays to handle data and indexes, because indexes aren’t as easy to isolate as the log files or TempDB files.

So we’ve identified several things that might need separate storage due to their constant simultaneous access:

  • Data files
  • Log files
  • TempDB
  • Indexes (plus a hidden one, full text indexes)

In our next class this week, we’ll look at our storage options – the places we can put these files, and they may perform differently.

Continue to Part 2: Where We Store Stuff

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

Website - Twitter - Facebook - More Posts

SAN Multipathing Part 2: What Multipathing Does

In Part 1 of my multipathing series, I talked about what paths are, and today I’m going to be talking about multipathing.  SAN multipathing software has two goals, in this order:

  1. Protection
  2. Performance

Using SAN Multipathing for Failover Protection

What Could Go Wrong?

What Could Go Wrong?

Your server absolutely, positively has to be able to access its drives at all times.  When servers can’t access their hard drives, horrendous things happen.  When hard drives were directly attached to servers, this wasn’t a big risk, but storage area networks bring in a lot of risky factors.  Cables get unplugged or get bent beyond repair.  Switches fail.  Network configurations don’t go according to plan.

(Side note: I think this was one of the biggest reasons SAN administrators didn’t want to go to iSCSI.  They saw how our network cables looked, and they didn’t want their precious fiberoptic cables getting that same treatment.)

Multipathing software mitigates this risk by enabling the SAN admin to set up multiple routes between a server and its drives.  The multipathing software handles all IO requests, passes them through the best possible path, and takes care of business if one of the paths dies.

In the event of a problem like an unplugged cable, the multipathing software will sense that IO has taken too long, then reset the connections and pass the request over an alternate path.  The application (like SQL Server) won’t know anything went wrong, but the IO request will take longer than usual to perform.  Sometimes in SQL Server, this shows up as an application-level alert that IO has taken more than 15 seconds to complete.

To make this work, SAN administrators build in redundancy at every possible layer of the SAN infrastructure – multiple HBAs, multiple switch networks, multiple connections from the controllers, and so forth.  But most of the time, all this extra connectivity sits around idle.  It’s designed to be used for protection, but not necessarily performance: it’s active/passive gear where only one thing is active at a given time.   The secondary goal of multipathing is performance, but it’s a far, far second.  SAN administrators are so conservative, they make database administrators look like gambling addicts.  They’re perfectly comfortable leaving half or more of the infrastructure completely unused.

Do We Really Need More Bandwidth?

Depending on the SAN infrastructure, the theoretical speed limits are around:

  • 1GB Fibre Channel or iSCSI – around 125 MBs/second (this is the most commonly deployed iSCSI speed)
  • 2GB Fibre Channel – around 250 MBs/second
  • 4GB Fibre Channel – around 500 MBs/second (this is the most commonly deployed FC SAN speed)
  • 10GB iSCSI – around 1250 MBs/second

These limits were fine ten or fifteen years ago when hard drives weren’t all that fast, but here’s some sample read speeds from today’s desktop-class SATA drives:

  • One drive – around 130 MBs/second (from TomsHardware reviews)
  • RAID 5 array of five drives – around 300 MBs/second (from my home lab)

Forget 15k drives or solid state drives – even just with today’s SATA drives, 4GB Fibre Channel can get saturated fairly quickly during large sequential read operations, like SQL Server backups or huge table scans on data warehouses.  Sadly, I see so many cases where the IT staff bought a SAN with dozens or hundreds of hard drives, hooked it up to a server with just two 4GB fiberoptic connections, and they can’t understand why their storage isn’t much faster than it was with local disks.  Even if they get savvy to the basics of multipathing and try connecting more 4GB HBAs, their storage speed doesn’t necessarily increase.

Enter Active/Active Multipathing

Active/active multipathing is the ability to configure a server with multiple paths to the storage and simultaneously use all of them to get more storage bandwidth.  This type of multipathing software is usually sold by the SAN vendor, not a third party, because it’s a lot more complicated than it looks at first glance.  Talk to your SAN vendor and ask how much their active/active multipathing software costs, and what it’s compatible with.  EMC’s PowerPath even works with gear from multiple vendors.

But before you plunk down a lot of hard-earned cash – well, it’s not that hard-earned for storage administrators, but I’m talking to database administrators here – you need to ask one very important question: what exactly does this software mean by active/active?  In your feeble mind, you probably believe that you can have one array, accessed by one server, and spread the load evenly over two or more Host Bus Adapters.  Not so fast – some vendors define active/active as:

  • Only one path can be active per array at a given time. If you have four HBAs, you’ll need four arrays in the SAN, and SQL Server will need to spread the data across all four arrays.  This means designing your database filegroups and files specifically for the number of HBAs in use on your server.
  • All paths work for sending data, but only one can receive. I’ve seen this in iSCSI active/active multipathing solutions.  For SQL Server, this means you can insert/update/delete/bulk-load data at breakneck speeds, but your selects still crawl.
  • Active/active works, but failover sticks. Say you have two paths to your data, and one of the paths goes bad for some reason.  All traffic fails over to the alternate path.  When the bad path comes back up (like the cable is plugged back in, the power comes back on, the port is replaced, etc) traffic doesn’t automatically balance back out.  It stays on the single path.  The only way to find this out is with expensive SAN-monitoring software or by browsing through SAN configuration screens periodically.

For virtual servers, I’ve got bad more news: the only true active/active SAN multipathing today is in VMware vSphere 4.0 with EMC PowerPath.  Stephen Fosketts explains the storage changes in vSphere.  If you’re on VMware v3.5 or prior, on Windows Hyper-V, or on vSphere 4.0′s lower licensing tiers, you’re stuck with one HBA of throughput per server per LUN (array).  This is one reason why you might not want to virtualize your high-end SQL Servers yet: they don’t get quite the same level of throughput that you can get on physical hardware.  Don’t let that scare you off virtualization, though – remember, you’re probably reading this article because you don’t have true active/active multipathing set up on your physical SQL Servers, either.

There’s a lot of catches here, and the SAN salespeople are always going to smile and nod and say, “Oh yeah, ours does that.  That’s good, right?”  It’s up to you: you have to ask questions and test, test, test.  Get a time-limited evaluation copy of their multipathing software and test your SAN performance with SQLIO, as I explain over at SQLServerPedia.  It’s the only way to know for sure that you’re getting the most out of your storage investment.

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

Website - Twitter - Facebook - More Posts