Tag Archive: ssd

Automated Tiered Storage for Databases

Database administrators want just one kind of storage: fast.

Companies want just one kind of storage: cheap.

Since fast storage isn’t cheap, and we can’t put everything on slow storage either, we end up buying different tiers of storage. Let’s start by pretending that we’re buying a new HP DL370 G6 to act as a data warehouse server.  It can hold up to 14 3.5″ hard drives – so what drives should we buy if we want to use tiers?

All the better to store you with

HP DL370 G6: Our local tiered storage candidate

Tier 1: Mirrored pair of Solid State Drives – SSDs are still pretty expensive, so we only buy two of these.  We’re going to store mission-critical data on our SQL Server, so we buy two of these drives and mirror them.  That way, if either drive fails (and SSDs certainly do fail), our server won’t go down.  However, SSDs have limited capacity, so if we buy two 256GB drives in a mirrored pair, we’re only going to get 256GB of usable very-fast capacity.  SSDs are blazin’ fast at random reads and writes because they have no moving parts – they can jump to any portion of the drive at any time.  In our sample server, we’re going to use these drives for TempDB because we’ve determined that our application hammers TempDB hard.

Tier B: Four 15K drives in RAID 10 – 15k refers to how fast the drives spin.  At 15,000 revolutions per minute, the platters are flying, and lots of data is passing under the hard drive heads.  These drives can read or write a lot of data very fast when they’re dealing with sequential data, because the drive heads don’t have to move around.  They’re not as good with random reads and writes as SSDs, which have no moving parts.  Our 15K drives are larger than SSDs – we can afford to buy the 450GB models, and with four of those in a RAID 10, that gives us about 900GB of unformatted capacity.  (You can learn more about calculating RAID sizes in the Wikipedia RAID article.)  In our sample server, we’ll use these drives for our transaction logs.

Tier III: Eight cheap 7200RPM SATA drives in RAID 5 – These only spin at 7,200 RPM, roughly half the speed of the above drives, which means data is passing under the hard drive heads more slowly.  Just as slower cars are cheaper than faster cars, slower hard drives are cheaper too.  We’re going to buy the 2TB models, and with 8 of those in a RAID 5, that gives us 14TB of unformatted capacity.  RAID 5 can be much slower for writes, but it can be fast for large reads.  Our database stores a lot of wide data because we need to store XML and files in the database, but we don’t update that data very often.  It’s mostly kept for archive purposes.  We really do need that much space for our user databases.

This is manual tiering with local storage: carefully determining your application’s needs and then crafting a storage solution to fit.  It requires picking the right server, the right drives, the right RAID configuration, and then finally, configuring the application (SQL in this case) to store the right data in the right places.

This is a pain in the ass.

Very few people do this because there’s so much labor involved if you want to do it right.  If you just make some guesses about loads, buy any old drives, and slap ‘em into any old RAID configuration, then you’re locked into bad performance.  Six months down the road when you realize the application isn’t really hammering TempDB, and that the real speed bottleneck is one particular set of tables in a user database, you’re going to have a rough time reconfiguring all this.  Worst case scenario, we’re talking about backing up all of the databases, reformatting everything, and restoring it all from backup.  The bigger your databases get, the more painful this becomes.

In-Drive Storage Tiering

Whenever there’s a pain, vendors step in and offer to take your money help.  One of the early successful hybrids was Seagate’s Momentus XT – a 2.5″ laptop drive that partnered 4GB of SSD memory with a 500GB hard drive in one package.  The drive learned which data you accessed the most frequently, then cached that data on the SSD.  The hybrid drive sped up routine actions like reboots and application launches.

If it was shipping, I'd show a picture of it in my hand.

OCZ RevoDrive Hybrid

The newest contender in the market, the OCZ RevoDrive Hybrid, is a PCI Express drive like Fusion-IO, but it combines a 100GB SSD with a 1TB 2.5″ laptop drive.  These aren’t shipping yet, but the press release holds an interesting nugget:

“Advanced caching algorithms learn user behavior and adapt storage policies to ensure optimal performance for each individual user, maximizing productivity for the most demanded programs and applications.”

The word caching implies that like the Momentus XT, the RevoDrive Hybrid is writing everything to the magnetic drive, but just caching frequently accessed data on the SSD for faster reads.  However, further down in the release, they brag:

“In addition, the drive not only eliminates the SATA bottleneck unleashing ground-breaking bandwidth up to 910MB/s, but also features up to 120,000 IOPS (4K random write) for high transactional workloads delivering true SSD-level performance.”

The phrase random write implies that writes will hit the SSD first, and then later be pushed down to the hard drive.  You can’t get 120,000 IOPs of 4K random writes on a 2.5″ magnetic drive.  This also means that if you tried to continuously do 4k random writes, you might be able to fill the available space on the 100GB SSD, and then performance would slow down as the RevoDrive was forced to migrate data down to the hard drive.

I’m not suggesting you use either of these solutions on a production SQL Server, but I’m showing them to introduce you to the concept of tiered storage.  In a single drive, some of your data might reside on fast solid state memory, and the rest would live on slower, more capacious magnetic Frisbees.

SQL Server Thinks Drives is Drives

Tiered storage can be much cheaper, easier, and more effective than partitioning.The good news is that your database server doesn’t need to be configured for anything in order to use this storage.  In our data warehouse example, we could use several of these RevoDrive Hybrids to store our sales table, which has hundreds of millions of rows of history.  It documents our company’s history going back for the last ten years, and our executives have told us we’re not allowed to archive any data.  They want to be able to run reports to compare this Christmas season’s sales against the last ten Christmases.

Before automated storage tiering, this type of table was painful for DBAs.  How do we decide which data to put on which drives?  If we only keep the most recent year of data on fast storage and the rest on slow storage, then our users are going to be screaming during every holiday season.  They’ll all try to run reports comparing this year against the last ten years, and the reports will run dead slow.  We scream, “WE CAN’T STORE ALL THAT DATA ON FAST DRIVES!” but actually…we can.

With automated storage tiering, the storage gear constantly watches which parts of the drive are being accessed, and moves them to faster/pricier storage.  It also watches which parts aren’t being frequently accessed, and moves them to slower/cheaper storage.  Depending on your environment, this tiering management might be done in different places:

DBAs need to know about this technology because it can help you avoid complicated setups like table partitioning or guessing what needs a fast TempDB versus what needs fast transaction logs.  When you know how tiered storage works and when you know how to test it, you’ll be able to adapt to changing performance needs faster.

I’ll be discussing this and more in my 4-hour Storage Area Networks for DBAs webcast.  Come join me!

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

Website - Twitter - Facebook - More Posts

Fusion-IO ioDrive Review

Solid state drives (SSDs) have gotten a lot of press because they can be dramatically faster than magnetic hard drives.  They use flash memory instead of spinning magnetic platters.  Since any bit of memory can be accessed at any time without moving a hard drive’s head around, SSD random access is insanely fast.

Fusion-IO ioDrive

Fusion-IO ioDrive

How fast are SSDs?  So fast that good ones overwhelm the capacity of the connection between the server and the drive.  The SATA bus maxes out at around 300 MB/s, and a good SSD can saturate that connection.  In order to get your money’s worth out of an SSD, you have to connect it with something faster than SATA.  Value is especially important given the pricing of solid state drives – more on that in a minute.

Fusion-IO ioDrives get around this limitation because they’re not SATA drives; they plug directly into your server’s much faster PCI Express bus.  These cards can push several times more data per second than SATA drives can. Other vendors using this approach include OCZ Z-drives and RAMSAN.  Of course, this connection method only pays off when the drive uses top-notch memory chips, and after briefly testing some of Fusion-IO’s products in their lab, I can vouch that they’re using the good stuff.

How Fast Are Fusion-IO Drives?

As a former SAN administrator, I’m anal retentive about reliability and uptime.  I’ve heard FusionIO drives sold as a “SAN in your hand,” but with a single drive, there’s not enough failure protection for my personal tastes.  I wouldn’t run any storage device without redundancy, so I ran most of my tests in a RAID 1 configuration – a mirrored pair of Fusion-IO SSDs.  Keep in mind that since these devices have their own built-in controllers, any RAID setup must be a software RAID setup managed by Windows.  Software RAID has a bad reputation, but it’s the only choice available when working with these drives.  I was initially worried about the performance impact of software RAID, but I didn’t have anything to worry about.

I tested several different ioDrive models using my SQLIO scripts as seen on SQLServerPedia and got blazing results.  Here’s a fairly typical set of results from a pass doing random reads:

C:\SQLIO>sqlio -kR -t2 -s120 -o8 -frandom -b64 -BH -LS P:\SQLIO\TestFile1.dat
sqlio v1.5.SG
using system counter for latency timings, 2929716 counts per second
2 threads reading for 120 secs from file P:\SQLIO\TestFile1.dat
using 64KB random IOs
enabling multiple I/Os per thread with 8 outstanding
buffering set to use hardware disk cache (but not file cache)
using current size: 10240 MB for file: P:\SQLIO\TestFile1.dat
initialization done
CUMULATIVE DATA:
throughput metrics:
IOs/sec: 22787.54
MBs/sec:  1424.22
latency metrics:
Min_Latency(ms): 0
Avg_Latency(ms): 0
Max_Latency(ms): 106
histogram:
ms: 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+
%: 100  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0

I know what you’re saying, though.  “Wow, Brent, that’s faster than any SAN I’ve ever seen before, but solid state drives are slower for writes, right?”  Yes, here’s a set of results for writes:

C:\SQLIO>sqlio -kW -t2 -s120 -o1 -frandom -b64 -BH -LS P:\SQLIO\TestFile1.dat
sqlio v1.5.SG
using system counter for latency timings, 2929716 counts per second
2 threads writing for 120 secs to file P:\SQLIO\TestFile1.dat
using 64KB random IOs
enabling multiple I/Os per thread with 1 outstanding
buffering set to use hardware disk cache (but not file cache)
using current size: 10240 MB for file: P:\SQLIO\TestFile1.dat
initialization done
CUMULATIVE DATA:
throughput metrics:
IOs/sec: 10114.24
MBs/sec:   632.14
latency metrics:
Min_Latency(ms): 0
Avg_Latency(ms): 0
Max_Latency(ms): 54
histogram:
ms: 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+
%: 100  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0

Write performance was about half as much as read performance, but that wasn’t because Fusion-IO drives were significantly slower in writing.  They managed the same 500-600 MBs/sec write performance when using one drive alone (rather than RAID).  When I put them in a RAID 0 (striped) configuration, I frequently got over 1,000 MBs/sec.  To put things in perspective, the fastest solid state drives in Anandtech’s recent showdown only achieved 337 MBs/sec in their absolute best case scenarios.

The only way to outperform a Fusion-IO drive is to invest six figures in a SAN and hire a really sharp SAN admin. These drives consistently outperformed every storage I’ve ever seen short of SANs wired up with true active-active multipathing software, because typical 4Gb fiber connections can’t sustain this kind of throughput.  Random access, sequential access, lots of threads, few threads, reads, writes, you name it – once we had them configured properly, I couldn’t get them to perform any slower than 550 MBs/sec, which is faster than you can drive 4Gb fiber.  Connecting directly to the PCI Express bus really pays off here, and makes everything simple.

Configuring SQL Server on a SAN is hard.  To really wring the best performance out of it, you have to design the pathing, the queue lengths on the host bus adapters, the RAID arrays, the SAN caching, the database filegroups and files, and sometimes SQL Server table partitioning in order to get everything to work in concert.  Fusion-IO drives return SQL Server configuration to its easiest: plug it in, put data and logs on the same array, and just create databases with the GUI, using one data file and one log file.  As long as your hardware environment is solid, it’s pretty darned hard to screw up a database configuration on Fusion-IO drives.

The Drawbacks of PCI Express Drives

Connecting data storage devices as PCI Express cards isn’t all unicorns and rainbows. Since they’re PCI Express cards, they can’t be used in blades unless you’ve got special PCI Express expansion blades.

These cards are connected directly to one server, which means they can’t be used in clustering environments.  If your data is mission-critical, you’re probably using a cluster to protect yourself from server failures.  Protecting from failures is even more important when you’ve got a single point of failure – like, well, a single PCI Express card with all your data.  If something should happen to go wrong with one of these devices, you can’t simply fail over to another server unless you’re doing synchronous database mirroring, something I rarely see in production.

You can get some form of protection by using software RAID 1: building a simple mirror with Windows between two Fusion-IO drives in the same server.  Whenever data is written to the volume, Windows will automatically write it to both Fusion-IO drives.  Software RAID gets a bad rap, but in my brief testing, I saw no performance penalty when using this configuration.

However, when a drive fails, you probably won’t be hot-swapping these while the server is in operation.  With server-class RAID arrays, you can pull a failed hard drive out of a server and replace it on the fly.  The RAID controller will rebuild the array while the server is still online.  Data access speeds will be slower while the array is rebuilt, but at least the server can stay up the entire time without an outage.  Not so with PCI Express cards: the server will have to be pulled out of the rack and opened up in order to access the drives.  This requires careful cabling – something I don’t see often in datacenters.

And Yes, You Need to Be Paranoid

During my testing, before Fusion-IO ironed out all of the configuration issues, I kept having drives drop offline.  Normally I’d blame my own stupidity, but my tests were run in FusionIO’s datacenter, on their servers, configured by their staff.  I connected via remote desktop, set up SQLIO per my SQLIO tutorial at SQLServerPedia, and ran the tests.  During my tests, ioDrives appeared to have failed and FusionIO staff had to replace them.  It took several weeks for us to narrow down several unfortunate problems.

If you truly try to push your IO subsystems to the limit, a Fusion-IO subsystem will expose more weaknesses than other storage subsystems because it has so much more throughput.  Some of the problems included motherboard issues, driver problems, OS configuration errors, and even insufficient power supplies that couldn’t handle the load of multiple drives.

Buyers need to be aware that this is a version 1 product with version 1 best practices and documentation.  When you put something like this into your infrastructure, make sure you’re actually adding reliability.  In my post about adding reliability to your infrastructure, I pointed out:

“The only way a twin-engine plane is more reliable is if just one of the two engines is enough to power the airplane safely. If the airplane requires both engines in order to maneuver and land, then the second engine didn’t add reliability: it just added complexity, expense and maintenance woes.”

The ironic part about my FusionIO testing woes was that they only happened in RAID scenarios.  The drives were actually more reliable without RAID – when I added RAID, I could knock a drive offline in minutes.  The faster we wanted to go, the more careful the team had to be with other pieces of the infrastructure.

FusionIO drives solve a real problem, and they can deliver staggering performance, but just like any other new technology, you should test them thoroughly in your own environment before deploying them in production.  Make sure to test them in the exact configuration you plan to deploy – if you’re going to deploy them in a RAID configuration, test them that way, rather than testing individual drives and assuming they’ll hold up in RAID configs.  In the case of Fusion-IO drives, you should probably even test using similar power supplies to production in order to improve your odds.

Where I’d Use Fusion-IO Drives in Database Servers

If you’re experiencing heavy load problems in TempDB, and if you’re not using a cluster, a Fusion-IO drive can probably solve the problem with less engineering effort than any other solution.  Simply shut down the server, drop in an ioDrive, change SQL Server’s TempDB location to point to the ioDrive, and start SQL Server up again.  Your TempDB IO won’t travel over the same storage paths that your database needs, which frees up more bandwidth for your data and log traffic.  TempDB requests stay on the PCI Express bus and don’t hit your SAN.

If you’ve got power, cooling, and space constraints in your datacenter, but you need to add more storage performance (but not necessarily capacity), a Fusion-IO drive makes sense.  You’ll gain more storage throughput this way than by adding several shelves of bulky, hot hard drives that require lots of power.  On a cost-per-gigabyte basis, this won’t make economic sense, but if you’re buying storage for performance needs, the cost equation is different.

If you need to scale an OLTP database that doesn’t require high availability, you might consider skipping clustering and trying database mirroring instead.  Use two SQL Servers in the same datacenter, both equipped with Fusion-IO drives, and make sure they’ve got plenty of bandwidth between them to keep up with synchronous (or asynchronous) mirroring.  You could argue that this provides a higher availability than clustering, since it uses two different storage back ends.  I’ve had SANs fail, and I can see how this might be attractive in some environments.  StackOverflow strikes me as an excellent candidate – the databases are small enough to fit on Fusion-IO’s drives, and the servers are rack-mounted as opposed to blades.

I’m also intrigued at the ioDrive’s potential to offload transaction log load.  I can envision a scenario where databases are in full recovery mode, but the load of logging and transaction log backups is starting to put a strain on the server’s IO.  Moving the transaction logs onto the Fusion-IO drive eases the load on the SAN (not just the hard drives, but the cabling between the servers and the SAN controller).

I wish I had SQL Server benchmarks for these scenarios to share with you, but the testing process ended up taking several weeks, and I ran out of time.  Thankfully, Paul Randal is blogging about his experiences with Fusion-IO drives.

My verdict: these drives can solve some tough challenges. I’m not saying that because I’m being paid to, either; Fusion-IO was gracious enough to give me access to their labs for my testing, but I didn’t get compensated.  Quite the opposite – I sank a lot of time into this project.  Folks who follow me on Twitter may remember that I struggled with Fusion-IO during the initial stages of the testing as we went through one hardware failure after another.  After the problems we encountered and the weeks of investigation, I’m glad to finally be able to say without hesitation that you should check out FusionIO’s products.  Their throughput may push the rest of your infrastructure to its limits, but hey – that can be a good problem to have!

This post is a part of T-SQL Tuesday, which is focusing on storage/IO this week.  If you liked this post, head over there to see what else is happening in the blogosphere this week!

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

Website - Twitter - Facebook - More Posts