Database administrators want just one kind of storage: fast.
Companies want just one kind of storage: cheap.
Since fast storage isn’t cheap, and we can’t put everything on slow storage either, we end up buying different tiers of storage. Let’s start by pretending that we’re buying a new HP DL370 G6 to act as a data warehouse server. It can hold up to 14 3.5″ hard drives – so what drives should we buy if we want to use tiers?
Tier 1: Mirrored pair of Solid State Drives – SSDs are still pretty expensive, so we only buy two of these. We’re going to store mission-critical data on our SQL Server, so we buy two of these drives and mirror them. That way, if either drive fails (and SSDs certainly do fail), our server won’t go down. However, SSDs have limited capacity, so if we buy two 256GB drives in a mirrored pair, we’re only going to get 256GB of usable very-fast capacity. SSDs are blazin’ fast at random reads and writes because they have no moving parts – they can jump to any portion of the drive at any time. In our sample server, we’re going to use these drives for TempDB because we’ve determined that our application hammers TempDB hard.
Tier B: Four 15K drives in RAID 10 – 15k refers to how fast the drives spin. At 15,000 revolutions per minute, the platters are flying, and lots of data is passing under the hard drive heads. These drives can read or write a lot of data very fast when they’re dealing with sequential data, because the drive heads don’t have to move around. They’re not as good with random reads and writes as SSDs, which have no moving parts. Our 15K drives are larger than SSDs – we can afford to buy the 450GB models, and with four of those in a RAID 10, that gives us about 900GB of unformatted capacity. (You can learn more about calculating RAID sizes in the Wikipedia RAID article.) In our sample server, we’ll use these drives for our transaction logs.
Tier III: Eight cheap 7200RPM SATA drives in RAID 5 – These only spin at 7,200 RPM, roughly half the speed of the above drives, which means data is passing under the hard drive heads more slowly. Just as slower cars are cheaper than faster cars, slower hard drives are cheaper too. We’re going to buy the 2TB models, and with 8 of those in a RAID 5, that gives us 14TB of unformatted capacity. RAID 5 can be much slower for writes, but it can be fast for large reads. Our database stores a lot of wide data because we need to store XML and files in the database, but we don’t update that data very often. It’s mostly kept for archive purposes. We really do need that much space for our user databases.
This is manual tiering with local storage: carefully determining your application’s needs and then crafting a storage solution to fit. It requires picking the right server, the right drives, the right RAID configuration, and then finally, configuring the application (SQL in this case) to store the right data in the right places.
This is a pain in the ass.
Very few people do this because there’s so much labor involved if you want to do it right. If you just make some guesses about loads, buy any old drives, and slap ‘em into any old RAID configuration, then you’re locked into bad performance. Six months down the road when you realize the application isn’t really hammering TempDB, and that the real speed bottleneck is one particular set of tables in a user database, you’re going to have a rough time reconfiguring all this. Worst case scenario, we’re talking about backing up all of the databases, reformatting everything, and restoring it all from backup. The bigger your databases get, the more painful this becomes.
In-Drive Storage Tiering
Whenever there’s a pain, vendors step in and offer to
take your money help. One of the early successful hybrids was Seagate’s Momentus XT – a 2.5″ laptop drive that partnered 4GB of SSD memory with a 500GB hard drive in one package. The drive learned which data you accessed the most frequently, then cached that data on the SSD. The hybrid drive sped up routine actions like reboots and application launches.
The newest contender in the market, the OCZ RevoDrive Hybrid, is a PCI Express drive like Fusion-IO, but it combines a 100GB SSD with a 1TB 2.5″ laptop drive. These aren’t shipping yet, but the press release holds an interesting nugget:
“Advanced caching algorithms learn user behavior and adapt storage policies to ensure optimal performance for each individual user, maximizing productivity for the most demanded programs and applications.”
The word caching implies that like the Momentus XT, the RevoDrive Hybrid is writing everything to the magnetic drive, but just caching frequently accessed data on the SSD for faster reads. However, further down in the release, they brag:
“In addition, the drive not only eliminates the SATA bottleneck unleashing ground-breaking bandwidth up to 910MB/s, but also features up to 120,000 IOPS (4K random write) for high transactional workloads delivering true SSD-level performance.”
The phrase random write implies that writes will hit the SSD first, and then later be pushed down to the hard drive. You can’t get 120,000 IOPs of 4K random writes on a 2.5″ magnetic drive. This also means that if you tried to continuously do 4k random writes, you might be able to fill the available space on the 100GB SSD, and then performance would slow down as the RevoDrive was forced to migrate data down to the hard drive.
I’m not suggesting you use either of these solutions on a production SQL Server, but I’m showing them to introduce you to the concept of tiered storage. In a single drive, some of your data might reside on fast solid state memory, and the rest would live on slower, more capacious magnetic Frisbees.
SQL Server Thinks Drives is Drives
The good news is that your database server doesn’t need to be configured for anything in order to use this storage. In our data warehouse example, we could use several of these RevoDrive Hybrids to store our sales table, which has hundreds of millions of rows of history. It documents our company’s history going back for the last ten years, and our executives have told us we’re not allowed to archive any data. They want to be able to run reports to compare this Christmas season’s sales against the last ten Christmases.
Before automated storage tiering, this type of table was painful for DBAs. How do we decide which data to put on which drives? If we only keep the most recent year of data on fast storage and the rest on slow storage, then our users are going to be screaming during every holiday season. They’ll all try to run reports comparing this year against the last ten years, and the reports will run dead slow. We scream, “WE CAN’T STORE ALL THAT DATA ON FAST DRIVES!” but actually…we can.
With automated storage tiering, the storage gear constantly watches which parts of the drive are being accessed, and moves them to faster/pricier storage. It also watches which parts aren’t being frequently accessed, and moves them to slower/cheaper storage. Depending on your environment, this tiering management might be done in different places:
- Inside each drive, like in the Momentus XT and RevoDrive Hybrid
- Inside the server’s storage controllers, like in the FusionIO ioCache or Adaptec maxCache
- Inside the SAN storage processors, like Dell Compellent’s Automated Tiered Storage or EMC’s Automated Storage Tiering
- Inside your virtualization hypervisor, like VMware vSphere 5′s new Storage DRS
DBAs need to know about this technology because it can help you avoid complicated setups like table partitioning or guessing what needs a fast TempDB versus what needs fast transaction logs. When you know how tiered storage works and when you know how to test it, you’ll be able to adapt to changing performance needs faster.