Tag Archive: sqluniversity

SQL University: Scary Storage Scenarios

Merry Christmas, students! I very much appreciate that so many of you showed up in your bathrobes, because now I don’t feel so guilty teaching this class in mine. My Christmas gift to you is a shiny, new blog post. Try to contain your excitement.

This week, we’ve covered the basics of what SQL Server needs to store, and places we can store it. Today we’ll discuss what happens when storage goes bad – and it definitely goes bad.

Drives Go Bad When You’re Not Looking

After you’ve written data to storage, you probably assume it just stays there, staying exactly the same, waiting to be read.

Death Spiral

Death Spiral

If only that were true.

When drives begin their death spiral down the toilet bowl, sometimes it’s a long journey. At first, just a few blocks might go bad – but we won’t know about it if we’re not actively reading those blocks. Right now, in your servers, a hard drive might be in the process of self-destruction, taking your precious data with it.

The best defense is a good offense: actively checking your database content. SQL Server’s DBCC commands (yes, that’s an RAS) will go out and check the contents of your database to make sure everything’s still hunky-dory. The bigger your databases are, the more time it takes to check them, so DBCC has options that will let you just check the most important parts. For example, if a non-clustered index goes bad, we’re not too concerned because we can just drop that index and recreate it again. All of the index’s data is stored in the underlying table anyway, right? We should be able to recover in a matter of minutes.

If only that were true.

Where there’s smoke, there’s fire. If a component in your storage subsystem has started going bad, your first signs might be damage to a non-clustered index. Your proactive DBCC checks will catch that, and you’ll repair it, but you’re only fixing the symptom – not the disease. Storage corruption rarely just happens once and disappears. When your DBCC checks return problems, you do need to fix the symptom, but you also need to jump in and do root cause analysis. If you’re using a SAN with storage shared between several servers, start by checking other servers on that same SAN to see if they’re also having corruption problems.

Want to learn more about DBCC? Paul Randal’s blog posts on DBCC will teach you more than you wanted to know. When he was at Microsoft, he wrote the code behind DBCC.

SAN Drives Change When You’re Not Looking

Storage Area Networks are engineered for extreme reliability; if any one component in a properly configured SAN goes down, the servers won’t go down with it. They’ll be able to reroute their traffic automatically between multiple networks, multiple controllers, and multiple drive enclosures. SAN admins can even do some routine maintenance during business hours without anyone even noticing, like recabling switches, flashing firmware versions, adding additional drives to RAID arrays, or changing cache settings. Well – in theory, no one notices, but the performance changes can impact your servers. Do you know if your SQL Server’s transaction log array is 10% slower than yesterday? 20%? 30%?

I See What You Did There

I See What You Did There

The most advanced environments have even started using storage virtualization, which works just like our server virtualization technologies. Server virtualization abstracts the hardware away from the operating system, letting us move servers around. Storage virtualization lets SAN admins move your data from one array to another, like from RAID 10 to RAID 5, from SSD to SATA, or even from an old SAN to a new SAN. The benefit is that they can decommission old hardware without taking huge outages to rebuild servers from scratch. The drawback is that they can do this without telling you, on the fly. You might have fantastic performance on Monday, abysmal performance on Tuesday while they move your data, and then middling performance on Wednesday.

The solution to all of this? Constant performance monitoring. Learn how to use Perfmon to baseline your storage performance, and check your storage metrics regularly. Even if you don’t have the time to use Perfmon, you can start by monitoring your full backup times. If they jump by 20% overnight, there’s a good chance your SAN administrator just pulled a fast one – or rather, a slow one – on you. Talk to your storage folks to find out why performance is suddenly suffering, and use your Perfmon and backup times to prove it.

Virtualization Makes It Harder to Look

Virtualization abstracts your hardware away from the operating system. It’s much harder for you to know whether your CPU performance issues are caused by:

  • Your virtual server’s load
  • The host’s hardware capabilities
  • Load from other virtual servers

Or some combination of the above. Storage performance is even tougher to troubleshoot and measure, because the bottlenecks might be:

  • Your virtual server’s needs
  • The host hardware’s connection to the SAN (iSCSI, FC speeds)
  • The host hardware’s multipathing configuration (active/active, round robin, Russian Roulette, etc)
  • Other virtual servers sharing the same SAN arrays as you
  • Other virtual servers sharing the same connection to the SAN arrays as you (competition for paths)
  • The storage controllers and arrays

I don’t have a simple answer to solve your virtualization storage performance problems, but I do have one important recommendation: get good at performance monitoring before you start virtualization projects. If you can’t accurately identify the performance bottlenecks of non-virtual machines, you’re not going to get any better when they’re virtualized.

Summing Up the Series: Storin’ Ain’t Easy

SQL Server DBAs think they hold a patent on the answer, “It depends,” but storage administrators were using that phrase before DBAs were even a glimmer in…wait, let’s draw the line here before I start talking about your mothers. I could go on and on for dozens of pages about storage and how it affects SQL Server.

sql-server-2008-internals-and-troubleshooting-bookOh, wait – I did!

Want to learn more about SQL Server, raid levels, SAN storage, Windows storage configuration, and more? Check out Professional SQL Server 2008 Internals and Troubleshooting. My 40-page chapter on storage goes into more detail, but frankly, the other chapters are even better! There’s in-depth details on memory, processors, tracing, and more, and it’s got a ton of information you won’t find in other SQL Server books – especially the storage, latching, and Extended Events chapters.

It’s not for beginners – it’s targeted at senior DBAs, and people who want to become senior DBAs. If you liked this series, I know you’ll like the book.

Happy holidays!

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

Website - Twitter - Facebook - More Posts

SQL University: Where We Store Stuff

Welcome back, class.  In Monday’s session, we discussed the basics of SQL Server storage – things like the data file, log file, and tempdb.  We talked about why we might need to separate those things from each other in order to get more performance.  In today’s lecture, I’ll be covering where we store this stuff and why it matters.

In The Beginning, God Made Magnetic Hard Drives

You kids are living in a magical time; you’re old enough that you’ll remember when data was stored on spinning metal platters, yet young enough that you’ll watch the transition to stashing it in memory chips.  (Some of us are even old enough to remember computers that booted off floppy drives, but I digress.)

Hard Drive Platters

Hard Drive Platters

If you pop the hood on a 3.5″ magnetic hard drive, you’ll see something like the picture shown here.  The round silver platter is like a vinyl record, and the metal pointer-looking thing is like a the record needle.  Wait, what?  You don’t know what a phonograph is?  Well, I suppose that’s what I get for trying to use 19th century technology as a simile in the 21st century.

The pointer-looking thing (known as the “head”) changes the magnetic state of small areas of the drive.  If the area is charged one way, it represents a zero, and if it’s charged the other way, it represents a one.  As the platter spins around, the head moves over different areas of the drive where different data is stored.  The head can also move left or right to access closer in to the center of the platter, or further out toward the edges.  Over time, hard drive technology has improved by making the platter spin faster (thereby covering more data in less time) and packing more magnetic on/off areas on each platter (thereby letting us store more stuff in the same physical space.)

There’s a few problems with these drives.  First off, there’s a lot of moving going on; the platter spins at speeds of up to 15,000 RPM.  To put that in perspective, it’s faster than the engine spins in your car.  If any dust gets into the drive, or if a part comes loose inside, things will crash around inside the drive and tear it apart.  We might not find out about this damage until it’s too late – our data will get damaged as more parts of the platter start to fail, but unless we’re constantly re-reading everything on the drive, we won’t know about it.  The drive’s moving parts themselves can also wear out, such as the motor that spins the platter or the motor that moves the head around.  In order to fix those parts, we would have to crack open the drive, which would introduce dust and cause a failure.

While these speeds are fast in the mechanical world, they’re downright tortoise-like compared to the rest of our computer.  In the milliseconds that it takes to move the head from one point to another on the drive, our computer’s processors sit around waiting for what feels like eons.  This is why computers use memory to cache things that are also stored on the drives – it’s dramatically faster for us to read this data out of memory than it is to fetch it from spinning metal platters.

Want to learn more about relative speeds inside computers?  Peruse the Build-a-PC Bandwidth Chart, which compares bus speeds of processors, memory, networks, and storage.

Two Heads Are Better Than One

Redundant Array of Inexpensive Phonographs makes sure the beats go on and on to the break of dawn

Redundant Array of Inexpensive Phonographs ensures the beats go on and on to the break of dawn

To mitigate the failure and speed problems of these drives, sysadmins combined multiple hard drives together into Redundant Arrays of Inexpensive Drives (RAID).  There are several ways to configure groups of drives, and the simplest method is called RAID 1.  It uses two hard drives that always contain exactly the same set of data.  Whenever the computer writes data to one drive, it also writes that same data to the other drive.  The duplicate-write process can either be managed in hardware or at the operating system level.  When one of the drives fails, the other drive is unaffected, so our data is still available and our server stays online.

When both drives are working fine, then we can do our reads from either drive.  If we need to read a 10 megabyte file from storage, we can read 5 megabytes from one drive and 5 megabytes from the other drive, which could make our reads twice as fast.  If two drives still don’t give us the performance we need, there are other RAID levels like RAID 5 and RAID 10 that can handle more drives for more speed.  (The level names don’t have anything to do with the number of drives involved.)

Want to learn more about RAID levels? The Wikipedia articles on RAID and on standard RAID levels are very well-written.

The more performance we want, the more hard drives we have to use – but most servers only come with a few hard drive bays.  The solution is to move to a Storage Area Network (SAN), a system with hundreds of hard drives, a very smart computer to track what they’re doing, and a dedicated network to connect the servers to their hard drives.  SANs are big (typically fridge-sized or bigger), expensive (at least $100k, but often $1m or more), and they can be very fast.  Can be, but may not be – more on that in a minute.

Even dozens or hundreds of drives in a RAID configuration aren’t always fast enough, and that’s where the newest drive technology comes in.

Solid State Drives (SSDs)

USB thumb drives and memory cards for cameras represent a new kind of storage that uses transistors instead of magnetic charges to store data.  This flash memory uses an electric charge to change the state of a transistor much like our magnetic hard drives used power to flip magnetic bits, and the on/off state of each bit of flash memory represents 0s and 1s.  Just as magnetic hard drives don’t need power in order to maintain their state, flash drives don’t need electricity to maintain the state of the transistors.  This means that unlike your computer’s RAM memory that gets erased whenever you power off, the flash memory saves its state over time.  Eventually, as this technology got faster and cheaper, manufacturers could build entire hard drives out of these chips.

Early generations of solid state drives performed worse than conventional magnetic hard drives, but subsequent generations have dramatically improved.  SSDs can read any stored data immediately without waiting for platters to spin or heads to move, so they can really shine at random access of data.  Compare these two charts from Tom’s Hardware covering average read speeds (higher is better):

So we’ve established that they’re faster (in part because they don’t have moving parts), but just because they don’t have moving parts doesn’t mean they don’t wear out.  Each bit of storage can only be electronically flipped so many times before it breaks.  To account for that, SSDs try to balance out which parts of the drive are written to, and will move your data around in order to wear it out evenly.  “Hot” areas that have been written many times may be moved to another part of the drive, thereby leveling out the wear.  Drive manufacturers also build in extra blank space on the drive to give it extra space to take over when other areas are worn out, thereby extending the drive’s life.

Want to learn more about wear leveling?  Anandtech’s article on choosing the best SSD covers the subject in exhaustive detail and explains that the OS and drivers also matter.

We can protect ourselves from drive failure by using SSDs in RAID configurations the same way we did with magnetic hard drives.  In fact, SSDs may be simple drop-in replacements for some servers, since SSDs are available with SATA interfaces just like your desktop’s hard drive.  However, when we start talking RAID, we usually talk about connecting to our hard drives with something faster than SATA.  As we start to throw more and more drives at the problem, whether they’re magnetic or solid state, we run into a new problem.  We have to make sure that the connection between our drives and our computer is fast enough to handle all of the drives at once.

Homework Time: Studying Multipathing & Shared Drives

For those of you who are using a SAN, thinking of using a SAN, or hoping to get a job somewhere with a SAN, there’s a lot to learn.  Thankfully for me, I’ve already written a ton of it, and here comes your homework:

Stop whining – just because it’s a holiday week doesn’t mean you can stop learning.  After consuming all that information, you should have a basic grasp of what makes storage fast or slow, and how to find out your own storage performance.

Putting It All Together: How Fast Is My Storage?

Not So Fast, It Turns Out

Not So Fast, It Turns Out

Your storage speed is a combination of the drive type, the RAID level, and the connection between the servers and the drives.  (It depends on even more than that, like whether the drives are shared or dedicated, but let’s focus on the basics for now.)

Drive types, from fastest to slowest:

  • Good solid state drives
  • 15k magnetic drives (meaning they’re spinning at 15,000 RPM)
  • 7200rpm drives
  • 5400rpm drives
  • Bad solid state drives

There’s some variations in here – for example, there are really good 5400rpm drives that outperform bad 7200rpm drives, and so on, but again, let’s focus on the big picture.

RAID levels, from fastest to slowest:

  • RAID 10 or RAID 5 (depending on whether you’re reading or writing)
  • RAID 1 (a pair of drives in a mirror)
  • No RAID at all (a single drive, and yes, people do still do this – especially with solid state drives like Fusion-io)

You in the back – sit back down.  Yes, I understand that there are dramatic performance differences between RAID 5 and RAID 10 in many scenarios, but this is a beginner’s class.  If you want to teach, get your own course.

Connection methods, from fastest to slowest:

  • Properly configured SANs with true active/active multipathing
  • Properly configured SANs without active/active multipathing
  • Locally attached storage
  • Poorly configured SANs

The bean counters in the class will notice that these variables are also listed in another order: from most expensive to cheapest.  Good storage ain’t cheap.

You, dear student, probably don’t have the responsibility of architecting your entire SQL Server storage solution from scratch.  You’re probably stuck with a few existing options for your servers, like a couple of arrays from your SAN admin or asking for different RAID configurations.  Your simple answer is to use SQLIO to benchmark your storage performance and find out how fast each of your options are.

If you have a fixed number of drives, but you have the ability to carve them up into different groups, work with your SAN administrator to try different combinations.  For example, if you had 20 hard drives, you could measure the read/write performance of each of the following:

  • 14 drives in a RAID 10 plus 6 drives in a RAID 10
  • 14 drives in a RAID 5 plus 6 drives in a RAID 10
  • 2 sets of 7 drives each in a RAID 5, plus 6 drives in a RAID 10

Why would you try these different combinations?  Because your next decision – and this is where “it depends” becomes an art form – is where to place your SQL Server files.

Where SQL Server Meets Storage

Assuming you have multiple storage options with different performance profiles available to a given SQL Server, the simplest guideline is:

  • Transaction logs go on the fastest writing storage (typically your fastest drives in a RAID 10 array)
  • Data files go on the fastest reading storage

The problem is that both of those lines say “fastest.” When DBAs approach SAN admins in a dark alley with a storage requirement, we always say the same thing: “Give us the fastest storage you have.  All of it.”  Since SAN admins aren’t prone to emptying their wallets for us, we have to learn how to compromise.  We need to identify as many of our stored things as possible that don’t need speed, and then peel those out into separate requirements.  In the case of a multi-terabyte data warehouse, those requirements might look like this:

  • Transaction log files – extremely fast writes, needs to be on a RAID 10 array.
  • Primary data files – tables with frequently changing dimensions, last 12 months of sales data.  Needs to be on a RAID 10 array.
  • Archive data files – historical sales data >1 year old that makes up 75% of our data volume, but is rarely changed.  Can go on a slower RAID 5 array.

Then inside each of those needs, the DBA must decide whether to use a single file on a single array, or multiple files on multiple arrays.  That decision is determined by the number of cores involved, the types of queries, and even the multipathing configuration on the database server.  For example, if a server is connected to its SAN with several 1 gig iSCSI links that don’t have true active/active multipathing, the DBA and SAN admin may determine that they need multiple data files on multiple volumes just to get more throughput – regardless of CPU count or query type.

This is a lot of work to do on a server-by-server, database-by-database basis.  That’s why in the vast majority of cases, it’s best to start simple: just separate the data and log files onto the fastest drives possible, and then over time, monitor your storage performance.  I recommend using Perfmon to periodically check your Sec/Reads and Sec/Write numbers as discussed in my Perfmon article.  When your queries are experiencing IO bottlenecks and you trace the problem back to slow Sec/Reads and Sec/Writes metrics, then it’s time to tune storage.  Beef up that particular storage array, separate out what’s being stored on that array onto different arrays, or figure out how to make that array go faster (like with multipathing).

Hopefully I’ve started to open your eyes about the ways to tweak your storage options and what you put on that storage.  The options are staggering, and the more you play with it, the more you can eke out every last bit of performance.

Our final post on Friday will cover some scary storage scenarios.  (Yes, it’ll go live on Christmas Day, but only because I’m scheduling it in advance.)

Continue to Part 3: Scary Storage Scenarios

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

Website - Twitter - Facebook - More Posts

SQL University: The Basics of Storage

A while back, when I was working as a DBA (before I became a consultant), my Storage Area Networking (SAN) admin decided to leave the company.  I volunteered to take on his job duties because I wanted the money wanted to further our educations.  We learned a lot through some very hard lessons, and I consider myself very lucky to have had that opportunity.  It’s really hard to get your foot in the door as a SAN admin, managing millions of dollars in hardware holding priceless data, and it’s just as hard to get quality SAN training.

This week, I’m going to try to bring you up to speed on how SQL Server and storage interact with each other.  This isn’t meant to be a doctorate-level course in SANs, but rather, a simple series that will teach you what you need to know in order to do your job better.   If you like what you see here, you might want to pick up Professional SQL Server 2008 Internals and Troubleshooting.  The book is targeted at DBAs who need to know more about what’s happening in and around SQL Server.  I wrote the storage chapter.

Click to Look Inside

Click to Look Inside

Today’s session will be about the very basics of storage.  I’m going to simplify things in order to help you see the big picture.  The experts in the crowd will notice that I’m leaving out things like snapshot isolation, multiple data files, and parallelism, but hang in there – we’ll get to some of those issues before the end of the week.  Alright, let’s get started!

What SQL Server Stores

At their simplest, SQL Server databases consist of two things: a data file and a log file.  The data file holds our tables, stored procedures, views, and so forth.  The log file holds a running play-by-play analysis of what SQL Server is trying to do at any given time.

When one user issues a SELECT statement against the database to read data, the SQL Server engine’s response is fairly straightforward.  It reads the data it needs from the data file.

When users INSERT, UPDATE, or DELETE data, however, things get a little more complicated.  SQL Server needs to make sure that two people can both access the data simultaneously and safely.  It also needs to guarantee that if the power plug gets pulled in the middle of an UPDATE statement, our data will be safe when the server restarts.  The engine has to be able to roll back (or roll forward) transactions that had not yet completed.

Want to learn more about this?  Drop ACID.  The concept of Atomicity, Consistency, Isolation, and Durability is the foundation for heavy-duty relational databases.

This is where the log file comes in.  In simple terms, when a user inserts a record, the SQL Server engine:

  1. Writes an entry to the log file recapping what it’s about to do
  2. Writes the new record into the data file
  3. Writes an entry into the log file saying it’s done mucking around in the data file

As often happens, let’s assume that a drunken sysadmin held down the power switch on our database server at the exact moment our SQL Server was busy performing step #2.  After he hit it again to power the server back on, the SQL Server engine would start up and review the log.  It would notice the logged entry from step #1, notice that there wasn’t a matching log entry for #3, and then go poke around in the data file to see whether or not the change had been made.  I don’t want to advocate drinking in the datacenter (if you see your sysadmin drinking, make sure to take his rack keys away), but I do have to say that SQL Server is amazingly resilient.  I’ve seen servers lose power or lose connections to their storage in the middle of very heavy activity, and yet they’ve powered back up fine.  (If your server gets into a situation like this, though, there are precautions you’ll want to take, and we’ll address those later in the week.)

A Fill Factor of .10 or Above is Unsafe

A Fill Factor of .10 or Above is Unsafe

The drawback of this safe system is a that it’s constantly writing to the log file.  Looking at our three-step example, two of the three writes involve the log file.  The more often we do DUIs (Deletes/Updates/Inserts), the faster our log file drives need to react. When we’re configuring hardware for SQL Server, we want to consider two things:

  • Put the data and log files on separate drives
  • Put the log files on the fastest-writing drives we can afford

In a perfect world, we’d put everything on the fastest drives we can afford, but in this perfect world I would be eating bacon instead of teaching you bozos, and bacon would prevent heart attacks.  Back here in the real world, though, we have to take Lipitor and cut back on our budgets.  We can often get away with a little less speed on the data drives if we can use the SQL Server’s memory to cache queried data and avoid hitting the drives for every read, but we can’t use that memory to cache log file writes.  Furthermore, the log file isn’t the only thing that wants its own storage.

Son, Your Server is Writing Data Your Memory Can’t Cache

SQL Server does a pretty good job of using memory to avoid hitting storage.  Unfortunately, in addition to log files, two other parts of SQL Server also like to do a whole lot of writing at the exact same time we’re accessing the data file: TempDB and indexes.  When these things need to write data, they need to write it NOW, and we can’t throw memory at the problem to make it faster.

TempDB is a system database that ends up a little like your kitchen junk drawer; it’s got all kinds of random stuff in there that isn’t really related to each other.  Users write queries that create temporary tables in here, SQL Server uses it for big sorting operations, and the engine even uses it for some special transaction modes.  In the case of temp tables, for example, users will write code like:

CREATE TABLE #MyReport (PilotID INT, DogfightsWon INT, DogfightsLost INT, TowersBuzzed INT)
INSERT INTO #MyReport (PilotID, DogfightsWon, DogfightsLost, TowersBuzzed)
  SELECT p.PilotID, SUM(p.DogfightsWon), SUM(p.DogfightsLost), SUM(p.TowersBuzzed)
  FROM dbo.Pilots p
  INNER JOIN dbo.Instructors i ON ....

The important thing to note here is that SQL Server will be doing two things simultaneously:

  • Reading from the data file
  • Writing to the TempDB data file

This is a fairly typical use for TempDB, so when managing a SQL Server with TempDB-intensive queries, it makes sense to isolate TempDB onto its own separate hard drives.

Indexes present a similar problem.  If we’ve got a table with five indexes, then every time we add a record to our table, SQL Server also has to add five other records – one to each of our indexes.  That’s a lot of writing, but rather than adapting our storage design, I recommend that DBAs control their index designs.  Use this SQLServerPedia query to find unused indexes, because dropping them will instantly result in faster DUIs – and who doesn’t want that?  Dropping indexes is cheaper and easier than micromanaging multiple arrays to handle data and indexes, because indexes aren’t as easy to isolate as the log files or TempDB files.

So we’ve identified several things that might need separate storage due to their constant simultaneous access:

  • Data files
  • Log files
  • TempDB
  • Indexes (plus a hidden one, full text indexes)

In our next class this week, we’ll look at our storage options – the places we can put these files, and they may perform differently.

Continue to Part 2: Where We Store Stuff

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

Website - Twitter - Facebook - More Posts