Time to take a look back at the blog for the year to see what’s popular, and what I liked the most.
My Most Popular Posts in 2009
People just seem to love top 10 lists and Twitter information:
- Top 10 Interview Questions to Ask Senior DBAs
- Top 10 Reasons I’m Not Following You on Twitter
- Top 10 Reasons I *AM* Following You on Twitter
- Top 10 Questions to Ask During an Interview (for employees)
- Adding More Info to Your Twitter Profile
- SQL Server 2010 Features Leaked!
- Free SQL Server DBA Training Videos
- Top 10 Developer Interview Questions About SQL Server
- Data Mining the StackOverflow Database
- Twitter Client Comparison Video
My Top 10 Favorite Posts from 2009
In no particular order, and I’m amused that they don’t overlap at all with the top 10 by pageviews:
- Stunt Car Drivers, Eggs Benedict, and You
- Are You Being Treated Fairly?
- How to Get a Junior DBA Job – 3-part series
- Meet the #SQLPASS Hypervisors – because I was just bowled over to get so many great volunteers.
- The Ozar Family Tradition of Performance Monitoring
- Meet PASS Board Candidate Matt Morollo – not because it was a particularly good post, but because it triggered such active discussion.
- What Community Means to Me
- SQL Server & Virtualization – 3-part series
- How to Deliver a Killer Tech Presentation
- We Finished the Chicago-Mac Race (Last)
The moral of the story – the things that you love to write probably aren’t the things that bring visitors in the door. I feel a little guilty when I look at the most popular posts because as I was writing almost all of those, I knew full well what I was doing. I was making a deal with the devil, writing content designed to appeal to the masses. I don’t feel too guilty, though, because I still write most of my stuff for me, not for the masses.
My Dedication for the Book
This year I crossed off a big goal by getting published. In case nobody actually BUYS the book, I’m copying in the dedication here because there are a lot of people I’d like to thank.
Thanks to Jimmy May for hooking me up with such a great team of authors. My involvement with this book wouldn’t have happened without his encouragement and trust. Similarly, I’d like to thank the authors for giving me an opportunity to work with them. I’m humbled to have my name next to such great technical minds. Thanks also to Christian Hasker, Rony Lerner, Don Duncan, and Joe Sullivan; I have consistently hit the lottery when it comes to managers, and I couldn’t have picked a better string of guys to mentor me and grow my career. If anybody doesn’t succeed under any of them, it’s their own fault.
To my coworkers Andy Grant, Brett Epps, Heather Eichman, and everybody else at Quest Software, thanks for making my work so much fun. Thanks to everybody on Twitter for laughing with me while I keep punching harder and faster. My day is infinitely more enjoyable thanks to folks like @SQLRockstar, @SQLChicken, @SQLAgentMan, @KBrianKelley, @Wendy_Dance, @GFritchey, @MrDenny, @StatisticsIO, @MikeHillwig, @Peschkaj, @SQLSarg, @SQLCraftsman, and many others.
Finally, I’d like to thank Dad, Mom, and Caryl for my dashing good looks and brilliant wit. I just wish you’d given me some humility so that I could be perfect.
That about sums up the year. I can’t thank everyone enough – you all made this year rock.
My Goals for 2010
I know, I know. As evidenced by my most popular blog posts in 2009, my blog is all over the place. If I were you, dear reader, I’m not so sure that I’d want to go to a blog that covers so many wacko subjects. I’ve put a lot of thought into that, and it’s time to divide and conquer. In the first quarter of 2010, I’m launching a couple of side projects that will give me different platforms to talk about some of the other stuff that interests me. BrentOzar.com will regain its focus on SQL Server and geeky stuff, like my race for the Microsoft Certified Master certification, and I’ll be building two new sites to cover other topics.
Social media and marketing have interested me a lot lately. I’m in an odd job position – I’m technically technical, but I like dabbling in marketing. I’ve been blogging all decade long, I’ve been on Twitter for 3 years now, and I’ve got a lot of other irons in the fire. There are a lot of people doing a really crappy job of marketing their companies with blogs, Twitter, and other social tools, and I think I might be able to convey a few lessons in a fun manner. I’ve been thinking about writing an ebook about social marketing and building a site around that, and 2010 is the year I’ll make it happen.
I also think IT work is about to make a shift. Our grandparents worked their whole lives for the same company, but our parents had to reinvent themselves at least once due to layoffs and industry changes. We’re going to be the generation who has to constantly reinvent ourselves, refreshing our skillsets and updating our goals, and I think our relationship with our employers is going to change. I know more and more IT professionals who are turning to consulting earlier in their careers, and more employers who are turning to consultants faster than they’d hire staff. There’s a growing market for micro-consulting: very short engagements with highly skilled professionals who don’t do the job for you, but walk you through it. I’ve been kicking around ideas with a few friends on how to position ourselves for that shift, and I’ll launch that company this year as well.
I’m heading out on a weeklong cruise with Mom – her first cruise. Thanks so much for spending time with me online and making my year so much fun.
Have a wonderful New Year, and I’ll talk to you again in a week!
I’m proud to announce another three new bloggers at SQLServerPedia, including our first SQL couple!
Jonathan is a fellow Houstonian, and he’s a Senior Technology Consultant with Insource Technology. He’s got over a decade of IT experience in areas ranging from 3D animation to SAN & virtualization administration. He’s the acting DBA for EMS Group. He’s an adrenalin junkie who likes SCUBA, skydiving, and rugby.
Some of his blog posts include:
- Execution Plan Challenge – Jonathan posted a query that was giving him trouble, and included a link to the execution plan. Here’s what I love about blogging: read through the comments, and you’ll see that Pat Wright and Grant Fritchey both took time out of their day to help. Blogging isn’t always about helping others – sometimes it’s about getting help yourself too!
- Using Twitter for Corporate Disaster Recovery – what a great idea! When your corporate email server is down, a private Twitter account posting company updates could be a big help.
- User and Schema Association after Restoring – when you restore a database from one SQL Server instance to another, you can run into some odd problems with logins. Another problem you might run into is missing SQL users or users with the wrong SIDs, and you’ll get more info on that in a minute.
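Since we’re on the subject, here’s what fixing those orphaned users usually looks like – a minimal sketch with hypothetical database, user, and login names:

USE RestoredDB;  -- hypothetical database name
EXEC sp_change_users_login 'Report';  -- lists users whose SIDs don't match any server login
ALTER USER AppUser WITH LOGIN = AppUser;  -- remaps the user to its login (SQL 2005 SP2 and later)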
Tim & Lori Edwards (Blog – His Twitter – Her Twitter)
His and hers Twitter, how cute! That’s right, ladies and gentlemen, we have our first SQL Server couple at SQLServerPedia. Tim and Lori are both DBAs in Tucson, Arizona. I work with Tim in the PASS Virtualization Virtual Chapter, and Lori volunteers with the Auditing & Compliance Virtual Chapter.
I had the privilege of meeting Lori at the PASS Summit this year, and she’s a hoot. I look forward to meeting Tim too when I head out to Tucson in a couple of months. I wanna learn more about the 1984 Nissan 300ZX he’s restoring – I used to own a 280ZX myself, and I still miss that car.
Some of their recent posts include:
- Transferring logins and passwords – if you think there’s nothing left to blog about, read this post. It’s a clear, straightforward walkthrough of a problem we’ve all encountered. I wish this post had been around the first time I tried to set up a disaster recovery server. (There’s a minimal sketch of the general approach after this list.)
- SQL Server Disaster Recovery Options – covers the pros and cons of clustering, database mirroring, log shipping, replication, and DoubleTake.
- The Edwards’ are some people in your neighborhood – Lori sums up the Twitter experience by saying, “I’ve felt more a part of the SQL Server community in the last six months than the previous 5 1/2 years of working as a DBA.”
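If you haven’t read Tim’s transferring-logins post yet, the standard approach (Microsoft’s KB 918992 sp_help_revlogin script) generates CREATE LOGIN statements that carry each login’s password hash and SID over to the new server. A minimal sketch, with a hypothetical login name and SID:

CREATE LOGIN WebAppLogin
    WITH PASSWORD = N'ChangeMe!42',  -- hypothetical; sp_help_revlogin scripts the original password hash instead
    SID = 0x23B94F6C9D1A4E0B8F2C7A5D3E1B0A9C;  -- keeping the source SID prevents orphaned users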
At SQLServerPedia, we’re bringing out more cool features for syndication in 2010. I look forward to working with all of you, and if you’d like to join up, check out how to syndicate your blog with SQLServerPedia. Happy New Year!
My Amazon preorder just arrived. The ball can now officially drop and 2010 can begin. Hooah!
Also of note – the Amazon sales ranking. It’s updated hourly, so it’s been all over the place the last couple of days, but I’m giddy every time I look at it. We’ve been in the top 5-50 books in all the major database categories each time I’ve looked.
Now, if you’ll excuse me, I’m going to go curl up with a glass of wine and read through everybody else’s chapters. I’ve got a lot of learning to do….
When LOST finishes this year, we’re going to need some new entertainment. Here are the shows I’m pitching to the major networks:
Dancing With The SELECT *’s
Every week, celebrity developers are paired with SQL Server MVPs to manage SQL Servers. Tune in to find out if these coders can make their data dance, or if their efforts will fall flat.
Dirty SQL Agent Jobs
Every week, host Mike Row (no, not Rowe) searches the country for those thankless scheduled tasks that keep your servers in good shape. He’ll interview the ones in the trenches – maintenance plans, index defrag scripts, and mirroring alerts.
The Deadliest Try/Catch
It’s not just the Bering Sea – it’s the Bering C#. Ever seen those computers on the Captain’s dashboard that track where the crab pots are laid out? This is where the real work is done – inside the database. Forget the diesel engines and check out the database engines as we join the ship’s mechanic: Adam Machanic.
When crime impacts the government database community, the SQL Server Investigative Service team jumps into action. They might be disarmed forces, but they have keyboard hotkeys programmed to fire off KILL commands.
Law and ORDER BY
The police procedural genre has spawned countless spinoffs, and now it’s time to CREATE another one: the stored procedural show. Turn your volume up, set nocount on, and join us for this brutal thriller that will solve a new execution plan every week.
Ever wonder where the “master” database got its name? That’s right – Robin Masters. Turn the clock back thirty years and join us in Hawaii to witness the birth of Business Intelligence. Is Brad McGehee really the butler, or is he the man running the whole show? You’ll have to watch to find out.
For decades, the cult British classic has tried and failed to make its way to the American shores, but this version is surely destined for success. If this one doesn’t succeed, we’ll reboot it as Doctor SP_Who2.
It’s Your Turn! Pitch Us Your Show
Think you can top these? Give it your best shot in the comments before Monday, January 4th. My personal favorite will win an autographed copy of my new book, Professional SQL Server 2008 Internals and Troubleshooting. Good luck!
Merry Christmas, students! I very much appreciate that so many of you showed up in your bathrobes, because now I don’t feel so guilty teaching this class in mine. My Christmas gift to you is a shiny, new blog post. Try to contain your excitement.
Drives Go Bad When You’re Not Looking
After you’ve written data to storage, you probably assume it just stays there, staying exactly the same, waiting to be read.
If only that were true.
When drives begin their death spiral down the toilet bowl, sometimes it’s a long journey. At first, just a few blocks might go bad – but we won’t know about it if we’re not actively reading those blocks. Right now, in your servers, a hard drive might be in the process of self-destruction, taking your precious data with it.
The best defense is a good offense: actively checking your database content. SQL Server’s DBCC commands (yes, that’s a RAS) will go out and check the contents of your database to make sure everything’s still hunky-dory. The bigger your databases are, the more time it takes to check them, so DBCC has options that will let you just check the most important parts. For example, if a non-clustered index goes bad, we’re not too concerned because we can just drop that index and recreate it. All of the index’s data is stored in the underlying table anyway, right? We should be able to recover in a matter of minutes.
If only that were true.
Where there’s smoke, there’s fire. If a component in your storage subsystem has started going bad, your first signs might be damage to a non-clustered index. Your proactive DBCC checks will catch that, and you’ll repair it, but you’re only fixing the symptom – not the disease. Storage corruption rarely just happens once and disappears. When your DBCC checks return problems, you do need to fix the symptom, but you also need to jump in and do root cause analysis. If you’re using a SAN with storage shared between several servers, start by checking other servers on that same SAN to see if they’re also having corruption problems.
Want to learn more about DBCC? Paul Randal’s blog posts on DBCC will teach you more than you wanted to know. When he was at Microsoft, he wrote the code behind DBCC.
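For reference, here’s what those proactive checks might look like – a minimal sketch, assuming a database named SalesDB and a big table named dbo.OrderHistory:

DBCC CHECKDB (SalesDB) WITH NO_INFOMSGS;  -- the full logical and physical checks
DBCC CHECKDB (SalesDB) WITH PHYSICAL_ONLY;  -- a faster option that only does page-level checks
USE SalesDB;
DBCC CHECKTABLE ('dbo.OrderHistory') WITH NO_INFOMSGS;  -- spot-check one critical table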
SAN Drives Change When You’re Not Looking
Storage Area Networks are engineered for extreme reliability; if any one component in a properly configured SAN goes down, the servers won’t go down with it. They’ll be able to reroute their traffic automatically between multiple networks, multiple controllers, and multiple drive enclosures. SAN admins can even do some routine maintenance during business hours without anyone noticing, like recabling switches, flashing firmware, adding drives to RAID arrays, or changing cache settings. Well – in theory, no one notices, but the performance changes can impact your servers. Do you know if your SQL Server’s transaction log array is 10% slower than yesterday? 20%? 30%?
The most advanced environments have even started using storage virtualization, which works just like our server virtualization technologies. Server virtualization abstracts the hardware away from the operating system, letting us move servers around. Storage virtualization lets SAN admins move your data from one array to another, like from RAID 10 to RAID 5, from SSD to SATA, or even from an old SAN to a new SAN. The benefit is that they can decommission old hardware without taking huge outages to rebuild servers from scratch. The drawback is that they can do this without telling you, on the fly. You might have fantastic performance on Monday, abysmal performance on Tuesday while they move your data, and then middling performance on Wednesday.
The solution to all of this? Constant performance monitoring. Learn how to use Perfmon to baseline your storage performance, and check your storage metrics regularly. Even if you don’t have the time to use Perfmon, you can start by monitoring your full backup times. If they jump by 20% overnight, there’s a good chance your SAN administrator just pulled a fast one – or rather, a slow one – on you. Talk to your storage folks to find out why performance is suddenly suffering, and use your Perfmon and backup times to prove it.
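If you want a quick-and-dirty trend line without any extra tooling, SQL Server already records backup history in msdb. A minimal sketch:

SELECT database_name,
    backup_start_date,
    DATEDIFF(MINUTE, backup_start_date, backup_finish_date) AS duration_minutes
FROM msdb.dbo.backupset
WHERE type = 'D'  -- full database backups only
ORDER BY database_name, backup_start_date DESC;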
Virtualization Makes It Harder to Look
Virtualization abstracts your hardware away from the operating system. It’s much harder for you to know whether your CPU performance issues are caused by:
- Your virtual server’s load
- The host’s hardware capabilities
- Load from other virtual servers
Or some combination of the above. Storage performance is even tougher to troubleshoot and measure, because the bottlenecks might be:
- Your virtual server’s needs
- The host hardware’s connection to the SAN (iSCSI, FC speeds)
- The host hardware’s multipathing configuration (active/active, round robin, Russian Roulette, etc)
- Other virtual servers sharing the same SAN arrays as you
- Other virtual servers sharing the same connection to the SAN arrays as you (competition for paths)
- The storage controllers and arrays
I don’t have a simple answer to solve your virtualization storage performance problems, but I do have one important recommendation: get good at performance monitoring before you start virtualization projects. If you can’t accurately identify the performance bottlenecks of non-virtual machines, you’re not going to get any better when they’re virtualized.
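One baseline worth capturing before (and after) you virtualize: SQL Server’s own per-file I/O stall numbers. Here’s a minimal sketch using sys.dm_io_virtual_file_stats – the counters accumulate from the last service restart, so compare snapshots over time:

SELECT DB_NAME(vfs.database_id) AS database_name,
    vfs.file_id,
    vfs.io_stall_read_ms / NULLIF(vfs.num_of_reads, 0) AS avg_read_ms,
    vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
ORDER BY avg_read_ms DESC;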
Summing Up the Series: Storin’ Ain’t Easy
SQL Server DBAs think they hold a patent on the answer, “It depends,” but storage administrators were using that phrase before DBAs were even a glimmer in…wait, let’s draw the line here before I start talking about your mothers. I could go on and on for dozens of pages about storage and how it affects SQL Server.
Want to learn more about SQL Server, RAID levels, SAN storage, Windows storage configuration, and more? Check out Professional SQL Server 2008 Internals and Troubleshooting. My 40-page chapter on storage goes into more detail, but frankly, the other chapters are even better! There are in-depth details on memory, processors, tracing, and more, and it’s got a ton of information you won’t find in other SQL Server books – especially the storage, latching, and Extended Events chapters.
It’s not for beginners – it’s targeted at senior DBAs, and people who want to become senior DBAs. If you liked this series, I know you’ll like the book.
Welcome back, class. In Monday’s session, we discussed the basics of SQL Server storage – things like the data file, log file, and tempdb. We talked about why we might need to separate those things from each other in order to get more performance. In today’s lecture, I’ll be covering where we store this stuff and why it matters.
In The Beginning, God Made Magnetic Hard Drives
You kids are living in a magical time; you’re old enough that you’ll remember when data was stored on spinning metal platters, yet young enough that you’ll watch the transition to stashing it in memory chips. (Some of us are even old enough to remember computers that booted off floppy drives, but I digress.)
If you pop the hood on a 3.5″ magnetic hard drive, you’ll see something like the picture shown here. The round silver platter is like a vinyl record, and the metal pointer-looking thing is like the record needle. Wait, what? You don’t know what a phonograph is? Well, I suppose that’s what I get for trying to use 19th century technology as a simile in the 21st century.
The pointer-looking thing (known as the “head”) changes the magnetic state of small areas of the drive. If the area is charged one way, it represents a zero, and if it’s charged the other way, it represents a one. As the platter spins around, the head passes over different areas of the drive where different data is stored. The head can also move to access data closer to the center of the platter or further out toward the edges. Over time, hard drive technology has improved by making the platter spin faster (thereby covering more data in less time) and packing more magnetic on/off areas onto each platter (thereby letting us store more stuff in the same physical space).
There are a few problems with these drives. First off, there’s a lot of moving going on; the platter spins at speeds of up to 15,000 RPM. To put that in perspective, it’s faster than the engine spins in your car. If any dust gets into the drive, or if a part comes loose inside, things will crash around inside the drive and tear it apart. We might not find out about this damage until it’s too late – our data will get damaged as more parts of the platter start to fail, but unless we’re constantly re-reading everything on the drive, we won’t know about it. The drive’s moving parts themselves can also wear out, such as the motor that spins the platter or the motor that moves the head around. In order to fix those parts, we would have to crack open the drive, which would introduce dust and cause a failure.
While these speeds are fast in the mechanical world, they’re downright tortoise-like compared to the rest of our computer. In the milliseconds that it takes to move the head from one point to another on the drive, our computer’s processors sit around waiting for what feels like eons. This is why computers use memory to cache things that are also stored on the drives – it’s dramatically faster for us to read this data out of memory than it is to fetch it from spinning metal platters.
Want to learn more about relative speeds inside computers? Peruse the Build-a-PC Bandwidth Chart, which compares bus speeds of processors, memory, networks, and storage.
Two Heads Are Better Than One
To mitigate the failure and speed problems of these drives, sysadmins combined multiple hard drives together into Redundant Arrays of Inexpensive Drives (RAID). There are several ways to configure groups of drives, and the simplest method is called RAID 1. It uses two hard drives that always contain exactly the same set of data. Whenever the computer writes data to one drive, it also writes that same data to the other drive. The duplicate-write process can either be managed in hardware or at the operating system level. When one of the drives fails, the other drive is unaffected, so our data is still available and our server stays online.
When both drives are working fine, then we can do our reads from either drive. If we need to read a 10 megabyte file from storage, we can read 5 megabytes from one drive and 5 megabytes from the other drive, which could make our reads twice as fast. If two drives still don’t give us the performance we need, there are other RAID levels like RAID 5 and RAID 10 that can handle more drives for more speed. (The level names don’t have anything to do with the number of drives involved.)
The more performance we want, the more hard drives we have to use – but most servers only come with a few hard drive bays. The solution is to move to a Storage Area Network (SAN), a system with hundreds of hard drives, a very smart computer to track what they’re doing, and a dedicated network to connect the servers to their hard drives. SANs are big (typically fridge-sized or bigger), expensive (at least $100k, but often $1m or more), and they can be very fast. Can be, but may not be – more on that in a minute.
Even dozens or hundreds of drives in a RAID configuration aren’t always fast enough, and that’s where the newest drive technology comes in.
Solid State Drives (SSDs)
USB thumb drives and memory cards for cameras represent a new kind of storage that uses transistors instead of magnetic charges to store data. This flash memory uses an electric charge to change the state of a transistor, much like our magnetic hard drives use power to flip magnetic bits, and the on/off state of each bit of flash memory represents 0s and 1s. Just as magnetic hard drives don’t need power to maintain their state, flash drives don’t need electricity to maintain the state of the transistors. This means that, unlike your computer’s RAM, which gets erased whenever you power off, flash memory saves its state over time. Eventually, as this technology got faster and cheaper, manufacturers could build entire hard drives out of these chips.
Early generations of solid state drives performed worse than conventional magnetic hard drives, but subsequent generations have dramatically improved. SSDs can read any stored data immediately without waiting for platters to spin or heads to move, so they can really shine at random access of data. Compare these two charts from Tom’s Hardware covering average read speeds (higher is better):
So we’ve established that they’re faster (in part because they don’t have moving parts), but just because they don’t have moving parts doesn’t mean they don’t wear out. Each bit of storage can only be electronically flipped so many times before it breaks. To account for that, SSDs try to balance out which parts of the drive are written to, and will move your data around in order to wear it out evenly. “Hot” areas that have been written many times may be moved to another part of the drive, thereby leveling out the wear. Drive manufacturers also build in extra blank space on the drive to give it extra space to take over when other areas are worn out, thereby extending the drive’s life.
Want to learn more about wear leveling? Anandtech’s article on choosing the best SSD covers the subject in exhaustive detail and explains that the OS and drivers also matter.
We can protect ourselves from drive failure by using SSDs in RAID configurations the same way we did with magnetic hard drives. In fact, SSDs may be simple drop-in replacements for some servers, since SSDs are available with SATA interfaces just like your desktop’s hard drive. However, when we start talking RAID, we usually talk about connecting to our hard drives with something faster than SATA. As we start to throw more and more drives at the problem, whether they’re magnetic or solid state, we run into a new problem. We have to make sure that the connection between our drives and our computer is fast enough to handle all of the drives at once.
Homework Time: Studying Multipathing & Shared Drives
For those of you who are using a SAN, thinking of using a SAN, or hoping to get a job somewhere with a SAN, there’s a lot to learn. Thankfully for me, I’ve already written a ton of it, and here comes your homework:
- SAN Multipathing Part 1: What Are Paths?
- SAN Multipathing Part 2: What Multipathing Does
- Free SAN 101 Video from the SSWUG Virtual Conference
- SQL Server on SAN: Shared or Dedicated Drives?
- Steel Cage Blogmatch: How to Configure a SAN
- When Should You Put Data and Logs on the Same Drive
Stop whining – just because it’s a holiday week doesn’t mean you can stop learning. After consuming all that information, you should have a basic grasp of what makes storage fast or slow, and how to find out your own storage performance.
Putting It All Together: How Fast Is My Storage?
Your storage speed is a combination of the drive type, the RAID level, and the connection between the servers and the drives. (It depends on even more than that, like whether the drives are shared or dedicated, but let’s focus on the basics for now.)
Drive types, from fastest to slowest:
- Good solid state drives
- 15k magnetic drives (meaning they’re spinning at 15,000 RPM)
- 7200rpm drives
- 5400rpm drives
- Bad solid state drives
There are some variations in here – for example, there are really good 5400rpm drives that outperform bad 7200rpm drives, and so on – but again, let’s focus on the big picture.
RAID levels, from fastest to slowest:
- RAID 10 or RAID 5 (depending on whether you’re reading or writing)
- RAID 1 (a pair of drives in a mirror)
- No RAID at all (a single drive, and yes, people do still do this – especially with solid state drives like Fusion-io)
You in the back – sit back down. Yes, I understand that there are dramatic performance differences between RAID 5 and RAID 10 in many scenarios, but this is a beginner’s class. If you want to teach, get your own course.
Connection methods, from fastest to slowest:
- Properly configured SANs with true active/active multipathing
- Properly configured SANs without active/active multipathing
- Locally attached storage
- Poorly configured SANs
The bean counters in the class will notice that these variables are also listed in another order: from most expensive to cheapest. Good storage ain’t cheap.
You, dear student, probably don’t have the responsibility of architecting your entire SQL Server storage solution from scratch. You’re probably stuck with a few existing options for your servers, like choosing between a couple of arrays from your SAN admin or asking for different RAID configurations. Your simple answer is to use SQLIO to benchmark your storage performance and find out how fast each of your options really is.
If you have a fixed number of drives, but you have the ability to carve them up into different groups, work with your SAN administrator to try different combinations. For example, if you had 20 hard drives, you could measure the read/write performance of each of the following:
- 14 drives in a RAID 10 plus 6 drives in a RAID 10
- 14 drives in a RAID 5 plus 6 drives in a RAID 10
- 2 sets of 7 drives each in a RAID 5, plus 6 drives in a RAID 10
Why would you try these different combinations? Because your next decision – and this is where “it depends” becomes an art form – is where to place your SQL Server files.
Where SQL Server Meets Storage
Assuming you have multiple storage options with different performance profiles available to a given SQL Server, the simplest guideline is:
- Transaction logs go on the fastest writing storage (typically your fastest drives in a RAID 10 array)
- Data files go on the fastest reading storage
The problem is that both of those lines say “fastest.” When DBAs approach SAN admins in a dark alley with a storage requirement, we always say the same thing: “Give us the fastest storage you have. All of it.” Since SAN admins aren’t prone to emptying their wallets for us, we have to learn how to compromise. We need to identify as many of our stored things as possible that don’t need speed, and then peel those out into separate requirements. In the case of a multi-terabyte data warehouse, those requirements might look like this:
- Transaction log files – extremely fast writes, needs to be on a RAID 10 array.
- Primary data files – tables with frequently changing dimensions, last 12 months of sales data. Needs to be on a RAID 10 array.
- Archive data files – historical sales data >1 year old that makes up 75% of our data volume, but is rarely changed. Can go on a slower RAID 5 array.
Then inside each of those needs, the DBA must decide whether to use a single file on a single array, or multiple files on multiple arrays. That decision is determined by the number of cores involved, the types of queries, and even the multipathing configuration on the database server. For example, if a server is connected to its SAN with several 1 gig iSCSI links that don’t have true active/active multipathing, the DBA and SAN admin may determine that they need multiple data files on multiple volumes just to get more throughput – regardless of CPU count or query type.
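The mechanics of that are simple; it’s the decision that’s hard. Adding a data file on another volume is a one-statement change – a sketch with hypothetical names and paths:

ALTER DATABASE SalesDB
    ADD FILE (NAME = SalesDB_Data2,  -- hypothetical logical file name
        FILENAME = 'F:\MSSQL\Data\SalesDB_Data2.ndf',  -- hypothetical path on a second volume
        SIZE = 10GB, FILEGROWTH = 1GB);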
This is a lot of work to do on a server-by-server, database-by-database basis. That’s why in the vast majority of cases, it’s best to start simple: just separate the data and log files onto the fastest drives possible, and then monitor your storage performance over time. I recommend using Perfmon to periodically check your Sec/Read and Sec/Write numbers as discussed in my Perfmon article. When your queries are experiencing IO bottlenecks and you trace the problem back to slow Sec/Read and Sec/Write metrics, then it’s time to tune storage. Beef up that particular storage array, separate out what’s being stored on that array onto different arrays, or figure out how to make that array go faster (like with multipathing).
Hopefully I’ve started to open your eyes about the ways to tweak your storage options and what you put on that storage. The options are staggering, and the more you play with it, the more you can eke out every last bit of performance.
Our final post on Friday will cover some scary storage scenarios. (Yes, it’ll go live on Christmas Day, but only because I’m scheduling it in advance.)
The CS Techcast (Blog – Twitter) video podcast covers information technology news and opinions. I love the chemistry between the hosts – they’re always fun to listen to – but if you just want to jump straight to my interview, it starts 16 minutes in. I talk about how virtualization affects SQL Server, and why virtualization storage isn’t as easy as it looks.
You can also view the episode at CSTechcast.com or download it to your favorite media player:
A while back, when I was working as a DBA (before I became a consultant), my Storage Area Networking (SAN) admin decided to leave the company. I volunteered to take on his job duties because I wanted the money – er, wanted to further our educations. We learned a lot through some very hard lessons, and I consider myself very lucky to have had that opportunity. It’s really hard to get your foot in the door as a SAN admin, managing millions of dollars in hardware holding priceless data, and it’s just as hard to get quality SAN training.
This week, I’m going to try to bring you up to speed on how SQL Server and storage interact with each other. This isn’t meant to be a doctorate-level course in SANs, but rather, a simple series that will teach you what you need to know in order to do your job better. If you like what you see here, you might want to pick up Professional SQL Server 2008 Internals and Troubleshooting. The book is targeted at DBAs who need to know more about what’s happening in and around SQL Server. I wrote the storage chapter.
Today’s session will be about the very basics of storage. I’m going to simplify things in order to help you see the big picture. The experts in the crowd will notice that I’m leaving out things like snapshot isolation, multiple data files, and parallelism, but hang in there – we’ll get to some of those issues before the end of the week. Alright, let’s get started!
What SQL Server Stores
At their simplest, SQL Server databases consist of two things: a data file and a log file. The data file holds our tables, stored procedures, views, and so forth. The log file holds a running play-by-play analysis of what SQL Server is trying to do at any given time.
When one user issues a SELECT statement against the database to read data, the SQL Server engine’s response is fairly straightforward. It reads the data it needs from the data file.
When users INSERT, UPDATE, or DELETE data, however, things get a little more complicated. SQL Server needs to make sure that two people can both access the data simultaneously and safely. It also needs to guarantee that if the power plug gets pulled in the middle of an UPDATE statement, our data will be safe when the server restarts. The engine has to be able to roll back (or roll forward) transactions that had not yet completed.
Want to learn more about this? Drop ACID. The concept of Atomicity, Consistency, Isolation, and Durability is the foundation for heavy-duty relational databases.
This is where the log file comes in. In simple terms, when a user inserts a record, the SQL Server engine:
- Writes an entry to the log file recapping what it’s about to do
- Writes the new record into the data file
- Writes an entry into the log file saying it’s done mucking around in the data file
As often happens, let’s assume that a drunken sysadmin held down the power switch on our database server at the exact moment our SQL Server was busy performing step #2. After he hit it again to power the server back on, the SQL Server engine would start up and review the log. It would notice the logged entry from step #1, notice that there wasn’t a matching log entry for #3, and then go poke around in the data file to see whether or not the change had been made. I don’t want to advocate drinking in the datacenter (if you see your sysadmin drinking, make sure to take his rack keys away), but I do have to say that SQL Server is amazingly resilient. I’ve seen servers lose power or lose connections to their storage in the middle of very heavy activity, and yet they’ve powered back up fine. (If your server gets into a situation like this, though, there are precautions you’ll want to take, and we’ll address those later in the week.)
The drawback of this safe system is that it’s constantly writing to the log file. Looking at our three-step example, two of the three writes involve the log file. The more often we do DUIs (Deletes/Updates/Inserts), the faster our log file drives need to react. When we’re configuring hardware for SQL Server, we want to consider two things:
- Put the data and log files on separate drives
- Put the log files on the fastest-writing drives we can afford
In a perfect world, we’d put everything on the fastest drives we can afford, but in this perfect world I would be eating bacon instead of teaching you bozos, and bacon would prevent heart attacks. Back here in the real world, though, we have to take Lipitor and cut back on our budgets. We can often get away with a little less speed on the data drives if we can use the SQL Server’s memory to cache queried data and avoid hitting the drives for every read, but we can’t use that memory to cache log file writes. Furthermore, the log file isn’t the only thing that wants its own storage.
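If your log file is currently sharing drives with your data file, splitting it off looks something like this – a sketch with hypothetical names and paths, and note that you have to copy the physical .ldf to the new drive while the database is offline:

ALTER DATABASE SalesDB MODIFY FILE
    (NAME = SalesDB_log, FILENAME = 'L:\MSSQL\Logs\SalesDB_log.ldf');
ALTER DATABASE SalesDB SET OFFLINE;
-- copy the .ldf file to L:\MSSQL\Logs\ at the operating system level, then:
ALTER DATABASE SalesDB SET ONLINE;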
Son, Your Server is Writing Data Your Memory Can’t Cache
SQL Server does a pretty good job of using memory to avoid hitting storage. Unfortunately, in addition to log files, two other parts of SQL Server also like to do a whole lot of writing at the exact same time we’re accessing the data file: TempDB and indexes. When these things need to write data, they need to write it NOW, and we can’t throw memory at the problem to make it faster.
TempDB is a system database that ends up a little like your kitchen junk drawer; it’s got all kinds of random stuff in there that isn’t really related to each other. Users write queries that create temporary tables in here, SQL Server uses it for big sorting operations, and the engine even uses it for some special transaction modes. In the case of temp tables, for example, users will write code like:
CREATE TABLE #MyReport (PilotID INT, DogfightsWon INT, DogfightsLost INT, TowersBuzzed INT)

INSERT INTO #MyReport (PilotID, DogfightsWon, DogfightsLost, TowersBuzzed)
SELECT p.PilotID, SUM(p.DogfightsWon), SUM(p.DogfightsLost), SUM(p.TowersBuzzed)
FROM dbo.Pilots p
INNER JOIN dbo.Instructors i ON ....
The important thing to note here is that SQL Server will be doing two things simultaneously:
- Reading from the data file
- Writing to the TempDB data file
This is a fairly typical use for TempDB, so when managing a SQL Server with TempDB-intensive queries, it makes sense to isolate TempDB onto its own separate hard drives.
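Isolating TempDB works the same way – a sketch with a hypothetical drive letter. The change takes effect at the next service restart, and tempdev/templog are the default logical file names:

ALTER DATABASE tempdb MODIFY FILE
    (NAME = tempdev, FILENAME = 'T:\MSSQL\TempDB\tempdb.mdf');
ALTER DATABASE tempdb MODIFY FILE
    (NAME = templog, FILENAME = 'T:\MSSQL\TempDB\templog.ldf');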
Indexes present a similar problem. If we’ve got a table with five indexes, then every time we add a record to our table, SQL Server also has to add five other records – one to each of our indexes. That’s a lot of writing, but rather than adapting our storage design, I recommend that DBAs control their index designs. Use this SQLServerPedia query to find unused indexes, because dropping them will instantly result in faster DUIs – and who doesn’t want that? Dropping indexes is cheaper and easier than micromanaging multiple arrays to handle data and indexes, because indexes aren’t as easy to isolate as the log files or TempDB files.
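Use the linked SQLServerPedia query for the real thing; as a rough idea of the approach, here’s a minimal sketch built on sys.dm_db_index_usage_stats (the stats reset at every service restart, so run it after a representative stretch of activity):

SELECT OBJECT_NAME(i.object_id) AS table_name,
    i.name AS index_name,
    ISNULL(us.user_seeks + us.user_scans + us.user_lookups, 0) AS reads,
    ISNULL(us.user_updates, 0) AS writes
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS us
    ON us.object_id = i.object_id
    AND us.index_id = i.index_id
    AND us.database_id = DB_ID()
WHERE i.type_desc = 'NONCLUSTERED'
    AND OBJECTPROPERTY(i.object_id, 'IsMsShipped') = 0
ORDER BY reads ASC, writes DESC;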
So we’ve identified several things that might need separate storage due to their constant simultaneous access:
- Data files
- Log files
- TempDB
- Indexes (plus a hidden one, full text indexes)
In our next class this week, we’ll look at our storage options – the places we can put these files, and why they may perform differently.
Yes, the guy who wrote that certifications are icing, not cake, finally got himself some icing. I was an MCSE on Windows NT4 in 1999, but I let my certifications lapse over time because I just didn’t see the value in them. I’m not saying I do now – I only got them because they were a prerequisite to apply for the Microsoft Certified Master program – but I was pleasantly surprised by the experience.
I used to despise Microsoft certifications because the questions didn’t match up with what IT professionals actually did for a living. I vaguely remember the NT4 tests asking what the resistance was, in ohms, for a 10base2 coax network terminator. I had resistance alright.
If there’s such a thing as a “typical” production SQL Server DBA, their job duties include things like:
- Backup & recovery
- Security management
- Troubleshooting service problems
- Troubleshooting query problems
- Designing new systems
All of these require different commands (and sometimes even different tools), all with wildly different syntax. When I hire a DBA, I don’t want to know if he’s memorized the exact syntax for DBCC. If he has to run DBCC with that many parameters that often, he’s got a bigger problem, and I want to see him recognize and solve that underlying problem. I don’t give bonus points for being pretty sure about syntax – rather, I give bonus points for double-checking the manual every time you do something risky.
When I took the 70-432 exam and 70-450 exam, both targeted at production DBAs, I was expecting Trivial Pursuit: SQL Server Edition. I was completely and totally wrong. The test questions were meaningful, the answers required thought (not regurgitation), and the questions – for the most part – lined up with what production DBAs have to do for a living.
Readers, if you’ve been managing SQL Servers for 2-3 years, and if you need ammunition for your upcoming annual review, go take the 70-432 exam. It’s about $125, and I bet a lot of you could probably pass it without studying. And that’s a good thing, because it measures things production DBAs actually need to know.
How I Prepared for the Exams
I bought the Kaplan 70-432 Self Test Software and I was completely disappointed. They’re cheap – less than $50 for the test – and you still don’t get what you pay for. I can say unequivocally that the self-test is no braindump; the questions didn’t even approximate what was on the exam. Kaplan’s questions represented everything I used to hate about the old Microsoft tests, because they focused on obscure trivia.
Much more valuable, yet free, are Buck Woody’s blog posts about certification. He gathered lots of links to resources covering the 70-432 subject matter. My advice would be to read those whitepapers & docs, but don’t try to remember every little configuration option. Focus on the big picture of why you would implement something or when to use a particular feature.
After I passed the 70-432, I decided not to study at all for the 70-450 and just go for it. My logic was that if the test was anything like the 70-432 and tested “why” instead of “how,” then I was set. That paid off – I passed with a 982, although I attribute at least a couple of those right answers to lucky guesses. (Especially the CLR answers.) Pay attention, read the questions carefully, and filter out the answers that you know aren’t right. For example, there’s no such thing as muffler bearings in SQL Server 2008. Everybody knows those were eliminated in 2005 when they added the second flux capacitor.
You’ve got questions, we’ve got answers.
No, not Radio Shack – Twitter. Right now, no matter when you’re reading this, there are people on Twitter who can answer quick questions for you. Thanks to an excellent idea by Aaron Nelson (Blog – @SQLVariant), it’s even easier now. In this post, I’ll show how to ask questions and how to answer them.
How to Ask #SQLHelp Questions
Sign up for a Twitter account. You don’t have to follow anyone, but if you want to, I’d suggest following my SQL Server Twitter list instead of individual people. I’ve got a post on how to use Twitter lists, but in a nutshell, they let you keep in touch with a lot of people who focus on a particular topic. The cool part is that their tweets don’t clutter up your main Twitter page, which is important because there are several hundred SQL Server folks on Twitter as of this writing.
When you need help, write a tweet and include #SQLHelp in the tweet, like this:
If your question involves more than 140 characters, you’ve got a few options:
- Post a question to StackOverflow if it’s a programming question, to ServerFault if it’s an infrastructure question, or DBA.StackExchange.com if it’s a SQL question. Tweet the link to your question.
- Upload screenshots to TwitPic. It’s a free service that tweets the images you upload. When you write the description, make sure to include #SQLHelp so that the smart folks see it.
- Upload files to FileDropper.com and tweet the link to the file. Remember that anything you upload is public – don’t upload your databases. It’s a great way to show query execution plans though.
After you click Update to post your question, click on the @YourName link on the right side of your Twitter home page. For me, it says @BrentO, because that’s my Twitter name. This page is your replies page – it shows anyone who’s mentioned your name. Then sit tight – as people reply to you, you’ll see the new tweets on this page.
When you reply back to users, the default Twitter action is to put their @Name at the beginning of the tweet. Edit the tweet first and put a period and a space before their name, like this:
This is because if you just start the tweet with @Mike_Walsh, then the only people who will see it are the folks who follow both you and Mike. If you start the tweet with anything other than an @ sign, then anyone who follows you will see your reply – regardless of whether or not they’re following Mike.
Don’t include the #SQLHelp tag in the reply, either. That just helps keep the #SQLHelp search cleaner.
When you get your final answer, post a thank-you back to #SQLHelp, like this:
That way people know when your question is answered. If your question hasn’t been answered within an hour, you can repeat it again, but please don’t repeat it in less than an hour.
How to Answer #SQLHelp Questions
Set up a search in your Twitter client for #SQLHelp, or use one of these alternate methods:
As you’re interacting with the questioner, remember that they’re probably new to Twitter, and that you’re probably not the only one working with them. I open up two web pages – Search.Twitter.com with a search for the questioner’s username (so I can see who’s replying to them) and the questioner’s Twitter page (so I can see everything they say in response). That way you can keep duplicate interactions to a minimum.