I got an awesome group of responses for the initial PASS Virtualization Virtual Chapter. I wish I could have taken everybody, but you know what happens when you have a committee with too many people: nothing gets done. (Come to think of it, nothing gets done on committees period…)
Meet the Hypervisors, also known as the founding members:
- Dylan Finney
- Iain Knight (Twitter – Blog)
- Jorge Segarra (Twitter – Blog)
- Ken Simmons (Twitter – Blog)
- Michael Deputy (Twitter – Blog)
- Tim Edwards (Twitter)
Here’s what we’re working on first:
Work with Blythe Morrow to set up our web site. PASS uses DotNetNuke to host their site, and we’ll get our own subdomain with our own look & feel. We get to pick what we want on the site and how it works. Blythe’s got ideas on how to help, and she’s got experience from working with the other virtual chapters. We need to work with her to figure out what we can put on there.
Build a schedule for meetings. We need to pick dates for monthly chapter-wide meetings. These are not status meetings with just us, but chapter-wide meetings where someone talks about a virtualization/SQL topic and lots of people fall asleep listening. We’ll need to recruit speakers. I’ll speak at the first one to talk about what we’re doing, and that’ll buy us some time.
Figure out how to record & web-enable LiveMeetings. The PASS virtual chapters use LiveMeeting to hold their meetings, and I want to start recording ours and putting the archives up on the web site. I want them to be playable through the web browser like I do with my SQLServerPedia tutorials and I want users to be able to comment on the videos & ask questions. I also want sharks with friggin’ laser beams on their heads.
Build a policy for working with virtualization vendors. The virtualization market is rough-and-tumble right now. There’s a lot of nasty subliminal marketing going on with vendors slinging mud at each other. I want to make sure we don’t get dirty, but at the same time, I want to us to be the gatekeepers for the SQL community. We can help each vendor get training material out to DBAs on the right way to configure virtualization for performance. Before we let them get to first base, though, we need a list of what we’re willing to do and what we’re not gonna do.
I’ve really appreciated the work Andy Warren‘s put into keeping PASS transparent by blogging status updates, and I’ll do my best to keep up that same level of transparency with us too.
Holy mackerel. Turns out you folks really liked my “Log Shipping to the Cloud with Amazon EC2 and S3″ session at the SSWUG Virtual Conference this season. Based on attendance and survey feedback, I won the Best of Show award in the DBA track.
Lightning even struck twice: my Reaching Compliance with SQL Server 2008 presentation was a runner-up, along with Kevin Kline’s End-to-End Troubleshooting for SQL Server 2005 session.
I want to thank everybody who took the time to submit feedback surveys. The presenters really appreciate your feedback. (Especially when it leads to awards, ha ha ho ho.) I also want to point out the SQL Server 2008 Compliance Portal, which is a fantastic resource on compliance training by JC Cannon’s crew at Microsoft. I can’t recommend that highly enough – it’s some of the best free training I’ve seen online.
Backup speed isn’t the sexiest thing DBAs spend time on during the day. It’s kind of boring.
Backups don’t complain that they’re not running fast enough. Users, those are the ones who complain: “My query’s not fast enough. Why can’t I do a cross join between data warehouse tables?” The squeaky wheel gets the lube, so we bend the users over the – wait, where was I going with that?
Backup speed also worsens very gradually over time, taking longer and longer to complete each night. DBAs assume the time increase is caused by more database data, but you know what they say about assume: there’s no lube involved. What if the time has nothing to do with the database size?
In the next couple of posts, I’ll explore some backup bottlenecks I found when doing a Backup Health Check recently for a company with a couple dozen SQL Servers. Today, put your lifejackets on, because we’re in for…
The Perfect MSDB Storm
Any one or two of these things wouldn’t have been a problem, but combine them all together and we had a disaster on our hands:
- SQL Server 2000 – which by itself isn’t a big deal.
- MSDB was on the C drive – which in this case, happened to be a very slow pair of ancient drives. The rest of the server was on a pretty quick SAN with RAID 10 arrays for data and logs, but MSDB was still stuck.
- The server had over 100 databases – again, no big deal.
- Transaction log backups ran every 15 minutes – of all 100 databases. That’s a lot of backup history data pouring into MSDB.
- MSDB cleanup jobs weren’t set up initially – which meant that weeks of backup history data started turning into months, and then into years.
- MSDB had grown to 3gb – on SQL Server 2000, this is a huge problem because it’s not indexed well.
Combine these factors, and we had The Perfect MSDB Storm. I couldn’t add MSDB cleanup jobs at this point because they couldn’t finish in any reasonable time. Search the web for problems like this, and you’ll find people in forums beating their chests and wailing for lifejackets. I tried a couple of stored procs and an ordered-delete trick, but doing these deletes meant holding a lock on the MSDB backup tables. The business wasn’t comfortable not being able to back up while I ran these deletes.
Kids – don’t try this at home. We’re what you call experts. We’ve got years of experience that keeps us safe.
- I scripted out the foreign key constraints on the relevant MSDB tables (backupset, backupfile, backupmediaset, etc)
- I disabled the backup jobs and waited for them to stop
- I dropped the constraints
- I truncated the tables
- I created the foreign key constraints again
- I enabled the backup jobs
The good news: all the backup history was instantly gone without waiting for slow logged deletions.
The bad news: all the backup history was instantly gone. This wasn’t a problem for this client because they used Quest LiteSpeed, and it has its own backup history repository anyway. (Shameless product plug: when your server goes down, what good does a local MSDB backup history do you anyway? LiteSpeed’s repository is centralized on another server, so you can take action on restoring backups faster.) This is not a great solution, and I wouldn’t post the code here for that reason, but it did work.
The bottom line: backup times dropped by 2/3! The nightly backups had been taking two hours, but a whopping 90 minutes of that was spent just updating MSDB tables with the backup history. They now take under 40 minutes. This certainly isn’t a typical result, but take a minute to make really sure that you’re cleaning up your MSDB history regularly.
There are 3 kinds of candidates for any IT job:
- The Junior Candidate – someone who’s not quite qualified yet because they haven’t actually done the work you want yet.
- The Perfectly Qualified Candidate – the person who’s done the tasks you’re hiring them for.
- The Senior Candidate – someone who’s done everything in this job position, plus the next level up.
Managers discard The Junior Candidate because they don’t want to take a gamble that whoever they’re about to hire might not be able to cut it. They want an employee who’s going to take care of business, period. They don’t want to walk in behind the employee and find ‘em surfing Books Online or posting a message on a help forum. (I’m not saying it’s right – I’m just giving you a tour of a manager’s head.)
Managers discard The Senior Candidate because they’re worried that the Senior Candidate is only taking the job temporarily. The Senior Candidate’s employer might be in bad financial shape, and the Senior Candidate will keep looking for a better-paying job to bail out.
Managers want The Perfectly Qualified Candidate: someone who’s making a lateral move between companies.
But why would someone do that? Why would someone change companies and keep doing exactly the same work for the same range of pay, or for within 5-10% of their same pay?
- Because the company’s in bad shape – maybe they’re working for a time bomb of a company that’s about to explode (or implode). They know they’re living on borrowed time, and they’re desperate to get out before they get laid off. Ironically, candidates are trained not to say anything negative about their company, so if this is the case, the manager might not know it.
- Because the team’s in bad shape – they’re working for a jerk, or they’re working inside a group of jerks, and they’ve had about enough. Again, candidates are told by well-meaning interview books to give non-negative answers about their current team, so this reason won’t show up in the interview.
- Because they’re personally in bad shape – they think they’re about to get fired for their own behavior, or perhaps they think the rest of the team is in bad shape but the problem’s in the mirror. Extremely sharp interviewers might be able to find the bad apples.
I’ve seen so many cases over and over when companies have hired The Perfectly Qualified Candidate only to be shocked that they don’t work out so well. Stop and think about it: people don’t make lateral moves unless something’s in bad shape. Before you hire them, you’d better make sure it’s not the candidate themselves.
The Brent Ozar World Domination Tour continues with a bunch more free training events. I’ve updated my Upcoming Events page, and here’s the stuff in the next 30 days:
May 28 – Quest Pain of the Week Webcast
Getting Started with SQL Server Management Studio
Join SQL Server experts Brent Ozar and Michelle Ufford to learn basic but essential skills for managing your SQL Server.
They will show you how to get around SQL Server Management studio, how to use Activity Monitor and how to get a quick feel for what’s happening on your SQL Server.
Don’t miss this overview of tools that can help you as you take on more DBA responsibilities in lean times.
June 3 – How to Be a Backup & Recovery Hero in Natural Disasters
Hurricane season is here! Will your SQL Server data be safe in the event of a hurricane or other natural disaster? Quest can help you be a hero in SQL Server data recovery for your organization with valuable insight based on experience gained in the aftermath of disasters.
I’m doing two different seminars:
DBAs at small to midsize businesses – you probably can’t afford to maintain standby servers in a remote datacenter. However, you can still put cost-effective measures in place to protect your organization’s essential databases in the event of a disaster. In this webcast, you’ll learn how to build a bug-out bag, why virtualization can be a lifesaver, and why USB drives don’t always make sense for emergency offsite backups. Register for the small to midsize business webcast.
DBAs at large enterprises – to ensure your organization’s data is safe when disaster strikes, it’s not enough to throw servers in a remote datacenter. Being ready means testing, documentation and planning for the return to the production datacenter. I’ll share practices I learned from managing databases for a $7 billion company in South Florida during three hurricane seasons, including a two-week power outage after Hurricane Wilma. You will also learn the importance of a buddy system with a fellow employee in another state, and why hardware inventories are much more important than you think. Register for the enterprise webcast.
June 11 – Quest Pain of the Week Webcast
Time-Saving SQL Server Management Studio Tips & Tricks
Advance your skills in SQL Server Management Studio (SSMS) with SQL Server experts Brent Ozar and Michelle Ufford. They’ll show you ways do administration tasks faster than ever and which free add-ons actually add value, like Mladen Prajdic‘s SSMS Tools Pack and the Performance Dashboard Reports.
Plus, they’ll explain how Toad for SQL Server complements SSMS by allowing you to compare and synchronize schema and table data, execute scripts against multiple instances, roll back transactions and more.
Come meet me in person! We’re doing a day of presentations on how to get more performance out of the hardware you’ve already got, and how to make your applications run faster. My presentations will be:
Perfmon and Profiler Basics: Find The Performance You’re Leaving on the Table
These two tools are the key to successful performance tuning. Brent will show how to get started with these tools, how to slice and dice the results, and how to interpret the data to find easy ways to get more performance fast out of your existing hardware. Attendees will learn:
- How to use Perfmon to find bottlenecks
- How to use Profiler to track down the cause of problem
- How to feed these results into Microsoft tools to get performance recommendations
SQL Server Disaster Recovery 101
Developers and accidental DBAs: if you know more about how SQL Server handles crashes and disasters, you’ll be able to make a better decision about how to prepare. In this session, Brent will cover all of SQL Server’s backup and high availability options at a high level, including clustering, log shipping, mirroring, replication and more. He’ll show the pros and cons of each, and teach you how to pick the right method for your application.
Attendees will learn:
- The difference between high availability and disaster recovery
- Real-world drawbacks of each solution
- Which methods complement each other for better protection
June 25 – Quest Pain of the Week Webcast
Conquering the Giant – How to Manage a Very Large Database
As your database continues to grow to a terabyte or more, you might feel like David facing Goliath. Your challenge seems insurmountable as it gets harder and harder to allocate space and speed up queries. Attend this webcast to learn how to quickly reclaim space, monitor and report on capacity growth trends, and improve the performance of your database. Conquer your giant database!
SQL Server Links
New PCI Express solid state drive benchmarked – HotHardware got their hot hands on a prototype of the upcoming OCZ Z-Drive and got “reads in excess of 500MB/sec and writes in excess of 400MB/sec”, which translates to a nice little 4-8 disk RAID array of conventional magnetic hard drives. I really like the sounds of these things to shed TempDB load. No word yet on price or release dates.
Failover Clustering Supported on Virtual Machines – frankly, if you weren’t doing virtualization before, this probably wasn’t what was holding you back, but it’s out anyway.
SQL Server startup problems due to affinity masking – Bob Horkay runs into problems with an active/active cluster on HP Polyserve and explains the solution. Active/active clusters save money, but you need a good DBA. (Like Bob.) This stuff isn’t set-it-and-forget-it.
SQLBatman is now Thomas LaRock: SQLRockstar – Tom rebranded his blog, and Phil Factor and I carried on a conversation about branding in the comments.
SQL Backup Cleanups – before you use scripts to automatically delete old bak files, Jason Massie explains why you should check the archive bit. I’ve been bitten by this one too when my backup guys swore the tape backups were working – except they weren’t.
Drop and Recreate All Indexes – copying a lot of data between servers is made easier with Jeremiah Peschka’s script.
Are You Down with BPA? – the Best Practices Analyzer is like getting a Microsoft Risk Assessment Program (RAP) visit for free. Speaking of RAPs…
SQL Server 2008 Developer Training Kit – Pinal Dave points out free training materials for developers who don’t mind digging through PowerPoint and demos.
Cloud and Virtualization Links
Ten Slides from the Microsoft SDS TechEd Session – violates all kinds of rules about good presentations, but hey, it makes a good blog post.
Send Amazon your USB disks – wanna load a ton of data onto Amazon S3 for backup, like a large volume of images (no, not pr0n) or database backups, but you don’t have a fast upload connection? Now they take eSATA and USB drives. One-time $80 fee and $2.49 per hour, but that’s pretty fair. I worked for a company that needed offsite backup of a ton of billing images, and they had years of archives online. This would have been the easiest way to get the archived data to S3, and then it’d be easy to stay current using conventional uploads.
Amazon offers load balancing and monitoring for EC2 – Amazon EC2 is a cloud-based virtualization host where you can spin up new instances of SQL Server, Windows, MySQL, etc. Now they’ve kicked it up a notch with CloudWatch, a service to monitor your instances and scale up by turning on new instances when your app’s getting overwhelmed with load. Of course, your app still needs to handle some of the scaling – you can’t just turn on a bunch of SQL Servers and hope for the best – but they do include a load balancer service. Interesting, but not for SQL Server users yet.
Steve Jones’ MVP prize arrived – a USB plasma ball? Seriously? Wow, if this is the MVP award, I’d hate to see the LVP award. (I’m kidding, of course – he got the USB plasma ball from the Microsoft company store. The real MVP award plasma ball is at least 8″ across.)
Wolfram Alpha is Nothing Like Google – I agree with Aaron Alton’s explanation of why Google doesn’t have anything to fear from Wolfram Alpha. I am so totally unimpressed by Wolfram Alpha: it’s a visually beautiful answer to a question nobody’s asking. Congratulations – I have never searched for the temperature on the day I was born, nor am I inclined to now that it’s even easier.
Inside a Datacenter – getting a peek inside a datacenter usually requires some nasty security agreements and a ban on photography. Watching videos like these is like geek pr0n.
On Fridays on Twitter, people make a short list of a few people they recommend that other folks follow, and they tag their messages with #FollowFriday. I think that’s interesting, but I don’t participate often because I don’t wanna just name a few names in 140 characters. There are, however, a few people that I want to take a moment to recognize amongst the community for doing really good work.
I met Jeremiah at the PASS Summit in Seattle last year, and we’ve been good friends ever since. He’s a developer (NO!) and a consultant (NOOO!) but still manages to be a completely likable guy. He’s constantly trying to learn and grow his skills, and we keep threatening to cross-train each other on our respective skill sets. (Not respectable, just respective.)
He’s launched a PASS chapter in Columbus, Ohio, and he’s stirring things up at PASS. He combined forces with Blythe Morrow and Tom LaRock to convince me that I really could start a Virtualization Virtual Chapter. That says a lot, because I abhor politics.
Jeremiah’s in the running for MVP, and if I had anything to do with the MVP process, I’d nominate him too. I think PASS volunteering is the second most important thing someone can do for the community, and convincing OTHERS to help the community is el numero uno.
I met Jimmy, a member of the world-famous A.C.E. performance & security team at Microsoft, for the first time at the PASS Summit last year also. (Notice that? Two good people at PASS in a row. You meet a lot of good people at the PASS Summit.) I was absolutely blown away by his presentation. He explained partition alignment in a clear, straightforward, and hilarious way. He’s since followed up with a whitepaper on partition alignment, and it should be mandatory reading for every SAN administrator.
Just this past week, I got a frantic email from a client who’d just rolled out a new half-million-dollar SAN. Performance wasn’t what they’d expected, and was even slower than the last SAN I’d helped them configure. After troubleshooting, the SAN vendor gave them just one recommendation: align your partitions. The client wanted to know if the SAN vendor was pulling their leg, and no, they’re not. It really is that important.
He’s submitting the same session again for PASS this year, and I sincerely hope he gets approved quick fast and in a hurry. (He works for Microsoft, so they have a separate round of submissions aside from us mere mortals.) If he does, you need to go to his session. Partition alignment is an issue that isn’t going away until we’re completely done with Windows 2003, and judging by the number of SQL Server 2000 instances I see sticking around, it’s going to be at least another five or ten years of manual partition alignment.
You might recognize Michelle’s name from any number of places:
- Founded the East Iowa PASS Chapter
- Her webcast on index fragmentation
- The SQLServerPedia article on index fragmentation
- The rap contest she ran with Ward Pond and me
She’s got an inquiring mind and she helps teach others the things she learns. When she writes a tutorial, I pay attention, because it’s usually something I haven’t seen before. When she used DBCC PAGE to demonstrate the effects of fragmentation, I was so impressed.
Like Jeremiah, I’ve heard that she’s also been nominated as an MVP, and I would cast a nomination for her as well if I had any power whatsoever. I don’t, so presto, you’re getting a sermon from me via my blog, because you might have some power over the process. If I was going to recognize valued members of the community, Michelle would be another one since she devotes so much time and effort into helping others in the community.
I won’t be doing this every #FollowFriday, but every now and then I might drop a few names here. These people deserve a round of applause for the work they’re doing. Here here!
Joe Webb, Jorge Segarra, and Stuart Ainsworth have joined our merry band of bloggers over at SQLServerPedia! Let’s meet the new folks.
Joe’s a SQL Server MVP, former PASS Board member, and a funny guy. I can prove the last point with one of his #SQLputdowns tweets, which was one of my all-time favorites and will be featured on an upcoming Quest t-shirt. His humor carries forth in his blog, too, with posts like:
- I’ll Have a #4 Combo with a Side of Wifi – in which Joe gains two pounds and blames it on the database.
- My Virtualization Setup – if you thought my Macbook Pro was odd for a SQL guy, read about Joe’s Ubuntu setup.
- Does order matter in a JOIN clause? – classic question, and Joe has the answer along with examples.
Speaking of funny, Jorge (aka SQLChicken) was one of the guys behind the #SQLeditions craziness during the SSWUG Virtual Conference. Some of his recent posts have included:
- Got Corruption? – think Twitter’s a waste of time? Imagine having a database corruption problem and getting tips from Paul Randal. If that doesn’t sell you on Twitter, I don’t know what will.
- Tampa SSUG Meeting Notes – every month, SQL Server groups all over the country meet and the proceedings are forever lost. It’d be great if after every meeting, someone posted an in-depth listing of what happened. This encourages other area people to attend because they see fun stuff happening in their own community.
- SQL Rap by DJ Majik Poultry – Jorge’s entry for Michelle Ufford’s SQL RAP contest, and he earned high points from me.
Stuart’s domain name CodeGumbo.com gives away his native home, Louisiana, and I would probably have syndicated his blog just for the mention of my favorite soup. The very thought of my favorite restaurant in the world, Ragin Cajun in Houston, makes my mouth water. If you haven’t had bayou gumbo and a muffuletta sandwich, you haven’t lived – or at least you haven’t lived large. But I digress.
Stuart is an architect that specializes in SQL Server and data integration. Some of his recent blog posts include:
- SCRUM, Source Control, and the SQL Server Developer – DBAs can work better with developers by keeping in mind how the developers work with management. If your developers are using SCRUM, read Stuart’s post to understand what the fuss is all about.
- So You Wanna Host a SQL Saturday? – organizing these free all-day events is a monstrous job.
- Visual Studio TS for Database Professionals – Stuart talks about some challenges he’s had with the tool. It’s always tough finding honest yet in-depth reviews of how software development tools work. Seems like reviewers scan over the highlights and dash off a quick article, whereas guys like Stuart find more info because they use the tools for a living.
I’m excited by the number of good things that just keep happening at SQLServerPedia. Our new webernator, Brett Epps, is in the midst of posting a blog entry about the updates he made to the site this week, and tomorrow I’ve got more fun announcements to make. Good times!
The Professional Association for SQL Server has a few Virtual Chapters. Formerly known as Special Interest Groups (SIGs), these are groups where people with similar interests can get together and talk shop. It’s like that Furry group you belong to, only different. Or maybe not so different.
I’m starting up a Virtualization Virtual Chapter: a group dedicated to helping database administrators manage SQL Servers that live in virtual environments like Microsoft Hyper-V and VMware vSphere.
The group will be responsible for:
- Helping the community by producing things like monthly webcasts, a centralized list of helpful virtualization-savvy administrators, and an annual report of what’s working well under virtualization and what isn’t quite there yet.
- Helping PASS management by increasing membership, building assets that PASS can leverage (like a library of training content), and showing that PASS solves real DBA problems.
- Helping virtualization vendors by getting training and configuration information out to the right people who need it and helping them build a better product with our feedback. I almost see the group as Consumer Reports for virtualization – there’s so much crappy hype going back and forth, and the DBA community needs an unbiased group to give real answers.
I’m not just looking for virtualization cheerleaders or seasoned pros, either. In order for this to work, I need a balanced group of people from all over the virtualization spectrum, like:
- DBAs from enterprise shops with hundreds of SQL Servers that have deployed virtualization successfully
- DBAs from small to midsize shops that have only dabbled in virtualization
- At least one doubter: somebody who isn’t convinced that SQL Server will ever work in virtual environments
My community will be made up of people all over the technological map, but they’ll have a few things in common:
- Experience with SQL Server (even if they’re just junior DBAs)
- A keen interest in virtualization (whether it’s seeing it succeed or seeing it fail, I just want some passion)
- An online presence (Twitter, blog, etc)
- A willingness to spend 4 hours a week putting up with me
If this sounds like you, take the Virtualization Virtual Chapter Steering Committee Screening Survey, also known as the VVCSCSSV1.
Update 5/27 – thanks for your interest, and I’ve closed the survey. We’ve got our founding committee members.
In Part 1 of my multipathing series, I talked about what paths are, and today I’m going to be talking about multipathing. SAN multipathing software has two goals, in this order:
Using SAN Multipathing for Failover Protection
Your server absolutely, positively has to be able to access its drives at all times. When servers can’t access their hard drives, horrendous things happen. When hard drives were directly attached to servers, this wasn’t a big risk, but storage area networks bring in a lot of risky factors. Cables get unplugged or get bent beyond repair. Switches fail. Network configurations don’t go according to plan.
(Side note: I think this was one of the biggest reasons SAN administrators didn’t want to go to iSCSI. They saw how our network cables looked, and they didn’t want their precious fiberoptic cables getting that same treatment.)
Multipathing software mitigates this risk by enabling the SAN admin to set up multiple routes between a server and its drives. The multipathing software handles all IO requests, passes them through the best possible path, and takes care of business if one of the paths dies.
In the event of a problem like an unplugged cable, the multipathing software will sense that IO has taken too long, then reset the connections and pass the request over an alternate path. The application (like SQL Server) won’t know anything went wrong, but the IO request will take longer than usual to perform. Sometimes in SQL Server, this shows up as an application-level alert that IO has taken more than 15 seconds to complete.
To make this work, SAN administrators build in redundancy at every possible layer of the SAN infrastructure – multiple HBAs, multiple switch networks, multiple connections from the controllers, and so forth. But most of the time, all this extra connectivity sits around idle. It’s designed to be used for protection, but not necessarily performance: it’s active/passive gear where only one thing is active at a given time. The secondary goal of multipathing is performance, but it’s a far, far second. SAN administrators are so conservative, they make database administrators look like gambling addicts. They’re perfectly comfortable leaving half or more of the infrastructure completely unused.
Do We Really Need More Bandwidth?
Depending on the SAN infrastructure, the theoretical speed limits are around:
- 1GB Fibre Channel or iSCSI – around 125 MBs/second (this is the most commonly deployed iSCSI speed)
- 2GB Fibre Channel – around 250 MBs/second
- 4GB Fibre Channel – around 500 MBs/second (this is the most commonly deployed FC SAN speed)
- 10GB iSCSI – around 1250 MBs/second
These limits were fine ten or fifteen years ago when hard drives weren’t all that fast, but here’s some sample read speeds from today’s desktop-class SATA drives:
- One drive – around 130 MBs/second (from TomsHardware reviews)
- RAID 5 array of five drives – around 300 MBs/second (from my home lab)
Forget 15k drives or solid state drives – even just with today’s SATA drives, 4GB Fibre Channel can get saturated fairly quickly during large sequential read operations, like SQL Server backups or huge table scans on data warehouses. Sadly, I see so many cases where the IT staff bought a SAN with dozens or hundreds of hard drives, hooked it up to a server with just two 4GB fiberoptic connections, and they can’t understand why their storage isn’t much faster than it was with local disks. Even if they get savvy to the basics of multipathing and try connecting more 4GB HBAs, their storage speed doesn’t necessarily increase.
Enter Active/Active Multipathing
Active/active multipathing is the ability to configure a server with multiple paths to the storage and simultaneously use all of them to get more storage bandwidth. This type of multipathing software is usually sold by the SAN vendor, not a third party, because it’s a lot more complicated than it looks at first glance. Talk to your SAN vendor and ask how much their active/active multipathing software costs, and what it’s compatible with. EMC’s PowerPath even works with gear from multiple vendors.
But before you plunk down a lot of hard-earned cash – well, it’s not that hard-earned for storage administrators, but I’m talking to database administrators here – you need to ask one very important question: what exactly does this software mean by active/active? In your feeble mind, you probably believe that you can have one array, accessed by one server, and spread the load evenly over two or more Host Bus Adapters. Not so fast – some vendors define active/active as:
- Only one path can be active per array at a given time. If you have four HBAs, you’ll need four arrays in the SAN, and SQL Server will need to spread the data across all four arrays. This means designing your database filegroups and files specifically for the number of HBAs in use on your server.
- All paths work for sending data, but only one can receive. I’ve seen this in iSCSI active/active multipathing solutions. For SQL Server, this means you can insert/update/delete/bulk-load data at breakneck speeds, but your selects still crawl.
- Active/active works, but failover sticks. Say you have two paths to your data, and one of the paths goes bad for some reason. All traffic fails over to the alternate path. When the bad path comes back up (like the cable is plugged back in, the power comes back on, the port is replaced, etc) traffic doesn’t automatically balance back out. It stays on the single path. The only way to find this out is with expensive SAN-monitoring software or by browsing through SAN configuration screens periodically.
For virtual servers, I’ve got bad more news: the only true active/active SAN multipathing today is in VMware vSphere 4.0 with EMC PowerPath. Stephen Fosketts explains the storage changes in vSphere. If you’re on VMware v3.5 or prior, on Windows Hyper-V, or on vSphere 4.0’s lower licensing tiers, you’re stuck with one HBA of throughput per server per LUN (array). This is one reason why you might not want to virtualize your high-end SQL Servers yet: they don’t get quite the same level of throughput that you can get on physical hardware. Don’t let that scare you off virtualization, though – remember, you’re probably reading this article because you don’t have true active/active multipathing set up on your physical SQL Servers, either.
There’s a lot of catches here, and the SAN salespeople are always going to smile and nod and say, “Oh yeah, ours does that. That’s good, right?” It’s up to you: you have to ask questions and test, test, test. Get a time-limited evaluation copy of their multipathing software and test your SAN performance with SQLIO. It’s the only way to know for sure that you’re getting the most out of your storage investment.
Want to learn more? We’ve got video training. Our VMware, SANs, and Hardware for SQL Server DBAs Training Video is a 5-hour training video series explaining how to buy the right hardware, configure it, and set up SQL Server for the best performance and reliability. Here’s a preview: