Monthly Archives: March 2009

Adding Reliability to Your Infrastructure

I’ll never fly in a single-engine plane.  Never.  Not gonna happen.  Carve that one in stone.

Call me chicken, call me scaredy-cat, but I’m not going to get into an airplane that will kill me if an engine fails.  The next step up is a twin-engine plane – but I don’t get on all of those either.

I like my engines to be RAID 10.

If a single engine averages a failure once in every 10,000 hours of operation, then a plane with just one of those engines will experience a failure once every 10,000 hours.  What if we equip our plane with two of those engines – how often will we experience a failure?

  • Once in every 20,000 hours of operation, or
  • Once in every 5,000 hours of operation

The correct answer is once in every 5,000 hours of operation.  All other things being equal, two-engine planes are twice as likely to have an engine failure in the same span of time.
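
The arithmetic behind that answer can be sketched in a few lines of Python. This is only a sketch, assuming engines fail independently and at a constant average rate (a simplifying assumption added here, not a claim about real aircraft); the function name is made up for illustration.

```python
def hours_between_engine_failures(hours_per_failure: float, engine_count: int) -> float:
    """Average hours between failures of ANY engine on the plane.

    Assumes each engine fails independently at the same constant average rate,
    so the individual failure rates simply add up.
    """
    if engine_count < 1:
        raise ValueError("a plane needs at least one engine")
    return hours_per_failure / engine_count


print(hours_between_engine_failures(10_000, 1))  # 10000.0
print(hours_between_engine_failures(10_000, 2))  # 5000.0
```

More engines means more failures; whether any one failure is survivable is a separate question.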

The only way a twin-engine plane is more reliable is if just one of the two engines is enough to power the airplane safely. If the airplane requires both engines in order to maneuver and land, then the second engine didn’t add reliability: it just added complexity, expense and maintenance woes.

If one engine fails, the other engine might suddenly be running at full capacity.  In day-to-day operations, we’d only be using around 50% of each engine’s power (because we got twice as much power as we needed in order to cover our disaster recovery plan).  This engine would have to suddenly go from 50% utilized to 100% utilized – and that’s when things really start to get tested.  This means we probably shouldn’t take our time to land if one engine fails: we should get our plane on the ground as fast as possible to minimize the risks of overworking the remaining engine.  It’s working much harder than normal, and it isn’t used to that kind of load.

The only way a twin-engine plane is more reliable is if the one remaining engine can last long enough to get us to the ground. If it can’t handle the stress of running at 100% capacity, we’re not much better off than we were in the first place.  Therefore, it probably makes sense to build in even more capacity; either using more powerful engines so that they each only need 80% of their power to handle our plane, or using three engines instead of two.
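
To put rough numbers on that, here is a small Python sketch of how hard each surviving engine has to work after a failure. The power units and the function name are hypothetical, and it assumes the load splits evenly across whatever engines are still running.

```python
def utilization_after_failures(required_power: float, engine_power: float,
                               engines: int, failed: int = 1) -> float:
    """Fraction of its maximum power each surviving engine must deliver.

    Assumes the required power is shared evenly by the surviving engines.
    A result above 1.0 means the survivors cannot carry the load.
    """
    survivors = engines - failed
    if survivors < 1:
        return float("inf")  # nothing left to carry the load
    return required_power / (survivors * engine_power)


# Two engines, each just big enough to fly the plane alone: survivor runs flat out.
print(utilization_after_failures(100, 100, engines=2))  # 1.0
# Two bigger engines: the survivor only needs 80% of its power.
print(utilization_after_failures(100, 125, engines=2))  # 0.8
# Three of the original engines: each survivor runs at 50%.
print(utilization_after_failures(100, 100, engines=3))  # 0.5
```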

But we can’t just go bolting on engines like crazy: engines cost money, add complexity, and add weight, which makes the plane harder to get off the ground.

Now Replace “Engines” with “Servers”

Some disaster recovery plans call for two database servers: a primary server used for production, and then a secondary disaster recovery server at another site.  That secondary server is constantly refreshed with data from production – might be with log shipping, replication, database mirroring, etc.  So far, so good: we’ve improved the reliability of our production site, even though we’ve added complexity.

Later, management looks at that server sitting idle and says, “We can’t leave those resources lying around. Let’s use those for reporting purposes.  We’ll have reports run against the DR server, and that’ll make our production server much faster.”  Query loads grow over time, and before you know it, both of those servers are now production.  If even just the disaster recovery system goes down, we suddenly have a problem.

The only way a two-server disaster recovery plan is more reliable is if just one of the two servers is enough to power your application safely. Otherwise, you don’t have a disaster recovery plan: you have a pending disaster.  You have the insinuation of protection without enough actual protection.  Sure, your data will still be around if one server dies, but you won’t have enough horsepower to actually service your users.  In the users’ minds, that’s a failure.

To prepare for that disaster, do some basic documentation ahead of time.  Make a list of your environments, and note whether each DR server is purely DR, or if it’s actually turned into production over time.  Before disaster strikes, make a list of which user-facing services will need to be disabled, and which can remain standing.  Decide ahead of time whether to shut down reporting queries, for example, in order to continue to service other end user activities.
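
That documentation can start as something very small. The sketch below is one hypothetical way to record it in Python: every class, field, workload name and load figure is invented for illustration, and load is expressed as a share of a single server's capacity.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Workload:
    name: str
    load: float        # share of ONE server's capacity (1.0 = a full server)
    can_disable: bool  # agreed ahead of time that it may be shut off in a disaster


@dataclass
class Environment:
    name: str
    primary: List[Workload]
    # Workloads that crept onto the DR server; empty means it is purely DR.
    dr: List[Workload] = field(default_factory=list)


def failover_plan(env: Environment) -> Tuple[bool, List[str], bool]:
    """Return (DR server is really production, workloads to disable, fits on one server)."""
    everything = env.primary + env.dr
    total = sum(w.load for w in everything)
    to_disable = []
    optional = sorted((w for w in everything if w.can_disable),
                      key=lambda w: w.load, reverse=True)
    for w in optional:  # shed the biggest optional workloads first
        if total <= 1.0:
            break
        to_disable.append(w.name)
        total -= w.load
    return bool(env.dr), to_disable, total <= 1.0


sales = Environment(
    name="sales",
    primary=[Workload("order entry", 0.7, can_disable=False)],
    dr=[Workload("reporting", 0.5, can_disable=True)],
)
print(failover_plan(sales))  # (True, ['reporting'], True)
```

If the last value comes back False, the environment has no workable single-server plan even after shedding everything optional.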

Now Replace “Engines” with “Drives”

RAID 5 protects your data by striping it across multiple drives and storing parity information too.  If any one drive fails in a RAID 5 array, you’re completely fine.  Pull out the failed drive, swap in a brand new one, and the RAID card will automatically begin rebuilding the missing data onto the blank drive from the parity data.  For more about this process, check out the Wikipedia article on RAID.

Hard drives have moving parts, and moving parts fail.  The more drives we add, the more likely we are to experience a failure.  We’re distributing the work across more drives, which increases performance, but it simultaneously increases risk.
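
A quick Python sketch shows how fast that risk grows. It assumes drives fail independently, and the 3% per-drive chance of failing in a year is a made-up figure for illustration, not a measured one.

```python
def chance_of_any_drive_failure(drive_count: int, per_drive_chance: float) -> float:
    """Probability that at least one of the drives fails in the period.

    Assumes every drive fails independently with the same probability.
    """
    return 1 - (1 - per_drive_chance) ** drive_count


# Hypothetical 3% chance that any single drive fails within a year.
for drives in (1, 4, 8, 16):
    print(drives, round(chance_of_any_drive_failure(drives, 0.03), 3))
```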

When there’s a drive failure, the clock starts ticking.  We have to get a new drive in as fast as possible.  In order to reduce the failure window, enterprise systems use hot spare hard drives: blank drives that sit around idle doing nothing.  When there’s a failure on Saturday night at midnight (the universally agreed-upon standard time for drive failures), the RAID array automatically uses one of these hot spare drives as the replacement and starts rebuilding the array.  SAN administrators like hot spares, because they like doing other things on Saturday nights instead.

On Saturday nights, SAN administrators like to do karaoke at The Arbitrated Loop.

When they finally return to the datacenter on Monday to replace the dead drive with a fresh one, that fresh one becomes the new hot spare.  (Not all arrays work this way – I’m generalizing.  I can hear SAN admins typing their replies already.)

While the drive array rebuilds, the remaining drives are working harder than they normally would.  Not only are they handling their regular load, but they’re also simultaneously reading data to write it onto the fresh drive.  This means our hard drives are working overtime – just like the remaining engines in our plane scenario.

This becomes a tricky balance:

  • The more drives we add, the more easily they can handle normal load from end users
  • The more drives we add, the more likely we are to have failures
  • But when we have failures, the more drives we add, the easier a time we’ll have keeping up with the rebuilds
  • The larger the drives, the longer rebuilds take, which lengthens our time window for recovery
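
The last two risk factors in that list can be roughed out in Python. This is a sketch under stated assumptions: the rebuild runs at a steady rate, drives fail independently at a constant rate, and every number in the example call is hypothetical. It does not model the extra wear that rebuild traffic puts on the surviving drives.

```python
import math


def rebuild_hours(drive_gb: float, rebuild_mb_per_s: float) -> float:
    """Hours needed to rewrite one whole drive at a steady rebuild rate."""
    return drive_gb * 1000 / rebuild_mb_per_s / 3600


def chance_of_second_failure(surviving_drives: int, window_hours: float,
                             drive_mtbf_hours: float) -> float:
    """Probability that another drive fails before the rebuild finishes.

    Assumes independent drives with a constant failure rate of one failure
    per drive_mtbf_hours on average.
    """
    return 1 - math.exp(-surviving_drives * window_hours / drive_mtbf_hours)


# Hypothetical array: 8 drives of 1,000 GB, one dead, rebuilding at 50 MB/s.
window = rebuild_hours(1000, 50)
print(round(window, 1), "hours exposed")
print(chance_of_second_failure(7, window, drive_mtbf_hours=500_000))
```

Doubling the drive size doubles the window, and adding drives raises the number of survivors that could fail inside it.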

It’s just like planes: adding more stuff means managing a balance between cost, complexity and reliability.

The next time someone asks you to add more gear into your scenario or asks to take advantage of the disaster recovery gear that’s “sitting around idle”, it’s time to recalculate your risks and reliabilities.

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

SQL Server DBA Interview: Kendal Van Dyke

Kendal Van Dyke is a Senior SQL Server DBA from Florida, and he’s been blogging since last year.  He’s syndicated at SQLServerPedia, and his recent series on RAID performance is a must-read.  I’ve talked back and forth with him a lot lately, and I figured I’d try something new: a virtual interview.  I emailed Kendal a list of questions, and here’s how it went down:

Kendal Van Dyke

Brent: I think I recognize your avatar because you fragged me last night.  What made you decide to use your Xbox avatar as your online persona?

When I started getting really active in the online SQL community I didn’t have a decent headshot so I just used my avatar since I had already made one for Xbox Live. It unintentionally turned into something recognizable since I used it everywhere. One of these days I really do need to take a good headshot though. I don’t think my avatar will be taken seriously by the PASS abstract selection committee, unless of course they think I really am a 6 foot tall cartoon.

Brent: I can tell by reading your series on RAID performance that you put a ton of work into it.  How many hours did the whole thing take, start to finish?

I didn’t keep track, to be honest! The testing was spread out over two months, with maybe 4-5 hours per week put into pushing buttons and rebuilding drives in different configurations. The hardest part was writing up the results. I put at least 4 hours of writing into each part of the series. All in all I probably put at least 50 hours into it, if not more. I’ve got more things in store that I hope to publish soon, too.

Brent: Blogging, when you do it right, is a part time job – if not a full time job.  It’s also the worst-paying job I’ve ever had, hahaha.  Why do you do it?

I gave a presentation at the first SQL Saturday in Orlando two years ago. As nerve-racking as it was, I found that I really enjoyed sharing what I knew and I came to realize that I had more things to share that other people might be able to learn from. I also realized that there were a lot of one-off things that I wanted to document somewhere in case I ever ran into them again, and a blog seemed like the best way to kill two birds with one stone. I’ve had comments on some of my posts thanking me for helping solve a problem and that alone makes it worth it for me. It certainly isn’t for the money! haha

Brent: You got started blogging on your own web site.  If you’re trying to reach the most number of people, why do you blog on your own, instead of, say, using a popular group blogging site, writing magazine articles or writing books?

Bruce Campbell did a terrific commercial for Old Spice about experience that sums this up nicely. Although sites like SQL Server Central allow open submissions for articles, my impression is that writing for a magazine or a book – or even blogging on the bigger sites – is by invitation only. You don’t get an invitation unless people know who you are so you’ve got to build up your reputation somehow. Some people do that by answering forum posts. Others build up their social networks. I chose blogging and I ended up on Blogger because it was free, easy to get started, and I could figure out what to write about (and how) without any pressure. Now that I’ve got a year of experience I think I’ve gotten to the point where I can do those other things. Ironically, I don’t know that I want to blog on the bigger sites anymore – I worked hard to build up a personal brand and I don’t know that I’m ready to give up the control that I have over my own look and feel that having my own blog affords me. Instead, my next move is to take content that I would normally put on my blog, start publishing it in articles, and reference my blog if readers want to read more from me. That way I can maximize reach while still retaining the identity I built for myself. It’s a win-win for me. As for books, perhaps one day I’ll have gained enough recognition to be offered the chance to write something, but that’s still a long way out.

Brent: If you could give new bloggers any tips about the experience and how to get started, what would you say?

I’d start by reading your series on how to start a technical blog (I wish you had done that series before I got started, it would have really helped me!). Be patient because it’s going to take time to build a subscriber base and generate regular traffic. Having a pillar post really helps too. Before I wrote my disk performance series I was getting 10-12 visitors a day and now I’m over 100 per day just for that series. Also, don’t be afraid to ask other bloggers for feedback and help. I’ve found the blogging community to be very open when I’ve asked for advice.

Brent: You’re on Twitter as @SQLDBA.  How’d you find out about Twitter, and what made you start using it?

I’d heard bits and pieces about Twitter from news sites, blogs, etc, but I never paid much attention to it until the 2008 PASS summit. I didn’t get to go to the summit but I subscribed to the RSS feed for #sqlpass and got a kick out of what everyone was tweeting. That prompted me to sign up for an account and now I’m hooked!

Brent: The call to speakers is out for the PASS Summit in Seattle, and I know you’ve done some speaking.  Are you submitting any sessions this year?  I wanna encourage you since my favorite sessions always involve storage, heh.

Absolutely! One of my goals this year is to attend PASS as a speaker. And yes, the disk performance stuff will definitely be one of my abstracts. I haven’t submitted anything yet but when I do I’ll post the details on my blog.

Brent: Sounds good, looking forward to it!  Readers: swing by Kendal’s blog and check it out.

Open Letter to Non-Technical Friends with Windows Machines

First, I’m sorry about the virus you got.  Viruses, I mean, plural.  I totally don’t mean to kick you while you’re down, but we need to talk.

Yes, I’ll fix it.  Yes, I totally understand that you were just surfing the web for legitimate business needs.  No, I won’t look in your browser history.

Achewood Tells It Like It Is

It’s going to take me about four hours to back your stuff up, strip out the viruses, make sure everything’s working okay, run the latest updates, put a real antivirus program on there, put a safer browser on there, and teach you how to use it all.  Go ahead and grab a beer and a couple of Tylenol. No, not for you – for me.  I know you think I love doing computer work, but this isn’t exactly the part that calls to me.

Did you know the Geek Squad charges $400 for in-home service for this, and $300 at the store?  Ouch.  That’d pretty much erase the cost difference between the iMac I told you to get, versus the cheap Windows machine you picked up.  And that’s just one instance – and we both know this is gonna happen again, just like it did a couple of years ago.  Ah, you thought I’d forgotten that, huh?

Now remember, when I give it back to you, you can’t surf any suspicious sites with this thing.  I’ve done what I can by putting a better browser on there, but it’s always an arms race between the good guys and the bad guys.  If you go to bad web sites, odds are you’re still going to get infected, no matter how much I set up ahead of time.  Don’t download movies or music from web sites, because they’re probably not legit.  Don’t open movies people send you in email, no matter who it is – even if it says it’s me, it’s not really me.  I know you don’t understand how that works, but you have to trust me.  And don’t even think about going to any “adult” sites.  Maybe I could install virtualization on your machine and give you a separate instance, but I’m not sure that you’d be able to remember which window was safe and which one was dangerous.  I’d end up coming over to give you regular refresher lessons about how to use the thing, and I know you’d get frustrated because you don’t want to spend any time learning – you just want to go surf and play.

In fact, if you’re going to insist on sticking with Windows, and you’re going to keep your valuable business stuff on there like your accounting, what I’d really recommend is that you get two separate machines: one for business and one for…pleasure.  Although of course, even if you confine your “adult” surfing to the pleasure machine, you’re still going to get viruses on it, and you’re still going to rack up those $400 Geek Squad bills.

Or you could just buy a 20″ iMac for $1,000 or a Macbook for $1,300 and be done with it.  Your call.  You could even pick up a Mac Mini for under $500, but you have to bring your own keyboard, and quite frankly, you need a new keyboard.  This one is filled with stuff, and I don’t mean crumbs.

No, it doesn’t run a lot of software, but all you’re really doing is surfing the web, listening to music & movies, checking your email, and using Microsoft Office, and it’s great for all of those.  Best of all, you can go to all kinds of, uh, “web sites” and you won’t get infected, and I know how important that is to you.  Or at least, I imagine it would be if I’d checked your browser history.  Okay, look, I didn’t even have to check the history, because my anti-spyware stuff scrolled through a bunch of web site names as it was ripping out cookies.  We’re still friends, but let’s just say I’m wearing gloves when I have to touch your keyboard.

No?  You still wanna save money, eh?  Well, here’s where the bad news comes in: this is the last time I’ll fix your computer for free.  Next time, I’m pointing you to the Geek Squad.  I’m not spending my weekends fixing your computers just because you want some free pr0n.

SQL Server and Cloud Links for the Week

SQL Server Links

Database Manager for IIS7 – IIS7 is designed to enable quick installation of mini-applications.  If you’ve used a Linux web host, it’s like Fantastico – a gallery of apps that you can point and click to install.  This Database Manager app is a web-based version of SSMS – just enough functionality to get basic database work done. Developers are gonna be all over this.

Microsoft Support Switching to Call-Back Model – for non-Premier customers, they’ll call you back instead of keeping you on the phone while they track down the right person.  If you’re in a company with 25 or more IT people (developers, Windows guys, whatever) you should check out Premier Support.  Yes, 25 is a low number, but Premier is amazing.

Does join order matter? – New blogger Christopher Stobbs writes my favorite kind of article: show and tell.

Using SQL Server Reporting Services in Client Mode – you can write your own reports for SSMS using this technique.

Erland’s SQL Server Wishlist – a list of items on Microsoft Connect that he wants you to vote on.  This is absolutely brilliant – I’ve resolved to spend time next week going through Connect.  It’s neat just for a learning exercise if nothing else.

Denis Gobo’s SQL Quiz – some tricky T-SQL and engine questions to test your knowledge.  The answers are here.

David Stein touches SSRS for the first time – I had this exact same “whoa” experience when I first saw SSRS, SSAS and SSIS.  I’ve spent my whole career just diving into the engine, and then I crack open these tools and say to myself, “My God, it’s full of stars!”  (Sorry, changed movie tracks there.)

Cloud Links

SQL Data Services Q&A on New Changes – they’re delivering a public CTP mid-calendar year 2009, and shipping in the second half.  SQL Server in the cloud is coming…

Salesforce.com Runs on 1,000 Servers – TechCrunch says, “55,000 enterprise customers, 1.5 million individual subscribers, 30 million lines of third-party code.”  Makes me wonder about the size of Facebook’s server pools.

More security holes in Google Docs – look, people, stop reacting with shock and awe.  It’s a young product. It’s gonna have bugs.  How secure was Windows 3.1, eh?  95?  98?  Young products have security issues.  Quit storing your secure data on somebody else’s brand new platform.  Keep your black book locally in an encrypted text file.

Junk Drawer

Drinkin and Dialin

Avoid drunk dialing with the Bad Decision Blocker – before you go out on a bender, set up this iPhone application to prevent drunk dialing of your exes. And probably your manager too, come to think of it.

Apple WWDC scheduled for June 8-12 – Apple’s announced new iPhones the last two summers, and I’m hoping they announce another one at WWDC. I’m still on my original first generation, and it’s cool, don’t get me wrong, but I’m itching to get a new one.  I don’t surf enough to need 3G, and I’d like GPS, but not enough to shell out for a new phone.

Credibility comes from who you are, not where you work – Stephen Foskett talks about the basics of the business of consulting, and that credibility line really resonated for me.

Advice on Setting a Price for Your Business – when negotiating with a venture capitalist, here’s a tip on how to get yourself a much better deal.

Urine-soaked coins aren’t accepted as payment for speeding tickets – who knew?

New Tesla S model unveiled – Tesla’s trying to get a government loan to build the car, and I hope they get it. That car is nothing short of gorgeous.  The big LCD panel in the center console and the integrated 3G internet are just icing on the cake for us geeks.

Reasons Why You Shouldn’t Virtualize SQL Server

I’ve blogged about why you should virtualize SQL Server, but it’s not all unicorns and rainbows.  Today we’re going to talk about some of the pitfalls and problems.

When Virtualization Goes Right

It’s Tougher to Get More Storage Throughput

Servers connect to Storage Area Networks (SANs) with Host Bus Adapters (HBAs).  They’re like fancypants network cards, and they come in either Fibre Channel (FC) or iSCSI varieties.  These components are the place to focus when thinking about virtualization.

If your SQL Server:

  • Has 2 or more HBAs connected to the SAN
  • Uses active/active load balancing software like EMC’s PowerPath to get lots of storage throughput
  • Actually takes advantage of that throughput

Then you’ll be dissatisfied with the current state of storage access in virtualization.  Generally speaking, without doing some serious voodoo, you’re only going to get one HBA worth of throughput to each virtual machine, and that’s the best case scenario.

If you’re running multiple servers on the same virtual host, the IO situation gets worse: it becomes even more important to carefully manage how many SQL Servers end up on a single physical host, and more difficult to balance the IO requirements of each server.

Never mind how much more complex this whole thing gets when we throw in shared storage: a single RAID array might have virtual server drives for several different servers, and they can all compete for performance at the same time.  Think about what happens on Friday nights when the antivirus software kicks off a scheduled scan across every server in the shop – goodbye, performance.

No-Good Liar

It’s Tougher to Get Good Performance Reporting

Let’s look at the very simplest performance indicator: Task Manager.  On a virtual server, Task Manager doesn’t really show how busy the CPUs are.  The CPU percentages are a function of several things, and none of them are transparent or detectable to the database administrator.

Other virtual servers might be using up all of the CPU.

The virtualization admin might have throttled your virtual server.  They can set limits on how much CPU power you actually get.

Your host’s CPU can change.  Your server can get moved from a 2GHz box to a 3GHz box without warning.

And even if you dig into the underlying causes to find out what’s going on, there’s no reporting system that will give you a dashboard view of this activity over time.  You can’t look at a report and say, “Well, last Thursday my production SQL Server was hitting 100% CPU, but it’s because it was on a slow shared box, and on Thursday night at 5:00 PM it was migrated live over to a faster box, and that’s why pressure eased off.”

Not Everything Works As Advertised

Virtualization vendors have some amazing features.  We talked about vMotion and Live Migration, the ability to move virtual servers from one physical host to another on the fly without downtime.  While that does indeed work great, it doesn’t necessarily work great for every server in every shop.  If you’ve got a heavily saturated network, and your SQL Server’s memory is changing very fast (like in high-transaction environments or doing huge queries), these features may not be able to copy data over the network as fast as it’s changing in memory.  In situations like this, the live migration will fail.  I’ve never seen it bring the virtual server down, but I’ve seen it slow performance while it attempted the migration.
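
Why fast-changing memory defeats a live migration can be shown with a deliberately simplified model of iterative copying: each round re-sends whatever memory changed during the previous round, and the migration can only finish once the leftover is small enough to copy during a brief pause. This is a toy sketch in Python, not how any particular vendor implements it, and every name, threshold and number below is an assumption made for illustration.

```python
from typing import Optional


def copy_rounds_needed(memory_mb: float, dirty_mb_per_s: float, link_mb_per_s: float,
                       final_copy_mb: float = 64, max_rounds: int = 30) -> Optional[int]:
    """Rounds of copying before the leftover fits in the final pause, or None.

    Round 1 sends all of memory; each later round re-sends the memory that
    changed while the previous round was in flight.
    """
    remaining = float(memory_mb)
    for round_number in range(1, max_rounds + 1):
        seconds = remaining / link_mb_per_s
        # Memory dirtied during this round must be sent again next round.
        remaining = min(float(memory_mb), dirty_mb_per_s * seconds)
        if remaining <= final_copy_mb:
            return round_number
    return None  # memory changes as fast as we can copy it: migration never converges


# Quiet server: 16 GB of memory, changing at 100 MB/s, over a 1,000 MB/s link.
print(copy_rounds_needed(16_384, 100, 1_000))    # 3
# Busy server on the same link: memory changes faster than it can be copied.
print(copy_rounds_needed(16_384, 1_200, 1_000))  # None
```

A saturated network lowers the usable link rate, which pushes a server from the first case toward the second.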

New features and new versions of virtualization software come out at a breakneck pace, and like any other software, it’s got bugs.  A particularly nasty bug surfaced in VMware ESX v3.5 Update 2 – on a certain date, VMware users couldn’t power on their servers because the licensing was expired – even if it wasn’t.  Imagine shutting down a server to perform maintenance, then trying to turn it back on and getting denied.  “Sorry, boss, I can’t turn the server back on. I just can’t.”  It took VMware days to deploy a fixed version, and in that time span, those servers just couldn’t come back on.

That’s an extreme case, but whenever more complexity is introduced into the environment, risk is introduced too.  Injecting virtualization between the hardware and the OS is a risk.

It’s Not Always Cost-Effective

All of the virtualization vendors have a free version of their software, but the free version lacks the management tools and/or performance features that I touted in my earlier articles about why sysadmins want to virtualize your servers.  The management tools and power-up editions cost money, typically on a per-CPU basis, and there are maintenance costs involved as well.  If your virtualization strategy requires isolating each SQL Server on its own physical host server, then you’ll be facing a cost increase, not a cost savings.

Combining multiple guest servers onto fewer physical servers still doesn’t always pay off: run the numbers for all of your virtualization tool licenses, and you may end up being better served by a SQL Server consolidation project.  I did a webcast last year with Kevin Kline and Ron Talmage about choosing between consolidation and virtualization.  That information is still relevant today.

My Virtualization Recommendations for SQL Server

My recommendations are:

  • Virtualize only when it’s going to solve a problem, and you don’t have a better solution for that problem.
  • Get good at performance monitoring before you virtualize, because it’s much tougher afterwards.
  • Start by virtualizing the oldest, slowest boxes with local storage because they’ll likely see a performance gain instead of a penalty.
  • Avoid virtualizing servers that have (and utilize) 2 or more HBAs.

If you’ve virtualized production SQL Servers, I’d love to hear about your experiences.

Why Would You Virtualize SQL Server?

I talked about why sysadmins want to virtualize servers, and today it’s time to talk about why you as a DBA or developer might want to virtualize some of your SQL Servers.

Virtualization Means Cheap High Availability

Yo momma was an early adopter. That's right - you're adopted.

In a perfect world, every SQL Server would be a cluster on a SAN with instant failover.  This isn’t a perfect world, and we can’t afford to implement that level of protection for every server in the shop.

Virtual SQL Servers provide a level of high availability without the expense and hassle of clustering.  In my last post, I explained that sysadmins love virtualization because they can easily move guest servers around from one piece of hardware to another.  As a DBA, I love it for that same reason: suddenly, if the underlying hardware fails, I’m not sweating bullets.  If the RAID card fails, if the motherboard fails, if the memory fails, if the network card blows chunks, you name it, it’s not my problem.  The sysadmin simply boots up my SQL Server on another virtual host, and I don’t have to deal with screaming users.

I especially like this option for development and QA servers.  Management is rarely willing to implement expensive high-availability setups for development environments, but when the dev server goes down, the developers and project managers are kicking my door down.  The PMs scream about how there are so many expensive developers sitting around idle waiting for the server to come back up.  Virtualization avoids this problem altogether.

Virtual SQL Servers can React Quickly to Changing Needs

Capacity planning is hard work.  How much speed do we really need for a new application?  What happens if our estimates are wrong, and we suddenly need a whole lot more oomph?  What happens if we overbought, and we’ve got all this expensive hardware sitting around idle because the application never caught on with end users?

Virtual servers can be easily scaled up or scaled down.  Simply shut the virtual machine down, change the number of CPUs and memory it has with a few mouse clicks, and boot it back up again.  Presto change-o.

We can even move storage around.  Say we initially decided to put the server’s data on a cheap RAID 5 SATA array, and now our users have fallen in love with the application and they’re adopting it like crazy.  We can shut it down, move the virtual hard drives over to a faster RAID 10 SAS array, and boot it back up again.  Presto change-o, rapid storage performance improvements without long downtime windows to backup/restore.

I loved this approach because it let me keep old SQL Servers around as long as necessary.  For example, we decommissioned an old help desk database server, but I told the help desk folks they could keep it around as long as they wanted.  It hardly used any resources at all, because nobody queried it unless they had a question about an old help desk ticket that didn’t get transitioned correctly to the new system.  I set the server up with one virtual CPU, 1GB of memory, and put it on cheap RAID 5 SATA.  The end users loved me because I could provide the service for no cost.

Virtual Server Performance Isn’t That Bad

Virtualization has a really bad reputation because in the beginning, the software really did suck and admins didn’t understand how to use it.  Raise your hand if you installed VMware Workstation on your desktop years ago, and you tried to run multiple operating systems on a machine that barely had the horsepower to run Internet Explorer.  I was there, and I know how it was.

Things have changed: the software’s gotten much better, memory became absurdly cheap, and there are resources to help use it better.

Busted!

The 2008 book Oracle on VMware by Bert Scalzo is a good example.  Bert’s one of the Oracle experts at Quest, and he wrote this to help DBAs understand how to implement database servers in a virtual environment.  It does a good job of explaining enough about virtualization to make a DBA comfortable with the technology and avoid some common pitfalls.  Years ago, books like this weren’t available because the technology was changing so fast that best practices went out of date almost as fast as they were written.

You can still get really bad performance with virtualization, just as you can get really bad performance with SQL Server on native hardware.  But follow a few basic guidelines like avoiding oversubscription, properly configuring networking & storage, and using the right OS configurations, and you can get perfectly acceptable performance.  Yes, you’re still going to lose a few percent of overhead – but with today’s redonkulously fast CPUs and memory, I’m more concerned about optimizing T-SQL code than I’m concerned with losing 5% of my CPU speed due to virtualization.  It’s all about priorities, and that small performance loss is just one piece of the picture.

Virtualization is All About Give and Take

The economy has gone down the toilet.

To make matters worse, it’s one of those pay toilets, and US Treasury Secretary Timothy Geithner has run out of quarters.

Invisible Quarters

Until he’s willing to go through the Oval Office couch looking for more change – and I don’t blame him for being hesitant, because Bill Clinton left some disgusting stuff in that couch – we’re going to have to buckle down and find ways to save money.  As I mentioned in my last post, we can either cut hardware costs, or we can cut people costs.

Virtualizing non-mission-critical SQL Servers gives us the chance to cut costs without cutting service levels, and furthermore, gives us the chance to show management that we’ve got skin in the game.  If we insist that every single SQL Server is too important to be virtualized, and that there’s no way we can save money, then management just might look at us as part of the problem.  Be part of the solution: be proactive, find SQL Servers with low utilization levels, and offer them up as the first phase.

My approach was to start with the SQL Servers that I frankly didn’t really want to survive on their own anyway.  I had one particular application that had always demanded its own standalone SQL Server with no other applications on it.  The server was barely used, and it was a completely out-of-date and unsupported version of SQL Server.  The application manager wouldn’t allow me to patch it, wouldn’t upgrade to a newer version, and didn’t have the budget to move to newer hardware.

Virtualization administrators have slick tools like VMware Converter and Vizioncore vConverter to automate the physical-to-virtual (P2V) process.  These solutions take a backup of the physical server and clone it into a brand-new virtual server.  When the virtual server starts up, the new virtualization drivers are installed automatically, and the server is up and running smoothly without time-intensive application reinstalls.  I’ve blogged about virtualizing SQL Servers with P2V and I’ve performed it many a time – it’s a great way to get rid of hardware dinosaurs.

Even better, this whole process doesn’t require work from the DBA.  The time and effort is all put in by the virtualization administrator.  If your sysadmin is dying to try virtualizing a SQL Server, you can point them at your newly chosen dinosaur and tell ‘em to have at it – you don’t have to be involved.  That’s the best of both worlds: management will see you as a part of the solution, but you won’t actually have to do any work.

Sounds Great! Let’s Virtualize SQL Server Today!

Well, it’s not all unicorns and rainbows: there are some reasons why you might not want to virtualize SQL Server, and I’ll talk through those next.

Continued: Why You Shouldn’t Virtualize SQL Server

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

More On the Carbonite Backup Failures

David Friend, the CEO of Carbonite, left a comment on my blog entry about Carbonite.  I’d like to applaud him for taking the time to do that, but the comment raised some ugly questions.  It appears that they were putting data on 15-drive RAID 5 arrays.  RAID 5 is the most cost-effective array setup (other than RAID 0, which offers no data protection).

RAID 5 will only tolerate a single drive failure – if you lose more than one drive in an array, the whole array is gone and must be restored from backup.  As SATA drives grow larger and larger in capacity, they take longer and longer to rebuild when one goes bad, because so much data must be reconstructed onto the new drive.  If a second drive fails while the first one is being rebuilt, you’re completely out of luck, and must restore from backup.  The more drives you add to a RAID 5 array, the riskier it gets, because it’s more likely that one of the remaining drives will fail during the rebuild.  That’s pretty dangerous for a company that makes its living off your backups being available.
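To put rough numbers on that, here’s a back-of-the-napkin sketch in Python.  It assumes drives fail independently at a constant rate, and the 3% annual failure rate and 24-hour rebuild window are numbers I made up for illustration, not anything Carbonite or a drive vendor published.  Real arrays tend to be worse than this, because drives from the same batch under the same rebuild stress don’t fail independently.

```python
import math

HOURS_PER_YEAR = 24 * 365


def second_failure_probability(drives, annual_failure_rate, rebuild_hours):
    """Chance that at least one surviving drive fails while the array rebuilds.

    Assumes independent drives with a constant (exponential) failure rate.
    """
    # Convert the annual failure rate into a constant hourly hazard rate.
    hourly_rate = -math.log(1.0 - annual_failure_rate) / HOURS_PER_YEAR
    survivors = drives - 1  # one drive has already failed
    return 1.0 - math.exp(-survivors * hourly_rate * rebuild_hours)


if __name__ == "__main__":
    # Hypothetical inputs: 3% annual failure rate, 24-hour rebuild.
    for drives in (4, 8, 15):
        p = second_failure_probability(drives, 0.03, 24)
        print(f"{drives:2d} drives: {p:.4%} chance of losing the array during rebuild")
```

The absolute numbers depend entirely on the assumptions, but the shape doesn’t: the risk grows with both the number of drives and the length of the rebuild, and bigger SATA drives push the rebuild time up.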

Worse, it isn’t clear from the interviews I’ve seen that Carbonite actually backed up your data somewhere other than those RAID 5 arrays.  It appears that their attitude towards backup was that YOUR machine held THEIR backup: when they ran into problems with their RAID arrays, Friend commented that:

“Carbonite automatically restarted all 7,500 backups…”

Meaning, they started pulling the data again from their clients’ machines rather than restoring it from tapes or other arrays.  This is further evidenced by interviews that the Enterprise Strategy Group conducted with Friend last year before the lawsuit came out, as blogged by Steve Duplessie of the Enterprise Strategy Group. When asked how Carbonite protected their backups, Friend replied that:

“This is backup – not archiving, so if that ever happened you’d still have your data on your PC.”

Like Bryan Oliver says, the only reason we do backups is to do restores.  If the answer to restore problems is to use your live server, that’s a failure.  Carbonite didn’t protect against regional internet outages, either:

“Regional internet outages (we use multiple redundant carriers) would take us offline if they all failed. But again, unless you were in the middle of a restore when it happened, you’d probably never notice.”

Translation: if you aren’t using our services, you’ll never notice when we’re down.  How about replication from one site to another?

“…we don’t replicate data across multiple sites. The likelihood of losing data because of software bugs or human error is probably orders of magnitude greater.”

Errr, not sure why he’d say that, since Carbonite wasn’t protecting against data loss due to bugs either.

I can see where Carbonite’s coming from: they view themselves as a cheap way to protect data, and it works most of the time.  That’s the same approach I take with Amazon S3 cloud storage for my personal backups, incidentally: for a few bucks a month, it’s a cheap insurance policy.  I have my data replicated across my laptop, my Time Machine external hard drive, and my VMware server.  If there’s a house fire, my stuff might be on Amazon S3.  It also might not.  Amazon hasn’t made me any promises about the safety and security of my data.

My problem with Carbonite’s approach is that they seemed to take my personal data backups even less seriously than I do, maintaining a single copy in a single place.  For all I know, maybe Amazon’s using that exact same approach.  Only time and lawsuits will tell.

For more commentary on the Carbonite problems, check out Steve’s blog post, “Head in the Cloud? Or Just up your……..?” (And yes, he apparently stole that from Wilbur.)

New PASS 2009 Summit Site Unveiled – #SQLPass

PASS is on the ball this year!  They’ve got the web site and spiffy logo up and running for the 2009 Summit in Seattle.

PASS Summit 2009 Logo

Looking through the site, one change jumps out at me: the schedule this year will be Tuesday through Thursday, with pre-conferences running on Monday and Friday.  In the past, the summit has run Wednesday through Friday, with extra-cost, day-long pre-conference sessions on Monday and Tuesday.

I love this change.  If you only want to attend one pre-conference session, now you don’t have to worry about possibly spending an extra day sitting around.  Last year I had a pre-con on Monday, but not on Tuesday.

Plus, it seems that a lot of people want to fly out on Friday afternoon to make it home to their families.  Now, you can do that without missing any of the good conference sessions.

It’s a tough economy, and I’m curious to know how many people were originally planning on going to the summit but are thinking about bailing due to the costs.  Last year, I wrote a post about justifying the costs of the PASS Summit, and I’m just as convinced today as I was back then.  If I hadn’t gone to the PASS Summit, I wouldn’t have met Jimmy May, who got me involved in my book deal.  I’m not saying you’ll get a book deal if you go to PASS, obviously, but good things happen when you put yourself out there.

To stay informed with the latest PASS developments, follow @SQLPass on Twitter, and if you use an RSS reader, subscribe to the Twitter search for #SQLpass tweets.

Why Your Sysadmin Wants To Virtualize Your Servers

DBAs and developers: right now, your Windows admin is talking to management about all the time, money and hardware he can save by virtualizing servers.

He might not start with SQL Servers first, because frankly, he’s scared of database administrators.  DBAs only have one answer: “No.”

Your sysadmin knows you will say no.

Your sysadmin knows how you roll.

Thing is, though, he’s going to start by eating his own dogfood: he’s going to virtualize the servers he manages himself, like file & print servers, secondary domain controllers and DNS servers.  He’s going to prove that it makes sense, and he’s going to get excited about how much easier it makes his job.  He’s going to sell it to management, and because it does make sense for servers like that, they’re going to get excited too.  The next thing that happens is they’ll mandate virtualization across the board – including your precious SQL Servers.

I know, because I used to manage VMware servers, and yes, we had some virtual SQL Servers.

To understand why it happens, it helps to be armed with knowledge about why virtualization really does make sense for a lot of servers.  Today, I’m going to cover some of the more popular benefits.

Virtual Hardware Drivers and Firmware Aren’t Tied Into the OS

Virtualization abstracts the hardware away from the operating system.  When you run VMware ESX on an HP BL460c blade, for example, the virtual servers running inside ESX don’t need any HP drivers.  Instead, they use a set of VMware video, network, and storage drivers.  These VMware drivers are the same no matter what kind of video card, network card or storage adapter ESX is hooked up to.

This enables sysadmins to shut a virtual server down, copy it to another host server, and start it up without going through a messy driver installation.  It just boots right up as if nothing’s changed.  Some restrictions apply: with some older virtualization platforms and older servers, the guest may still need to install tweaked drivers, for example when you move from an AMD server to an Intel server or vice versa.  Newer families of processors and newer versions of virtualization software do a better job of hiding these differences from virtual guests.

When sysadmins don’t have to hassle with hardware drivers and compatibility, it makes troubleshooting and maintenance easier.  The more servers in the shop (especially servers of different brands), the bigger the payoff.

In addition, this makes hardware changes easier.  When servers get old, their annual maintenance fees from the manufacturer increase.  Eventually, it becomes more cost-effective to buy a new server than it does to continue paying maintenance on an old, underpowered server.  Normally this would mean time-intensive reinstalls (or risky backup/restore tricks), but with virtualization, the sysadmin can simply shut the old server down, move the virtual guests onto faster/newer/cheaper hardware, and start them back up again.

Even better, if you’re running virtualization with SAN-based servers, you can move virtual guests around without even shutting them down.

Move Virtual Servers Between Hosts On The Fly

This concept is key to a lot of the benefits I’m going to describe later.  I know you’re not going to believe this, but it really does work.  For VMware ESX users, it’s been working for years.  It requires that the VMware hosts use a SAN, and all servers need to be able to see the same storage so that any virtual server can be started up on any host server.

Here’s how vMotion works in a nutshell:

  1. The VMware admin right-clicks on a guest server and starts the migration process by picking which new host server it should run on.
  2. VMware copies the contents of the server’s memory over the network to the new host, and keeps them both in sync.
  3. When they’re identical, VMware transfers control over to the new host’s hardware.
  4. The guest server is now running on a completely different server.

When it’s done right with high-speed networking and fast servers, the handoff can be completely transparent to end users.
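If the “keeps them both in sync” step sounds like hand-waving, here’s a toy Python model of the general idea: copy everything, then keep re-copying whatever the guest changed in the meantime until the leftovers are small enough to move in one short pause.  This is only a sketch of the concept, not VMware’s actual implementation, and the function name, page counts, and dirty rate are all made up for illustration.

```python
import random


def live_migrate(total_pages=10_000, dirty_rate=0.05, switch_threshold=50,
                 max_rounds=30, seed=1):
    """Toy live migration. Returns (copy_rounds, pages_copied_while_paused, identical)."""
    rng = random.Random(seed)
    source = [rng.randrange(256) for _ in range(total_pages)]  # guest memory on the old host
    target = [None] * total_pages                              # blank memory on the new host
    dirty = set(range(total_pages))                            # nothing has been copied yet

    rounds = 0
    while len(dirty) > switch_threshold and rounds < max_rounds:
        rounds += 1
        to_copy, dirty = dirty, set()
        for page in to_copy:
            target[page] = source[page]
        # The guest keeps running during the copy and rewrites some pages.
        # Assumption: the longer the copy takes, the more pages get rewritten.
        for _ in range(int(len(to_copy) * dirty_rate)):
            page = rng.randrange(total_pages)
            source[page] = rng.randrange(256)
            dirty.add(page)

    # Brief pause: the guest stops, the last few changed pages are copied,
    # and control is handed to the new host.
    paused_pages = len(dirty)
    for page in dirty:
        target[page] = source[page]
    return rounds, paused_pages, target == source


if __name__ == "__main__":
    rounds, paused_pages, identical = live_migrate()
    print(f"copy rounds: {rounds}, pages copied during the pause: {paused_pages}, "
          f"memory identical: {identical}")
```

The thing to notice is that each round has far less to copy than the one before it, so the final pause only has to move a tiny fraction of memory.  That’s why fast networking matters: a quicker copy gives the guest less time to change things between rounds.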

If you haven’t seen vMotion in action, again, I know you’re not going to believe me, but trust me that it works.  I vMotioned servers all the time from one host to another and nobody had a clue.  Well, granted, some of my users were clueless anyway, but that’s another story.

This ability to slide virtual servers around to different hardware at any time, in real time, without taking the virtual server down, opens up a world of benefits.

Do Routine Maintenance Tasks During The Weekday

Even though virtualization removes problematic hardware drivers from the guest OS, there’s still hardware maintenance to be done.  The sysadmin still has to update firmware, update the BIOS, fix broken hardware, and move hardware around in the datacenter.  The difference is that with virtualization, the sysadmin can do these tasks on Tuesday at 10am instead of Saturday at 10pm.  He can simply evacuate all of the virtual guests off a server in real time, then take his time doing the necessary maintenance work during the day while he’s chock full of coffee.

I gotta confess: as a DBA, I’m jealous of this capability.  Sure, in theory, SQL Server 2005’s database mirroring meant that I could move databases from the production server to a secondary server with a minimum of downtime, but it’s still very noticeable to connected applications.  The connection drops, transactions fail, and my phone rings.  I long for the day when I can move databases around the datacenter undetected.  Until then, I’ll be coming in to work on Saturday nights.

Better Utilization Rates on Cheaper Hardware

Just in case you’ve been busy reading SQL Server Magazine and ignoring Fortune, we’re having a little problem with the economy right now.  There isn’t one.  (By the way, you might want to check the rest of your mail too, because Bernie Madoff has some bad news about your account.)

Answer - this guy.

The answer is on the front of the shirt.

Utilization isn’t a metric DBAs normally deal with, but for sysadmins, it means the percentage of horsepower we’re actually using on our servers.  To see an oversimplified version, go into Task Manager on your desktop and look at your CPU utilization rates.  It’s probably low, and it’s probably low on your SQL Server as well.  If you’re averaging 10%, stop and think for a minute about what it might be like to use 1/10th the number of servers and still have enough power to get the job done.

Granted, there are issues with peaks and valleys, and we have to make sure all servers don’t need full power at the same time.  But if we pooled enough resources together, we could still easily cut the amount of hardware in half and still be way overpowered.
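Here’s a quick Python sketch of that math.  It assumes every old server and every new host has the same horsepower, and the inputs (20 servers, 10% average, 60% peaks, a quarter of them peaking at once, hosts capped at 70% busy) are hypothetical numbers picked for illustration.

```python
import math


def hosts_needed(server_count, avg_utilization, peak_utilization,
                 concurrent_peak_fraction, target_host_utilization=0.7):
    """Estimate how many equal-sized hosts can carry a pool of lightly used servers."""
    peaking = server_count * concurrent_peak_fraction  # servers at peak at the same moment
    idling = server_count - peaking                    # everyone else sits at the average
    total_load = peaking * peak_utilization + idling * avg_utilization
    # Leave headroom by never planning to run a host above the target utilization.
    return math.ceil(total_load / target_host_utilization)


if __name__ == "__main__":
    hosts = hosts_needed(20, avg_utilization=0.10, peak_utilization=0.60,
                         concurrent_peak_fraction=0.25)
    print(f"20 lightly used servers fit on roughly {hosts} hosts")
```

Play with the peak numbers and you’ll see why the sysadmin cares so much about servers not all spiking at once: the concurrent peaks, not the averages, drive how much hardware you can give back.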

Virtualization enables sysadmins to do this because they can run more virtual guests per physical host.  It’s not uncommon to see 8-15 virtual servers running on a single $10,000 blade server.  More servers in less space, using less power, requiring less cooling and less networking gear – it’s a pretty compelling story to companies looking to cut costs. Back when I worked for Southern Wine, I wrote an HP c-Class Blade Chassis review, and you might find that interesting.

Would you rather have ‘em cut hardware costs, or cut people costs?

Easier Capacity Growth & Planning for Virtual Servers

When servers share resources in groups, it’s easier to reallocate hardware and plan for growth.  When I managed our VMware farm, I had enough capacity that I had wiggle room.  If someone needed a testbed server for a few weeks, I could easily spin up a new virtual server for them without stress.  I didn’t have to hunt around for leftover hardware, find space in the datacenter, get it wired up, and so on.  I simply deployed a template server (a right-click-and-deploy task in VMware that took a matter of minutes) and added it to the domain.  When they were done, I deleted the virtual server.  It’s a great way to win friends and favors.

At budget time, I didn’t have to take a magnifying glass to every single server to figure out exactly how it would grow and how much money I’d need to spend.  I just budgeted a ballpark number for growth, and incrementally added servers to the resource pool over the year.  When my reseller could cut me a great deal, I picked up new blades.  I wasn’t forced to rush purchases at the last minute without negotiating for good prices.

So Why Should You Care About Virtualization?

As a DBA or developer, most of these benefits don’t really matter to you.  You don’t care whether your sysadmins have to work weekends, or whether they have an easier job managing capacity.  I just wanted to explain to you why they’re going to push virtualization, because there are indeed some real benefits for sysadmins.  There are benefits to you too, but your sysadmin isn’t going to realize those, and isn’t going to sell you on ‘em.  In my next post, I’ll explain why you might want to virtualize some of your SQL Servers.

Continued: Why You Should Virtualize SQL Servers

Getting me to speak at your user group

I just updated my Upcoming Events page to include:

  • More upcoming events
  • How to get me to speak at your local user group meeting
  • Links to video archives of my past events

To quote Abraham Lincoln, for those who like that sort of thing, I should think it is just about the sort of thing they would like.
