
I’m Finally a Microsoft MVP!

12 Comments

I’ve been working with the community for years, spreading the good word about Microsoft products and helping the community, and it’s finally paid off: I’m an MVP!

I’ve been recognized for my work with the Microsoft product that’s closest to me – literally.  Check out these pictures of my office over time:

My Office in 2003
My Office in 2004
My Office in 2007
My Office in 2009

Spot what they have in common?  While my office has moved from town to town, and while I’ve switched workstations from IBM to Dell to Apple, my love for Microsoft ergonomic keyboards has remained solid as a rock.

Years ago, I suffered from the symptoms of RSI, the disease formerly known as Carpal Tunnel.  I type a lot, and I type extremely quickly, and those two things spelled trouble for my wrists.  While using non-ergonomic keyboards, my pain increased to the point where I had to take ibuprofen regularly just to get through the workday.

After switching to the first Natural Keyboard, I felt less pain within a few weeks, and the pain was completely gone after just two months.  Typing became a joy again.  Microsoft enabled me to get my job done and stay in touch with my friends and family without resorting to sign language or, even worse, verbal communication.  Whether I’m writing about how to become an MVP or about how my Windows friends keep getting viruses and sticking me with the bill, I’m doin’ it on a Microsoft ergonomic keyboard – the very best input device in the world.

Microsoft has recognized my silent, enduring work with the community by honoring me with the title of Microsoft Natural Keyboard MVP.  I was completely shocked and flattered, and I’m going to take today off to celebrate.

UPDATE April 1: I’m afraid there’s been a bit of a scandal.  It turns out that the MVP Committee searched the internet and found a picture of my cubicle with an Apple keyboard.  They confronted me, and they asked if I ever cybered with the Apple keyboard.  While I categorically deny those accusations, I will admit that I’m probably not the best spokesperson for the Microsoft input device community at this time, and I’ve given up the Microsoft Natural Keyboard MVP award.  I apologize to those I’ve hurt, and I will do my best to do right by the community.  I’ve entered a 104-step recovery program, and Microsoft’s offered the assistance of Lauren to help me shop for a new keyboard.


Adding Reliability to Your Infrastructure

Architecture
8 Comments

I don’t wanna fly in a single-engine plane.

Call me chicken, call me scaredy-cat, but I’m not excited to get into an airplane that will kill me if an engine fails.  The next step up is a twin-engine plane – but I don’t get on all of those either.

I like my engines to be RAID 10.

If a single engine averages a failure once in every 10,000 hours of operation, then a plane with just one of those engines will experience a failure once every 10,000 hours.  What if we equip our plane with two of those engines – how often will we experience a failure?

  • Once in every 20,000 hours of operation, or
  • Once in every 5,000 hours of operation

The correct answer is once in every 5,000 hours of operation.  All other things being equal, two-engine planes are twice as likely to have an engine failure in the same span of time.
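If you want to sanity-check that math the way a DBA would, here’s the back-of-the-napkin version in T-SQL – the numbers are purely illustrative:

    -- Each engine averages one failure per 10,000 hours of operation.
    SELECT 2 * (1.0 / 10000) AS expected_engine_failures_per_hour,  -- two engines
           10000 / 2         AS mean_hours_between_failures;        -- once every 5,000 hours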

The only way a twin-engine plane is more reliable is if just one of the two engines is enough to power the airplane safely. If the airplane requires both engines in order to maneuver and land, then the second engine didn’t add reliability: it just added complexity, expense and maintenance woes.

If one engine fails, the other engine might suddenly be running at full capacity.  In day-to-day operations, we’d only be using around 50% of each engine’s power (because we got twice as much power as we needed in order to cover our disaster recovery plan).  This engine would have to suddenly go from 50% utilized to 100% utilized – and that’s when things really start to get tested.  This means we probably shouldn’t take our time to land if one engine fails: we should get our plane on the ground as fast as possible to minimize the risks of overworking the remaining engine.  It’s working much harder than normal, and it isn’t used to that kind of load.

The only way a twin-engine plane is more reliable is if the one remaining engine can last long enough to get us to the ground. If it can’t handle the stress of running at 100% capacity, we’re not much better off than we were in the first place.  Therefore, it probably makes sense to build in even more capacity; either using more powerful engines so that they each only need 80% of their power to handle our plane, or using three engines instead of two.

But we can’t just go bolting on engines like crazy: engines cost money, add complexity, and add weight, which makes the plane harder to get off the ground.

Now Replace “Engines” with “Servers”

Some disaster recovery plans call for two database servers: a primary server used for production, and then a secondary disaster recovery server at another site.  That secondary server is constantly refreshed with data from production – maybe with log shipping, replication, database mirroring, etc.  So far, so good: we’ve improved the reliability of our production site, even though we’ve added complexity.

Later, management looks at that server sitting idle and says, “We can’t leave those resources lying around. Let’s use those for reporting purposes.  We’ll have reports run against the DR server, and that’ll make our production server much faster.”  Query loads grow over time, and before you know it, both of those servers are now production.  If even just the disaster recovery system goes down, we suddenly have a problem.

The only way a two-server disaster recovery plan is more reliable is if just one of the two servers is enough to power your application safely. Otherwise, you don’t have a disaster recovery plan: you have a pending disaster.  You have the insinuation of protection without enough actual protection.  Sure, your data will still be around if one server dies, but you won’t have enough horsepower to actually service your users.  In the users’ minds, that’s a failure.

To prepare for that disaster, do some basic documentation ahead of time.  Make a list of your environments, and note whether each DR server is purely DR, or if it’s actually turned into production over time.  Before disaster strikes, make a list of which user-facing services will need to be disabled, and which can remain standing.  Decide ahead of time whether to shut down reporting queries, for example, in order to continue to service other end user activities.

Now Replace “Engines” with “Drives”

RAID 5 protects your data by striping it across multiple drives and storing parity information too.  If any one drive fails in a RAID 5 array, you’re completely fine.  Pull out the failed drive, swap in a brand new one, and the RAID card will automatically begin rebuilding the missing data from parity data on the blank drive.  For more about this process, check out the Wikipedia article on RAID.

Hard drives have moving parts, and moving parts fail.  The more drives we add, the more likely we are to experience a failure.  We’re distributing the work across more drives, which increases performance, but it simultaneously increases risk.

When there’s a drive failure, the clock starts ticking.  We have to get a new drive in as fast as possible.  In order to reduce the failure window, enterprise systems use hot spare hard drives: blank drives that sit around idle doing nothing.  When there’s a failure on Saturday night at midnight (the universally agreed-upon standard time for drive failures), the RAID array automatically swaps in a hot spare as the replacement and starts rebuilding the array. SAN administrators like hot spares, because they like doing other things on Saturday nights instead.

On Saturday nights, SAN administrators like to do karaoke at The Arbitrated Loop.

When they finally return to the datacenter on Monday to replace the dead drive with a fresh one, that fresh one becomes the new hot spare.  (Not all arrays work this way – I’m generalizing.  I can hear SAN admins typing their replies already.)

While the drive array rebuilds, the remaining drives are working harder than they normally would.  Not only are they handling their regular load, but they’re also simultaneously reading data to write it onto the fresh drive.  This means our hard drives are working overtime – just like the remaining engines in our plane scenario.

This becomes a tricky balance:

  • The more drives we add, the easier they can handle normal load from end users
  • The more drives we add, the more likely we are to have failures
  • But when we have failures, the more drives we add, the easier time we’ll have keeping up with the rebuilds
  • The larger the drives, the longer rebuilds take, which lengthens our time window for recovery

It’s just like planes: adding more stuff means managing a balance between cost, complexity and reliability.

The next time someone asks you to add more gear into your scenario or asks to take advantage of the disaster recovery gear that’s “sitting around idle”, it’s time to recalculate your risks and reliabilities.


SQL Server DBA Interview: Kendal Van Dyke

#SQLPass
0 Comments

Kendal Van Dyke is a Senior SQL Server DBA from Florida, and he’s been blogging since last year.  He’s syndicated at SQLServerPedia, and his recent series on RAID performance is a must-read.  I’ve talked back and forth with him a lot lately, and I figured I’d try something new: a virtual interview.  I emailed Kendal a list of questions, and here’s how it went down:

Kendal Van Dyke

Brent: I think I recognize your avatar because you fragged me last night.  What made you decide to use your Xbox avatar as your online persona?

When I started getting really active in the online SQL community I didn’t have a decent headshot so I just used my avatar since I had already made one for XBox Live. It unintentionally turned into something recognizable since I used it everywhere. One of these days I really do need to take a good headshot though. I don’t think my avatar will be taken seriously by the PASS abstract selection committee, unless of course they think I really am a 6 foot tall cartoon.

Brent: I can tell by reading your series on RAID performance that you put a ton of work into it.  How many hours did the whole thing take, start to finish?

I didn’t keep track, to be honest! The testing was spread out over two months, with maybe 4-5 hours per week put into pushing buttons and rebuilding drives in different configurations. The hardest part was writing up the results. I put at least 4 hours of writing into each part of the series. All in all I probably put at least 50 hours into it, if not more. I’ve got more things in store that I hope to publish soon, too.

Brent: Blogging, when you do it right, is a part time job – if not a full time job.  It’s also the worst-paying job I’ve ever had, hahaha.  Why do you do it?

I gave a presentation at the first SQL Saturday in Orlando two years ago. As nerve racking as it was, I found that I really enjoyed sharing what I knew and I came to realize that I had more things to share that other people might be able to learn from. I also realized that there were a lot of one-off things that I wanted to document somewhere in case I ever ran into them again, and a blog seemed like the best way to kill two birds with one stone. I’ve had comments on some of my posts thanking me for helping solve a problem and that alone makes it worth it for me. It certainly isn’t for the money! haha

Brent: You got started blogging on your own web site.  If you’re trying to reach the largest number of people, why do you blog on your own, instead of, say, using a popular group blogging site, writing magazine articles or writing books?

Bruce Campbell did a terrific commercial for Old Spice about experience that sums this up nicely. Although sites like SQL Server Central allow open submissions for articles, my impression is that writing for a magazine or a book – or even blogging on the bigger sites – is by invitation only. You don’t get an invitation unless people know who you are so you’ve got to build up your reputation somehow. Some people do that by answering forum posts. Others build up their social networks. I choose blogging and I ended up on Blogger because it was free, easy to get started, and I could figure out what to write about (and how) without any pressure. Now that I’ve got a year of experience I think I’ve gotten to the point where I can do those other things. Ironically, I don’t know that I want to blog on the bigger sites anymore – I worked hard to build up a personal brand and I don’t know that I’m ready to give up the control that I have over my own look and feel that having my own blog affords me. Instead, my next move is to take content that I would normally put on my blog, start publishing it in articles, and reference my blog if readers want to read more from me. That way I can maximize reach while still retaining the identity I built for myself. It’s a win-win for me. As for books, perhaps one day I’ll have gained enough recognition to be offered the chance to write something, but that’s still a long way out.

Brent: If you could give new bloggers any tips about the experience and how to get started, what would you say?

I’d start by reading your series on how to start a technical blog (I wish you had done that series before I got started, it would have really helped me!). Be patient because it’s going to take time to build a subscriber base and generate regular traffic. Having a pillar post really helps too. Before I wrote my disk performance series I was getting 10-12 visitors a day and now I’m over 100 per day just for that series. Also, don’t be afraid to ask other bloggers for feedback and help. I’ve found the blogging community to be very open when I’ve asked for advice.

Brent: You’re on Twitter as @SQLDBA.  How’d you find out about Twitter, and what made you start using it?

I’d heard bits and pieces about Twitter from news sites, blogs, etc, but I never paid much attention to it until the 2008 PASS summit. I didn’t get to go to the summit but I subscribed to the RSS feed for #sqlpass and got a kick out of what everyone was tweeting. That prompted me to sign up for an account and now I’m hooked!

Brent: The call to speakers is out for the PASS Summit in Seattle, and I know you’ve done some speaking.  Are you submitting any sessions this year?  I wanna encourage you since my favorite sessions always involve storage, heh.

Absolutely! One of my goals this year is to attend PASS as a speaker. And yes, the disk performance stuff will definitely be one of my abstracts. I haven’t submitted anything yet but when I do I’ll post the details on my blog.

Brent: Sounds good, looking forward to it!  Readers: swing by Kendal’s blog and check it out.


Open Letter to Non-Technical Friends with Windows Machines

12 Comments

First, I’m sorry about the virus you got.  Viruses, I mean, plural.  I totally don’t mean to kick you while you’re down, but we need to talk.

Yes, I’ll fix it.  Yes, I totally understand that you were just surfing the web for legitimate business needs.  No, I won’t look in your browser history.

Achewood Tells It Like It Is

It’s going to take me about four hours to back your stuff up, strip out the viruses, make sure everything’s working okay, run the latest updates, put a real antivirus program on there, put a safer browser on there, and teach you how to use it all.  Go ahead and grab a beer and a couple of Tylenol. No, not for you – for me.  I know you think I love doing computer work, but this isn’t exactly the part that calls to me.

Did you know the Geek Squad charges $400 for in-home service for this, and $300 at the store?  Ouch.  That’d pretty much erase the cost difference between the iMac I told you to get, versus the cheap Windows machine you picked up.  And that’s just one instance – and we both know this is gonna happen again, just like it did a couple of years ago.  Ah, you thought I’d forgotten that, huh?

Now remember, when I give it back to you, you can’t surf any suspicious sites with this thing.  I’ve done what I can by putting a better browser on there, but it’s always an arms race between the good guys and the bad guys.  If you go to bad web sites, odds are you’re still going to get infected, no matter how much I set up ahead of time.  Don’t download movies or music from web sites, because they’re probably not legit.  Don’t open movies people send you in email, no matter who it is – even if it says it’s me, it’s not really me.  I know you don’t understand how that works, but you have to trust me.  And don’t even think about going to any “adult” sites.  Maybe I could install virtualization on your machine and give you a separate instance, but I’m not sure that you’d be able to remember which window was safe and which one was dangerous.  I’d end up coming over to give you regular refresher lessons about how to use the thing, and I know you’d get frustrated because you don’t want to spend any time learning – you just want to go surf and play.

In fact, if you’re going to insist on sticking with Windows, and you’re going to keep your valuable business stuff on there like your accounting, what I’d really recommend is that you get two separate machines: one for business and one for…pleasure.  Although of course, even if you confine your “adult” surfing to the pleasure machine, you’re still going to get viruses on it, and you’re still going to rack up those $400 Geek Squad bills.

Or you could just buy a 20″ iMac for $1,000 or a Macbook for $1,300 and be done with it.  Your call.  You could even pick up a Mac Mini for under $500, but you have to bring your own keyboard, and quite frankly, you need a new keyboard.  This one is filled with stuff, and I don’t mean crumbs.

No, it doesn’t run a lot of software, but all you’re really doing is surfing the web, listening to music & movies, checking your email, and using Microsoft Office, and it’s great for all of those.  Best of all, you can go to all kinds of, uh, “web sites” and you won’t get infected, and I know how important that is to you.  Or at least, I imagine it would be if I’d checked your browser history.  Okay, look, I didn’t even have to check the history, because my anti-spyware stuff scrolled through a bunch of web site names as it was ripping out cookies.  We’re still friends, but let’s just say I’m wearing gloves when I have to touch your keyboard.

No?  You still wanna save money, eh?  Well, here’s where the bad news comes in: this is the last time I’ll fix your computer for free.  Next time, I’m pointing you to the Geek Squad.  I’m not spending my weekends fixing your computers just because you want some free pr0n.


Reasons Why You Shouldn’t Virtualize SQL Server

Virtualization
100 Comments

I’ve blogged about why you should virtualize SQL Server, but it’s not all unicorns and rainbows.  Today we’re going to talk about some of the pitfalls and problems.

When Virtualization Goes Right

It’s Tougher to Get More Storage Throughput

Servers connect to Storage Area Networks (SANs) with Host Bus Adapters (HBAs).  They’re like fancypants network cards, and they come in either Fibre Channel (FC) or iSCSI varieties.  These components are the place to focus when thinking about virtualization.

If your SQL Server:

  • Has 2 or more HBAs connected to the SAN
  • Uses active/active load balancing software like EMC’s PowerPath to get lots of storage throughput
  • Actually takes advantage of that throughput

Then you’ll be dissatisfied with the current state of storage access in virtualization.  Generally speaking, without doing some serious voodoo, you’re only going to get one HBA worth of throughput to each virtual machine, and that’s the best case scenario.

If you’re running multiple servers on the same virtual host, the IO situation gets worse: it becomes even more important to carefully manage how many SQL Servers end up on a single physical host, and more difficult to balance the IO requirements of each server.

Never mind how much more complex this whole thing gets when we throw in shared storage: a single RAID array might have virtual server drives for several different servers, and they can all compete for performance at the same time.  Think about what happens on Friday nights when the antivirus software kicks off a scheduled scan across every server in the shop – goodbye, performance.

No-Good Liar

It’s Tougher to Get Good Performance Reporting

Let’s look at the very simplest performance indicator: Task Manager.  On a virtual server, Task Manager doesn’t really show how busy the CPUs are.  The CPU percentages are a function of several things, and none of them are transparent or detectable to the database administrator.

  • Other virtual servers might be using up all of the CPU.
  • The virtualization admin might have throttled your virtual server.  They can set limits on how much CPU power you actually get.
  • Your host’s CPU can change.  Your server can get moved from a 2GHz box to a 3GHz box without warning.

And even if you dig into the underlying causes to find out what’s going on, there’s no reporting system that will give you a dashboard view of this activity over time.  You can’t look at a report and say, “Well, last Thursday my production SQL Server was hitting 100% CPU, but it’s because it was on a slow shared box, and on Thursday night at 5:00 PM it was migrated live over to a faster box, and that’s why pressure eased off.”

Not Everything Works As Advertised

Virtualization vendors have some amazing features.  We talked about vMotion and Live Migration, the ability to move virtual servers from one physical host to another on the fly without downtime.  While that does indeed work great, it doesn’t necessarily work great for every server in every shop.  If you’ve got a heavily saturated network, and your SQL Server’s memory is changing very fast (like in high-transaction environments or doing huge queries), these features may not be able to copy data over the network as fast as it’s changing in memory.  In situations like this, the live migration will fail.  I’ve never seen it bring the virtual server down, but I’ve seen it slow performance while it attempted the migration.

New features and new versions of virtualization software come out at a breakneck pace, and like any other software, it’s got bugs.  A particularly nasty bug surfaced in VMware ESX v3.5 Update 2 – on a certain date, VMware users couldn’t power on their servers because the software thought their licenses had expired – even when they hadn’t.  Imagine shutting down a server to perform maintenance, then trying to turn it back on and getting denied.  “Sorry, boss, I can’t turn the server back on. I just can’t.”  It took VMware days to deploy a fixed version, and in that time span, those servers just couldn’t come back on.

That’s an extreme case, but whenever more complexity is introduced into the environment, risk is introduced too.  Injecting virtualization between the hardware and the OS is a risk.

It’s Not Always Cost-Effective

All of the virtualization vendors have a free version of their software, but the free version lacks the management tools and/or performance features that I touted in my earlier articles about why sysadmins want to virtualize your servers.  The management tools and power-up editions cost money, typically on a per-CPU basis, and there’s maintenance costs involved as well.  If your virtualization strategy requires isolating each SQL Server on its own physical host server, then you’ll be facing a cost increase, not a cost savings.

Combining multiple guest servers onto fewer physical servers still doesn’t always pay off: run the numbers for all of your virtualization tool licenses, and you may end up being better served by a SQL Server consolidation project.  I did a webcast last year with Kevin Kline and Ron Talmage about choosing between consolidation and virtualization.  That information is still relevant today.

My Virtualization Recommendations for SQL Server

My recommendations are:

  • Virtualize only when it’s going to solve a problem, and you don’t have a better solution for that problem.
  • Get good at performance monitoring before you virtualize, because it’s much tougher afterwards.
  • Start by virtualizing the oldest, slowest boxes with local storage because they’ll likely see a performance gain instead of a penalty.
  • Avoid virtualizing servers that have (and utilize) more than 2 HBAs.

Why Would You Virtualize SQL Server?

Virtualization
16 Comments

I talked about why sysadmins want to virtualize servers, and today it’s time to talk about why you as a DBA or developer might want to virtualize some of your SQL Servers.

Virtualization Means Cheap High Availability

Yo momma was an early adopter. That’s right – you’re adopted.

In a perfect world, every SQL Server would be a cluster on a SAN with instant failover.  This isn’t a perfect world, and we can’t afford to implement that level of protection for every server in the shop.

Virtual SQL Servers provide a level of high availability without the expense and hassle of clustering.  In my last post, I explained that sysadmins love virtualization because they can easily move guest servers around from one piece of hardware to another.  As a DBA, I love it for that same reason: suddenly, if the underlying hardware fails, I’m not sweating bullets.  If the RAID card fails, if the motherboard fails, if the memory fails, if the network card blows chunks, you name it, it’s not my problem.  The sysadmin simply boots up my SQL Server on another virtual host, and I don’t have to deal with screaming users.

I especially like this option for development and QA servers.  Management is rarely willing to implement expensive high-availability setups for development environments, but when the dev server goes down, the developers and project managers are kicking my door down.  The PMs scream about how many expensive developers are sitting around idle waiting for the server to come back up.  Virtualization avoids this problem altogether.

Virtual SQL Servers can React Quickly to Changing Needs

Capacity planning is hard work.  How much speed do we really need for a new application?  What happens if our estimates are wrong, and we suddenly need a whole lot more oomph?  What happens if we overbought, and we’ve got all this expensive hardware sitting around idle because the application never caught on with end users?

Virtual servers can be easily scaled up or scaled down.  Simply shut the virtual machine down, change the number of CPUs and memory it has with a few mouse clicks, and boot it back up again.  Presto change-o.

We can even move storage around.  Say we initially decided to put the server’s data on a cheap RAID 5 SATA array, and now our users have fallen in love with the application and they’re adopting it like crazy.  We can shut it down, move the virtual hard drives over to a faster RAID 10 SAS array, and boot it back up again.  Presto change-o, rapid storage performance improvements without long downtime windows to backup/restore.

I loved this approach because it let me keep old SQL Servers around as long as necessary.  For example, we decommissioned an old help desk database server, but I told the help desk folks they could keep it around as long as they wanted.  It hardly used any resources at all, because nobody queried it unless they had a question about an old help desk ticket that didn’t get transitioned correctly to the new system.  I set the server up with one virtual CPU and 1GB of memory, and put it on cheap RAID 5 SATA.  The end users loved me because I could provide the service for no cost.

Virtual Server Performance Isn’t That Bad

Virtualization has a really bad reputation because in the beginning, the software really did suck and admins didn’t understand how to use it.  Raise your hand if you installed VMware Workstation on your desktop years ago, and you tried to run multiple operating systems on a machine that barely had the horsepower to run Internet Explorer.  I was there, and I know how it was.

Things have changed: the software’s gotten much better, memory became absurdly cheap, and there are resources to help you use it better.

Busted!

The 2008 book Oracle on VMware by Bert Scalzo is a good example.  Bert’s one of the Oracle experts at Quest, and he wrote this to help DBAs understand how to implement database servers in a virtual environment.  It does a good job of explaining enough about virtualization to make a DBA comfortable with the technology and avoid some common pitfalls.  Years ago, books like this weren’t available because the technology was changing so fast that best practices were wrong as fast as they were written.

You can still get really bad performance with virtualization, just as you can get really bad performance with SQL Server on native hardware.  But follow a few basic guidelines like avoiding oversubscription, properly configuring networking & storage, and using the right OS configurations, and you can get perfectly acceptable performance.  Yes, you’re still going to lose a few percent of overhead – but with today’s redonkulously fast CPUs and memory, I’m more concerned about optimizing T-SQL code than I’m concerned with losing 5% of my CPU speed due to virtualization.  It’s all about priorities, and that small performance loss is just one piece of the picture.

Virtualization is All About Give and Take

The economy has gone down the toilet.  Virtualizing non-mission-critical SQL Servers gives us the chance to cut costs without cutting service levels, and furthermore, gives us the chance to show management that we’ve got skin in the game.  If we insist that every single SQL Server is too important to be virtualized, and that there’s no way we can save money, then management just might look at us as part of the problem.  Be part of the solution: be proactive, find SQL Servers with low utilization levels, and offer them up as the first phase.

My approach was to start with the SQL Servers that I frankly didn’t really want to survive on their own anyway.  I had one particular application that had always demanded its own standalone SQL Server with no other applications on it.  The server was barely used, and it was a completely out-of-date and unsupported version of SQL Server.  The application manager wouldn’t allow me to patch it, wouldn’t upgrade to a newer version, and didn’t have the budget to move to newer hardware.

Virtualization administrators have slick tools like VMware Converter and Vizioncore vConverter to automate the physical-to-virtual (P2V) process.  These solutions take a backup of the physical server and clone it into a brand-new virtual server.  When the virtual server starts up, the new virtualization drivers are installed automatically, and the server is up and running smoothly without time-intensive application reinstalls.  I’ve blogged about virtualizing SQL Servers with P2V and I’ve performed it many a time – it’s a great way to get rid of hardware dinosaurs.

Even better, this whole process doesn’t require work from the DBA.  The time and effort is all put in by the virtualization administrator.  If your sysadmin is dying to try virtualizing a SQL Server, you can point them at your newly chosen dinosaur and tell ’em to have at it – you don’t have to be involved.  That’s the best of both worlds: management will see you as a part of the solution, but you won’t actually have to do any work.

My Best Practices for Virtualizing SQL Server on VMware


More On the Carbonite Backup Failures

Backup and Recovery
11 Comments

David Friend, the CEO of Carbonite, left a comment on my blog entry about Carbonite.  I applaud him for taking the time to do that, but the comment raised some ugly questions.  It appears that they were putting data on 15-drive RAID 5 arrays.  RAID 5 is the most cost-effective array setup (other than RAID 0, which offers no data protection).

RAID 5 will only tolerate a single drive failure – if you lose more than one drive in an array, the whole array is gone and must be restored from backup.  As SATA drives grow larger and larger in capacity, they take longer and longer to rebuild when one goes bad, because so much data must be copied over to the new drive.  If a second drive fails while the first one is being rebuilt, you’re completely out of luck, and must restore from backup.  The more drives you add in a RAID 5 array, the riskier it gets, because it’s more likely that one of the drives will fail in the time span of the rebuild.  That’s pretty dangerous for a company that makes its living off your backups being available.

Worse, it isn’t clear from the interviews I’ve seen that Carbonite actually backed up your data somewhere other than those RAID 5 arrays.  It appears that their attitude towards backup was that YOUR machine held THEIR backup: when they ran into problems with their RAID arrays, Friend commented that:

“Carbonite automatically restarted all 7,500 backups…”

Meaning, they started pulling data again from their clients’ machines rather than restoring it from tapes or other arrays.  This is further evidenced by interviews the Enterprise Storage Group conducted with Friend last year, before the lawsuit came out, as blogged by Steve Duplessie. When asked how Carbonite protected their backups, Friend replied that:

“This is backup – not archiving, so if that ever happened you’d still have your data on your PC.”

Like Bryan Oliver says, the only reason we do backups is to do restores.  If the answer to restore problems is to use your live server, that’s a failure.  Carbonite didn’t protect against regional internet outages, either:

“Regional internet outages (we use multiple redundant carriers) would take us offline if they all failed. But again, unless you were in the middle of a restore when it happened, you’d probably never notice.”

Translation: if you aren’t using our services, you’ll never notice when we’re down.  How about replication from one site to another?

“…we don’t replicate data across multiple sites. The likelihood of losing data because of software bugs or human error is probably orders of magnitude greater.”

Errr, not sure why he’d say that, since Carbonite wasn’t protecting against data loss due to bugs either.

I can see where Carbonite’s coming from: they view themselves as a cheap way to protect data, and it works most of the time.  That’s the same approach I take with Amazon S3 cloud storage for my personal backups, incidentally: for a few bucks a month, it’s a cheap insurance policy.  I have my data replicated across my laptop, my Time Machine external hard drive, and my VMware server.  If there’s a house fire, my stuff might be on Amazon S3.  It also might not.  Amazon hasn’t made me any promises about the safety and security of my data.

My problem with Carbonite’s approach is that they seemed to take my personal data backups even less seriously than I do, maintaining a single copy in a single place.  For all I know, maybe Amazon’s using that exact same approach.  Only time and lawsuits will tell.

For more commentary on the Carbonite problems, check out Steve’s blog post, “Head in the Cloud? Or Just up your……..?” (And yes, he apparently stole that from Wilbur.)


New PASS 2009 Summit Site Unveiled – #SQLPass

#SQLPass
0 Comments

PASS is on the ball this year!  They’ve got the web site and spiffy logo up and running for the 2009 Summit in Seattle.

PASS Summit 2009 Logo

Looking through the site, one change jumps out at me: the schedule this year will be Tuesday through Thursday, with pre-conferences running on Monday and Friday.  In the past, the summit has run Wednesday through Friday, with extra-cost, day-long pre-conference sessions on Monday and Tuesday.

I love this change.  If you only want to attend one pre-conference session, now you don’t have to worry about possibly spending an extra day sitting around.  Last year I had a pre-con on Monday, but not on Tuesday.

Plus, it seems that a lot of people want to fly out on Friday afternoon to make it home to their families.  Now, you can do that without missing any of the good conference sessions.

It’s a tough economy, and I’m curious to know how many people were originally planning on going to the summit but are thinking about bailing due to the costs.  Last year, I wrote a post about justifying the costs of the PASS Summit, and I’m just as convinced today as I was back then.  If I hadn’t gone to the PASS Summit, I wouldn’t have met Jimmy May, who got me involved in my book deal.  I’m not saying you’ll get a book deal if you go to PASS, obviously, but good things happen when you put yourself out there.

To stay informed with the latest PASS developments, follow @SQLPass on Twitter, and if you use an RSS reader, subscribe to the Twitter search for #SQLpass tweets.


Why Your Sysadmin Wants To Virtualize Your Servers

Virtualization
14 Comments

DBAs and developers: right now, your Windows admin is talking to management about all the time, money and hardware he can save by virtualizing servers.

He might not start with SQL Servers first, because frankly, he’s scared of database administrators.  DBAs only have one answer: “No.”

Your sysadmin knows how you roll.

Thing is, though, he’s going to start by eating his own dogfood: he’s going to virtualize the servers he manages himself, like file & print servers, secondary domain controllers and DNS servers.  He’s going to prove that it makes sense, and he’s going to get excited about how much easier it makes his job.  He’s going to sell it to management, and because it does make sense for servers like that, they’re going to get excited too.  The next thing that happens is they’ll mandate virtualization across the board – including your precious SQL Servers.

I know, because I used to manage VMware servers, and yes, we had some virtual SQL Servers.

To understand why it happens, it helps to be armed with knowledge about why virtualization really does make sense for a lot of servers.  Today, I’m going to cover some of the more popular benefits.

Virtual Hardware Drivers and Firmware Aren’t Tied Into the OS

Virtualization abstracts the hardware away from the operating system.  When you run VMware ESX on an HP BL460c blade, for example, the virtual servers running inside ESX don’t need any HP drivers.  Instead, they use a set of VMware video, network, and storage drivers.  These VMware drivers are the same no matter what kind of video card, network card or storage adapter ESX is hooked up to.

This enables sysadmins to shut a virtual server down, copy it to another host server, and start it up without going through a messy driver installation.  It just boots right up as if nothing’s changed.  Some restrictions apply: some older virtualization platforms and some older servers may still install tweaked drivers, like if you move from an AMD to an Intel server or vice versa.  Newer families of processors and newer versions of virtualization software do a better job of hiding these differences from virtual guests.

When sysadmins don’t have to hassle with hardware drivers and compatibility, it makes troubleshooting and maintenance easier.  The more servers in the shop (especially servers of different brands), the easier this gets.

In addition, this makes hardware changes easier.  When servers get old, their annual maintenance fees from the manufacturer increase.  Eventually, it becomes more cost-effective to buy a new server than it does to continue paying maintenance on an old, underpowered server.  Normally this would mean time-intensive reinstalls (or risky backup/restore tricks), but with virtualization, the sysadmin can simply shut the old server down, move the virtual guests onto faster/newer/cheaper hardware, and start them back up again.

Even better, if you’re running virtualization with SAN-based servers, you can move virtual guests around without even shutting them down.

Move Virtual Servers Between Hosts On The Fly

This concept is key to a lot of the benefits I’m going to describe later.  I know you’re not going to believe this, but it really does work.  For VMware ESX users, it’s been working for years.  It requires that the VMware hosts use a SAN, and all servers need to be able to see the same storage so that any virtual server can be started up on any host server.

Here’s how vMotion works in a nutshell:

  1. The VMware admin right-clicks on a guest server and starts the migration process by picking which new host server it should run on.
  2. VMware copies the contents of the server’s memory over the network to the new host, and keeps them both in sync.
  3. When they’re identical, VMware transfers control over to the new host’s hardware.
  4. The guest server is now running on a completely different server.

When it’s done right with high-speed networking and fast servers, the handoff can be completely transparent to end users.

If you haven’t seen vMotion in action, again, I know you’re not going to believe me, but trust me that it works.  I vMotioned servers all the time from one host to another and nobody had a clue.  Well, granted, some of my users were clueless anyway, but that’s another story.

This ability to slide virtual servers around to different hardware at any time, in real time, without taking the virtual server down, opens up a world of benefits.

Do Routine Maintenance Tasks During The Weekday

Even though virtualization removes problematic hardware drivers from the guest OS, there’s still hardware maintenance to be done.  The sysadmin still has to update firmware, update the BIOS, fix broken hardware, and move hardware around in the datacenter.  The difference is that with virtualization, the sysadmin can do these tasks on Tuesday at 10am instead of Saturday at 10pm.  He can simply evacuate all of the virtual guests off a server in real time, then take his time doing the necessary maintenance work during the day while he’s chock full of coffee.

I gotta confess: as a DBA, I’m jealous of this capability.  Sure, in theory, SQL Server 2005’s database mirroring meant that I could move databases from the production server to a secondary server with a minimum of downtime, but it’s still very noticeable to connected applications.  The connection drops, transactions fail, and my phone rings.  I long for the day when I can move databases around the datacenter undetected.  Until then, I’ll be coming in to work on Saturday nights.

Better Utilization Rates on Cheaper Hardware

Just in case you’ve been busy reading SQL Server Magazine and ignoring Fortune, we’re having a little problem with the economy right now.  There isn’t one.  (By the way, you might want to check the rest of your mail too, because Bernie Madoff has some bad news about your account.)

Utilization isn’t a metric DBAs normally deal with, but for sysadmins, it means the percentage of horsepower we’re actually using on our servers.  To see an oversimplified version, go into Task Manager on your desktop and look at your CPU utilization rates.  It’s probably low, and it’s probably low on your SQL Server as well.  If you’re averaging 10%, stop and think for a minute about what it might be like to use 1/10th the number of servers and still have enough power to get the job done.

Granted, there’s issues with peaks and valleys, and we have to make sure all servers don’t need full power at the same time.  But if we pooled enough resources together, we could still easily cut the amount of hardware in half and still be way overpowered.

Virtualization enables sysadmins to do this because they can run more virtual guests per physical host.  It’s not uncommon to see 8-15 virtual servers running on a single $10,000 blade server.  More servers in less space, using less power, requiring less cooling and less networking gear – it’s a pretty compelling story to companies looking to cut costs. Back when I worked for Southern Wine, I wrote an HP c-Class Blade Chassis review, and you might find that interesting.

Would you rather have ’em cut hardware costs, or cut people costs?

Easier Capacity Growth & Planning for Virtual Servers

When servers share resources in groups, it’s easier to reallocate hardware and plan for growth.  When I managed our VMware farm, I had enough capacity that I had wiggle room.  If someone needed a testbed server for a few weeks, I could easily spin up a new virtual server for them without stress.  I didn’t have to hunt around for leftover hardware, find space in the datacenter, get it wired up, and so on.  I simply deployed a template server (a simple right-click-and-deploy task with VMware, happened in a matter of minutes) and added it to the domain.  When they were done, I deleted the virtual server.  It’s a great way to win friends and favors.

At budget time, I didn’t have to take a magnifying glass to every single server to figure out exactly how it would grow and how much money I’d need to spend.  I just budgeted a ballpark number for growth, and incrementally added servers to the resource pool over the year.  When my reseller could cut me a great deal, I picked up new blades.  I wasn’t forced to rush purchases at the last minute without negotiating for good prices.

So Why Should You Care About Virtualization?

As a DBA or developer, most of these benefits don’t really matter to you.  You don’t care whether your sysadmins have to work weekends, or whether they have an easier job managing capacity.  I just wanted to explain to you why they’re going to push virtualization, because there are indeed some real benefits for sysadmins.  There are benefits to you too, but your sysadmin isn’t going to realize those, and isn’t going to sell you on ’em.  In my next post, I’ll explain why you might want to virtualize some of your SQL Servers.

My Best Practices for Virtualizing SQL Server on VMware

I’ve got a 5-hour video training session on how to manage SQL Server in VMware and shared storage. Check it out now!


Getting me to speak at your user group

0 Comments

I just updated my Upcoming Events page to include:

  • More upcoming events
  • How to get me to speak at your local user group meeting
  • Links to video archives of my past events

To quote Abraham Lincoln, for those who like that sort of thing, I should think it is just about the sort of thing they would like.


Dev, Test and Production SQL Server Environments

In a perfect world, my test/QA servers get restored nightly from production. Let’s say every night at 9pm, the production full backups kick off, and they’re finished by 10pm. At 11pm, the QA box kicks off a restore job that grabs the latest full backups off the file share and restores them. The production backups are written to a file share, never local storage, so there’s no additional overhead on the production box for this restore process.
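Here’s a minimal sketch of what that 11pm restore job might look like – the database name, share path, and file locations are placeholders, and a real job would pick up whatever the newest backup file is rather than hard-coding one:

    -- Minimal sketch of the nightly QA refresh; names and paths below are made up.
    RESTORE DATABASE SalesApp
        FROM DISK = N'\\backupshare\prod\SalesApp_full.bak'
        WITH REPLACE,                                         -- overwrite last night's QA copy
             MOVE N'SalesApp_Data' TO N'D:\SQLData\SalesApp.mdf',
             MOVE N'SalesApp_Log'  TO N'L:\SQLLogs\SalesApp_log.ldf',
             STATS = 10;                                      -- progress messages every 10%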

Now when QA tests deployment scripts in the QA environment, they can be reasonably sure that the scripts will perform the same way in production.  Sometimes developers and QA staff assume that production is exactly like development: they’ll make small tweaks over time to the dev server and schema, and their deployment scripts will fail because the production server is missing those tweaks.  If the QA server is restored every night, that possibility of failure is reduced.

Developers love this fresh-QA-box approach because it gives them fresh data every morning for testing.  Want to find out how a new query would perform with real production data?  Point it at the QA box.  Development environments are rarely refreshed from production because people might lose work, but an automated QA restore system means there’s always a non-production box with the freshest data possible.

The next perk is that now I’ve got an automated fire drill restore system going.  Every night, I know for sure that my production backups worked.  Unfortunately, this can also bite me in the rear: if my production backups fail for some reason, then my QA system is unusable until I fix it.  Backup failures suddenly become much more time-sensitive, and I may not have enough time to rerun the production backups.  To prevent problems, I recommend keeping at least two days of backups online so that you can restore from the previous (and successful) backup if the normal one fails.  I wasn’t ever ambitious enough to implement an automated restore-the-second-backup script because my production backups so rarely failed.

This approach doesn’t work if:

  • You’ve got sensitive data in production (credit card data, medical data, and other things you don’t want developers to have free reign over)
  • You can’t restore your production database fast enough (in which case, you probably want to work on tuning that anyway in order to meet your RPO and RTO)
  • Your QA box isn’t big enough to handle production (in which case you don’t know that dev code will scale to production anyway)
  • Your QA team needs 24/7 access to the QA box (in which case, I have questions)
  • You don’t do full backups nightly (if you’re doing diffs, you’ll just need a more complex auto-restore script like sp_DatabaseRestore – see the sketch below)
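If you’re in the diff-and-log-backup camp, a call to sp_DatabaseRestore looks roughly like this.  I’m writing the parameter names from memory, so treat this purely as a sketch and check the current documentation before relying on it – the paths are placeholders too:

    -- Rough sketch only; verify parameter names against the sp_DatabaseRestore docs.
    EXEC dbo.sp_DatabaseRestore
        @Database       = N'SalesApp',
        @BackupPathFull = N'\\backupshare\prod\SalesApp\FULL\',
        @BackupPathDiff = N'\\backupshare\prod\SalesApp\DIFF\',
        @BackupPathLog  = N'\\backupshare\prod\SalesApp\LOG\',
        @RunRecovery    = 1;  -- bring the database online once the restore chain finishes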

Never Restore QA From Development

Some developers like to have a QA server that they restore from development periodically, and then tell the QA people to test their work on the QA server.

The only time I allow that scenario is when they’re going to take the same approach with the production server.  If their production upgrade plan is to take down production and restore the dev database onto it, then restoring QA from dev is a valid approach.  Otherwise, they’re cheating!

Restoring QA from production forces the developers to script out their deployments instead of point-and-click-table-changes in SSMS.

The Perfect Test/QA Server is Identical to Production

Keeping identical sets of CPUs, memory and storage in both environments means that you can run performance testing with confidence before a new dev query knocks the production server over.  This means query execution plans on each environment should be identical.  If the QA server has a different CPU count or memory size, your query plans can be different, making it harder to predict production performance.

For example, I worked in one environment where the servers were configured as:

  • Production – dual quad-core CPUs (2×4, 8 cores total) with 128GB RAM
  • QA – four single-core CPUs (4×1, 4 cores total) with 16GB RAM

Unfortunately, when the SQL Server engine builds an execution plan for a query, it’ll build differently based on the number of CPUs and the amount of memory.  In this environment, the queries and indexes were tuned for the QA box, but production performance never seemed to match up with their expectations.  Keeping identical hardware and identical database configurations mitigates that risk.
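One quick way to spot hardware and configuration drift is to run the same couple of queries on both boxes and compare the output – a rough sketch, and note that exact column names in the DMV vary a little by SQL Server version:

    -- Compare these results between QA and production.
    SELECT cpu_count, hyperthread_ratio
    FROM sys.dm_os_sys_info;

    SELECT name, value_in_use
    FROM sys.configurations
    WHERE name IN (N'max server memory (MB)',
                   N'max degree of parallelism',
                   N'cost threshold for parallelism');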

In the real world, we can’t always afford to keep these two environments identical, especially in infrastructure-as-a-service cloud providers when we’re paying every moment that a resource sits idle. Everything’s a compromise. The closer you can make test/QA to production, the more accurate your testing will be – but the more you’ll spend doing it.


Another backup failure: Carbonite

Backup and Recovery
32 Comments

TechCrunch reports that Carbonite, an online backup company, lost customer data.

But wait, this is different: it’s not their fault.  They’re suing Promise Technology, makers of popular storage gear, for selling them bogus equipment.  Bogus equipment?  You mean, like hard drives that fail?  That’s horrible!  Who could expect something like that?  Who could know about the dangers that lurk around every corner?

The Statistics are Staggering Alright

Carbonite’s web site warns, “You need to be aware that losing your most valuable files is a very real possibility.  You need to take proper precautions.”

Who knew they were referring to their own services?

Don’t point and laugh and say it could never happen to you because you do your own backups in-house, because I’ve seen too many backup strategies fail for too many reasons.  For the love of your own job, never mind your company’s revenue stream, take some time this week to:

  • Automate your backup testing – build a set of T-SQL scripts to automatically restore your production databases onto another server (there’s a starter sketch after this list).  Restore a different server every day onto the same target testbed box.
  • Test your backups manually – if you don’t have the time to script the tests, just go run a restore of your largest backup.  Ideally, check the ones that hit tape, because those are the most risky.
  • Check every server’s job logs – I’ve seen so many cases where backups stopped working on a SQL Server, and the alerting had long ago stopped alerting anyone, too.  These two failures are a 1-2 punch to the jaw of your career.
  • Find your single points of failure – if you’re relying on a single cloud vendor for all of your data protection, that’s a risk.  If you’re backing up straight to tape and you’ve only got one tape jukebox in-house, that’s a risk.
  • Figure out who you’re going to sue – because hey, work is hard.  If you can’t do it right, get rich trying.
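As a starter for that first bullet, here’s a minimal sketch – the path is a placeholder, and a real script would loop through every production database instead of one hard-coded file:

    -- 1. Make sure the backup file is at least readable.
    RESTORE VERIFYONLY FROM DISK = N'\\backupshare\prod\SalesApp_full.bak';

    -- 2. Catch databases whose full backups quietly stopped running.
    SELECT d.name,
           MAX(b.backup_finish_date) AS last_full_backup
    FROM sys.databases d
    LEFT JOIN msdb.dbo.backupset b
           ON b.database_name = d.name
          AND b.type = 'D'                    -- 'D' = full backup
    GROUP BY d.name
    HAVING MAX(b.backup_finish_date) < DATEADD(DAY, -1, GETDATE())
        OR MAX(b.backup_finish_date) IS NULL;

Remember, though, VERIFYONLY only proves the file is readable – the only real test of a backup is an actual restore.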

Getting Help With A Slow Query

Execution Plans
3 Comments

StackOverflow users often ask, “I’ve got this query that runs slow – can you help?”  Here’s a few tips to get better, faster answers.

Get the query execution plan.

DBA Porn
Database diagnostics goodness

In SQL Server Management Studio, when you’re looking at the query editor window, click Query, Include Actual Execution Plan.  It’ll show a fancy-pants flow chart diagram thingy that gets experienced DBAs all excited. Run the query, and you’ll get a new Execution Plan tab in the results.

Right click on the execution plan and click View XML. Copy/paste those contents, and then go to PasteThePlan.com. There, you can upload your query plan to share a link with the public.

This diagram does not include the results from the query (like your customers or sales data), but it does include information about your database schema (tables, indexes, views).  If your schema is vitally secret, well, frankly, you need to hire a DBA, because you’re also probably not encrypting anything.  But I digress.

In your StackOverflow question, include the link to this query plan.
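If you can’t re-run the query – say it only misbehaves overnight in production – you can usually fish the cached plan out of the plan cache instead.  It’s the estimated plan rather than the actual one, but it’s better than nothing.  A rough sketch, with a made-up filter you’d replace with text from your own query:

    -- Pull cached plans without re-running the query; change the LIKE filter to match your own text.
    SELECT TOP (10)
           st.text,
           qs.execution_count,
           qs.total_elapsed_time,
           qp.query_plan
    FROM sys.dm_exec_query_stats qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
    CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) qp
    WHERE st.text LIKE N'%YourTableName%'
    ORDER BY qs.total_elapsed_time DESC;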

Run sp_Blitz to generate info about your server.

sp_Blitz is like a health check for your server. Run it like this:

EXEC sp_Blitz @OutputType = 'markdown', @CheckServerInfo = 1;

This generates a text-friendly health check that you can copy/paste into your Stack Overflow question to give people an idea of the kind of server hardware you’re dealing with.

What happens if…

If a query takes 5 minutes to run, then how long does it take to run if you immediately execute it again with exactly the same parameters?  If it takes 5 minutes again, then we’ve got a different set of problems than if it takes a few seconds the second time.

If the query runs slowly from your application, then try running the exact same query from SQL Server Management Studio.  If you’re running a query with dynamic SQL, like sp_executesql, try running it just by itself.  Copy the string out, paste it into a new query window, and run it.  Is it faster?  Does the execution plan look different?  If so, grab that and upload it as well.

If the query uses parameters, and it’s slow with some parameters but not others (like some customers run fast, but other customers run slow), then it might be a plan problem or a statistics problem.  Run it with both sets of parameters (the fast and the slow) and include both sets of execution plans with your question.
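The easiest way to capture both plans is to turn on the actual execution plan and run the query back to back with each set of values.  The procedure name and parameter values here are hypothetical – substitute your own:

    -- Hypothetical procedure and parameters; run with Include Actual Execution Plan turned on,
    -- and save each plan separately so you can attach both to your question.
    EXEC dbo.usp_ReportOrders @CustomerID = 101;   -- a customer that runs fast
    GO
    EXEC dbo.usp_ReportOrders @CustomerID = 202;   -- a customer that runs slow
    GO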

Check your Windows event logs.

On the SQL Server, go into Control Panel, Administrative Tools, Event Viewer, and look at the System and Application logs.  If you see any alerts that aren’t “Informational”, you might have a problem on the server.  Examples include RAID array rebuilds due to a failed hard drive, memory failures, SAN controller errors, or an application that keeps crashing.

Sometimes, even Informational messages point to things that are making your query run slow.  For example, I’ve seen instances where people complained about slow query speeds off and on through the day, and I found out that the antivirus software on the server was doing frequent definition updates.  After each update, Symantec/Norton Antivirus does a scan of memory and a few files, which briefly brings performance to a crawl.  This doesn’t show up as an error, just as an informational message, but it’s still a problem for you.

I wouldn’t post these on the internet, though, because they can include some detailed information about your server.  If you’ve got questions about a specific event, it’s best to take a screen shot of that one event’s details and just post that.

Follow up on your question fast.

After you post the question on StackOverflow, set yourself a one hour timer with your phone, your computer, your microwave, whatever.  After an hour, go back to the site, and answer any and all comments about your question that have popped up since.  People will often need more information to solve your problem, and you want to catch ’em while you’ve still got their attention.  Continue this for the rest of the first day (revisiting hourly) and then set yourself a to-do item in your task list to go back each morning and follow up again.

I’ve seen so many questions that get answered, but the original questioner never revisits the site to find out what the problem was.  Even if you solve it on your own, at least go in there and make a note of that so that other people aren’t pulling up your question over and over to read through it.


PASS Summit 2009 Call to Speakers Open! #SQLPass

#SQLPass
3 Comments

The Professional Association for SQL Server Summit is being held again this November in Seattle, and the Call to Speakers is now open.  You can only submit 4 sessions, which seemed like a small limit to me until I realized that I do entirely too many presentations.

On Twitter, somebody mentioned that it’d be a good idea if we could see what abstracts had been submitted so that we didn’t all try to cover the exact same topics.  With that in mind, I’m listing my abstracts here, but lemme tell you something: don’t let my abstracts hold you back from submitting sessions on the same topics as me.  Well, not the exact same – at least change the wording, maybe fix my typos.

Famous or Infamous? Turn Your Brand Up to Eleven

Blogging and Twittering aren’t just social distractions: they can be instrumental in your career by helping you to make more money, get yourself in front of the right clients and put you ahead of other job candidates.

Brent Ozar has been blogging at BrentOzar.com since 2002, and it paid off in 2008 when he was hired at Quest Software as a SQL Server Expert. He’ll explain how blogging and Twittering helped his career, and why he believes social networking and brand-building will be critical in the coming years.

Tom LaRock started blogging at SQLBatman.com using Brent’s guidelines. He’ll act as a devil’s advocate, and help draw the line between zealous online marketing and practical tips for people who make a living doing database administration, not blogging. He’ll explain what parts are easy for DBAs to do, and what parts require time and attention.

Session Goals:

  • Learn how to start and configure a blog and a Twitter account
  • Learn how to position yourself on the Internet and get noticed in all the noise
  • Hear real-world stories about when blogs turned from positive assets into dangerous liabilities

Yes, I’m Actually Using The Cloud

There’s a lot of hype around cloud-based databases. After you get past the knee-jerk reaction about security, what else matters? Is it time to buy in, and what should you watch out for? Brent explains some of the pros and cons he’s experienced running SQL Servers in the cloud, and will demonstrate how easy it is to fire up a new SQL Server in the cloud.

Brent’s involved with StackOverflow.com as an advisor, and he’ll talk about the decisions they made about whether to host production and/or disaster recovery servers in the cloud.

Session Goals:

  • Learn to estimate an application’s costs in the cloud
  • Learn options for cloud-based disaster recovery
  • Learn how to talk to developers & managers about cloud database options

DRP101: Learn the Difference Between Your Log and Your Cluster

Developers and accidental DBAs: if you know more about how SQL Server handles crashes and disasters, you’ll be able to make a better decision about how to prepare. In this session, Brent will cover all of SQL Server’s backup and high availability options at a high level, including clustering, log shipping, mirroring, replication and more. He’ll show the pros and cons of each, and teach you how to pick the right method for your application.  We won’t have enough time to dive into actual implementation demos due to the number of solutions we’ll cover, but we’ll show screen shots and give links to the best resources for each method.

No prerequisites!  This session is targeted at DBAs and developers who don’t know their cluster from their logs.

Session Goals:

  • Learn the difference between high availability and disaster recovery
  • Learn real-world drawbacks of each solution
  • Learn which methods complement each other for even better protection

Social Networking for IT Professionals

Jason Massie and I are co-submitting this one, and we’re putting the abstract for this together over the weekend.

I find this hilariously appropriate because Jason & I met each other via Twitter, started a web site together, and I’ve never even met the guy.  If this session gets approved, we’re going to be meeting for the first time at the very conference where we’re going to speak about social networking!  It’d be even funnier if we pledged not to see each other face to face until the session is about to start, but I dunno if I can go that far.  Only if the session is scheduled for the first day.

Submit Your Abstracts Today!

I submitted abstracts last year and got turned down.  At the last minute, a few speakers couldn’t make it, and PASS asked me if I could go forward with one of my presentations.  For me, that was the best possible thing to happen: I got the privilege to speak at PASS, but I didn’t have any of the worries ahead of time like polishing my presentation over and over to make sure it was good enough.  They were just happy to have me speak.

I’m hoping that same approach works this year: I’ve asked the selection committee to please turn my sessions down, and I’ll write them anyway in hopes that another speaker gets food poisoning or hit by a bus.  In order for this plan to work, I need other speakers to be approved, and that’s where you come in.  Give it your best shot – I’m counting on you.  Thanks.  I’ll send flowers.


Should I Specialize?

27 Comments

A reader asked a great question, and I just had to share it in a more general form because it applies to so many of us production database administrators.

He’s currently a production SQL Server DBA, and his company’s bringing in a new application with a SQL Server back end.  He’s doing a good job as a production DBA, so the company offered him the chance to be the application administrator.  He’ll need to learn the application and specialize in it, and he won’t be doing as much – if any – SQL Server work in his new role.  Should he take it?

So many applications offer this same opportunity for database administrators: accounting software, sales force automation systems, document management software, I could go on and on.  For me, the choice boils down to a few basic questions.

Can you bet your career on this application?

Around 2001-2002, my company was preparing to retool our software from scratch, and we had to decide between Java and .NET.  A few of us went to training classes for both platforms, did some basic development in each, and held meeting after meeting debating the pros and cons of each platform.  None of us had a really clear feeling about what the Way of the Future would be, but we started to realize that none of the options out there were the right answer.  It was entirely possible that ten years down the road we’d be retooling everything again due to unseen weaknesses in the platform we’d chosen.

No matter what language we chose, it would have been the third programming language I’d learned in my short programming career (not including things like batch files or HTML).  I realized the rest of my development career would be a continuous cycle of:

  1. Learning a language
  2. Getting good enough at it to build good applications (by building several bad applications first)
  3. Maintaining my initially bad applications, and rebuilding them with the new best practices for the language when things went wrong
  4. Deciding which language would solve the new business problems coming down the line
  5. Going back to step 1

Ugh.  I bailed out of programming and focused on database administration because ANSI SQL works not only in nearly every current programming language, but even on most database platforms.  I love to learn, but I don’t really love to learn new languages.  SQL Server has paid off for me: it’s still phenomenally popular, and I don’t see it going away in ten years.  The fact that SQL Data Services is switching from an abstract XML-style data storage over to standard table-style data storage is yet more evidence that things are looking good in the DBA arena.

SQL Server is a gateway drug: you can branch off from there into all kinds of hardcore stuff.  However, the more specialized you get, the more of a risk you’re taking that your new specialization will disappear over time, and the tougher you’ll find it to step back down to your old SQL habits.

If you become, say, an SAP BI administrator, you’ll make a lot of money today, but if that platform fades out, the knowledge you’ve gained won’t help you much in another job.  You’ll be forced to find another BI platform and learn a completely different way of doing things.  Going with a massively mainstream app like SAP is a relatively safe bet, but picking a smaller niche vendor is a riskier bet.

Is it easy to get training on the app?

Learning to be an Engineer

Wanna specialize in web design?  Open the phone book, call your local community college, and they’ve probably got a series of courses you can take dirt cheap.  If you want to learn faster, you can call a place like New Horizons and do a couple of week-long boot camps.  Presto, you’re a designer.

If you learned the necessary skills to perform your job in less than a month, then so can anybody else.

Goodbye, job security.  As evidence of that fact, I give you 99designs.com: a site where people like me who can’t draw a stick man shell out some money, and graphic designers all over the world compete to build the best logo/site/design/etc.  Yesterday, I forked out about $250 for a logo for a site I’m working on, and already today I’ve got dozens of designs, some freakin’ amazing.  The contest goes on for an entire week, and during the whole time, I’m interacting with each of the designers giving them advice and tips on how I’d like their logos to be improved.

The tougher it is to get training, the more staying power your career will have.  It’s a balancing act, though: a lack of training options might also indicate that the market sees the technology as lacking staying power.

Do you like the team you’re going to work with?

You’re going to be using training wheels for a year while you get up to speed on the new technology.  Is your new team going to be supportive and tolerant of your mistakes and your downtime?  Are you going to enjoy working late nights with ’em when you’re under a tight deadline?  After years of work as a DBA, you might be accustomed to getting big things done in a short amount of time, but you may not have that luxury with your new application role.

I’ve had to make these decisions more than once.  Sometimes I’ve branched off briefly, like when I learned SAN administration and VMware administration.  As it happens, I’ve only gone down those routes when I felt it would make me a better database administrator, and I’ve come back to my roots each time.  If things were just a little different, I might have stayed with either of those technologies – and I bet they’re both going to be great career choices too.  It’s a fun decision to have to make!


Free SAN 101 video from the SSWUG Virtual Conference

8 Comments

I just flew out to Tucson to film my sessions for the upcoming SSWUG Virtual Conference.  It’s a pretty cool setup – SSWUG has a full blown TV studio in their offices, and it’s really professional.  (Trust me – I know when something’s unprofessional.)

Invisible Cheezburger
Professor Ozar

The best way for me to explain it is to show you what a session looks like: here’s my SAN 101 presentation from last year.  You can watch it in its entirety for free.  You do need a high-speed internet connection – it doesn’t work well with slow connections like the one here in my hotel room.  This year, they’ve upgraded their cameras to high definition, too, so you can get better views of code-intensive demos.

If you like that SAN 101 presentation, then check out the full list of SQL Server abstracts.  There’s some fantastic speakers in there, top notch people, and it’s a heck of a deal for around $100.  In this economy, that’s the most cost effective way to get trained.  Show your boss my SAN 101 video to demonstrate the quality of what you’re getting for your money, and then point at the full list of sessions.  It pretty much sells itself once you see the video quality.

Register for the SSWUG V-Conference now and use VIP code SPVBOZSP09 for $10 off.  That coupon code can be combined with other discounts, too, like the early bird registration or the alumni registration.


Real SQL Server in the cloud is coming. Fast.

1 Comment

So the news is out – full-blown SQL Server is coming in the cloud – and it’s time to look back at history.

When I got started in IT, Novell was the dominant server platform.  If you wanted a “real” server, you got a Netware box, and you managed it with a keyboard.  None of this sissy mouse stuff.

Over the years, Windows took over as the default small business server platform.  It wasn’t as stable, wasn’t as fast, wasn’t as robust, but it had a killer advantage: the people who wanted it could deploy it themselves without IT help.  Granted, once they grew, they still needed IT, but your average mid-level manager could pull a computer out of the box, install Windows, and have some basic file and print sharing without any scary, hairy command line stuff.

When you read about the ease of deployment for SQL Data Services, think back to the Novell and Windows battle.  Sure, database administrators like us will beat our chests and brag about how our big, powerful, capital-intensive servers in the datacenter have all these cool features at our fingertips.  We’ll say we’re more reliable, more secure, more complete, and like BetaMax and Al Gore, we should obviously win the war.

But out there – out in the real world, where developers build applications on their own, on their desktops, wishing they could go to market without any scary, hairy SQL stuff, they’re going to be able to do it.

Read more about the SQL Data Services release on their blog.


Book Review: SQL Server 2008 Administration in Action

Book Reviews
1 Comment

Let’s start with the obvious: yes, there’s a man on the cover smoking something.  It’s probably not a coincidence that when I started writing this review, the iTunes Genius started playing Poison’s “Nothin’ But A Good Time.” There’s a reason they call it the Genius, and yes, I do have Poison in my MP3 collection, and yes, I paid for it.  (It was in my CD collection back when I listened to physical media, and it was in my tape collection before CDs came out.)

Sssssmokin!

It’s Online, But It’s Not Books Online

Rod Colledge covers a wide range of material here, but the surprising part is that it doesn’t read like a copy/paste of Books Online.  Seems like the thicker a book is, the more it feels like a copy of BOL, and I can certainly understand why – it’s tough to produce a big volume of material with a personality of its own.  Big books require multiple authors, and then sometimes editors set about stripping the personality out to make it blend together.  Nothing against Books Online – I rely on it all the time for help with syntax and minute details.  Thing is, I don’t want to read a book cover to cover when it’s full of syntax and minute details.  Right now, Manning is offering an Early Access Edition via PDF, and I hope the casual language survives the editing process.

If anything, this book is probably a little light on syntax and implementation details – for example, Instant File Initialization is covered, but not in enough detail to explain step by step exactly how to configure it in Windows.  I don’t have a problem with this approach: I like reading books to understand concepts, and I rarely sit with the book propped open next to me and type the code in off the printed page.  I’m fine with doing a quick web search to get the exact content I need from sites like BOL.

Target Audience: Production Database Administrators and Performance Tuners

I recently wrote a book review of SQL Server 2008 Management in Action by Ross Mistry and Hilary Cotter, and I called it a great book for production DBAs and accidental DBAs.  I would categorize this book differently: it’s more focused on production DBAs who want to dive deeper into SQL Server.  Accidental DBAs will find this book too detailed and deep for their needs.

For example, in Chapter 14, Monitoring and Automation, Rod talks about deadlocks, including how to create them, how to monitor for them, and how to create a SQL Server Profiler trace to catch blocked processes.  He also shows how to use the RML utilities to clean up your Profiler traces and get better insight out of them.  These types of topics are probably outside of what an accidental DBA would want to accomplish, but it’s exactly the kind of thing that a full-time production DBA has to get involved with sooner or later.

Both of these books, however, take the same approach of focusing on administration, not development.  If you want to learn how to write T-SQL, how to use functions, or how to design a schema, this is not the book for you – look for a SQL Server development book instead.  If you spend your day managing more than 25 instances of SQL Server, this is a good book for you.

Areas for Improvement

I’d quibble about the book’s organization.  For example, Chapter 2 is titled Storage System Sizing, but it covers a lot of aspects of storage: RAID levels, direct attached storage, SANs, solid state drives, and so on.  Chapter 3 is called Physical Server Design, but it continues with storage topics like disk configurations and RAID array stripe sizes.  My advice: ignore the table of contents, and just dig through the entire book.  It’s worth it.

Appendix A lists the Top 25 DBA Worst Practices, including things like “Using RAID 5 volumes for write intensive applications.”  It’s great advice, but I’d have liked a pointer to the section of the book that explains why each one is a worst practice.  It’d help the DBA drill down to learn more, and there’s certainly enough information in the book to back up what Rod suggests.  It’s not an issue of just tossing out suggestions without backing them up – if you read this book cover to cover, you’ll understand the reasoning behind each suggestion.

Overall: Good Resource for Curious Production DBAs

I’d describe the perfect buyer as someone who’s been working with SQL Server for a year or two, and who’s facing a lot of challenges on the job around configurations they haven’t seen before.  They might be tasked with building their first disaster recovery plan, not sure whether to choose log shipping or database mirroring, and they want to know what types of production issues they’ll face with each of those options.  They can read this book’s High Availability section and feel like they’re having a conversation with a friend who’s been there and done that.

I’d also recommend this book to someone preparing to get their MCITP certifications in SQL Server.  If you devour the material in this book, then you’re going to have the kind of skills and knowledge that it takes to get certified.  If you find this book too intimidating or too technically detailed, then you’re probably not going to like the certification process either – and the cost of this book beats the cost of most full-blown MCITP training materials.

Buy SQL Server 2008 Administration in Action from Amazon


#SQLPass – Track SQL Server Talk on Twitter

#SQLPass
0 Comments

On Twitter, putting a pound sign or hash tag in front of a phrase makes it easier to search for that phrase in past tweets.

If you want to alert your fellow SQL Server peeps about what’s going on at the Professional Association for SQL Server summit or at their regional events, include the phrase #SQLPass in your tweet.  Capitalization doesn’t matter.

Likewise, if you’re looking for the latest info on Twitter about PASS, here are ways to find it:

Both of these services include RSS feeds for the tag activity as well, so you can subscribe in an RSS reader and stay on top of what happens without getting overwhelmed in real time.

For more about the basics of Twitter, I’ve got a Twitter FAQ covering things like RT, OH, and hash tags.