DBA Job Transitioning: Disable Logins In Advance


Database administrators touch a lot of servers in their day-to-day duties.  We troubleshoot, we test security, and we run utilities on everything from performance-monitoring systems to development servers.

In a perfect world, the database admin changes their domain password frequently.  When the password changes, the DBA knows pretty quickly if any software is running under that domain account because it starts to fail.  However, some other things like SQL jobs may also be running under the DBA’s login, and those may not run every day.  It may take a while to unearth all of the security problems.

To be proactive, disable the database administrator’s domain account several days before they leave.  (This assumes, of course, that the DBA gave at least two weeks’ notice!)  Afterwards, go through all of the database servers & applications and make sure they stay up.

That way, if something breaks, the DBA is still around to help fix things – instead of being long gone.

And yes, since this is my last week at Southern Wine, I’ll be disabling my login in advance too!

SQL Server Magazine post about the Quest SQL community


SQL Server Magazine just posted an interview with Billy Bosworth about the Quest Association for SQL Server Experts.  One of the quotes that stood out for me was:

“We want real people who do real jobs . . . that can interact with us very early in the process of building a product. That helps us to identify what the real pains are out there.”

I’m a member of the QASSE (I like to pronounce it in a way that rhymes with “posse”, but that’s just me) and I can vouch for Billy’s quote.  Every time I’ve interacted with Quest, it’s been a learning experience for both sides.  They ask a lot of questions about how their products are used, and how they can be used better.  They put a lot of time and effort into that listening, and they’re not just giving a BS marketing line.

Interviewing DBAs: Check Their Business Decision-Making

Professional Development

When interviewing a database administrator, it’s not enough to check their knowledge of technical skills: we have to check their ability to make sound business decisions. DBAs are in a position to make a lot of architectural decisions with long-term ramifications, and in smaller shops with just one or two DBAs, these decisions won’t come out into the open until it’s too late.

I interviewed a DBA candidate recently who demonstrated why this was important. His technical skills were solid: any technical question I asked, he was able to cover. He didn’t necessarily have all of the skills we were looking for, but hey, who does these days? SQL Server is a massive product with thousands of niche features, and nobody can be expected to know everything. On a technical basis, I knew this candidate would get the job done.

But When I Asked Business Questions…

The problems arose when I started asking business-oriented questions like, “It sounds like you’ve got a lot of SSIS and DTS packages, plus a lot of Agent jobs. How do you monitor all of them?” He responded by saying that they each sent an email at the end of the job, and every morning he checked his email to see what happened overnight.

Ouch. I don’t know about your shop, but at my shop, that’s too late, and it’s too single-point-of-failure. In a situation like that, I know that DBA can never take a vacation, and that’s bad for both the DBA and the shop. It leads to early burnout, and it sounded like that was the exact reason the candidate wanted to bail out of his current job.

There was nothing technically wrong with his answer: SQL Server Integration Services does have some pretty good error handling and alerting. The issue is that in a shop with only one database administrator, hand-coded error alerting is going to be tough for the next DBA to interpret. He’s going to turn in his two weeks’ notice, and the company won’t be able to hire a new DBA fast enough to do knowledge transfer. I feel sorry for that new DBA.
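For contrast, a more centralized approach doesn’t require hand-coded emails at all: every instance’s msdb database already records job outcomes, so a scheduled query can surface overnight failures across the whole farm. A hedged sketch (these are standard msdb tables, but verify the results against Job Activity Monitor in your environment):

```sql
-- SQL Agent jobs that failed in the last day, pulled from msdb.
-- step_id = 0 is the overall job outcome row; run_status 0 = failed.
SELECT j.name AS JobName,
       h.run_date,   -- int, yyyymmdd
       h.run_time,   -- int, hhmmss
       h.message
FROM msdb.dbo.sysjobs AS j
JOIN msdb.dbo.sysjobhistory AS h
  ON j.job_id = h.job_id
WHERE h.step_id = 0
  AND h.run_status = 0
  AND h.run_date >= CAST(CONVERT(CHAR(8), DATEADD(DAY, -1, GETDATE()), 112) AS INT);
```

Run something like this from a central server against each instance, and there’s one place to check instead of a pile of per-job emails that only one person reads.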

On The Other Side: As A Candidate

I’ve been on the other side of this, too: in my job hunt last week, I interviewed with a company and had the good fortune to be able to talk to their exiting database administrator. I asked him the same question, but this time as a candidate: “Tell me about your alerting systems.” The DBA proudly talked about his hand-coded monitoring systems that checked every server for things like blocking queries, CPU use, etc., and stored it all in tables we could query.

Grrreat. You mean to tell me that out of the dozens of server monitoring packages out there, not to mention the ones targeted specifically at SQL Server, you couldn’t find a single one that did the job? And not only do I have to take over your database servers hoping you were a decent DBA, but I have to take over your code hoping you were a decent developer? I made a note of that, and had I taken that job, I would have made it contingent on immediately buying a third party server monitoring system that I wouldn’t have to maintain. Starting as a new DBA is tough enough.

In both cases, that’s a bad business decision. A good DBA has to look out for the company’s interests, and has to do it with the knowledge that nobody’s double-checking the DBA’s work.

So when interviewing for a DBA position, whether you’re a candidate or the manager, ask not only how the SQL Server jobs are managed, but how the next person should expect to take that over.

More Articles on How To Hire and Interview DBAs

Changing Companies


When we moved from Miami to Houston for Erika’s new job as an air traffic controller, my employer wasn’t too sure whether or not telecommuting would work out. Southern Wine currently doesn’t have a pro-telecommuting policy by any means: one company exec stood up at a company meeting and said they would “never” allow it, and went so far as to repeat the “never” several times. However, my managers gave me a couple of months to finish up my current projects, and then said they’d re-evaluate their stance on telecommuting.

I didn’t want to push my luck, so I went ahead and made a few calls. I’m really excited to say that I’m going to be a SQL Server DBA at a leading global financial services firm with a huge presence in Houston, and it’s a company I really admire.

I got a laugh out of one question during the interview. I asked if they had any worries about me or thought I had any negatives, and they said they were worried that the job wouldn’t challenge me enough. Every job has its challenges, but for personal growth, I’ve been getting more enjoyment out of writing articles and spreading knowledge about database topics, SAN trends, and hardware issues. Southern gave me a lot of great opportunities to contribute in several IT areas, but I’m looking forward to fewer weekend phone calls. (I had three on-call episodes this weekend alone, and I’m not even on-call anymore!) Fewer surprise calls = more articles for you, dear reader.

I have to thank everybody at Quest Software: without their encouragement, I never would have thought to do something like submit an article for SQL Server Magazine or speak at a conference. Now, I can’t believe I waited so long to do it, and I can’t imagine doing it without their help. They really do believe in encouraging a strong sense of community around the products they support – not just the Quest software lines, but the underlying engines like SQL Server and Oracle. I could rattle off half a dozen Quest people, but I’d be remiss if I didn’t specifically thank Heather Eichman and Rachel Gross.

Don’t specify IPs in your SQL connection string


In today’s SQL newsletter, I was horrified to find the following tip:

When you specify a server in an ADO.NET connection string, use the server’s IP address, not the server’s DNS name. By using an IP address instead of a DNS name, name resolution does not have to occur, reducing the amount of time it takes for a connection to be made. A server’s IP address can be used to specify either a default or named instance of a server running SQL Server. If you are running a cluster, use the virtual SQL Server IP address. [7.0, 2000, 2005]

As a guy who’s lived through more than one IP address renumbering disaster, please believe me when I say you should never do that.  IP addresses can and do change.  While server names can also change, it’s easy to make fake DNS entries for an old server name that points to the new name.  It’s impossible to make a fake IP address point to one in another subnet, especially if the server is being physically moved.

Disaster recovery is a great example, and performance testing offsite is another.  There are times when we’ll airlift an entire application and drop it somewhere else.  Everything will be the same, but the IPs are different – and bam, the application fails.  Programmers have to be called in to track down every possible location in their app for a connection string.

Furthermore, applications don’t query DNS every time they query the server.  They have a DNS cache, and they will only look up the DNS name periodically – and that time is usually measured in hours or days, not seconds or minutes.

If your application is so blazingly fast that a 1-second DNS lookup once per day is a bottleneck for you, then I salute you, and you might consider using IPs in your connection string.  However, the rest of us need to use DNS names.
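To make the difference concrete, here are the two forms side by side (server name and database are hypothetical). Only the Server value changes, and the DNS-name form is the one that survives a renumbering:

```
Brittle (IP):    Server=192.168.10.42;Database=Orders;Integrated Security=SSPI;
Resilient (DNS): Server=sqlprod01.corp.example.com;Database=Orders;Integrated Security=SSPI;
```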

Normally I like the newsletter and recommend it for beginners, so this tip surprised me, and I had to point it out.

Querying IBM Director 5.2 for firmware versions


IBM Director is a decent tool to manage firmware & driver versions, but it doesn’t have a nice report to show the firmware & driver versions for things like the RSA, raid card, network cards, etc.  I wrote the below T-SQL query to list out the most commonly needed versions & IP addresses after I couldn’t find a similar one online.  If you store your data in something other than SQL Server, you’ll need to modify the query to match your database platform.

You’ll need to modify it in your environment to match your table owner – for us, we named the service account IBMdirector, so the tables are all prefixed with that, but you may need to use DBO instead.

It will produce multiple result rows for a single machine if you have multiple raid cards, multiple network cards, etc – but you can filter that out based on your needs.

SELECT obj.LABEL
, fwBios.Version AS BiosVersion
, COALESCE(fwDiag1.Version, fwDiag1.BuildNumber) AS DiagnosticsVersion
, fwBMC.Version AS BMCFirmwareVersion
, fwRSA.Name AS RSAFirmwareName
, fwRSA.BuildNumber AS RSAFirmwareBuildNumber
, rsaIP.MPIPAddress AS RSAipAddress
, rsaIP.MPSubnetMask AS RSASubnetMask
, rsaIP.MPGatewayIP AS RSAGatewayIP
, raid.Model AS RaidModel
, raid.BiosVersion AS RaidBiosVersion
, raid.FirmwareVersion AS RaidFirmwareVersion
, raid.DriverVersion AS RaidDriverVersion
, drvIPMI.Vendor_Name AS IPMIDriverVendorName
, drvIPMI.Version AS IPMIDriverVersion
, drvAPCI.Vendor_Name AS ACPIDriverVendorName
, drvAPCI.Version AS ACPIDriverVersion
, drvRSA.Vendor_Name AS RSADriverVendorName
, drvRSA.Version AS RSADriverVersion
, rsaDriver.REVISION AS RSADriverRevision
, drvQLogic.Vendor_Name AS QLogicDriverVendorName
, drvQLogic.Version AS QLogicDriverVersion
, drvSAV.Vendor_Name AS SAVDriverVendorName
, drvSAV.Version AS SAVDriverVersion
, drvVGA.Vendor_Name AS VGADriverVendorName
, drvVGA.Version AS VGADriverVersion
, drvBroadcom.Vendor_Name AS BroadcomDriverVendorName
, drvBroadcom.Version AS BroadcomDriverVersion
, dirAgent.Version AS DirectorAgentVersion
, (SELECT TOP 1 Firmware FROM IBMdirector.UMS_SRDISKDRIVES fwHD WHERE fwHD.MANAGED_OBJ_ID = obj.MANAGED_OBJ_ID ORDER BY fwHD.Firmware) AS HardDriveOldestFirmwareVersion
, (SELECT TOP 1 FirmwareVersion FROM IBMdirector.TWG_FIBRE_ADAPTER fwHD WHERE fwHD.MANAGED_OBJ_ID = obj.MANAGED_OBJ_ID ORDER BY fwHD.FirmwareVersion) AS HBAOldestFirmwareVersion
, (SELECT TOP 1 DriverVersion FROM IBMdirector.TWG_FIBRE_ADAPTER fwHD WHERE fwHD.MANAGED_OBJ_ID = obj.MANAGED_OBJ_ID ORDER BY fwHD.DriverVersion) AS HBAOldestDriverVersion
FROM IBMdirector.TWG_MANAGED_OBJECT obj -- base table name is an assumption; adjust to your Director schema
-- NOTE: the joins for fwBios, fwBMC, fwRSA, rsaIP, raid, drvIPMI, drvAPCI,
-- drvRSA, drvQLogic, drvSAV, drvVGA and dirAgent did not survive in this
-- copy of the post; they follow the same pattern as the three joins below.
LEFT OUTER JOIN IBMdirector.UMS_FIRMWARE fwDiag1 ON obj.MANAGED_OBJ_ID = fwDiag1.MANAGED_OBJ_ID AND fwDiag1.Name LIKE 'IBM Diagnostics%'
LEFT OUTER JOIN IBMdirector.TWG_DEV_DRIVER drvBroadcom ON obj.MANAGED_OBJ_ID = drvBroadcom.MANAGED_OBJ_ID AND drvBroadcom.DRIVER_NAME LIKE 'Broadcom%'
LEFT OUTER JOIN IBMdirector.MP_VPD rsaDriver ON obj.MANAGED_OBJ_ID = rsaDriver.MANAGED_OBJ_ID AND rsaDriver.VPD_TYPE = 'Device Driver'

SQL Backup Software: Part 5 – Justifying the Cost

Backup and Recovery

I’ve talked about how a SQL backup compression program can help database administrators, but now it’s time to convince the boss. We have to sell management on why it’s worth the money.

First, a few words about my approach to database administration, because I think it’ll help DBAs in general. Being a database administrator is a lot like running a company: our users are buying a product (a reliable place to put their data), and they pay us (our budget) for that service. Getting more money in the budget isn’t a matter of begging our managers: it’s a matter of convincing our users that they can get a better product from us if they want to spend more money.

I never go to my IT manager and say, “Don, I need ten grand for this.” He doesn’t care whether backups take 4 hours or 1 hour because it doesn’t make a difference to him. Instead, I have to go to the stakeholders – the people who really care about backup windows and backup sizes.

Data Warehouses: Putting a Price on the Nightly Window

We have a 1TB data warehouse that’s growing like a weed in a compost pile. The business wants to store more data from more sources, the users want to design cooler reports with more analytics, and the ETL guys want to build more aggregate tables to speed reporting queries. All of these things have one thing in common: they require more time in the nightly load windows.

At the start of each data warehouse expansion project or each new reporting project, I ask the developers a question: how much longer is this going to make your nightly window, and how do you plan to compensate for that? It’s like when the government adds a new department: sometimes they offset it by reducing expenditures somewhere else. The data warehouse developers can add more nightly load processing, but they have to compensate for it by making another part of the process go faster. If they can’t find that part to optimize, then I can sell them more time by reducing the nightly backup window – but that costs money.

The first step is usually to add backup compression software because it’s the lowest-hanging fruit. When the project manager looks at the cost of backup compression software versus the cost of refining the difficult ETL process, the backup compression software is a no-brainer.

Later in the process, reducing the backup window even further can be a justification for switching to SAN snapshot backups, and I’m implementing those now on this data warehouse I’ve been mentioning. In March, we’ll be moving the data warehouse storage from an IBM DS4800 to an IBM N-Series (a rebadged NetApp) and using snapshots for nearly instantaneous backups, effectively eliminating our backup window. Of course, most SQL Server setups can’t afford that capability – and that’s where backup compression is the next best thing.

Time to Restore: Putting a Price on Recovery Time

The smaller the backup file, the faster it is to read off disk or tape. Granted, there is CPU overhead involved in decompressing the backup, but generally speaking, smaller database backup files will restore faster than larger ones. In the case of a disaster where the server must be rebuilt from scratch, there’s a price to waiting that long.

In the case of our 2-terabyte SAP BW data warehouse, it’s the difference between fitting the database on a single LTO3 tape versus spanning multiple tapes. That single tape read means less time. Granted, for a data warehouse, time isn’t always money.

For our sales ordering system, however, time is definitely money. We have a tight window every day where all of our sales force places their orders for the day, and the database simply can’t go down. For high availability, we use database mirroring to our disaster recovery site. That ensures that the database can always take orders, but the DR site has slower bandwidth than our primary site. If the primary server crashes, we have to restore it as fast as possible to get our redundancy back and run at full speed again. I can’t get the 100+ GB of native database backups across the wire from DR to our primary site fast enough, and instead I would have to ship tapes from DR. That’s an overnight delay. If I use backup compression software, however, I can restore across the wire and bring the primary database server back in a matter of hours instead of a day later.

Database administrators should be performing regular fire drill restores onto new hardware. The next time the shop does a fire drill restore, time it. Take the end result time, and ask management if that time lag is acceptable. If it’s not, ask how much that long window is costing the business, and there’s the justification to get backup compression software.
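For the fire drill itself, a plain T-SQL restore with progress reporting is enough to get a defensible number. A sketch (the database name and paths are hypothetical):

```sql
-- Time a fire-drill restore onto new hardware. STATS = 10 prints a
-- progress message at every 10% completed, so the wall time is easy to log.
RESTORE DATABASE SalesOrders
FROM DISK = N'\\backupserver\sql\SalesOrders.bak'
WITH MOVE N'SalesOrders'     TO N'D:\Data\SalesOrders.mdf',
     MOVE N'SalesOrders_log' TO N'L:\Logs\SalesOrders_log.ldf',
     STATS = 10,
     RECOVERY;
```

Capture the start and finish times from the Messages output, and that’s the number to take to management.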

Refreshing Dev & QA Faster: Putting a Price on Developer Time

Developers like to work with the freshest data possible. Heck, if they could, they’d do all of their development on the production box itself. They don’t want to be down for hours or days while the development & QA servers are refreshed from the production backups.

This is especially true for serious production issues that can’t be replicated in development or QA. Sometimes an emergency comes up that requires immediate testing and remediation, and the issue isn’t seen in any environment other than production. The developers need a copy of the production data as fast as possible in order to test their fix, and every minute counts.

Ask the development teams if they’d be interested in being able to refresh even their largest databases on demand – like over lunchtime instead of overnight. If the current restores take hours, and if they could be done in under an hour with backup compression, then this is a real option. If their departments are willing to fork over the expense of backup compression licenses for their development & QA servers, the systems teams may be willing to fork over the remaining production licenses as a team effort.

In Summary: Look For Your Users’ Pain Points

Hopefully I’ve given you some ideas about how you can look at the problems your users face, and offer them solutions to those problems with the tools at your disposal. Sometimes SQL Server database administrators focus too much on our own problems and struggle with getting the budget dollars to fix them – when in reality, we should be looking at how those issues affect our users, and realize that the users have their own budget money to spend.

Continue Reading SQL Server Backup Best Practices

SQL performance tuning: it’s about training too


When developers and SQL Server database administrators work together to make an app faster, it shouldn’t just be the DBA working alone in the dark.  If the DBA takes the slow code, goes into his cube, tweaks it (and indexes) and comes back out with a faster system, that fixes the problem once – but it doesn’t fix the long-term problem.

For the best long-term results, a DBA should pick a member of the application team and cross-train them on database tuning skills.  Do performance tuning sessions together, walking the developer through the analysis.  That way, they see the performance problems with their own eyes, they learn to recognize signs of problematic code, and they know how to avoid writing it in the first place.
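One simple way to structure those joint sessions is to turn on SQL Server’s built-in cost counters, so the developer sees the numbers move as the query improves:

```sql
-- Report I/O and CPU cost for whatever runs next in this session.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Run the slow query here. Note the logical reads and CPU time in the
-- Messages tab, apply one change (an index, a rewrite), and run it again
-- so the developer can compare the before-and-after numbers themselves.

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```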

Perfect example: a few weeks ago, a couple of our developers rewrote a slow user-defined function from scratch.  They then tested it in their development environment and were surprised to find that it was slower than the live version.  Another emergency came up, and now that I’ve got some time to tune it for them, they don’t want me to bother.  They don’t want to waste my time tuning something that they’re probably going to rewrite again anyway.

I told them to give me the code anyway, and at least I can review it to see what made it slower than they’d expect.  Even if this code will never see the light of day, it’ll at least be a good training exercise.

Behind the times or ahead of the curve?


I have this strange personality quirk – I can’t decide whether I’m behind the times or ahead of the curve.  I found out about Dinner With The Band, an HDTV podcast where a music-savvy chef (Sam Mason) cooks for a band, and then they play a couple songs in his NYC apartment.  It’s a slick mashup between cooking and music, two of my favorite things.  Anyway, they seem to have started and stopped filming episodes back in 2007.  The fact that I just now found out about them, man, ouch, that’s behind the times.

But then I started researching my favorite band off the show, Tokyo Police Club, and I went to subscribe to their blog:

First Subscriber!

YES!  YES!  I’m the first subscriber!  That means I’m ahead of the curve again!  Woohoo!

SQL Backup Software: Part 4 – Features Still Missing

Backup and Recovery

In this issue of my multi-part series about SQL Server backup compression software, I’m going to talk about some features that database administrators might find useful, but don’t yet exist in the current generation of backup software.

In-Depth Backup & Restore Progress Monitoring

On multi-terabyte data warehouses – especially those running new backup compression software that the DBA may not be familiar with yet – DBAs want to know how the backup is coming along. None of the products on the market do a particularly good job on a centralized console showing up-to-the-minute statuses across a farm of servers. Everybody has pieces of the picture, but no one shows the whole thing at once. Statistics I’d like to see include:

  • Status of the restore (is it initializing the data & log files or actually restoring data – very important distinction for large data warehouses)
  • Estimated time to completion
  • Compression ratio
  • Throughput rate (especially compared to this server’s historical average or the datacenter’s average)
  • Overlapping jobs (full backups that have run into the transaction log backup window, or jobs that won’t finish in a given amount of time)
  • Trend alerting (jobs that are taking longer than normal, jobs getting much lower throughput than normal)

The good news is that by querying each vendor’s activity database directly, an ambitious DBA with time on their hands can produce these reports themselves. The bad news is that few of us have that much time on our hands! Well, I do, but I use it to write my blog, heh.

Ability to Send Emails During Restores

I’m spoiled by Symantec/Veritas NetBackup, which will automatically email the system administrators whenever a restore finishes (regardless of whether it succeeded or not). This feature helps me empower my junior DBAs: while I want them to do their own restores, I also want to know who’s taking advantage of the service, and I also want to know when I need to jump in and help with a failed restore.

Ideally, I would like the ability to enter an email address during the restore process and get an email when the restore finishes. Optionally, it’d be even better to get emails upon each 10% of progress made. My data warehouse restore took several hours, and I wish I didn’t have to keep logging in to check on its progress.
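Until the vendors build that in, SQL Server 2005 and later offers a partial workaround: the engine itself tracks backup and restore progress, and a scheduled job could poll it and send the email. A minimal sketch:

```sql
-- Progress of any running native backup or restore (SQL Server 2005+).
-- percent_complete and the completion estimate come straight from the engine.
SELECT session_id,
       command,
       percent_complete,
       estimated_completion_time / 60000.0 AS est_minutes_remaining
FROM sys.dm_exec_requests
WHERE command IN (N'BACKUP DATABASE', N'BACKUP LOG',
                  N'RESTORE DATABASE', N'RESTORE LOG');
```

It’s not a substitute for vendor-integrated alerting, but it beats logging in every hour to eyeball a progress bar.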

Diagnostic Reporting & Recommendations

The backup software should be able to pinpoint the current backup bottleneck. Some examples might be:

  • CPU load – if the CPU is too heavily loaded, the backup software may be waiting on CPU time to compress the backup. It would alert the DBA and suggest a lower compression ratio.
  • Source disk throughput – if the source drives aren’t responding quickly enough, and if CPU load is low, the backup software could suggest a higher backup compression ratio. That way, the backups could be compressed smaller without affecting the backup times or the server response times.
  • Network throughput – if the backup target is a UNC path, and if the network bandwidth is saturated on the database server, the software could suggest adding network cards for better performance or using a higher compression ratio.
  • Target disk throughput – as with our source disk throughput scenario, if the target drives are too slow, the software could suggest higher compression ratios.

This sort of reporting could be an add-on license, or be used as a cross-selling tool for the vendor’s SQL Server performance monitoring software. In Quest’s case, the combination of Quest LiteSpeed for backup and Quest Spotlight for performance monitoring means that the two products have all of the information necessary to make these configuration recommendations.

Restore Source Selection

When it’s time to restore databases, Idera SQLsafe does a great job of integrating history across servers. When selecting a source for the restore, SQLsafe lets the DBA pick from a list of source servers and databases. Idera also includes a button to fetch the most recent backup of that source database including all matching transaction logs.

Quest LiteSpeed does a better job of picking source files off disk, though: their file-picking wizard shows all machines in the network (not just SQL Servers) and lets users drill into each machine’s file shares. LiteSpeed accesses the restore files via UNC path. This beats Idera’s UNC selection method, which requires the end user to manually type in the full UNC path including file name.

The best restore source selection would be a combination of both Idera and Quest’s methods.

Even better, I’d like to see a true disaster recovery wizard. Instead of specifying a single database to restore, I’d like to be able to restore an entire server: fetch the latest full & transaction logs for all databases on a given server, and restore all of them to another target server.

Automatically Skip T-Logs on Simple Mode Databases

Our developers and junior DBAs routinely create new databases without advance warning. I have to know that those databases will get backed up without human intervention, and the only way to do that with today’s backup software is to use a single maintenance plan for all user databases. If we set up individual maintenance plans for different groups of databases, any new database wouldn’t be backed up until somebody manually added it to a plan – and that’s not an acceptable risk for us.

However, on a single server we’ll have some databases in full recovery mode and some in simple, depending on their purposes. For example, we run systems-monitoring databases like Ipswitch WhatsUp in simple mode because we don’t mind losing a day of their transactional history.

The problem comes in with the single maintenance plan for all databases. If we only use one maintenance plan, today’s backup software isn’t smart enough to skip transaction log backups on databases in simple mode. It tries the t-log backup anyway, and then throws an error because the backup can’t be performed.

One solution would be to have the maintenance plans automatically skip t-log backups for databases in simple mode. Another solution would be to use policy-based management that defines a default policy for any newly created databases, and then lets me move simple-mode databases into their own policy.
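In the meantime, a hand-rolled job can approximate that first solution: build the log-backup commands from sys.databases and skip anything in simple mode. A hedged sketch (the backup path is hypothetical):

```sql
-- Back up logs only for databases in FULL or BULK_LOGGED recovery, so one
-- job covers every user database without erroring on simple-mode databases.
DECLARE @sql NVARCHAR(MAX);
SET @sql = N'';

SELECT @sql = @sql + N'BACKUP LOG ' + QUOTENAME(name)
            + N' TO DISK = N''\\backupserver\sql\' + name + N'.trn'';'
            + CHAR(13) + CHAR(10)
FROM sys.databases
WHERE recovery_model_desc IN (N'FULL', N'BULK_LOGGED')
  AND name NOT IN (N'master', N'model', N'msdb')   -- tempdb is always simple
  AND state_desc = N'ONLINE';

EXEC sys.sp_executesql @sql;
```

New databases are picked up automatically on their next run, which is the whole point of the single-plan approach.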

Take Corrective Action Inside the Management Console

Both Idera SQLsafe v4.6 and Quest Litespeed v4.8 have dashboards that show recent activity, like which databases haven’t been successfully backed up in a given time range.

That’s a great start, but from that same screen, I want to take corrective action (like starting a backup or scheduling one) and diagnostic actions (like viewing the error logs or the event logs). If something is color-coded red to indicate trouble, then it should have a right-click mechanism to correct the problem.

Load-Balanced Scheduling

A while back, I got the nickname of Six Million Dollar Man. Somebody said that if they followed all of my suggestions, our products would take six million dollars to deliver. Here’s a good example of my wacko ideas.

For most of our production servers, we do full backups every day and transaction log backups every 15 minutes. Here’s the thing, though: I don’t care when the fulls happen, and I don’t always know when the server is at its lowest load point (where the backup should be done). The time of the full backup doesn’t really matter – I just want to find the right time to do it.

To make that easier, I want a report that shows the critical hardware metrics for a server so that I can pick the best time for a backup window. It should look at source disk load, target disk load, network throughput, CPU load and the size of the databases, and then suggest the best time window for the full backups.

As if that wasn’t tough enough, I want it to be aware of the backups that are already running, and highlight that load or subtract it from the report. Meaning, if there’s a set of full backups running from 4pm to 6pm, then don’t just show high server load during that window and say it’s the wrong time to run a backup. The backup is creating the load!

And finally, I would want to balance the backup load across multiple servers. I don’t want to try to back up two large data warehouses at the same time to the same file share, or two blades in the same chassis since that would max out my network throughput. I would want the ability to say that full backups should avoid overlapping.

(Pipe dream, right? I know.)


Some of my suggestions are pretty expensive to implement, but they give database administrators an idea of the kinds of features missing in the market now. This list shouldn’t stop a DBA from buying backup compression software – there’s still a huge ROI for backup compression, and I’ll talk about that ROI in my next blog post justifying the purchase price. (I’m writing these a little out of order!)

Continue Reading Justifying The Cost of SQL Server Backup Software

What’s the Largest Database You’ve Worked With?

When hiring a production DBA, I start by asking them, “In terms of file size, what’s the largest database you’ve worked with?”

In a perfect world, we’d only hire very experienced candidates, people who’ve worked with larger databases than we’re dealing with. In reality, though, we can’t always afford those candidates – if they’re even available at all. For some positions (like a junior production DBA with a senior already on staff), we can live with a lower level of experience – someone who can grow into the position. In other positions (like a senior lead), we can’t afford to have them learning on the job.

Non-DBA readers: to find the answer to this question, remote desktop into the shop’s largest SQL Server and do a search for files with MDF or NDF extensions. Add all of the file sizes together, and make a note of the largest one. For example, the total size might be 200 gigs, with the largest file being around 100 gigs.
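On SQL Server 2005 and later there’s a quicker route than searching the file system, because the instance already knows its own file sizes. A sketch:

```sql
-- Total and largest data-file size on the instance, in GB.
-- size is in 8 KB pages; log files (LDF) are excluded.
SELECT SUM(size) * 8 / 1048576.0 AS TotalDataGB,
       MAX(size) * 8 / 1048576.0 AS LargestFileGB
FROM sys.master_files
WHERE type_desc = N'ROWS';
```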

I generally classify servers into these vague size groups, going by total size of all databases on the server:

  • Under 50 gigs – these servers almost run themselves once the basic maintenance plans are set up. The Windows system admins manage the SQL Server without training, and management takes less than an hour per week. Everything is installed on the C drive without serious performance issues.
  • 50-250 gigs – The system admins encounter some performance bottlenecks. Initial setup should be done by a DBA, and basic maintenance plans may require some planning and regular attention. Requires some performance tuning, but it’s tuning a new DBA can learn. Management takes 4-8 hours per week once the server is set up correctly, but the initial configuration repairs (when a DBA takes over for the first time) may take a few weeks.
  • 250GB-1TB – Requires a full-time database administrator on staff, although not one DBA per server. Backups must be done with specialized backup compression software, and index and statistics updates may run for hours. Performance tuning should be done only by staff with experience. Management takes 4-16 hours per week once the server is set up by a trained DBA, but initial configuration repairs can take a month or more.
  • Over 1TB – Even the simplest tasks require experience from other servers in this tier.
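To make the tiers concrete, here’s how I’d bucket a server in code. The boundaries come straight from the list above, but the function and its labels are just my shorthand:

```python
def server_tier(total_gb):
    """Classify a SQL Server by the total size of all databases on it,
    using the rough tiers described above."""
    if total_gb < 50:
        return "runs itself"
    if total_gb < 250:
        return "part-time DBA attention"
    if total_gb <= 1024:
        return "full-time DBA on staff"
    return "specialist territory"
```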

Database administrators have a tough time upgrading their skills from one tier to the next without hands-on experience. A DBA can manage more servers of the same tier without training, but the larger the server gets, the more planning and training it takes to do a good job on maintenance.

DBAs will also rarely step down in size, because pay goes along with the size of the environment. If the candidate says they’re used to working with terabyte-sized data warehouses, but they’re willing to take a job managing a couple of 100gb servers, start asking why.

If the company can’t find or can’t afford a DBA with experience on their size of database servers, there are three workarounds:

Training from an experienced local mentor.

Ask each candidate a tough question: “If you ran into a problem you couldn’t solve, do you have anybody in town that you could call in to help?” They might have a contact, someone they’ve worked with before, who would be willing to lend a hand. If not, there’s always consultants, but their skills can be tough to judge too. Expect a junior candidate to need mentor help for a few days per month when upgrading their skills from one tier to the next.

Buy software to make the job easier.

Companies like Quest, Idera and Red Gate provide a lot of tools to automate database administration. They build experience into the tools, making it easier for junior DBAs to pinpoint performance bottlenecks, solve index and statistics problems, and automate manual tasks. Sure, the software costs money, but the upside is that a junior DBA with good software tools can sometimes outperform a senior DBA who does everything by hand. (That excludes the >250gb tier, though – get an experienced pro for that tier.) Another advantage is that these tools persist even when the staff turns over, and they’re reusable over time.

Send the DBA to a training school or camp.

I have to be honest: I hate this option. The trainers are all too often “book experts” who don’t have real-world experience. The school curriculum moves at a fast pace, spending a set amount of time on each of many subject areas that the junior DBA may not need help with. Instead of spending a week at a boot camp covering a zillion topics, the junior DBA would get much more help from a week of targeted, hands-on time with an experienced pro looking at their system’s specific problems. That way, the junior learns exactly what’s applicable to the problems their company is facing, not canned solutions for things they won’t encounter for months or years.

More DBA Career Articles

  • Moving from Help Desk to DBA – a reader asked how to do it, and I gave a few ways to get started.
  • Development DBA or Production DBA? – job duties are different for these two DBA roles.  Developers become one kind of DBA, and network administrators or sysadmins become a different kind.  I explain why.
  • Recommended Books for DBAs – the books that should be on your shopping list.
  • Ask for a List of Servers – DBA candidates need to ask as many questions as they answer during the interview.
  • Are you a Junior or Senior DBA? – Sometimes it’s hard to tell, but I explain how to gauge DBA experience by the size of databases you’ve worked with.
  • So You Wanna Be a Rock & Roll Star – Part 1 and Part 2 – wanna know what it takes to have “SQL Server Expert” on your business card?  I explain.
  • Becoming a DBA – my list of articles about database administration as a career.

SQL 2008 release date pushed back


Remember all the hoopla when SQL 2005 came out? Microsoft swore we’d get more frequent releases because the 5-year lag after SQL 2000 meant it wasn’t worth paying for Software Assurance on SQL Server. They promised we’d get new stuff faster.

Fast forward to today’s announcement:

“Microsoft is excited to deliver a feature complete CTP during the Heroes Happen Here launch wave and a release candidate (RC) in Q2 calendar year 2008, with final Release to manufacturing (RTM) of SQL Server 2008 expected in Q3.”

I literally just finished installing our first SQL 2008 development (non-lab) box and I was scheduling a meeting with our developers to encourage them to migrate their apps. Thank goodness they announced this before I held the meeting – I’d have looked like an idiot.

Update: the SQL Server 2008 release date is still scheduled for August.  I moderated a great webcast on SQL Server 2008’s new features with Kevin Kline, Hilary Cotter and Geoff Hiten talking about what’s coming.

Dell EMC AX150i review


The iSCSI EMC AX150i is resold by Dell, and refurbished versions are available pretty inexpensively through the Dell Outlet. We just picked up our second AX150i with 12 500gb SATA drives for around $6,000 total, or about $1,000 per raw terabyte. It’s a great deal for lab or development storage, but there are a couple of gotchas that I haven’t found in other reviews.

The AX150’s Drive Pool Setup Limitations

The AX150i stores its operating system on the first four drives (bays 0, 1, 2, 3). Those 4 drives therefore have less available capacity than the other 8 drives in the AX150i. Those drives cannot be used as hot spares, either, because of their specialized roles.

This breaks the AX150’s drives into the following categories:

  • Disks 1-4: some space consumed by OS, can’t be hot spares
  • Disks 5-11: full amount of free space
  • Disk 12: hot spare (could be any disk number, but I’m picking 12 for this example)

If the user tries to create a disk pool, and tries to mix some of disks 1-4 with disks 5-11 (like creating a 6-disk array with disks 1, 2, 3, 4, 5, 6), the AX150 throws out a stern warning. It will allow the configuration, but the warning is pretty ugly.

So if the user can’t mix drives from those 3 groups (OS, empty, hot spare), that basically leaves the following possibilities:

  • One pool with disks 1, 2, 3, 4 – can either be raid 5 (roughly 1.5tb capacity) or raid 10 (1tb capacity)
  • One pool with disks 5, 6, 7, 8, 9, 10, 11 – an awkward total of 7 disks, meaning either raid 5 (3tb capacity) or a 6-disk raid 10 (1.5tb capacity) plus an extra hot spare
  • One hot spare (or two, if disks 5-10 were used as a raid 10)

Total config space for raid 5 is 4.5tb, and for raid 10 is 2.5tb. That space quantity is fine, but the hard-coded setup limitations on which disks can be grouped together are a pain – especially when we’re only dealing with 12 drives.
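The capacity numbers above are easy to double-check. A quick sketch of the arithmetic – pool_capacity_gb is just my illustration; raid 5 loses one drive’s worth of space to parity, and raid 10 mirrors, halving capacity:

```python
def pool_capacity_gb(drive_count, drive_gb, raid):
    """Usable capacity of one disk pool: raid 5 sacrifices one drive
    to parity; raid 10 keeps half the drives as mirrors."""
    if raid == 5:
        return (drive_count - 1) * drive_gb
    if raid == 10:
        return (drive_count // 2) * drive_gb
    raise ValueError("unsupported raid level")

# Raid 5: a 4-disk pool (disks 1-4) plus a 7-disk pool (disks 5-11)
raid5_total = pool_capacity_gb(4, 500, 5) + pool_capacity_gb(7, 500, 5)     # 4500gb
# Raid 10: a 4-disk pool plus a 6-disk pool (disks 5-10)
raid10_total = pool_capacity_gb(4, 500, 10) + pool_capacity_gb(6, 500, 10)  # 2500gb
```

In reality, the OS space consumed on disks 1-4 and formatting overhead shave these numbers down a bit.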

The AX150i Has No Load Balancing

The AX150i has two iSCSI ports, and it does support multipathing with the stock Microsoft iSCSI initiator. I’ve successfully set up both of the iSCSI ports on the same switch, set up two network cards on a Windows server with the MS iSCSI initiator, and then pulled various network cables out during I/O activity. Sure enough, the drives fail over without loss of connectivity or data. To me, that’s an astounding bargain at the sub-$10k price point.

However, with one Windows server and one AX150i, I haven’t been able to break the single-NIC bottleneck of roughly 90MB/sec – meaning, I don’t get more than one network card of throughput. I’ve tried multiple disk pools, multiple drives, and different drives on different network ports, but I haven’t been able to figure out how to get it to use two full network ports.

This is not a huge problem at this price point – I’m thankful enough that I’ve even got failover in an array this inexpensive. However, consumers need to be aware that two iSCSI ports don’t mean two 1gb network cards with 100% saturation.

It’s Not Expandable

Dell/EMC don’t claim this model is expandable, but this needs to be emphasized to the prospective buyer. Shops with two or more AX150s can’t combine them, and can’t migrate data from one AX150 to another. If there’s a small amount of free space on two AX150s, it can’t be combined to create a single LUN.

Again, not a big problem at this great price point, but something to be aware of. This feature is available at higher price points from companies like EqualLogic (now owned by Dell) and LeftHand Networks.

And Yet: A Big Winner At This Price Point

Shops that haven’t invested in shared storage yet can get their feet wet with an AX150 without a big capital commitment. I’d recommend it for a shop that isn’t sure whether or not they’ll go with iSCSI or fiber down the road, or if they’re just not sure about shared storage, period. This class of storage is cheaper to get into – and out of – than a LeftHand chassis. An AX150 can be had for under $10k, whereas the LeftHands and EqualLogics run at least twice that much. Granted, they offer twice as many features and much better scalability.

We took the approach that these are for development sandbox use only, never production. Our VMware lab farm is hooked up to an AX150, and so is my SQL Server data warehouse testbed. It’s a testament to their ease of use that I’m always tempted to slap a production array on one just because we’ve got the extra space.

SQL Backup Software: Part 3 – New Features for Backup/Restore

Backup and Recovery

In the last decade, Microsoft has brought SQL Server a long way – bringing in things like .NET code, data encryption, partitioning, and a darned slick management user interface. But one thing that hasn’t changed is the process of backing up and restoring data. We still back up to a file, and we still restore the entire database. The process is clunky and usually manually managed, and no new backup/restore features have arrived while Microsoft focuses on things like the BI stack. (I can’t exactly complain about that!)

Other vendors, though, have introduced several new features that give database administrators new tools in their backup & recovery arsenal. I’m going to talk about these features in general, not how a specific vendor implements them, because I want to cover so much ground, and some of the products are easier to show in a single screen shot. Here we go:

Object Level Recovery – restoring single objects and rows

The most urgent restore requests seem to be the smallest ones: someone dropped a single table, someone deleted a key record, someone hosed up a stored procedure. Everyone expects the DBA to be able to recover objects at a moment’s notice, but SQL Server’s native backups don’t make that easy. We have to restore the whole database, then pluck out the specific objects we want, and copy them over to the online database. What a hassle! Today’s third party backup software automates all of that by browsing inside the backup file, letting us pick specific objects, and asking us where to restore them. When testing this capability in backup software, look for these abilities:

  • Restore multiple objects at once
  • Select objects without reading the entire backup file (time-consuming)
  • Script capabilities (so that you can create a single restore script, then reuse it for the exact objects you want without browsing through the entire backup file)

Central dashboard to manage all SQL Servers

A DBA that manages more than ten servers wants to spend less time checking their backups and restores. Vendors solved this problem with simple, intuitive dashboards that show whether any backups were missed in the last, say, 24 hours, or whether any database hasn’t been backed up in a certain time range. The DBA can see at a glance that all of the servers are protected. Contrast this with the native SQL Server backups, where there’s no graphical way to glance at a server and know that all of the databases are backed up. Some things to watch for:

  • Manage servers without an agent – useful for non-licensed servers
  • Central database of managed servers – so you don’t have to maintain the server list on every DBA’s workstation

Detailed progress status for currently running backups

I’ve taken this one for granted since I started using Idera SQLsafe and Quest LiteSpeed, and whenever I have to deal with a plain SQL Server, I can’t believe I lived without it. At a glance in the management UI of either program, I can see exactly how much has been backed up so far. Sounds crazy, but when the DBA is dealing with a 2 terabyte data warehouse, he doesn’t want to wait around to find out if data is getting out of the server. Look for:

  • Status monitoring from anywhere – on any DBA’s workstation, they should be able to see status updates for any server’s running backups

Encrypted database backups

For public companies and medical companies, this just sells itself. Database administrators who don’t have backup compression software can get this budgeted just by selling it as the way to encrypt database backups.

I also wrote a tech brief for Quest called “10 Things DBAs Probably Don’t Know LiteSpeed Can Do.” It requires registration, but hey, it’s free. (I guess that means it’s not one of the best things in life, eh?)

Continue Reading Features missing from today’s SQL Server backup software.

Microsoft Technology Center: The DBA Experience


Our contacts at Microsoft offered us the services of a Microsoft Technology Center to do a two-week lab exercise on our data warehouse. We’d kept piling one application after another on our data warehouse SQL Server until the poor thing was plain out of maintenance windows and couldn’t keep up, and we needed info from the pros to keep scaling it up.

Before my visit to the MTC, I tried searching the internet for other peoples’ experiences with the labs, but I came up emptyhanded. I resolved to write up a recap after the trip was over so that other DBAs heading to an MTC engagement would understand how it works, what to expect, and things to look out for.

What A Microsoft Technology Center Is

For us, the Microsoft Technology Center was really all about two things: getting insight from Microsoft experts and experimenting with different system-wide architecture options.

Microsoft staffs the labs with gurus who have extensive knowledge of MS products. When we ran into trouble with the testing functionality of Visual Studio, our MS guy quickly summoned the relevant expert. We got tricky questions answered in a matter of minutes instead of having to reinvent the wheel. As a result, we could make a lot of progress in a short amount of time.

Walking around the MTC, I was struck by how all of the place’s value is the technical people and the gear. The office could have been anywhere in the world, just a typical New Horizons-looking place with conference rooms and whiteboards. The building was completely worthless – until the people walked in, and suddenly it was priceless. Out of nowhere, I understood the Microsoft “People Ready” ads.

A minor drawback, though, is that the experts aren’t always available at a moment’s notice. They may be in other meetings, working with other clients at the lab, or gone for the day – or they may not have experience in that one particular aspect of the product.

The MTC datacenter is stocked with relevant hardware ahead of time for the client’s lab engagement. They did a great job of simulating our environment: while it wasn’t an exact match, it was more than close enough to simulate what we were doing in production. We were able to experiment with adding more memory, changing SAN configurations, changing SQL settings and other things we couldn’t possibly do in rapid fire succession in our production environments.

What An MTC Isn’t

The MTC doesn’t have every piece of software, particularly non-Microsoft software. I needed to bring a copy of our data warehouse, and they had Quest LiteSpeed available to do database restores. However, they couldn’t have copies of our reporting software (BusinessObjects), our ETL software (IBM DataStage) or our load testing software (Mercury LoadRunner) due to licensing restrictions. We were warned about this ahead of time, and we could have rented or borrowed trial licensing from our vendors, but we would still have to build that complete environment in another location from scratch. That kind of setup turns a two-week lab engagement into a four-week project with more risk and more planning. That fact alone made the Microsoft BI stack more appealing.

Since we couldn’t use our reporting, ETL or load test software, we had to have Plan B’s for those, and that brings in the next problem: the Microsoft Technology Center is not the place to learn new products. For example, we needed to be able to load test an IIS application without our normal Mercury LoadRunner, so we used Microsoft Visual Studio Team Test. VSTT met all of our needs and was a lot of fun to work with, but it came with a one week learning curve: we didn’t get a useful benchmark out of it until the fifth business day of the lab. Our QA staff had to rebuild their use case tests with a new product, we had to interpret IIS errors we’d never seen before, and we couldn’t compare the lab environment test results from VSTT with the production test results from LoadRunner.

I walked away with a lot of valuable QA experience and a great impression of Visual Studio Team Test, but looking back, that’s probably not the best thing for a database administrator to learn, which brings up the next weakness.

The Microsoft Technology Center isn’t staffed with everyone from the client – only with the people who are sent to the remote lab. During the lab setup, I wish we could have had at least one of our own developers and a QA staffer on hand. These people would have gained much more from the lab setup experience, the load testing and the code mitigation ideas. We would have had lower setup time, faster test iteration times, and faster mitigations. Instead, we had to do a lot of conference calls back and forth with the home office to get everybody on the same page and get status updates on the work.

The MTC also isn’t a silver bullet for application problems: during testing, I uncovered a few horrifying bugs that we just couldn’t fix fast enough to learn from the lab. We wanted to optimize one particular set of nightly processes, but while reviewing the code, I found business logic errors that required a rewrite of a major stored procedure. Since that stored proc represented the vast majority of the nightly load, we couldn’t take full advantage of the lab experience for that one objective – the code mitigation would have to wait until we returned to the office and met with our entire staff.

What We Gained From Our Experience at the MTC

On one poster at the MTC, a client quote said that they got frank, honest and valuable technical feedback from the Microsoft staff. That one thing sums up the biggest benefit in my view. Client managers, client technical staff and Microsoft staff can sit in a conference room, hash out a process, and be honest about what’s worked and what hasn’t worked at other clients.

For example, I’d talked to our BI manager in the past and emphasized that we couldn’t run ETL loads during business hours without a major impact to the end users. The system would still work, but reporting performance would degrade significantly. I couldn’t put a pricetag on hearing that same opinion from the Microsoft side: here was the vendor telling us that no, you can’t do that the way your system is engineered now, but that if you’d like to do that, it can be done with well-designed modifications to the ETL process.

At the Microsoft Technology Center, these kinds of recommendations and opinions carry more weight because they come from independent advisors, people who have a vested interest in seeing the product succeed. They won’t overpromise something that the product can’t do, and they know the signs of a project that will fail.

Who To Send to an MTC Engagement

When planning an MTC lab session, a company should send the staff who will do the best job of listening to Microsoft, implementing Microsoft’s recommendations, and then conveying the lessons learned to the rest of the staff.

Notice that I didn’t say to send the best developers or the best DBAs.

The lab isn’t about being extremely good at what you do: the lab is about being a good listener, giving the right answers to company-specific questions, and helping Microsoft work together with the company to deliver an improved implementation.

I think our company sent the right mix of people (although more would always be better), but sitting through the sessions, I can see how that would easily go wrong. During the first couple of days, our main MS lab contact said the same thing several times: “I know this is going to be a tough conversation, but we need to talk about doing this process a different way.”

I responded the same way every time: “This is not going to be a tough conversation. We’re not here because we’re doing things right – we’re here because we need help and guidance! Tell us what to do.”

I can totally see that conversation going different ways, though, because as we related Microsoft’s plans back to our home office over the phone, we had some ugly talks. Some folks can be pretty entrenched in the ways they’ve always done things, and they’re not receptive to new ideas.

At the same time, I wouldn’t recommend sending new staff, either: send the staff with the most possible experience with the company’s internal applications and the most in-depth knowledge of how the company does business. For example, one performance improvement that was tossed around briefly was to disable our indexes during ETL, and then rebuild them after ETL finished. Because I’m familiar with how our ETL runs across different time zones, I was able to explain why it wouldn’t work for our business.

Send a project manager, or designate one person to be the PM for the lab. That person is accountable for making sure the lab stays on track, that it meets objectives, and that the staff back at the home office deliver on urgent needs that come up as a part of the lab engagement. I initially thought a PM was a crazy addition to a technical lab, but it was a great idea.

What To Ask Microsoft Beforehand

Find out exactly who will be involved from Microsoft’s end, the extent of their involvement, and their schedules. Our particular engagement was arranged at the last minute, and as a result, we didn’t get the quantity of Microsoft staff that we’d expected. The onsite experts had already been booked for other client engagements, and when we ran into problems, we couldn’t always get the help we needed. Getting the exact list of MS people who will be in each day’s activities helps to set the right expectations as to how much work will be done by the client, versus how much will be done by Microsoft.

Ask Microsoft to set up VPN access ahead of time to the lab for offsite team members who can’t go to the MTC. These team members, like people who can’t leave the office, will still be a big part of the lab engagement and they’ll need to contribute in order for the lab to keep moving forward. We had difficulties with our corporate firewall blocking VPN access to the MTC lab, and to get around it, I had to resort to installing the client on all of our MTC lab machines.

Consider a Microsoft SQL Server Health Check First

You can save a lot of time & money by doing a Microsoft SQL Server Health Check beforehand.  Microsoft sends a SQL Server expert to your site, observes your hardware & configuration in your environment, and delivers a set of recommendations.

At Microsoft’s Technology Center in Chicago



Carlos and I flew into Chicago midday yesterday, and a winter storm came in right after us.

We’re at Microsoft’s Technology Center on Wacker drive, and our work area on the 23rd floor has a wall of windows with a great view of the city.  We watched the sky get darker in the afternoon as the snow marched in.  After work, our dinner table looked out on the river and the city streets as the white stuff accumulated.  What a peaceful, relaxing way to spend the holidays – visiting, that is, not living here or commuting in it.

This morning, as I walked the couple of blocks from the hotel to the MTC, I had a great time looking around and shooting pictures.  Fresh snow is gorgeous before everybody (like, uh, me) trudges through it and leaves dirty footprints and slush.  The poor folks who have to come in an hour from now, well, that’s another story.

Our visit to the MTC comes right before deploying a new in-house application.  We’ve struggled with some performance problems in the SQL and the code – nothing we couldn’t solve with time, but we haven’t been able to get the dedicated hardware time or the dedicated staff time.  Our Microsoft contacts saw what was going on and responded by offering us time up at the MTC with their gurus.

That solved the dedicated hardware time, and we thought it would solve the dedicated staff time by forcing our guys to sit in a conference room thousands of miles away from day-to-day production needs, just focusing on getting the job done.  Not so much – we still had dueling teams of execs trying to get us to prioritize one project over another, even with us flying here to this standalone environment!  Funny.

Yesterday, I focused on getting a new VMware environment up (part of the deliverables are to show the performance difference between a virtual app server and a physical one), building a few application servers, and whiteboarding the ETL process with the Microsoft guys.  Today, we’re shooting for performance benchmarks on our from-scratch environment.

SQL Backup Software: Part 2 – Quad Cores are Changing the Game

Backup and Recovery

In my last post, Why Native SQL Backups Suck, I talked about the weaknesses of native SQL Server backups. Today, I’m going to extend that a little by talking about one of the greatest surprises for DBAs in recent history: the advent of dirt-cheap multi-core processors that don’t cost extra for SQL licensing.

How SQL Server Licensing Is Affected by Quad Cores

Microsoft SQL Server is licensed by the CPU socket, not the core, so it costs the same to license a single-core CPU as it does a quad-core CPU. I’ve used that logic to convince executives to upgrade older single-core database servers to new multi-core hardware because they can often pay for the server hardware via license savings. It costs half as much to license a brand new 2-cpu quad-core box as it does a 4-cpu box with single cores, and the license savings completely pays for the cost of a new server.
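The back-of-the-napkin math looks like this. Note that the per-socket license price and hardware cost below are placeholder figures for illustration, not real Microsoft or Dell pricing:

```python
# Placeholder figures, not real pricing.
LICENSE_PER_SOCKET = 25_000

old_license = 4 * LICENSE_PER_SOCKET  # 4 sockets, single-core each
new_license = 2 * LICENSE_PER_SOCKET  # 2 sockets, quad-core each
new_hardware = 15_000                 # hypothetical cost of the new box

savings = old_license - new_license   # licensing savings from fewer sockets
# The license savings alone covers the new hardware, and the new
# server has 8 cores where the old one had 4.
```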

Most of the time, quad-core CPUs aren’t really a compelling feature for database administrators because SQL Server hits I/O bottlenecks far more often than CPU bottlenecks. We pour money into drives and HBAs, and preach the benefits of raid 10, but we don’t spend a lot of time comparing processors in great detail. I/O is the big bottleneck. This is especially true during the backup window. Backing up a SQL Server database consists of reading a lot of information from drives, and then writing that same information to another set of drives (either local or across the network).

So during the backup window, we have all these extra cores sitting around idle with nothing to do.

Let’s Do Something With Those Cores!

Why not use that extra idle CPU power to compress the data before we send it out to be written?

The users won’t notice because they’re already waiting on I/O anyway, especially during backup windows when we’re taxing the I/O subsystems.

If we dedicate this extra CPU power to data compression, we now have smaller amounts of data being sent out for writes. Our backup size gets smaller, which in turn – decreases our I/O load! In effect, we’re trading CPU power for I/O power. The more CPU power we have for data compression, the more I/O we free up.

The equation gets interesting when we start to relate how much I/O speed we buy with each additional processor core. Going from a single-core CPU to a quad-core CPU enables a massive amount of backup compression power, which means much less data needs to be written to disk. If less data is being written to the backup target, then we have two options: our backup windows become shorter, or we can use cheaper/slower disks.
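Here’s that trade-off in rough numbers. The 100 MB/sec write target and the 2:1 compression ratio below are made-up illustration figures – real ratios depend on the data:

```python
def backup_window_hours(data_gb, write_mb_per_sec, compression_ratio=1.0):
    """Hours to write a backup when the write target is the bottleneck.
    A compression_ratio of 2.0 means the backup file is half the data size."""
    backup_gb = data_gb / compression_ratio
    return backup_gb * 1024 / write_mb_per_sec / 3600

native = backup_window_hours(500, 100)           # ~1.4 hours, no compression
compressed = backup_window_hours(500, 100, 2.0)  # ~0.7 hours at 2:1
```

Doubling the compression ratio halves the writes – which is exactly the CPU-for-I/O trade.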

Using Backup Compression To Save Money

Choosing the latter method means that the shiny new quad-core database server may pay for itself. I’ve been able to say, “You need more drives for your new project? I’ll sell you my raid 10 of high-end, 73gb 15k SAN spindles because I’m downsizing to a raid 5 SATA array.” Trading off those expensive drives enabled me to buy more quad-core database servers, which could compress the backup files better, and I could live with the SATA drives as a backup target. My backup time window stayed the same, and I gained faster CPU power outside of my backup window because I had more cores.

Cheap quad-core processors enable a database administrator to trade CPU power for I/O speed in the backup window – but only when using those newfound cores to actively compress the backup data. SQL Server 2000 & 2005 can’t natively do that, and that’s where backup compression software comes in.

The same quad-core power works in our favor at restore time, too. During restores, the SQL Server has to read from the backup file and then write those objects out to disk. With backup compression software, the server does less file reads from the backup file because the backup is smaller. This means faster restores with less I/O bottlenecking, and fast restore times are important to a DBA’s career success. The faster we can restore a database in an emergency, the better we look.

Old Servers Trickle Down to Dev & QA

This pays off in another (albeit obscure) way: development & QA servers. At our shop, we’re constantly replacing big, multi-cpu (but single-core) servers with smaller quad-core servers. As a result, we have a lot of 4-way and 8-way servers lying around that are relatively expensive to license in production. They make absolutely perfect development & QA SQL Servers, though, since SQL Server Developer Edition isn’t licensed by the socket, but at a flat rate. I’ve been able to take these 8-way servers by saying, “No one else can afford to license these for their applications, but I can use them for development.” Then, those 8 cores pay off in faster restores of our production databases. I’m able to refresh development & QA environments in shorter windows because I can uncompress the backups faster than I would on a smaller server.

If faster backup & restore windows were the only tricks available in backup compression software, those alone would be a great ROI story, but there’s more. In the next part of my series, New Features for SQL Backup and Restore, we’ll look at ways backup software vendors are able to jump through hoops that native backups can’t.

Continue Reading New Features for SQL Server Backups

SQL Server Backup Software: Part 1 – Why Native SQL Backups Suck

Backup and Recovery

Before we start looking at SQL Server backup compression software, we need to spend a few minutes looking at the weaknesses of the native SQL Server backup process. In order to judge the fixes, we have to know what’s broken.

Native SQL Server backups take the same disk space as the data.

When we back up 100gb of data with a native backup, we’ll end up with a 100gb backup file. If a database has 100gb allocated, but it’s half empty (like in the case of unused log files), then the backup size will be roughly 50gb – the size of the data.

Large amounts of data take a long time to write to disk.

The slowest thing in the backup process is usually writing the backup file, whether it’s over the network or to local disk. Reads are typically faster than writes, so unless the database is under heavy transactional load at the time of the backup, the reads won’t be the bottleneck. As a result, the more data that has to get written to disk, the longer the backup will take.

We could alleviate that by purchasing faster and faster arrays for our backup targets, but that gets expensive quickly. Our managers start to ask why the DBA’s fastest RAID array is being used for backups instead of the live data!

Large amounts of data take a REALLY long time to push over a network to a DR site.

This affects log shipping or just plain copying backup files over the WAN. Compressing the data by as little as 25% cuts transmission times by roughly the same amount, and cuts the amount of bandwidth required to replicate the application data. In a large enterprise where multiple applications are competing for the same WAN pipe, other teams will ask why the SQL DBA can’t compress their data before sending it over the wire.
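The math here is simple but worth seeing. This sketch assumes a 45 Mbps (T3-class) WAN link as an example – plug in your own link speed and compression ratio:

```python
def transfer_hours(size_gb, link_mbps, compression_pct=0.0):
    """Hours to push a backup over a WAN link, with optional compression.
    Sizes in GB (binary), link speed in megabits per second."""
    effective_gb = size_gb * (1 - compression_pct)
    bits = effective_gb * 1024 * 1024 * 1024 * 8
    return bits / (link_mbps * 1_000_000) / 3600

# 100GB over a hypothetical 45 Mbps link, uncompressed vs. 25% compressed:
print(round(transfer_hours(100, 45), 1))        # -> 5.3 hours
print(round(transfer_hours(100, 45, 0.25), 1))  # -> 4.0 hours
```

That’s more than an hour shaved off every night’s offsite copy from a modest 25% compression ratio, before any hardware changes.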

We can work around that problem by installing WAN optimization hardware like a Cisco WAAS appliance, but these have their own drawbacks. They must be installed on both ends of the network (the primary datacenter and the DR site), require a lot of management overhead, and they’re expensive. Really expensive.

Another workaround is to compress the backup files with something like WinZip after the backup finishes, but that’s a manual process the DBA has to automate and actively manage, and the compression step adds lag time before the data can be sent offsite.

SQL Server Management Studio doesn’t come with reports about the backup process.

Business folks like to say, “You get what you measure.” The idea is that if you start numerically measuring something in an objective way, that number will start to improve simply because you’re focusing on it and talking about it with others. SQL Server native backups are something of a black box: there’s no quick report to show how long backups are taking per database, how often they’re failing, and how long it would take to do a full restore in the event of an emergency.

I find it hilariously ironic that my job as a database administrator revolves around storing precise metrics for others, enabling them to do dashboards and reports, but SQL’s native backup system doesn’t offer any kind of dashboard or report to show its own backup & restore times and successes. SQL 2005 SP2 started to offer some database performance reports inside of SQL Server Management Studio, but they still don’t address the backup/restore metrics.

An ambitious DBA could build their own reports, but they have to manually consolidate data from all of their database servers across the enterprise and keep it in sync. Whew – I get tired just thinking about that. (I should probably subtitle my blog as “The Lazy DBA” come to think of it.)
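If you do go down the build-it-yourself road, the aggregation itself is the easy part. Here’s a minimal sketch of the kind of report I mean, assuming you’ve already pulled backup history rows from each server (in real life, from msdb.dbo.backupset) into one place – the server names and rows below are hypothetical:

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical consolidated backup history: (server, database, start, finish, succeeded)
history = [
    ("SQL01", "Sales", datetime(2008, 1, 7, 1, 0), datetime(2008, 1, 7, 1, 42), True),
    ("SQL01", "Sales", datetime(2008, 1, 8, 1, 0), datetime(2008, 1, 8, 1, 45), True),
    ("SQL01", "Sales", datetime(2008, 1, 9, 1, 0), None, False),  # failed backup
]

def backup_report(rows):
    """Average backup duration (minutes) and failure count per server/database."""
    stats = defaultdict(lambda: {"durations": [], "failures": 0})
    for server, db, start, finish, ok in rows:
        if ok:
            stats[(server, db)]["durations"].append((finish - start).total_seconds() / 60)
        else:
            stats[(server, db)]["failures"] += 1
    return {
        key: {
            "avg_minutes": sum(s["durations"]) / len(s["durations"]) if s["durations"] else None,
            "failures": s["failures"],
        }
        for key, s in stats.items()
    }

report = backup_report(history)
print(report[("SQL01", "Sales")])  # avg ~43.5 minutes, 1 failure
```

The hard part is everything around this: reliably collecting the rows from every server, every day, and keeping the central copy in sync – which is exactly the plumbing the third party tools sell you.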

Cross-server restores are a ton of manual work.

If the DBA wants to bring a development server up to the most current production backup, including transaction logs, they either have to write a complicated script to parse through a list of available backups, or they have to do a lot of manual restores by picking files.
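To show what that “complicated script” has to do at minimum, here’s a sketch that picks the restore chain out of a directory listing. The file-naming convention is an assumption of mine – SQL Server doesn’t enforce one – and a real script would also have to handle differentials and verify log sequence numbers:

```python
import re

# Hypothetical backup file names following a DatabaseName_TYPE_yyyymmddhhmm convention.
files = [
    "Sales_FULL_200801070100.bak",
    "Sales_LOG_200801070400.trn",
    "Sales_FULL_200801080100.bak",
    "Sales_LOG_200801080400.trn",
    "Sales_LOG_200801080800.trn",
]

def restore_chain(file_list):
    """Pick the latest full backup, then every log backup taken after it."""
    pattern = re.compile(r"^(?P<db>\w+)_(?P<type>FULL|LOG)_(?P<stamp>\d{12})\.")
    fulls, logs = [], []
    for name in file_list:
        m = pattern.match(name)
        if not m:
            continue
        (fulls if m.group("type") == "FULL" else logs).append((m.group("stamp"), name))
    latest_full = max(fulls)  # fixed-width timestamps sort correctly as strings
    chain = [latest_full[1]]
    chain += [name for stamp, name in sorted(logs) if stamp > latest_full[0]]
    return chain

print(restore_chain(files))
# -> ['Sales_FULL_200801080100.bak', 'Sales_LOG_200801080400.trn', 'Sales_LOG_200801080800.trn']
```

And that’s before you’ve generated a single RESTORE statement, handled a broken log chain, or dealt with files that moved. The third party tools do all of this for you from a point-and-click interface.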

Even worse, in the event of a disaster, the database administrator has to scramble through directories looking for t-logs, writing restore scripts, and hoping they work. Really good DBAs plan this scenario out and test it often, but let’s be honest: most of us don’t have that much time. We write our DRP scripts once, test them when we have to, and cross our fingers the rest of the time.

That frustrates me because the restore process has been the same since I started doing database administration back in 1999 as a network admin. For years, I looked for the most reliable restore scripts I could find, played with them, spent time tweaking them, and wasted a lot of time. In my mind, this is something that should be completely integrated with the shipping version of SQL Server just because it’s so central to a DBA’s job.

Enough Backup Problems: Show Me Solutions!

So now we’ve seen some of the weaknesses in SQL Server 2005’s native backups. In my next couple of blog posts, I’ll talk about how third party backup compression software gets around these obstacles and offers better features with more capabilities.

Continue Reading with Part 2: Quad Cores are Changing the Game

SQL Performance Tuning: Estimating Percentage Improvements

1 Comment

When I’m doing performance tuning on an application in the early stages of its lifecycle (or any app that’s never had DBA attention before), I end up with a ton of recommendations within the first day of performance tuning.  The resulting to-do list can seem overwhelming to project managers and developers, so I include one of the following two sentences as a part of each recommendation:

  • This change will improve performance by a percentage, or
  • This change will improve performance by an order of magnitude

I know, I know, that phrase doesn’t come up too often, so it helps to check out Wikipedia’s definition of order of magnitude:

“Orders of magnitude are generally used to make very approximate comparisons. If two numbers differ by one order of magnitude, one is about ten times larger than the other.”

So the two sentences translate into:

  • This change will improve performance by 10-90%, or
  • This change will improve performance by 10-100x

I could use those number-based sentences instead of the “percentage versus order of magnitude” ones, but they make me sound like I’m taking wild, uneducated guesses.  In reality, sure, I am taking wild guesses, but on an informed basis – I’m just not putting a lot of time into categorizing the improvements.
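Reduced to code, my categorization scheme is almost embarrassingly simple – which is the point. The threshold of 10x is the only real decision:

```python
def classify_improvement(speedup):
    """Bucket an estimated speedup the way I phrase recommendations:
    under ~10x is 'a percentage' fix, 10x and up is 'an order of magnitude'."""
    if speedup >= 10:
        return "order of magnitude"
    return "percentage"

print(classify_improvement(1.44))  # -> percentage (e.g., a 44% faster query)
print(classify_improvement(25))    # -> order of magnitude (roughly 10-100x)
```

Anything finer-grained than these two buckets would cost me estimation time without changing which recommendation the project team tackles first.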

Jeff Atwood’s excellent Coding Horror blog has a two-part post about estimation that should be required reading for every DBA.  Part 1 is a quiz, and Part 2 explains the answers.

So why am I leaving so much gray area in my recommendations?  Why break suggestions into such widely varied categories?  Is it smart to lump a 10% improvement in with an 80% improvement?  In the early stages of performance tuning, yes, because the DBA can’t necessarily predict which changes will be the easiest to implement, or the order in which they’ll be implemented.  When a database administrator first looks at an application, queries, stored procedures and database schema, some things pop out right away as massive opportunities for performance gains.  These recommendations are so instrumental to application performance that they often have wide-ranging impacts across the entire app.  After those changes are made, everything speeds up so much that the other recommendations have even less of an impact than they might have originally had.

For example, in a project I’m currently tuning, I found that the three largest tables in a database (which had ten times more records than all of the remaining tables combined) were constantly queried by a single field, the equivalent of a DivisionID integer field.  All of the queries hitting those tables included the DivisionID, and the application frequently did huge update statements that affected all records with a single DivisionID number.  Partitioning those three tables by DivisionID and putting each DivisionID on its own set of disks would result in a staggering performance improvement and a tremendous increase in concurrent nightly ETL processing, since more divisions could run simultaneously.

I made other performance recommendations as well, but frankly, if the developers implemented every recommendation except the partitioning, they’d still be struggling with their nightly windows, and the implementation time would put the project way behind its deadlines.  On the other hand, if they implemented just the partitioning, they’d sail through their nightly windows and make their project delivery deadline.  That’s the definition of an “order of magnitude” improvement.