So you’re completely pissed off that your third-party software isn’t working, and you’re getting ready to call the vendor’s support line and let loose a stream of obscenities. I feel you: I’m using Symantec Antivirus too. <rimshot> But before you pick up the phone, stop and think about a few things that will help you get a much better support experience.
1. Did you check the Event Logs?
Don’t leap to conclusions and assume that everything else in the system is working perfectly. Go into Control Panel, Administrative Tools, Event Viewer, and look at the Application and System event logs. If you see so many red and yellow icons that it looks like a piñata, then maybe you’ve got a bigger problem to troubleshoot.
Don’t just check the exact date and time of your problem, either: scroll back further, because a long-running problem that started earlier can affect what you’re trying to do now.
Example: a user’s backups were taking far longer than expected. Scrolling back through Event Viewer, we found that the RAID card had logged a failed drive, and the system was running with degraded performance while the array was rebuilt onto the hot spare drive.
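If you’ve exported the logs to a file, the same habit can be sketched in a few lines. This is a minimal Python sketch, not a Windows API: it assumes the Application and System entries have already been exported into simple records (the `EventRecord` type, its fields, and the seven-day default are hypothetical), and it keeps only errors and warnings from the whole window leading up to the problem rather than from the exact minute it happened.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class EventRecord:
    # Hypothetical shape for one exported event log entry.
    logged_at: datetime
    level: str    # e.g. "Error", "Warning", "Information"
    source: str
    message: str


def suspicious_events(records, problem_time, lookback=timedelta(days=7)):
    """Return errors and warnings logged in the window before problem_time, oldest first."""
    start = problem_time - lookback
    hits = [
        r for r in records
        if start <= r.logged_at <= problem_time and r.level in ("Error", "Warning")
    ]
    return sorted(hits, key=lambda r: r.logged_at)
```

Widening `lookback` is what would surface something like a drive failure logged days before the slow backups started.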
2. Does it work without the vendor’s product?
If you’re troubleshooting, say, a SQL Server backup product, and you’re getting all kinds of errors when you try to do a backup or a restore, try doing it without the third-party product. Do a native-only backup/restore/whatever, and see if the same issues happen.
If they do, don’t bother calling the vendor, because you’ve got a bigger problem. But don’t call Microsoft yet, because you might not have removed enough unknowns from the equation.
3. Narrow it down as far as possible.
Make a list of everything involved with the issue, and then rule out each of those parts. The more specific you can be about the problem, the faster you can get a fix.
The worst example of this is when someone calls in saying, “I get an error now and then.” What error? What are you doing when the error occurs? What’s running? What changed? Let’s help by ruling things out.
Example: let’s say our nightly jobs to back up a database keep failing. My list of moving parts might include:
- The backup script
- The SQL Server Agent job
- The backup software
- The database
- The SQL Server instance
- The server hardware
So then to rule out each of those, I would do the following:
- The backup script – try using a simple “BACKUP” command, outside of any other scripts, or use the SQL Server Management Studio GUI
- The SQL Server Agent job – try a different time of day. True story: we had a customer whose jobs failed every night, and we eventually figured out that somebody had an OS-level scheduled task to stop SQL Server at a specific time, which happened to fall inside the backup window.
- The backup software – try using only native backups, not third-party apps.
- The database – try backing up a different database. If it’s a big database, try a small one, like model/msdb/master/etc. If those work, then try another large database.
- The SQL Server instance – the cause can be an instance-level setting, such as the service account’s permissions. Try the same database on another instance and see if it breaks there too.
- The server hardware – don’t think of the server as one unit; break it down into pieces too. If you’re using iSCSI as a backup target, try backing up to local disks. If you’re using local disks and you’ve got SAN space available to test, try backing up to the SAN.
If you’re running something from your machine, then try it from another workstation.
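The scheduled-task story above comes down to two daily time windows colliding, which is easy to get wrong when a backup runs past midnight. Here’s a minimal Python sketch that checks whether two jobs that run every day overlap; the function name and its inputs (start time plus duration in minutes) are hypothetical, and it assumes each job runs for less than 24 hours.

```python
from datetime import time

MINUTES_PER_DAY = 24 * 60


def _to_minutes(t: time) -> int:
    return t.hour * 60 + t.minute


def daily_windows_overlap(start_a: time, minutes_a: int,
                          start_b: time, minutes_b: int) -> bool:
    """True if two jobs that run every day at the given start times overlap.

    Each window is the half-open interval [start, start + duration) in
    minutes after midnight. Windows that run past midnight are handled by
    also comparing against job B's runs on the previous and next day.
    Assumes each duration is under 24 hours.
    """
    a_start = _to_minutes(start_a)
    a_end = a_start + minutes_a
    for day_offset in (-MINUTES_PER_DAY, 0, MINUTES_PER_DAY):
        b_start = _to_minutes(start_b) + day_offset
        b_end = b_start + minutes_b
        if a_start < b_end and b_start < a_end:
            return True
    return False
```

For example, `daily_windows_overlap(time(23, 30), 120, time(1, 0), 5)` returns `True`: a two-hour backup that starts at 11:30 PM is still running when a 1:00 AM "stop SQL Server" task fires.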
4. Write down the exact steps to reproduce the problem.
To get the fastest possible support and the best answer, boil the issue down to an exact set of steps that anyone can follow on their own machine. Support organizations are huge, and these days people work from home a lot, so the person who ends up working on your case may never get to talk to you directly.
When you write down the steps, your target audience is a developer who just got emailed about your case on his PDA and is curious to see if he can reproduce it on his laptop, which doesn’t have an internet connection at the moment. If you can give him enough information to reproduce the issue without connecting to the internet, you’ve won.
Bonus points if they don’t have to use any of your data – that way, they can’t blame your data as the problem. Even better, this lets people test that they’ve actually fixed the problem without getting your data involved.
Example: check out this problem I had in the Microsoft data mining forums. I wrote up exactly how to reproduce the problem by entering random data in an Excel spreadsheet. Anybody, from a developer to a support engineer, can test this case on their own machine.
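The same idea works outside Excel. This is a minimal Python sketch, assuming your repro only needs a few numeric columns (the column names, ranges, and row count are placeholders): it generates random data from a fixed seed, so anyone running the same script on the same Python version gets the same file, and none of your real data is involved.

```python
import csv
import random


def write_repro_data(path: str, rows: int = 1000, seed: int = 42) -> None:
    """Write a CSV of random test data; the fixed seed makes it repeatable."""
    # A private generator, so other code calling random.* can't shift the sequence.
    rng = random.Random(seed)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "age", "income", "bought"])  # placeholder columns
        for i in range(1, rows + 1):
            writer.writerow([
                i,
                rng.randint(18, 80),
                round(rng.uniform(20_000, 150_000), 2),
                rng.choice([0, 1]),
            ])
```

Put the script and the exact steps that follow it in the case notes, and the support engineer can rebuild your test data without ever seeing your database.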
5. Study the Art of War.
I love this book, and it comes in useful in so many business interactions. You can read The Art of War for free online. I recommend saving a copy in a text file and putting it on your PDA. The next time you’re waiting in line somewhere, bored out of your gourd, open it up and start reading.
Some of the principles won’t seem obviously beneficial to IT workers, so stroll down to your local bookstore and you’ll probably find a version of The Art of War targeted at businesspeople. There are dozens of versions of this book with different interpretations of what it means for modern life.
In terms of a support call, here’s what it means: you don’t want to go to war. If you call support, you may not like the results. I abhor calling support because I’ve had some ugly experiences with unqualified staff. So if I can figure out how to fix the problem myself, I will, and along the way I’ll probably learn things that will make me a better warrior – I mean, DBA.
If you have to wage war, bring overwhelming force to the battlefield. Come armed with every single event log for your system, documentation about its configuration, screenshots of errors, and the scripts that produced the error. When the support call starts, I like to ask, “Can I email you the information I’ve gathered? It’s all in a zip file.” I start that email before I even start talking, because it’ll take a while for this data to course through the internet. By the time I’ve finished my explanation, they’ve got my files.
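Building that zip file is easy to script so it’s ready before you dial. Here’s a minimal Python sketch using the standard `zipfile` module; it assumes you’ve already exported the event logs, configuration notes, screenshots, and scripts into one folder, and the function name and paths are placeholders.

```python
import zipfile
from pathlib import Path


def bundle_case_files(evidence_dir: str, archive_path: str) -> list:
    """Zip every file under evidence_dir, keeping relative paths; return the names added.

    Keep archive_path outside evidence_dir, or the archive could try to include itself.
    """
    root = Path(evidence_dir)
    added = []
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                arcname = path.relative_to(root).as_posix()
                zf.write(path, arcname)
                added.append(arcname)
    return added
```

The returned list doubles as a table of contents you can paste into the body of the email.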
If you didn’t bring enough force, you have to be fast. If support asks for something you didn’t get ahead of time, don’t say you’ll call them back – gather the information immediately. Don’t hang up, don’t arrange a later meeting, just get it right now. Otherwise, the person you’re working with won’t be on duty later, or they’ll have forgotten about the specifics of your case. Show that you’re willing to focus, and they’ll return the favor.
Sometimes your business managers won’t allow you to stop production to gather the information you need. In that case, make it abundantly clear to management that you want to move forward, but they’re choosing not to. That way, when a manager comes in and asks why you haven’t fixed the problem yet, you can point to the exact email where the business management said you had to delay until a given date.
I’m sure I’m missing more links between The Art of War and calling support – heck, I could have made an entire blog entry out of that – but I’m off to follow up with my next support call….
Bonus Tip from K. Brian Kelley: Know your escalation levels.
Escalation is the process of moving things up the chain to get to a higher level of support. There’s usually a documented process: first level will work on it a given amount of time or until they feel like they can’t make any more progress, and then they’ll escalate the call up to the second tier of support. That second tier is more experienced and has more tools at their disposal. If they can’t solve it, they escalate it up another tier, and eventually you’re talking to the product developers.
When the call starts moving, make it clear what your timeline requirements are with support. When I have a production-down situation, I ask the support engineer to agree at the beginning that we’re going to escalate the call if we don’t have significant progress within one hour. (That time can be different depending on how severe the outage is.) Then every time we escalate, I repeat that process with the new engineer.
On the other hand, if I’m dealing with a non-urgent server, say a QA box where I’ve got a replacement available for my staff, then I’ll tell the support engineer that I can wait 24 hours before we need to escalate it. Putting a time on it helps both of you work with appropriate urgency.
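Agreeing on the clock up front is simple arithmetic, but writing it down keeps both sides honest. A minimal Python sketch: the severity labels are hypothetical, and the one-hour and 24-hour windows are just the examples from this section, so substitute whatever you and the support engineer agree on.

```python
from datetime import datetime, timedelta

# Hypothetical severity labels mapped to the escalation windows described above.
ESCALATION_WINDOWS = {
    "production_down": timedelta(hours=1),
    "non_urgent": timedelta(hours=24),
}


def escalation_deadline(severity: str, started_at: datetime) -> datetime:
    """Time by which the case escalates if there's no significant progress."""
    return started_at + ESCALATION_WINDOWS[severity]


def should_escalate(severity: str, started_at: datetime,
                    now: datetime, making_progress: bool) -> bool:
    """Escalate once the agreed window has passed without significant progress."""
    return not making_progress and now >= escalation_deadline(severity, started_at)
```

Each time the case moves to a new tier, pass in the time the new engineer picked it up, since the clock restarts with every escalation.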
You don’t have to wait for the time to lapse, either: if you’re getting frustrated with the support engineer and you don’t feel like your case is going in the right direction, ask for it to be escalated. I feel horrendously guilty whenever I pull that, but the sad fact is, sometimes it’s the best thing for everybody involved. The support engineer doesn’t want to waste your time – heck, they want you off the phone too!