Bottlenecks and Bank Balances


Pop quiz: should you be worried if your SQL Server’s page life expectancy is averaging 214?

There’s only one correct answer: it depends. Successful performance tuning boils down to a simple cycle:

  • Measure the application’s performance
  • Find the current bottleneck
  • Improve that bottleneck so that it’s not the bottleneck anymore
  • Measure to find out how much your application performance improved
  • Ask the application owner if it’s good enough now. If so, move on to the next application. If not, go back to step 2 and find the new bottleneck.

And every one of those steps is equally important.
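As a rough illustration, the cycle above can be sketched as a loop over a toy model. Everything here is an assumption made for the example: the function name `tuning_cycle`, the idea that each resource contributes a fixed number of milliseconds to a request, and the made-up timings. Real tuning means reading real measurements, not a dictionary.

```python
from typing import Callable, Dict, List, Tuple


def tuning_cycle(
    timings_ms: Dict[str, float],
    improve: Callable[[str, float], float],
    good_enough_ms: float,
    max_rounds: int = 10,
) -> List[Tuple[str, float, float]]:
    """Walk the measure / find / improve / re-measure / ask loop.

    timings_ms maps a resource name to the milliseconds it adds to one
    request (hypothetical numbers). improve(name, ms) returns the new
    cost of that resource after tuning it.
    Returns one (bottleneck, total_before, total_after) tuple per round.
    """
    timings = dict(timings_ms)            # work on a copy, leave the input alone
    history = []
    total = sum(timings.values())         # step 1: measure the application
    for _ in range(max_rounds):
        bottleneck = max(timings, key=timings.get)                  # step 2: find it
        timings[bottleneck] = improve(bottleneck, timings[bottleneck])  # step 3: improve it
        new_total = sum(timings.values())                           # step 4: measure again
        history.append((bottleneck, total, new_total))
        total = new_total
        if total <= good_enough_ms:       # step 5: is it good enough now?
            break
    return history


if __name__ == "__main__":
    # Made-up starting point: CPU dominates, storage is second.
    start = {"cpu": 900.0, "storage": 300.0, "network": 20.0}
    for name, before, after in tuning_cycle(start, lambda n, ms: ms / 10, 200.0):
        print(f"tuned {name}: {before:.0f} ms -> {after:.0f} ms")
```

In this toy run the bottleneck changes between rounds, which is why the loop goes back to step 2 instead of tuning the same resource again by default.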

If You Don’t Find the Right Bottleneck

This Bottleneck Is Plenty Big Enough for You

Performance tuning isn’t about zooming in and focusing on a single number in incredible detail; rather, it’s about stepping away and getting the full picture. Time and again, I get emails asking whether a single metric is OK, and a few questions later it turns out the DBA has leapt to a conclusion without surveying the environment as a whole. If you spend your tuning time closely examining a single metric and trying to improve that one number, you might not improve the performance of your application at all.

Sure, your page life expectancy might be pretty bad – but is that the one thing keeping your application from performing faster?

Take a step back and gather a complete set of Perfmon metrics. Look at CPU, memory, disk, and network performance as a whole. Find the thing that’s in the absolute worst shape. In that linked post, I explain the order in which I look at metrics to find the most likely bottleneck.

If You Don’t Focus on Improving the Bottleneck

I was recently working with a client frustrated with their application performance. I found two issues:

  • CPU-intensive user-defined functions were being called thousands of times per query
  • The storage subsystem was nowhere near as fast in practice as the vendor had claimed

The application’s bottleneck was the CPU-intensive UDFs. The server was frequently pegged at 100% CPU, and queries just couldn’t run any faster until they were rewritten to rip out the UDFs. I put together a recommended plan of action to take those UDFs out, which would make the application an order of magnitude faster. I noted that they should probably start working on the storage performance in a second track, because the instant the UDFs were removed, storage was going to become a problem. With the CPU-burning UDFs out of the way, the server would be able to churn through more records faster, but the storage subsystem wouldn’t be able to deliver records fast enough to satisfy the users.

On our next status update call, they said they’d reworked the storage subsystem. SQLIO reported dramatically faster storage throughput, but they were only seeing a minor improvement in application performance. I had to break the bad news to them that they’d focused on the wrong problem first. After we revisited my report together, they pursued the UDFs with renewed vigor, and suddenly the application was blazing fast. Thankfully I’d documented my findings in writing. If I’d been an internal employee, I might have delivered that advice verbally instead, and I might have lost the ensuing battle to fix the UDFs because the manager would have thought my advice was bogus.
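The arithmetic behind that outcome can be sketched with the standard Amdahl's-law estimate. This is an illustration, not something measured at the client: the 90/10 split between CPU and storage time and the 5x and 20x component speedups are invented numbers, and the model assumes CPU time and storage waits do not overlap.

```python
def overall_speedup(share_of_time: float, component_speedup: float) -> float:
    """Amdahl's-law estimate of whole-query speedup.

    share_of_time: fraction of total runtime spent in the component (0..1).
    component_speedup: how many times faster that component becomes.
    """
    if not 0.0 <= share_of_time <= 1.0:
        raise ValueError("share_of_time must be between 0 and 1")
    if component_speedup <= 0:
        raise ValueError("component_speedup must be positive")
    untouched = 1.0 - share_of_time               # time the fix does not help
    improved = share_of_time / component_speedup  # time left in the fixed part
    return 1.0 / (untouched + improved)


if __name__ == "__main__":
    # Hypothetical split: 90% of query time burning CPU in UDFs, 10% on storage.
    storage_first = overall_speedup(0.10, 5.0)   # storage made 5x faster
    udfs_first = overall_speedup(0.90, 20.0)     # UDF work made 20x cheaper
    print(f"fix storage first: {storage_first:.2f}x faster overall")
    print(f"fix UDFs first:    {udfs_first:.2f}x faster overall")
```

Under these assumed numbers, a large storage improvement barely moves the total because storage was never where the time went.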

If You Don’t Measure Your Improvements

You Get What You Measure

All DBAs are consultants.

Some of us think we’re full-time company employees, but in reality, we’re delivering a service. The people you work with, whether they’re developers, project managers, end users, or other DBAs, are looking at you just as if you were an outsider. You’re expected to stride in, identify the problem, mitigate it, and show that your work delivered a return on investment. The investment, in case you’re not following, is your paycheck.

Don’t believe me? Poke around in an application and then throw up your hands, saying you can’t find a way to make it go faster. The next phone call the project manager makes will be to an external consultant, and the project manager probably won’t call you first next time. (Sometimes, that’s not a bad thing.)

If you put in a lot of hard work to make an application go faster, but you don’t measure the before-and-after effects of your work, someone else is going to take credit. The developers will say the improvements were due to a new version of their code, because they’re working on the code at the same time you’re working on the database. The sysadmins will say they defragged the muffler bearings. The SAN guy will say he tweaked the flux capacitor. The project manager will say he made everybody work long hours and that did the trick. The only way, the ONLY way that the DBA can ever take credit is to take clear before-and-after measurements for proof. Run the same code base before and after your tweaks, and measure application performance. Follow up with a written report, even if it’s a one-paragraph email, summing up your changes and the performance improvements, and copy your manager on it.
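A minimal sketch of turning before-and-after timings into that one-line summary might look like this. The function name `improvement_report`, the choice of the median, and the sample timings are all assumptions for illustration; use whatever measurement the application owner actually cares about.

```python
from statistics import median
from typing import Sequence


def improvement_report(
    before_ms: Sequence[float], after_ms: Sequence[float], label: str = "query"
) -> str:
    """Summarize repeated timings of the same workload, before and after a change.

    Uses the median so a single slow outlier run does not skew the result.
    """
    before = median(before_ms)
    after = median(after_ms)
    if before <= 0 or after <= 0:
        raise ValueError("timings must be positive")
    saved_pct = (before - after) / before * 100   # share of the time eliminated
    speedup = before / after                      # how many times faster
    return (
        f"{label}: median {before:.0f} ms -> {after:.0f} ms "
        f"({saved_pct:.0f}% less time, {speedup:.1f}x speedup)"
    )


if __name__ == "__main__":
    # Hypothetical timings from running the same code base three times each.
    print(improvement_report([1200, 1180, 1250], [310, 300, 295], "nightly report"))
```

The output string is short enough to paste straight into the follow-up email.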

If You Don’t Ask the App Owner If It’s Good Enough

Never confuse what YOU think about a metric with what the BUSINESS thinks about a metric. Your CEO doesn’t care about page life expectancy. (Your CEO probably doesn’t even care about the DBA’s life expectancy.) Before you spend time or money improving an application, you have to find out whether it’s the most important thing to your business right now.

Say hello to the most important metric you will ever calculate: opportunity cost.

A Long Night at Zig Zag Has Its Own Opportunity Cost

Opportunity cost is the value of the best alternative you give up when you choose to do one thing instead of another. If you spend eight hours today improving the page life expectancy of a particular server, is that worth more to the business than anything else you could be doing in those eight hours? Could you spend eight hours doing something more valuable?
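To put rough numbers on that question, here is a small sketch that reads opportunity cost as the value of the best alternative you give up. The task names and dollar values are hypothetical; in practice the estimates come from the business, not from the DBA.

```python
from typing import Dict


def opportunity_cost(chosen: str, options: Dict[str, float]) -> float:
    """Value of the best alternative given up by picking `chosen`."""
    if chosen not in options:
        raise KeyError(chosen)
    alternatives = [value for name, value in options.items() if name != chosen]
    return max(alternatives) if alternatives else 0.0


def net_value(chosen: str, options: Dict[str, float]) -> float:
    """Value of the chosen task minus what the best alternative would have delivered."""
    return options[chosen] - opportunity_cost(chosen, options)


if __name__ == "__main__":
    # Hypothetical estimates of business value for one eight-hour day.
    day = {
        "tune_page_life_expectancy": 2000.0,
        "fix_slow_sales_report": 9000.0,
        "test_backup_restores": 5000.0,
    }
    for task in day:
        print(f"{task}: net {net_value(task, day):+.0f}")
```

Only the task with a non-negative net value is worth the day; every other choice costs the business more than it returns.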

I use opportunity cost whenever anyone asks me to do something.

As an employee, if a project manager asks me to tune a particular application, I bring them into my manager’s office and say, “It will take me three days to make that application faster. I’ll probably make it an order of magnitude faster, because I’ve never tuned that server before. However, if I take those three days to do it, I won’t make the deadline for Project Snazzywidget. Which one is worth more to you?” At that point, it’s a political decision and a business decision, not a technical decision. If you’re doing the best job of any employee he has, your manager will put you on the most valuable project – which in turn increases your worth again.

As a consultant, I approach the problem differently: “Here’s the thing – I could spend another three days working on this application, but from this point forward, I’m only going to be able to achieve incremental improvements, not the order-of-magnitude improvements we saw in the first few rounds of tuning. I hate to make you guys go through that for a small gain – but is there another application in-house that isn’t delivering the performance you want?” This resets the client’s expectations, and they start seeing you as a weapon that they can point at slow applications. They’ll cherish your time and focus you where your effort will pay off the most. This keeps your perceived ROI high. If you deliver jaw-dropping results each time you tune an application, you can justify higher billable rates.

After all – isn’t your bank balance the one metric you really want to improve?


6 Comments

  • Great article Brent, I love the booze references throughout.

  • I see hope Brent. :o)
    This topic is more a business topic than not. I keep thinking that IT is business and it’s driven by the same business metrics that drive the rest of the company.
    Younger developers/DBAs/Whatever should understand this as the first lesson the first day at work.

    As usual, excellent!

  • “All DBAs are consultants.” So True.

  • Stephen Dyckes
    October 27, 2009 11:42 am

    I cannot stress enough how important it is to keep a tangible record of the improvements! And then to toot your own horn (not many other people will toot your horn for you). Great article as usual, you hit the nail on the head.

  • Your post hits the nail on the head once again. I wanted to affirm two key things I took away from your article.

    1. Obtain performance metrics. Tangible data is king. I agree that your business managers might not know what page life expectancy is but when push comes to shove and you are in a board room with managers that claim it is your fault that their application is running slowly, you will thank your lucky stars that you have numbers to help prop up your case. I am forever dealing with people claiming things are ‘slow’ and I am doing nothing about it. I sometimes enjoy this conversation. I look at their bosses and say, ‘…actually here is the performance data which I have recorded and used SQL Server Analysis Services to help me process and here is what the real problem is…’. I of course always thank Brent for his link on how to analyze perfmon results in the cloud. 10 times out of 10 when all they have is ‘it’s slow’ and you have actual numbers, preferably with pretty colors and graphs, you will win.

    2. Let someone else fight the political fight about what takes priority. Don’t get in the middle of it or you will end up with no hair eating bacon flavored ice cream and turn into a giant ball of stress. That is what your boss and your boss’ boss get paid for. Make them work for their money while you do what you love, being a dba of course.

    Brent, Thanks again! Great post.

  • Great article.

