Using SQL Server Without Monitoring Software is Like Overdriving Your Headlights

Last Updated 3 years ago

The strength of your car’s headlights, the speed of your reactions, and the power of your brakes determine how fast you can drive safely at night.

Your headlights give you enough warning time to hit the brakes and stop your car before you hit an object. The better your headlights, the more distance you can see ahead, and the faster you can drive with confidence.

Let’s say your car’s headlights illuminate 350 feet in front of your car. Sure, you can see reflective signs from much farther away, but if a pile of lumber fell off a truck on the highway, it’s not going to be covered with reflective paint. To find out whether that 350 foot distance is far enough, we need to know three things:

How fast is your reaction time? – What’s the length of time you see the pile of lumber until the time your foot starts to move on the brake pedal? This length of time is affected by your alertness, your training, your muscle speed, and the iPhone you’re playing with while you’re driving.
How fast is your car going? – At 60 miles per hour, your car traveled 132 feet in the 1.5 seconds it took before your foot began to push the brake pedal. The faster you’re going, the more distance your car will travel.
How good are your brakes? – A Porsche 911 can stop from 60mph in 100 feet. If you’re going faster, or if your brakes aren’t as good as a 911’s, then you’re going to eat up more distance.

These factors combine to determine your safe stopping distance. Michael Schumacher might be able to drive at 100mph in a Ferrari using candles for headlights. You, however, are pushing the limits of your talents at 30mph even with those expensive fake xenons you installed on your Civic Si. Err on the wrong side of this formula, and you’re doing what’s called overdriving your headlights. People often learn this when the state trooper fills out the forms explaining why they hit the deer.

Monitoring and Forecasting: SQL Server’s Headlights

Most of us don’t bother with predictive analysis – we gather statistics with Perfmon, the DMVs, wait stats, and so on, but we’re looking in the rear view mirror – not ahead of us. Heck, some of us don’t even go that far; we wait until our phone rings and the users scream about dead bodies – uh, I mean, dead queries – lying around. We don’t have the time to project future growth and capacity needs because it’s a painful, time-intensive process with the native tools.

Serious performance tuning, though, requires looking ahead. Imagine being the head DBA for the New York Stock Exchange or NASDAQ systems: you simply can’t afford to wait until your end users call about lag times. You have to predict performance needs as far ahead as possible in order to design and implement an infrastructure to support those needs.

Unfortunately, there’s no quick and easy way to forecast how much our SQL Server CPU, memory and storage needs are going to grow in the next twelve months. DBAs can build their own data warehouses with performance data sourced from the DMVs, wait stats, SQL 2008’s Performance Data Warehouse, or Perfmon counters. To anticipate future needs and look down the road, the DBA would build time-based reports that use historical information to predict the future. This is left as an exercise for the reader.

Another approach is to use your company’s financial metrics to predict system loads. DBAs in the financial industry often relate their system loads to stock trading volumes. They know that at a certain level of stock trading, their systems will have a certain level of loads. Instead of guessing SQL Server load directly, they can look at their financial analysts’ predictions of future stock volumes and use those to predict SQL Server activity. Your business may have similar transaction volumes – sales quantities, employee head counts, or widget production rates – that can help you guesstimate future database loads.

The farther down the road you can see, the more time you buy for reaction and braking.

Reaction Times in the DBA World

Reaction times measure the lag between your monitoring software’s alerts and your first action. Take one step backward if you’ve got an Outlook rule set up to automatically move all performance alerts out of your inbox and into another folder for later perusal. Uh-huh, I thought so. If you want to get a job driving the Ferraris of SQL Servers, you’re going to need to break the rules – specifically, your Outlook rules. Michael Schumacher didn’t win titles driving cars with automatic transmissions, and you can’t do serious SQL Server performance tuning by ignoring alerts.

The first challenge is to reduce false alarms coming from your monitoring system. Reconfiguring your monitoring system can be a constant challenge, especially if your monitoring systems are controlled by a different group in the company. This is one of the things I like about SQL Server monitoring software – the dashboard metrics show a historical high/low range, so you’re only alerted when a metric is outside of the established norm for a given server. Some even refine their baselines based on the day of week and time of day, because a CPU load metric that’s perfectly normal for noon on a workday might be completely out of the ordinary on Saturday at 9AM.

The second challenge is to be able to act faster when the alert comes in. In my experience, reaction times seem to be grouped in sizes of companies:

Small companies with one DBA – as soon as the DBA sees the alert, they either know right away exactly what action they’re going to take, or they have to hit the web to figure things out. They don’t have more senior DBAs on staff who may have seen a particular problem before, so some problems take longer than others to research before reacting.
Midsize companies with a couple/few DBAs – one DBA might be on call, but they may not know exactly what action they’re going to take right away. They may need to get approval from other DBAs before acting, or they may have security restrictions that stop them from taking serious actions like rebooting servers.
Large enterprises with multi-tier support groups, rigid policies, and run books – some problems are handled autonomously by help desk staff or first level support, and more serious problems are quickly escalated to the appropriate staff.

Each type of company has different solutions for reducing reaction times, and this sounds like a great idea for a future blog post series. Note to self…

Braking Times: The Time Required to Fix Things

Just as race teams improve their stopping power by using new tools like ceramic brakes, you can use more advanced technology to help you solve problems faster. Some of the examples include:

Clustering improves your reaction time when dealing with broken server hardware.
Database mirroring helps you recover from a borked SAN or page corruption quicker.
SQL Server Enterprise Edition helps your database mirroring become even more powerful by compressing the data and by fetching corrupt pages from the mirror.
Virtualization can give you high availability for servers that might not ordinarily be able to afford HA.

Just like ceramic brakes, which can cost upwards of $10,000, none of these technologies are free. If you wanna go seriously fast, you gotta spend serious money, despite what you heard from that guy selling “Type R” stickers on eBay. Start by improving your reaction times first, because that’s free.

Another Giveaway: Keyboards and Mice

What’s On My Bookshelf?

1 Comment. Leave new

Dave Schutz
September 25, 2009 6:58 pm

Sometimes I know what those deer feel like.

Reply