Silent CPU Killers


It’s always important to make sure that SQL Server is running at peak efficiency – nobody wants to waste resources. In the past we’ve concentrated on ways to reduce I/O – either network I/O or disk I/O. The increasing number of cores available in modern CPUs meant that CPU tuning wasn’t as much of a concern. With the per-core licensing changes coming in SQL Server 2012, though, tuning SQL Server for CPU should be at the front of every DBA and developer’s mind.

Old or Missing Statistics

Statistics help SQL Server determine the best way to access data. When we have accurate statistics, the database is able to examine the statistics and make good guesses about how much data will be returned by a given query. The statistics are then used to determine the most appropriate join or data access method (seek vs scan, for instance). The problem with statistics is that they’re only good for a short period of time. Statistics are recorded as a histogram that describes how data is distributed in a table or index. If the histogram is inaccurate, then SQL Server has no reliable way of knowing how much data is present.

Looking at the AdventureWorks sample database, we can see there’s an index on the SalesOrderHeader table, IX_SalesOrderHeader_OrderDate. If we use DBCC SHOW_STATISTICS to look at the index, we can see that the most recent value for RANGE_HI_KEY is ‘2008-07-31 00:00:00’. What happens when we query for data that’s outside of that range? SQL Server assumes that there’s only 1 row present, no matter how many rows there actually are. There could be 0 rows, 5 rows, or 500,000 rows. SQL Server doesn’t have any statistics about that data, so it cannot make an accurate decision.
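To see this for yourself, dump the histogram and then query past its last step. Here’s a quick sketch, assuming the AdventureWorks2008R2 copy of the sample database (adjust the names to match your copy):

-- Look at the histogram for the OrderDate index.
DBCC SHOW_STATISTICS ('Sales.SalesOrderHeader', 'IX_SalesOrderHeader_OrderDate')
    WITH HISTOGRAM;

-- Query beyond the last RANGE_HI_KEY step. With no histogram step covering
-- this range, the optimizer estimates a single row no matter how many rows
-- actually match.
SELECT SalesOrderID, OrderDate, TotalDue
FROM Sales.SalesOrderHeader
WHERE OrderDate > '2008-07-31';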

Old, or missing, statistics can cause huge performance problems because SQL Server has to make a bad guess (1 row) regardless of how much data is actually present. SQL Server may choose an inappropriate join type and perform a large number of additional reads; a nested loop join may be used when a merge or hash join would be more appropriate.

The solution is to make sure that statistics are kept up to date.

  1. Understand data change rates. If data is loaded as part of regular jobs, make sure that index maintenance is performed on a regular basis and that statistics are updated during index maintenance.
  2. Understand the type of data change. If data is written at the end of a table, there’s a chance that data won’t be included in the statistics (especially in the case of something like an order date). If a lot of the data coming into the system is time sensitive, regular statistics updates will be necessary to keep things performing well.
  3. Make sure that AUTO_UPDATE_STATISTICS is enabled for all databases. Even if your jobs to update statistics fail to run, having AUTO_UPDATE_STATISTICS enabled will make sure that SQL Server updates your statistics as the data changes (see the sketch after this list). Statistics won’t get updated as often as we’d like, but it is better than nothing.
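As a rough sketch, the statistics maintenance described above might look something like this (the database name, table, and FULLSCAN sampling are assumptions; tailor them to your data change rates):

-- Confirm AUTO_UPDATE_STATISTICS is enabled for the database.
SELECT name, is_auto_update_stats_on
FROM sys.databases
WHERE name = 'AdventureWorks2008R2';

-- Turn it on if it has been switched off.
ALTER DATABASE AdventureWorks2008R2 SET AUTO_UPDATE_STATISTICS ON;

-- During regular maintenance, refresh statistics on tables with heavy churn.
-- FULLSCAN builds the most accurate histogram at the cost of extra I/O; a
-- sampled update may be good enough for very large tables.
UPDATE STATISTICS Sales.SalesOrderHeader WITH FULLSCAN;

-- Or refresh everything in the current database that has changed.
EXEC sp_updatestats;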

Ordering Data

The ORDER BY clause exists for good reason, right? Users don’t just need to get data back from the database, they want to get it back in a specific order. While that’s true, ordering data has a definite cost if the data isn’t already in the correct order. Take a look at this query:
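Something along these lines will do, assuming AdventureWorks, where the ORDER BY column isn’t kept sorted by any index:

-- The optimizer has to add a Sort operator to the plan just to satisfy the
-- ORDER BY, because no index stores the data in TotalDue order.
SELECT SalesOrderID, OrderDate, TotalDue
FROM Sales.SalesOrderHeader
ORDER BY TotalDue DESC;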

When we look at the actual execution plan, sorting the results takes up 79% of the cost of the query. While this is a simple query, it’s easy to see how sorting can chew up a lot of CPU in SQL Server.

There’s no easy solution. However, many programming languages have the ability to use functional programming techniques to sort data in memory. The .NET framework makes it even easier by providing datatypes like the DataTable and DataSet. Both of these datatypes make it easy to sort data in memory in an application server. The idea here is that CPU is cheap on a Windows Server – we only pay to license the server, not each socket or CPU core. It is also much easier to horizontally scale the middle tier of an application than it is to horizontally scale SQL Server.

Functions

Functions are one of the biggest offenders when it comes to wasting SQL Server’s CPU time. The tricky part about functions is that they seem like an easy way to encapsulate and re-use functionality. While functions accomplish this goal, there are other aspects of functions that make them a bad choice for many activities.

Simple, single-statement table valued functions can easily be inlined by SQL Server. That is to say, SQL Server is able to include the function’s execution plan in the calling execution plan, and the function’s plan can be optimized along with the rest of the query. This can potentially lead to additional optimizations within the function. Multi-statement table valued functions and scalar functions, on the other hand, won’t be optimized this way much of the time. Instead SQL Server will repeatedly execute the underlying function – once for every row in the result set. Over very large result sets this can result in a considerable amount of CPU utilization. It’s often best to replace functions like this with inlined query logic. For a look at the behavior of functions in SQL Server, take a gander at SQL Server Functions: The Basics.
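As a minimal sketch of the difference (the function names here are made up for illustration), compare a multi-statement table valued function with its inline equivalent:

-- Multi-statement TVF: the body is a black box to the optimizer, executed
-- separately from the calling query, with a fixed low row estimate.
CREATE FUNCTION dbo.OrdersForCustomer_Multi (@CustomerID INT)
RETURNS @Orders TABLE (SalesOrderID INT, OrderDate DATETIME, TotalDue MONEY)
AS
BEGIN
    INSERT INTO @Orders (SalesOrderID, OrderDate, TotalDue)
    SELECT SalesOrderID, OrderDate, TotalDue
    FROM Sales.SalesOrderHeader
    WHERE CustomerID = @CustomerID;
    RETURN;
END;
GO

-- Inline TVF: a single SELECT that SQL Server can expand into the calling
-- query and optimize as a whole.
CREATE FUNCTION dbo.OrdersForCustomer_Inline (@CustomerID INT)
RETURNS TABLE
AS
RETURN
    SELECT SalesOrderID, OrderDate, TotalDue
    FROM Sales.SalesOrderHeader
    WHERE CustomerID = @CustomerID;
GO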

Another downside to functions in SQL Server is that they frequently don’t get good statistics from the optimizer. When a table valued function returns data, SQL Server has no way to know (at compile time) how much data will be returned from the function. Instead it makes the assumption that there is only 1 row present in the result set from the TVF. This can result in poor join choice as well as bad data lookup decisions, just as if there were incorrect or missing statistics on an index or table.
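If rewriting the function inline isn’t practical, one workaround (a sketch, reusing the hypothetical function above) is to materialize the function’s output into a temp table first; the temp table has a real row count and can get auto-created statistics, so the rest of the query can be costed properly:

-- Dump the function's output into a temp table so downstream joins get a
-- realistic row estimate instead of the fixed 1-row guess.
SELECT SalesOrderID, OrderDate, TotalDue
INTO #CustomerOrders
FROM dbo.OrdersForCustomer_Multi(11000);

SELECT o.SalesOrderID, d.ProductID, d.LineTotal
FROM #CustomerOrders AS o
JOIN Sales.SalesOrderDetail AS d ON d.SalesOrderID = o.SalesOrderID;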

Summing It Up

Keeping statistics up to date, moving ordering out of the database, and eliminating costly functions will go a long way to keeping your CPU utilization low. There are, of course, many other ways to combat high CPU, but these are some of the biggest culprits and some of the items that can be most easily fixed by database administrators and developers alike.


10 Comments

  • I agree, functions are often a source of evil.

    PS: The link to ‘SQL Server Functions: The Basics’ is broken.

  • Excellent post, Thanks Brent.

  • This is going to be interesting for places like the one where I’m currently working. They’ve been pushing for thinner virtual app servers and pushing the load onto the large DB boxes (hangover from a transition from Oracle web apps). Wonder which way they’ll jump on the tradeoff between core licensing and developer time to rewrite code.

  • (Bad link for “SQL Server Functions: The Basics.”)

    Good article 🙂

  • CPU utilization is high in SQL Server (the configuration is 8 cores and 128 GB RAM), and even after increasing to 16 cores and 248 GB RAM we are getting the same issue. What could be the reason behind this? Sometimes huge transactions are going on, so can you please suggest how we can resolve this?
    Please guide me on what basic checks we should do.

    • That sounds like a bloated plan cache issue; no matter how many CPUs you add, eventually they will redline. You need to track down the offending queries in the plan cache. Also, whether adding resources helps is completely dependent on the SQL Server version and edition. It would also help if you could define “huge transactions”.

  • This happens in AWS EC2 instances

