It’s always important to make sure that SQL Server is running at peak efficiency – nobody wants to waste resources. In the past we’ve concentrated on ways to reduce I/O – either network I/O or disk I/O. The increasing number of cores available in modern CPUs meant that CPU tuning wasn’t as much of a concern. With the licensing changes coming in SQL Server 2012, tuning SQL Server for CPU should be in front of every DBA and developer’s mind.
Old or Missing Statistics
Statistics help SQL Server determine the best way to access data. When we have accurate statistics, the database is able to examine the statistics and make good guesses about how much data will be returned by a given query. The statistics are then used to determine the most appropriate join or data access method (seek vs scan, for instance). The problem with statistics is that they’re only good for a short period of time. Statistics are recorded as a histogram that describes how data is distributed in a table or index. If the histogram is inaccurate, then SQL Server has no reliable way of knowing how much data is present.
Looking at the AdventureWorks sample database, we can see there’s an index on the SalesOrderHeader table
IX_SalesOrderHeader_OrderDate. If we use
DBCC SHOW_STATISTICS to look at the index, we can see that the most recent valued for
RANGE_HI_KEY is ‘2008-07-31 00:00:00’. What happens when we query for data that’s outside of that range? SQL Server assumes that there’s only 1 row present no matter how many rows there are. There could be 0 rows, 5 rows, or 500,000 rows. SQL Server doesn’t have any statistics about the data, so it cannot make an accurate decision.
Old, or missing, statistics can be cause huge performance problems because SQL Server will have to make a bad guess (1 row) regardless of how much data is present. SQL Server may choose an inappropriate join type and perform a large number of additional reads; a nested loop join may be used when a merge or hash join would be more appropriate.
The solution is to make sure that statistics are kept up to date.
- Understand data change rates. If data is loaded as part of regular jobs, make sure that index maintenance is performed on a regular basis and that statistics are updated during index maintenance.
- Understand the type of data change. If data is written at the end of table, there’s a chance its data won’t be included in statistics (especially in the case of something like an order date). If a lot of data coming into the system is time sensitive, regular statistics updates will be necessary to keep things performing well.
- Make sure that
AUTO_UPDATE_STATISTICSis enabled for all databases. Even if your jobs to update statistics fail to run, having
AUTO_UPDATE_STATISTICSenabled will make sure that SQL Server updates your data as it changes. Statistics will not get updated as often as we’d like, but it is better than nothing.
ORDER BY clause exists with good reason, right? Users don’t just need to get data back from the database, they want to get it back in a specific order. While that’s true, ordering data has a definite cost associated with it, if the data isn’t already in the correct order. Take a look at this query:
SELECT SalesOrderID, OrderDate, DueDate, ShipDate, SalesOrderNumber, PurchaseOrderNumber FROM Sales.SalesOrderHeader WHERE OrderDate = '2007-02-01 00:00:00.000' ORDER BY SalesOrderNumber ;
When we look at the actual execution plan, sorting the results take up 79% of the cost of the query. While this is a simple query, it’s easy to see how sorting can chew up a lot of CPU in SQL Server.
There’s no easy solution. However, many programming languages have the ability to use functional programming techniques to sort data in memory. The .NET framework makes it even easier by providing datatypes like the DataTable and DataSet. Both of these datatypes make it easy to sort data in memory in an application server. The idea here is that CPU is cheap on a Windows Server – we only pay to license the server, not each socket or CPU core. It is also much easier to horizontally scale the middle tier of an application than it is to horizontally scale SQL Server.
Functions are one of the biggest offenders when it comes to wasting SQL Server’s CPU time. The tricky part about functions is that they seem like they an easy way to encapsulate and re-use functionality. While functions accomplish this goal, there are other aspects of functions that make them a bad choice for many activities.
Simple, single statement table valued functions can be easily inlined by SQL Server. That is to say SQL Server will be able to include the execution plan of the function in the calling execution plan. The function’s execution plan can be optimized with the rest of the query. This can potentially lead to additional optimizations within the function. Multi-statement table valued functions and scalar functions won’t be optimized much of the time. Instead SQL Server will repeatedly execute the underlying function – once for every row in the result set. Over very large result sets this can result in a considerable amount of CPU utilization. IT’s often best to replace functions like this with inlined query logic. For a look at the behavior of functions in SQL server, take a gander at SQL Server Functions: The Basics.
Another downside to functions in SQL Server is that they frequently don’t get good statistics from the optimizer. When a table valued function returns data, SQL Server has no way to know (at compile time) how much data will be returned from the function. Instead it makes the assumption that there is only 1 row present in the result set from the TVF. This can result in poor join choice as well as bad data lookup decisions, just as if there were incorrect or missing statistics on an index or table.
Summing It Up
Keeping statistics up to date, moving ordering out of the database, and eliminating costly functions will go a long way to keeping your CPU utilization low. There are, of course, many other ways to combat high CPU, but these are some of the biggest culprits and some of the items that can be most easily fixed by database administrators and developers alike.