Blog

Aggregate functions are convenient – they solve a business need and they make development easy. Unfortunately, not all business requirements are so easy to solve. Let’s look at one example: rolling averages.

The Rolling Average

A rolling average is a simple concept; an average is computed over a fixed subset of data. Rolling average calculations are most frequently used with time series data and help remove short term fluctuations while highlighting long term trends – utility bills often feature a rolling average of consumption to help the customer understand their usage. Consumers aren’t concerned about electricity usage being high on one day in August when there was a heat wave; they want to see how their consumption is changing over time.

Rolling Averages with Common Table Expressions: 2005 – 2008R2

Starting with SQL Server 2005, it became easy to write a rolling average in a single T-SQL statement using a Common Table Expression. CTEs rapidly became popular; an incredible amount of prevously difficult functionality was made possible through CTEs including recursive queries and rolling averages. Take a look at this example in the AdventureWorks2012 sample database:

WITH    cte
          AS ( SELECT   DENSE_RANK() OVER ( ORDER BY tm.YearName, tm.MonthOfYearNumber ) AS r ,
                        tm.YearName AS [year] ,
                        tm.MonthOfYearNumber AS [month] ,
                        SUM(SubTotal) AS SubTotal
               FROM     dbo.TimeMaster tm
                        LEFT JOIN Sales.SalesOrderHeader AS soh ON tm.ActualDate = soh.OrderDate
               WHERE    tm.ActualDate BETWEEN '2005-07-01'
                                      AND     '2008-08-01'
               GROUP BY tm.YearName ,
                        tm.MonthOfYearNumber
             )
    SELECT  cte1.[year] ,
            cte1.[month] ,
            AVG(cte1.SubTotal) AS AverageSubTotal
    FROM    cte AS cte1
            JOIN cte AS cte2 ON cte1.r > ( cte2.r - 12 )
                                AND cte1.r <> cte2.r
    GROUP BY cte1.[year] ,
            cte1.[month]
    ORDER BY cte1.[year] ,
            cte1.[month] ;

While not the most straightforward approach to constructing a rolling average, the CTE manages to get the job done. In this query, we are using the CTE to create a work table and then performing a self-join. This same sort of thing is possible using a temporary table or table variable, but the CTE accomplishes it in one statement and is, arguably, easier to read.

Common Table Expressions also hide a dark secret – SQL Server executes the CTE body every time the CTE expression, cte in this example, is referenced. The more complex the Common Table Expression is, the more work that has to be performed. Running this rolling average with STATISTICS IO turned on, it’s easy to see the multiple executions in the form of two scans on each table:

Table 'TimeMaster'. Scan count 2, logical reads 14, physical reads 1, 
    read-ahead reads 5, lob logical reads 0, lob physical reads 0, 
    lob read-ahead reads 0.
Table 'SalesOrderHeader'. Scan count 2, logical reads 208, physical reads 1, 
    read-ahead reads 102, lob logical reads 0, lob physical reads 0, 
    lob read-ahead reads 0.
Table 'Worktable'. Scan count 1, logical reads 147, physical reads 0, 
    read-ahead reads 0, lob logical reads 0, lob physical reads 0, 
    lob read-ahead reads 0.

On a database this small, this doesn’t pose major performance problems, but this will cause big problems for a moderately sized database.

Rolling Averages with Window Functions: 2012 and beyond

SQL Server 2012 provided better support for windowing functions. Although support for OVER() was already available in SQL Server 2005, SQL Server 2012 brings considerably more functionality to the table. By using the ROW or RANGE clause of the windowing function, it’s possible to simplify the query and improve performance. Take a look:

SELECT  YearName ,
        MonthOfYearNumber ,
        AVG(st) OVER ( PARTITION BY YearName, MonthOfYearNumber 
                       ORDER BY YearName, MonthOfYearNumber
                       ROWS 12 PRECEDING )
FROM    ( SELECT    tm.YearName ,
                    tm.MonthOfYearNumber ,
                    SUM(COALESCE(SubTotal, 0)) AS st
          FROM      dbo.TimeMaster tm
                    LEFT JOIN Sales.SalesOrderHeader AS soh ON tm.ActualDate = soh.OrderDate
          WHERE     tm.ActualDate BETWEEN '2005-07-01'
                                  AND     '2008-08-01'
          GROUP BY  tm.YearName ,
                    tm.MonthOfYearNumber
        ) AS x ;

Although the two queries are remarkably different, the biggest difference is the introduction of ROWS 12 PRECEDING. This takes the place of the self join in the previous example. Instead of writing out a join ourselves, we simply tell SQL Server that we’d like an average of st over the last 12 rows sorted by year and month. What kind of effect does this have on the work SQL Server performs?

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, 
    read-ahead reads 0, lob logical reads 0, lob physical reads 0, 
    lob read-ahead reads 0.
Table 'TimeMaster'. Scan count 1, logical reads 7, physical reads 1, 
    read-ahead reads 5, lob logical reads 0, lob physical reads 0, 
    lob read-ahead reads 0.
Table 'SalesOrderHeader'. Scan count 1, logical reads 104, physical reads 1, 
    read-ahead reads 102, lob logical reads 0, lob physical reads 0, 
    lob read-ahead reads 0.

111 reads instead of 369 reads. Clearly this change makes for a substantial performance improvement for SQL Server. We’ve reduced the number of reads, eliminated some query complexity, and made it somewhat obvious to the future developers how they could modify or build on this going forward. Changing the new query to a rolling average by day instead of by month is simple and requires even fewer lines of code:

SELECT  ActualDate ,
        st AS SubTotal ,
        AVG(st) OVER ( ORDER BY ActualDate ROWS 365 PRECEDING ) AS RollingAverageSales
FROM    ( SELECT    ActualDate ,
                    SUM(COALESCE(soh.SubTotal, 0)) AS st
          FROM      dbo.TimeMaster tm
                    LEFT JOIN Sales.SalesOrderHeader soh ON tm.ActualDate = soh.OrderDate
          WHERE     tm.ActualDate BETWEEN '2005-07-01'
                                  AND     '2008-08-01'
          GROUP BY  ActualDate
        ) AS x
ORDER BY x.ActualDate;

This performs the name number of logical and physical reads as the monthly rolling average using a window function.

Summing Up

There you have it – two different ways to perform a rolling average in SQL Server. One method is clearly a lot easier than the other. There are a number of optimizations in SQL Server 2012 to make it easy for you to build this functionality and to improve SQL Server performance at the same time.

↑ Back to top
  1. Thanks Jeremiah. I am just now in the middle of migrating a 2008R2 system which computes SMA/EMA/MACD’s to SQL 2012. It is using R2′s windowing function OVER, but has to use ROW_NUMBER and do a little extra math due to the lack of the new 2012 ROWS PRECEDING clause. I LOVE the new ROWS PRECEDING! And performance is WAY faster in 2012 than it was in R2. Thanks for the examples!

    • You’re welcome! Thanks for sharing a bit about your use case – you’ve hit upon another technique that I’ve used with some moderate success on SQL Server 2008R2 and earlier: combining ROW_NUMBER with a semi-join to perform SUM/AVG/etc. I hope your migration goes smoothly.

  2. I was trying to work through your examples, but cannot find the table dbo.TimeMaster in the AdventureWorks2012.

    Any idea what I’m doing wrong?

    Thanks.

    • You’re not doing anything wrong there. The TimeMaster table is a local time dimension that was shared with me in the past. If you need a time dimension, you can build a calendar table yourself in many different ways. I put together an example a while ago on my old blog that you could use to build a calendar table.

  3. in your first exemple I think we should use cte2 instead of cte1 in : AVG(cte1.SubTotal) AS AverageSubTotal, shouldn’t we?

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

css.php