Rolling Averages in SQL Server

SQL Server
13 Comments

Aggregate functions are convenient – they solve a business need and they make development easy. Unfortunately, not all business requirements are so easy to solve. Let’s look at one example: rolling averages.

The Rolling Average

A rolling average is a simple concept; an average is computed over a fixed subset of data. Rolling average calculations are most frequently used with time series data and help remove short term fluctuations while highlighting long term trends – utility bills often feature a rolling average of consumption to help the customer understand their usage. Consumers aren’t concerned about electricity usage being high on one day in August when there was a heat wave; they want to see how their consumption is changing over time.

Rolling Averages with Common Table Expressions: 2005 – 2008R2

Starting with SQL Server 2005, it became easy to write a rolling average in a single T-SQL statement using a Common Table Expression. CTEs rapidly became popular; an incredible amount of prevously difficult functionality was made possible through CTEs including recursive queries and rolling averages. Take a look at this example in the AdventureWorks2012 sample database:

While not the most straightforward approach to constructing a rolling average, the CTE manages to get the job done. In this query, we are using the CTE to create a work table and then performing a self-join. This same sort of thing is possible using a temporary table or table variable, but the CTE accomplishes it in one statement and is, arguably, easier to read.

Common Table Expressions also hide a dark secret – SQL Server executes the CTE body every time the CTE expression, cte in this example, is referenced. The more complex the Common Table Expression is, the more work that has to be performed. Running this rolling average with STATISTICS IO turned on, it’s easy to see the multiple executions in the form of two scans on each table:

On a database this small, this doesn’t pose major performance problems, but this will cause big problems for a moderately sized database.

Rolling Averages with Window Functions: 2012 and beyond

SQL Server 2012 provided better support for windowing functions. Although support for OVER() was already available in SQL Server 2005, SQL Server 2012 brings considerably more functionality to the table. By using the ROW or RANGE clause of the windowing function, it’s possible to simplify the query and improve performance. Take a look:

Although the two queries are remarkably different, the biggest difference is the introduction of ROWS 12 PRECEDING. This takes the place of the self join in the previous example. Instead of writing out a join ourselves, we simply tell SQL Server that we’d like an average of st over the last 12 rows sorted by year and month. What kind of effect does this have on the work SQL Server performs?

111 reads instead of 369 reads. Clearly this change makes for a substantial performance improvement for SQL Server. We’ve reduced the number of reads, eliminated some query complexity, and made it somewhat obvious to the future developers how they could modify or build on this going forward. Changing the new query to a rolling average by day instead of by month is simple and requires even fewer lines of code:

This performs the name number of logical and physical reads as the monthly rolling average using a window function.

Summing Up

There you have it – two different ways to perform a rolling average in SQL Server. One method is clearly a lot easier than the other. There are a number of optimizations in SQL Server 2012 to make it easy for you to build this functionality and to improve SQL Server performance at the same time.

Previous Post
How Does SQL Server Store Data?
Next Post
What’s Better: Disabling vs. Dropping Indexes?

13 Comments. Leave new

  • Thanks Jeremiah. I am just now in the middle of migrating a 2008R2 system which computes SMA/EMA/MACD’s to SQL 2012. It is using R2’s windowing function OVER, but has to use ROW_NUMBER and do a little extra math due to the lack of the new 2012 ROWS PRECEDING clause. I LOVE the new ROWS PRECEDING! And performance is WAY faster in 2012 than it was in R2. Thanks for the examples!

    Reply
    • Jeremiah Peschka
      February 25, 2013 11:46 am

      You’re welcome! Thanks for sharing a bit about your use case – you’ve hit upon another technique that I’ve used with some moderate success on SQL Server 2008R2 and earlier: combining ROW_NUMBER with a semi-join to perform SUM/AVG/etc. I hope your migration goes smoothly.

      Reply
  • Brendan Costigan
    February 25, 2013 10:43 am

    I was trying to work through your examples, but cannot find the table dbo.TimeMaster in the AdventureWorks2012.

    Any idea what I’m doing wrong?

    Thanks.

    Reply
    • Jeremiah Peschka
      February 25, 2013 11:24 am

      You’re not doing anything wrong there. The TimeMaster table is a local time dimension that was shared with me in the past. If you need a time dimension, you can build a calendar table yourself in many different ways. I put together an example a while ago on my old blog that you could use to build a calendar table.

      Reply
  • in your first exemple I think we should use cte2 instead of cte1 in : AVG(cte1.SubTotal) AS AverageSubTotal, shouldn’t we?

    Reply
  • It would be helpful if the timemaster code is put here. The old blog contains all different columns

    Reply
  • One thing I found when I was working with this and wanted to double check my logic by using Excel to calculate the average, when you use 12 rows preceding, you actually end up with a 13 month rolling average. It is fine if that is what the intention is, but I think when most people see the number 12, they are thinking they are getting a 12 month average. The Average function in this situation uses the current value and the 12 preceding values, thus making a 13 month average. To see this, you have to test a row beyond row 13. I’m not sure if you will see it when you are also partitioning. The code I was using did not also partition since they wanted to see the average continue over all of the time in the database.

    Reply
  • I realize this is an old post but, for anyone looking for the “dbo.TimeMaster” table, this will create one according to the columns I’ve seen used in the queries.

    WITH cteDate AS
    (
    SELECT TOP (DATEDIFF(dd,'1900','2100'))
    ActualDate = DATEADD(dd,ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1,'1900')
    FROM sys.all_columns ac1
    CROSS JOIN sys.all_columns ac2
    )
    SELECT ActualDate = ISNULL(ActualDate,'1900')
    ,YearName = ISNULL(DATENAME(yy,ActualDate),0) --Should be called YearNumber
    ,MonthOfYearNumber = ISNULL(DATEPART(mm,ActualDate),0)
    INTO dbo.TimeMaster
    FROM cteDate
    ;
    ALTER TABLE dbo.TimeMaster
    ADD CONSTRAINT PK_TimeMaster
    PRIMARY KEY CLUSTERED (ActualDate)
    ;

    Reply
  • While we’re at it, there’s hope for people using less than 2012.

    I don’t have 2012 on the computer that I’m working on right now and I don’t have AdventureWorks2012 but if it’s roughly the same as AdventureWorks2008, then it may be missing a critical index for the first query. Add this index. In my testing, it cut the number of reads almost in half.

    CREATE INDEX By_OrderDate ON Sales.SalesOrderHeader(OrderDate) INCLUDE(SubTotal)

    To Jeremiah’s point, there is a pretty good penalty for calling a CTE more than once. When you need to do so, it’s frequently much better to materialize the results of the CTE in a Temp Table. Doing so cuts the number of reads roughly in half again. Here’s the code…

    CREATE TABLE #PreAgg
    (
    R INT NOT NULL PRIMARY KEY CLUSTERED
    ,[year] INT NOT NULL
    ,[month] INT NOT NULL
    ,SubTotal MONEY
    )
    ;
    INSERT INTO #PreAgg WITH (TABLOCK)
    (R,[year],[month],SubTotal)
    SELECT R = ROW_NUMBER() OVER (ORDER BY tm.YearName, tm.MonthOfYearNumber)
    ,[Year] = tm.YearName
    ,[Month] = tm.MonthOfYearNumber
    ,SubTotal = SUM(SubTotal)
    FROM dbo.TimeMaster tm
    LEFT JOIN Sales.SalesOrderHeader soh ON tm.ActualDate = soh.OrderDate
    WHERE tm.ActualDate >= '2005-07-01'
    AND tm.ActualDate ( pa2.r - 12)
    GROUP BY pa1.[year], pa1.[month]
    ORDER BY [year],[month]
    ;

    Reply
    • p.s. I didn’t verify that the original code calculated the running average correctly. Since I used the same algorithm, I don’t guarantee that my code calculates it correctly, either.

      Reply
  • I do wish that folks would make it so that when you post code to a forum, it would preserve all spaces, including leading spaces, so it doesn’t slam all nicely formatted code to the left.

    Reply
    • Howdy Jeff,

      I added code tags to your comments, which… highlighted them! But the formatting is still all out of whack when they post.

      FWIW, the formatting is correct when I go into the editor.

      We all know you write pretty code, anyway 🙂

      Thanks!

      Reply
  • Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.