Windowing Function Examples for SQL Server

Aggregations and grouping can be a pain in the rear for the novice SQL developer. Way back in 2003, the ANSI/ISO standards people figured this out and added windows and ranking functions to the standard. In 2005, Microsoft added a few of these functions (ROW_NUMBER(), RANK(), DENSE_RANK(), and NTILE()) as well as the OVER() clause so we could whet our appetites. The combination of window and ranking functionality has been used and abused to add all kinds of functionality to applications. SQL Server Denali takes it to the next step and adds a whole slew of additional functionality to window and rank and analyze data in without having to find a BI expert.

OVER() Defining Your Windows

I’m sitting in my office right now, when I look out the window I can see a shrub, part of the neighbor’s car, and the grass in my lawn. I can also see my cat lazily napping. If I move my chair over a little bit, the view out of the window changes and I can see the driveway and the road. Changing where my chair is changes the view I can see of the outside. Changing the definition of a window in T-SQL changes the view of the data for a function in the query.

AVG(view) OVER (PARTITION BY chair)

PARTITION BY and ORDER BY

Much like GROUP BY in a query, PARTITION BY lets us divide up the results in partitions. The difference is that PARTITION BY acts locally – on a single function – instead of on the entire query.

We’re all counting on you

Looking at the results from this query, we can see that the ROW_NUMBER() function gives us the number of a row – in FirstName order – in the results when it’s grouped up by LastName. Or, to put it differently, it gives us the relative position of the row defined by the window of LastName.

A window is a way of defining a local scope for a function. Instead of using convoluted logic, subqueries, and temporary tables, it’s easy to definite local scope for queries to do something useful.

Finding Other Rows with LAG and LEAD

It’s easy to produce queries based on relatively fixed windows, but what if you need to know what’s going on just before and just after the current row. Surely there must be a way to do something like that.

SQL Server 2012 brings support for the LAG and LEAD functions. The LAG function can be used to show a previous row. LAG accepts two additional optional parameters, the number of rows to go back a default value to use, just in case a NULL is found.

Let’s look at employee performance in the AdventureWorks2008R2 database by comparing their current performance with previous performance. This query, without the LAG function requires creating some sort of temporary table and joining to it once over each of last_month, three_months_ago, six_months_ago, nine_months_ago and twelve_months_ago. It might end up looking something like this (if we used a CTE instead of a temporary table):

Although this query is clear enough, it requires a considerable amount of disk access and performs 6 logical scans and 4116 logical reads.

Re-writing this query to use the LAG function makes the query easier to read, easier to understand, and much faster for SQL Server to run. It only requires 1 logical scan and 686 logical reads.

While this isn’t the most interesting report, it’s very common in many fields to compare current performance to previous performance. Stocks and 401(k) portfolios do this. My utility bills all show the last year of history as well as what I’m being charged this month.

Your sales are down, Ted.

Relative Aggregates Using ROWS BETWEEN

Moving a window relative to the current row is pretty interesting, but LAG and LEAD can only do so much. What if you need to show a three month average? What if that three month average needs to be centered on the current month?

This query adds in something called the frame clause (ROWS BETWEEN start AND end). The frame clause makes it easy to define a sliding window for the function being used.

Historically speaking, you never were very good.

There are a few ways to use the frame clause to change how the window is computed. As you saw in the previous query it’s possible to state the number of rows before and after the current row. That’s simple enough. It’s also possible to define a range of rows to use. One of the most first uses that developers will find for frame clauses is to produce a running total:

Run faster. Work harder. Sell. Sell. Sell.

In the past, developers would have had to resort to trickery with messy self-joins to get running totals. Thankfully, it’s now possible to define a frame for the window and use it compute aggregates over a range. I did resort to a bit of trickery to demonstrate the running total. The default value for a frame clause is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so it’s not that impressive and can be achieved with T-SQL that looks like any other window function.

The ranges in a frame clause are all relative to the current partition of the window function. UNBOUNDED PRECEDING will get all rows from the start of the frame, and the start of the frame is relative to whatever has been defined in the OVER clause. UNBOUNDED FOLLOWING will get all rows from the current row to the end of the frame.

Brand New Analytic Functions

There are a few other functions that are being added to T-SQL in SQL Server 2012. I haven’t mentioned them yet because they’re statistical functions for computing distribution and median information. This is about the extent that I’ll mention them, but here they are:

  • CUME_DIST() – calculates the position of a value relative to a group of values
  • PERCENT_RANK() – calculates the percent rank of a value in a group of values
  • PERCENTILE_CONT(X) – looks at the percent rank of values in a group until it finds one greater than or equal to X.
  • PERCENTILE_DISC(X) – looks at the cumulative distribution of values in a group until it finds one greater than or equal to X

I don’t do a lot of analytical work, and certainly nothing where I’m going to be computing cumulative distributions, but it’s good to know that these functions are available. A lot of people do need to perform this functionality and they’ve resorted to using T-SQL trickery or SQL-CLR to get the same results when they could have just had these functions in the database.

These analytical functions are a bit different than the other functions that I’ve mentioned so far. First of all, they all don’t require an OVER() clause. Because these are analytical functions, you might want to perform the analysis over the entire query, so there’s no explicit need to apply a window and frame to the function.

The other thing that makes these functions different is that they allow a WITHIN GROUP specification. WITHIN GROUP is new in SQL Server 2012 and it applies an ordering to the analytical function’s actions but not to the results.

Here’s an example using all four to demonstrate how they work on the data in AdventureWorks, and to show how poorly those poor sales clerks get paid:

[

Mike, I have good news: you’re getting a corner office.

SQL Server Windowing Functions Training Video

For more information, watch this free 20-minute training video on SQL Server 2012’s new windowing functions:

Summing it up

SQL Server 2012 adds many new features, not the least of which are the new window functions, frame clauses, and analytic functions. When you take each of these features on their own, they add powerful functionality to SQL Server. When you combine these three T-SQL enhancements, they give database developer tremendous power and flexibility to quickly develop reports that analyze data and deliver the results without needing extra features, tools, or products.

Previous Post
Upcoming Training
Next Post
Which Sessions Will YOU Send to the SQL PASS Summit?

13 Comments. Leave new

  • Very useful! Thank you very much.

    Reply
  • Nice details of the upcoming new functions in denali. Thank you.

    Reply
  • Thanks! These features will be tremendous for database analytics.

    Reply
  • Can’t wait to see the query plans!

    Reply
  • Stephen Archbold
    July 15, 2011 4:48 am

    I had to stop at the picture of the view outside your office. It even made my cubicle walls cry.

    Ok I kept going, great article, thanks 🙂

    Reply
  • Shy Engelberg
    August 18, 2011 2:30 am

    Hi,

    Great post! one small correction –
    the default for the window range is not ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    but RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
    the difference is that when using RANGE, CURRENT ROW means al the rows with the same value in the ORDER BY columns.
    this script can demostrate the difference:

    create table T1(ID INT)
    insert into T1(ID)
    VALUES(1),(2),(2),(3),(4),(4)

    SELECT ID,COUNT(*) OVER (ORDER BY ID)
    FROM T1

    SELECT ID,COUNT(*) OVER (ORDER BY ID rows between UNBOUNDED Preceding and CURRENT ROW)
    FROM T1

    SELECT ID,COUNT(*) OVER (ORDER BY ID range between UNBOUNDED Preceding and CURRENT ROW)
    FROM T1

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.