Windowing Function Examples for SQL Server

Last Updated 3 years ago

Aggregations and grouping can be a pain in the rear for the novice SQL developer. Way back in 2003, the ANSI/ISO standards people figured this out and added windows and ranking functions to the standard. In 2005, Microsoft added a few of these functions (ROW_NUMBER(), RANK(), DENSE_RANK(), and NTILE()) as well as the OVER() clause so we could whet our appetites. The combination of window and ranking functionality has been used and abused to add all kinds of functionality to applications. SQL Server Denali takes it to the next step and adds a whole slew of additional functionality to window and rank and analyze data in without having to find a BI expert.

OVER() Defining Your Windows

I’m sitting in my office right now, when I look out the window I can see a shrub, part of the neighbor’s car, and the grass in my lawn. I can also see my cat lazily napping. If I move my chair over a little bit, the view out of the window changes and I can see the driveway and the road. Changing where my chair is changes the view I can see of the outside. Changing the definition of a window in T-SQL changes the view of the data for a function in the query.

moving windows — AVG(view) OVER (PARTITION BY chair)

PARTITION BY and ORDER BY

Much like GROUP BY in a query, PARTITION BY lets us divide up the results in partitions. The difference is that PARTITION BY acts locally – on a single function – instead of on the entire query.

SELECT LastName ,
FirstName ,
ROW_NUMBER() OVER (PARTITION BY LastName ORDER BY FirstName) AS position
FROM Person.Person ;

SELECT LastName ,

FirstName ,

ROW_NUMBER() OVER (PARTITION BY LastName ORDER BY FirstName) AS position

FROM Person.Person ;

Looking at the results from this query, we can see that the ROW_NUMBER() function gives us the number of a row – in FirstName order – in the results when it’s grouped up by LastName. Or, to put it differently, it gives us the relative position of the row defined by the window of LastName.

A window is a way of defining a local scope for a function. Instead of using convoluted logic, subqueries, and temporary tables, it’s easy to definite local scope for queries to do something useful.

Finding Other Rows with LAG and LEAD

It’s easy to produce queries based on relatively fixed windows, but what if you need to know what’s going on just before and just after the current row. Surely there must be a way to do something like that.

SQL Server 2012 brings support for the LAG and LEAD functions. The LAG function can be used to show a previous row. LAG accepts two additional optional parameters, the number of rows to go back a default value to use, just in case a NULL is found.

Let’s look at employee performance in the AdventureWorks2008R2 database by comparing their current performance with previous performance. This query, without the LAG function requires creating some sort of temporary table and joining to it once over each of last_month, three_months_ago, six_months_ago, nine_months_ago and twelve_months_ago. It might end up looking something like this (if we used a CTE instead of a temporary table):

WITH sales AS (
SELECT h.SalesPersonID ,
ROW_NUMBER() OVER (PARTITION BY h.SalesPersonID
ORDER BY DATEPART(yyyy, h.OrderDate),
DATEPART(mm, h.OrderDate)) AS rn ,
DATEPART(yyyy, h.OrderDate) AS [Year],
DATEPART(mm, h.OrderDate) AS [Month],
SUM(h.SubTotal) AS TerritoryTotal
FROM Sales.SalesOrderHeader AS h
WHERE SalesPersonID IS NOT NULL
GROUP BY SalesPersonID ,
DATEPART(yyyy, OrderDate) ,
DATEPART(mm, OrderDate)
)
SELECT s.SalesPersonID,
s.[Year],
s.[Month],
s.TerritoryTotal,
s1.TerritoryTotal AS last_month ,
s3.TerritoryTotal AS three_months_ago ,
s6.TerritoryTotal AS six_months_ago ,
s9.TerritoryTotal AS nine_months_ago ,
s12.TerritoryTotal AS twelve_months_ago
FROM sales AS s
LEFT OUTER JOIN sales AS s1 ON s.SalesPersonID = s1.SalesPersonID
AND s.rn = s1.rn + 1
LEFT OUTER JOIN sales AS s3 ON s.SalesPersonID = s3.SalesPersonID
AND s.rn = s3.rn + 3
LEFT OUTER JOIN sales AS s6 ON s.SalesPersonID = s6.SalesPersonID
AND s.rn = s6.rn + 6
LEFT OUTER JOIN sales AS s9 ON s.SalesPersonID = s9.SalesPersonID
AND s.rn = s9.rn + 9
LEFT OUTER JOIN sales AS s12 ON s.SalesPersonID = s12.SalesPersonID
AND s.rn = s12.rn + 12

WITH sales AS (

SELECT h.SalesPersonID ,

ROW_NUMBER() OVER (PARTITION BY h.SalesPersonID

ORDER BY DATEPART(yyyy, h.OrderDate),

DATEPART(mm, h.OrderDate)) AS rn ,

DATEPART(yyyy, h.OrderDate) AS [Year],

DATEPART(mm, h.OrderDate) AS [Month],

SUM(h.SubTotal) AS TerritoryTotal

FROM Sales.SalesOrderHeader AS h

WHERE SalesPersonID IS NOT NULL

GROUP BY SalesPersonID ,

DATEPART(yyyy, OrderDate) ,

DATEPART(mm, OrderDate)

)

SELECT s.SalesPersonID,

s.[Year],

s.[Month],

s.TerritoryTotal,

s1.TerritoryTotal AS last_month ,

s3.TerritoryTotal AS three_months_ago ,

s6.TerritoryTotal AS six_months_ago ,

s9.TerritoryTotal AS nine_months_ago ,

s12.TerritoryTotal AS twelve_months_ago

FROM sales AS s

LEFT OUTER JOIN sales AS s1 ON s.SalesPersonID = s1.SalesPersonID

AND s.rn = s1.rn + 1

LEFT OUTER JOIN sales AS s3 ON s.SalesPersonID = s3.SalesPersonID

AND s.rn = s3.rn + 3

LEFT OUTER JOIN sales AS s6 ON s.SalesPersonID = s6.SalesPersonID

AND s.rn = s6.rn + 6

LEFT OUTER JOIN sales AS s9 ON s.SalesPersonID = s9.SalesPersonID

AND s.rn = s9.rn + 9

LEFT OUTER JOIN sales AS s12 ON s.SalesPersonID = s12.SalesPersonID

AND s.rn = s12.rn + 12

Although this query is clear enough, it requires a considerable amount of disk access and performs 6 logical scans and 4116 logical reads.

Re-writing this query to use the LAG function makes the query easier to read, easier to understand, and much faster for SQL Server to run. It only requires 1 logical scan and 686 logical reads.

SELECT SalesPersonID ,
[Year] ,
[Month] ,
TerritoryTotal ,
LAG(TerritoryTotal) OVER (PARTITION BY SalesPersonID
ORDER BY [Year], [Month]) AS last_month ,
LAG(TerritoryTotal, 3, 0.00) OVER (PARTITION BY SalesPersonID
ORDER BY [Year], [Month]) AS three_months_ago ,
LAG(TerritoryTotal, 6, 0.00) OVER (PARTITION BY SalesPersonID
ORDER BY [Year], [Month]) AS six_months_ago ,
LAG(TerritoryTotal, 9, 0.00) OVER (PARTITION BY SalesPersonID
ORDER BY [Year], [Month]) AS nine_months_ago ,
LAG(TerritoryTotal, 12, 0.00) OVER (PARTITION BY SalesPersonID
ORDER BY [Year], [Month]) AS twelve_months_ago
FROM (
SELECT SalesPersonID,
DATEPART(yyyy, OrderDate) AS [Year] ,
DATEPART(MONTH, OrderDate) AS [Month] ,
SUM(SubTotal) AS TerritoryTotal
FROM Sales.SalesOrderHeader
GROUP BY SalesPersonID,
DATEPART(yyyy, OrderDate),
DATEPART(month, OrderDate)
) AS x
WHERE SalesPersonID IS NOT NULL
ORDER BY SalesPersonID,
[Year],
[Month];

SELECT SalesPersonID ,

[Year] ,

[Month] ,

TerritoryTotal ,

LAG(TerritoryTotal) OVER (PARTITION BY SalesPersonID

ORDER BY [Year], [Month]) AS last_month ,

LAG(TerritoryTotal, 3, 0.00) OVER (PARTITION BY SalesPersonID

ORDER BY [Year], [Month]) AS three_months_ago ,

LAG(TerritoryTotal, 6, 0.00) OVER (PARTITION BY SalesPersonID

ORDER BY [Year], [Month]) AS six_months_ago ,

LAG(TerritoryTotal, 9, 0.00) OVER (PARTITION BY SalesPersonID

ORDER BY [Year], [Month]) AS nine_months_ago ,

LAG(TerritoryTotal, 12, 0.00) OVER (PARTITION BY SalesPersonID

ORDER BY [Year], [Month]) AS twelve_months_ago

FROM (

SELECT SalesPersonID,

DATEPART(yyyy, OrderDate) AS [Year] ,

DATEPART(MONTH, OrderDate) AS [Month] ,

SUM(SubTotal) AS TerritoryTotal

FROM Sales.SalesOrderHeader

GROUP BY SalesPersonID,

DATEPART(yyyy, OrderDate),

DATEPART(month, OrderDate)

) AS x

WHERE SalesPersonID IS NOT NULL

ORDER BY SalesPersonID,

[Year],

[Month];

While this isn’t the most interesting report, it’s very common in many fields to compare current performance to previous performance. Stocks and 401(k) portfolios do this. My utility bills all show the last year of history as well as what I’m being charged this month.

AdventureWorks Employee Performance — Your sales are down, Ted.

Relative Aggregates Using ROWS BETWEEN

Moving a window relative to the current row is pretty interesting, but LAG and LEAD can only do so much. What if you need to show a three month average? What if that three month average needs to be centered on the current month?

SELECT SalesPersonID,
the_year,
the_month,
TerritoryTotal,
AVG(TerritoryTotal) OVER (PARTITION BY SalesPersonID
ORDER BY the_year, the_month
ROWS BETWEEN 1 PRECEDING
AND 1 FOLLOWING) AS three_month_average ,
AVG(TerritoryTotal) OVER (PARTITION BY SalesPersonID
ORDER BY the_year, the_month
ROWS BETWEEN 6 PRECEDING
AND 6 FOLLOWING) AS year_average
FROM (
SELECT SalesPersonID,
DATEPART(yyyy, OrderDate) AS the_year,
DATEPART(MONTH, OrderDate) AS the_month,
DATEPART(QUARTER, OrderDate) AS the_quarter,
SUM(SubTotal) AS TerritoryTotal
FROM Sales.SalesOrderHeader
GROUP BY SalesPersonID,
DATEPART(yyyy, OrderDate),
DATEPART(month, OrderDate),
DATEPART(QUARTER, OrderDate)
) AS x ;

SELECT SalesPersonID,

the_year,

the_month,

TerritoryTotal,

AVG(TerritoryTotal) OVER (PARTITION BY SalesPersonID

ORDER BY the_year, the_month

ROWS BETWEEN 1 PRECEDING

AND 1 FOLLOWING) AS three_month_average ,

AVG(TerritoryTotal) OVER (PARTITION BY SalesPersonID

ORDER BY the_year, the_month

ROWS BETWEEN 6 PRECEDING

AND 6 FOLLOWING) AS year_average

FROM (

SELECT SalesPersonID,

DATEPART(yyyy, OrderDate) AS the_year,

DATEPART(MONTH, OrderDate) AS the_month,

DATEPART(QUARTER, OrderDate) AS the_quarter,

SUM(SubTotal) AS TerritoryTotal

FROM Sales.SalesOrderHeader

GROUP BY SalesPersonID,

DATEPART(yyyy, OrderDate),

DATEPART(month, OrderDate),

DATEPART(QUARTER, OrderDate)

) AS x ;

This query adds in something called the frame clause (ROWS BETWEEN start AND end). The frame clause makes it easy to define a sliding window for the function being used.

centered average — Historically speaking, you never were very good.

There are a few ways to use the frame clause to change how the window is computed. As you saw in the previous query it’s possible to state the number of rows before and after the current row. That’s simple enough. It’s also possible to define a range of rows to use. One of the most first uses that developers will find for frame clauses is to produce a running total:

SELECT soh.SalesPersonID, soh.OrderDate,
soh.SubTotal ,
SUM(soh.SubTotal) OVER (PARTITION BY SalesPersonID
ORDER BY soh.OrderDate
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) AS total_sales_to_date
FROM Sales.SalesOrderHeader AS soh
WHERE soh.SalesPersonID IS NOT NULL

SELECT soh.SalesPersonID, soh.OrderDate,

soh.SubTotal ,

SUM(soh.SubTotal) OVER (PARTITION BY SalesPersonID

ORDER BY soh.OrderDate

ROWS BETWEEN UNBOUNDED PRECEDING

AND CURRENT ROW) AS total_sales_to_date

FROM Sales.SalesOrderHeader AS soh

WHERE soh.SalesPersonID IS NOT NULL

In the past, developers would have had to resort to trickery with messy self-joins to get running totals. Thankfully, it’s now possible to define a frame for the window and use it compute aggregates over a range. I did resort to a bit of trickery to demonstrate the running total. The default value for a frame clause is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so it’s not that impressive and can be achieved with T-SQL that looks like any other window function.

The ranges in a frame clause are all relative to the current partition of the window function. UNBOUNDED PRECEDING will get all rows from the start of the frame, and the start of the frame is relative to whatever has been defined in the OVER clause. UNBOUNDED FOLLOWING will get all rows from the current row to the end of the frame.

Brand New Analytic Functions

There are a few other functions that are being added to T-SQL in SQL Server 2012. I haven’t mentioned them yet because they’re statistical functions for computing distribution and median information. This is about the extent that I’ll mention them, but here they are:

CUME_DIST() – calculates the position of a value relative to a group of values
PERCENT_RANK() – calculates the percent rank of a value in a group of values
PERCENTILE_CONT(X) – looks at the percent rank of values in a group until it finds one greater than or equal to X.
PERCENTILE_DISC(X) – looks at the cumulative distribution of values in a group until it finds one greater than or equal to X

I don’t do a lot of analytical work, and certainly nothing where I’m going to be computing cumulative distributions, but it’s good to know that these functions are available. A lot of people do need to perform this functionality and they’ve resorted to using T-SQL trickery or SQL-CLR to get the same results when they could have just had these functions in the database.

These analytical functions are a bit different than the other functions that I’ve mentioned so far. First of all, they all don’t require an OVER() clause. Because these are analytical functions, you might want to perform the analysis over the entire query, so there’s no explicit need to apply a window and frame to the function.

The other thing that makes these functions different is that they allow a WITHIN GROUP specification. WITHIN GROUP is new in SQL Server 2012 and it applies an ordering to the analytical function’s actions but not to the results.

Here’s an example using all four to demonstrate how they work on the data in AdventureWorks, and to show how poorly those poor sales clerks get paid:

SELECT p.LastName ,
p.FirstName ,
e.OrganizationLevel,
eph.Rate * 2080 AS Salary ,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY eph.Rate * 2080 DESC)
OVER (PARTITION BY e.OrganizationLevel) AS MedianContinuous ,
PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY eph.Rate * 2080 DESC)
OVER (PARTITION BY e.OrganizationLevel) AS MedianDiscrete ,
CUME_DIST() OVER (PARTITION BY e.OrganizationLevel
ORDER BY eph.Rate * 2080 DESC) AS CUME_DIST,
PERCENT_RANK() OVER (PARTITION BY e.OrganizationLevel
ORDER BY eph.Rate * 2080 DESC) AS PERCENT_RANK
FROM HumanResources.Employee AS e
JOIN Person.Person AS p ON e.BusinessEntityID = p.BusinessEntityID
LEFT JOIN HumanResources.EmployeePayHistory AS eph ON eph.BusinessEntityID = e.BusinessEntityID
GROUP BY p.LastName,
p.FirstName,
e.OrganizationLevel,
eph.Rate
ORDER BY OrganizationLevel ASC,
Salary DESC ;

SELECT p.LastName ,

p.FirstName ,

e.OrganizationLevel,

eph.Rate * 2080 AS Salary ,

PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY eph.Rate * 2080 DESC)

OVER (PARTITION BY e.OrganizationLevel) AS MedianContinuous ,

PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY eph.Rate * 2080 DESC)

OVER (PARTITION BY e.OrganizationLevel) AS MedianDiscrete ,

CUME_DIST() OVER (PARTITION BY e.OrganizationLevel

ORDER BY eph.Rate * 2080 DESC) AS CUME_DIST,

PERCENT_RANK() OVER (PARTITION BY e.OrganizationLevel

ORDER BY eph.Rate * 2080 DESC) AS PERCENT_RANK

FROM HumanResources.Employee AS e

JOIN Person.Person AS p ON e.BusinessEntityID = p.BusinessEntityID

LEFT JOIN HumanResources.EmployeePayHistory AS eph ON eph.BusinessEntityID = e.BusinessEntityID

GROUP BY p.LastName,

p.FirstName,

e.OrganizationLevel,

eph.Rate

ORDER BY OrganizationLevel ASC,

Salary DESC ;

[

Mike, I have good news: you’re getting a corner office.

SQL Server Windowing Functions Training Video

For more information, watch this free 20-minute training video on SQL Server 2012’s new windowing functions:

Summing it up

SQL Server 2012 adds many new features, not the least of which are the new window functions, frame clauses, and analytic functions. When you take each of these features on their own, they add powerful functionality to SQL Server. When you combine these three T-SQL enhancements, they give database developer tremendous power and flexibility to quickly develop reports that analyze data and deliver the results without needing extra features, tools, or products.

Upcoming Training

Which Sessions Will YOU Send to the SQL PASS Summit?

13 Comments. Leave new

Deepak
July 14, 2011 8:55 am

Very useful! Thank you very much.

Reply
Meher
July 14, 2011 10:46 am

Nice details of the upcoming new functions in denali. Thank you.

Reply
Dan
July 14, 2011 12:57 pm

Thanks! These features will be tremendous for database analytics.

Reply
tobi
July 14, 2011 7:03 pm

Can’t wait to see the query plans!

Reply
Stephen Archbold
July 15, 2011 4:48 am

I had to stop at the picture of the view outside your office. It even made my cubicle walls cry.

Ok I kept going, great article, thanks 🙂

Reply
Shy Engelberg
August 18, 2011 2:30 am

Hi,

Great post! one small correction –
the default for the window range is not ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
but RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
the difference is that when using RANGE, CURRENT ROW means al the rows with the same value in the ORDER BY columns.
this script can demostrate the difference:

create table T1(ID INT)
insert into T1(ID)
VALUES(1),(2),(2),(3),(4),(4)

SELECT ID,COUNT(*) OVER (ORDER BY ID)
FROM T1

SELECT ID,COUNT(*) OVER (ORDER BY ID rows between UNBOUNDED Preceding and CURRENT ROW)
FROM T1

SELECT ID,COUNT(*) OVER (ORDER BY ID range between UNBOUNDED Preceding and CURRENT ROW)
FROM T1

Reply
- Jeremiah Peschka
  August 18, 2011 6:21 am
  
  Good catch, thanks!
  
  Reply