Window Function Examples for SQL Server
Window Function Examples for SQL Server
Window (or Windowing) functions are a great way to get different perspectives on a set of data without having to make repeat calls to the server for that data. For example, we can gather the sum of a column and display it side-by-side with the detail-level data, such that “SalesAmount” and “SUM(SalesAmount)” can appear in the same row. We can also do analytical functions like PERCENT_RANK and ranking functions like ROW_NUMBER, all without altering the granularity of the result set or making additional trips to get the same source data again and again.
Window functions all use the OVER() clause, which is used to define how the function is evaluated. The OVER() clause accepts three different arguments:
- PARTITION BY: Resets its counter every time the stated column(s) changes values.
- ORDER BY: Orders the rows the function will evaluate. This does not order the entire result set, only the way the function proceeds through the rows.
- ROWS BETWEEN: Specifies how to further limit the rows evaluated by the function.
Let’s pretend we’re looking at simplified data from a weight-lifting competition. Here are some code examples (we’ll do them all in one SELECT statement because adding/removing window functions in no way changes the number of rows we get back):
/* ROW_NUMBER will list the number of the row, ordered by LiftID.
The counter will reset with every new combination of LiftDate and LiftPersonID */
, ROW_NUMBER() OVER (PARTITION BY LiftDate, LiftPersonID ORDER BY LiftID) AS LiftNumForToday
/* SUM will add up the weights lifted.
The first SUM will show the grand total for the entire result set.
The second SUM will show the total lift weight for that row's lift date.
The third SUM will show the total lift weight for that row's lift date and person. */
, SUM(LiftWeight) OVER () AS WeightGrandTotal
, SUM(LiftWeight) OVER (PARTITION BY LiftDate) AS WeightTotal
, SUM(LiftWeight) OVER (PARTITION BY LiftDate, LiftPersonID) AS PersonWeightTotal
/* AVG will show the average weight lifted.
The first AVG will show the average lift weight for that row's lift date.
The second AVG will show the average lift weight for that row's lift date and person. */
, AVG(LiftWeight) OVER (PARTITION BY LiftDate) AS PersonWeightAvg
, AVG(LiftWeight) OVER (PARTITION BY LiftDate, LiftPersonID) AS PersonDayWeightAvg
/* LAG and LEAD allow the current row to report on data in rows behind or ahead of it.
This LAG function will return the LiftWeight from 1 row behind it (in order of LiftID) and if no value is found, it will return 0 instead of NULL.
The LEAD function will get the LiftWeight from 3 rows ahead. Since we didn't specify the optional default value (like the "0" we gave the LAG function, it will return NULL if there is no row 3 rows ahead. */
, LAG(LiftWeight, 1, 0) OVER (ORDER BY LiftID) AS PrevLift
, LEAD(LiftWeight, 3) OVER (ORDER BY LiftID) AS NextLift
/* FIRST_VALUE AND LAST_VALUE will return the specified column's first and last value in the result set.
This FIRST_VALUE function will return the first LiftWeight in the result set.
This LAST_VALUE function will return the last LiftWeight in the result set.
***WARNING: without the ROWS BETWEEN in the LAST_VALUE, you may get unexpected results.***
, FIRST_VALUE(LiftWeight) OVER (ORDER BY LiftDate) AS FirstLift
, LAST_VALUE(LiftWeight) OVER (ORDER BY LiftDate ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS LastLift
/* SUM using ROWS BETWEEN will narrow the scope evaluated by the window function.
The function will begin and end where the ROWS BETWEEN specify.
The first SUM will add all the LiftWeight values in rows up to and including the current row.
The second SUM will add all the LiftWeight values in rows between the current row and the 3 rows before it.
, SUM(LiftWeight) OVER (ORDER BY LiftDate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS WeightRunningTotal
, SUM(LiftWeight) OVER (ORDER BY LiftDate ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS WeightSumLast4
Considerations for Window Functions
If you don’t have SQL Server 2012 or later, your window function cupboard is pretty bare; SQL Server 2005 through 2008 R2 only allowed PARTITION BY in the OVER clause of an aggregate function, and you got RANK() and ROW_NUMBER(). That was about it. If you’re a developer still on one of those earlier versions, this is a compelling case for moving to 2012 or later. Think how much time you could be saving not writing multiple CTEs and how much faster your queries will go.
Speaking of fast…
By avoiding round-trips to the server for the same data, we reduce I/O on those tables. If we’re hitting indexes, we can really whittle down the reads involved. There is a trade-off, but it’s usually a very favorable one. Window functions require SQL Server to construct the window and compute the function (shown as tasks like Window Spool, Segment, Sequence Project, and Compute Scalar). In doing so, it adds reads to the Worktable. Still, this is generally less expensive than going back to get the source data multiple times, aggregating if necessary, and joining it all together. Also, Worktable exists in tempdb, which — ideally — is on your fastest storage tier.
Finally, remember that the limitations you put on a window function — PARTITION BY, ORDER BY, or ROWS BETWEEN — are there to apply context to the window function and in no way apply to the result set as a whole. In other words, your SELECT statement will be unaffected by anything you tell a window function to do.
More Window Function Resources
Doug Lane’s webcast and scripts on window functions
Windowing Function Examples for SQL Server
Rolling Averages in SQL Server
Itzik Ben-Gan on How to Use Microsoft SQL Server Window Functions
Books Online: Using the OVER() Clause