My First Post Here…
Well, my first technical post, was about how the default index creation method is OFFLINE. If you want that sweet, sweet Enterpri$e Edition ONLINE goodness, you need to specify it. It’s been a while since that one; almost six months to the day. So here’s another one!
But Window Functions Are Awesome
Heck yeah they are. And how. Boy howdy. Etc. You get the point. I’m enthusiastic. What can be cruel about them? Glad you asked!
Window Functions, according to the almighty ANSI Standard, have two ways of framing data: RANGE and ROWS. Without getting into the implementation differences between the ANSI Standard and Microsoft’s versions, or any performance differences between the two, there’s a funny difference in how they handle aggregations when ordered by non-unique values. A simple example using the Stack Overflow database follows.
1 2 3 4 5 6 7 8 9 10 |
SELECT OwnerUserId, CAST(CreationDate AS DATE) AS DumbedDownDate, Score, SUM(Score) OVER ( ORDER BY CAST(CreationDate AS DATE)) AS Not_Specified, SUM(Score) OVER ( ORDER BY CAST(CreationDate AS DATE)RANGE UNBOUNDED PRECEDING ) AS Range_Specified, SUM(Score) OVER ( ORDER BY CAST(CreationDate AS DATE)ROWS UNBOUNDED PRECEDING ) AS Rows_Specified FROM dbo.Posts WHERE OwnerUserId = 1 AND CAST(CreationDate AS DATE) BETWEEN '2008-08-01' AND '2008-08-31' ORDER BY CAST(CreationDate AS DATE); |
For the month of August, Year of Our Codd 2008, we’re getting a running total of the score for posts by UserId 1. Who is UserId 1? I’ll never tell. But back to the syntax! In the first SUM, we’re not specifying anything, for the next two we’re specifying RANGE and then ROWS. Why? REASONS! And why am I casting the CreateDate column as a date? MORE REASONS!
Before you scroll down, think for a second:
If I don’t specify RANGE or ROWS, which will SQL Server use?
If I left the CreateDate column as DATETIME, what eff aff difference would it make to the output?
Do you see a pattern forming here?

When we don’t specify RANGE or ROWS, well, SQL Server is nice enough to pick RANGE for us. “Nice”.

Deep breaths, Erik. Deep breaths.
You should also notice the difference in how each different method aggregates data. When the ordering column has duplicates, RANGE, and by extension, the default method, will SUM all the values for that group at once. When ROWS is specified as the framing method, you see the running total that most people are after.

And, of course, if all the values were unique, they’d do the same thing.
1 2 3 4 5 6 7 8 9 10 |
SELECT OwnerUserId, CreationDate, Score, SUM(Score) OVER ( ORDER BY CreationDate ) AS Not_Specified, SUM(Score) OVER ( ORDER BY CreationDate RANGE UNBOUNDED PRECEDING ) AS Range_Specified, SUM(Score) OVER ( ORDER BY CreationDate ROWS UNBOUNDED PRECEDING ) AS Rows_Specified FROM dbo.Posts WHERE OwnerUserId = 1 AND CAST(CreationDate AS DATE) BETWEEN '2008-08-01' AND '2008-08-31' ORDER BY CreationDate; |
Wrap. It. Up.
This one is pretty self explanatory. If you’re lucky enough to be on SQL Server 2012 or greater, and you’re using Window Functions to their full T-SQL potential, it’s was easier to calculate running totals. Just be careful how you write your code.
If you like this sort of stuff, Check out Doug’s new video series, T-SQL Level Up. There are next to zero fart jokes in it.
5 Comments. Leave new
” it’s was easier to calculate running totals”
I’m pretty sure you wanted to say ” it’s way easier to calculate running totals”.
Good article.
Cheers.
Aside from the change in behavior, there’s also a very serious performance impact when RANGE is used instead of ROWS.
Worst.Default.Ever.
Are there performance differences between RANGE and ROWS?
I remember a training course where us was told that RANGE will use spills to tempdb and ROWS not.
Following to statistics I/O RANGE produces several reads on worktable while ROWs has a worktable but shows me zero reads (don’t know if it means that there are really zero reads on worktable or if the statistics are lying).
This is particularly true, if you
CREATE INDEX idx_post_dummy ON dbo.Posts (OwnerUserId, CreationDate) INCLUDE (Score)
Yes, that’s true. RANGE needs a worktable in tempdb, while ROWS can use (under certain conditions) a specialized in-memory table (hence the big performance improvement).
That’s also why you see a worktable but with 0 reads I guess.
First time I’ve read through an article on windows functions that I have actually understood the concept. Thanks and nicely done!