When I’m tuning queries, the normal answer is to make the query perform better – either via changing the T-SQL, adding hints, or adding indexes so that the data’s better prepared for the query.
However, sometimes when I’m looking at the output of sp_BlitzCache, I scroll across to the Average Rows column and double-check that the query’s actually returning a reasonable number of rows out in the wild.
I’ve uncovered some amusing situations, like a back end process that was supposedly fetching the emails that were ready to be sent out. We noticed it was one of the top resource-consuming queries, but no one had been complaining about it since it ran as a back end service. However, it was retrieving hundreds of thousands of rows every time it ran. When we checked the query and the source data, it turned out that the source data had a ton of outgoing emails with null “to” addresses – which the back end process ignored on every run. The developers purged that invalid data, and the server’s CPU dropped back down to normal.
Another situation was a typical date-range report, as in, “Show me the orders shipped for @SalespersonID between @StartDate and @EndDate.” However, the average number of rows was over 50,000! I had so many questions, and I blurted them all out at once on the call:
- Is there a salesperson that’s actually had 50,000 shipped orders, in total? (That seemed like a really high number for the kind of high-ticket items that this company sold.)
- What’s the shortest date range where someone actually had 50,000 shipped orders? (In this case, it was a 10-year date range.)
- Who would actually run this report for a 10-year date range?
- What would they learn from this report?
- Even if they needed it once – why would they continue to run it multiple times per hour, for a 10-year range?
After we all got done chuckling and shaking our heads, we did some research, and to the best of our abilities, we guessed that someone had this report on a scheduled task. I asked if we could implement one of my favorite query tuning techniques – so good that it inspired Sir Rod Stewart’s hit song from 40 years ago, “Pagination”.
Okay, well, actually Rod’s song is called Infatuation, but every time I say the term “pagination”, I can’t help but sing it to this tune, and my training class students will recognize it: “It’s pagination, WOO!”
In the case of this client, we changed the query to use pagination like I describe in this module of the Mastering Query Tuning class, adding new parameters to the stored procedure for @PageNumber (defaulted to 1) and @PageSize (defaulted to 1000). We figured out that most users would never need to see more than 1,000 rows, so we defaulted to 1,000 rows for the first page. We figured that if we got a support call asking for the second page, we’d cross that bridge when we came to it.
We never got that call.
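For the curious, here’s a rough sketch of what that change looked like in spirit. The table and column names below are made up, not the client’s actual schema, but @SalespersonID, @StartDate, @EndDate, @PageNumber, and @PageSize are the parameters described above:

/* Sketch only: dbo.Orders and its columns are hypothetical names. */
CREATE OR ALTER PROCEDURE dbo.usp_ShippedOrdersReport
    @SalespersonID INT,
    @StartDate DATE,
    @EndDate DATE,
    @PageNumber INT = 1,      /* first page by default */
    @PageSize INT = 1000      /* 1,000 rows per page by default */
AS
BEGIN
    SET NOCOUNT ON;

    SELECT OrderID, ShipDate, OrderTotal
    FROM dbo.Orders
    WHERE SalespersonID = @SalespersonID
      AND ShipDate >= @StartDate
      AND ShipDate < @EndDate
    ORDER BY ShipDate, OrderID    /* deterministic sort so pages don't overlap */
    OFFSET (@PageNumber - 1) * @PageSize ROWS
    FETCH NEXT @PageSize ROWS ONLY;
END;
GO

The tiebreaker column in the ORDER BY matters: without a deterministic sort, rows can drift between pages if anyone ever does ask for page two.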


6 Comments
Hi Brent, I’ve never run into this problem in practice, so just asking more out of curiosity.
Any idea why performance of pagination seems to depend on the offset in certain cases? I was testing this query in SSMS:
SELECT * FROM v_ProjectList ORDER BY ProjectNo OFFSET 10000 ROWS FETCH NEXT 10 ROWS ONLY
Happy to provide the definition of the view, but at that point I’d actually just be asking for free work, which I don’t intend – just an “off the top of your head answer”, if you happen to have one.
Anyway – if I run that query with no offset, it takes about 4 seconds to execute, returning about 50000 rows. However, adding the offset of 10000 rows triples the execution time to 12 seconds. Making matters even more confusing, increasing the offset to 20000 rows brings the execution time back down to 3 seconds. This seems absurd to me – surely returning 10 rows should be cheaper than 50000, especially if those 10 rows are a verbatim subset, right? And what reason could there be for the offset to affect the runtime so significantly?
Did you compare the execution plans? The 12-second execution may be using slower nested loops, or spilling to disk because of bad estimates, etc.
In general, my “problem” with pagination is that when you use a higher offset, SQL Server still needs to read the first x rows too.
So with OFFSET 10000 ROWS FETCH NEXT 10 ROWS ONLY, it needs to read 10010 rows (or even more – possibly the whole dataset – if you use some sort of ORDER BY that is not supported by the index being used).
Compared to a TOP 10010, the pagination just reduces the number of rows that are transferred to the client (= network load).
OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY, on the other hand, will usually be very fast (similar to a TOP 10) – if it is supported by the index.
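A quick way to see that rows-read behavior for yourself, using stand-in names (dbo.Projects with an index on ProjectNo) rather than the commenter’s actual view:

SET STATISTICS IO ON;

/* Reads roughly 10010 rows in index order, then throws away the first 10000 */
SELECT ProjectNo
FROM dbo.Projects
ORDER BY ProjectNo
OFFSET 10000 ROWS FETCH NEXT 10 ROWS ONLY;

/* Reads about the same 10010 rows, but sends all of them to the client */
SELECT TOP (10010) ProjectNo
FROM dbo.Projects
ORDER BY ProjectNo;

Assuming the sort is supported by the index, both queries show similar logical reads – the OFFSET version just transfers fewer rows over the network.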
Hi Thomas, thanks for your response. You’re right – the execution plans are different. I think this has to be the weirdest case of parameter sniffing I’ve ever seen.
Unfortunately I can’t give much more of an answer than that. The view is a ghastly nightmare of legacy code. Actually trying to analyse it is probably a fool’s errand.
In the first example (the mail job) – wouldn’t it be better to create a filtered index with
WHERE to_recipient IS NOT NULL AND send_date IS NULL
and modify the query to read just those rows? The index would usually contain just the few rows that still need to (and can) be sent, so it would not add much overhead. The query would then use this index and be blazing fast – whereas even with pagination alone, it would still read 1000 (or whatever) rows full of ignored mails (because they have no recipient).
Using pagination (or a TOP) in this case seems like a bad fit anyway – theoretically, the 1002nd mail could be one that still needs to be sent, because something created a whole bunch of new entries in that table.
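For illustration, a sketch of that filtered index and a matching query – dbo.OutgoingEmails, EmailID, and created_date are made-up names standing in for whatever the real mail table looks like:

/* Sketch only: table and column names are hypothetical. */
CREATE INDEX IX_OutgoingEmails_ReadyToSend
    ON dbo.OutgoingEmails (created_date)
    INCLUDE (to_recipient)
    WHERE to_recipient IS NOT NULL
      AND send_date IS NULL;

/* The sender job touches only the rows that still need to go out */
SELECT TOP (1000) EmailID, to_recipient
FROM dbo.OutgoingEmails
WHERE to_recipient IS NOT NULL
  AND send_date IS NULL
ORDER BY created_date;

For SQL Server to use the filtered index, the query’s WHERE clause needs to cover the same filter.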
Brent didn’t fix the mail issue with pagination; he fixed it by getting them to clean up their trash data. I think that story was included as an example of an “amusing situation,” not a case for pagination.