Indexing Strategies For Cross Apply

Last Updated June 8, 2018

Slicked Brad

The apply operator remains one of my favorite tools, but much like CTEs and windowing functions, they’re not totally magical unless you look at your indexes.

Starting with a dead simple query on the Users table, it’d be easy to come up with an ideal index for this one thing.

SELECT u.Id, u.DisplayName, u.Reputation
FROM dbo.Users AS u
WHERE u.Reputation >= 100000
ORDER BY u.Reputation DESC;

SELECT u.Id, u.DisplayName, u.Reputation

FROM dbo.Users AS u

WHERE u.Reputation >= 100000

ORDER BY u.Reputation DESC;

If you really cared the most about this, you’d probably create an index like this for it.

CREATE INDEX ix_users_whatever 
	ON dbo.Users (Reputation DESC) INCLUDE(DisplayName)
		WHERE Reputation >= 100000;

CREATE INDEX ix_users_whatever

ON dbo.Users (Reputation DESC) INCLUDE(DisplayName)

WHERE Reputation >= 100000;

It’s got everything, really. It’s even ordered correctly.

Lambs To The Sausage

Now what if we complicate things a bit? What if we want to find items in the Posts table for all of those very active users?

Say we wanted to find their top 10 questions by Score. We could add this Cross Apply to do just that.

SELECT      u.Id, u.DisplayName, u.Reputation, ca.*
FROM        dbo.Users AS u
CROSS APPLY (   SELECT   TOP 10
                         p.Title, p.Score
                FROM     dbo.Posts AS p
                WHERE    p.OwnerUserId = u.Id
		        AND p.PostTypeId = 1
                ORDER BY p.Score DESC ) AS ca
WHERE       u.Reputation >= 100000
ORDER BY    u.Reputation DESC;

SELECT u.Id, u.DisplayName, u.Reputation, ca.*

FROM dbo.Users AS u

CROSS APPLY ( SELECT TOP 10

p.Title, p.Score

FROM dbo.Posts AS p

WHERE p.OwnerUserId = u.Id

AND p.PostTypeId = 1

ORDER BY p.Score DESC ) AS ca

WHERE u.Reputation >= 100000

ORDER BY u.Reputation DESC;

But it would be slow. Real slow. This thing does a lot of work.

Table 'Worktable'. Scan count 550, logical reads 151090492
Table 'Posts'. Scan count 1, logical reads 10497889, read-ahead reads 10461004
Table 'Users'. Scan count 1, logical reads 6

 SQL Server Execution Times:
   CPU time = 196953 ms,  elapsed time = 214454 ms.

Table 'Worktable'. Scan count 550, logical reads 151090492

Table 'Posts'. Scan count 1, logical reads 10497889, read-ahead reads 10461004

Table 'Users'. Scan count 1, logical reads 6

SQL Server Execution Times:

CPU time = 196953 ms, elapsed time = 214454 ms.

There are some interesting things here!

There’s no missing index request here, yet…
The optimizer decided to create an index for us behind the scenes
The index it created still required sorting data

Sorted Affair

You may want to create an index like this. It has a lot of potential!

CREATE INDEX ix_posts_ennui
	ON dbo.Posts (Score DESC, OwnerUserId) INCLUDE (Title, PostTypeId)
		WHERE PostTypeId = 1;

CREATE INDEX ix_posts_ennui

ON dbo.Posts (Score DESC, OwnerUserId) INCLUDE (Title, PostTypeId)

WHERE PostTypeId = 1;

We’ll have the data sorted for our order by! Then it’s just a matter of grabbing the right users.

It’s filtered and everything.

What could go wrong?

Well, sit down for a spell, and lemme tell ya…

Table 'Posts'. Scan count 1, logical reads 49199261
Table 'Users'. Scan count 1, logical reads 6

 SQL Server Execution Times:
   CPU time = 178171 ms,  elapsed time = 183820 ms.

Table 'Posts'. Scan count 1, logical reads 49199261

Table 'Users'. Scan count 1, logical reads 6

SQL Server Execution Times:

CPU time = 178171 ms, elapsed time = 183820 ms.

This thing flat out lies to us.

See where it says the Posts table was scanned once?

There’s about 549 scans that we’re not being told about.

That top executed 550 times, and scanned the index for 10 rows.

Cool, huh?

Back To The Drawing Board

Our fear of needing to sort data may have gotten the better of us on that last one.

If we think back to an earlier post about how equality searches can support sorts with non-leading columns, we might try this index.

CREATE INDEX ix_posts_apathy
	ON dbo.Posts (OwnerUserId, Score DESC) INCLUDE (Title)
		WHERE PostTypeId = 1;

CREATE INDEX ix_posts_apathy

ON dbo.Posts (OwnerUserId, Score DESC) INCLUDE (Title)

WHERE PostTypeId = 1;

This finishes… yep. And hey, look, it’s back to being honest about the number of times we hit the Posts table.

Table 'Posts'. Scan count 550, logical reads 5623
Table 'Users'. Scan count 1, logical reads 6

 SQL Server Execution Times:
   CPU time = 62 ms,  elapsed time = 63 ms.

Table 'Posts'. Scan count 550, logical reads 5623

Table 'Users'. Scan count 1, logical reads 6

SQL Server Execution Times:

CPU time = 62 ms, elapsed time = 63 ms.

The query plan is a less offensive version of the last one, still with no sort.

But this time sporting a much more efficient index seek.

Why Does This Work?

For the Nested Loops Join, we pass in a unique list of Ids from the Users table, and for each of them, the Top operator performs a single seek.

Since they’re processed iteratively, when we match to an OwnerUserId in Posts, Score is already ordered for us.

This takes care of almost all the work that we were doing in previous plans.

Thanks for reading!

Building SQL ConstantCare®: Why People Aren’t Using Query Store

New Stack Overflow Public Database Available (2018-06)

7 Comments. Leave new

jadarnel27
June 12, 2018 11:55 am

Do you have any idea *why* the Scan Count reports 1 instead of 550 in the second plan?

Reply
- Erik Darling
  June 12, 2018 1:47 pm
  
  Josh,
  
  No, not really. I think there’s some ~~bug~~ disconnect in stats io when it comes to loops. For example:
  
  SET STATISTICS IO ON; SET NOCOUNT ON; CREATE INDEX ix_whatever ON dbo.Users (Reputation);
  SELECT * FROM dbo.Users AS u WHERE u.Reputation = 2 OPTION(RECOMPILE);
  
  Uses a key lookup, and returns this:
  
  Table 'Users'. Scan count 1, logical reads 346
  
  Which is just totally wrong.
  
  Reply
  - jadarnel27
    June 14, 2018 9:32 am
    
    Ew! Something should attach a debugger and figure out why that happens.
    
    Reply
Chuck
June 18, 2018 2:11 pm

Dumb question but, why would you WANT to join a anything using an APPLY operator except a TVF (which I think requires it). Doesn’t that severely limit the the optimizer’s options for joining the row source? Or was it just for the purpose of demonstration?

Speaking of CROSS APPLY and TVFs, do you have any suggestions on improving the performance of a query thats using CROSS APPLY with a TVF on the right side and a very large row (100k rows) source on the left? I’m trying to improve the query performance without having to refactor.

Reply
- Erik Darling
  June 18, 2018 2:16 pm
  
  Chuck — an inline TVF is just a query. There are no special powers there with Cross Apply, really. For a lot of great examples, go ahead and search for Itzik Ben-Gan cross apply. You’ll find some good resources.
  
  Your question is way too vague for me to even venture a guess. You should ask it with the appropriate level of detail at dba.stackexchange.com, or another Q&A site that’s better for getting query tuning help.
  
  Thanks!
  
  Reply
Chuck
June 18, 2018 2:20 pm

Sorry for the vagueness. I forgot to mention is a multi-statement TVF.

Reply
- Erik Darling
  June 18, 2018 2:28 pm
  
  Chuck — that really doesn’t add enough information to help. Please please please post your question with some actual information on a Q&A site.
  
  Reply

Get Free SQL Stuff

"*" indicates required fields

Company

This field is for validation purposes and should be left unchanged.

Name*

First Last

Email*

Things I want*

First Responder Kit (Scripts and Guides)

Monday Links (SQL & tech news)

Blog posts (2-4 per week)

Indexing Strategies For Cross Apply

Slicked Brad

Lambs To The Sausage

Sorted Affair

Back To The Drawing Board

Why Does This Work?

Related

7 Comments. Leave new

Leave a Reply Cancel reply

Hi! I’m Brent Ozar.

Get Free SQL Stuff