Indexing Strategies For Cross Apply

Slicked Brad

The apply operator remains one of my favorite tools, but much like CTEs and windowing functions, they’re not totally magical unless you look at your indexes.

Starting with a dead simple query on the Users table, it’d be easy to come up with an ideal index for this one thing.

If you really cared the most about this, you’d probably create an index like this for it.

It’s got everything, really. It’s even ordered correctly.

Lambs To The Sausage

Now what if we complicate things a bit? What if we want to find items in the Posts table for all of those very active users?

Say we wanted to find their top 10 questions by Score. We could add this Cross Apply to do just that.

But it would be slow. Real slow. This thing does a lot of work.

There are some interesting things here!

  1. There’s no missing index request here, yet…
  2. The optimizer decided to create an index for us behind the scenes
  3. The index it created still required sorting data
It took a lifetime.

Sorted Affair

You may want to create an index like this. It has a lot of potential!

We’ll have the data sorted for our order by! Then it’s just a matter of grabbing the right users.

It’s filtered and everything.

What could go wrong?

Well, sit down for a spell, and lemme tell ya…

This thing flat out lies to us.

See where it says the Posts table was scanned once?

Malarkey!

There’s about 549 scans that we’re not being told about.

That top executed 550 times, and scanned the index for 10 rows.

Cool, huh?

Back To The Drawing Board

Our fear of needing to sort data may have gotten the better of us on that last one.

If we think back to an earlier post about how equality searches can support sorts with non-leading columns, we might try this index.

This finishes… yep. And hey, look, it’s back to being honest about the number of times we hit the Posts table.

The query plan is a less offensive version of the last one, still with no sort.

But this time sporting a much more efficient index seek.

Weary

Why Does This Work?

For the Nested Loops Join, we pass in a unique list of Ids from the Users table, and for each of them, the Top operator performs a single seek.

Since they’re processed iteratively, when we match to an OwnerUserId in Posts, Score is already ordered for us.

This takes care of almost all the work that we were doing in previous plans.

Thanks for reading!

Previous Post
Building SQL ConstantCare®: Why People Aren’t Using Query Store
Next Post
New Stack Overflow Public Database Available (2018-06)

7 Comments. Leave new

  • Do you have any idea *why* the Scan Count reports 1 instead of 550 in the second plan?

    Reply
    • Erik Darling
      June 12, 2018 1:47 pm

      Josh,

      No, not really. I think there’s some bug disconnect in stats io when it comes to loops. For example:

      SET STATISTICS IO ON;
      SET NOCOUNT ON;
      CREATE INDEX ix_whatever ON dbo.Users (Reputation);

      SELECT *
      FROM dbo.Users AS u
      WHERE u.Reputation = 2
      OPTION(RECOMPILE);

      Uses a key lookup, and returns this:

      Table 'Users'. Scan count 1, logical reads 346

      Which is just totally wrong.

      Reply
  • Dumb question but, why would you WANT to join a anything using an APPLY operator except a TVF (which I think requires it). Doesn’t that severely limit the the optimizer’s options for joining the row source? Or was it just for the purpose of demonstration?

    Speaking of CROSS APPLY and TVFs, do you have any suggestions on improving the performance of a query thats using CROSS APPLY with a TVF on the right side and a very large row (100k rows) source on the left? I’m trying to improve the query performance without having to refactor.

    Reply
    • Erik Darling
      June 18, 2018 2:16 pm

      Chuck — an inline TVF is just a query. There are no special powers there with Cross Apply, really. For a lot of great examples, go ahead and search for Itzik Ben-Gan cross apply. You’ll find some good resources.

      Your question is way too vague for me to even venture a guess. You should ask it with the appropriate level of detail at dba.stackexchange.com, or another Q&A site that’s better for getting query tuning help.

      Thanks!

      Reply
  • Sorry for the vagueness. I forgot to mention is a multi-statement TVF.

    Reply
    • Erik Darling
      June 18, 2018 2:28 pm

      Chuck — that really doesn’t add enough information to help. Please please please post your question with some actual information on a Q&A site.

      Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.