A Better Way To Select Star

Mindless Self Promotion

I liked writing this blog post so much that I wrote an entire presentation on it. If you’d like to see it at GroupBy, click the link and vote.

Update: You can watch the video for this session here.

Much has been written about this

It’s probably one of the lowest hanging items a performance tuner can deal with.

Don’t need all those columns?

Don’t select them.

But what if you do need them?

You’re left with pretty grim choices.

  1. Make a really wide nonclustered index: (some key columns) include (every other column)
  2. Rearrange your existing clustered index — maybe the wrong key column was chosen to begin with
  3. Create a narrow nonclustered index on just the (some key columns) and then hope that by some stroke of luck, no one ever conjures up a WHERE clause that pushes the optimzer past the Key Lookup tipping point and into the clustered index scan zone

Assuming that you don’t find any of those palatable: I’m 100% with you.

What if there’s another way?

Query Godmother

Using the Stack Overflow database (duh) as an example, let’s check out the Users table.

Not the worst, but…

There are some columns in there I’d be unhappy to index even as includes, especially AboutMe which is a MAX.

Right now, we have this query, and it has a cost of 156.8 Query Buckaroos.

Here’s the CPU and I/O profile from SET STATISTICS TIME, IO ON — I’ve abridged all the I/O output in the post so you don’t have to read that a bunch of things did 0 reads. If it looks a little funny to you, that’s why.

Table 'Posts'. Scan count 7, logical reads 25187
Table 'Users'. Scan count 7, logical reads 80834
Table 'Worktable'. Scan count 0, logical reads 0


SQL Server Execution Times:
CPU time = 4297 ms, elapsed time = 2324 ms.

The thing of it is, we created this nonclustered index

But it gets no use, sort of like my willpower.

Resist temptation!

Let’s pretend we care

This query only returns a couple thousand rows, so you’d think the optimizer would choose a key lookup plan.

A quick check forcing our index leaves us scratching our collective heads — this query has a ‘much’ higher cost, at 275.6 Query Bucks, but finishes much faster.

Here’s the stats output:

Table 'Users'. Scan count 7, logical reads 5479
Table 'Posts'. Scan count 7, logical reads 25279
Table 'Worktable'. Scan count 0, logical reads 0
Table 'Worktable'. Scan count 0, logical reads 0

SQL Server Execution Times:
CPU time = 532 ms, elapsed time = 109 ms.

Here’s the query plan:

Doodles.

Recap so far: When we force our thoughtful nonclustered index, we get a more expensive plan that runs in 100 milliseconds vs 2.3 seconds.

But we hate hints

That key lookup plan might not always be awesome. Like I mentioned, this query only returns a couple thousand rows. If we expand our search, we may not want to do that key lookup a whole bunch of times. If we keep forcing the index, queries that return more rows will necessitate more key lookups, and that can really slow things down.

Study your math, kids

Organics

So how on earth do we get the optimizer to choose our nonclustered index, have it make sense to do so, and not do row-by-row Key Lookups when it shouldn’t?

One option is to use a CTE, or common table expression for those of you who have word quotas to fill.

How do we do? Here are the stats output results:

Table 'Users'. Scan count 7, logical reads 8340
Table 'Posts'. Scan count 7, logical reads 25279
Table 'Workfile'. Scan count 0, logical reads 0
Table 'Worktable'. Scan count 0, logical reads 0
Table 'Worktable'. Scan count 0, logical reads 0

SQL Server Execution Times:
CPU time = 577 ms, elapsed time = 109 ms.

We have very similar metrics to when we force the index with a hint. How does the plan look?

SOME

This could use some explanation, here! Why is this better?

We use two narrow nonclustered indexes to do our early joins and predicate filtering. Even though in the plan for the original query, the predicates on CreationDate and Reputation are easily pushed to the clustered index scan, they aren’t key columns there. That means we read every row in the CX and filter along the way. Using the narrow nonclustered index, read far fewer pages (5.2 million rows vs 283k rows).

The results of this join are passed on and finally joined back to the clustered index on Users. This gives us our display level columns, but only for the rows we need to show you.

We’re not dragging them around all throughout the query. This is a far more efficient use of, well everything.

I know what you’re thinking, though. Didn’t you just replace a Key Lookup with a Nested Loops Join?

YEAH I DID!

But check it out, the Nested Loops Join is a smarty pants, executes 6 times, and grabs about 407 rows per iteration. Remember our Key Lookup executed 1506 times to get 1506 rows. That’s, like… One. One at a time.

Bully for you

This can also be extended to Cross Apply, because of course it can.

This results in the same plan and the same stats output. No need to rehash it all.

Different, but valid

If the logical needs of your query change, it may be simpler to express things with EXISTS, but the same basic idea works.

The stats output is close enough to the CTE and Cross Apply plans. The query plan is a touch different, though. A Hash Match Aggregate on the Id column is inserted after the initial join.

BODY ONCE TOLD ME

Shut up already

While I’m not a fan of SELECT * queries, I realize that they may be necessary sometimes. I mean, why have all those columns if you’re not gonna show’em to anyone?

If you can, try to cut out unnecessary columns from queries. Richie has a good post about doing that with EF over here.

If you can’t, you can always defer the pain of scanning the clustered index until you’ve cut results down to a more reasonable bunch of rows, and you can do it in a way where you don’t have to rely on the optimizer choosing a Key Lookup plan, or forever forcing one where it might not be helpful.

Thanks for reading!

Previous Post
[Video] Office Hours 2017/06/28 (With Transcriptions)
Next Post
New Stack Overflow Public Database Available (2017-06)

12 Comments. Leave new

  • Håkan Nilsson
    July 6, 2017 3:51 am

    Very good article, Erik!
    I just wonder from where you get your query bucks? I may have missed that part, or another article about that.
    The big problem for most of us DBA:s having 50 developers writing/generating SQL-queries is that there are 50 developers writing/generating SQL-queries 🙂

    Reply
    • Erik Darling
      July 6, 2017 7:12 am

      When you look at an execution plan, you can see how much it “costs” by hovering over the SELECT operator and looking at the Estimated Subtree Cost.

      From there, each operator in the plan will have a cost that (usually) adds up to the cost in the SELECT operator.

      Reply
    • (For the record, Erik can make it rain query bucks.)

      Reply
  • Thanks for your article, I read somewhere you shouldn’t have to add the column of the clustered index (here the primary key, id) in a non clustered index, but here you add it to ix_users, so i’m a bit confused… am i wrong?

    Reply
    • Erik Darling
      July 6, 2017 6:35 pm

      It doesn’t matter either way. I define it in my index because my query uses it, and I want the intent of the index to be clear.

      Reply
  • Tony Fountain
    July 7, 2017 10:04 am

    Awesome article! I’ve never been able to successfully leverage a CTE as an “optimization fence”, but will need to try again! What version(s) of SQL have you tried this on?

    Reply
    • Erik Darling
      July 7, 2017 10:42 am

      I’m doing all my testing on 2016. Nothing against older versions, but…

      Actually, everything against older versions. They’re no fun.

      Reply
      • Tony Fountain
        July 10, 2017 9:35 am

        Ahh, I’m doing very little in 2016 right now. None of my clients have really adopted it yet. Most are running 2008, 2008R2, 2012, and a few 2014. The joys of consulting… even have some on 2000 and 2005 still!

        Reply
  • Tony Fountain
    August 17, 2017 10:33 am

    HA! Found a way to do this in SQL Server 2012… in my CTE I had to put a GROUP BY on the clustered index column being returned. The overall query plan remains the same in my case, however the key lookup is replaced with a clustered index seek, and the missing index recommendation of 200+ included columns goes away. Might add also all the metrics (STATISTICS IO, TIME) are basically identical in both cases.

    Reply
  • Juan Rodríguez R
    September 12, 2019 11:18 am

    Great article.
    I had no idea that one possible cure for SELECT * it would in CTE tables.
    The learning of the day.
    Thanks.

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.