Mindless Self Promotion
I liked writing this blog post so much that I wrote an entire presentation on it. If you’d like to see it at GroupBy, click the link and vote.
Update: You can watch the video for this session here.
Much has been written about this
It’s probably one of the lowest hanging items a performance tuner can deal with.
Don’t need all those columns?
Don’t select them.
But what if you do need them?
You’re left with pretty grim choices.
- Make a really wide nonclustered index: (some key columns) include (every other column)
- Rearrange your existing clustered index — maybe the wrong key column was chosen to begin with
- Create a narrow nonclustered index on just the (some key columns) and then hope that by some stroke of luck, no one ever conjures up a WHERE clause that pushes the optimzer past the Key Lookup tipping point and into the clustered index scan zone
Assuming that you don’t find any of those palatable: I’m 100% with you.
What if there’s another way?
Query Godmother
Using the Stack Overflow database (duh) as an example, let’s check out the Users table.

There are some columns in there I’d be unhappy to index even as includes, especially AboutMe which is a MAX.
Right now, we have this query, and it has a cost of 156.8 Query Buckaroos.
1 2 3 4 5 6 7 |
SELECT u.*, p.Id AS [PostId] FROM dbo.Users AS u JOIN dbo.Posts AS p ON p.OwnerUserId = u.Id WHERE u.CreationDate > '20160101' AND u.Reputation > 100 AND p.PostTypeId = 1; |
Here’s the CPU and I/O profile from SET STATISTICS TIME, IO ON
— I’ve abridged all the I/O output in the post so you don’t have to read that a bunch of things did 0 reads. If it looks a little funny to you, that’s why.
Table 'Posts'. Scan count 7, logical reads 25187
Table 'Users'. Scan count 7, logical reads 80834
Table 'Worktable'. Scan count 0, logical reads 0
SQL Server Execution Times:
CPU time = 4297 ms, elapsed time = 2324 ms.
The thing of it is, we created this nonclustered index
1 |
CREATE INDEX ix_Users ON dbo.Users (CreationDate, Reputation, Id); |
But it gets no use, sort of like my willpower.

Let’s pretend we care
This query only returns a couple thousand rows, so you’d think the optimizer would choose a key lookup plan.
A quick check forcing our index leaves us scratching our collective heads — this query has a ‘much’ higher cost, at 275.6 Query Bucks, but finishes much faster.
1 2 3 4 5 6 7 |
SELECT u.*, p.Id AS [PostId] FROM dbo.Users AS u WITH (INDEX = ix_Users) --Force Fed Index JOIN dbo.Posts AS p ON p.OwnerUserId = u.Id WHERE u.CreationDate > '20160101' AND u.Reputation > 100 AND p.PostTypeId = 1 |
Here’s the stats output:
Table 'Users'. Scan count 7, logical reads 5479
Table 'Posts'. Scan count 7, logical reads 25279
Table 'Worktable'. Scan count 0, logical reads 0
Table 'Worktable'. Scan count 0, logical reads 0
SQL Server Execution Times:
CPU time = 532 ms, elapsed time = 109 ms.
Here’s the query plan:

Recap so far: When we force our thoughtful nonclustered index, we get a more expensive plan that runs in 100 milliseconds vs 2.3 seconds.
But we hate hints
That key lookup plan might not always be awesome. Like I mentioned, this query only returns a couple thousand rows. If we expand our search, we may not want to do that key lookup a whole bunch of times. If we keep forcing the index, queries that return more rows will necessitate more key lookups, and that can really slow things down.

Organics
So how on earth do we get the optimizer to choose our nonclustered index, have it make sense to do so, and not do row-by-row Key Lookups when it shouldn’t?
One option is to use a CTE, or common table expression for those of you who have word quotas to fill.
1 2 3 4 5 6 7 8 9 10 11 12 13 |
WITH precheck AS ( SELECT u.Id, p.Id AS [PostId] --Don't select much here FROM dbo.Users AS u JOIN dbo.Posts AS p ON p.OwnerUserId = u.Id WHERE u.CreationDate > '20160101' --Filter! AND u.Reputation > 100 --Filter more! AND p.PostTypeId = 1 ) SELECT u.*, p.PostId --Save display level column selection for the end FROM precheck p JOIN dbo.Users AS u ON p.Id = u.Id; |
How do we do? Here are the stats output results:
Table 'Users'. Scan count 7, logical reads 8340
Table 'Posts'. Scan count 7, logical reads 25279
Table 'Workfile'. Scan count 0, logical reads 0
Table 'Worktable'. Scan count 0, logical reads 0
Table 'Worktable'. Scan count 0, logical reads 0
SQL Server Execution Times:
CPU time = 577 ms, elapsed time = 109 ms.
We have very similar metrics to when we force the index with a hint. How does the plan look?

This could use some explanation, here! Why is this better?
We use two narrow nonclustered indexes to do our early joins and predicate filtering. Even though in the plan for the original query, the predicates on CreationDate and Reputation are easily pushed to the clustered index scan, they aren’t key columns there. That means we read every row in the CX and filter along the way. Using the narrow nonclustered index, read far fewer pages (5.2 million rows vs 283k rows).
The results of this join are passed on and finally joined back to the clustered index on Users. This gives us our display level columns, but only for the rows we need to show you.
We’re not dragging them around all throughout the query. This is a far more efficient use of, well everything.
I know what you’re thinking, though. Didn’t you just replace a Key Lookup with a Nested Loops Join?
YEAH I DID!
But check it out, the Nested Loops Join is a smarty pants, executes 6 times, and grabs about 407 rows per iteration. Remember our Key Lookup executed 1506 times to get 1506 rows. That’s, like… One. One at a time.

This can also be extended to Cross Apply, because of course it can.
1 2 3 4 5 6 7 8 9 10 11 12 13 |
SELECT u.*, ca.PostId FROM dbo.Users AS u CROSS APPLY( SELECT p.Id AS [PostId] FROM dbo.Users AS u2 JOIN dbo.Posts AS p ON p.OwnerUserId = u2.Id WHERE u2.Id = u.Id AND u2.CreationDate > '20160101' AND u2.Reputation > 100 AND p.PostTypeId = 1 ) ca; |
This results in the same plan and the same stats output. No need to rehash it all.
Different, but valid
If the logical needs of your query change, it may be simpler to express things with EXISTS, but the same basic idea works.
1 2 3 4 5 6 7 8 9 10 11 12 13 |
SELECT u.* FROM dbo.Users AS u WHERE EXISTS ( SELECT 1 AS Whatever FROM dbo.Users AS u2 JOIN dbo.Posts AS p ON p.OwnerUserId = u2.Id WHERE u2.Id = u.Id AND u2.CreationDate > '20160101' AND u2.Reputation > 100 AND p.PostTypeId = 1 ); |
The stats output is close enough to the CTE and Cross Apply plans. The query plan is a touch different, though. A Hash Match Aggregate on the Id column is inserted after the initial join.

Shut up already
While I’m not a fan of SELECT * queries, I realize that they may be necessary sometimes. I mean, why have all those columns if you’re not gonna show’em to anyone?
If you can, try to cut out unnecessary columns from queries. Richie has a good post about doing that with EF over here.
If you can’t, you can always defer the pain of scanning the clustered index until you’ve cut results down to a more reasonable bunch of rows, and you can do it in a way where you don’t have to rely on the optimizer choosing a Key Lookup plan, or forever forcing one where it might not be helpful.
Thanks for reading!
12 Comments. Leave new
Very good article, Erik!
I just wonder from where you get your query bucks? I may have missed that part, or another article about that.
The big problem for most of us DBA:s having 50 developers writing/generating SQL-queries is that there are 50 developers writing/generating SQL-queries 🙂
When you look at an execution plan, you can see how much it “costs” by hovering over the SELECT operator and looking at the Estimated Subtree Cost.
From there, each operator in the plan will have a cost that (usually) adds up to the cost in the SELECT operator.
Very good! I was afraid there was a more complicated measure (Like DTU:s) on it’s way 🙂
DTUs! What planet do I look like I’m from?
(For the record, Erik can make it rain query bucks.)
Thanks for your article, I read somewhere you shouldn’t have to add the column of the clustered index (here the primary key, id) in a non clustered index, but here you add it to ix_users, so i’m a bit confused… am i wrong?
It doesn’t matter either way. I define it in my index because my query uses it, and I want the intent of the index to be clear.
Awesome article! I’ve never been able to successfully leverage a CTE as an “optimization fence”, but will need to try again! What version(s) of SQL have you tried this on?
I’m doing all my testing on 2016. Nothing against older versions, but…
Actually, everything against older versions. They’re no fun.
Ahh, I’m doing very little in 2016 right now. None of my clients have really adopted it yet. Most are running 2008, 2008R2, 2012, and a few 2014. The joys of consulting… even have some on 2000 and 2005 still!
HA! Found a way to do this in SQL Server 2012… in my CTE I had to put a GROUP BY on the clustered index column being returned. The overall query plan remains the same in my case, however the key lookup is replaced with a clustered index seek, and the missing index recommendation of 200+ included columns goes away. Might add also all the metrics (STATISTICS IO, TIME) are basically identical in both cases.
Great article.
I had no idea that one possible cure for SELECT * it would in CTE tables.
The learning of the day.
Thanks.