How Much Can One Column Change A Query Plan? Part 2

What happened in Part 1?

Join elimination, naturally. Right up until the last join, anyway. That’s interesting on its own, because my copy of the Stack Overflow database doesn’t have a single foreign key in it, anywhere.

If we go down the rabbit hole a couple of steps, we end up at a very quotable place, with Rob Farley:

2. Duplicated rows

Not necessarily duplicated completely, but certainly a row in our original table may appear multiple times in our result set, if it matches with multiple rows in the new table. Notice that this doesn’t occur if the join-fields (those used in the ON clause) in the new table are known to be unique (as is the case with a Foreign-Key lookup).

When we select a distinct list from one column, or create a unique index on one column, the optimizer knows that that one column is unique and won’t produce multiples of a value. I’m assured by mathematicians that even if you left join two distinct lists, it won’t produce duplicates.

With more than one column involved in a DISTINCT/GROUP BY, there may be duplicates of a single value, which would change our results. There’s a little more information about this over here as well.
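To make the one-column versus multi-column point concrete, here’s a minimal sketch. It uses Python’s built-in sqlite3 module as a stand-in rather than SQL Server, and the tiny Votes table and its rows are made up for illustration. It only counts how many times a single UserId can show up in each kind of DISTINCT list.

```python
import sqlite3

# Hypothetical toy data, not the real Stack Overflow tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Votes (UserId INTEGER, VoteTypeId INTEGER);
    INSERT INTO Votes VALUES (1, 2), (1, 2), (1, 3), (2, 2);
""")

def max_copies_per_user(derived_list_sql):
    # Highest number of times any single UserId appears in the derived list.
    sql = f"""
        SELECT MAX(copies) FROM (
            SELECT COUNT(*) AS copies
            FROM ({derived_list_sql}) AS d
            GROUP BY d.UserId
        ) AS c
    """
    return conn.execute(sql).fetchone()[0]

# DISTINCT on one column: each UserId can appear only once.
one_column = max_copies_per_user("SELECT DISTINCT UserId FROM Votes")

# DISTINCT on two columns: the pairs are unique, but a UserId can repeat.
two_columns = max_copies_per_user("SELECT DISTINCT UserId, VoteTypeId FROM Votes")

print(one_column, two_columns)
```

The pairs in the second list are all different from each other, but the join only looks at UserId, and that column on its own is no longer guaranteed to be unique.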

How does that apply to us?

The results are going to be every DisplayName in the Users table. Because each of our left joins goes to a DISTINCT/GROUP BY list of Ids, we know each Id can only occur once in that list, so no row from Users can match more than once.

That isn’t true in the last join, where we messed with columns. That join may produce multiples of some Ids with the multi-column distinct, which means the join can’t safely be eliminated. You could end up needing to show some DisplayNames more than once, in other words.
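Here’s a rough sketch of why that matters for the results. Again, this is sqlite3 standing in for SQL Server, with made-up Users and Badges rows, and it says nothing about what any optimizer actually does. It only compares the rows that come back with no join, with a single-column DISTINCT join, and with a multi-column DISTINCT join.

```python
import sqlite3

# Hypothetical toy data, not the real Stack Overflow tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (Id INTEGER PRIMARY KEY, DisplayName TEXT);
    CREATE TABLE Badges (UserId INTEGER, Name TEXT);
    INSERT INTO Users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO Badges VALUES (1, 'Teacher'), (1, 'Student'), (2, 'Teacher');
""")

def display_names(join_clause=""):
    # Only DisplayName is selected, so the join contributes nothing but row counts.
    sql = f"SELECT u.DisplayName FROM Users AS u {join_clause} ORDER BY u.Id"
    return [row[0] for row in conn.execute(sql)]

no_join = display_names()

# Single-column DISTINCT: at most one match per user, so the output is unchanged.
single_column = display_names(
    "LEFT JOIN (SELECT DISTINCT UserId FROM Badges) AS b ON b.UserId = u.Id"
)

# Multi-column DISTINCT: a user with two badge names matches twice.
multi_column = display_names(
    "LEFT JOIN (SELECT DISTINCT UserId, Name FROM Badges) AS b ON b.UserId = u.Id"
)

print(no_join)
print(single_column)
print(multi_column)
```

The first two lists are identical, which is the situation where throwing the join away can’t change the answer. The third list shows one DisplayName twice, so skipping that join would return different results.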

Similarly

If I re-create all my joins by dumping them into temp tables, we get a similar effect. A difference I want to point out is that I’m not joining other temp tables to each other, like in the first query. That’s why the “big” plan only has two joins. The multi-column-duplicate DISTINCT changed things up the whole tree of joins. Funny, right? Hysterical.

Here’s what happens.

Expected?

If I go back and add in the joins, the plan changes again. The duplicate-producing join has a domino effect on the other joins: now they can’t be safely eliminated, either.

Dupes come out at night

Want a simple example?

If you’d like something a bit easier to follow along with, use this example.

Thanks for reading!
