Brent buys lunch for the ladies
The purpose of this post is to show a bit of syntax that often gets overlooked in favor of using query hints to force joins to occur in a particular order. We’ll start by creating three tables. One for employees, one for orders, and one for items in the order.
1 2 3 4 5 6 |
/* An employees table! How novel! */ CREATE TABLE #Ozars (OzarID INT IDENTITY(1,1) NOT NULL, OzarName VARCHAR(30) NOT NULL) INSERT INTO #Ozars (OzarName) VALUES ('Brent'), ('Jeremiah'), ('Kendra'), ('Doug'), ('Jessica'), ('Erik') ALTER TABLE #Ozars ADD CONSTRAINT [PK_Ozars] PRIMARY KEY CLUSTERED (OzarID, OzarName) |
1 2 3 4 5 6 |
/* Luuuuuuuunch */ CREATE TABLE #Lunch (LunchID INT IDENTITY(1,1) NOT NULL, OzarID INT NOT NULL) INSERT INTO #Lunch (OzarID) VALUES (1),(1),(1),(3),(5) ALTER TABLE #Lunch ADD CONSTRAINT [PK_Lunch] PRIMARY KEY CLUSTERED (LunchID, OzarID) |
1 2 3 4 5 6 |
/* Brent called it in, so it's all under his ID. Because that's how restaurants work. By ID. Yep. */ CREATE TABLE #LunchOrders (LunchOrderID INT IDENTITY(1,1) NOT NULL, LunchID INT NOT NULL, Lunch VARCHAR(20)) INSERT INTO #LunchOrders (LunchID, Lunch) VALUES (1, 'Just Churros'), (1, 'Box of Wine'), (1, 'Kaled Kale') ALTER TABLE #LunchOrders ADD CONSTRAINT [PK_LunchOrders] PRIMARY KEY CLUSTERED (LunchOrderID, LunchID) |
A SQL celebrity gossip blog got a tip that someone from BOU ordered take-out. Not exactly an earth-shattering event, but querying minds want to know!
So they write a query, and then they look at the plan.
1 2 3 4 5 6 |
SELECT o.* , l.* , lo.* FROM #Ozars o LEFT JOIN #Lunch l ON l.OzarID = o.OzarID INNER JOIN #LunchOrders lo ON lo.LunchID = l.LunchID |
And that’s way harsh. SQL went and changed our LEFT JOIN into an INNER JOIN. What was it thinking? Now we don’t know who Brent is having lunch with.
Who ordered the KALE?
Okay, we thought about it some. No more INNER JOIN.
We’ll get this done with another LEFT JOIN.
1 2 3 4 5 6 |
SELECT o.* ,l.* ,lo.* FROM #Ozars o LEFT JOIN #Lunch l ON l.OzarID = o.OzarID LEFT JOIN #LunchOrders lo ON lo.LunchID = l.LunchID |
Unless you’re Ernie, that’s wayyyyy too many Brents.
First of all, ew. But yeah, you can do this, and it will come back with the right results.
1 2 3 4 5 6 7 8 9 |
SELECT o.* , lol.* FROM #Ozars o LEFT JOIN ( SELECT l.*, lo.LunchOrderID, lo.Lunch FROM #Lunch l INNER JOIN #LunchOrders lo ON lo.LunchID = l.LunchID ) lol ON o.OzarID = lol.OzarID |
We can even try an OUTER APPLY. That’s a little nicer looking as a query…
1 2 3 4 5 6 7 8 9 10 11 12 |
SELECT o.* , lol.* FROM #Ozars o OUTER APPLY ( SELECT l.LunchID , l.OzarID, lo.LunchOrderID, lo.Lunch FROM #Lunch l INNER JOIN #LunchOrders lo ON lo.LunchID = l.LunchID WHERE o.OzarID = l.OzarID ) lol |
… But same yucky plan.
Hi, I’m a cool trick.
1 2 3 4 5 6 7 8 |
SELECT o.* , l.* , lo.* FROM #Ozars o LEFT JOIN #Lunch l --They see me LEFT JOIN... INNER JOIN #LunchOrders lo --Then INNER JOIN... ON lo.LunchID = l.LunchID --Then write both my ON o.OzarID = l.OzarID --ON clauses |
This is an interesting concept to play with. With more than a couple JOINs, you can start using parentheses to group them together, like so:
1 2 3 4 5 6 7 8 |
SELECT o.* , l.* , lo.* FROM #Ozars o LEFT JOIN (#Lunch l INNER JOIN #LunchOrders lo ON lo.LunchID = l.LunchID) ON o.OzarID = l.OzarID |
If you switch the order of the ON clauses in the second to last query, you’ll get an error. Take a guess why in the comments!
1 2 3 4 5 6 |
/* Clean me up, buttercup. */ --DROP TABLE #Ozars; --DROP TABLE #Lunch; --DROP TABLE #LunchOrders; |
Brent says: Okay, first off, I don’t normally drink an entire box of wine for lunch, but I had to wash down all that kale. Second off, judging by these execution plans, SQL Server intercepted my wine delivery.
Kendra says: I’ve found parentheses join hints twice in production code in SQL Server. In both cases, nobody knew why they were there, what would happen if they rewrote the query, or even if they’d been used on purpose or if it was just an accident. If you use this technique, document your code heavily as to what you’re doing and why, or you’ll be that person everyone grumbles about.
18 Comments. Leave new
Hi,
This doesn’t really detract from the point of the post but threw me for a loop when I was testing it on my dev box:
In the first query, the join between #Lunch and #LunchOrders is on lo.LunchOrderID = l.LunchID
But in all the subsequent queries, the join is on lo.LunchID = l.LunchID
So the results are different, as the final query has all the ordered items against a single lunch as per the underlying data, rather than spread (erroneously) across all the individual lunches.
Neat trick all the same
ross
D’oh. I’ll fix it up in a bit. Thanks for pointing it out. The perils of coming back to a post two days later!
Slick, haven’t seen that before — as to why changing the JOIN order in the 2nd to last query might produce an error, I’d guess that SQL Server wants to process the JOIN between #Lunch and #LunchOrder before evaluating the JOIN between #Lunch and #Ozars (which is outside the parentheses). But that’s just a guess..
You’re posts are hilarious to read, keep ’em coming.
I did not know about parentheses join hints!
Do you if these are these guaranteed to enforce a join order, or is it something that the optimizer might choose to disregard in favour of a join order it expects to perform less work?
I’ve never seen it not do it, but I haven’t played with it enough to know for sure. And remember, you’re putting the optimizer in the corner here, so only use it when something is going completely awful and chasing you around the woods with a machete.
No, this syntax does not *guarantee* the execution plan will follow the written join order. For that, you’d also need an OPTION (FORCE ORDER) hint – something not to be used lightly because it has many subtle side-effects not limited to join ordering.
I would probably put the 2 lunch tables being inner joined in a table expression and avoid the awkward join syntax.
To add to my comment about table expressions. Either of these will produce the correct results and avoid dangling an ON clause.
WITH lo AS (
SELECT l.OzarID, lo.*
FROM #lunch l
INNER JOIN #LunchOrders lo ON lo.LunchID = l.LunchID
)
SELECT o.*, lo.*
FROM #Ozars o
LEFT JOIN lo ON lo.ozarid= o.ozarid;
SELECT o.*, lo.*
FROM #Ozars o
LEFT JOIN (
SELECT l.OzarID, lo.*
FROM #lunch l
INNER JOIN #LunchOrders lo ON lo.LunchID = l.LunchID
) lo ON lo.ozarid= o.ozarid;
We use this pattern often. The key is always to remember that you don’t join tables, you join result sets. I think it helps greatly to simply wrap the inner join with the outer join (just a formatting change). But conceptually, you’re outer joining the result set of two inner joined tables.
LEFT…
INNER…
ON…
ON…
Thanks for providing the internals to show what’s happening.
Yup. That’s what I’ve been doing for years. Though I never thought of it that way, nor used that terminology. I just call them “nested joins”—like when people say “nested queries” (aka subqueries).
The parenthesis are actually useless. You can’t change order of operations with them; you have to re-order the join clauses.
One extremely important thing to note about nested joins is scoping of variables. While you can select/project everything from down at the deepest join, you are restricted what columns you can bind to at the various join levels.
(Spacing forced with underscores)
LEFT…
_____INNER…
_____ON…
ON…
That is fairly readable. I’ve just developed a strong distaste for this style of join syntax after having had to deal with MS Access generated SQL in the past. In goes wild with parans and nested ONs.
@Tony – Yes, they produce the results but also have the incorrect plan with all the scalars being computed when I run it.
As a pure style point (and in blind obedience to Codd’s laws) I think I would have designed the tables differently and only had one #Lunch entry for Brent (after all he only ate one lunch, even if it was sub-hic-stantial), but even playing with that didn’t seem to make a difference.
Disappointing that the OUTER APPLY and the CTE didn’t produce the right plan. That’s a fail in the optimizer for me.
Thank-you for sharing this.
Out of curiosity, how would you have one row for an order with multiple items?
This reminds me of the dumb joke: Try to find out 2 + 2 * 2 in calc.exe/physical calculator vs MS Excel — 6 or 8?
OUTER and INNER JOIN operators both have the same precedence level, so the analogy is more like 2 – 2 + 2 and you do need parenthesis to “push” addition before subtraction.
I always write outer joins *after* all inner joins and nowadays instead of LEFT JOIN (a JOIN b ON ..) ON .. always prefer LEFT JOIN a ON .. LEFT JOIN b ON .. with no parenthesis. This might produce sub-optimal query plan but readability is worth it.
@Erik
It’s the same as the classical Customer – OrderHeader – OrderLine.
You aren’t actually using #Lunch records with LunchID 2 and 3. The #LunchOrders are all attached to LunchID 1
Hey, this blew my mind! One thing I spotted (after showing everyone in the office this page) is that for this particular query you could have expressed it as a right outer join, e.g.
SELECT o.*
, l.*
, lo.*
FROM #LunchOrders lo
INNER JOIN #Lunch l ON l.LunchID = lo.LunchID
RIGHT OUTER JOIN #Ozars o ON o.OzarID = l.OzarID;
This produces the exact same plan as your funky nested join syntax, which makes sense given that you’re using nesting to reverse the join order. There will be other, more complex, scenarios that aren’t suited to a right outer join (also, who uses right outer joins?!).
I read you post with interest, but the point you want to make is not clear to me. You clam “Forcing Join Order Without Hints”, but then you start with a functional incorrect query, you do not prove that its incorrectness will be solved by OPTION(FORCE ORDER) or other hints, and of cause it will not fix the issue, because the incorrectness is at functional level.
The the functional mistake is a common one for beginners and I appreciate that you clarify it.
One can argue that SQL has always an functional join order. The SQL optimizer has the freedom to rearrange the technical join order without breaking the functional join order. This technical join order can be frozen by the use of hints. So I get impression that different levels are mixed up here.
Leave me the remark that Join Order hinting can be useful when SQL server has difficulties to make a good estimate of the cardinality, but that is an other topic.