Just Another Way
No matter how you delve into missing index requests — whether it’s the plan level, DMV analysis, or (forgive me for saying it), DTA, the requests will generally be the same.
They’ll prioritize equality predicates, the columns may or not may be in the right order, the columns may or may not be in the right part of the index, and the impact…
Oh, that impact.
It’s all just a cry for help, anyway.
Like a teenager watching anime and buying intricate parasols.
Salted Grains
If I run these three queries with different join types:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 |
SELECT b.UserId, b.Name, b.Date, p.CreationDate FROM dbo.Badges AS b JOIN dbo.Posts AS p ON b.UserId = p.OwnerUserId WHERE b.Date = '20170101' OPTION(LOOP JOIN, MAXDOP 1, RECOMPILE) SELECT b.UserId, b.Name, b.Date, p.CreationDate FROM dbo.Badges AS b JOIN dbo.Posts AS p ON b.UserId = p.OwnerUserId WHERE b.Date = '20170101' OPTION(MERGE JOIN, MAXDOP 1, RECOMPILE) SELECT b.UserId, b.Name, b.Date, p.CreationDate FROM dbo.Badges AS b JOIN dbo.Posts AS p ON b.UserId = p.OwnerUserId WHERE b.Date = '20170101' OPTION(HASH JOIN, MAXDOP 1, RECOMPILE) |
They’re all going to ask for the same missing index:
1 2 3 4 5 |
USE [StackOverflow] GO CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>] ON [dbo].[Badges] ([Date]) INCLUDE ([Name],[UserId]) |
Kinda weird already, that a join column is an INCLUDE, but hey.
What’s even weirder is that they have diminishing estimated impacts based on join type.
- Loop: 99.1954%
- Merge: 42.7795%
- Hash: 28.7901%
It gets a bit stranger if I force parallelism!
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 |
SELECT b.UserId, b.Name, b.Date, p.CreationDate FROM dbo.Badges AS b JOIN dbo.Posts AS p ON b.UserId = p.OwnerUserId WHERE b.Date = '20170101' OPTION(LOOP JOIN, USE HINT('ENABLE_PARALLEL_PLAN_PREFERENCE'), RECOMPILE) SELECT b.UserId, b.Name, b.Date, p.CreationDate FROM dbo.Badges AS b JOIN dbo.Posts AS p ON b.UserId = p.OwnerUserId WHERE b.Date = '20170101' OPTION(MERGE JOIN, USE HINT('ENABLE_PARALLEL_PLAN_PREFERENCE'), RECOMPILE) SELECT b.UserId, b.Name, b.Date, p.CreationDate FROM dbo.Badges AS b JOIN dbo.Posts AS p ON b.UserId = p.OwnerUserId WHERE b.Date = '20170101' OPTION(HASH JOIN, USE HINT('ENABLE_PARALLEL_PLAN_PREFERENCE'), RECOMPILE) |
Now the estimated impacts look like this:
- Loop: 98.9569%
- Merge: 29.1982%
- Hash: 44.2455%
The Hash and Merge join impacts have just about changed places.
The funny thing is…
I totally agree.
As far as indexes go, that’s a crappy index for the Merge and Hash Join plans. But no better one is being offered, not even sneakily.
For the nested loops plans, it’s super easy to grab the UserIds for that date, and dig into the Posts table for just those.
For the merge join plans, it’s less helpful. While it’s nice that we can easily filter the date predicate, we still have to order our data for the merge join. There are further complications in the parallel version.
For the hash join plans, it’s a similar situation. We need to create a hash table on UserId. It being in the leaf of an index on Date doesn’t help us much, or rather as much as it would if it were in the key.
In short, both the merge and hash join plans have cost-increasing operators thrown into the mix that the index as requested just wouldn’t help.
Thanks for reading!
Brent says: in the Mastering Index Tuning class, these types of examples are why I tell students that you should look at one-key-column index suggestions carefully. In most cases, SQL Server really needs one (or more) of the included fields to be in the key. The hard part is figuring out which one(s) without looking at the execution plans.