I was helping a client with a query, and I’m going to rework the example to use the Stack Overflow database for easier storytelling.
Say we need to:
- Find all the locations where users have logged in since a certain date, then
- Return the total count of people who live in those locations
One way to write the query would be:
1 2 3 4 5 |
SELECT Location, COUNT(*) AS recs FROM dbo.Users GROUP BY Location HAVING MAX(LastAccessDate) > '2022-01-01' ORDER BY Location; |
And in order to expedited it, I’ve added a couple of indexes:
1 2 3 4 5 6 7 |
CREATE INDEX Location ON dbo.Users(Location) INCLUDE (LastAccessDate); GO CREATE INDEX LastAccessDate ON dbo.Users(LastAccessDate) INCLUDE (Location); |
When the query runs, its actual execution plan is a scan of all the locations:
Because to SQL Server, it has to group the locations together, finding the max LastAccessDate in every location, before it begins the filtering process.
The client said, “Wait, that doesn’t make any sense – why isn’t SQL Server using the index on LastAccessDate? It has the Location on there too. SQL Server could just jump to 2022-01-01 on that index, make a list of the Locations that match, and then use those for the next part of the query.”
Well, it could. But the query optimizer wasn’t written that way.
Instead, if you want that behavior, you have to rewrite the query yourself:
1 2 3 4 5 6 7 8 9 10 |
WITH LocationsWithRecentLogins AS ( SELECT DISTINCT Location FROM dbo.Users WHERE LastAccessdate > '2022-01-01' ) SELECT l.Location, COUNT(*) AS recs FROM LocationsWithRecentLogins l INNER JOIN dbo.Users u ON l.Location = u.Location GROUP BY l.Location ORDER BY l.Location; |
The query runs nearly instantaneously because its actual execution plan is much more efficient in cases where few locations match:
It only does 3 logical reads instead of about 30,000.
A few things to keep in mind:
- I used a CTE, but other ways can work as well, like subqueries or temp tables.
- This technique only works well when a minority of rows match the filter.
- If the filter is parameterized, and its value changes dramatically, this is a recipe for parameter sniffing.
- Temp tables can be a solution to that, but you have to watch out for temp table stats reuse.
11 Comments. Leave new
I see it easier this way:
SELECT Location, COUNT(*) AS recs
FROM dbo.Users
WHERE Location IN (SELECT DISTINCT Location FROM dbo.Users WHERE LastAccessdate > ‘2022-01-01’)
GROUP BY l.Location;
Cavaliery – sure, make sure to read the entire blog post, right down to that last paragraph. I actually mention that approach in there. Cheers!
Read the entire post… Nobody comes here to do that. Just need to find the snippet to copy and paste, then I’m out. 😉
Do you think a columnstore index might get good results here?
That’s the cool thing about all my blog posts – if you’ve got an idea on how performance might change, you can download the database, use the scripts, and find out for yourself! Go for it.
Thanks Brent and also the friends in the comments
Seems like …
SELECT Location, COUNT(*) AS recs
FROM dbo.Users
WHERE LastAccessdate > ‘2022-01-01’)
GROUP BY l.Location;
Would do the trick … am I missing something in my quick read of things?
Yep, you sure are. That doesn’t list the people who live in those top locations. Cheers!
Not to be argumentative … but I will be with a smile of course! .. 🙂
HA. yep, you’re right … no list of people .. none of the queries do that! .. so I am missing as much as everyone else …
1. Find all the locations where users have logged in since a certain date, then
2. Return the total count of people who live in those locations
You’re still missing something, chief. Try your query against the others and you’ll see how the results don’t match.
Thank you! — I got it! … sometimes the details are as elusive as the ketchup location in the fridge!