Slow “Having” Query? Try Pre-Filtering.

T-SQL
11 Comments

I was helping a client with a query, and I’m going to rework the example to use the Stack Overflow database for easier storytelling.

Say we need to:

  1. Find all the locations where users have logged in since a certain date, then
  2. Return the total count of people who live in those locations

One way to write the query would be:

And in order to expedited it, I’ve added a couple of indexes:

When the query runs, its actual execution plan is a scan of all the locations:

Because to SQL Server, it has to group the locations together, finding the max LastAccessDate in every location, before it begins the filtering process.

The client said, “Wait, that doesn’t make any sense – why isn’t SQL Server using the index on LastAccessDate? It has the Location on there too. SQL Server could just jump to 2022-01-01 on that index, make a list of the Locations that match, and then use those for the next part of the query.”

Well, it could. But the query optimizer wasn’t written that way.

Instead, if you want that behavior, you have to rewrite the query yourself:

The query runs nearly instantaneously because its actual execution plan is much more efficient in cases where few locations match:

It only does 3 logical reads instead of about 30,000.

A few things to keep in mind:

  • I used a CTE, but other ways can work as well, like subqueries or temp tables.
  • This technique only works well when a minority of rows match the filter.
  • If the filter is parameterized, and its value changes dramatically, this is a recipe for parameter sniffing.
  • Temp tables can be a solution to that, but you have to watch out for temp table stats reuse.
Previous Post
[Video] Office Hours: Speed Round Edition
Next Post
Who’s Hiring in the Microsoft Data Platform Community? December 2022 Edition

11 Comments. Leave new

  • I see it easier this way:

    SELECT Location, COUNT(*) AS recs
    FROM dbo.Users
    WHERE Location IN (SELECT DISTINCT Location FROM dbo.Users WHERE LastAccessdate > ‘2022-01-01’)
    GROUP BY l.Location;

    Reply
    • Cavaliery – sure, make sure to read the entire blog post, right down to that last paragraph. I actually mention that approach in there. Cheers!

      Reply
      • summerfondness
        December 7, 2022 2:15 pm

        Read the entire post… Nobody comes here to do that. Just need to find the snippet to copy and paste, then I’m out. 😉

        Reply
  • Do you think a columnstore index might get good results here?

    Reply
    • That’s the cool thing about all my blog posts – if you’ve got an idea on how performance might change, you can download the database, use the scripts, and find out for yourself! Go for it.

      Reply
  • Thanks Brent and also the friends in the comments

    Reply
  • Brent Spaulding
    December 10, 2022 3:30 pm

    Seems like …

    SELECT Location, COUNT(*) AS recs
    FROM dbo.Users
    WHERE LastAccessdate > ‘2022-01-01’)
    GROUP BY l.Location;

    Would do the trick … am I missing something in my quick read of things?

    Reply
    • Yep, you sure are. That doesn’t list the people who live in those top locations. Cheers!

      Reply
      • Brent Spaulding
        December 12, 2022 2:58 am

        Not to be argumentative … but I will be with a smile of course! .. 🙂

        HA. yep, you’re right … no list of people .. none of the queries do that! .. so I am missing as much as everyone else …

        1. Find all the locations where users have logged in since a certain date, then
        2. Return the total count of people who live in those locations

        Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.