Hash Join Memory Grant Factors

Buckets

Much like Sorts, Hash Joins require some amount of memory to operate efficiently, without spilling (or without spilling too much).

And to a similar degree, the number of rows and columns passed to the Hashing operator matter where the memory grant is concerned. This doesn’t mean Hashing is bad, but you may need to take some extra steps when tuning queries that use them.

The reasons are pretty obvious when you think about the context of a Hash operation, whether it’s a join or aggregation.

  1. All rows from the build side have to arrive at the operator (in parallel plans, usually after a bitmap filter)
  2. The hashing function gets applied to join or grouping columns
  3. In a join, rows from the probe side are hashed on their join columns and checked against the build side's hash table
  4. In some cases, the actual values need to be checked as a residual

During all that nonsense, all the columns that you SELECT get dragged along for the ride.
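The steps above can be sketched in a few lines of Python. This is a toy illustration of the general build/probe/residual pattern, not SQL Server's actual implementation, and the table and column names are made up for the example:

```python
# A minimal hash join sketch: build a hash table from one input,
# probe it with the other, and do a residual check on real values.

def hash_join(build_rows, probe_rows, build_key, probe_key):
    # 1. All rows from the build side arrive; each row lands in a
    #    bucket keyed by the hash of its join column.
    buckets = {}
    for row in build_rows:
        buckets.setdefault(hash(row[build_key]), []).append(row)

    # 2./3. Each probe-side row is hashed on its join column and
    #    checked against the matching bucket.
    for row in probe_rows:
        for match in buckets.get(hash(row[probe_key]), []):
            # 4. Residual check: different values can share a bucket,
            #    so compare the actual values too.
            if match[build_key] == row[probe_key]:
                # Every column you SELECT rides along in both rows.
                yield {**match, **row}

# Hypothetical data shaped like the Stack Overflow schema:
users = [{"Id": 1, "DisplayName": "Jon Skeet"}]
posts = [{"OwnerUserId": 1, "Score": 10}, {"OwnerUserId": 2, "Score": 3}]
joined = list(hash_join(users, posts, "Id", "OwnerUserId"))
print(joined)  # one matching row
```

Note that the hash table holds entire build-side rows, which is why the width of those rows matters for memory.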

Here’s a quick example!

This query doesn’t return any rows, because Jon Skeet hadn’t hit 1 million rep in the data dump I’m using (Stack Overflow 2010).

Despite that, the query asks for about 7 MB of memory to run. This seems to be the lowest memory grant I could get the optimizer to ask for.

Hashtastic

If we drop the Reputation filter down a bit so some rows get returned, the memory grant stays the same.

That’s why I’m calling 7MB the “base” grant here — that, and if I drop the Reputation filter lower to allow more people in, the grant will go up.

Creepin and creepin and creepin

But we can also get a grant higher than the base by requesting more columns.

ARF ARF

This is more easily accomplished by selecting string data. Again, just like with Sorts, we don't need to actually join on the string data for the memory grant to go up. We just need to make it pass through a memory-consuming operator.
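A rough way to see why, continuing the toy-Python framing: the hash table stores whole rows, so a long string column inflates each row even when the join key is identical. The sizes here are CPython approximations via `sys.getsizeof`, not SQL Server's memory accounting:

```python
import sys

# Two versions of the "same" row: one narrow, one dragging a
# 4000-character string along for the ride (hypothetical columns).
narrow_row = (1, 100)               # Id, Reputation
wide_row = (1, 100, "x" * 4000)     # plus an nvarchar-ish string

def row_bytes(row):
    # Tuple overhead plus each element's size; a crude approximation
    # of what a memory-consuming operator has to hold per row.
    return sys.getsizeof(row) + sum(sys.getsizeof(col) for col in row)

print(row_bytes(narrow_row))  # small
print(row_bytes(wide_row))    # much larger, same join key
```

Multiply that per-row difference by the estimated row count and you get a feel for why SELECTing wide columns drives the grant up.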

Thanks for reading!

Brent says: you remember how, in the beginning of your career, some old crusty DBA told you to avoid SELECT *? Turns out they were right.


8 Comments

  • Had an interesting problem with a hash operator on a batch mode query where the spills to tempdb were hilarious. TF 9389 (dynamic memory grant) helped but the real crux of the problem seems to be bad estimates – it was just massively underestimating the amount of data that was going to flow through the operator. I think targeted stats (and updated) will be a better fix, but that can wait until after Easter!

  • I am a bit surprised that you say that 7MB is the lowest you can get the memory grant when the execution plan shows an estimated 4266.6 rows. If you change the predicate so that the estimate is brought down to 1, surely the memory grant will be reduced as well.

    Based on my own testing, the minimum memory grant appears to be 1056 KB. But I will be the first to admit that my testing is still limited – I used just a single query / plan and changed only the estimates and the number and size of the columns.

    • Erik Darling
      April 2, 2018 9:25 am

      Hugo — I think you misread. I’m saying that’s the lowest memory grant I get when there’s a hash join. And this is limited to my server, not a general rule.

      The lowest memory grant you can get (by default) is 1024 KB, but generally the next increment up is 1056 KB.

      Thanks!

      • How about if you used the inner hash join query hint? Would that not help get you a lower mem grant whilst retaining the hash join or were you deliberately avoiding that?

        • I would expect to see the exact same execution plans if you add the HASH join hint.

          This hint does two things:
          1. Force a hash join – the optimizer was already picking a hash join so this will not change the query plan.
          2. Force the order (so the table mentioned first will always be the build input) – the optimizer was already selecting the Users table to be the build input so this will not change the query plan either.

  • I still need to properly go over this, but at the moment I'm convinced it's stats related. It was happening on a staging/workings table which is truncated every time we run this process, so you're rolling your luck a little bit and hoping the sampling is going to get it right every time. This table ultimately gets partition switched into a final table which is very large, so I think that getting the stats to sample more (maybe even FULLSCAN) on the work table will be beneficial for that particular query, and then the stats will follow along when switched so you get the benefit on queries hitting the final table as well.

  • Looks like stats on the non-partitioned table don’t get pushed to the final table (even though the stats on the final table are flagged as incremental..) when you switch the partition in. Bummer.
