A Quirk In Parallel Hash Join Plans

Last Updated July 18, 2018

Bailing, Takes Me Away

Both of these queries will return zero rows, in the 2010 version of the Stack Overflow database.

SELECT COUNT(*) AS records
FROM dbo.Votes AS v
JOIN dbo.Posts AS p
ON v.UserId = p.OwnerUserId
WHERE v.CreationDate >= '2011-01-01';

SELECT COUNT(*) AS records
FROM dbo.Votes AS v
JOIN dbo.Posts AS p
ON v.UserId = p.OwnerUserId
WHERE v.CreationDate >= '2011-01-01'
OPTION(MAXDOP 1);

SELECT COUNT(*) AS records

FROM dbo.Votes AS v

JOIN dbo.Posts AS p

ON v.UserId = p.OwnerUserId

WHERE v.CreationDate >= '2011-01-01';

SELECT COUNT(*) AS records

FROM dbo.Votes AS v

JOIN dbo.Posts AS p

ON v.UserId = p.OwnerUserId

WHERE v.CreationDate >= '2011-01-01'

OPTION(MAXDOP 1);

Querying a couple of the larger tables, with no indexes on the join columns leads to hash join plans.

If you’re wondering why I’m forcing the second plan serial, using the MAXDOP 1 hint…

All Or Nothing

This is kind of a cool optimization that has been around for quite a while.

In the serial hash join plan, only one thread is working. When it figures out that the scan of Votes doesn’t return any rows, it can quit doing work there. The hash table is empty, so it doesn’t have to touch the Posts table.

In the parallel query, with multiple threads at work, both branches of the query plan can be active at the same time. That means threads may start up and do some work on the Posts table, only to get shut down when other threads signal that no rows got produced to work on.

In this scenario, the work is minimal. Of course, this doesn’t mean you should go around slapping MAXDOP 1 hints on your queries on the off chance that you run into this. There’s nothing particularly actionable here, only explainable.

I Know What You’re Thinking

Maybe it’s not the hash join. Maybe it’s a different optimization.

After all, hashes (both match and join), along with sorts, require all rows to arrive before they can begin processing them. They’re often called blocking, or stop and go operators because of this.

It might seem logical that the optimizer could do something similar when a blocking operation before a join produces zero rows.

Unfortunately, that’s not the case. Altering the queries slightly to force a merge join like below…

SELECT v.UserId, COUNT(*) AS records
FROM dbo.Votes AS v
INNER MERGE JOIN dbo.Posts AS p
ON v.UserId = p.OwnerUserId
WHERE v.CreationDate >= '2011-01-01'
GROUP BY v.UserId;

SELECT v.UserId, COUNT(*) AS records
FROM dbo.Votes AS v
INNER MERGE JOIN dbo.Posts AS p
ON v.UserId = p.OwnerUserId
WHERE v.CreationDate >= '2011-01-01'
GROUP BY v.UserId
OPTION(MERGE JOIN, MAXDOP 1);

SELECT v.UserId, COUNT(*) AS records

FROM dbo.Votes AS v

INNER MERGE JOIN dbo.Posts AS p

ON v.UserId = p.OwnerUserId

WHERE v.CreationDate >= '2011-01-01'

GROUP BY v.UserId;

SELECT v.UserId, COUNT(*) AS records

FROM dbo.Votes AS v

INNER MERGE JOIN dbo.Posts AS p

ON v.UserId = p.OwnerUserId

WHERE v.CreationDate >= '2011-01-01'

GROUP BY v.UserId

OPTION(MERGE JOIN, MAXDOP 1);

The plans end up sorting all over the place:

And both queries appear to read fully from the Posts table, despite the sort operator after the Votes table producing 0 rows.

Thanks for reading!

A New tempdb Trace Flag Appears!

Announcing “Great Post, Erik” – A Book of Erik’s Best Work

Get Free SQL Stuff

"*" indicates required fields

Instagram

This field is for validation purposes and should be left unchanged.

Name*

First Last

Email*

Things I want*

First Responder Kit (Scripts and Guides)

Monday Links (SQL & tech news)

Blog posts (2-4 per week)

A Quirk In Parallel Hash Join Plans

Bailing, Takes Me Away

All Or Nothing

I Know What You’re Thinking

Related

Leave a Reply Cancel reply

Hi! I’m Brent Ozar.

Get Free SQL Stuff