A Quirk In Parallel Hash Join Plans

Bailing, Takes Me Away

Both of these queries will return zero rows, in the 2010 version of the Stack Overflow database.

Querying a couple of the larger tables, with no indexes on the join columns leads to hash join plans.

Whizbang!

If you’re wondering why I’m forcing the second plan serial, using the MAXDOP 1 hint…

Wait No More

All Or Nothing

This is kind of a cool optimization that has been around for quite a while.

In the serial hash join plan, only one thread is working. When it figures out that the scan of Votes doesn’t return any rows, it can quit doing work there. The hash table is empty, so it doesn’t have to touch the Posts table.

In the parallel query, with multiple threads at work, both branches of the query plan can be active at the same time. That means threads may start up and do some work on the Posts table, only to get shut down when other threads signal that no rows got produced to work on.

In this scenario, the work is minimal. Of course, this doesn’t mean you should go around slapping MAXDOP 1 hints on your queries on the off chance that you run into this. There’s nothing particularly actionable here, only explainable.

I Know What You’re Thinking

Maybe it’s not the hash join. Maybe it’s a different optimization.

After all, hashes (both match and join), along with sorts, require all rows to arrive before they can begin processing them. They’re often called blocking, or stop and go operators because of this.

It might seem logical that the optimizer could do something similar when a blocking operation before a join produces zero rows.

Unfortunately, that’s not the case. Altering the queries slightly to force a merge join like below…

The plans end up sorting all over the place:

Crud.

And both queries appear to read fully from the Posts table, despite the sort operator after the Votes table producing 0 rows.

Posted up

Thanks for reading!

Menu
{"cart_token":"","hash":"","cart_data":""}