When most people think about computed columns, they don’t think about cardinality estimates.
Heck, I’m not sure most people think about cardinality estimates. At all. Ever.
One of the few people who has ever responded to my emails does, and I didn’t even have to threaten him.
Do you think about computed columns?
If you’re here, you might just be thinking about lunch, avoiding jail time, or how much you hate Access.
But seriously, they’re great for a lot of things. Just don’t put functions in them.
One thing they can help with, without you needing to persist or index them, is cardinality estimates.
If I run this query:
SELECT COUNT(*) AS records
FROM dbo.Users AS u
JOIN dbo.Posts AS p
ON p.OwnerUserId = u.Id
WHERE u.UpVotes + u.DownVotes = 0
AND p.Score >= 5;
I get this plan:
The first thing that jumps out to most people is the warning on the Hash Join — indeed, it’s grail overfloweth.
The root cause of this bad guess is from the Users table. Our WHERE clause is just awful. How on earth would SQL Server know how many Upvotes + Downvotes = 0? In fact, it’s guess is just plan goofy.The guess is off by about 470k rows. If your concern is with cardinality estimates, you may try the Advanced Dead End known as multi-column statistics.
CREATE STATISTICS users_upv_dv ON dbo.Users (UpVotes, DownVotes) WITH FULLSCAN;
You might even try Advanced-Advanced Dead End known as filtered multi-column statistics.
CREATE STATISTICS users_upv_dv ON dbo.Users (UpVotes, DownVotes) WHERE UpVotes = 0 AND DownVotes = 0 WITH FULLSCAN;
And that still won’t help.
Even if you RECOMPILE. Even if you free the proc cache. Even if you rebuild indexes. Even if you restart SQL.
Magically, I can add this computed column:
ALTER TABLE dbo.Users ADD UpvotesPlusDownvotes AS UpVotes + DownVotes;
I’m not persisting it, and I’m not even indexing it. But my plan changes! Kind of.
The Hash Join is no longer spilling!
Before you go accusing me of taking advantage of Batch Mode Memory Grant Feedback — I’m not using any cOLU MNstor3 indexes in here. Everything is plain old row mode.
It’s just a result of the cardinality estimate improving.
The much improved cardinality estimate leads the optimizer to ask for a larger memory grant.
Rather than a small, crappy memory grant (for this query).
Is this as good as it gets?
Obviously not. We could make some other changes to improve things, but I think this is pretty cool.
We didn’t have to change our query, even, for the optimizer to make this adjustment.
If you’ve got this kind of stuff in your queries, a computed column might be a good way to improve performance.
Thanks for reading!