If you’ve been following along with this week’s posts on DATETRUNC and STRING_SPLIT, you’re probably going to think the answer is no, but bear with me. It’s Christmas week, right? The news can’t all be bad.
GREATEST and LEAST are kinda like MAX and MIN, but instead of taking multiple rows as input, they take multiple columns. For example:
1 2 |
SELECT GREATEST(1,2,3) AS TheGreatest, LEAST(1,2,3) AS TheLeastest; |
Produces 3 and 1. This actually has really useful real-world implications.
Let’s take the Stack Overflow database, and let’s say I want to find any posts (questions or answers) that had recent activity. This is surprisingly difficult because Posts has 2 date columns: LastEditDate, and LastActivityDate. You would think that LastActivityDate would be the most recent, but you would be incorrect – when posts are edited, the LastEditDate is set, but LastActivityDate is not.
So if I was looking for any Posts that were active – either via reader activity, or edits – in a date range, I used to have to build supporting indexes, and then write queries like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 |
/* Create supporting indexes */ CREATE INDEX LastActivityDate_LastEditDate ON dbo.Posts(LastActivityDate, LastEditDate); CREATE INDEX LastEditDate_LastActivityDate ON dbo.Posts(LastEditDate, LastActivityDate); GO /* The old way: */ SELECT TOP 200 * FROM dbo.Posts WHERE LastActivityDate >= '2018-05-27' OR LastEditDate >= '2018-05-27' ORDER BY Score DESC; /* The SQL Server 2022 way: */ SELECT TOP 200 * FROM dbo.Posts WHERE GREATEST(LastActivityDate, LastEditDate) >= '2018-05-27' ORDER BY Score DESC; |
I’m using 2018-05-27 because my copy of the Stack Overflow database’s last activity is in 2018-06-03. Depending on which version you’re using, if you’re trying to reproduce these results, pick a date that’s within the last week of activity
So, what’s better – the old way or the new way? Like your hips, the actual execution plans don’t lie:
The old way smokes the SQL Server 2022 way. I mean, it’s not even close. The old way splits up the work into two index seeks, one on LastActivityDate and one on LastEditDate. It finds the recent stuff, does the appropriate key lookups, and it’s done in one second.
The new way does a table scan and takes a minute and a half.
Its estimates and its index usage are just garbage, and I don’t mean like Shirley Manson.
Again – they’re not necessarily bad functions, but they have no business in the FROM clause and below. Don’t use them in WHERE, don’t use them in JOINS, just use them to construct output, variables, and strings.