For over a decade, SQL Server’s handling of table variables has been legendarily bad. I’ve long used this Stack Overflow query from Sam Saffron to illustrate terrible cardinality estimation:
declare @VoteStats table (PostId int, up int, down int)

insert @VoteStats
select
    PostId,
    up = sum(case when VoteTypeId = 2 then 1 else 0 end),
    down = sum(case when VoteTypeId = 3 then 1 else 0 end)
from Votes
where VoteTypeId in (2,3)
group by PostId

select top 100 p.Id as [Post Link], up, down
from @VoteStats
join Posts p on PostId = p.Id
where down > (up * 0.5) and p.CommunityOwnedDate is null and p.ClosedDate is null
order by up desc
It puts a bunch of data into a table variable, and then queries that same table variable. On the small StackOverflow2010 database, it takes almost a full minute, and does almost a million logical reads. Here’s the plan:
See how the top query has a cost relative to the batch of 100%, and the second query says 0%? Yeah, that’s a vicious lie: it’s based on the estimated cost of the queries, even when you’re looking at an actual plan. The root cause is the number of rows SQL Server expects to find in the table variable in the second query – hover your mouse over that:
The estimated number of rows is only 1 – way low, because we just got done inserting 2.7 MILLION rows into it. As a result, SQL Server says, “I’ll just get that one row out of the table variable, do one matching index seek in the Posts table, and then I don’t have to bother sorting this data until later in the plan.”
In the past, we told people to fix it by replacing table variables with temp tables, or slapping an OPTION (RECOMPILE) hint on the statements that read the table variable. However, both of those require changing the query – something that’s not always feasible.
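For reference, the hint workaround on the second query looks roughly like this – recompiling just that one statement after the table variable has been populated, so the optimizer compiles with the real row count instead of the 1-row guess:

```sql
select top 100 p.Id as [Post Link], up, down
from @VoteStats
join Posts p on PostId = p.Id
where down > (up * 0.5) and p.CommunityOwnedDate is null and p.ClosedDate is null
order by up desc
option (recompile); -- compiled after the insert, so the estimate reflects the actual contents
```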
Now, let’s try it in SQL Server 2019.
Same query, no changes, just running it in a database with compatibility level 150:
- Duration: 57 seconds before, 7 seconds on 2019
- Logical reads: about 1 million before, about 70K on 2019
Here’s the 2019 query plan:
Now, SQL Server 2019 estimates 800K rows will come out of the table variable – still not exactly accurate, but a hell of a lot more accurate than estimating 1 row. As a result, it decides to sort the contents of the table variable to find the top 100 rows FIRST, before doing the seeks on the Posts table.
Suddenly, 2019 produces better table variable estimates out of the box. It’s all cake and no vegetables, right?
Now we have a new problem: parameter sniffing.
I’m going to create an index on Users.Location to help SQL Server understand how many rows will come back when we filter for a given location. Then, I’m going to create a stored procedure with 2 queries in it:
CREATE INDEX IX_Location ON dbo.Users(Location);
GO
CREATE OR ALTER PROC dbo.usp_TableVariableTest @Location NVARCHAR(40) AS
DECLARE @NotATable TABLE (ID INT);

INSERT INTO @NotATable (ID)
SELECT Id
FROM dbo.Users
WHERE Location = @Location;

SELECT u.*
FROM @NotATable t
INNER JOIN dbo.Users u ON t.ID = u.Id
ORDER BY u.DisplayName, u.Location, u.WebsiteUrl;
The first query loads the table variable with all of the users in a location.
The second query fetches the rows back out of that table variable, does a lookup, and sorts the data. This means SQL Server needs to estimate how much memory it needs to grant for the sort operation.
Let’s free the plan cache, and then call it for a really big location:
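The calls look something like this – note that DBCC FREEPROCCACHE wipes the entire plan cache, so only do that on a dev/test box:

```sql
DBCC FREEPROCCACHE; -- clear cached plans so the proc compiles fresh (dev/test only)
EXEC dbo.usp_TableVariableTest @Location = N'India'; -- a location with lots of users
```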
Here, I’m hovering my mouse over the table variable scan to show the estimated and actual number of rows – they’re absolutely perfect! This is awesome! That means SQL Server allocates enough memory to sort those 1,902 rows, too.
Now let’s run it for a really small location:
Just like every parameter sniffing issue from the dawn of time, SQL Server cached the execution plan for the first set of parameters that got sent in. Now, it’s estimating 1,902 rows – from India – every time the query runs. That’s not a big deal here, because over-estimating the memory grant for a tiny location doesn’t cost much – but now let’s try it in the opposite order.
Free the plan cache, run it for My Backyard first, and SQL Server 2019 caches the execution plan for a 1-row estimate:
And then run it for India, and presto, the sort spills to disk:
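Putting that repro together, the sequence is the same calls in the opposite order – the small location compiles and caches the plan, and the big location inherits it:

```sql
DBCC FREEPROCCACHE; -- start with an empty plan cache (dev/test only)
EXEC dbo.usp_TableVariableTest @Location = N'My Backyard'; -- tiny location: plan caches with a ~1-row estimate
EXEC dbo.usp_TableVariableTest @Location = N'India'; -- big location reuses that plan: the sort spills to disk
```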
Most of the time, SQL Server 2019’s table variable handling will be WAY faster.
Before, table variables generally produced bad plans ALL of the time. It was black and white. Performance just sucked.
Starting with SQL Server 2019, we stand a better chance of getting good plans at least some of the time, if not most of the time. We just have a new problem – parameter sniffing – and hey, you can learn how to fix that pretty easily. That seems like a pretty good tradeoff to me!
To learn more about this feature, check out Aaron Bertrand’s writeup on how it works in Azure SQL DB, and for other new goodies in 2019, check out what’s new (so far) in SQL Server 2019.