You know when you think you know something, and the obviousness of it makes you all the more confident that you know it?
That’s usually the first sign that there’s a giant gotcha waiting for you.
And that’s what happened to me over the weekend.
And after I answered it, I got to thinking… Would partitioning help? After all, smart people agree. Partitioning is ColumnStore’s friend.
So I started looking at it a little bit more differenter.
First, I had to set up a partitioning function and scheme. I used dynamic SQL to create the function because no way in Hades am I typing numbers from 1-999.
USE tempdb;

DROP TABLE IF EXISTS dbo.t1; --2016+ only

CREATE TABLE t1 (Id INT NOT NULL, Amount INT NOT NULL);
--TRUNCATE TABLE t1

SET NOCOUNT ON;
SET STATISTICS TIME, IO OFF;

DECLARE @sql NVARCHAR(MAX) = N'',
        @i INT = 0,
        @lf NCHAR(4) = NCHAR(13) + NCHAR(10);

SET @sql = N'CREATE PARTITION FUNCTION Pfunc (INT) AS RANGE RIGHT FOR VALUES (';

WHILE @i < 1000
BEGIN
    SET @sql += @lf + N'(' + CAST(@i AS NVARCHAR(10)) + N'),';
    SET @i += 1;
END;

SET @sql += @lf + N'(' + CAST(@i AS NVARCHAR(10)) + N'));';

EXEC sp_executesql @sql;
PRINT @sql;
GO

--I don't have a ton of filegroups, I just want everything on Primary
CREATE PARTITION SCHEME Pyramid AS PARTITION Pfunc ALL TO ( [PRIMARY] );
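If you want to double-check that all 1,001 boundary points actually made it into the function, a quick peek at the metadata views does the trick. This sanity check is my own addition, not part of the original setup:

```sql
--Sanity check: count the boundary values the function ended up with
SELECT pf.name,
       pf.fanout,
       COUNT(prv.value) AS boundary_values
FROM sys.partition_functions AS pf
JOIN sys.partition_range_values AS prv
    ON prv.function_id = pf.function_id
WHERE pf.name = N'Pfunc'
GROUP BY pf.name, pf.fanout;
```

The `fanout` column should be one higher than the boundary count, since RANGE RIGHT with N boundaries gives you N + 1 partitions.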
Next, I loaded in a billion rows.
Yes, a billion.
And you know, it turns out generating a billion rows at once is terribly slow. So I wrote an awful loop to insert 1 million rows 1000 times.
DECLARE @c INT = 0;
DECLARE @msg NVARCHAR(1000) = N'';

WHILE @c < 1000
BEGIN
    ;WITH T (N) AS
    (
        SELECT X.N
        FROM ( VALUES (NULL), (NULL), (NULL), (NULL),
                      (NULL), (NULL), (NULL), (NULL),
                      (NULL), (NULL), (NULL), (NULL) ) AS X (N)
    ),
    NUMS (N) AS
    (
        SELECT TOP ( 1000000 )
               ROW_NUMBER() OVER ( ORDER BY ( SELECT NULL ) ) AS N
        FROM T AS T1,  T AS T2,  T AS T3,  T AS T4,
             T AS T5,  T AS T6,  T AS T7,  T AS T8,
             T AS T9,  T AS T10, T AS T11, T AS T12
    )
    INSERT dbo.t1 WITH ( TABLOCK ) ( Id, Amount )
    SELECT NUMS.N % 999 AS Id,
           NUMS.N % 9999 AS Amount
    FROM NUMS;

    SET @c += 1;
    SET @msg = N'Insert loop: ' + CONVERT(NVARCHAR(5), @c) + N'.';
    RAISERROR(@msg, 0, 1) WITH NOWAIT;
END;
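Once the loop finishes, it's worth confirming the row count before moving on. This verification step is mine, not part of the original script:

```sql
--Should come back as exactly 1,000,000,000
SELECT COUNT_BIG(*) AS [Rows]
FROM dbo.t1;
```

Note the `COUNT_BIG`: plain `COUNT` returns an INT, which tops out just over 2.1 billion, so it's the safer habit on tables this size.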
Then, you know, I needed some indexes.
Lemme be straight with you here: don’t create the indexes first. After about the 4th loop, things grind to a halt.
This is why other smart people will tell you the fastest way to load data is into a HEAP.
Lemme be straight with you about something else: I couldn’t figure out how to create the ColumnStore index directly on the partitioned table. I had to create a regular clustered index first, and then build the ColumnStore index over it with DROP_EXISTING:
CREATE CLUSTERED INDEX CX_WOAHMAMA
    ON dbo.t1 (Id)
    WITH (DATA_COMPRESSION = PAGE)
    ON Pyramid (Id);
GO

CREATE CLUSTERED COLUMNSTORE INDEX CX_WOAHMAMA
    ON dbo.t1
    WITH (DROP_EXISTING = ON)
    ON Pyramid (Id);
GO
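To see how the partitions and rowgroups shook out after the build, something like this works (my own sketch against the standard rowgroup DMV):

```sql
--One row per partition/state combination; COMPRESSED is what we want to see
SELECT partition_number,
       state_description,
       COUNT(*) AS row_groups,
       SUM(total_rows) AS total_rows
FROM sys.column_store_row_groups
WHERE object_id = OBJECT_ID(N'dbo.t1')
GROUP BY partition_number, state_description
ORDER BY partition_number;
```

Lots of OPEN or tiny rowgroups would be a hint that the build didn't go as cleanly as hoped.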
So there I had it, my beautiful, billion-row table.
What could go wrong?
Let’s look at one query with a few variations.
SELECT COUNT(*) AS [Records],
       SUM(CONVERT(BIGINT, t.Amount)) AS [Total]
FROM dbo.t1 AS t
WHERE t.Id > 0
AND   t.Id < 3;
The plan for it is alright. It’s fairly straightforward and the query finishes in about 170ms.
We can see from the graphical execution plan that it’s been Simple Parameterized. SQL Server does this to make plan caching more efficient.
With parameters in place rather than literals, the plan is accessible to more queries using the same predicate logic on the Id column.
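One way to see the parameterized text for yourself is to fish the statement out of the plan cache. This is a rough sketch of mine, and the filter is deliberately crude:

```sql
--The cached text should show @1 and @2 in place of the literal values
SELECT st.text,
       cp.usecounts
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE N'%dbo.t1%'
AND   st.text NOT LIKE N'%dm_exec_cached_plans%';
```

The `usecounts` column is the payoff: multiple queries with different literals all piling onto one cached plan.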
We can also see the plan is Trivial. If you’re following along, hit F4 while the SELECT operator is highlighted in the query plan, and a Properties pane should open up magically on the right side of the screen.
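If clicking around plans isn't your thing, the same information hides in the plan XML as the StatementOptmLevel attribute. Here's a rough way of mine to pull it out of the cache, using the standard showplan namespace:

```sql
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT qp.query_plan.value(
           '(//StmtSimple/@StatementOptmLevel)[1]', 'NVARCHAR(20)'
       ) AS optimization_level,  --TRIVIAL or FULL
       st.text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE N'%dbo.t1%';
```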
Well, I can change that! I know a trick! I learned it from Paul White!
If I add 1 = 1 somewhere in my WHERE clause, I can get around Simple Parameterization and the Trivial plan.
SELECT COUNT(*) AS [Records],
       SUM(CONVERT(BIGINT, t.Amount)) AS [Total]
FROM dbo.t1 AS t
WHERE t.Id > 0
AND   t.Id < 3
AND   1 = 1;
Heh heh heh, that’ll show you.
Oh but, no. The plan is still trivial, and it still runs in about 172ms.
So how do you beat the trivial plan?
You use yet another trick from Paul White.
— The rest of us
If you stick a goofy subquery in, the optimizer will regard it with contempt and suspicion, and give it a full pat down.
SELECT COUNT(*) AS [Records],
       SUM(CONVERT(BIGINT, t.Amount)) AS [Total]
FROM dbo.t1 AS t
WHERE t.Id > 0
AND   t.Id < 3
AND   1 = (SELECT 1);
Here’s the plan for this query.
And, amazingly, it gets FULL optimization.
Why is this better?
Astute observers may have picked up that the plan changed a little bit. The earlier plans that got Trivial optimization used a Stream Aggregate operator, whereas the Full optimization plan uses a Hash Match Aggregate.
What’s the deal with that?
To make matters more interesting, the Full optimization query finishes in 11ms.
That’s down from 172ms for the Trivial plan.
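If you want to reproduce those timings yourself, SET STATISTICS TIME is the low-tech way. The numbers on your hardware will differ, of course:

```sql
SET STATISTICS TIME ON;

--Run the Trivial and Full variations back to back,
--then compare 'elapsed time' in the Messages tab
SELECT COUNT(*) AS [Records],
       SUM(CONVERT(BIGINT, t.Amount)) AS [Total]
FROM dbo.t1 AS t
WHERE t.Id > 0
AND   t.Id < 3
AND   1 = (SELECT 1);

SET STATISTICS TIME OFF;
```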
What else was different?
There’s a really important difference in the two plans.
The Trivial plan was run in RowStore execution mode. This is the enemy of ColumnStore indexes.
It’s essentially using them the old fashioned way.
The plan that gets Full optimization runs in Batch execution mode, and this makes everyone very happy.
This is why sp_BlitzCache will warn you about both Trivial plans, and ColumnStore indexes being executed in RowStore mode.
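If you've got sp_BlitzCache installed, a plain call is enough to surface those warnings; the sort order below is just my preference:

```sql
--Scan the Warnings column for mentions of trivial plans
--and ColumnStore indexes running in row mode
EXEC dbo.sp_BlitzCache @SortOrder = 'cpu';
```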
Thanks for reading!