Query Plans: Trivial Optimization vs Simple Parameterization

Last Updated 7 years ago

Facemaking

You know when you think you know something, and the obviousness of it makes you all the more confident that you know it?

That’s usually the first sign that there’s a giant gotcha waiting for you.

And that’s what happened to me over the weekend.

The setup

I answered a question.

And after I answered it, I got to thinking… Would partitioning help? After all, smart people agree. Partitioning is ColumnStore’s friend.

So I started looking at it a little bit more differenter.

First, I had to set up a partitioning function and scheme. I used dynamic SQL to create the function because no way in Hades am I typing numbers from 1-999.

USE tempdb

DROP TABLE IF EXISTS dbo.t1 --2016+ only

CREATE TABLE t1	(Id INT NOT NULL, Amount INT NOT NULL)

--TRUNCATE TABLE t1

SET NOCOUNT ON 
SET STATISTICS TIME, IO OFF

DECLARE @sql NVARCHAR(MAX) = N'', @i INT = 0, @lf NCHAR(4) = NCHAR(13) + NCHAR(10)
  SET @sql = N'CREATE PARTITION FUNCTION Pfunc (INT) AS RANGE RIGHT FOR VALUES ('
    WHILE @i < 1000
    BEGIN
      SET @sql += @lf + '(' + CAST(@i AS NVARCHAR) + N')' + N','
      SET @i += 1
    END
  SET @sql += @lf + '(' + CAST(@i AS NVARCHAR) + N')' +  + N');';
    EXEC sp_executesql @sql;
    PRINT @sql
GO

--I don't have a ton of filegroups, I just want everything on Primary
CREATE PARTITION SCHEME Pyramid
    AS PARTITION Pfunc
    ALL TO  ( [PRIMARY] );

USE tempdb

DROP TABLE IF EXISTS dbo.t1 --2016+ only

CREATE TABLE t1 (Id INT NOT NULL, Amount INT NOT NULL)

--TRUNCATE TABLE t1

SET NOCOUNT ON

SET STATISTICS TIME, IO OFF

DECLARE @sql NVARCHAR(MAX) = N'', @i INT = 0, @lf NCHAR(4) = NCHAR(13) + NCHAR(10)

SET @sql = N'CREATE PARTITION FUNCTION Pfunc (INT) AS RANGE RIGHT FOR VALUES ('

WHILE @i < 1000

BEGIN

SET @sql += @lf + '(' + CAST(@i AS NVARCHAR) + N')' + N','

SET @i += 1

END

SET @sql += @lf + '(' + CAST(@i AS NVARCHAR) + N')' + + N');';

EXEC sp_executesql @sql;

PRINT @sql

--I don't have a ton of filegroups, I just want everything on Primary

CREATE PARTITION SCHEME Pyramid

AS PARTITION Pfunc

ALL TO ( [PRIMARY] );

Next, I loaded in a billion rows.

Yes, a billion.

And you know, it turns out generating a billion rows at once is terribly slow. So I wrote an awful loop to insert 1 million rows 1000 times.

DECLARE @c INT = 0
DECLARE @msg NVARCHAR(1000) = N''

WHILE @c < 1000

BEGIN

;WITH T (N)
AS ( SELECT X.N
     FROM ( 
      VALUES (NULL), (NULL), (NULL),
             (NULL), (NULL), (NULL),
             (NULL), (NULL), (NULL), 
             (NULL), (NULL), (NULL) ) AS X (N) 
           ), NUMS (N) AS ( 
            SELECT TOP ( 1000000 ) 
                    ROW_NUMBER() OVER ( ORDER BY ( SELECT NULL )) AS N
            FROM   T AS T1, T AS T2, T AS T3, 
                   T AS T4, T AS T5, T AS T6, 
                   T AS T7, T AS T8, T AS T9, 
                   T AS T10,T AS T11,T AS T12)
INSERT dbo.t1 WITH ( TABLOCK ) (
    Id, Amount )
SELECT NUMS.N % 999 AS Id, NUMS.N % 9999 AS Amount
FROM   NUMS;

SET @c += 1

SET @msg = 'Insert loop: ' + CONVERT(NVARCHAR(5), @c) + '.'
RAISERROR(@msg, 0, 1) WITH NOWAIT 

END

DECLARE @c INT = 0

DECLARE @msg NVARCHAR(1000) = N''

WHILE @c < 1000

BEGIN

;WITH T (N)

AS ( SELECT X.N

FROM (

VALUES (NULL), (NULL), (NULL),

(NULL), (NULL), (NULL),

(NULL), (NULL), (NULL) ) AS X (N)

), NUMS (N) AS (

SELECT TOP ( 1000000 )

ROW_NUMBER() OVER ( ORDER BY ( SELECT NULL )) AS N

FROM T AS T1, T AS T2, T AS T3,

T AS T4, T AS T5, T AS T6,

T AS T7, T AS T8, T AS T9,

T AS T10,T AS T11,T AS T12)

INSERT dbo.t1 WITH ( TABLOCK ) (

Id, Amount )

SELECT NUMS.N % 999 AS Id, NUMS.N % 9999 AS Amount

FROM NUMS;

SET @c += 1

SET @msg = 'Insert loop: ' + CONVERT(NVARCHAR(5), @c) + '.'

RAISERROR(@msg, 0, 1) WITH NOWAIT

END

Then, you know, I needed some indexes.

Lemme be straight with you here: don’t create the indexes first. After about the 4th loop, things grind to a halt.

This is why other smart people will tell you the fastest way to load data is into a HEAP.

Lemme be straight with you about something else: I couldn’t figure out how to just create the ColumnStore index on the partitioned table. I had to create a regular clustered index, and then the ColumnStore index over it with DROP_EXISTING.

CREATE CLUSTERED INDEX CX_WOAHMAMA ON dbo.t1 (Id) WITH (DATA_COMPRESSION = PAGE) ON Pyramid (Id) 
GO 

CREATE CLUSTERED COLUMNSTORE INDEX CX_WOAHMAMA ON dbo.t1 WITH (DROP_EXISTING = ON) ON Pyramid (Id)
GO

CREATE CLUSTERED INDEX CX_WOAHMAMA ON dbo.t1 (Id) WITH (DATA_COMPRESSION = PAGE) ON Pyramid (Id)

CREATE CLUSTERED COLUMNSTORE INDEX CX_WOAHMAMA ON dbo.t1 WITH (DROP_EXISTING = ON) ON Pyramid (Id)

So there I had it, my beautiful, billion-row table.

Partitioned, ColumnStore-d.

What could go wrong?

Plantastic

Let’s look at one query with a few variations.

SELECT COUNT(*) AS [Records], SUM(CONVERT(BIGINT, t.Amount)) AS [Total]
FROM   dbo.t1 AS t
WHERE  t.Id > 0
       AND t.Id < 3;

SELECT COUNT(*) AS [Records], SUM(CONVERT(BIGINT, t.Amount)) AS [Total]

FROM dbo.t1 AS t

WHERE t.Id > 0

AND t.Id < 3;

The plan for it is alright. It’s fairly straightforward and the query finishes in about 170ms.

We can see from the graphical execution plan that it’s been Simple Parameterized. SQL Server does this to make plan caching more efficient.

With parameters in place rather than literals, the plan is accessible to more queries using the same predicate logic on the Id column.

We can also see the plan is Trivial. If you’re following along, hit F4 while the SELECT operator is highlighted in the query plan , and a Properties pane should open up magically on the right side of the screen.

Well, I can change that! I know a trick! I learned it from Paul White!

If I add 1 = 1 somewhere in my WHERE clause, I can get around Simple Parameterization and the Trivial plan.

Right?

SELECT COUNT(*) AS [Records], SUM(CONVERT(BIGINT, t.Amount)) AS [Total]
FROM   dbo.t1 AS t
WHERE  t.Id > 0
       AND t.Id < 3
       AND 1 = 1;

SELECT COUNT(*) AS [Records], SUM(CONVERT(BIGINT, t.Amount)) AS [Total]

FROM dbo.t1 AS t

WHERE t.Id > 0

AND t.Id < 3

AND 1 = 1;

Heh heh heh, that’ll show you.

Stringy

Oh but, no. The plan is still trivial, and it still runs in about 172ms.

So how do you beat the trivial plan?

You use yet another trick from Paul White.

Dear Paul,

Thanks.

— The rest of us

If you stick a goofy subquery in, the optimizer will regard it with contempt and suspicion, and give it a full pat down.

SELECT COUNT(*) AS [Records], SUM(CONVERT(BIGINT, t.Amount)) AS [Total]
FROM   dbo.t1 AS t
WHERE  t.Id > 0
       AND t.Id < 3
       AND 1 = (SELECT 1);

SELECT COUNT(*) AS [Records], SUM(CONVERT(BIGINT, t.Amount)) AS [Total]

FROM dbo.t1 AS t

WHERE t.Id > 0

AND t.Id < 3

AND 1 = (SELECT 1);

Here’s the plan for this query.

Still not Simple!

And, amazingly, it gets FULL optimization.

Why is this better?

Astute observers may have picked up that the plan changed a little bit. Earlier plans that got Trivial optimization used a Stream Aggregate operator, where the Full optimization plan uses a Hash Match Aggregate.

What’s the deal with that?

To make matters more interesting, the Full optimization query finishes in 11ms.

That’s down from 172ms for the Trivial plan.

What else was different?

There’s a really important difference in the two plans.

The Trivial plan was run in RowStore execution mode. This is the enemy of ColumnStore indexes.

It’s essentially using them the old fashioned way.

The plan that gets Full optimization runs in Batch execution mode, and this makes everyone very happy.

This is why sp_BlitzCache will warn you about both Trivial plans, and ColumnStore indexes being executed in RowStore mode.

Because that’s why.

Thanks for reading!

New #SQLPASS Summit Pre-Con: Expert Performance Tuning for SQL Server 2016 & 2017

[Video] Office Hours 2017/06/14 (With Transcriptions)

4 Comments. Leave new

Emanuele Meazzo
June 16, 2017 8:46 am

Damn you trivial optimization for columnstore!
In which version of SQL Server did you test this? Seems like something that needs to be fixed by M$

Great post btw!

Reply
- Erik Darling
  June 16, 2017 9:30 am
  
  Hi there! Thanks! This would have been on 2016, though I don’t recall which CU of SP1 my server was up to at the time.
  
  Reply
  - Emanuele Meazzo
    June 16, 2017 10:31 am
    
    I just tried on the latest CU (13.0.4435.0) same results.
    This is definitely something to keep in mind if working with columnstore
    
    Reply
Chris Morris
January 7, 2019 8:29 am

“If you stick a goofy subquery in, the optimizer will regard it with contempt and suspicion, and give it a full pat down.”

This behavior appears to be quite erratic, or at least it is in SQL Server 2012:
— constants
SELECT COUNT_BIG(*) — 2.1 TRIVIAL, fast 10ms
FROM Orders ct
WHERE ct.OrderDate BETWEEN CAST(‘2017-12-01’ AS DATETIME) AND CAST(‘2017-12-31’ AS DATETIME)

— cross apply values
SELECT COUNT_BIG(*) — 2.2 TRIVIAL, fast 10ms
FROM Orders ct
CROSS APPLY (
VALUES (CAST(‘2017-12-01’ AS DATETIME), CAST(‘2017-12-31’ AS DATETIME))
) x (StartDate, EndDate)
WHERE ct.OrderDate BETWEEN x.StartDate AND x.EndDate

— cross apply select
SELECT COUNT_BIG(*) — 2.3 TRIVIAL, fast 10ms
FROM Orders ct
CROSS APPLY (
SELECT StartDate = ISNULL(CAST(‘2017-12-01’ AS DATETIME),GETDATE()), EndDate = ISNULL(CAST(‘2017-12-31’ AS DATETIME),GETDATE())
) x
WHERE ct.OrderDate BETWEEN x.StartDate AND x.EndDate

— “goofy subquery” constants
SELECT COUNT_BIG(*) — 2.4 FULL, slow 230ms
FROM Orders ct
WHERE ct.OrderDate BETWEEN (SELECT CAST(‘2017-12-01’ AS DATETIME)) AND (SELECT CAST(‘2017-12-31’ AS DATETIME))

If you were to follow this through with a partitioned table, the effects are even more unpredictable: the “full pat down” appears to be more likely to result in a disastrous plan:

SELECT COUNT_BIG(*) — Q9 — SUM(Amount)
FROM MySixtyBillionRowTableWith70Partitions ct
WHERE ct.TranDate BETWEEN (SELECT CAST(CAST(‘2017-12-01’ AS DATETIME) AS DATE)) AND (SELECT CAST(CAST(‘2017-12-31’ AS DATETIME) AS DATE))

Reply