ColumnStore Indexes And Recursive CTEs

Last Updated July 26, 2017

Columnstore Indexes, Indexing, SQL Server

Alone Together

When I think about SQL Server features, I often picture a high school cafeteria.

AGs are making fun of Mirroring, index rebuilds are walking around taking everyone’s lunch, dta is making a glue sandwich, and no one knows who Service Broker is even though they’ve been going to school together since Kindergarten.

At the artsy table are two oft-misunderstood and unpopular features: ColumnStore indexes and Recursive CTEs. The oldest person on Stack Overflow, Joe Obbish, recently blogged about a deficiency in ColumnStore indexes when it came to doing string aggregation. When someone says “X is bad at Y”, my first reaction is generally to try to find other things X is bad at, and kicking X while it’s down.

Lace Up Your Boots

I know what you’re thinking: WHY WOULD YOU EVER DO THIS?

Well, ladies and gentlemen, that’s what consultants specialize in.

Stuff no one else would ever do.

There’s a really interesting relationship in the Stack Overflow database both within the Posts table, and connecting the User and Comment tables. You can build up pretty interesting hierarchies with them.

The Posts table has both questions and answers in it. You can tell them apart using the PostTypeId column, and you can also look at the ParentId column. If a post is an answer to a question, it’ll have the Id of the question populated.

Likewise, the Comments table connects comments to posts and answers, and both tables track the Ids of users. This comes in Handy for grabbing DisplayName from the Users table.

Having spent an amount of time only a consultant can looking at stuff in the database, I know that post 184618 currently (2016-03 data dump) has the most answers in the Posts table, at 518. So that’ll be our performance bar.

Gangland

Let’s start by indexing and querying rowstore tables.

Here are our awesome nonclustered indexes.

CREATE UNIQUE NONCLUSTERED INDEX ix_HierarychyHelper ON dbo.Posts (ParentId, Id, OwnerUserId);

CREATE UNIQUE NONCLUSTERED INDEX ix_HierarychyHelper ON dbo.Comments (PostId, UserId, Id);

CREATE UNIQUE NONCLUSTERED INDEX ix_HierarychyHelper ON dbo.Users (Id) INCLUDE (DisplayName);

CREATE UNIQUE NONCLUSTERED INDEX ix_HierarychyHelper ON dbo.Posts (ParentId, Id, OwnerUserId);

CREATE UNIQUE NONCLUSTERED INDEX ix_HierarychyHelper ON dbo.Comments (PostId, UserId, Id);

CREATE UNIQUE NONCLUSTERED INDEX ix_HierarychyHelper ON dbo.Users (Id) INCLUDE (DisplayName);

And here’s our extra fun recursive query.

WITH postparent AS 
  (
  SELECT p.Id, p.ParentId, p.OwnerUserId , hierarchyid::GetRoot() AS PostPath
  FROM dbo.Posts AS p
  WHERE p.Id = 184618

    UNION ALL

  SELECT p2.Id, p2.ParentId, p2.OwnerUserId, TRY_CONVERT(HIERARCHYID, pp.PostPath.ToString() + CONVERT(VARCHAR(11), p2.Id) + '/')
  FROM postparent pp
  JOIN dbo.Posts AS p2
  ON pp.Id = p2.ParentId

    UNION ALL

  SELECT c.Id, c.PostId, c.UserId , TRY_CONVERT(HIERARCHYID, pp.PostPath.ToString() + CONVERT(VARCHAR(11), c.Id) + '/')
  FROM postparent pp
  JOIN dbo.Comments AS c
  ON pp.Id = c.PostId
  AND pp.OwnerUserId = c.UserId

  )
  SELECT pp.Id, pp.ParentId, pp.OwnerUserId, u.DisplayName, pp.PostPath, pp.PostPath.GetAncestor(1) AS PostPathAncestor, pp.PostPath.GetLevel() AS PostPathLevel
  FROM postparent pp
  JOIN dbo.Users AS u
  ON u.Id = pp.OwnerUserId
  ORDER BY PostPathLevel
  OPTION (MAXRECURSION 0);

WITH postparent AS

(

SELECT p.Id, p.ParentId, p.OwnerUserId , hierarchyid::GetRoot() AS PostPath

FROM dbo.Posts AS p

WHERE p.Id = 184618

UNION ALL

SELECT p2.Id, p2.ParentId, p2.OwnerUserId, TRY_CONVERT(HIERARCHYID, pp.PostPath.ToString() + CONVERT(VARCHAR(11), p2.Id) + '/')

FROM postparent pp

JOIN dbo.Posts AS p2

ON pp.Id = p2.ParentId

UNION ALL

SELECT c.Id, c.PostId, c.UserId , TRY_CONVERT(HIERARCHYID, pp.PostPath.ToString() + CONVERT(VARCHAR(11), c.Id) + '/')

FROM postparent pp

JOIN dbo.Comments AS c

ON pp.Id = c.PostId

AND pp.OwnerUserId = c.UserId

)

SELECT pp.Id, pp.ParentId, pp.OwnerUserId, u.DisplayName, pp.PostPath, pp.PostPath.GetAncestor(1) AS PostPathAncestor, pp.PostPath.GetLevel() AS PostPathLevel

FROM postparent pp

JOIN dbo.Users AS u

ON u.Id = pp.OwnerUserId

ORDER BY PostPathLevel

OPTION (MAXRECURSION 0);

With my indexes in place, here are my stats Time and IO results. I’ve lopped off all the zero-read operations. for readability.

Table 'Worktable'. Scan count 2, logical reads 3194
Table 'Users'. Scan count 0, logical reads 1926
Table 'Comments'. Scan count 642, logical reads 1930
Table 'Posts'. Scan count 642, logical reads 1933

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 24 ms.

Table 'Worktable'. Scan count 2, logical reads 3194

Table 'Users'. Scan count 0, logical reads 1926

Table 'Comments'. Scan count 642, logical reads 1930

Table 'Posts'. Scan count 642, logical reads 1933

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 24 ms.

I even have what most would consider a reasonable execution plan for such a query.

Methinks the lady doth compute too many scalars.

Inflection Point

With that in mind, let’s hop on over to my ColumnStore version of the Stack Overflow data dump. All of the tables here either have a clustered ColumnStore index on them, or a nonclustered one if they have columns with datatypes that make that impossible (Users and Posts both have columns that are MAX types, which makes CS puke).

Running the same query, here are the results of stats Time and IO. What you’re looking at are the results of the second run with a warm cache. The first run took about 4 minutes and did a bunch of physical reads, and that didn’t seem fair.

Table 'Worktable'. Scan count 1286, logical reads 283557113
Table 'Users'. Scan count 0, logical reads 1975
Table 'Comments'. Scan count 1, lob read-ahead reads 105411.
Table 'Comments'. Segment reads 377, segment skipped 0.
Table 'Posts'. Scan count 2, logical reads 4, lob logical reads 77526, lob physical reads 2, lob read-ahead reads 118902.
Table 'Posts'. Segment reads 48, segment skipped 0.

 SQL Server Execution Times:
   CPU time = 197515 ms,  elapsed time = 197675 ms.

Table 'Worktable'. Scan count 1286, logical reads 283557113

Table 'Users'. Scan count 0, logical reads 1975

Table 'Comments'. Scan count 1, lob read-ahead reads 105411.

Table 'Comments'. Segment reads 377, segment skipped 0.

Table 'Posts'. Scan count 2, logical reads 4, lob logical reads 77526, lob physical reads 2, lob read-ahead reads 118902.

Table 'Posts'. Segment reads 48, segment skipped 0.

SQL Server Execution Times:

CPU time = 197515 ms, elapsed time = 197675 ms.

If you’re wondering what those milliseconds are in minutes, it’s 3:17. So uh, there’s our first problem.

And here’s the query plan, which bears some interpretive dance. It looks a whole heck of a lot like the previous plan. Prior plan? One of those. Except two index spools are introduced.

Funny things you should know about Index Spools: They’re single threaded, and I’m going to go out on a limb and say SQL doesn’t build ColumnStore Index Spools. They cache data in tempdb. When they’re Eager, they cache all the rows coming to them, and when they’re Lazy they only cache rows as required. This makes Lazy Spools much more benign than Eager Spools (there’s a Lazy Spool in both of the query plans).

They also make up the majority of the work our query is doing here in ColumnStore land. There are other obvious problems with it, like it running in Row mode rather than Batch mode. I stuck the query plan in Paste The Plan for anyone interested in poking around more.

I’m not getting any prettier, here

Stuff like this is part of the reason why we’ve started warning about Index Spools in sp_BlitzCache and sp_BlitzQueryStore. We started warning about ColumnStore indexes operating in Row Mode a while back because it’s basically a death knell for performance.

This post should have a point, right? ColumnStore! Not good at everything. Pretty spiffy for aggregations, but there are plenty of times when traditional row store indexes will out-perform them.

For more information about Spools, check out these links:

Fabiano Amorim: Part 1 and Part 2

Paul White: Part 1, Part 2, Part 3

Rob Farley: Part 1

A most excellent dba.stackexchange.com question, in which Joe Obbish gets prescribed some Spool Softener

Thanks for reading!

Other People’s Blog Posts I Talk About the Most

SQL Server 2017: Less xp_cmdshell?

7 Comments. Leave new

BigPhil
July 27, 2017 11:37 am

Adding OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX) is useful when you want to compare performance of your queries on a table with a NC columnstore index.

Reply
- Erik Darling
  July 27, 2017 11:39 am
  
  Yeah, unfortunately there’s not a similar hint for clustered ColumnStore indexes.
  
  Reply
mmiike
July 27, 2017 11:50 am

“AGs are making fun of Mirroring, index rebuilds are walking around taking everyone’s lunch, dta is making a glue sandwich, and no one knows who Service Broker is even though they’ve been going to school together since Kindergarten.”

this is quite possibly the best sentence ever to appear on this blog. tears of joy, sir. tears of joy.

plus, i now have a solution to a problem which plagues me on a near-daily basis: next time i can’t put a name to a face it’s: “Hey hey, Service Broker! Long time no see!”

Reply
- Erik Darling
  July 27, 2017 12:06 pm
  
  ? I feel glad when you’re glad ??
  
  Reply
  - Erik Darling
    July 27, 2017 12:07 pm
    
    Those musical notes really didn’t pan out here.
    
    Reply
Dale Hirt
July 28, 2017 4:08 pm

I was just working on this very issue this week. It helped me identify what was going on with a recursive CTE that was utterly failing for me. Thanks Erik. The timing was perfect.

Reply
- Erik Darling
  July 28, 2017 4:12 pm
  
  Woohoo! Glad to hear it! I knew of these would finally pay off.
  
  Reply

Get Free SQL Stuff

"*" indicates required fields

Company

This field is for validation purposes and should be left unchanged.

Name*

First Last

Email*

Things I want*

First Responder Kit (Scripts and Guides)

Monday Links (SQL & tech news)

Blog posts (2-4 per week)

ColumnStore Indexes And Recursive CTEs

Alone Together

Lace Up Your Boots

Gangland

Inflection Point

I’m not getting any prettier, here

Related

7 Comments. Leave new

Leave a Reply Cancel reply

Hi! I’m Brent Ozar.

Get Free SQL Stuff