Want to use columnstore indexes? Take the ColumnScore test.

By Brent Ozar · October 1, 2020 · 27 comments

When columnstore indexes first came out in SQL Server 2012, they didn’t get a lot of adoption. Adding a columnstore index made your entire table read-only. I often talk about how indexing is a tradeoff between fast reads and slow writes, but not a lot of folks can make their writes quite that slow.

Thankfully, Microsoft really poured a ton of work into improving columnstore indexes over time. The What’s New in Columnstore Indexes page of the SQL Server documentation keeps getting more “yes” answers over newer versions.

Today, with widespread SQL Server 2016 adoption, at least some of your databases could safely implement columnstore indexes to make reports go faster. But should you? And will columnstore indexes backfire and kill non-reporting-query performance? And will your data loading techniques come back to haunt you over time?

Take the 7-question test at ColumnScore.com to find out.

Free, 3× a week

Get my new posts by email

Three posts a week, plus a Monday roundup of the best database news from around the web.

27 comments

Eitan Blumin

October 1, 2020 at 9:08 am

A survey? Why? What’s wrong with a single T-SQL script that does the same for you?

Reply
1. Brent Ozar
  
  October 1, 2020 at 9:45 am
  
  Eitan – if you’d like to publish a T-SQL script that does what that survey does, you’re welcome to. Otherwise, it might be seen as a little impolite to insult someone’s free work and suggest that they do something different, eh?
  
  Reply
  1. Eitan Blumin
    
    October 1, 2020 at 10:23 am
    
    My job description is to be petty, Brent ?
    
    Reply
    1. Eitan Blumin
      
      October 1, 2020 at 10:28 am
      
      Oh that question mark was supposed to be a smiley
      
      Reply
      1. Brent Ozar
        
        October 1, 2020 at 10:56 am
        
        Hahaha, I hate how that works. (sigh) But yes, as a data professional, I should give you full marks for that! Cheers.
      2. Eitan Blumin
        
        October 1, 2020 at 11:02 am
        
        Thanks!
        Any chance you know whether/where one could find the scoring algorithm behind that page? I mean, I could try and reverse engineer it, but I got stuff to do, man
      3. Brent Ozar
        
        October 1, 2020 at 11:10 am
        
        Eitan – can you email me at help@brentozar.com? Thanks!
Curtis Browne VIP Student since 2020

October 1, 2020 at 1:22 pm

Is Brent Ozar Unlimited the force behind ColumnScore? The website has no info that would allow someone to know if it’s trustworthy or a humor site!

Reply
1. Brent Ozar
  
  October 1, 2020 at 2:12 pm
  
  Curtis: are you saying that anything Brent Ozar Unlimited builds is trustworthy, or…
  
  are you saying anything Brent Ozar Unlimited builds is a humor site?
  
  Reply
  1. Curtis Browne VIP Student since 2020
    
    October 1, 2020 at 3:31 pm
    
    Maybe I’m just looking for someone to be the fall guy
    
    Reply
Koen Verbeeck

October 2, 2020 at 3:48 am

I’d argue columnstore indexes can be useful even if your data isn’t over 1 terabyte and over 1 billion records. I’d set the limits rather at 100 million rows and 100GB. Also, in the survey, there’s no option for 10-100 million rows.

Reply
1. Brent Ozar
  
  October 2, 2020 at 3:53 am
  
  Koen – each answer gets different numbers of points. 100 million rows and 100GB gets positive points too – just not as many points as the larger volumes.
  
  Thanks!
  
  Reply
  1. Koen Verbeeck
    
    October 2, 2020 at 3:54 am
    
    Ah ok. I just saw a big red cross next to my answer and I completely started freaking out 😀
    
    Reply
    1. Brent Ozar
      
      October 2, 2020 at 3:55 am
      
      I HAD THE SAME REACTION, hahaha.
      
      Reply
Teja R Student since 2021

October 3, 2020 at 11:47 am

One more unintended benefits of creating even nonclustered columnstore is you start getting more batch mode operators in your execution plans (in 2016 and 2017 I believe). We’ve created nonclustered columnstore on some of the fact tables ranging from 50 M to 150 M rows. This got us improvements in more places than we were hoping for

Reply
1. Brent Ozar
  
  October 3, 2020 at 2:59 pm
  
  Yes, but on the flip side, that also has unintended side effects. Your OLTP-style queries can go dramatically slower if they get batch mode where it’s not appropriate, especially in 2016-2017. I demo that in my Mastering Query Tuning class.
  
  Reply
  1. Teja R Student since 2021
    
    October 4, 2020 at 7:16 am
    
    Ah, I hadn’t considered the OLTP scenario. Great point!
    
    Reply
Ed Pollack

April 29, 2021 at 12:22 pm

The ColumnScore quiz is a cool idea – but it’s a bit aggressive. I put in a handful of scenarios where Columnstore indexes were 100% a good idea, and consistently got Cs and Ds.

The biggest factors I’ve used in this decision are the size of the table and how it is read (ordered by some potentially partitionable key column, ie a date). For writes, I mostly care about there being zero update statements and little/no unordered deletion. I’ve found that actual storage size has impacted my decision making less than most other considerations. Rowgroup elimination occurs based on row counts, not gigabytes, and segment elimination cares less either way.

Weighting the questions a bit differently would help. The quiz may inadvertently tell someone to not use a columnstore index when it may be a superior solution to a classic OLTP solution. Future growth is also worth a callout as a table with 500mil rows that will add 500mil a year is essentially a 1bil row table in training : )

That was a long comment, but thanks for listening : )

Reply
1. Brent Ozar
  
  April 29, 2021 at 12:24 pm
  
  Rowgroup elimination also flat out doesn’t work over time as indexes get rebuilt – but I’m sure you knew that from my Fundamentals of Columnstore class, right? 😉
  
  Reply
  1. Ed
    
    May 16, 2022 at 12:41 pm
    
    A well-maintained columnstore index with no updates rarely requires rebuilds. There are some I’ve build that have never been rebuilt and are allowing for excellent rowgroup elimination, even will tens of billions of rows over many years. I have not taken your class on Columnstore indexes, but I have published a book on the topic, if that helps : )
    
    Reply
    1. Brent Ozar
      
      May 16, 2022 at 12:43 pm
      
      Interesting! In my entire life, I don’t think I’ve ever actually seen a table with no updates. That’s awesome that you’ve been lucky enough to exclusively work with those, though!
      
      I’d be curious if you think that out in the wider population, tables usually never have updates.
      
      Reply
      1. Ed
        
        May 16, 2022 at 12:51 pm
        
        Building a CI without updates usually requires some creative architecture to avoid them. For example, staging data in temp tables to modify it rather than inserting and then going back later to update. This 100% is not always possible, but setting the bar there generally results in better design/architecture.
        
        The challenge there isn’t rebuilds and general maintenance…it is data order.
        
        Columnstore index rebuilds and reorgs do NOT fix/change data order, which is what rowgroup elimination relies on. For a clustered columnstore, fixing data order means setting aside some time occasionally to either reorder the existing table (with a clustered rowstore) or do a side-by-side swap. partition swapping can be used to cheat a bit if downtime must be minimized.
        
        Depending on table size and usage patterns, this may very well be something that can be done annually or monthly and automated away to avoid the need for tedious manual work.
        
        I do recommend replacing updates against columnstore indexes with the delete/insert combo. That obviously does not fix the data order issue as under the covers it is all the same, but it does execute faster, which is nice.
      2. Brent Ozar
        
        May 16, 2022 at 12:55 pm
        
        Right, yep, we’re restating the same kinds of things that are covered in class. At the risk of trying to teach the entire class in blog post comments, I’ll gracefully step aside here. Sounds like you’ve got things all under control!
Tim

May 14, 2022 at 9:05 pm

Hi Brent, I took the quiz, but there were a couple of questions where I didn’t know how to answer. I’m planning a large fact table which will have approximately 90M rows per calendar month, partitioned by date into monthly partitions (202201,202202, etc.), and I plan on using either a SWITCH PARTITION or a TRUNCATE TABLE WITH (PARTITIONS(# TO #)) statement instead of a DELETE statement. So, on the test I chose “Never Update It” and “Never Delete it.” Additionally I may have to re-process up to 12 months of partitions daily due to post dated transactions where the accounting date is yesterday, but the transaction effective date is 12 months in the past. On the test everything came up with green check marks, but I’m wondering if the SWITCH or TRUNCATE statements, or reprocessing 12 partitions would have changed the test results. Thank you. Tim.

Reply
1. Brent Ozar
  
  May 14, 2022 at 9:07 pm
  
  Sure, for fine grained questions like that, check out my Fundamentals of Columnstore class.
  
  Reply
  1. Tim
    
    May 16, 2022 at 1:09 pm
    
    Thank you. I will check that out.
    
    Reply
2. Eitan Blumin
  
  May 15, 2022 at 4:00 am
  
  Hi Tim.
  Deletions and truncate are interchangeable in terms of impact on statistics and indexes, including columnstore indexes. It should work just fine in the use case you describe.
  
  Reply

Want to use columnstore indexes? Take the ColumnScore test.

Get my new posts by email

Keep digging

27 comments

Leave a comment Cancel reply