When columnstore indexes first came out in SQL Server 2012, they didn’t get a lot of adoption. Adding a columnstore index made your entire table read-only. I often talk about how indexing is a tradeoff between fast reads and slow writes, but not a lot of folks can make their writes quite that slow.
Thankfully, Microsoft really poured a ton of work into improving columnstore indexes over time. The What’s New in Columnstore Indexes page of the SQL Server documentation keeps getting more “yes” answers over newer versions.
Today, with widespread SQL Server 2016 adoption, at least some of your databases could safely implement columnstore indexes to make reports go faster. But should you? And will columnstore indexes backfire and kill non-reporting-query performance? And will your data loading techniques come back to haunt you over time?
Take the 7-question test at ColumnScore.com to find out.
A survey? Why? What’s wrong with a single T-SQL script that does the same for you?
Eitan – if you’d like to publish a T-SQL script that does what that survey does, you’re welcome to. Otherwise, it might be seen as a little impolite to insult someone’s free work and suggest that they do something different, eh?
My job description is to be petty, Brent ?
Oh that question mark was supposed to be a smiley
Hahaha, I hate how that works. (sigh) But yes, as a data professional, I should give you full marks for that! Cheers.
Any chance you know whether/where one could find the scoring algorithm behind that page? I mean, I could try and reverse engineer it, but I got stuff to do, man
Eitan – can you email me at firstname.lastname@example.org? Thanks!
Is Brent Ozar Unlimited the force behind ColumnScore? The website has no info that would allow someone to know if it’s trustworthy or a humor site!
Curtis: are you saying that anything Brent Ozar Unlimited builds is trustworthy, or…
are you saying anything Brent Ozar Unlimited builds is a humor site?
Maybe I’m just looking for someone to be the fall guy
I’d argue columnstore indexes can be useful even if your data isn’t over 1 terabyte and over 1 billion records. I’d set the limits rather at 100 million rows and 100GB. Also, in the survey, there’s no option for 10-100 million rows.
Koen – each answer gets different numbers of points. 100 million rows and 100GB gets positive points too – just not as many points as the larger volumes.
Ah ok. I just saw a big red cross next to my answer and I completely started freaking out 😀
I HAD THE SAME REACTION, hahaha.
One more unintended benefits of creating even nonclustered columnstore is you start getting more batch mode operators in your execution plans (in 2016 and 2017 I believe). We’ve created nonclustered columnstore on some of the fact tables ranging from 50 M to 150 M rows. This got us improvements in more places than we were hoping for
Yes, but on the flip side, that also has unintended side effects. Your OLTP-style queries can go dramatically slower if they get batch mode where it’s not appropriate, especially in 2016-2017. I demo that in my Mastering Query Tuning class.
Ah, I hadn’t considered the OLTP scenario. Great point!
The ColumnScore quiz is a cool idea – but it’s a bit aggressive. I put in a handful of scenarios where Columnstore indexes were 100% a good idea, and consistently got Cs and Ds.
The biggest factors I’ve used in this decision are the size of the table and how it is read (ordered by some potentially partitionable key column, ie a date). For writes, I mostly care about there being zero update statements and little/no unordered deletion. I’ve found that actual storage size has impacted my decision making less than most other considerations. Rowgroup elimination occurs based on row counts, not gigabytes, and segment elimination cares less either way.
Weighting the questions a bit differently would help. The quiz may inadvertently tell someone to not use a columnstore index when it may be a superior solution to a classic OLTP solution. Future growth is also worth a callout as a table with 500mil rows that will add 500mil a year is essentially a 1bil row table in training : )
That was a long comment, but thanks for listening : )
Rowgroup elimination also flat out doesn’t work over time as indexes get rebuilt – but I’m sure you knew that from my Fundamentals of Columnstore class, right? 😉
A well-maintained columnstore index with no updates rarely requires rebuilds. There are some I’ve build that have never been rebuilt and are allowing for excellent rowgroup elimination, even will tens of billions of rows over many years. I have not taken your class on Columnstore indexes, but I have published a book on the topic, if that helps : )
Interesting! In my entire life, I don’t think I’ve ever actually seen a table with no updates. That’s awesome that you’ve been lucky enough to exclusively work with those, though!
I’d be curious if you think that out in the wider population, tables usually never have updates.
Building a CI without updates usually requires some creative architecture to avoid them. For example, staging data in temp tables to modify it rather than inserting and then going back later to update. This 100% is not always possible, but setting the bar there generally results in better design/architecture.
The challenge there isn’t rebuilds and general maintenance…it is data order.
Columnstore index rebuilds and reorgs do NOT fix/change data order, which is what rowgroup elimination relies on. For a clustered columnstore, fixing data order means setting aside some time occasionally to either reorder the existing table (with a clustered rowstore) or do a side-by-side swap. partition swapping can be used to cheat a bit if downtime must be minimized.
Depending on table size and usage patterns, this may very well be something that can be done annually or monthly and automated away to avoid the need for tedious manual work.
I do recommend replacing updates against columnstore indexes with the delete/insert combo. That obviously does not fix the data order issue as under the covers it is all the same, but it does execute faster, which is nice.
Right, yep, we’re restating the same kinds of things that are covered in class. At the risk of trying to teach the entire class in blog post comments, I’ll gracefully step aside here. Sounds like you’ve got things all under control!
Hi Brent, I took the quiz, but there were a couple of questions where I didn’t know how to answer. I’m planning a large fact table which will have approximately 90M rows per calendar month, partitioned by date into monthly partitions (202201,202202, etc.), and I plan on using either a SWITCH PARTITION or a TRUNCATE TABLE WITH (PARTITIONS(# TO #)) statement instead of a DELETE statement. So, on the test I chose “Never Update It” and “Never Delete it.” Additionally I may have to re-process up to 12 months of partitions daily due to post dated transactions where the accounting date is yesterday, but the transaction effective date is 12 months in the past. On the test everything came up with green check marks, but I’m wondering if the SWITCH or TRUNCATE statements, or reprocessing 12 partitions would have changed the test results. Thank you. Tim.
Sure, for fine grained questions like that, check out my Fundamentals of Columnstore class.
Thank you. I will check that out.
Deletions and truncate are interchangeable in terms of impact on statistics and indexes, including columnstore indexes. It should work just fine in the use case you describe.