Warning: I’m about to overly simplify a whole lot of topics to make things easy. Armchair architects, warm up your flamethrowers.
Your table has rows and columns kinda like a spreadsheet:
In most applications, your users care about all of the rows, and all of the columns. However, they put certain columns in the where clause more often than others, so you design indexing strategies around those. You may also get fancy with indexing for group by, order by, windowing functions, etc.
Their queries are vaguely predictable, and they don’t change too often, so you can design indexes every now and then, and you’re good.
That’s how normal tables work.
In some apps, your queries only care about a very specific set of rows.
They constantly – and I mean CONSTANTLY, like 99% of your queries – filter for a very specific set of rows, like under 5% of the table, and these rows are easily identified by specific values in a given column:
This is a great candidate for a filtered index – an index with a where clause.
Filtered indexes make the most sense when they’re highly selective. In the above example, if 99% of our rows had matched the filter we were looking for, then a filtered index isn’t usually going to dramatically improve performance.
In some apps, data is loaded and deleted in big groups.
The classic example is a big (say, 1TB+) sales table in a data warehouse where every row has a SaleDate:
At first glance, you’d say, “Ah, this data is clearly grouped together! I should partition this data by SaleDate, and it will make my queries faster!”
In some cases, it does – but partitioned tables and partitioned views can often make queries slower rather than faster. If your query doesn’t filter by that partitioning column, SQL Server has to reassemble the rows from the different partitions before moving on to the other parts of your query – and this can involve some painful re-sorting depending on how your joins work.
Where partitioned tables and partitioned views make the most sense is where you need to load an entire partition at a time, or drop an entire partition at a time, in the fastest time possible.
In narrow tables, clustering key design is really important.
If your table only has a couple/few columns:
And if you always filter for equalities on just one or two fields, then you might be able to get away with just a clustered index and nothing else.
When your table is really wide, nonclustered index design becomes more important – and harder.
The more columns you decide to pack into a table:
The harder it is to design enough nonclustered indexes to support your queries – without simultaneously slowing down delete/update/insert operations to an unacceptable degree.
That’s where columnstore indexes can come in handy. If you have a table where you can’t possibly predict what people are going to query on, group by, and order by, and especially if they run a lot of running totals, then columnstore indexes can help.
All of the index types I just covered have huge drawbacks and implementation gotchas. This is just meant as a starting point for your index design journey. Start with regular nonclustered indexes, and then when you hit one of these unusual designs, you can start looking at more niche features.
6 Comments. Leave new
If you tend to query millions of rows of data by a single column (e.g., date), a combination of Clustered Columnstore indexes and table partitions can result in substantial query performance improvements (albeit with poor OLTP performance). This technique is officially documented ( https://docs.microsoft.com/en-us/sql/relational-databases/indexes/columnstore-indexes-design-guidance#use-table-partitions-for-data-management-and-query-performance ).
You can attain similar benefits by intentionally generating the columnstore data in an order which lends itself well to segment elimination. Personally, this technique strikes me as a bit too reliant on implementation details. However, Microsoft has contemplated official support for this (e.g., https://feedback.azure.com/forums/908035-sql-server/suggestions/32896345 ).
Thanks for the explanation, it was awesome.
I thought clustered indexes don’t perform that well and can slow things down if there are a lot of updates or deletes to a table, with the updates/deletes generally taking longer than a wide table with a number of non-clustered indexes.
Scott – you are gonna love this free course: https://www.brentozar.com/training/think-like-sql-server-engine/
It would be awesome to give a detailed description with visualisations of how columnstore indexes work. I’ve never seen one and I suspect that’s why many people don’t understand how they work and therefore what they are good for.
Psst: click Training at the top of the site.