A Visual Guide to Choosing an Index Type


Warning: I’m about to overly simplify a whole lot of topics to make things easy. Armchair architects, warm up your flamethrowers.

Your table has rows and columns kinda like a spreadsheet:

In most applications, your users care about all of the rows, and all of the columns. However, they put certain columns in the where clause more often than others, so you design indexing strategies around those. You may also get fancy with indexing for group by, order by, windowing functions, etc.

Their queries are vaguely predictable, and they don’t change too often, so you can design indexes every now and then, and you’re good.

That’s how normal tables work.

In some apps, your queries only care about a very specific set of rows.

They constantly – and I mean CONSTANTLY, like 99% of your queries – filter for a very specific set of rows, like under 5% of the table, and these rows are easily identified by specific values in a given column:

This is a great candidate for a filtered index – an index with a where clause.

Filtered indexes make the most sense when they’re highly selective. In the above example, if 99% of our rows had matched the filter we were looking for, then a filtered index isn’t usually going to dramatically improve performance.

In some apps, data is loaded and deleted in big groups.

The classic example is a big (say, 1TB+) sales table in a data warehouse where every row has a SaleDate:

Partitioning candidate

At first glance, you’d say, “Ah, this data is clearly grouped together! I should partition this data by SaleDate, and it will make my queries faster!”

In some cases, it does – but partitioned tables and partitioned views can often make queries slower rather than faster. If your query doesn’t filter by that partitioning column, SQL Server has to reassemble the rows from the different partitions before moving on to the other parts of your query – and this can involve some painful re-sorting depending on how your joins work.

Where partitioned tables and partitioned views make the most sense is where you need to load an entire partition at a time, or drop an entire partition at a time, in the fastest time possible.

In narrow tables, clustering key design is really important.

If your table only has a couple/few columns:

And if you always filter for equalities on just one or two fields, then you might be able to get away with just a clustered index and nothing else.

When your table is really wide, nonclustered index design becomes more important – and harder.

The more columns you decide to pack into a table:

The harder it is to design enough nonclustered indexes to support your queries – without simultaneously slowing down delete/update/insert operations to an unacceptable degree.

That’s where columnstore indexes can come in handy. If you have a table where you can’t possibly predict what people are going to query on, group by, and order by, and especially if they run a lot of running totals, then columnstore indexes can help.

All of the index types I just covered have huge drawbacks and implementation gotchas. This is just meant as a starting point for your index design journey. Start with regular nonclustered indexes, and then when you hit one of these unusual designs, you can start looking at more niche features.

Previous Post
Two Important Differences Between SQL Server and PostgreSQL
Next Post
Why Does My Select Query Have An Assert?

4 Comments. Leave new

Leave a Reply

Your email address will not be published.

Fill out this field
Fill out this field
Please enter a valid email address.