This week, I’m sharing some of my favorite papers that I’ve read. Sometimes they’re about future technologies that haven’t shipped yet – and may never ship! Sometimes, like this one, they’re not from Microsoft at all, but from someone else in the industry who’s explaining a problem that we face in SQL Server.
SQL Server 2012 introduced non-clustered columnstore indexes, but they didn’t catch on much because they forced the entire table to become read-only. Thankfully, Microsoft kept investing in columnstore indexes, and by SQL Server 2016-2017, they became pretty doggone good defaults for data warehouses.
If you’re a performance tuning internals geek who’s just starting your columnstore learning journey today, start with this: PDF: Query Execution in Column-Oriented Database Systems by Daniel J. Abadi.
It’s a ~130-page monster from 2008, before SQL Server’s implementation shipped (but probably around the same time Microsoft was designing it.) You don’t really need Microsoft specifics in here – this white paper is just a thought-provoking foundation that covers:
- The differences between rowstore and columnstore indexes
- How query execution is different with columnstore indexes
- How performance can be better if you can preserve batch mode longer through the query plan
- When data should be turned from columns into rows, like for joins
- How Abadi implemented columnstore himself to understand the implementation and tradeoffs
If you understand these concepts, you can transfer this knowledge pretty well over to SQL Server’s columnstore indexes and execution plans.
It’s not a light read, but it doesn’t require familiarity with columnstore indexes at all. Abadi takes you from zero to hero in here, but you’re probably not going to knock it out in one sitting. I certainly didn’t – it’s been on my iPad for a couple of years, and I’ve repeatedly revisited it, learning new things each time.
Folks who liked this white paper will also like these: