SQL Server 2019
Introduces something pretty cool: the ability for row store indexes to be processed in batch mode. Batch mode is, of course, the preferred processing realm of column store indexes.
In a nutshell, it can be really helpful to reporting-style and analytic queries, where large amounts of data are processed. Is processed?
There’s a Great Post® by Dima Pilugin that explains some of the things that the optimizer considers when making the choice to use batch mode for row store. Actually, all of Dima’s posts are great. He’s like the Russian Paul White. Or maybe Paul White is like the New Zealandish Dima Pilugin. I don’t know.
I’m just a bouncer, after all.
Why Batch Mode For Row Store?
Well, column store indexes are tricky. If you have a data warehouse where everything is one big shove to get data in, and then queries run, some of the trickiness is alleviated. But if you have data slowly trickling in and out via inserts, updates, deletes, and (forgive my language) merges, they can all play some nasty tricks on column store indexes. It’s sort of like how row store index fragmentation can impact large read-ahead scans, except, you know, it’s not just a problem that you saw in a demo where fill factor was set to 1%.
While we spend a lot of time having a giggle at The Cult Of Rebuild for row store indexes, column store indexes are a bit different. If data isn’t loaded in optimally, or it gets altered later, performance can really suffer. If you wanna learn a whole lot about that, and you’re going to PASS Summit, definitely attend this session. From, again, Joe Obbish. Funny how the same names keep popping up.
All of this makes adding column store indexes to existing OLTP applications tough. You probably can’t outright replace your current clustered indexes with clustered column store indexes, and adding nonclustered columnstore indexes can be confusing.
- Should I add some of the columns?
- All of the columns?
- Does column order matter?
- What impact will this have on locking?
- How do I maintain them? (Ola’s scripts don’t support column store yet)
- Which queries will use my column store index?
- Will batch mode even help me?
- Will it cause problems for existing workload queries?
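To make the "which columns" question concrete, here’s a minimal sketch of the kind of nonclustered column store index people usually land on: one covering just the columns their reporting queries aggregate and filter on. The table and column names here are made up for illustration.

```sql
-- A hypothetical reporting-focused nonclustered column store index.
-- Only the columns the analytic queries touch, not the whole table.
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_Posts_Reporting
ON dbo.Posts (OwnerUserId, CreationDate, Score);
```

Even with something this small, you’re still left with the locking, maintenance, and "will anything use it" questions from the list above.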
Now you can worry about that stuff a bit less.
Important Numbers and Opportunities
Dima notes in his post the hard-coded value 131,072 as a built-in magical number (at least as of CTP 2 — who knows if this’ll change by another CTP, or RTM). That’s the minimum number of rows that a query will have to process before batch mode for row store will kick in, among other considerations.
You guessed it — most OLTP queries don’t touch that many rows, unless you’re running a report, or you’ve done the worst possible job indexing for your queries. There are also some query constructs that aren’t supported for batch mode, not all operators can run in batch mode, and some don’t run well in batch mode (I’m looking at you, Sort).
However, this opens up more queries to a couple new optimizer tricks introduced in SQL Server 2017: Adaptive Joins, and Memory Grant Feedback. Definitely hit the links for more details, but now you just might see the optimizer changing run-time join types based on row thresholds, and memory grants being altered between query executions.
This is definitely cool for people running reports, though much like the new Cardinality Estimator introduced in 2014, it’s not an across-the-board improvement. You’ll still have to test your queries to make sure batch mode actually helps.
But what about other query types?
- What about stored procedures that do varying amounts of work?
- What about queries with implicit conversions, non-SARGable predicates, functions, etc?
- What about queries with good, bad, or no indexes?
- What about mistakes made during cardinality estimation?
In short, will batch mode for row store eat bad T-SQL for breakfast?