Logical Reads Aren’t Repeatable on Columnstore Indexes. (sigh)

Sometimes I really hate my job.

Forever now, FOREVER, it’s been a standard thing where I can say, “When you’re measuring storage performance during index and query tuning, you should always use logical reads, not physical reads, because logical reads are repeatable, and physical reads aren’t. Physical reads can change based on what’s in cache, what other queries are running at the time, your SQL Server edition, and whether you’re getting read-ahead reads. Logical reads just reflect exactly the number of pages read, no matter where the data came from (storage or cache), so as long as that number goes down, you’re doing a good job.”

To illustrate it, we’ll start with the large version of the Stack Overflow database, and count the number of rows in the Users table.
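The demo code didn't survive into this copy of the post, so here's a minimal sketch of it – the exact query text is an assumption, but it's just a count against dbo.Users with statistics io turned on:

```sql
USE StackOverflow;  /* the large Stack Overflow sample database */
GO
SET STATISTICS IO ON;
GO

/* Count the Users table; run the batch a few times so we can compare
   the cold-cache execution against the warm-cache ones. */
SELECT COUNT(*) FROM dbo.Users;
GO 3  /* SSMS batch separator: repeats the batch three times */
```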

Statistics io output shows that the first execution has to read pages up from disk because they’re not in cache yet:

The first execution has 4 physical reads and 329,114 read-ahead reads. Those were all read up off disk, into memory. But the whole time, logical reads stays consistent, so it’s useful for measuring performance tuning efforts regardless of what’s in cache.

The same thing is true if we create a nonclustered rowstore index too:
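The index definition isn't shown in this copy, so this is a sketch – the key column is an assumption (any narrow column works here; it just gives the engine a smaller structure to scan for the count):

```sql
/* Hypothetical narrow rowstore index for the count demo;
   the name and key column (Reputation) are assumptions. */
CREATE NONCLUSTERED INDEX IX_Users_Reputation
    ON dbo.Users (Reputation);
```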

Statistics io output shows physical reads & read-ahead reads on the first execution, but logical reads stays consistent throughout:

But with columnstore indexes on SQL Server 2017 & newer…

On SQL Server 2017 or newer (not 2016), create a nonclustered columnstore index:
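The original index statement is missing here, so this is a hedged reconstruction – the index name and column list are assumptions:

```sql
/* Sketch of the demo's columnstore index; name and columns are assumptions. */
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Users
    ON dbo.Users (Id, Reputation);
```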

And watch lob logical reads while we run it 3 times:
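Something like this – the query text is assumed to match the earlier count, with GO 3 telling SSMS to run the batch three times:

```sql
SET STATISTICS IO ON;
GO

SELECT COUNT(*) FROM dbo.Users;
GO 3  /* run it three times, then compare lob logical reads per pass */
```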

Lob logical reads shows 22,342 for the first execution, then 10,947 for the next two passes.

This isn’t true on SQL Server 2016, which produces the same logical read numbers every time the columnstore query runs, clean buffer pool or not. Just 2017 and newer.

Actual live Brent reaction to this issue

<sigh> This is why we can’t have nice things.

This is also one of the reasons it's so hard to teach training classes. Stuff changes inside the product, and then years later, a demo you wrote no longer produces exactly the same results. You have to re-run the demo from scratch, thinking you just made a mistake, then narrow down the root cause, then – to do it right – check each prior version to understand when the behavior changed, then Google around to see if anybody else already wrote about it and you just missed that particular post, and then update your own training and write a blog post so that nobody else gets screwed by the same undocumented change. Which of course they will, because not everybody reads your blog posts.

You won’t, though, dear reader. At least I helped you out, hopefully. And that makes it all worthwhile. (Not really. I’m going to go have a shot of my office tequila, and it’s not even 10AM as I’m writing this.)


10 Comments

  • Interesting. Do you happen to have a sense of why the number changes? I could speculate that there’s some sort of compression (maybe even async) going on when it’s storing them in the buffer pool but that seems possibly wasteful.

    • Thomas Franz
      March 10, 2026 5:11 pm

      It's not the compression. I checked it on my server; all rowgroups were compressed before the first run.

      And one time the reduced read count showed up not on the second but the third execution – who knows why.

  • Loved the frowny face!

  • Francesco Mantovani
    March 10, 2026 4:29 pm

    And this is why we read your blog.

  • “Logical reads are always the same, doh.”

    Brent: “It depends…”

  • Thomas Franz
    March 10, 2026 5:21 pm

    On the other hand, I don't really care about the lob logical reads on a columnstore, since they always depend on the data distribution and compression strategy (when you don't create one with a fixed order, SQL Server tries to find the most compressible order based on the existing / inserted data).

    I know that when a table
    – has more than 100k rows
    – is rarely updated
    – is often queried with aggregates (SUM(), MIN(), MAX(), AVG(), COUNT() …)
    – is rarely queried with SELECT * (written as * or just a long list of columns)
    – is usually not queried for specific keys (customer_id = 123) but more often filtered on arbitrary columns, since the customer can filter for anything in the GUI

    a columnstore index is usually the best index. In reality it's most of the time a compromise, where not all of the points above fit the real workload, but hey – this is one of the reasons my job exists 🙂

  • Uhhhm… let me be that guy.

    If logical reads aren’t repeatable anymore when columnstore indexes are involved, what metric are we supposed to use to measure tuning success?

    Logical reads used to be a stable number we could rely on.

    So what’s the new yardstick?

    Query duration? CPU time? Or just fewer strange noises from the CPU fan?

    Just asking.

  • Jeff Shervey
    March 10, 2026 5:47 pm

    I know… we should always try this ourselves.
    For us lazy DBAs who don't have any copy of the SO DB – or whose SO DB is on a home computer while we're at work – are the CPU and run times in ms any different on the NC vs. CS index count query? LOBs have always burned me (expensive) in the past, especially with Oracle. (IO, TIME on;)

