How to Make SELECT COUNT(*) Queries Crazy Fast

When you run a SELECT COUNT(*), the speed of the results depends a lot on the database’s structure & settings. Let’s explore the Votes table in the Stack Overflow database – specifically the 2018-06 ~300GB version, where the Votes table has 150,784,380 rows taking up ~5.3GB of space.

I’m going to measure each method 3 ways:

  • How many pages it reads (gauged with SET STATISTICS IO ON)
  • How much CPU time it uses (gauged with SET STATISTICS TIME ON)
  • How fast it runs
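
If you want to follow along, here’s a minimal sketch of the measurement setup – dbo.Votes is the table from the Stack Overflow database, and the SET options are stock T-SQL:

    SET STATISTICS IO ON;   -- report logical page reads per query
    SET STATISTICS TIME ON; -- report CPU time & elapsed time per query

    SELECT COUNT(*) FROM dbo.Votes;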

Don’t obsess over small differences between the operations – I’m writing this blog post fast & furious to show you the big-picture differences, and to show you how my thought process works when comparing the different operations. In your own environment, for the tables you’re trying to count and the hardware you’re using and the version you’re on and the phase of the moon, you’re going to get different results, and that’s fine. There are also other ways to measure these methods depending on your own performance requirements: memory grants, ability to run without blocking, and even the accuracy of the results under concurrency. For the sake of these tests, I’m not going to talk about isolation levels or blocking.

I’m running these tests on SQL Server 2019 (15.0.2070.41) on an 8-core VM with 64GB RAM.

1: Plain ol’ COUNT(*) with only a clustered rowstore index, compatibility level 2017 & prior

The Votes table is only ~5.3GB, so I’m able to cache the whole thing in my SQL Server. Even after the query runs the first time and the data’s cached in RAM, this still ain’t fast:

  • Pages read: 694,389
  • CPU time: 14.7 seconds
  • Duration: 2 seconds

2: Compatibility level 2019 (batch mode on rowstore indexes)

SQL Server 2019 introduces batch mode operations on rowstore indexes, previously only available on columnstore indexes. The payoff here is pretty awesome, even though we’re still dealing with just the rowstore index:

  • Pages read: 694,379
  • CPU time: 5.2 seconds
  • Duration: 0.7 seconds

Presto change-o – CPU just instantly drops thanks to batch mode. This isn’t obvious in the execution plans until you start hovering your mouse over individual operators.

Batch mode is a great fit for a lot of reporting-style queries doing aggregates over a lot of data.
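
Flipping a database to 2019 behavior is a one-liner – a sketch, assuming the database is named StackOverflow2018 (yours may differ):

    ALTER DATABASE StackOverflow2018 SET COMPATIBILITY_LEVEL = 150; -- 150 = SQL Server 2019
    GO
    SELECT COUNT(*) FROM dbo.Votes; -- now eligible for batch mode on rowstore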

3: Add nonclustered rowstore indexes, but use 2017 & prior’s row mode

I’m going to create an index on each of the 5 columns of the Votes table, and then compare their sizes with sp_BlitzIndex.
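
Here’s a sketch of that setup – the index names are my own inventions, and the sp_BlitzIndex parameters assume the same hypothetical database name as above:

    CREATE INDEX IX_PostId       ON dbo.Votes (PostId);
    CREATE INDEX IX_UserId       ON dbo.Votes (UserId);
    CREATE INDEX IX_BountyAmount ON dbo.Votes (BountyAmount);
    CREATE INDEX IX_VoteTypeId   ON dbo.Votes (VoteTypeId);
    CREATE INDEX IX_CreationDate ON dbo.Votes (CreationDate);
    GO
    EXEC sp_BlitzIndex @DatabaseName = 'StackOverflow2018',
                       @SchemaName   = 'dbo',
                       @TableName    = 'Votes';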

Check out the number of rows in each index versus its size. When SQL Server needs to count the number of rows in the table, it’s smart enough to look at which object is the smallest, and then use that one for the count. Indexes can have varying sizes depending on the datatypes of the contents they’re indexing, the size of each row’s contents, the number of nulls, etc.

I’ll go back to 2017 compat level (removing batch mode operations) and then run the count.
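
A sketch of that, using the same hypothetical database name:

    ALTER DATABASE StackOverflow2018 SET COMPATIBILITY_LEVEL = 140; -- 140 = SQL Server 2017, row mode only
    GO
    SELECT COUNT(*) FROM dbo.Votes;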

SQL Server chooses to use the BountyAmount index, one of the smaller ~2GB ones, which pays off in reading fewer pages. But we’re still performing the same count of 150M rows, so the CPU time & duration don’t really change:

  • Pages read: 263,322
  • CPU time: 14.8 seconds
  • Duration: 2 seconds

If you want lower CPU time & duration, you really need to approach the count differently – and that’s where batch mode operation helps.

4: 2019’s batch mode with nonclustered rowstore indexes

So now let’s try batch mode operation with the indexes in place.
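
That’s just flipping the compat level back – same hypothetical database name as before:

    ALTER DATABASE StackOverflow2018 SET COMPATIBILITY_LEVEL = 150; -- back to 2019's batch mode
    GO
    SELECT COUNT(*) FROM dbo.Votes;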

It still uses the BountyAmount index and does the same number of reads as #3, but we get the lower CPU time & duration from step #2:

  • Pages read: 694,379
  • CPU time: 4.3 seconds
  • Duration: 0.6 seconds

So far, that’s the winner. But remember that batch mode originally went live with columnstore indexes, which are awesome tools for reporting-style queries….

5: Nonclustered columnstore index with batch mode

I’m purposely running in 2017 compat mode here because I want to make it clear where the awesomeness is.
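
A sketch of the setup – NCCI_Votes is my own name for the index, a single narrow column keeps the demo simple, and the database name is the same hypothetical one as above:

    ALTER DATABASE StackOverflow2018 SET COMPATIBILITY_LEVEL = 140; -- 2017 compat, to isolate the columnstore win
    GO
    CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Votes ON dbo.Votes (Id); -- hypothetical name
    GO
    SELECT COUNT(*) FROM dbo.Votes;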

The execution plan has our fancypants new columnstore index scan operator, and all of the operators in the plan are in batch mode.

I have to change my units of measure here:

  • Pages read: 73,922
  • CPU time: 15 milliseconds
  • Duration: 21 milliseconds

Hubba hubba. Let’s put that in perspective: I know some developers who try to hit system tables in order to count rows quickly, and they can’t even generate speedy results like this.

So to make SELECT COUNT(*) queries fast, here’s what to do:

In descending order of preference & speed, with the best results first:

  1. Get on SQL Server 2017 or newer, and put a columnstore index on the table.
  2. Get on any version that supports batch mode on columnstore indexes, and put a columnstore index on the table – although your experiences are going to vary dramatically depending on the kind of query you have. To learn more about the specifics, read Niko’s series on columnstore indexes, specifically the posts with the word “batch” in the title.
  3. Get on SQL Server 2019 or newer, and put your database in compat level 150 (2019) – even with rowstore indexes, you can still cut your CPU usage dramatically thanks to batch mode on rowstore. This one’s really easy to do – it probably requires the least changes to your application and database schema – but it’s just that you won’t have the amazingly millisecond-fast responses that a columnstore index can get you.

Here are the demo scripts if you wanna play around with ’em, like adapting ’em to work on your own tables.


27 Comments

  • Great article,

    I checked a few weeks ago for ways to do counts quickly on SQL 2014 (to monitor a large table’s status) and found this article: https://blogs.msdn.microsoft.com/martijnh/2010/07/15/sql-serverhow-to-quickly-retrieve-accurate-row-count-for-table/ – its limitation is that I can’t do filtering, but it was good enough to know the general count.

    Now that I know this, I am adding NONCLUSTERED COLUMNSTORE INDEX to my tool arsenal. Thanks!

  • How wrong is sysindexes.rows?
    It should be kinda fast.

    • I’m not a fan of having users directly query system tables where accuracy, parallelism, and locking aren’t clear.

      • Do you see a lot of users doing row count checks? I am tired atm so maybe I am missing an extrapolation to real world stuff. In my life the only people that care about row count much are dbas who are okay with ‘roughly how many rows this week compared to last’. The topic looks interesting and I remember mentally bookmarking columnstore indexing a few years back but it didn’t ever really crop up as a useful solution and now I am wondering if I missed something important…

        • Simon – yep, thus the post. This stemmed from a client use case. It was a really neat one, but unfortunately due to those pesky NDAs, I can’t elaborate on what their app was doing. I try to share as much as I can here, though.

  • How does this compare to: select count(1) from Votes?

  • The speed of columnstore is unparalleled, but it feels like index creation time has to be included for a fair comparison.

    • Yes, because every time you run a query, you also need to create a new index from scratch.

      Wait, what? 😉

      • You’re assuming that there’s a recurring need to get a fast rowcount if you don’t include the index creation time – though to be fair, that’s probably the primary reason to care about count performance. Even then, I’d probably rather solve this with a recurring background job than take on the write overhead of a columnstore index just to get sub-100ms rowcounts.

        Given the tradeoff, compatibility level 150 is the best option if SQL 2019 or Azure SQLDB is available.

        • Yes, the point of this entire post – as one might guess from the title – was to make the relevant queries fast.

          If one did not have that need, then one might not embark on this journey.

          Not everyone might gather that, though. The folks who might not gather that from the title would have my condolences, though, because they’re the kind of people who would eventually electrocute themselves when sticking their naked fingers in an electric socket because they didn’t have a warning label on it.

          I sure am happy that I’m not friends with those kinds of morons.

  • Branko Dimitrijevic
    December 26, 2019 2:11 pm

    Indexed view on COUNT_BIG(*) should be even faster. I’m not sure about the consequences for concurrency, though.
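
    Something like this, I imagine – the view & index names are made up, and grouping by VoteTypeId with COUNT_BIG(*) keeps the view indexable:

      CREATE VIEW dbo.VotesCountByType
      WITH SCHEMABINDING
      AS
      SELECT VoteTypeId, COUNT_BIG(*) AS RowCnt
      FROM dbo.Votes
      GROUP BY VoteTypeId;
      GO
      CREATE UNIQUE CLUSTERED INDEX CUX_VotesCountByType
          ON dbo.VotesCountByType (VoteTypeId);
      GO
      -- the count becomes a scan of a tiny pre-aggregated structure:
      SELECT SUM(RowCnt) FROM dbo.VotesCountByType WITH (NOEXPAND);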

  • Thanks Brent…learned something new.
    At first I thought “Oh Brent is surely going after APPROX_COUNT_DISTINCT”….
    Could you do another blog on this new way of counting as well?
    I would be interested in your opinion on use cases for it.

    • Martin – ah no, counting distincts (even the new approx function) is way slower than just counting the number of rows. I don’t need a whole post to answer that one: that new function makes sense where you need distincts, but not exact accuracy, and you’re willing to sacrifice a little accuracy for speed.

      I haven’t personally run into that use case, but I trust that there are folks like analysts who have.
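
      For reference, the usage is a one-liner – the column here is just an example:

        SELECT APPROX_COUNT_DISTINCT(UserId) AS ApproxDistinctVoters
        FROM dbo.Votes;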

      • Since I don’t know just how “approximate” that function is, I’ll probably never use it. I hate stuff like that.

  • Hi Brent, I don’t have the Stack Overflow db on my laptop to check the scripts, so I have a question.
    In step 4 you said “It still uses the BountyAmount index and does the same number of reads as #3”,
    but in the results you wrote “Pages read: 694,379” – that’s the same as #2, isn’t it?

  • For the sake of completeness: The fastest way to do a SELECT COUNT(*) without any limiting WHERE / JOIN would be to use the metadata:

    SELECT SUM(dps.row_count) AS Row_Count
    FROM sys.dm_db_partition_stats dps
    WHERE dps.object_id = OBJECT_ID('dbo.votes')
      AND dps.index_id <= 1; -- only heap or clustered index

    This will do only 6 reads / 1 ms CPU / 0 sec duration, (almost) regardless of how big the table is (a few more reads if you are using partitioning).

    Because it is a bit harder to write / remember, you could create a function / procedure / snippet / auto-text or whatever you prefer to make your daily work easier 🙂

    • Thomas – I’m not a big fan of that because the DMVs aren’t guaranteed to honor isolation level requests. Over the years, I’ve hit several DMVs that just flat out didn’t honor READ UNCOMMITTED or NOLOCK, and if you tried to get data out of them, you’d cause blocking that then in turn blocked other user queries. For example, there’s this gem: https://github.com/BrentOzarULTD/SQL-Server-First-Responder-Kit/issues/2176

      • BWAAA-HAAAA!!!! Sounds like a great way to find other, ummmmm… “troublesome code” (OMG! It actually caused physical pain to be so politically correct there!).

      • SomeOldDBANamedDan
        January 4, 2023 6:51 pm

        It’s those sneaky functions embedded underneath some of these DMVs that appear to run under a different isolation context no matter what you use – at least the ones that have given me trouble blocking/getting blocked. I don’t see any functions used with sys.identity_columns, at least as far as what’s shown in the plan, so that one is a head-scratcher.

  • I think it’s funny…. most of the time, the count is out of date as soon as it completes.

