Implied Predicate and Partition Elimination

>implying

Way back when, I posted about turning the Votes table in the Stack Overflow database into a Partitioned View.

While working on related demos recently, I came across something kind of cool. It works for both partitioned tables and views, assuming you’ve done some things right.

In this example, both versions of the table are partitioned in one year chunks on the CreationDate column.

That means when I run queries like this, neither one is eligible for partition elimination.

Why? Because the CreationDate column in the Posts table could have any range of dates at all in it, so we need to query every partition for matches.

How do we know that? Well, for the partitioned table, because all 12 partitions were scanned.

12! 12 years! Kinda.

For the partitioned view, well…

That’s not cool.

I think it’s obvious what’s gone on here.

Eliminationist

Are you ready for the cool part?

If I add a predicate to the JOIN (or WHERE clause) for the Posts table (remember that the Votes table is partitioned, and the Posts table isn’t), SQL Server is so smart, it can use that to trim the range of partitions that both queries need to access.

The partitioned table plan eliminates 8 partitions, and the seek predicate is converted to the Votes table.

GR8

And the partitioned view is also smart enough to pick up on that and only scan the partitions I need.

Pants: On

Expected?

Logically, it makes total sense for this to happen. The optimizer is pretty smart, so this works out.

Thanks for reading!

Previous Post
Coming in SQL Server 2019: Approximate_Count_Distinct
Next Post
How to Test Your Corruption Alerts

11 Comments. Leave new

  • Brendan Morgan
    November 9, 2017 9:22 am

    What is this, “>” in your queries? 🙂

    Reply
  • Henrik Staun Poulsen
    November 10, 2017 10:15 am

    I’ve learnt the hard way that SQL Server is very picky about the data type of your ‘20140101’ date/string.
    (If you use a variable instead of a constant)
    Do not think that date and datetime and smalldatetime are the same.

    Reply
  • OK I’m an admitted nube, but think ‘incremental stats’ is worth mentioning (SQL 2014 and up)…since your posts elaborates on ‘minimizing efforts and performing actions on only relevant partitions’.

    So a nugget for nubes like myself… ensure your indexes on your partitioned tables have ‘incremental statistics’ turned on: ALTER INDEX [] ON [dbo].[] REBUILD WITH (STATISTICS_INCREMENTAL = ON);

    so instead of recreating stats for 12 years worth of data, Erik can have SQL update the stats on only the applicable partitions when necessary, which most of the time will be on the current years…. and not so much on the ones 9, 10, 11 years ago. (Do I understand that correctly Erik?)

    I know you can manually update stats on the latest partition (based on using the 12 year partition in the example above) with: Update statistics dbo.tablename with resample on partitions (12)

    But here is a question for you Erik… If the default threshold for an autostats is 500 + 20 percent’, then when incremental stats is turned on, ***is the ‘500 + 20% default autostats rule’ now applicable at the partition level too ?***….. meaning will a partition update its stats at that 20% +500 mark, or does your whole table still have to hit that 20% + 500 mark before autostats gets invoked? If not, do you see any cons in having the autostats setting ‘update stats’ at the partition level?

    ….still learning, lots and lots to go.

    Reply
  • So where is the cool part ? This is just partition elimination.

    Reply
    • It’s a really big gif. You may have to wait for it to load. Try just staring at it for an hour or so. Maybe while you’re staring you can read the post a few more times to make sure you get it.

      Best of luck!

      Reply
  • Gerald Britton
    November 24, 2017 8:03 am

    Depending on how you set up partitioning, elimination may or may not work as expected. For example, I came across a shop where the partitioning column was computed. That eliminated partition elimination. However, adding the magic $Partition function restored the partition elimination. I had to do it this way. Make the partition column static, then:

    — Caller passes in a variable with the desired partition column data

    DECLARE @eliminator int;
    SET @eliminator = $Partition.MyPartitionFunction(@VarFromCaller)

    SELECT …
    WHERE $Partition.MyPartitionFunction(PartitionColumn) = @eliminator

    You can also do it with a range (BETWEEN) or list (though that requires dynamic SQL).

    Reply
  • I don’t know yet for sql version 2016 and above, but as for 2014 partition elimination is very tricky and always comes in the fight with parameter sniffing…
    I needed to use trick (I cast all parameter/value into the correct datatype in the where clause) to avoid sql sniffing the date as a parameter and so take the whole table instead of the correct partitions. When you have 12 partitions like here, this is peanuts, but when you begin to have above 5000 partitions, this becomes to be a problem (we maxed out the number of partitions… when you arrive near 15000, even the system tables, view,.. becomes to be very slow).
    Also when it comes to choose partition, sql tends to take one extra partition to the right (if partition right) because the boundary range even strictly < will turn to not cool )
    For the incremental statistics, if you have a big table, this will become a problem to add incremental statistics after it’s creation, so it needs to be added in the begin and of course, you need to update those statistics every time you work with your partitions. Update the statistics is time consuming (with high volume), but only happens once.
    In my experience, sql use them as a whole, but somehow keeps tracking of those numbers internally, you always can query the system tables to see how much record are stored for each partition.
    As Eric showed here, when working with partitions, you need to add the most precise predicate you can or you end up by having sql scan the whole thing… a simple select max(date) can be very slow, we ended up by calculating those numbers by querying the partition system objects instead of the table self…

    Reply
  • Something that drove me nuts and might help someone else trying to reproduce this exercise: partition elimination doesn’t work when using forced parameterization. I don’t know if that applies always (I’ll test it later) but just wanted to share this.

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.