DBA Training Plan 17: Should You Partition Your Tables?

In the last episode, I talked about knowing when it’s time to scale out: identifying when our data is getting to be so large that we have to split it across multiple servers, and I explained why that is so challenging. But what about table partitioning – SQL Server’s ability to break up a single table into smaller ones on the same server?

Great volumes have been written about table partitioning. It’s a complex feature and you can read for days to just understand how you might apply it. But will it improve performance for you? Table partitioning produces great benefits for some applications, but causes giant headaches for others.

What Table Partitioning Does

Table partitioning allows tables or indexes to be stored in multiple physical sections: a partitioned index is like one large index made up of multiple little indexes. Each chunk, or partition, has the same columns – just a different range of rows. In theory, SQL Server handles this transparently for you: a partitioned heap, clustered index, or non-clustered index can be referenced as a single structure although it’s stored in independent physical partitions. In other words, in theory you don’t need to change any code in the calling applications. (Is this true in reality? More on this later.)

Let’s say we want to partition the Stack Overflow Posts table – that’s where the questions and answers are stored. Every question & answer is considered a Post. I’m going to use the 50GB 2013 version of the Stack Overflow database because the Posts table is starting to grow: it’s 17,142,169 rows taking up 37GB space on disk – that’s most of the database, so breaking it up will be a performance boost, right?

Let’s say we want to partition them by CreationDate, and we want to store each year’s data in a separate file. We’ll need to start by creating filegroups for each year:

You’ll notice in this example that I’m using the same drive (M) for all of the files – in partitioning, though, it’s common to put the different years of data in different drives, and even stripe the data across multiple files. Let’s keep it simple for this demo though.

Next, I need to create a partition function and a partition scheme to tell SQL Server which data belongs on which drive/file:

Partitioning pros will note that I’m violating all kinds of best practices. Partitioning pros should also note that I’m trying to teach the basics of partitioning in a really short amount of space. Partitioning pros should go write their own blog posts.

Finally, we move the table over onto the new storage structure:

To pull that off, I had to drop the primary key.

This table – like most tables – has a clustered primary key, and SQL Server doesn’t let me just move the primary key over to the new storage structure. Instead, I have to:

Time to partition my wine collection

Time to partition my wine collection

  • Remove any foreign key relationships that reference this table
  • Drop the primary key
  • Create a new clustered index on the partition scheme (which splits the data across multiple files)
  • Create a primary key again
  • Recreate the foreign key relationships

For more details about why I had to do all that, see this excellent Stack answer by Remus Rusanu, and make sure to read the linked Microsoft article, Special Guidelines for Partitioned Indexes.

See, partitioned indexes come with a whole lot of gotchas:

<sigh>

Partitioned tables work, but…they’re a lot of hard work.

So should you partition your tables?

If you’re dealing with a 1TB+ data warehouse that can do sliding window loads, possibly.

If you’re dealing with a smaller one, or you can’t do sliding window loads, partitioned views are usually a better fit.

If you’re doing transactional workloads, and you can’t guarantee that the partitioning key will be in most of your WHERE clauses, then no, partitioning isn’t a good fit. Plain ol’ index and query tuning will be a better match.

Previous Post
What Are Your Coworkers Learning Right Now?
Next Post
DBA Training Plan 18: Managing Concurrency

11 Comments. Leave new

  • I’ve only come across two real-world situations where partitioning actually made sense. It’s a TON of work, but in the end it significantly reduced my query times for that specific table.

    Both times involved a data warehouse with data spread over decades. Both tables were over a T of data. All queries within those specific tables utilized the date fields in the where clause. Query times dropped by orders of magnitude and partitioning was ultimately worth the price of admission, but it’s sure a pain when dealing with backups and restores now.

    Reply
  • I partition the index in the ReportServer catalog that stores the blob data for PowerBI reports and then store the file group on a different volume. Wish MS would let you put the blob data for the olap data sources in another database…

    Reply
  • I’ve also done this a few times. e.g. Using a smart insert/update/delete trigger. It works pretty good and can easily reduce performance issues. With the amount of big data out there today, this is an essential.

    Reply
  • John Daskalakis
    September 5, 2019 12:41 am

    Thank you for the excellent post!
    A common misconception is that partitioning is mostly intended for performance optimization. Actually, the most useful application of partitioning is for easy archiving of large sets of data (i.e. switch-out & switch-in of partitions).

    Reply
  • Thanks for making me crack up with this paragraph: “Partitioning pros will note that I’m violating all kinds of best practices. Partitioning pros should also note that I’m trying to teach the basics of partitioning in a really short amount of space. Partitioning pros should go write their own blog posts.”

    Reply
  • Marcelo Barbieri
    July 15, 2020 7:42 am

    On SQL Server 2017 Std, I have an SSIS package that runs once a day on a 24×7 mission critical operational database to archive the largest transactional tables. The data is moved to tables on a separate archive database and the process runs smoothly with minimum locking contention on the live db (deletes are spread through a long period of time). Having a separate archiving database gives us many advantages: backup databases separately; refresh testing environments with only current data; improve query performance on current data, amount others. The only drawback was code change (to add logic to union on archived data when requested by the user on the UI) and a slower cross-database query when needing to fetch archived data.
    We are upgrading to Enterprise edition and considering table partitioning for data archiving, but the more I read about it the more worried I get in relation to the implementation and maintenance overhead. The only real advantage I can see so far is the transparency to the application, not requiring code changes.
    Did any of you use table partitioning for data archiving? Is it really worth it on OLTP systems? I got the point mentioned above about doing it for OLAP. Thanks very much in advance.

    Reply
  • Can i suggest a 4th example here?

    “So should you partition your tables?
    If you’re dealing with a 1TB+ data warehouse that can do sliding window loads, possibly.

    If you’re dealing with a smaller one, or you can’t do sliding window loads, partitioned views are usually a better fit.

    If you’re doing transactional workloads, and you can’t guarantee that the partitioning key will be in most of your WHERE clauses, then no, partitioning isn’t a good fit. Plain ol’ index and query tuning will be a better match.

    If you’re dealing with a 1TB+ OLTP with transactional workload, and you can no longer maintain your indexes due to sheer size, and you don’t have an archiving solution just yet, then maybe…

    I’m in the process of copying billions of rows over to a partitioned version of a production table on a 6Tb database, without downtime (cos i don’t have that option), in order to have maintainable indexes.
    We need unicity and the partitioned column isn’t part of the unique columns, so switching in/out wont be possible.
    Burying our head in the sand didnt make it go away, so we’ve gone for partitioning.
    The DataWarehouse part will be offloaded to an Azure ‘copy’ in the nearish future so i’m not too concerned about WHERE clauses and partition elimination.

    Thank you for your dedication and fun value while sharing this knowledge. It makes our job that much more fun and really, really helps!

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.

Menu