DBA Training Plan 17: Should You Partition Your Tables?

In the last episode, I talked about knowing when it’s time to scale out: identifying when our data is getting to be so large that we have to split it across multiple servers, and I explained why that is so challenging. But what about table partitioning – SQL Server’s ability to break up a single table into smaller ones on the same server?

Great volumes have been written about table partitioning. It’s a complex feature and you can read for days to just understand how you might apply it. But will it improve performance for you? Table partitioning produces great benefits for some applications, but causes giant headaches for others.

What Table Partitioning Does

Table partitioning allows tables or indexes to be stored in multiple physical sections: a partitioned index is like one large index made up of multiple little indexes. Each chunk, or partition, has the same columns – just a different range of rows. In theory, SQL Server handles this transparently for you: a partitioned heap, clustered index, or non-clustered index can be referenced as a single structure although it’s stored in independent physical partitions. In other words, in theory you don’t need to change any code in the calling applications. (Is this true in reality? More on this later.)

Let’s say we want to partition the Stack Overflow Posts table – that’s where the questions and answers are stored. Every question & answer is considered a Post. I’m going to use the 50GB 2013 version of the Stack Overflow database because the Posts table is starting to grow: it’s 17,142,169 rows taking up 37GB space on disk – that’s most of the database, so breaking it up will be a performance boost, right?

Let’s say we want to partition them by CreationDate, and we want to store each year’s data in a separate file. We’ll need to start by creating filegroups for each year:

You’ll notice in this example that I’m using the same drive (M) for all of the files – in partitioning, though, it’s common to put the different years of data in different drives, and even stripe the data across multiple files. Let’s keep it simple for this demo though.

Next, I need to create a partition function and a partition scheme to tell SQL Server which data belongs on which drive/file:

Partitioning pros will note that I’m violating all kinds of best practices. Partitioning pros should also note that I’m trying to teach the basics of partitioning in a really short amount of space. Partitioning pros should go write their own blog posts.

Finally, we move the table over onto the new storage structure:

To pull that off, I had to drop the primary key.

This table – like most tables – has a clustered primary key, and SQL Server doesn’t let me just move the primary key over to the new storage structure. Instead, I have to:

Time to partition my wine collection

Time to partition my wine collection

  • Remove any foreign key relationships that reference this table
  • Drop the primary key
  • Create a new clustered index on the partition scheme (which splits the data across multiple files)
  • Create a primary key again
  • Recreate the foreign key relationships

For more details about why I had to do all that, see this excellent Stack answer by Remus Rusanu, and make sure to read the linked Microsoft article, Special Guidelines for Partitioned Indexes.

See, partitioned indexes come with a whole lot of gotchas:

<sigh>

Partitioned tables work, but…they’re a lot of hard work.

So should you partition your tables?

If you’re dealing with a 1TB+ data warehouse that can do sliding window loads, possibly.

If you’re dealing with a smaller one, or you can’t do sliding window loads, partitioned views are usually a better fit.

If you’re doing transactional workloads, and you can’t guarantee that the partitioning key will be in most of your WHERE clauses, then no, partitioning isn’t a good fit. Plain ol’ index and query tuning will be a better match.

Previous Post
What Are Your Coworkers Learning Right Now?
Next Post
DBA Training Plan 18: Managing Concurrency

7 Comments. Leave new

  • I’ve only come across two real-world situations where partitioning actually made sense. It’s a TON of work, but in the end it significantly reduced my query times for that specific table.

    Both times involved a data warehouse with data spread over decades. Both tables were over a T of data. All queries within those specific tables utilized the date fields in the where clause. Query times dropped by orders of magnitude and partitioning was ultimately worth the price of admission, but it’s sure a pain when dealing with backups and restores now.

    Reply
  • I partition the index in the ReportServer catalog that stores the blob data for PowerBI reports and then store the file group on a different volume. Wish MS would let you put the blob data for the olap data sources in another database…

    Reply
  • I’ve also done this a few times. e.g. Using a smart insert/update/delete trigger. It works pretty good and can easily reduce performance issues. With the amount of big data out there today, this is an essential.

    Reply
  • John Daskalakis
    September 5, 2019 12:41 am

    Thank you for the excellent post!
    A common misconception is that partitioning is mostly intended for performance optimization. Actually, the most useful application of partitioning is for easy archiving of large sets of data (i.e. switch-out & switch-in of partitions).

    Reply
  • Thanks for making me crack up with this paragraph: “Partitioning pros will note that I’m violating all kinds of best practices. Partitioning pros should also note that I’m trying to teach the basics of partitioning in a really short amount of space. Partitioning pros should go write their own blog posts.”

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.

Menu
{"cart_token":"","hash":"","cart_data":""}