Blog

Great volumes have been written about table partitioning. It’s a complex feature and you can read for days to just understand how you might apply it. But will it improve performance for you? Table partitioning produces great benefits for some applications, but causes giant headaches for others.

How do you know if you should invest your time in table partitioning?

I can help.

SQL Server Table Partitioning: The Basics

Let’s nerd out for a bit on what table partitioning does in SQL Server. First of all, this is an Enterprise Edition feature. (Cha-ching! $$$.)  You can test it in developer edition, but if you want to use it in production, you gotta make sure it’s worth the licensing costs as well as your time.

Table partitioning allows tables or indexes to be stored in multiple physical sections— a partitioned index is like one large index made up of multiple little indexes. Each chunk, or partition, has the same columns– just a different range of rows. Table partitioning is “transparent”. This means a partitioned heap, clustered index, or non-clustered index can be referenced as a single structure although it’s stored in independent physical partitions. In other words, in theory you don’t need to change any code in the calling applications. (Is this true in reality? More on this later.)

An Textbook Example of Table Partitioning

Contoso Corporation’s Froyo Division has a 2TB database named FroyoReports. Each day, 10 million rows of sales data are loaded into a table named FroyoSales. Contoso Corp has employees worldwide who query the data using SQL Server Reporting Services. Reports are run against FroyoReports 24 x 7, although there is a two hour window each day where there is significantly lighter load. 95% of reports run are against the most recent two months of data, and the DBA team controls and can tune the queries run by each report. The Froyo DBA team needs to maintain only 13 months of data in the FroyoSales table. As a safety precaution, they prefer to keep three additional months of data online, but do not want reports to access the older data.

Life used to be tough for the Froyo DBA team. Each night as data was loaded, reports repeatedly blocked inserts. To help alleviate blocking, some reports were modified to have NOLOCK hints. This meant that sometimes reports contained partial data for the most recent day, which caused problems. Users were never really sure when data was finished loading and when it was safe to run reports. The Froyo DBA team deleted old data on weekends, and that process also had problems. There was additional blocking and deletes slowed performance significantly. To solve these problems, the Froyo team implemented table partitioning. They partitioned the FroyoSales table by date.

Each night, the Froyo team loads data with an automated process. First, it loads new fact data into a fresh, empty table named FroyoSalesStaging.

A staging table loaded with data

A staging table loaded with data

Next, it adds indexes and constraints to FroyoSalesStaging so its structure matches Froyo sales.

Preparing a staging table to be switched in to a fact table

Preparing a staging table to be switched in to a fact table

Then, they switch the single partition out of FroyoSalesStaging and into the partitioned table FroyoSales. (This involves a few commands to prepare the metadata for the partitioned table prior to the switch– we’re going for an overview here.) This switch takes a brief moment, then all the new data is visible to users.

Switching a partition from a staging table to a fact table

Switching a partition from a staging table to a fact table

The Froyo team has also automated how they remove old data. Each night they switch the oldest day– which is now past their 13 month limit– out from FroyoSales and into a table named FroyoSalesArchive. (Similarly, there’s a few commands to clean up metadata for FroyoSales after the switch out.)

Switching a table partition to an archive table

Switching a table partition to an archive table

Handy, isn’t it?

Check Out Our Table Partitioning Poster to See How Partitioning Works

It can be complicated to explain how all the components of partitioning can work together, so I’ve created a poster. This poster isn’t about best practices, it’s to help show how partition functions, partition schemes, filegroups, and files work together. For example, the poster shows two partition schemes. Not every implementation needs more than one partition scheme for a partition function– the poster just shows you the option.

SQL Server Table Partitioning Poster

Download the free “Table Partitioning” poster.

The Main Features of Table Partitioning— And the Gotchas

Here are the big attractions for table partitioning, along with the fine print.

SWITCH That Partition

As you can see above, a whole partition can be switched into the table or switched out, allowing for extremely fast loading and removal of large amounts of data. This is, in my opinion, the biggest benefit of partitioning.

There are a couple of gotchas to be aware of. Switching in and switching out partitions can be very fast, but an exclusive lock— Called ‘SCH-M’, or Schema Modification lock— is required. This means you can get blocked from loading or removing data from your table— potentially for a very long time. Also, all of your enabled non-clustered indexes must be “partition aligned” to switch a partition in. This means the partitioning key must be part of each of those indexes. If you need to maintain uniqueness on a set of columns that doesn’t include the partitioning key (which is often the case in OLTP environments), this can pose a problem.

Query Performance on Partitioned Tables: Partition Elimination and Beyond

SQL Server tries to identify when it can use limited parts of a partitioned table. The SQL Server query optimizer may direct a query to only a single partition, multiple partitions, or the whole table. Using fewer partitions than the entire table is called “partition elimination.”

Statistics are maintained for the entire partitioned table or index— you don’t get additional steps in your histogram for each partition. This means that the SQL Server Query optimizer may still have a very hard time knowing how much data is going to be returned by your query, and this difficulty will increase as your table grows. The result may be slow queries.

Queries will perform better when you specify the partitioning key in the criteria (aka the “where clause”). So, although partitioning is “transparent,” for existing applications, query tuning will almost always be required.

Partition Management

Individual partitions may:

  • Be rebuilt individually, for clustered and nonclustered indexes alike.
  • Be set to read-only, via their filegroup— gives you options to optimize backups
  • Live on different disk sub-systems — less frequently accessed data can sit on slow disk. Frequently accessed data can sit on faster disk. All this within the same table! You can move a partition to faster or slower disk online with some virtualization and SAN solutions.

There’s a few things to be aware of:

  • You want to be careful about splitting partitions— performance can be very slow.
  • In SQL Server 2005 and 2008, individual partitions may be rebuilt offline only. An entire partitioned index may be rebuilt online— but that’s a bummer if your database is 24×7.
  • Setting a filegroup to read-only doesn’t eliminate lock management overhead— that’s only true for a read-only database.

Columnstore Indexes and Table Partitioning

Columnstore indexes are a really hot feature in SQL Server 2012. These are columnar indexes optimized for blazing fast performance. Although these indexes will be read-only, partitions may be switched in to columnstore indexes.

When Is a Table Big Enough to Partition?

After covering the basics of table partitioning, this is usually the first question people have: “Is my table big enough?”  My response is: “Let’s talk about why you’re interested in table partitioning. What is the problem you’re experiencing?”

Tell Me Where Your Table Hurts

When people investigate table partitioning in SQL Server, usually they’re having a problem scaling up their database.  What you are experiencing may take many different forms. The problem can contain one or more of the following:

  • “Slow” queries that return small amounts of data
  • “Slow” queries that return large amounts of data
  • “Slow” loading of data
  • Blocking between readers and writers (inserts or updates)
  • Long-running index maintenance jobs (or an inability to run them at all because they would take so long)

“Slow” is of course highly relative. Here it means “my users are complaining” or “my webserver is timing out” or “something is failing and paging me in the middle of the night.”  Often, the tables in question are being used for a mixture of OLTP activity and reporting activity.

My approach is to talk to the team and find out what the experience of the problem is like. Literally, “What keeps you up at night about this table?”

How Is Your Overall Health?

Prescribing table partitioning is like recommending significant surgery— you don’t want someone to go under the knife unless it’s the best way to make things better. I look at the overall health of the system. Where are the current bottlenecks? What are we waiting on? How healthy are the individual components? How are the queries currently performing, and what do the query plans look like? What patterns are in use in the queries which are running? I also look at the structure of the tables and indexes in the context of the queries.

What Performance Characteristics Do You Need?

How many records do you want to load a day? How many records will you be deleting a day in six months? How many new clients is your business expecting to be bringing on, and what is the estimated impact that will have on reads and writes on your system? The number of expected clients can be tricky to translate to database activity. A SQL Server health check can produce some metrics for current activity that can be used for projections.

How’s The Health of Your Budget?

Table partitioning isn’t cheap— this feature is not available in SQL Server Standard Edition. Further, we’re losing the option to have CAL-based licenses for Enterprise Edition with SQL Server 2012. In some cases, Enterprise edition is already in place because of other feature requirements. In others, budgetary constraints make looking at non-Enterprise features attractive.

How Many Queries Can You Tune?

Whether or not you have the flexibility to tune queries is a big differentiator in how you choose to scale up your application. On the one hand, table partitioning is “transparent” because the name of the partitioned objects doesn’t change. On the other hand, you want to tune queries to get partition elimination and the best possible query plans after you partition— and sometimes you need to get a little creative. The structure of your tables and how queries are currently written will play a huge role if you have a limited (or no) ability to tune queries.

What’s The Best Approach to Scaling Your Application?

Here’s my secret: I don’t answer the question of “Should I use table partitioning?” Instead, I answer the question “What is the best way to scale this application?”

The right approach for your scalability problem may contain table partitioning— perhaps by itself, perhaps in combination with other technologies. In the right application, table partitioning can be truly awesome. But we also may be able to scale your application up in another way— perhaps more cheaply, perhaps more quickly, or perhaps in a way that includes built-in geo-diversity. It all depends on your database health, performance requirements, budget, and flexibility.

Still Interested?

If you got this far and you’re still interested in table partitioning, maybe it’s the right fit for you!

↑ Back to top
  1. There are application implications when a large transcriptional table is partitioned. The unique key is now partitioned align, and unique index calls and foreign key relationships need to include the partitioned aligned column. Even worse, you can no longer guarantee a unique identifier on the table just by itself, unless the unique identifier is the only unique index and is also the partition scheme. However, most partitioning strategies involve date+timestamp in order to exercise sliding window.

    I wish SQL Server enhanced their partitioned tables to include Global Indexes, like the ones supported in Oracle for over 10 years now. It would be completely transparent for the application.

    • For readers who are new to partitioning, I would just add that you can make a unique index which doesn’t include the partition key– but that index is “non-aligned” with the partitioned table.

      Having one or more non-aligned indexes enabled on a partitioned table means that swapping partitions in and out no longer works. And that’s one of the biggest features of partitioned tables, so that’s a bummer!

      This is where people get into the really hard choices when they implement partitioning on databases that serve an OLTP role, or a mixed OLTP role. I don’t personally think that’s always a bad idea– it really depends on the use case and looking at all the options— but there typically aren’t nearly as many hard choices to make when it comes to reporting/ warehousing style of databases.

      I expect to see partitioning usage increase with columnstore indexes becoming available. it’s definitely going to be interesting!

      • For EDWs and Data Marts the implementation of partitioning is a no-brainer. Staging to table to partition switch-in is one of the best methods of appending data to the facts. Same goes for sliding windows for data archival.

        Going back to OLTP, I have come across solutions that loaded in excess of 20 million rows per table per day that were part of well normalized schema. The application was making calls that read any rows of those table for the last N days. Because our backup windows increased and due to the fact that storage was limited, we needed to re-architect to support partitioning on these tables. It was a challenge, but once in place it made data archiving an easy task, and allowed us to tier the storage for Read-Only filegroups/partitions. Backups windows decreased as it was only taking care of Read-Write Filegroups.

        Partitioning for column-store indexes are a must IMO. It will allow the DBA to create them on a staging table before switching in into the master table. Columnar-store indexes are not updatable as you already know, and any addition to the master table via DML will require a drop and recreation of the column-store index. Not ideal when you have billions of rows.

  2. Kendra:

    Thanks for the good partitioning overview.

    Should “The Froyo DBA team needs to maintain only 13 of data in the FroyoSales table.” be “The Froyo DBA team needs to maintain only 13 months of data in the FroyoSales table.”?

  3. how partitioning table can solve read write lock issue you described in “An Textbook Example”, can u be more specific ?

    because as far as i know insert operations basiclly don’t block reads(on increasing clustered index), update operations block reads with key lock but same story after partitioning .delete operations wont block reads either if you avoid lock escalation.

    • Hi Zack,

      Partitioning doesn’t change the behavior of isolation levels. The trick is that when you’re loading data into the staging table, none of the reports are reading from it yet– that staging table is completely separate. The only issue with locking you’re going to hit on the partitioned table is when you “switch in”. That requires the schema modification lock, so there could be blocking there– but it is one small, atomic, fast operation.

      Hope this helps.

  4. Kendra: Excellent Article, for our oragnization database we were thinking and few discussions whether or not to go for Partitioning, this article helps

  5. Kendra what are the plusses and minuses to using Hash partitioning on OLTP tables with large numbers of inserts on a daily basis?

    • Hi KC,

      This is a great question– and the answer could be considerably lengthy. I think it’ll make a great blog post. To give the “biggest picture” one-size-fits-all-schemas/apps answer, the first thing I would think about is this:

      What are the application requirements in terms of reading? When I’m partitioning data, I have to pick a partitioning key. If speed of targeted reads is critical as well, then I will need to have a meaningful key that I can identify quickly and use to do partition eliminations on my reads. So, depending on what I need to do with the data after I write it, I may have a lot to consider there.

      Other things to consider are just how many writes you need to do over time. If we’re talking table partitioning alone, we’re talking about tables in the same database. That means we only get one transaction log file, and there’s only ever going to be so much we can do with one log, for example– logs have limits. But before we ever get to that limit, we might hit limits with our storage, with our processor power, etc– all depends on that hardware.

      So, long story short, things like this are why I like to step back and say, “What are our eventual goals?” and really dig into what the application needs to do and work from that direction. I might end up at a solution involving table partitioning, but I might end up with other designs as well.

      Thanks for the great comment– that’s a really rich question and considerations for pros and cons there will make a great post. Stay tuned.

  6. Pingback: SQL Server table partition switching | James Serra's Blog

  7. Pingback: SQL Server table partitioning and storage aligned indexes | KRMc

  8. Hi Kendra,

    You said in your article that “You want to be careful about splitting partitions— performance can be very slow.” That is definitely true and I find myself confronted with this problem, having to extract one year worth of data from a yearly partitioned table that hasn’t been maintain (and now the last partition contains 3 years instead of one). Is there anything that can be done to speed up this process? Would shrinking and / or reorganizing the filegroup before starting the split help? How about compressing the partition before the split?

    Thanks

    • How about dribbling the rows out, say 1 or 2% a day? If the table hasn’t been maintained for 2 years, another month or two shouldn’t kill you.

  9. Kendra, do you know about cheaper alternatives for partitioning other then upgrading to the expensive Enterprise Edition?

    I hear about Rewius (www.rewius.com) which claim to provide partitioning on Standard Edition database, Do you have experience with their software?

  10. Performance is very important with any application.If your database tables have millions of records then a simple SQL query will take 3-4 mins.but ideal time for a query should be at max 5 sec. Now the Question comes “How can improve performance with large databases.“
    See this article http://techathon.mytechlabs.com/performance-tuning-while-working-with-large-database/

    • It’s funny– performance is important with the databases we get called in to look at, but there’s lots of databases where it’s not a big deal. I also find that a few million rows is pretty small change these days!

  11. Table partitioning is best way to improve database performance with large databases.If your table contain millions of records then this is highly recommended you should use partitioning.In this article, I will explain what is partitioning and how can implement partitioning with database.
    see: http://techathon.mytechlabs.com/table-partitioning-with-database/

    • I can’t disagree with your leading sentence more. Table partitioning can also be the best way to DESTROY application performance in large databases. Please keep that in mind.

  12. You guys talk about partitioning a data warehouse fact table and using partition switch to load data into it. Almost always fact tables are loaded daily in a datawarehouse. Is that common to partition the fact table by day to be able to using switching?

    • Yep, a day is a common choice for the partitioning ‘grain’. The version of SQL Server will impact the number of partitions you can have at a given time. Originally only 1,000 partitions were allowed in a partitioned object.

      As of SQL Server 2008 SP2 / SQL Server 2008 R2 SP1 and higher versions, there is support for 15,000 partitions per object, but it needs to be enabled.

      So basically, yes, daily is common, but if you need to keep more than 1,000 days then you may require some special configuration. (Although of course that could be true at other grains as well.) More info is here: http://msdn.microsoft.com/en-us/library/gg981694.aspx

      • For a multi fact table data warehouse, if each partition is stored in its own file group with at least one file, wouldn’t that create thousands of data files for the database? It sounds kind of scary :)

        • If you followed that recipe, you could indeed end up with thousands of data files– and in turn that could create some very slow startup times for your database.

          You have options about where you want to put your partitions. Many partitions can be mapped to the same filegroup (and a filegroup can have one or many files). You could go to the other extreme and have thousands of partitions all on the primary filegroup on a single file, too. All those partitions could be from one or more partitioned objects.

          The “partition scheme” is where you map out how the partitions are laid out on filegroups. And it can certainly be tricky to figure out what’s going to perform best in an environment, depending on how the partitions are used and what kinds of storage you have available.

          You can have more than one object share a partition scheme, by the way– it’s pretty flexible which is great but it makes the choices complex!

          [Edit: added a couple more details about all the options]

          • You mentioned that having many file groups/data files will impact database start up. Is that the only disadvantage? What are some of the advantages of having multiple file groups when doing partitioning or even having one file group per partition. I say that because most examples I have come across, the table is partitioned by month and each month is stored in its own file group. The one advantage I see in that is the fact that you could make older file groups read-only and reduce backup times. Is that the only advantage?

          • Storage cost and performance also factor in. Often only the more recent portions of data are queried regularly — ideally I’ll keep those on faster storage. Much older data may need to be online, but if storage costs are an issue I might want to keep that on cheaper, slower storage.

            Depending on what type of storage is in play, getting this all to work without having lots of downtime can be tricky, but there are some cool tricks if you’re using a SAN or virtualization. (And sometimes having a bit of downtime on parts of the data to get this done is perfectly fine, too.)

            Filegroups also allow more benefits/options when it comes to backup and restore. If my older partitions are not actively being written to, I can mark their filegroups read-only and then back them up more infrequently– thereby reducing space, time and resources needed for backups.

            All of this stuff adds complexity on the DBA side to make sure that you’re still getting the perf you need and that backups are being done properly. But it can be very cool.

            Edit: I should say that the startup time issue with lots of files does take a bit of work. We’re not talking about 20 files causing that issue. On most production sized boxes it’s going to be thousands of files (across all the databases). But it can definitely happen.

  13. Thanks for the quick reply Kendra, much appreciated.

    So basically, the solution will really depend on specific requriments. Here is what we are looking at :)

    - We have multiple large fact tables in our data warehouse (over 100 GB each)
    - We need to apply a retention policy and drop/archive old data
    - We load fact tables daily
    - The retention could be applied monthly or quuarterly
    - We are hoping to reduce the backup window and index management tasks

    at a high level what sort of design would you recommend in terms of partition grain and number of filegroups/files. Based on your previous comments I was thinking of partitioing by day to improve loading of fact tables and creating monthly file groups for each fact tables so that older file groups could be marked as read-only therefore helping the backup process. In terms of retention, if it is done every month, we would need to switch all daily partition within that historical month to staging tables and drop/move them.

    • There’s just so much that goes into it that there’s no way I can make a recommendation based on a conversation or blog comments. I would want to start off before the assumption that table partitioning is the right fit here– there might be a different schema option that could work and be better for licensing and support.

      I certainly don’t mean to blow off your question in any way. This kind of architectural recommendation is something that typically takes a multi-day engagement to make because of all the factors that are involved.

  14. Hi Kendra,
    One general question about file groups when table partitioning is implemented: is it common to create/drop file groups and files dynamically using scripts which maintain table partitions? In other words are file groups normally pre-created and maintained when needed or dynamically created and dropped by scheduled scripts when new data comes in?

    • Most people pre-create a certain amount of partitions, and have an automated process that regularly checks and pre-creates more. That way if for some reason the process doesn’t run for a while you don’t end up with a bunch of data in the wrong place.

  15. Where have you posted your helper functions for partitioning? They look very useful.

    • Hi Jim!

      Glad the views look good!

      Those will get posted with the video from today’s webcast. Look for them later this week. (I’m my own Audio Visual team and processing the video takes a little time.)

      I’ll link everything up from our master partitioning page http://brentozar.com/go/partitioning when that goes live, too.

      Edit: I originally wrote “functions” instead of “views”. Maybe that coffee was TOO good this morning.

  16. Hey Kendra,

    Nice blog, but I got a question for you.

    Would you start with partitions in place if you know that your data is going to grow faster than you can respond?
    The system I am working on has the potential to grow into the Terabyte size in less than a year.

    It is a Mulit-Tennant system where the main partition key would be the “CompanyID”. That would then exend into customer and orders etc. And this is a transactional system.

    Just wonder what you think of this situation?

    Thanks!

    David

    • If your partition ID is CompanyID and you say it’s a transactional system, I’m guessing that you’re not implementing table partitioning for switching (loading data externally and moving in a whole partition quickly / moving a whole partition out quickly).

      If this is the case and you’re trying to partition for performance, I would step back and examine if it’s really the best fit for you architecturally. There may be other designs that accomplish your goals without making performance tuning so difficult over time. I would factor in not only application performance, but also RPO, RTO, and scalability across servers/licensing over time into the decision. There are lots of other options, many of which don’t lock you into EE on every SQL Server instance (including DR), and which might address your problems.

      But I absolutely agree that if you know data is going to grow fast, it’s important to ask these questions early on! I would just recommend evaluating lots of different options than table partitioning since it’s such a tricky beast for performance in OLTP.

      • Hi Kendra,

        Thanks for the response. I realize there are other methods. I am more in favor of distributing the data via filegroups and better indexing. The reason to look a partitioning is that the may be up tp 8,000 concurrent users from 15000 companies. Each of those companies may have large volumes of data in key tables. And we are tied to a response time SLA to populate the screen content.

        This is a Medical office type application on the cloud. Something like the appointment table can have hundreds of million rows. Other tables are as large or even larger.

        My original plan was to have the application read from a Data Mart and have all transactions pass through a separate OLTP process server that would valifdate the transaction and then pump it into the data mart.

        Then the idea came up for partitioning the tables for faster select perfomance from the data mart came into the discussion. I still am not convinced, as you aren’t, that we should partition, but I have to chase the idea down and see.

        While we are currently still on SQL Server 2008 R2, it occurs to me that moving to 2012 with the in-memory capability would be a better solution. But maybe not.

        David

  17. Pingback: Best Practices + Table Partitioning: Merging Boundary Points | SQLServerPedia

  18. Hi Kendra,

    I have a database that is taking up approximately 66% of space on a network drive that also hosts other databases. The culprit is one table which is currently 40GB in size. I was considering table partitioning as a solution to archive off some of this data onto a different file group with more space available. The table is primarily used for reads. The data is to be retained, so I don’t really have the option of purging historical records. If I have understood things correctly, am I correct in saying table partitioning would be a reasonable solution in this instance?

    Thanks.

    • I’d step back and take a really hard look at what problems you’re trying to solve– whether it’s performance, backup and restore, or manageability. There are ways to solve problems in all of these areas without using table partitioning that are much simpler to manage in the long term than table partitioning– and also don’t necessarily come with the Enterprise Edition requirement. I’d definitely evaluate all the options, especially for a table as small as 40GB. That’s not very large by modern standards, really.

      • Thanks for your prompt response Kendra. The problem I am trying to resolve is to find a way of managing the growth of a database, which is taking up a lot of space on our existing server. Not worried at all about performance, back up/restores etc at this current stage. We have Enterprise Edition in our Live environment, which is why I was initially considering this option. Do you mind suggesting any alternative as opposed to the perceived “sledge hammer” approach I was considering? Thanks again.

        • Sure. If the database is important and I wanted consistent performance, I’d consider moving the whole thing to alternate storage. I’d also look hard at the table and indexes and identify how much of that space may be in unused or duplicate nonclustered indexes— there might be a way to reduce the space significantly just by adjusting your indexing.

          You do have the ability to move entire tables or indexes to a new filegroup without partitioning them. Just be careful mixing storage sources if you need to ensure consistent performance. That holds true whether or not you’re using partitioning, of course.

          If you’re open to EE features, you could also look at data and row compression. Like Table Partitioning, using these features means that you can’t restore a backup to a Standard Edition instance, which can sometimes impact DR, so just include that in your planning if any of the EE features turn out to be the best solution.

  19. Pingback: Don’t Drive Your DBA Crazy! | Ms SQL Girl

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>