Great volumes have been written about table partitioning. It’s a complex feature, and you can read for days just to understand how you might apply it. But will it improve performance for you? Table partitioning produces great benefits for some applications, but causes giant headaches for others.
How do you know if you should invest your time in table partitioning?
I can help.
SQL Server Table Partitioning: The Basics
Let’s nerd out for a bit on what table partitioning does in SQL Server. First of all, this is an Enterprise Edition feature. (Cha-ching! $$$.) You can test it in Developer Edition, but if you want to use it in production, you gotta make sure it’s worth the licensing costs as well as your time.
Table partitioning allows tables or indexes to be stored in multiple physical sections: a partitioned index is like one large index made up of multiple little indexes. Each chunk, or partition, has the same columns, just a different range of rows. Table partitioning is “transparent.” This means a partitioned heap, clustered index, or nonclustered index can be referenced as a single structure even though it’s stored in independent physical partitions. In other words, in theory you don’t need to change any code in the calling applications. (Is this true in reality? More on this later.)
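To make the “range of rows” idea concrete, here is a minimal Python model of what a partition function does. It maps each value of the partitioning column to a partition number, roughly what T-SQL’s `$PARTITION` function reports. This is a sketch, not SQL Server’s implementation. It assumes sorted boundary values, and the monthly boundary dates are made up. SQL Server’s `RANGE LEFT` and `RANGE RIGHT` options decide which side of a boundary the boundary value itself lands on, and the `range_right` flag mimics that choice.

```python
from bisect import bisect_left, bisect_right
from datetime import date

def partition_number(value, boundaries, range_right=True):
    """Return the 1-based partition number for value.

    boundaries must be sorted ascending. With RANGE RIGHT, a boundary value
    belongs to the partition on its right; with RANGE LEFT, to the one on its left.
    """
    if range_right:
        return bisect_right(boundaries, value) + 1  # count of boundaries <= value
    return bisect_left(boundaries, value) + 1       # count of boundaries < value

# Hypothetical monthly boundaries for a table partitioned by sale date.
boundaries = [date(2012, 1, 1), date(2012, 2, 1), date(2012, 3, 1)]

print(partition_number(date(2011, 12, 31), boundaries))                   # 1
print(partition_number(date(2012, 1, 1), boundaries))                     # 2
print(partition_number(date(2012, 1, 1), boundaries, range_right=False))  # 1
print(partition_number(date(2012, 3, 15), boundaries))                    # 4
```

Three boundary values produce four partitions, which is always one more partition than there are boundaries.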
A Textbook Example of Table Partitioning
Contoso Corporation’s Froyo Division has a 2TB database named FroyoReports. Each day, 10 million rows of sales data are loaded into a table named FroyoSales. Contoso Corp has employees worldwide who query the data using SQL Server Reporting Services. Reports run against FroyoReports 24x7, although there is a two-hour window each day when the load is significantly lighter. 95% of reports run against the most recent two months of data, and the DBA team controls, and can tune, the queries run by each report. The Froyo DBA team needs to keep only 13 months of data in the FroyoSales table. As a safety precaution, they prefer to keep three additional months of data online, but they don’t want reports to access the older data.
Life used to be tough for the Froyo DBA team. Each night as data was loaded, reports repeatedly blocked inserts. To help alleviate blocking, some reports were modified to use NOLOCK hints. This meant that reports sometimes contained partial data for the most recent day, which caused problems: users were never really sure when data was finished loading and when it was safe to run reports. The Froyo DBA team deleted old data on weekends, and that process had problems too. The deletes caused additional blocking and slowed performance significantly. To solve these problems, the Froyo team implemented table partitioning, partitioning the FroyoSales table by date.
Each night, the Froyo team loads data with an automated process. First, it loads new fact data into a fresh, empty table named FroyoSalesStaging.
Next, it adds indexes and constraints to FroyoSalesStaging so its structure matches FroyoSales.
Then it switches the single partition out of FroyoSalesStaging and into the partitioned table FroyoSales. (This involves a few commands to prepare the partitioned table’s metadata before the switch, but we’re going for an overview here.) The switch itself takes only a moment, and then all the new data is visible to users.
The Froyo team has also automated how they remove old data. Each night the process switches the oldest day, which is now past the 13-month limit, out of FroyoSales and into a table named FroyoSalesArchive. (Similarly, there are a few commands to clean up FroyoSales’s metadata after the switch out.)
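This nightly routine is often called a “sliding window.” Below is a toy Python model of it. It assumes one partition per day and approximates “13 months” as 395 days; both are assumptions for illustration. Handing the staging list over by reference stands in for the metadata-only switch. The index and constraint step and the metadata preparation commands are omitted. `PartitionedTable` and `nightly_load` are hypothetical names, not SQL Server APIs.

```python
from datetime import date, timedelta

# Assumption: "13 months" approximated as 395 days for this sketch.
RETENTION_DAYS = 395


class PartitionedTable:
    """Toy model of a table partitioned by day (one partition per date)."""

    def __init__(self):
        self.partitions = {}  # date -> list of rows

    def switch_in(self, day, staging_rows):
        # Like a real switch in, the target partition must be empty.
        if self.partitions.get(day):
            raise ValueError(f"target partition for {day} is not empty")
        # Hand over the staging list itself: no rows are copied,
        # mimicking a metadata-only switch.
        self.partitions[day] = staging_rows

    def switch_out(self, day):
        # Detach the whole partition and return its rows to the caller.
        return self.partitions.pop(day)


def nightly_load(sales, archive, load_day, new_rows):
    """One night of the sliding window: stage, switch in, switch old days out."""
    staging = list(new_rows)            # 1. load a fresh staging "table"
    sales.switch_in(load_day, staging)  # 2. switch the new day in
    cutoff = load_day - timedelta(days=RETENTION_DAYS)
    for day in sorted(d for d in sales.partitions if d <= cutoff):
        archive[day] = sales.switch_out(day)  # 3. move expired days to the archive


sales, archive = PartitionedTable(), {}
start = date(2012, 1, 1)
for i in range(400):
    day = start + timedelta(days=i)
    nightly_load(sales, archive, day, [(day, n) for n in range(3)])

print(len(sales.partitions), len(archive))  # 395 5
```

After 400 simulated nights, the table holds 395 days and the five oldest days have moved to the archive. No rows were copied along the way.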
Handy, isn’t it?
Check Out Our Table Partitioning Poster to See How Partitioning Works
It can be complicated to explain how all the components of partitioning work together, so I’ve created a poster. The poster isn’t about best practices; it’s meant to show how partition functions, partition schemes, filegroups, and files fit together. For example, the poster shows two partition schemes. Not every implementation needs more than one partition scheme per partition function; the poster just shows you the option.
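Here is the same chain in miniature, as a Python sketch. A partition function turns a value into a partition number, and a partition scheme turns that partition number into a filegroup. Files, which live inside filegroups, aren’t modeled. As on the poster, two schemes share one function. All names here (`ps_tiered`, `FG_Hot`, and so on) are hypothetical, and the integer year boundaries are made up.

```python
from bisect import bisect_right

# Hypothetical partition function: RANGE RIGHT on an integer year key.
boundaries = [2010, 2011, 2012]           # 3 boundaries -> 4 partitions
partition_count = len(boundaries) + 1

# Two hypothetical partition schemes on the same function. Each maps
# partition number i (1-based) to the filegroup at list position i - 1.
schemes = {
    "ps_tiered": ["FG_Archive", "FG_Archive", "FG_Warm", "FG_Hot"],
    "ps_all_primary": ["PRIMARY"] * partition_count,
}

# SQL Server rejects a scheme with fewer filegroups than the function has
# partitions; this check mirrors that rule.
for name, filegroups in schemes.items():
    if len(filegroups) < partition_count:
        raise ValueError(f"{name} maps too few filegroups")

def filegroup_for(value, scheme):
    partition = bisect_right(boundaries, value) + 1  # function: value -> partition
    return schemes[scheme][partition - 1]            # scheme: partition -> filegroup

print(filegroup_for(2009, "ps_tiered"))       # FG_Archive
print(filegroup_for(2012, "ps_tiered"))       # FG_Hot
print(filegroup_for(2012, "ps_all_primary"))  # PRIMARY
```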
Download the free “Table Partitioning” poster.
The Main Features of Table Partitioning— And the Gotchas
Here are the big attractions for table partitioning, along with the fine print.
SWITCH That Partition
As you can see above, a whole partition can be switched into or out of the table. Because a switch changes metadata rather than copying rows, it allows extremely fast loading and removal of large amounts of data. This is, in my opinion, the biggest benefit of partitioning.
There are a couple of gotchas to be aware of. Switching partitions in and out can be very fast, but it requires an exclusive schema modification lock (Sch-M). The switch has to wait until it can get that lock, so your load or cleanup can be blocked by other activity on the table, potentially for a very long time. Also, all of your enabled nonclustered indexes must be “partition aligned” to switch partitions. This means each of those indexes must be partitioned on the same key as the table, and a unique index must include the partitioning key in its key columns. If you need to maintain uniqueness on a set of columns that doesn’t include the partitioning key (which is often the case in OLTP environments), this can pose a problem.
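Why does uniqueness get tricky? Each partition of an aligned index is stored separately, so a per-partition index can only see duplicates within its own partition. The Python sketch below uses made-up rows partitioned by month of `SaleDate` and a hypothetical `OrderId` column. It shows a per-partition uniqueness check passing while the table as a whole contains a duplicate.

```python
from collections import Counter

# Made-up rows of (SaleDate, OrderId), partitioned by month of SaleDate.
partitions = {
    "2012-01": [("2012-01-05", 101), ("2012-01-20", 102)],
    "2012-02": [("2012-02-03", 101), ("2012-02-11", 103)],
}

def order_id(row):
    return row[1]

def order_id_with_partition_key(row):
    return (row[0], row[1])

def unique_within_each_partition(parts, key):
    # Models what a per-partition index could check: each partition's
    # index only sees the rows stored in that partition.
    return all(len({key(r) for r in rows}) == len(rows) for rows in parts.values())

def unique_across_table(parts, key):
    # What a table-wide uniqueness requirement actually means.
    counts = Counter(key(r) for rows in parts.values() for r in rows)
    return all(n == 1 for n in counts.values())

print(unique_within_each_partition(partitions, order_id))  # True
print(unique_across_table(partitions, order_id))           # False (101 repeats)
print(unique_within_each_partition(partitions, order_id_with_partition_key))  # True
print(unique_across_table(partitions, order_id_with_partition_key))           # True
```

Adding `SaleDate` to the key makes the per-partition check and the whole-table check agree, but the guarantee changes: `OrderId` 101 can now legally appear twice.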
Query Performance on Partitioned Tables: Partition Elimination and Beyond
SQL Server tries to identify when it can use limited parts of a partitioned table. The SQL Server query optimizer may direct a query to only a single partition, multiple partitions, or the whole table. Using fewer partitions than the entire table is called “partition elimination.”
Statistics are maintained for the entire partitioned table or index; you don’t get additional histogram steps for each partition. This means the SQL Server query optimizer may still have a very hard time estimating how much data your query will return, and this difficulty will increase as your table grows. The result may be slow queries.
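Some rough arithmetic shows why this gets worse with size. A SQL Server statistics histogram has at most 200 steps, no matter how many partitions the table has. The sketch below uses the Froyo example’s 10 million rows per day and, as an assumption, treats 13 months as 395 days. It computes the average number of rows each step must summarize. Real steps aren’t evenly sized, so treat these figures as averages only.

```python
MAX_HISTOGRAM_STEPS = 200  # upper limit on steps in one statistics histogram

def avg_rows_per_step(total_rows):
    # One histogram covers the whole partitioned index, so each step
    # summarizes more rows as the table grows.
    return total_rows // MAX_HISTOGRAM_STEPS

daily_rows = 10_000_000  # the Froyo example's daily load
for days in (30, 395):   # one month vs. roughly 13 months (assumed)
    print(days, avg_rows_per_step(daily_rows * days))
# 30 1500000
# 395 19750000
```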
Queries generally perform better when they filter on the partitioning key in their criteria (aka the “where clause”), because that makes partition elimination possible. So although partitioning is “transparent” to existing applications, query tuning will almost always be required.
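Partition elimination follows from the partition function. If a query constrains the partitioning key to a range, only the partitions covering that range need to be read. With no such predicate, all of them do. The Python sketch below models that decision using hypothetical monthly `RANGE RIGHT` boundaries on a `SaleDate` column. The optimizer’s actual rules are more involved, so this only illustrates the idea.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical RANGE RIGHT boundaries: the first day of each month in 2012.
BOUNDARIES = [date(2012, month, 1) for month in range(1, 13)]  # 12 boundaries
PARTITION_COUNT = len(BOUNDARIES) + 1                           # 13 partitions

def partition_of(sale_date):
    # RANGE RIGHT: a boundary date belongs to the partition on its right.
    return bisect_right(BOUNDARIES, sale_date) + 1

def partitions_to_read(sale_date_from=None, sale_date_to=None):
    """Partitions a query must touch, given an optional SaleDate range.

    With no predicate on the partitioning key, nothing can be eliminated.
    """
    first = 1 if sale_date_from is None else partition_of(sale_date_from)
    last = PARTITION_COUNT if sale_date_to is None else partition_of(sale_date_to)
    return list(range(first, last + 1))

print(partitions_to_read(date(2012, 11, 1), date(2012, 12, 31)))  # [12, 13]
print(len(partitions_to_read()))                                   # 13
```

A two-month report, like most of the Froyo reports, touches 2 of the 13 partitions. A query that doesn’t filter on `SaleDate` touches all 13.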
Individual partitions may:
- Be rebuilt individually, for clustered and nonclustered indexes alike.
- Be set to read-only via their filegroup, which gives you options to optimize backups.
- Live on different disk subsystems: less frequently accessed data can sit on slower disk, while frequently accessed data sits on faster disk, all within the same table! With some virtualization and SAN solutions, you can even move a partition to faster or slower disk online.
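Because partitions can be rebuilt individually, maintenance can target only the partitions that need it. Here is a hedged Python sketch that turns per-partition fragmentation readings into `ALTER INDEX ... REBUILD PARTITION = n` statements. The index name, table name, readings, and 30% threshold are all assumptions for illustration. In practice, the readings might come from `sys.dm_db_index_physical_stats`.

```python
def rebuild_statements(index, table, fragmentation, threshold=30.0):
    """Return one REBUILD PARTITION statement per partition at or above the
    (assumed) fragmentation threshold, instead of rebuilding the whole index."""
    return [
        f"ALTER INDEX {index} ON {table} REBUILD PARTITION = {p};"
        for p, pct in sorted(fragmentation.items())
        if pct >= threshold
    ]

# Made-up readings: partition number -> fragmentation percent.
# Older partitions are static; the newest one churns.
readings = {1: 2.0, 2: 1.5, 3: 0.8, 4: 12.0, 5: 37.5}

for statement in rebuild_statements("cx_FroyoSales", "dbo.FroyoSales", readings):
    print(statement)
```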
There are a few things to be aware of:
- Be careful when splitting partitions. If the partition you split already contains rows, SQL Server may need to move data, and performance can be very slow.
- In SQL Server 2005, 2008, and 2012, individual partitions can be rebuilt only offline, which is a bummer if your database is 24×7. An entire partitioned index can be rebuilt online, but then you’re rebuilding every partition.
- Setting a filegroup to read-only doesn’t eliminate lock management overhead; that benefit applies only to a read-only database.
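On splitting: SQL Server can split an empty partition as a quick metadata change, but splitting a partition that already holds rows may require moving data. A common habit is to add each new boundary while the partition it lands in is still empty, for example a spare partition at the far end of a date range. The Python sketch below checks for that condition using a deliberately conservative rule and assumes `RANGE RIGHT` boundaries. `split_is_cheap` is a hypothetical helper, and the dates are made up.

```python
from bisect import bisect_right
from datetime import date

def partition_index(boundaries, key):
    # 0-based partition for key under RANGE RIGHT boundaries.
    return bisect_right(boundaries, key)

def split_is_cheap(boundaries, loaded_keys, new_boundary):
    """Conservative rule (assumption): treat a split as cheap only when the
    partition that new_boundary falls into holds no rows yet."""
    if new_boundary in boundaries:
        raise ValueError("boundary already exists")
    target = partition_index(boundaries, new_boundary)
    return not any(partition_index(boundaries, k) == target for k in loaded_keys)

boundaries = [date(2012, 1, 1), date(2012, 2, 1), date(2012, 3, 1)]
loaded = [date(2012, 1, 10), date(2012, 2, 10)]  # nothing loaded for March yet

print(split_is_cheap(boundaries, loaded, date(2012, 4, 1)))   # True
print(split_is_cheap(boundaries, loaded, date(2012, 2, 15)))  # False
```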
Columnstore Indexes and Table Partitioning
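Columnstore indexes are a really hot feature in SQL Server 2012. They store data by column rather than by row and are optimized for blazing-fast scans and aggregations over large tables. Although a table with a columnstore index will be read-only, you can still switch partitions into it, so partition switching gives you a way to load new data.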
When Is a Table Big Enough to Partition?
After covering the basics of table partitioning, this is usually the first question people have: “Is my table big enough?” My response is: “Let’s talk about why you’re interested in table partitioning. What is the problem you’re experiencing?”
Tell Me Where Your Table Hurts
When people investigate table partitioning in SQL Server, they’re usually having a problem scaling up their database. The problem can take many different forms, including one or more of the following:
- “Slow” queries that return small amounts of data
- “Slow” queries that return large amounts of data
- “Slow” loading of data
- Blocking between readers and writers (inserts or updates)
- Long-running index maintenance jobs (or an inability to run them at all because they would take so long)
“Slow” is of course highly relative. Here it means “my users are complaining” or “my webserver is timing out” or “something is failing and paging me in the middle of the night.” Often, the tables in question are being used for a mixture of OLTP activity and reporting activity.
My approach is to talk to the team and find out what the experience of the problem is like. Literally, “What keeps you up at night about this table?”
How Is Your Overall Health?
Prescribing table partitioning is like recommending significant surgery: you don’t want someone to go under the knife unless it’s the best way to make things better. I look at the overall health of the system. Where are the current bottlenecks? What are we waiting on? How healthy are the individual components? How are the queries currently performing, and what do their query plans look like? What patterns show up in the queries that are running? I also look at the structure of the tables and indexes in the context of those queries.
What Performance Characteristics Do You Need?
How many records do you want to load a day? How many records will you be deleting a day in six months? How many new clients is your business expecting to bring on, and what impact will they have on reads and writes? The number of expected clients can be tricky to translate into database activity. A SQL Server health check can produce metrics on current activity that you can use for projections.
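Once you know the load rate and the retention window, answering “how many rows will we delete a day in six months?” is mostly arithmetic. This Python sketch assumes a constant daily load and a fixed retention window. Both are simplifications, since client growth would change the daily load. It uses the Froyo example’s 10 million rows per day, with 13 months approximated as 395 days.

```python
def projected_volume(daily_rows, retention_days, days_elapsed):
    """Rows held and rows purged per day after days_elapsed days of loading.

    Assumes a constant daily load and a fixed retention window; real growth
    (new clients, seasonality) would change daily_rows over time.
    """
    rows_held = daily_rows * min(days_elapsed, retention_days)
    # Purging starts only once the retention window is full.
    purged_per_day = daily_rows if days_elapsed > retention_days else 0
    return rows_held, purged_per_day

print(projected_volume(10_000_000, 395, 180))  # (1800000000, 0)
print(projected_volume(10_000_000, 395, 500))  # (3950000000, 10000000)
```

Under these assumptions, nothing is purged at six months. Once the window fills, each night’s purge matches each night’s load.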
How’s The Health of Your Budget?
Table partitioning isn’t cheap: the feature is not available in SQL Server Standard Edition. Further, with SQL Server 2012 we’re losing the option of CAL-based licensing for Enterprise Edition. In some cases, Enterprise Edition is already in place because of other feature requirements. In others, budget constraints make non-Enterprise options attractive.
How Many Queries Can You Tune?
Whether or not you have the flexibility to tune queries is a big differentiator in how you choose to scale up your application. On the one hand, table partitioning is “transparent” because the names of the partitioned objects don’t change. On the other hand, after you partition you’ll want to tune queries to get partition elimination and the best possible query plans, and sometimes you need to get a little creative. If your ability to tune queries is limited (or nonexistent), the structure of your tables and the way your queries are currently written will play a huge role.
What’s The Best Approach to Scaling Your Application?
Here’s my secret: I don’t answer the question of “Should I use table partitioning?” Instead, I answer the question “What is the best way to scale this application?”
The right approach for your scalability problem may contain table partitioning— perhaps by itself, perhaps in combination with other technologies. In the right application, table partitioning can be truly awesome. But we also may be able to scale your application up in another way— perhaps more cheaply, perhaps more quickly, or perhaps in a way that includes built-in geo-diversity. It all depends on your database health, performance requirements, budget, and flexibility.
If you got this far and you’re still interested in table partitioning, maybe it’s the right fit for you!