Blog

Getting started with table partitioning is difficult; there are many decisions to make after you’ve decided to partition data. Correctly configuring partitioning is critical to the long term performance and stability of the database from both a querying and data modification standpoint. There are three key considerations that database architects should devote time to before embarking on a partitioning implementation: the design of the partitioning key, a thorough study of querying practices, and an examination of data modification practices.

Partitioning Key

Keys are important

A critical component of database partitioning is the partitioning key; the partitioning key is used to identify the location of data in the partitioned table. The partitioning key is also used during query execution to avoid table scans by eliminating partitions. The difficulty comes in designing a good partitioning key.

SQL Server’s table partitioning only allows for a single column to be used as a partitioning key, but this is not as limiting you might think. Table partitioning is not limited in the type of column that is used; it is possible to use a computed column for the partitioning key. The advantage of using a computed column is that we can combine multiple columns and create a “smart key” to effectively identify a row’s partition. A smart key is a key that naturally occurs in your database; it could be something similar to the VIN of a car or it could be a combination of a date and a warehousing code (e.g. “ASDF:20120810″). While many database architects discourage the use of natural keys (and thus smart keys) in OLTP systems, natural keys play an important role in table partitioning because they can provide an easy way to split data into logically discrete chunks.

The design of the partitioning key is also important since once data has been partitioned to disk, the only way to change the partitioning scheme is to move the data into new tables. A significant amount of scripts and automation are typically built up around data partitioning and a change to the partitioning scheme can result in a significant amount of development work as well as a significant amount of I/O.

Querying Practices

The most important consideration for designing the partitioning key comes from how data is queried. A simple scan of sys.sql_modules and querying a combination of sys.dm_exec_cached_plans and sys.dm_exec_sql_text will reveal commonly used query criteria. If all queries feature a common set of search criteria (sales territory and date), then it becomes easy to select a partitioning key.

Understanding how data is queried is critical for the success of table partitioning. If queries do not explicitly query on something that amounts to the partitioning key, it will be important to re-write queries so that they will use the partitioning key. This will ensure that the smallest amount of data is touched during any single query’s execution. When using a computed column to create a partitioning key it may be necessary to update queries to include a search on the partitioning key through dynamic SQL or by generating the queries in the application tier.

It may not be possible to rewrite all queries to use the partitioning key. This is normal – when aggregating data across the entire business, it may not be desirable to confine queries to a single partition. While large queries are unavoidable, it’s best to confine queries to the smallest number of partitions possible to reduce I/O and potentially increase the accuracy of optimizations. It may even be necessary, in some cases, to force partition elimination using the $partition function.

Data Modification

Finally, special consideration should be given to how data is modified in the database. OLTP applications typically modify a small number of rows in a single transaction. Data warehouses, on the other hand, modify large amounts of data at once. Although these different data modification patterns do not directly change how data is partitioned, data modification patterns should influence the decisions around partitioning. For instance, partitioning is typically used for data warehousing because a partition swap is a metadata change – since a partition swap is a metadata change, an entire partition can reloaded and index fragmentation can be avoided.

The data warehouse scenario is the easiest approach to partitioning – a new partition is loaded and appended to a table while existing partitions are modified outside the current table and the newly partitions are swapped in place. For OLTP systems, partitioning may still be used as a way to spread write load across multiple drives.

Conclusions

Table partitioning is not a project that should be entered into lightly – it is a major structural change for the database and a major software change for database developers.

↑ Back to top
  1. Great Post. I’ve used partitioning in a data warehouse and the concern was more for rolling off partitions than for querying. I do have one question. You say that “the only way to change the partitioning scheme is to move the data into new tables,” but I have used the method of dropping the clustered index and adding a new clustered index with a different partition scheme. Is this not best practice and should I use the method of transferring to a table that has the new schema configured?

    • These are essentially two ways of talking about the same thing. Either solution works, it’s just a matter of how explicit you want to be about getting the work done. I prefer to be incredibly explicit in the steps that I write so that I know (and my predecessors/the auditors know) exactly what’s happening and what the potential consequences are.

  2. A further consideration is the fact that in a partitioned table the partition column has to be included in any unique or primary key.

    Either you accept that your unique/primary key now has the potential to be non-unique or you design your key generator so that a calculation on the primary key results in your partition key.

    To give an example, in an OLTP system I may have a rolling 30 days of data in the front-end systems. Being able to use partition switching to get rid of the obsolete data is a God send in terms of speed of operation and in the reduced IO footprint. The downside is that my PK for an order header goes from being simply OrderID to OrderID & OrderDate. Theoretically I could get the same OrderID on two separate dates.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>