Getting started with table partitioning is difficult; there are many decisions to make after you’ve decided to partition data. Correctly configuring partitioning is critical to the long term performance and stability of the database from both a querying and data modification standpoint. There are three key considerations that database architects should devote time to before embarking on a partitioning implementation: the design of the partitioning key, a thorough study of querying practices, and an examination of data modification practices.
A critical component of database partitioning is the partitioning key; the partitioning key is used to identify the location of data in the partitioned table. The partitioning key is also used during query execution to avoid table scans by eliminating partitions. The difficulty comes in designing a good partitioning key.
SQL Server’s table partitioning only allows for a single column to be used as a partitioning key, but this is not as limiting you might think. Table partitioning is not limited in the type of column that is used; it is possible to use a computed column for the partitioning key. The advantage of using a computed column is that we can combine multiple columns and create a “smart key” to effectively identify a row’s partition. A smart key is a key that naturally occurs in your database; it could be something similar to the VIN of a car or it could be a combination of a date and a warehousing code (e.g. “ASDF:20120810″). While many database architects discourage the use of natural keys (and thus smart keys) in OLTP systems, natural keys play an important role in table partitioning because they can provide an easy way to split data into logically discrete chunks.
The design of the partitioning key is also important since once data has been partitioned to disk, the only way to change the partitioning scheme is to move the data into new tables. A significant amount of scripts and automation are typically built up around data partitioning and a change to the partitioning scheme can result in a significant amount of development work as well as a significant amount of I/O.
The most important consideration for designing the partitioning key comes from how data is queried. A simple scan of
sys.sql_modules and querying a combination of
sys.dm_exec_sql_text will reveal commonly used query criteria. If all queries feature a common set of search criteria (sales territory and date), then it becomes easy to select a partitioning key.
Understanding how data is queried is critical for the success of table partitioning. If queries do not explicitly query on something that amounts to the partitioning key, it will be important to re-write queries so that they will use the partitioning key. This will ensure that the smallest amount of data is touched during any single query’s execution. When using a computed column to create a partitioning key it may be necessary to update queries to include a search on the partitioning key through dynamic SQL or by generating the queries in the application tier.
It may not be possible to rewrite all queries to use the partitioning key. This is normal – when aggregating data across the entire business, it may not be desirable to confine queries to a single partition. While large queries are unavoidable, it’s best to confine queries to the smallest number of partitions possible to reduce I/O and potentially increase the accuracy of optimizations. It may even be necessary, in some cases, to force partition elimination using the
Finally, special consideration should be given to how data is modified in the database. OLTP applications typically modify a small number of rows in a single transaction. Data warehouses, on the other hand, modify large amounts of data at once. Although these different data modification patterns do not directly change how data is partitioned, data modification patterns should influence the decisions around partitioning. For instance, partitioning is typically used for data warehousing because a partition swap is a metadata change – since a partition swap is a metadata change, an entire partition can reloaded and index fragmentation can be avoided.
The data warehouse scenario is the easiest approach to partitioning – a new partition is loaded and appended to a table while existing partitions are modified outside the current table and the newly partitions are swapped in place. For OLTP systems, partitioning may still be used as a way to spread write load across multiple drives.
Table partitioning is not a project that should be entered into lightly – it is a major structural change for the database and a major software change for database developers.