When SQL 2005 was in beta, I clapped my hands and jumped for joy the first time I played with partitioning. I’ve done a lot of data warehouse work, and partitioning lets the DBA break down massive tables into smaller, easy-to-manage chunks. We can put heavily accessed parts of the table onto fast storage, and less-accessed data onto slower, cheaper storage. The cool part is that it’s completely transparent to applications (as long as you don’t have to change primary & foreign keys): they don’t have to know the table is even partitioned.
We implemented it almost as soon as SQL 2005 came out, and I kept wondering why more people didn’t take advantage of this feature. Sure, it’s not particularly intuitive, and there’s no GUI for it in SQL Server Management Studio, but it does work.
“Can partitioning help us?”
Fast forward to yesterday: I met with Mark Schmulen and David Lyman of NutshellMail.com. It’s a slick solution for those of us who work behind corporate firewalls that prevent us from accessing personal email sites like Gmail and Yahoo, as well as personal networking sites like Facebook and Myspace. When I worked for JPMorgan Chase, I was frustrated that I couldn’t get my personal email because I had a network of DBA buddies who all knew and used my personal email address. We would ask each other for support from time to time, and I liked being able to help out. At JPMC, though, the only way I could do that was contact all my friends and have them switch to using my corporate email – which I couldn’t check when I was outside of the office, so that was a lose-lose situation for me. NutshellMail gets around that by checking your personal email accounts on a schedule you select, and then sending you digest emails to your work address. Makes perfect sense, and I’d use it in a heartbeat – even if they weren’t using SQL Server for storage. (Bonus!)
We talked scalability, and they’d heard partitioning may be a good answer for future growth. I was pleasantly surprised to hear that the word about partitioning is getting out, and that was the first time I said no, partitioning isn’t the right answer for this one particular case. Having talked through that, I wanted to touch base on some of the reasons here.
Partitioned tables require Enterprise Edition.
Standard Edition can’t do partitioning, and Enterprise is a few times more expensive than Standard. On a 2-CPU box, we’re talking about a $30-$50k difference in costs. If we’re only using Enterprise for the partitioning, then there’s a better return on investment if we put that $30-$50k into faster storage instead. This is especially relevant for servers with direct attached storage, even in a raid 10 config. I wouldn’t consider partitioning unless the server had at least 10-20 drives for the data array alone.
Partitioning best practices means using both fast and slow storage.
In the case of a data warehouse, a sales table is loaded every night with the previous day’s sales data. The most recent data is under heavy load because it’s constantly being accessed by both the ETL processes and the daily reports – because users really want to know what happened in the last few days to a month. We have to keep years and years of sales data on hand for curious users who want to check trends, though, and we don’t want to keep all of that data on extremely fast (and extremely expensive) storage. (Yes, partitioning gurus, I know there’s work involved in moving data from one array to another, but sometimes it’s worth it.) If we partitioned by date range, we could keep the new data on fast storage, and shuffle the older data off to slower storage.
Partition functions and schemes are easy to design incorrectly.
Before implementing partitioning for the first time, either get personalized advice from a DBA who’s done it several times, or get an identical development system for repeated testing. I had the luxury of an identical QA environment where I could repeatedly test different partitioning strategies and replay the end user load to see which strategy performed best. Not everybody’s that lucky (come to think of it, I should have played the lotto more) and it’s a mistake to try it on a production system first. When I first started using it, I thought I knew exactly the right strategy for our data, and we’re talking about a schema I knew forwards and backwards. I was wrong – really wrong – but at least I found the right answer before going into production.
In summary, outside of data warehouses, I like to think of partitioning as the nuclear bomb option. When things are going out of control way faster than you can handle with any other strategy, then partitioning works really well. However, it’s expensive to implement (Enterprise Edition plus a SAN) and you don’t want to see it in the hands of people you don’t trust.
If you’re considering partitioning, you might want to check out…
Partitioning is just one way to get performance out of your large database. Here’s a few more:
- SAN Multipathing – this rarely-covered concept is the key to getting fast performance out of a SAN.
- Should SQL Server Use Shared Drives? – a reader wrote in asking how to configure his SAN for SQL Server.
- Steel Cage Blogmatch: How to Configure a SAN – Jason Massie and I debate the shared-vs-dedicated-disks question.

Jeremy Marx June 12, 2008 | 8:58 am
Great post, Brent. I’ve been considering date-range partitioning one extremely large (28% of db) transaction table on an OLTP. We have no performance issues at present, but I was hoping to reduce the weight of the backups.
BrentO June 12, 2008 | 9:18 am
Thanks, sir! You mentioned something that I happen to be working on: I’m going to do a blog post on filegroup backups with partitioning. You’re exactly right that it’ll make backups faster with date-range partitioning based on dates where rows are inserted.
Jeremy Marx June 12, 2008 | 10:33 am
Awesome! Looking forward to reading the post.
RLynn January 8, 2009 | 7:28 am
Hello Brent. I'm at the last reason you posted….. and truly hoping not to do it wrong. I agree – partitions are fabulous in terms of performance and usability of VLDB's! But, I haven't touched them hands-on since early v2000 (very problematic, very high maintenance). Right now, I am trying to construct a daily partition strategy for v2008, where the data is massive and more than warrants it. Daily partitions… I'm reading everything i can get my hands on; ie., 'best practices suggest as many filegroups as you have partitions'…. curious, do you know where I may find an example of a daily strategy (function, scheme, etc.)?
Brent Ozar January 8, 2009 | 7:34 am
Hi! The best one I know isn't out yet – Ron Talmage and crew are writing a whitepaper for Microsoft on it, and it's supposed to be out pretty soon, but keep your eyes peeled for it. I'll blog about it the instant it hits virtual newstands.
RLynn January 8, 2009 | 7:42 am
Thanks, Brent. I will keep my eyes open…. in the interim, let's hope she doesn't trip up too badly.
Fortunately, i've got two high-end resources side by side… one for prod, and one to play in. Unfortunately, i need to deploy asap. cross your fingers…. and pls let me know if you hear of anything.
Thanks again.
hamid February 22, 2009 | 10:00 am
I'm being amazing for useing partitioning for my case or not I must to store about 1000 record per a second this data is about location of mobile nodes, these data make my database too huge do you think i must partitioning my databa
se or not(I have so much reporting in future).
Kevin Wong December 3, 2009 | 5:35 pm
Another reason to make use of partitioning is to take advantage of Microsoft’s cloud (SQL Azure), which has a 10GB per partition limit (as at Dec 2009).
Partitioning seems to be a core requirement actually, to using SQL Azure as I’d bet that most production databases for people reading your blog will be way more than 10GB in size. And in the cloud scenario the licensing cost for SQL Server Enterprise Edition is no longer an issue.
Also, you’ve highlighted the need from a disk performance perspective, but it’s nice as a way of implementing some sort of data archiving over time without burdening the application layer.
Brent Ozar December 3, 2009 | 5:57 pm
Kevin – ouch, sorry, it doesn’t work that way. SQL Azure doesn’t support SQL Server partitioning. You have to design differently – you have to do sharding. Sharding and SQL Server partitioning are similar in theory, but not in execution. You can’t code your app the same way to take advantage of both.
In addition, it’s not really useful for “some sort of data archiving” because as of the current SQL Server version, you can’t back up partitions separately. In addition, unless you specify the partitioned dates in the where clause, SQL still has to hit all of the partitions (current and archive), so you don’t really gain anything there either.
Thanks for your comment though!
Mike January 29, 2010 | 10:24 am
Our gigantic table doesn’t have a DATE column, but rather an INTERVAL column that is joined with an INTERVAL table for the date. Any thoughts on an effective partitioning scheme for our DATEless gigantic table? (e.g. Partitioning on ranges of values in and IDX column?)
Thanks in advance!
Brent Ozar January 29, 2010 | 10:25 am
Sure, absolutely, you can partition on the interval column using ranges. Your best bet is to try several different partitioning schemes/functions and benchmark each of them with your own query loads. That way you’ll find out what works best for your own queries. SQL Server’s engine doesn’t always do a good job of partition elimination based on the query, and your results will definitely vary, so internal testing is the way to go.
Venky Subramaniam April 29, 2010 | 4:44 pm
Hi Brent,
A wonderful article on partitioning…Continuing in the same vein, I have a few questions that you can throw some light on, or offer suggestions as necessary:
Scenario:
I have partitioned 5 large tables across 10 partitions (with 10 filegroups) and with a set of 5 – 6 users the performance seems to be adequate, with minimal conflicts (like deadlocks etc), but I want to implement this scheme that can scale to say 300+ users. With regard to this I have the following questions:
– What is the maximum number of partitions that works optimally? In other words, even though SQL Server allows 1000 partitions per table, it would become an administrative nightmare were we to deploy such a scheme.
Degradation Concerns:
– What would be the maximum (but practical) number of partitions that one can deploy before one has to worry about degradation in performance and concurrency?
– If the number of users access the same partition, what would be the negative impact on the overall scheme (in terms, again of performance)?
– Does a multi-CPU environment enhance the overall performace? Of course, we can suggest to the client to use RAID based storage with data striped across multiple drives?
– Any other aspects related to partitioning that can help me for implementation?
MORE DETAILS that might help you hone on the answers if any :
1. The largest table holds about 50 million rows as of now (but this is only a test db..) In other words it can grow to a billion.
2. As of now I have about 10 partitions defined out of which only the first 6 are occupied. The number 10 is not a holy number or restrictive in any sense. I have just chosen this number to be at a manageable level.
3. The data density in other 4 tables can vary from 1 – 2 million.
4. Since the tables involved participate in JOIN operations, I have currently used the same partition function for all and the data distribution is based on a ‘derived’ partition_column called partition number. This way the associated tables are in the same partition. Each partition is associated with one filegroup that comprises of many files.
5. I have also aligned the indexes for the tables across the same partitions.
6. As for the available hardware, I need to contact the client as to what they have and whether they can upgrade if needed. I needed a ballpark infrastructure that I can use to recommend. In other words, at this juncture, I cannot state with confidence how powerful their servers are, but given a recommendation about the requirements, I am sure they would comply. I DO know that their servers are multi-CPU.
7. As for the RAID configuration to be suggested, would RAID 1 and RAID 10 serve the purpose? Both configurations conform to redundancy
Does this information suffice or do I need to provide more?
Please come back with your suggestions or questions and I shall try to provide more information if needed. While I can find several articles of partitioning (even from Microsoft MVPs), I would like answers in particular, to address my concerns expressed above.
With best regards,
Venky
Brent Ozar April 29, 2010 | 4:47 pm
Venky – hi! Boy, have you got a lot of good questions there. Here’s the thing – this would require way more work than I can put into a comment. You’re asking for a level of detail that’s usually done by a consultant rather than put onto a public web page. If you’d like to talk to a consultant, I can give you a few good references if you’re interested. Drop me an email at brento@brentozar.com if that’s something you’d like to pursue.
Pete December 28, 2011 | 1:55 pm
Good response Brent. You must get that a lot. Funny how some people expect you to put in the time and effort to solve their problems… for free!… just because you happened to write a nice article.
Brent Ozar December 28, 2011 | 1:59 pm
Thanks man! Yeah, I do get that a lot, heh.
Adam Sottosanti June 1, 2010 | 12:19 pm
Hey Brent,
All the articles I have read on table partitioning use terms like “huge tables”, or “large tables”, but I have never read a definition of what a huge or large table is. Do you have any suggestions on data size or row count minimums on when to start considering partitioning a fact table? I manage a data warehouse with a handful of fact tables ranging from 5-10 million rows, (data size 5-8 GB each). I don’t consider these to be very large and definately not huge, and have never found very clear answers on whether or not tables of this size would benefit much from partitioning.
Thanks for any advice you may have,
Adam Sottosanti
Brent Ozar June 1, 2010 | 12:48 pm
Adam – good question. The Wikipedia entry on very large databases (VLDBs) suggests that it’s over 1 terabyte or several billion rows. When I’m working with databases and I see a table with over 10 million rows, I want to start paying closer attention to storage strategies, indexing strategies, and recovery strategies. 10 million rows isn’t much at all these days – but the thing is, you want to have your strategies in place before the table grows to 100 million rows. At 100 million rows, it’s still relatively easy to change strategies, but you want to have experience before you get there. At a billion rows, it’s too late to start thinking about these strategies – they need to be solidly in place.
Will January 5, 2011 | 3:19 pm
Hey Brent! Me again in a different context…
We have a really large database within which a single table predominates. The table is steadily inserted/updated over the current month and there are updates to previous months, but they are infrequent. This appears to be a typical case from what I’ve seen and pretty closely mirrors Kimberly Tripp’s example from here: http://msdn.microsoft.com/en-us/library/ms345146%28v=sql.90%29.aspx.
The composite key in my case is an integer id and an integer representing the month. For the sake of argument lets say time begins on Jan 2011. The first month is 1, Feb 2011 is 2…. The integer id’s are repeated each month, i.e. they represent the same thing with some changes for each successive month.
The challenge for me is wrapping my head around the idea of implementing possibly hundreds of files, one for each month. Another way I thought of dealing with this is having say 10 files and storing the first n/10 months in partition 1, the next n/10 in partition 2 and so on. What do you think?
Brent Ozar January 5, 2011 | 3:31 pm
Will – unfortunately, this is beyond something I can work on in the comments. It’d take me working directly with you to talk about your needs, the data access patterns, the size of the data, etc. This is the kind of thing we usually get involved with during a design project. If you’re interested in working with us, shoot me an email at BrentO@BrentOzar.com. Thanks!
Kasper Bengtsen January 5, 2011 | 4:37 pm
Since this is one of the top articles on SQL Server partitioning I’ll throw in this one:
Be aware that under certain conditions partitioning will not be an option due to the following bug in the query optimizer:
http://connect.microsoft.com/SQLServer/feedback/details/240968/partition-table-using-min-max-functions-and-top-n-index-selection-and-performance
The bug results in certain queries performing a complete index sort, rather than just using the known order of an index to determine the result.
This is very unfortunate if your table contains billions of rows.
Examples of simple queries affected by the bug:
A. “SELECT MAX(ColumnN) FROM PartitionedTable” (with a non-clustered aligned index on ColumnN)
B. “SELECT TOP 10 ColumnX, ColumnY, ColumnZ FROM PartitionedTable ORDER BY ColumnX, ColumnY, ColumnZ” (with a clustered aligned index on ColumnX, ColumnY, ColumnZ)
In our case the above queries resulted in a difference in execution duration of milliseconds vs. hours, for the exact same queries, on non-partitioned and partitioned tables respectively.
Microsoft has no plans to fix this bug in current releases of SQL Server, and has no plans to fix it in the next major either.
So for an enterprise solution that allows for you to:
)
-prioritize data in the same table on physical storage (because all data has to be available online, but only some is frequently queried)
-keep the physical prioritization transparent to the layers above the database (because you already have existing applications using the database, and/or is using ORMs in the data access layer)
-keep your indexes aligned to utilize PARTITION SWITCH (because your maintenance window for moving less relevant data to low priority storage is limited)
-use TOP N/ORDER BY style queries (because who doesn’t
… you will have to look elsewhere.
Vestergaard April 25, 2011 | 3:27 pm
Yesterday i wrote a few posts on this topic. The first post was even answered by Brent. They are gone now. It’s not like they were controversial. Brent did you delete my posts ?
Brent Ozar April 25, 2011 | 3:27 pm
Vestergaard – sorry, we were in the midst of a web server migration, and those comments were left after the database was exported. They’re gone. Sorry about that!
Pingback: Partitioning in SQL Server « Dev Blog
Umesh August 30, 2011 | 12:38 am
We have a table with 160 million records. First thing is there is no clustered index on table reason being data will pumped into this table from other source because of which performance will be slow to insert the data. The problem we are facing is report generation is slow when application tries to select the data from this table. Can we take advantage of partitioning so that data retrival is faster for report generation.
Brent Ozar August 30, 2011 | 5:58 am
Umesh – the first thing I’d check is an indexing strategy, perhaps including a clustered index. While you can partition a heap (a table with no clustered indexes), it’s unlikely that partitioning will make report generation run faster.
Pingback: SQL Server A to Z - Partitioning | Cleveland DBA
Iain February 21, 2012 | 4:39 am
“…it’s completely transparent to applications…” I’d disagree with this – one of the requirements for table partitioning is that the PK (and indeed any other unique index) on a partitinoed table must cover the partitioning column, this also means that any related FKs (which references the PK or other unique index of a partitioned table) must also be amended. Hence any applications which scan metadata of the database to determine relationships (e.g. analysis services, reporting services, ect) see these modified relationships and modify their bahaviour accordingly, Analysis Services for example produces different queries based on these modified relationships…
Brent Ozar February 21, 2012 | 5:41 am
Iain – great point! I’m going to edit the post to clarify that it’s completely transparent as long as the primary and foreign keys aren’t changed.
Paul McKanzy November 4, 2012 | 8:56 am
In my project we use Rewius to do partitioning on Standard edition database and it work quite well
Brent Ozar November 4, 2012 | 8:58 am
Paul – can you clarify on what Rewius is? Never heard of it before, and Google isn’t coming up with results.
Sérgio Eustáquio April 5, 2013 | 2:22 am
Sorry for digging up an old thread but it might be useful for someone.
I think he’s talking about this: https://sites.google.com/site/rewiues/home
Great article by the way.
Brent Ozar April 5, 2013 | 7:38 am
Ah – just the web site alone makes me a little nervous.
Michael November 26, 2012 | 12:31 pm
Hello,
I have a table that I am currently planning to partition. the partitioning key is an integer MonthID that translate to specific Months. Example – 1 is 1/1990, 2 is 2/1990, 3 is 3/1990 and so on… currently we have 283 of these Months in the table and the plan is to go up to 508. The primary key is made up of the MonthID and AccountNumber. The reports and select statements may access data of any Month period. Does anyone have any advise on this as far as index planning etc…?
Thanks in advance.
Brent Ozar November 26, 2012 | 12:34 pm
Hi, Michael. What’s the problem you’re trying to solve with this design?
Michael November 26, 2012 | 4:13 pm
Hi Brent,
I just wanted to verify the indexing should stay as is once the table is partitioned. Also, I read several blogs that certain designs could cause more trouble than good. Our tables is currently at 2.3 billion records and the loading process deletes then replaces records or inserts new records. I just wanted to make sure that doing the table partition will not hurt the data selection performance.
Thanks for any pointers.