This is not about table partitioning
With that out of the way, let’s talk about partitioned views, and then create one from a real live table in the StackOverflow database.
What is a partitioned view?
It’s a view that combines the full results of a number of physical tables that are logically separated by a boundary. You can think of the boundary the same way you’d think of a partitioning key. That’s right about where the similarities between table partitioning and partitioned views end.
For example, you have Table1, Table2, and Table3. All three tables are vertically compatible. Your view would look something like this.
CREATE VIEW dbo.Doodad AS SELECT columns... FROM Table1 UNION ALL SELECT columns... FROM Table2 UNION ALL SELECT columns... FROM Table3 SELECT * FROM dbo.Doodad
Hooray. Now you have to type less.
Partitioned views don’t need a scheme or a function, there’s no fancy syntax to swap data in or out, and there’s far less complexity in figuring out RIGHT vs LEFT boundaries, and leaving empty partitions, etc. and so forth. You’re welcome.
A lot gets made out of partition level statistics and maintenance being available to table partitioning. That stuff is pretty much automatic for partitioned views, because you have no choice. It’s separate tables all the way down.
How else are they different?
Each table in a partitioned view is its own little (or large) data island. Unlike partitioning, where the partitions all come together to form a logical unit. For example, if you run ALTER TABLE on a partitioned table, you alter the entire table. If you add an index to a partitioned table, it’s on the entire table. In some cases, you can target a specific partition, but not all. For instance, if you wanted to apply different compression types on different partitions for some reason, you can do that.
With partitioned views, you add some flexibility in not needing to align all your indexes to the partitioning key, or alter all your partitions at once, but you also add some administrative overhead in making sure that you iterate over all of the tables taking part in your partitioned view. The underlying tables can also all have their own identity columns, if you’re into that sort of thing. If you’re not, you should use a Sequence here, assuming you’re on 2012 or later.
Are partitioned views better?
Yes and no! The nice thing about partitioned views is that they’re available in Standard Edition. You can create tables on different files and filegroups (just like with partitioning), and if you need to change the data your view covers, it’s a matter of altering the view that glues all your results together.
Tables can have different columns (as long as you account for them in your view definition), which isn’t true for partitioning, and you can compress or index different indexes on different partitions differently depending on the workload that hits them. Think about reporting queries that want to touch your historical data. Again, not ‘aligning’ all your nonclustered indexes to the partitioned view boundary doesn’t necessarily hurt you here.
They do share a similar problem to regular partitioning, in that they don’t necessarily make your queries better unless your predicates include the partitioning key. Sad face. It is fairly easy to get partition elimination with the correct check constraints, as long as you include a partition eliminating element in your query predicates. If your workload doesn’t reliably filter on something like that, neither form of partitioning is going to help your queries go faster.
One problem I’ve found particularly challenging with partitioned views is getting the min/max/average per partition the way you would with normal partitioning. There are a couple good articles about that here and here, but they don’t really help with partitioned views. If anyone out there knows any tricks, leave a comment ;^}
Modifications can be tricky, too. Bulk loads will have to be directed to the tables, not the view itself. Deletes and updates may hit all the underlying tables if you’re not careful with your query predicates.
Ending on a high note, the view name can obfuscate sensitive table names from them there hackers.
From the comments!
Geoff Patterson has a Connect Item about pushing Bitmap Filters through Concatenation Operators, which would be really powerful for Partitioned Views.
If this kind of thing would suit your needs, give it an upvote over here.
Enough already, what’s it look like?
Download the script below. It assumes that you have a copy of the StackOverflow database with the Votes table, and that you’re not a StackOverflow employee running this on the production box. Things might get weird for you.
Assuming everything runs correctly, in about a minute you should have a series of tables called Votes_20XX. How many you end up with depends on which copy of the database you have. Mine has data through 2016, so i have Votes_2008 – Votes_2016.
Each table has a PK/CX on the Id column, a constraint for the year of dates in the table on the CreationDate column, and a nonclustered index to help out some of our tester queries. You can just hit f5, and the RETURN will stop before running any of the example queries.
You should see something like this when it’s done. Yay validation.
How does query work?
It’s pretty easy to see what I was talking about before. If you’re not looking at CreationDate in the WHERE clause, you end up touching a lot of tables. When you use it, though, SQL is really smart about which tables it hits.
Ranges work pretty well too.
Even when they’re broken up!
But if you don’t specify a date, you run into trouble.
You’d think this would go better, but it doesn’t. I mean, the MAX date would have to come from the… oh, forget it.
That’s all, y’all
I hope you enjoyed this, and that it helps some of you out. If you’re looking to implement partitioned views, feel free to edit my script to work on your table(s). Just make sure it’s on a dev server where you can test out your workload to make sure everything is compatible.