Bill Baker: Keynote on Office PerformancePoint 2007
Bill’s a fantastic speaker, and this is an interesting product, but our organization’s BI isn’t mature enough to take on this product yet. The product has a lot of potential; in a nutshell, it combines the flexibility and agility of Excel with the data safety & consistency of SQL Server BI.
Sitting in seminars the last few days, though, has gotten me really excited about the direction of the Microsoft BI stack. When our company is ready to take it to the next level, Microsoft has some great tools to help us get there. First I gotta get Analysis Services in place.
Linchi Shea: SQL Server on SANs
I picked this seminar before I realized I’m already subscribed to Linchi’s blog. He does a great job in the blog of illustrating SAN performance issues with charts and tests, and this seminar was like his blog’s greatest hits. His recommendations included:
Measure the performance of each piece of the storage I/O path using MBps, IOps, and latency. That will make it easier to identify the bottlenecks. Use IOMeter or SQLIO.exe to quantify each I/O path with those metrics. Then follow up with repeated measurements over time to watch for shifting SAN performance.
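For reference, a SQLIO run looks something like this. This is just an illustrative invocation, not one from Linchi’s slides; the test file name is made up, and you’d want to vary the block size, read/write mix, and random/sequential flags to cover each workload pattern:

```shell
# Illustrative SQLIO run: 8KB random reads for 120 seconds,
# 8 threads, 8 outstanding I/Os per thread, capturing latency (-LS)
sqlio -kR -s120 -frandom -t8 -o8 -b8 -LS testfile.dat
```

The output reports exactly the three metrics he recommends tracking: MBps, IOps, and latency. Save the numbers, rerun the same test periodically, and you can spot when shared SAN performance drifts.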
Linchi’s slide on “Create a balanced storage I/O path (6)” clearly showed how I/O bottlenecks can occur at each point of the SAN infrastructure. 2Gb cards mean 256MB/sec throughput max.
Linchi likes synchronous storage-based replication simply because it’s easier to manage large-scale replication jobs than it is to manage large-scale database mirroring. Heavy I/O servers may not work out because of the added latency, but for most servers he likes it. Note that he’s using fiber to connect the datacenters, which keeps latency low.
A lot of the data applied to EMC, but did not apply as much to IBM, whose SANs carve out LUNs differently. I can’t say how much it applied to other SANs, though.
He recommended sticking with basic disks instead of dynamic disks, and noted that dynamic disks are not supported in a MS failover cluster. He hasn’t seen a performance difference between the two, and he noted that in-place LUN growth is not quite there yet.
He explained performance tuning as a continual, iterative process cycle: measure I/O, identify the bottleneck, improve or remove the bottleneck, and start the process again.
An EMC guy in the audience kept interrupting and clarifying things. Note to seminar attendees employed by a vendor: if you would like to talk, get your own vendor seminar. Note to other seminar attendees: when someone talks, look at their badge, because they’re not always innocent bystanders.
Phil Hildebrand: Hash Partitioning and Scaling the OLTP Database
Phil spent the first 45 minutes walking through the basics of how SQL 2005 automated partitioning works and how to implement it. This was a snoozer for me because I’ve got partitioning in production already, but I was clearly in the minority. The rest of the attendees asked lots of basic partitioning questions, and they asked the kinds of questions that showed they were grasping the concept.
One of them caught on quickly that in select statements, SQL 2005 only eliminates partitions when the partition key is used as part of the WHERE clause. Otherwise, it always scans every partition. Phil sighed and agreed that this is a huge problem, and he hopes SQL 2008 does a better job of partition elimination. I wholeheartedly agree there. He pointed out that you still get a performance gain on insert/update/delete statements, since they only lock a single partition.
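To see the problem concretely, here’s a sketch using a hypothetical Orders table partitioned on OrderDate (the table and column names are mine, not Phil’s):

```sql
-- Partition elimination works: the partition key (OrderDate)
-- appears in the WHERE clause, so SQL 2005 touches only the
-- partitions covering January 2007.
SELECT *
FROM dbo.Orders
WHERE OrderDate >= '20070101' AND OrderDate < '20070201';

-- No elimination: even though OrderID identifies one row,
-- SQL 2005 still scans every partition because the partition
-- key isn't in the WHERE clause.
SELECT *
FROM dbo.Orders
WHERE OrderID = 12345;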
The best part for me was the last 15 minutes, where he walked through hash partitioning, and I got this huge “AH!” look on my face.
Imagine a Customer table with an identity field for CustomerID. You don’t want to pre-create partitions ahead of time for all of the possible customer ranges. Instead, you just want to reduce load by lumping customers into, say, 10 partitions.
The solution: add a persisted computed column that contains your own hash function like modulus 10 (CustomerID % 10). Every customer will fall into one of ten buckets (0-9), instantly slicing your table volume by 10. Wow. I wish I’d thought of that! This reduces insert/update contention because you can load balance the data across 10 partitions, preferably on 10 different LUNs.
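Here’s roughly what that looks like in T-SQL. All the object names are hypothetical (this is my sketch of the idea, not code from Phil’s deck), and in real life you’d map the scheme to ten filegroups on separate LUNs instead of dumping everything on PRIMARY:

```sql
-- Nine boundary values = ten partitions, one per hash bucket (0-9)
CREATE PARTITION FUNCTION pfCustomerHash (tinyint)
    AS RANGE LEFT FOR VALUES (0, 1, 2, 3, 4, 5, 6, 7, 8);

-- For illustration only; use ten filegroups/LUNs in production
CREATE PARTITION SCHEME psCustomerHash
    AS PARTITION pfCustomerHash ALL TO ([PRIMARY]);

CREATE TABLE dbo.Customer (
    CustomerID   int IDENTITY(1,1) NOT NULL,
    CustomerName varchar(100) NOT NULL,
    -- The hash: persisted so it can serve as the partitioning key
    HashBucket   AS CAST(CustomerID % 10 AS tinyint) PERSISTED NOT NULL
) ON psCustomerHash (HashBucket);
```

To get partition elimination on lookups, you have to repeat the hash in the query, e.g. `WHERE CustomerID = @id AND HashBucket = @id % 10` -- which is exactly the drawback below.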
The biggest drawback is the partition elimination problem here too: SQL will continue to scan all of the partitions unless your WHERE clause specifically filters on the persisted computed column. Makes me want to go play with 2008 just to see if the partition elimination is better.
Another drawback would be the difficulty of changing the partition scheme later. Changes would require modifying the persisted computed field, recalculating the data for all of the customers, and moving their records onto the new filegroups. Ouch. Regardless, it’s a neat idea with interesting applications.
And there you have it!
Now for the tough part: poring over all of the notes from the sessions I attended, figuring out what to implement first, and even worse, examining the slides from the sessions I couldn’t attend due to conflicts! What a great experience.