PASS Recap: Interview with Val Fontama

#SQLPass

Next up in my PASS Summit interview series is Microsoft’s Val Fontama. Val focused on the new Parallel Data Warehouse Edition of SQL Server 2008 R2, formerly known as Project Madison. I was really excited about the prospects of this new edition, and I had a lot of practical implementation questions. Let’s see whether the answers pass this DBA’s smell test.

Parallel Data Warehouse Edition is Bundled with Hardware

Short story: you won’t be downloading PDWE on MSDN and building a lab.

PDWE will be sold as a Fast Track reference architecture solution made up of a SAN, storage servers, compute servers and a control node. Infiniband connects the servers together, which explains a little about why the servers are sold as a tightly integrated package. You’ll buy the whole shebang from one of these vendor partners:

  • Bull servers with an EMC SAN
  • Dell servers with an EMC SAN
  • HP servers with an HP MSA SAN
  • IBM servers with an IBM SAN

Normally, when I hear someone say, “You can only buy this along with hardware from these 3-4 server vendors,” I feel like I don’t really have choices. However, Microsoft’s marketing this solution as offering choices, so how do they pull it off? Well, remember that the competition is Oracle and Teradata, both of which sell their solutions on their own hardware. Compared to those guys, Microsoft is indeed offering choice here, even though the choices are limited.

Drawbacks of Hardware Bundles

One drawback of prepackaged hardware/software solutions, though, is that it can be difficult to do proof-of-concept setups for businesses to try before they buy. To accommodate this, some Microsoft Technology Centers scattered around the US will have Parallel Data Warehouse Edition packages already set up. Businesses will be able to bring their data warehouse data to the MTC and kick the tires on a server. Some hardware vendors may also have their own labs available to customers.

This drawback jacks up the implementation cost for customers. They’re going to want to test it before they buy, which means flying their team members to a Microsoft Technology Center and putting them up for a matter of days or weeks to build and test the solution. At the end of the engagement, the team will return, place a hardware/software order, and then repeat the process in-house.

Another drawback is that companies can’t cobble together a development or QA environment from leftover gear. They’ll probably need at least three separate PDWE environments:

  • Production
  • Development (if not QA as well)
  • Disaster Recovery

Companies might be more flexible on skipping a duplicated dev or QA environment if PDWE were exactly the same as pure SQL Server implementations, but it’s not. PDWE v1 is still bound by some of DATAllegro’s limitations (PDWE is the result of Microsoft’s DATAllegro acquisition). In version 1, the control node still has its own query engine that builds its own unique query plans. Not all queries are supported, and not all features of SQL Server are supported. You wouldn’t want to build a solution against SQL Server and then just hope to upsize it to Parallel Data Warehouse Edition.
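To make the control node's role a little more concrete, here's a rough Python sketch of the general scatter-gather idea behind a parallel data warehouse: the control node fans a query out to the compute nodes, each node aggregates only its own slice of the data, and the control node merges the partial results. This is a generic illustration, not PDWE's actual engine; the node names, the sample rows, and both function names are made up for the example.

```python
from collections import defaultdict

# Hypothetical data: each compute node holds one slice of a sales fact table
# as (region, amount) rows. Names and numbers are invented for illustration.
COMPUTE_NODES = {
    "compute-1": [("east", 100), ("west", 50)],
    "compute-2": [("east", 25), ("north", 70)],
    "compute-3": [("west", 10), ("north", 5)],
}


def node_partial_sum(rows):
    """Stands in for work done on one compute node: sum only the local rows."""
    totals = defaultdict(int)
    for region, amount in rows:
        totals[region] += amount
    return dict(totals)


def control_node_query(nodes):
    """Stands in for the control node: fan out, then merge partial results."""
    merged = defaultdict(int)
    for rows in nodes.values():
        for region, subtotal in node_partial_sum(rows).items():
            merged[region] += subtotal
    return dict(merged)


if __name__ == "__main__":
    print(control_node_query(COMPUTE_NODES))
```

A SUM merges cleanly this way, but plenty of queries don't split into per-node pieces that simply add up (an average of averages is wrong, for instance). That's one plausible reason a distributed engine needs its own planner, and why a query that runs fine on a single SQL Server box might not carry over unchanged.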

While we’re talking about what’s not included, be aware that PDWE’s licensing only includes the storage engine, not SSAS, SSIS, or SSRS. The hardware and software bought for Parallel is only for Parallel – your ETL work, analysis cubes, and reports will all need to be on separate hardware with its own SQL Server licensing.

SSMS and DBAs Need Not Apply

The most venerable of free tools, SQL Server Management Studio, isn’t relevant here because in v1, management will be done in DATAllegro’s management utility, not SSMS. Not that it matters much to you, because the production DBA won’t be responsible for PDWE performance diagnostics or tuning. PDWE is considered a sealed box – if you have performance problems, you have to call an engineer. The engineer will come out, review your setup, and decide what changes need to be made or what additional hardware needs to be purchased.

People who will love this: DBAs who don’t want to hassle with BI teams complaining about slow queries.

People who will hate this: me.

Even if I can’t fix the problem, I want to at least be able to look under the hood and learn how to do diagnostics. In my Perfmon & Profiler presentation, I talk about how SQL Server is like a car dashboard, and you need something more powerful than just a Check Engine light. With Parallel Data Warehouse Edition, I feel like I’m being chained into a Honda Civic with nothing more than a Check Engine light. I want a full suite of instrumentation. I want my DMVs. I want to be able to tweak knobs and make a difference.

I still love the idea of Parallel Data Warehouse Edition, and I can’t wait to play around with it. I’m just bummed that the closest I’ll get is looking at it through the glass walls of a Microsoft Technology Center.

Commodity Hardware: Not as Good as It Sounds

Technically, the servers involved are all commodity hardware – meaning, you could pick up the phone and call Dell to get another storage server or compute server shipped to you. However, to me, the real advantage of commodity hardware is that I can repurpose other hardware in my datacenter to adjust capacity reasonably quickly. If I need three more compute nodes for the holiday season, or if I need four more storage nodes to temporarily do a big import of past historical data, I’d like to be able to repurpose gear from other projects. This is the age of virtualization, when I can add capacity with the click of a mouse. Calling in an engineer sounds so 1998.

If you only have a limited subset of hardware vendors, and if you can’t pop the hood to tweak it, and if you can’t add capacity on demand (and no, calling Microsoft to send an engineer and then ordering separate hardware doesn’t count as “on demand”), then why not just host it? I asked Val if Microsoft had any plans to offer PDWE as a hosted solution, especially since the current 10 customers on the preview program are all running their data warehouses out of Microsoft’s labs anyway. I didn’t get a solid answer there.

I’d been hoping that businesses could say, “We’re running our data warehouse right now on SQL Server, but we’re running into performance issues. Let’s make an investment in commodity hardware, reuse the SAN we’ve already got (possibly adding capacity), and move to Project Madison. We already know and trust SQL Server, we’ve got a full staff who knows how to manage it, and our apps work well with it. This is a no-brainer.” Instead, the migration to PDWE will require a pretty hefty expenditure, application changes, and a new hands-off management style. When faced with that choice, managers won’t find PDWE a no-brainer – they’ll put it through its paces against Oracle and Teradata. Those companies both have mature solutions, and PDWE is basically v1. Microsoft is going to have to win on price – something I found much more probable before they jacked up the prices on SQL Server 2008 R2 by 20-30%.
