Microsoft’s keynotes and sessions catered to the business intelligence sector of the audience. In Tom Casey’s keynote on Thursday, one slide said:
“You are on your way to becoming a BI expert.”
I know what you’re thinking, dear reader. If you’re a DBA, whether you’re a production DBA or a development DBA, you’re thinking, “That’s crazy.” And if I hadn’t been at PASS, I would have agreed with you. Take me, for example – I’m an engine guy. I don’t know jack about SSIS, SSAS or SSRS, and all I really care about is making the engine run faster and finding ways to manage it more easily.
But if someone needs to implement a large BI solution with a big data warehouse, my knowledge will help get the job done because I’m using exactly the same tools that the largest data warehouses will be running on.
The Background on DATAllegro
Microsoft recently acquired a company called DATAllegro that offered a scale-out database solution. If you needed to grow your SQL Server, you could mash together several cheap rack-mount servers, and those servers would cooperate to run the database together. When you executed a query, the work was spread out across all of the servers, and the results were returned faster than a single big server could do it.
DATAllegro’s stuff ran on Linux and open source databases, but they built it in a way that the OS and the database didn’t really matter: with coding work, a different database platform could be substituted in. That’s exactly what Microsoft has done since acquiring DATAllegro: they ripped out the open source under the covers, stuffed in Microsoft products, called it Project Madison and they’re relaunching it in the first half of 2010.
Microsoft demoed this at PASS by using a multi-node system, running a big query on it, and showing how all of the nodes crunched away to deliver the query results. All of the nodes are running SQL Server. I don’t know the specifics of how it’s managed – like, do you connect with SQL Server Management Studio, do you manage the nodes individually, etc – and from the sounds of it, those things are still in flux. But forget the specifics for a second and think about the general idea.
Dr. David DeWitt’s keynote on Friday talked about the challenges of building a parallel query processor: how do you partition the rows, how do you handle partition skew, how do you make sure the nodes get roughly the same amount of work assigned, etc. It was mesmerizing to me as an engine DBA because really, at the heart of it, I’m taking the same engine concepts I’ve been using for years and extending the size of the engine.
With a traditional single-server data warehouse:
- A query was broken up into multiple threads
- Each thread executed on a different processor
- Each thread may have been hitting different disks on the SAN
With Project Madison:
- A query is broken up into multiple threads (by the Control Node)
- Each thread executes on a different node
- Each thread will definitely hit different disks, probably local to the node
It’s really not all that different. The jump from data warehousing to Project Madison isn’t large at all, and the jump from small OLTP databases to Project Madison is about the same as from OLTP to data warehousing. It’s big, but it’s doable as long as you have mentors or senior DBAs around.
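To make the comparison concrete, here is a minimal Python sketch of the scatter-gather idea. It is an illustration only, not how Project Madison is actually built: the function names, the `sales` data, and the four-node setup are all made up, plain threads stand in for the nodes, and a simple modulo stands in for a real hash function. It shows rows being partitioned by a key, each "node" aggregating only its own rows, a control-node step merging the partial results, and a rough measure of partition skew.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

NODE_COUNT = 4  # hypothetical number of compute nodes


def partition_rows(rows, key, node_count=NODE_COUNT):
    """Assign each row to a node; modulo on an integer key stands in for a hash."""
    nodes = [[] for _ in range(node_count)]
    for row in rows:
        nodes[row[key] % node_count].append(row)
    return nodes


def local_sum(rows, group_key, value_key):
    """Work done on one node: aggregate only the rows stored on that node."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_key]] += row[value_key]
    return dict(totals)


def run_query(nodes, group_key, value_key):
    """Control-node role: fan the work out, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(
            pool.map(lambda rows: local_sum(rows, group_key, value_key), nodes)
        )
    merged = defaultdict(int)
    for partial in partials:
        for group, subtotal in partial.items():
            merged[group] += subtotal
    return dict(merged)


def skew(nodes):
    """Busiest node's row count divided by the average (1.0 means perfectly even)."""
    sizes = [len(rows) for rows in nodes]
    return max(sizes) / (sum(sizes) / len(sizes))


# Made-up sample data: 1,000 sales rows spread over 10 customers.
sales = [
    {"customer_id": i % 10, "region": "east" if i % 2 else "west", "amount": i}
    for i in range(1000)
]

nodes = partition_rows(sales, "customer_id")
print(run_query(nodes, "region", "amount"))  # same totals a single server would give
print(skew(nodes))  # node sizes are 300, 300, 200, 200, so this prints 1.2
```

Ten customers do not divide evenly across four nodes, so two nodes end up with 300 rows and two with 200. That is a small taste of the partition skew problem: the query is only as fast as the busiest node.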
So I was listening to DeWitt’s presentation saying to myself, “I can do this. I bet I could build one of these in my lab the day it comes out. This won’t be rocket science. How fast can I get this, and how much will it cost?”
Interview with Tom Casey, SQL Server GM
I had the chance to talk to Tom Casey, and one of my questions was about how this will be licensed. In my mind, I was thinking about how easy it would be to sell this to a customer (not that I do sales) if they’re already on Enterprise Edition. A data warehouse is probably already using partitioning, and if it’s easy enough to scale out, I could see moving over to Madison fairly easily as long as the licensing isn’t too prohibitive. Unfortunately, it’s too early to tell about licensing or CTP dates.
Microsoft’s vision for the cloud is to offer the same services in the cloud that they offer at a customer’s site. Microsoft SQL Data Services is the ground layer that they’ll be building on, and they’ll gradually offer more and more capabilities over time. The same pricing and licensing questions pop up here too: it’s just too early to tell.
I asked about the jaw-dropping visual graphs that were demoed in PASS keynotes, especially the moving-timeline growing-circles showstopper. Tom said Microsoft is looking more at visual communication of data, and that they’re doing some animation work to convey trends. I asked about Silverlight, and he said yes, that’s a possibility too, but it’s too early to tell.
Security, the thorn in the side of cloud pushers everywhere, seems to have an easier, more basic answer. Tom’s guidance was to think of the application as the security container: SQL Server already has the ability to do pretty robust security at the instance level. You can put multiple databases on the same server, tighten down security, and each user will only be able to hit the databases that you want them to hit. Sure, we end up with a bunch of SAs because people write crappy applications that demand SA, but in theory, there won’t be such a thing as SA in the cloud. You will be locked down at the application level, because the service is designed from the ground up to be multi-tenant.
This ties into the Fabric concept and DAC packs: my guess would be that a DAC pack has self-contained security, and you won’t be able to deploy a DAC pack that needs server admin rights. Otherwise, you could deploy a malicious DAC pack that got SA control on the server, and that’d be a nightmare. If DAC packs never get that level of access, then a DAC pack could be deployed either to a local instance or to the cloud (once the cloud services catch up to local capabilities).
Or to a Project Madison farm, depending on licensing. I’m probably crazy to ask for this, but I would love to be able to manage my single instances, clusters, Project Madison farms and cloud-based applications all with a single tool and a single concept. Especially now that…
I’m Going for the Master Certification
This is still up in the air, but it looks like I’ll be making a run at the Microsoft Certified Master program. The folks at Quest are encouraging me to give it a shot, and I’d be an idiot to turn down an opportunity like that.
I think now is the perfect time to do it: databases are about to make a few rapid changes. The cloud is coming, Project Madison is coming, and the Fabric is coming, and if you get in now and nail down your ground level of knowledge, you’ll be able to get the inside track from Microsoft on training with these advanced technologies. Call me crazy, call me greedy, but this seems like a great chance, and I’ll take it.
That also means I’ll be heads-down in Microsoft exams for the next couple of months. I haven’t taken a Microsoft test since 1999, and the Master prerequisite includes a whole slew of certs! Wish me luck.
I’ve got a blog post brewing in the back of my head regarding this, except I’m coming from the other direction. I’m a database developer; I don’t tweak the engine, and I don’t worry much about the application language du jour. I sling wicked SQL, and I worry about normalization, validation, integrity, tables, and views.
One thing I learned at PASS was that even though Microsoft is trying to develop the role of the DBD, the rest of the community isn’t quite sure what to do with guys like me. Most of the peeps I ran into were DBAs or BI guys. There were a few application developers (folks who mostly write client apps on top of SQL Server), but very few simple database developers were roaming around. We’re like Yeti.
What does this mean? It means that I need to start expanding my repertoire, and BI looks like it’s part of that ride. I need to work on my SSIS/SSAS/SSRS skills, even if I consider it my ancillary set.
I’m on my way… (great! now I have the Proclaimers stuck in my head)…