Tag Archive: ssas

Data Mining the StackOverflow Database

StackOverflow released a public dump of their database this morning. Jeff Atwood and the guys believe that if you, the community, are putting the work into this huge body of knowledge, then you should have the right to use it.

This is a great dataset to show off one of my favorite toys from the Microsoft SQL Server Data Mining team. In this half-hour video, Tom LaRock and I will walk you through data mining the StackOverflow user list to find out more about the users and see what makes the rockstar high-reputation users different from the worker bees like me.

If this looks interesting to you, here’s what else I’ve been doing with the StackOverflow data:

Now, back to what I did in the video – let’s talk about the tools I used.

Microsoft’s Free Data Mining Tools

For today’s demo, I’m using SQL Server Analysis Services installed on my desktop. Relax – it’s really easy. Just install SQL Server 2005 or 2008 Developer Edition, check the box for Analysis Services, and use the defaults. You don’t have to know what you’re doing in order to get it up and running, and it just runs in the background as a service. After you’re done playing around, you can stop the service and set it to manual to keep it from sapping your system resources: go into Control Panel, Administrative Tools, Services, double-click on the SQL Server Analysis Services service, click Stop, and change the startup type to Manual.
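If you'd rather script that last step than click through Control Panel, here's a rough sketch in Python that builds the equivalent Windows `sc` commands. It assumes a default (unnamed) Analysis Services instance, whose service name is `MSSQLServerOLAPService`; the function names are made up for illustration, and it only prints the commands unless you tell it otherwise.

```python
import subprocess

# Assumption: a default (unnamed) Analysis Services instance. Named instances
# use a service name of the form MSOLAP$<InstanceName> instead.
DEFAULT_SSAS_SERVICE = "MSSQLServerOLAPService"


def stop_command(service=DEFAULT_SSAS_SERVICE):
    """Build the sc command that stops the service."""
    return ["sc", "stop", service]


def manual_start_command(service=DEFAULT_SSAS_SERVICE):
    """Build the sc command that sets the startup type to Manual.

    sc.exe expects "start=" and its value ("demand" means Manual)
    as two separate tokens.
    """
    return ["sc", "config", service, "start=", "demand"]


def park_service(service=DEFAULT_SSAS_SERVICE, dry_run=True):
    """Stop the service and switch it to manual startup.

    With dry_run=True (the default) the commands are only printed.
    Running them for real needs Windows and an elevated prompt.
    """
    steps = [
        # Stopping may fail harmlessly if the service is already stopped,
        # so that step is not required to succeed.
        (stop_command(service), False),
        (manual_start_command(service), True),
    ]
    for command, must_succeed in steps:
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=must_succeed)
    return [command for command, _ in steps]
```

Calling `park_service()` prints the two commands so you can check them first; `park_service(dry_run=False)` actually runs them.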

Depending on your version of SQL Server and Excel, you’ll need one of these free plugins from Microsoft:

If you want to avoid the whole SQL Server Analysis Services thing altogether, you can also use Microsoft’s free SQL Server Data Mining in the Cloud plugin. Be aware that it’s a technical preview, not a fully supported and released product. Their cloud servers can (and do) go down. Also know that your data is going into the cloud, which has its own ramifications, as I’ve discussed in my previous cloud data mining tutorial.

What’s Coming Next: SQL Server 2008 R2 with BI in Excel

In the next version of SQL Server, Microsoft will deliver business intelligence to end users through Excel. At the Professional Association for SQL Server Summit last November, Donald Farmer demoed slicing and dicing of huge spreadsheets with real-time analytics that previously would have required some pretty hefty hardware.

Excel 2007 worksheets have a million-row limit (1,048,576 rows, to be exact), but the forthcoming version will not. Some of the StackOverflow export tables like Votes have more than a million rows, so we can’t yet data mine those using Excel as a front end, but we can play with the Users table today.
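To check ahead of time which export tables will fit, a quick row count does the job. Here's a minimal Python sketch. It assumes the dump ships each table as an XML file with one `<row ... />` element per record, and the file name in the usage note is hypothetical; adjust both to match what you actually downloaded.

```python
import xml.etree.ElementTree as ET

# Excel 2007 worksheet limit: 2**20 rows.
EXCEL_2007_MAX_ROWS = 1_048_576


def count_rows(source):
    """Count <row> elements in an XML export.

    source may be a file path or an open binary file object. The file is
    streamed with iterparse, and each element is cleared once it has been
    seen so attribute data does not pile up in memory.
    """
    count = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "row":  # assumption: one <row> per record
            count += 1
        elem.clear()
    return count


def fits_in_excel_2007(row_count, header_rows=1):
    """True if the data plus its header rows fit on a single worksheet."""
    return row_count + header_rows <= EXCEL_2007_MAX_ROWS
```

For example, `fits_in_excel_2007(count_rows("users.xml"))` tells you whether that table can go straight into a worksheet with a header row.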

Subscribing or Downloading My Podcasts

If you have an MP3 player or a portable video player and you want to download my podcasts automatically, you can subscribe to the SQLServerPedia podcast feeds here:

You can also download this video to watch it later:

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

SQL Server Kilimanjaro, Gemini announcements

At the BI Conference in Seattle, Microsoft unveiled some components of project code name Kilimanjaro, a sort of R2 release for SQL Server expected in the first half of 2010. Here are a couple of relevant news sources:

Here’s the part I really like: Gemini centers around Excel as its user interface.

Let’s face it: our power users live, eat and breathe Excel.  They’re all over it.  They know it forwards and backwards, and they pull tricks with pivot tables that make DBAs scratch their heads.  My bosses have always been able to out-Excel me, and that’s been fine with me.

If Excel is going to be the future interface for self-service BI, with SQL Server as the back end, I for one welcome our new spreadsheet overlords.

Why am I embracing Excel so much?  Because the cloud is coming, open source is coming, and our competitors just keep coming.  The one thing none of them do well is the front end, the power user interface.  None of them have anything even remotely close to a user interface as rich and powerful as Excel.

If my BI team comes in and says, “Give me one good reason we shouldn’t switch our data warehouse over from SQL Server to Oracle/cloud-based MySQL/cheap Postgres/fast-database-du-jour,” I’ve got a new cool answer: our power users love self-service BI with Excel, and nobody else is going to be able to touch that.

Are there beautiful BI front ends out there that put lipstick on Oracle, MySQL and cool new databases?  Sure – but most companies are already licensing Excel on the desktop, and their users know how to use it.  Cheaper, less training, less implementation time – it’s a win all the way around for SQL Server.

And you know what I like the most?  It lets the SQL Server Management Studio team focus on day-to-day DBA task management and stay out of the BI power user business.

SQL Server Data Mining in the cloud!

Fun stuff from the SQL Server Data Mining team – they’re showing off SQL Server data mining in the cloud. You can connect to a Microsoft SQL Server Analysis Services machine from your local desktop with an Excel 2007 add-in and analyze your data on their servers, all at no charge (for now).

I just published a short demo of SQL Server Data Mining in the cloud on SQLServerPedia.com.

I am reeeeeally impressed.  Wiping the drool off my keyboard.
