Blog

StackOverflow released a public dump of their database this morning. Jeff Atwood and the guys believe that if you, the community, are putting the work into this huge body of knowledge, then you should be able to have rights to use it.

This is a great dataset to show off one of my favorite toys from the Microsoft SQL Server Data Mining team. In this half-hour video, Tom LaRock and I will walk you through data mining the StackOverflow user list to find out more about the users and see what makes the rockstar high-reputation users different from the worker bees like me.

If this looks interesting to you, here’s what else I’ve been doing with the StackOverflow data:

Now, back to what I did in the video – let’s talk about the tools I used.

Microsoft’s Free Data Mining Tools

For today’s demo, I’m using SQL Server Analysis Services installed on my desktop. Relax – it’s really easy. Literally just install SQL Server 2005 or 2008 Developer Edition, check the box for Analysis Services, and use the defaults. You don’t have to know what you’re doing in order to get it up and running, and it just runs in the background as a service. After you’re done playing around, you can stop the service and set it to manual to prevent it from sapping your system resources. Go into Control Panel, Administrative Tools, double-click on the SQL Server Analysis Services service, and change the startup type to Manual.

Depending on your version of SQL Server and Excel, you’ll need one of these free plugins from Microsoft:

If you want to avoid the whole SQL Server Analysis Services thing altogether, you can also use Microsoft’s free SQL Server Data Mining in the Cloud plugin. Be aware that it’s a technical preview, not a fully supported & released product. Their cloud servers can (and do) go down. Also know that your data is going into the cloud, which has its own ramifications as I’ve discussed in my previous cloud data mining tutorial.

What’s Coming Next: SQL Server 2008 R2 with BI in Excel

In the next version of SQL Server, Microsoft will deliver business intelligence to end users through Excel. At the Professional Association for SQL Server Summit last November, Donald Farmer demoed slicing and dicing of huge spreadsheets with real-time analytics that previously would have required some pretty hefty hardware.

Excel 2007 has a million-row limit, but the forthcoming version will not. Some of the StackOverflow export tables like Votes have more than a million rows, so we can’t yet data mine those using Excel as a front end, but we can play with the Users table today.

Subscribing or Downloading My Podcasts

If you have an MP3 player or a portable video player and you want to download my podcasts automatically, you can subscribe to the SQLServerPedia podcast feeds here:

You can also download this video to watch it later:

↑ Back to top
  1. Great stuff Brent, really really good video. It’d be great to see more people leverage these fantastic data mining tools!

  2. Wow, excellent post and video. It really reinforces what you and I discussed over the Memorial Day Weekend.

  3. The MP4 video download link is not working with a 403 error.

  4. Felix – sorry about that! It’s fixed now.

  5. Brent, just watched your video. I’m more of a Java guy than a .Net guy so I’m less familiar with SQL Server and haven’t seen the data mining tools before.

    Very interesting!

    I look forward to future posts and/or screencasts on the topic.

  6. This is some awesome stuff – I knew that SQL Server had analysis tools but I didn’t know it was so intuitive to set up via Excel. The ability to find trends like that solely through giving what you want to optimize towards is incredibly useful!

    Thanks again for putting the hard work into making podcasts like this, for systems programmers like me who don’t know much about databases, stuff like this is invaluable.

  7. Glad you guys liked it! I’ve got another one coming next week with me and Tom LaRock walking through it.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

css.php