Want to play around with the StackOverflow database export? Here’s how to import the XML files into SQL Server, and some notes about the tables and data schema.
Script to Import StackOverflow XML to SQL Server
This T-SQL script will create six stored procedures:
usp_ETL_Load_Badges
usp_ETL_Load_Comments
usp_ETL_Load_Posts
usp_ETL_Load_Users
usp_ETL_Load_Votes
usp_ETL_Load_PostsTags (which isn’t one of the StackOverflow tables – more on that in a [...]
Read the full article »
The first stage of mining is a dirty, ugly business.
Miners don’t emerge from tunnels bearing armfuls of shiny diamonds. They come out with filthy, misshapen rocks that might be something valuable – but might be worthless junk. There’s no way to tell what you’ve really got until you’ve spent some time analyzing and polishing.
Take one [...]
Read the full article »
StackOverflow released a public dump of their database this morning. Jeff Atwood and the guys believe that if you, the community, are putting the work into this huge body of knowledge, then you should have the right to use it.
This is a great dataset to show off one of my favorite toys [...]
Read the full article »
Last week, I blogged about the basics of SQL Server index fragmentation: why it happens, how to fix it, and how often people are fixing it. I left you with a cliffhanger: the frequency of defrag jobs didn’t appear to affect fragmentation levels:
Databases with no index defragmentation were an average of 5% [...]
Read the full article »
The Microsoft SQL Server Data Mining team is looking for ways to help DBAs with data mining, and they came up with an interesting idea: data mine the DMVs to find interesting information. I’m going to start by data mining index fragmentation statistics, and I need your help.
I’ve got a DMV query to gather information [...]
Read the full article »
George Box said, “Essentially, all models are wrong, but some are useful.” Donald illustrated this point with a brief history lesson. Early models of the universe said the sun revolved around the earth, and with that model, marvelous things were possible in architecture, celestial navigation and science. Even though the model was wrong, it was [...]
Read the full article »
Fun stuff from the SQL Server Data Mining team – they’re showing off SQL Server data mining in the cloud. You can connect to a Microsoft SQL Server Analysis Services machine from your local desktop with an Excel 2007 add-in and analyze your data on their servers, all at no charge (for now).
I just published a [...]
Read the full article »