Tag Archive: datamining

How to Import the StackOverflow XML into SQL Server

Want to play around with the StackOverflow database export? Here's how to import the XML files into SQL Server, and some notes about the tables and data schema.

Script to Import StackOverflow XML to SQL Server

This T-SQL script will create six stored procedures:

usp_ETL_Load_Badges
usp_ETL_Load_Comments
usp_ETL_Load_Posts
usp_ETL_Load_Users
usp_ETL_Load_Votes
usp_ETL_Load_PostsTags (which isn't one of the StackOverflow tables - more on that ...
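The T-SQL script itself isn't reproduced here, so what follows is only a rough Python illustration of the two jobs such a loader has to do. It is not how the usp_ETL_Load_* procedures are written. The sketch assumes the dump layout in which each file holds one `<row>` element per record with every column stored as an attribute, and that a post's Tags value looks like `<c#><winforms>`. The sample data and the helper names (`iter_rows`, `split_tags`, `posts_tags`) are hypothetical. It streams the rows and then splits each post's tags into (post id, tag) pairs, which is the kind of bridge table a PostsTags load would produce.

```python
import re
import xml.etree.ElementTree as ET
from io import BytesIO

# Hypothetical sample in the assumed dump layout: one <row> per record,
# columns stored as attributes (the Tags value is XML-escaped in the file).
SAMPLE_POSTS_XML = """<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="4" PostTypeId="1" Score="20" Tags="&lt;c#&gt;&lt;winforms&gt;" />
  <row Id="6" PostTypeId="2" ParentId="4" Score="5" />
</posts>"""

# Matches each "<tag>" chunk and captures the text between the brackets.
TAG_PATTERN = re.compile(r"<([^<>]+)>")


def iter_rows(source):
    """Stream a dump file and yield each <row> element's attributes as a dict."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "row":
            yield dict(elem.attrib)  # copy the attributes before clearing
            elem.clear()  # release the element so large files stay in bounds


def split_tags(tags):
    """Turn a Tags value such as '<c#><winforms>' into ['c#', 'winforms']."""
    return TAG_PATTERN.findall(tags or "")


def posts_tags(rows):
    """Yield one (post_id, tag) pair per tag, like a PostsTags bridge table."""
    for row in rows:
        for tag in split_tags(row.get("Tags")):
            yield int(row["Id"]), tag


if __name__ == "__main__":
    rows = list(iter_rows(BytesIO(SAMPLE_POSTS_XML.encode("utf-8"))))
    for pair in posts_tags(rows):
        print(pair)
```

The parser decodes the escaped attribute, so `split_tags` sees the literal `<c#><winforms>` string. Answer rows (the second sample row) carry no Tags attribute and contribute no pairs. `iterparse` is used rather than loading the whole file because the Posts export is large.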


StackOverflow Data Mining: Cleansing the Data

The first stage of mining is a dirty, ugly business. Miners don't emerge from tunnels bearing armfuls of shiny diamonds.  They come out with filthy, misshapen rocks that might be something valuable - but might be worthless junk.  There's no way ...


Data Mining the StackOverflow Database

StackOverflow released a public dump of their database this morning. Jeff Atwood and the guys believe that if you, the community, are putting the work into this huge body of knowledge, then you should have the right to use it. This is a great dataset to show off one of my favorite ...


Index Fragmentation Findings: Part 2, Size Matters

Last week, I blogged about the basics of SQL Server index fragmentation: why it happens, how to fix it, and how often people are fixing it. I left you with a cliffhanger: the frequency of defrag jobs didn't appear to affect fragmentation levels. Databases with no index defragmentation were an average of ...


Contest: Data Mine the DMVs

The Microsoft SQL Server Data Mining team is looking for ways to help DBAs with data mining, and they came up with an interesting idea: data mine the DMVs to find interesting information.  I'm going to start by data mining index fragmentation statistics, and I need your help. I’ve got a DMV query to gather ...

