Updated Stack Overflow Public Data Set for June 2019


Taryn and the kind folks at Stack Overflow have updated their public XML data dump for June, so I’ve imported that into an updated sample database for your blogging and presenting satisfaction.

You can download the 40GB torrent (magnet) and it expands to a ~350GB SQL Server 2008 database. Because it’s so large, we only distribute it with BitTorrent – if you’re new to that, here are more detailed instructions.

Fun facts about this month’s release:

  • The Votes table is up to 172,502,324 rows, but only takes 6.2GB of space, since it’s fairly narrow.
  • The PostHistory table, on the other hand, has only 118,390,637 rows, but consumes 196GB, 185GB of which is off-row text data.
  • The Users table finally broke 8 digits: it’s got 10,528,666 rows, and is still a nice tidy 1.3GB. It’s wide, but most people don’t populate much in text fields like Location, WebsiteUrl, and AboutMe.
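
To put those numbers in perspective, here’s a quick back-of-the-envelope sketch of the average bytes per row for each table. It uses only the row counts and sizes quoted above. Treating 1GB as 10^9 bytes is my assumption, and so is treating each quoted size as the table’s whole footprint, so read the results as rough averages rather than exact storage figures.

```python
# Row counts and sizes are the ones quoted in the list above.
# Assumption: 1 GB = 10**9 bytes; with 2**30 the results are about 7% larger.
GB = 10**9

tables = {
    # name: (row count, size in GB)
    "Votes": (172_502_324, 6.2),
    "PostHistory": (118_390_637, 196.0),
    "Users": (10_528_666, 1.3),
}


def bytes_per_row(rows: int, size_gb: float) -> float:
    """Average bytes per row, counting any off-row data toward the row."""
    return size_gb * GB / rows


averages = {name: bytes_per_row(rows, size) for name, (rows, size) in tables.items()}

for name, avg in averages.items():
    print(f"{name}: ~{avg:,.0f} bytes per row")
```

The gap is the point: Votes rows average a few dozen bytes, while PostHistory rows average well over a kilobyte because of all that off-row text.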

I’m torn about whether to distribute the next one in SQL Server 2008 format or start using SQL Server 2012. The VM I use to build the database has 2008, so it’s not like it costs me extra work to continue using 2008. Plus, you can still attach this in SQL Server 2019 (gotta love how robust SQL Server’s file handling is). Is there a reason I should change to distributing the next one in 2012 format instead?


5 Comments

  • George Lavan
    June 7, 2019 9:17 am

    Updated Stack Overflow Public Data Set for June 2019
    – Is there a reason I should change to distributing the next one in 2012 format instead?

    *** My .02 – If it’s still supported by 2019, why bother. If it’s not broke, leave it alone and don’t fix it.

  • I would say it depends on the goal – if it’s to reach and teach the most people possible, then leave it in 2008.

  • Yes, because I think it will give you peace of mind knowing you are on the lowest supported version of SQL.

  • How long does it take you to get the database online from the moment you download the 7z files?
