New Stack Overflow Public Database Available (2017-06)


Nick Craver and the kind folks at Stack Overflow publish their data export periodically with your questions, answers, comments, user info, and more. It’s available as an XML data dump, which I then take and import into SQL Server for teaching performance tuning.
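If you'd rather poke at the raw XML than wait for the SQL Server version, here's a minimal Python sketch of streaming one of the dump files. It assumes each file holds one `<row .../>` element per record, with the column values stored as attributes. The `iter_rows` helper and the tiny in-memory sample are hypothetical stand-ins for illustration, not part of my import process.

```python
import io
import xml.etree.ElementTree as ET


def iter_rows(source):
    """Yield each <row> element's attributes as a dict, one at a time.

    iterparse streams the file, so a multi-gigabyte dump does not have
    to be parsed into a single in-memory tree first.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "row":
            yield dict(elem.attrib)
            elem.clear()  # drop the element's contents once we've copied them


# Hypothetical two-row sample standing in for a real dump file.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<users>
  <row Id="1" DisplayName="Alice" Reputation="101" />
  <row Id="2" DisplayName="Bob" Reputation="1" />
</users>"""

rows = list(iter_rows(io.BytesIO(SAMPLE)))
print(len(rows), rows[0]["DisplayName"])
```

Against a real file you would pass an opened file object (or a path) instead of the `BytesIO` sample. Every attribute value comes back as a string, so any type conversion is up to you.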

You can download the 16GB torrent (magnet), which gives you a series of 7Zip files that you can extract to produce a 118GB SQL Server 2008 database. You can then attach it to any 2008-2017 SQL Server.

Stack Overflow
The place that saves your job

The data goes up to 2017/06/11 and includes:

  • Badges – 23M rows, 1.1GB data
  • Comments – 58.2M rows, 18.5GB data
  • Posts – 36.1M rows, 90GB data, 15.5GB of which is off-row text data. This table holds questions & answers, so the Body NVARCHAR(MAX) field can get pretty big.
  • PostLinks – 4.2M rows, 0.1GB
  • Users – 7.3M rows, 1GB
  • Votes – 128.4M rows, 4.5GB
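
To get a feel for how wide each table is, here's a rough back-of-the-envelope sketch that turns the figures above into average bytes per row. It assumes 1GB means 1024^3 bytes and that "M" means exactly one million rows, so treat the results as approximations. The `avg_bytes_per_row` helper is made up for this example.

```python
# (rows in millions, data size in GB), copied from the list above.
TABLES = {
    "Badges": (23.0, 1.1),
    "Comments": (58.2, 18.5),
    "Posts": (36.1, 90.0),
    "PostLinks": (4.2, 0.1),
    "Users": (7.3, 1.0),
    "Votes": (128.4, 4.5),
}


def avg_bytes_per_row(million_rows, gigabytes):
    """Approximate average row size, assuming 1GB = 1024**3 bytes."""
    return gigabytes * 1024**3 / (million_rows * 1_000_000)


# Print the widest tables first.
for name, (m_rows, gb) in sorted(
    TABLES.items(), key=lambda kv: avg_bytes_per_row(*kv[1]), reverse=True
):
    print(f"{name:10s} {avg_bytes_per_row(m_rows, gb):8.0f} bytes/row")
```

Posts comes out in the low kilobytes per row while Votes is a few dozen bytes, which is why the same query pattern can behave very differently across these tables.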

To learn more:

  • BrentOzar.com/go/querystack – my page about the SQL Server export with more info about how I produce the database.
  • Data.StackExchange.com – a web-based SSMS where you can run your own queries against a recently restored copy of the Stack databases, or run other folks’ queries.
  • Watch Brent Tune Queries – free sessions where I take different queries from Data.StackExchange.com and tune them live.
  • How to Think Like the Engine – free videos where I show the Users table to explain clustered indexes, nonclustered indexes, statistics, sargability, and more.

5 Comments

  • Brad Stiritz
    July 6, 2017 6:05 pm

    Hi Brent, I’m having trouble finding step-by-step instructions online for attaching entire databases in XML format. I followed your link to “XML data dump”, which is mainly just legalese. I scrolled through the comments as well. Someone was talking about opening the XML files in a text editor! How awful, is this really necessary?

    Do all the tables have to be imported one-at-a-time, as in the following–?
    https://www.mssqltips.com/sqlservertip/2899/importing-and-processing-data-from-xml-files-into-sql-server-tables/

    I regret not getting the basics here. Any help appreciated, please.

    • No, sorry, I don’t have step by step instructions on how to import XML into a database, but you can get started with this app: https://github.com/BrentOzarULTD/soddi

      Why are you trying to do that though? Just download the database per the instructions in the post above.

      • Brad Stiritz
        July 6, 2017 6:29 pm

        As I said, maybe I’m missing something very basic here? The Stack Overflow databases are provided as XML, right? And our goal is to import those databases into SQL Server, so that we can run queries against them, as you guys show in your blog posts?

        But SSMS / Attach Database doesn’t appear to work for XML files, is this correct? So do we have to import tables one-by-one, as per the link I provided above? I’m afraid I’m missing something very simple, I’m sorry.
