Nick Craver and the kind folks at Stack Overflow periodically publish a data export with your questions, answers, comments, user info, and more. It’s available as an XML data dump, which I then import into SQL Server for teaching performance tuning.
You can download the 16GB torrent (magnet), which gives you a series of 7-Zip files that you can extract to produce a 118GB SQL Server 2008 database. You can then attach it to any SQL Server from 2008 through 2017.
The data goes up to 2017/06/11 and includes:
- Badges – 23M rows, 1.1GB data
- Comments – 58.2M rows, 18.5GB data
- Posts – 36.1M rows, 90GB data, 15.5GB of which is off-row text data. This table holds questions & answers, so the Body NVARCHAR(MAX) field can get pretty big.
- PostLinks – 4.2M rows, 0.1GB
- Users – 7.3M rows, 1GB
- Votes – 128.4M rows, 4.5GB
To learn more:
- BrentOzar.com/go/querystack – my page about the SQL Server export with more info about how I produce the database.
- Data.StackExchange.com – a web-based SSMS where you can run your own queries against a recently restored copy of the Stack databases, or run other folks’ queries.
- Watch Brent Tune Queries – free sessions where I take different queries from Data.StackExchange.com and tune them live.
- How to Think Like the Engine – free videos where I show the Users table to explain clustered indexes, nonclustered indexes, statistics, sargability, and more.
Hi Brent, I’m having trouble finding step-by-step instructions online for attaching entire databases in XML format. I followed your link to “XML data dump”, which is mainly just legalese. I scrolled through the comments as well. Someone was talking about opening the XML files in a text editor! How awful, is this really necessary?
Do all the tables have to be imported one-at-a-time, as in the following–?
I regret not getting the basics here. Any help appreciated, please.
No, sorry, I don’t have step by step instructions on how to import XML into a database, but you can get started with this app: https://github.com/BrentOzarULTD/soddi
Why are you trying to do that though? Just download the database per the instructions in the post above.
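If you are curious what the XML route involves anyway, here is a minimal Python sketch. It is not how the database in this post is built, and it is not what soddi does internally. It assumes each table's XML file holds one `<row ... />` element per record with the columns stored as attributes; `iter_rows` is a hypothetical helper name, and the sample values are made up.

```python
import io
import xml.etree.ElementTree as ET


def iter_rows(source):
    """Yield each <row> element's attributes as a dict, one row at a time.

    `source` is a file path or a binary file object (assumed layout:
    one <row .../> per record, columns stored as attributes).
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "row":
            # Copy the attributes before clearing the element.
            yield dict(elem.attrib)
            # Drop this row's data; a full importer would also detach
            # the element from its parent to keep memory use low.
            elem.clear()


# Tiny stand-in for a per-table file such as Users.xml (made-up values).
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<users>
  <row Id="1" DisplayName="Alice" Reputation="101" />
  <row Id="2" DisplayName="Bob" Reputation="1" />
</users>"""

rows = list(iter_rows(io.BytesIO(SAMPLE)))
print(rows[0]["DisplayName"], len(rows))
```

A real import would pass a file path instead of the in-memory sample and hand the dicts to a bulk loader in batches, once per table. Downloading the ready-made database is still far less work.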
As I said, maybe I’m missing something very basic here? The Stack Overflow databases are provided as XML, right? And our goal is to import those databases into SQL Server, so that we can run queries against them, as you guys show in your blog posts?
But SSMS / Attach Database doesn’t appear to work for XML files, is this correct? So do we have to import tables one-by-one, as per the link I provided above? I’m afraid I’m missing something very simple, I’m sorry.
Click on the first link under “to learn more” and give that one a good close read start to finish. Enjoy!
Thank you, Brent. I appreciate your help, as always 😉