Nick Craver and the kind folks at Stack Overflow publish their data export periodically with your questions, answers, comments, user info, and more. It’s available as an XML data dump, which I then take and import into SQL Server for teaching performance tuning.
You can download the 16GB torrent (magnet), which gives you a series of 7-Zip files that you can extract to produce a 118GB SQL Server 2008 database. You can then attach it to any SQL Server version from 2008 through 2017.
The data goes up to 2017/06/11 and includes:
- Badges – 23M rows, 1.1GB data
- Comments – 58.2M rows, 18.5GB data
- Posts – 36.1M rows, 90GB data, 15.5GB of which is off-row text data. This table holds questions & answers, so the Body NVARCHAR(MAX) field can get pretty big.
- PostLinks – 4.2M rows, 0.1GB data
- Users – 7.3M rows, 1GB data
- Votes – 128.4M rows, 4.5GB data
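To get a feel for how differently these tables behave, here's a rough back-of-the-envelope sketch in Python that turns the row counts and sizes above into average bytes per row. The numbers are copied straight from the list; the function names are just ones I made up for illustration, and I'm assuming the sizes are binary gigabytes (1024^3 bytes), which may not be exactly how they were measured.

```python
# Figures copied from the table list above: (rows in millions, data size in GB).
TABLES = {
    "Badges": (23.0, 1.1),
    "Comments": (58.2, 18.5),
    "Posts": (36.1, 90.0),
    "PostLinks": (4.2, 0.1),
    "Users": (7.3, 1.0),
    "Votes": (128.4, 4.5),
}

# Assumption: the sizes above are binary gigabytes (1024^3 bytes).
BYTES_PER_GB = 1024 ** 3


def avg_bytes_per_row(rows_millions, size_gb):
    """Average data bytes per row for one table (hypothetical helper)."""
    return size_gb * BYTES_PER_GB / (rows_millions * 1_000_000)


def summarize(tables):
    """Return (table name, average bytes per row) pairs, widest rows first."""
    pairs = [(name, avg_bytes_per_row(rows, size))
             for name, (rows, size) in tables.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, avg in summarize(TABLES):
        print(f"{name:<10}{avg:>10,.0f} bytes/row")
```

Posts comes out well over 2,000 bytes per row thanks to that Body column, while Votes and PostLinks land under 50 bytes per row. That spread is handy when you're picking a table for a demo: narrow tables pack a lot of rows onto each 8KB page, and wide ones don't.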
To learn more:
- BrentOzar.com/go/querystack – my page about the SQL Server export with more info about how I produce the database.
- Data.StackExchange.com – a web-based SSMS where you can run your own queries against a recently restored copy of the Stack databases, or run other folks’ queries.
- Watch Brent Tune Queries – free sessions where I take different queries from Data.StackExchange.com and tune them live.
- How to Think Like the Engine – free videos where I show the Users table to explain clustered indexes, nonclustered indexes, statistics, sargability, and more.