Time to make SQL Server demos a little more fun. Now you can download a torrent of the SQL Server database version of the Stack Overflow data dump.
We keep past versions here in case your demos need specific sizes:
- 2017-06 – 16GB torrent (magnet), 118GB SQL Server 2008 database
(Starting with the 2017-06 torrent, I broke this up into 8 data files, each in its own 7z file, to make compression/decompression/distribution a little easier. You need all of those files to attach the database.)
- 2017-01 – 14GB torrent (magnet), 110GB SQL Server 2008 database
- 2016-03 – 12GB torrent (magnet), 95GB SQL Server 2005 database
- 2015-08 – 9GB torrent (magnet), 70GB SQL Server 2005 database
As with the original data dump, this is provided under the cc-by-sa 3.0 license. That means you are free to share this database and adapt it for any purpose, even commercially, but you must attribute it to the original authors (not us):
Current Data Sizes
As of the 2017-06 data dump, with clustered indexes only (no nonclustered indexes, to keep the distribution size small):
- Badges – 23M rows, 1.1GB data
- Comments – 58.2M rows, 18.5GB data
- Posts – 36.1M rows, 90GB data, 15.5GB of which is off-row text data. This table holds questions & answers, so the Body NVARCHAR(MAX) field can get pretty big.
- PostLinks – 4.2M rows, 0.1GB
- Users – 7.3M rows, 1GB
- Votes – 128.4M rows, 4.5GB
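Once you have the database attached, a query along these lines should let you check the row counts and sizes yourself. It is a rough sketch: the database name StackOverflow is an assumption (use whatever name you attached it as), and reserved space will not match the data sizes listed above exactly.

```sql
USE StackOverflow;  -- assumed database name; change it if you attached under another name

SELECT t.name AS table_name,
       SUM(ps.row_count) AS row_count,
       SUM(ps.reserved_page_count) * 8 / 1024 AS reserved_mb  -- pages are 8KB each
FROM sys.dm_db_partition_stats AS ps
JOIN sys.tables AS t ON t.object_id = ps.object_id
WHERE ps.index_id IN (0, 1)  -- heap or clustered index only, so rows are counted once
GROUP BY t.name
ORDER BY reserved_mb DESC;
```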
How to Get the Database
- Install a BitTorrent client – I recommend Deluge, a fast, free, easy open source one.
- Download & open this .torrent file – it’s a small metadata file that tells your BitTorrent client where to connect and start downloading the files.
- Wait. The big file may take a few hours to download depending on your internet connection and how many other people are seeding the torrent.
- Extract the .7z files with 7-Zip – it will create the database MDF, NDFs (additional data files), LDF, and a Readme.txt file. Don’t extract the files directly into your SQL Server’s database directories – instead, extract them somewhere else first, and then move or copy them into the SQL Server’s database directories. (This just avoids permissions hassles.)
- Attach the database – it’s in Microsoft SQL Server 2008 format (2005 for the older torrents), so you can attach it to any 2008 or newer instance. It doesn’t use any Enterprise Edition features like partitioning or compression, so you can attach it to Developer, Standard, or Enterprise Edition. (If your SSMS crashes or throws permissions errors, you likely tried extracting the archive directly into the database directory, and you’ve got permissions problems on the data/log files.)
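If you would rather script the attach step than click through SSMS, something like the sketch below should do it. The folder and every file name here are placeholders, not the real names: check the files you extracted (and the Readme.txt) and list the MDF, each NDF, and the LDF. The database name StackOverflow is also just an assumption.

```sql
-- Sketch only: paths and file names are hypothetical placeholders.
-- List every extracted data file (the MDF plus each NDF) and the LDF.
CREATE DATABASE StackOverflow
ON (FILENAME = N'D:\MSSQL\Data\StackOverflow_1.mdf'),
   (FILENAME = N'D:\MSSQL\Data\StackOverflow_2.ndf'),
   -- add one (FILENAME = ...) line here for each remaining NDF
   (FILENAME = N'D:\MSSQL\Data\StackOverflow_log.ldf')
FOR ATTACH;

-- Confirm where SQL Server thinks the files live after the attach.
SELECT name, physical_name, type_desc
FROM sys.master_files
WHERE database_id = DB_ID(N'StackOverflow');
```

If the attach fails with an access-denied error, check that the SQL Server service account can read and write the files in that folder.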
Please leave the torrent up and running – seeding the torrent helps other folks get it faster.
Why I’m Using BitTorrent
BitTorrent is a peer-to-peer file distribution system. When you download a torrent, you also become a host for that torrent, sharing your own bandwidth to help distribute the file. It’s a free way to get a big file shared amongst friends.
The download is relatively large, so it would be expensive for me to host on a server. For example, if I hosted it in Amazon S3, I’d have to pay around $1 USD every time somebody downloaded the file. I like you people, but not quite enough to go around handing you dollar bills. (As it is, I’m paying for a seedbox to get this thing started.)
Some corporate firewalls understandably block BitTorrent because it can use a lot of bandwidth, and it can also be used to share pirated movies/music/software/whatever. If you have difficulty running BitTorrent from work, you’ll need to download it from home instead.
What’s Inside the StackOverflow Database
I want you to get started quickly while still keeping the database size small, so:
- All tables have a clustered index
- No other indexes are included (nonclustered or full text)
- The log file is small, and you should grow it out if you plan to build indexes or modify data
- It’s distributed as an mdf/ldf so you don’t need space to restore a backup – just attach it
- It only includes StackOverflow.com data, not data for other Stack sites
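Growing the log file could look something like this. It is a sketch under a few assumptions: the database is named StackOverflow, the logical log file name is StackOverflow_log (the first query shows you the real one), and 10GB is an arbitrary target that you should size for your own workload.

```sql
USE StackOverflow;  -- assumed database name

-- Look up the logical file names and current sizes (size is in 8KB pages).
SELECT name, type_desc, size * 8 / 1024 AS size_mb
FROM sys.database_files;

-- Grow the log file. The logical name and target size are placeholders;
-- the new SIZE must be larger than the file's current size.
ALTER DATABASE StackOverflow
MODIFY FILE (NAME = N'StackOverflow_log', SIZE = 10GB);
```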
To get started, here are a few helpful links: