How to Download the Stack Overflow Database

Last Updated 2 years ago

I use a Microsoft SQL Server version of the public Stack Overflow data export for my blog posts and training classes because it’s way more interesting than a lot of sample data sets out there. It’s easy to learn, has just a few easy-to-understand tables, and has real-world data distributions for numbers, dates, and strings. Plus, it’s open source and no charge for you – just choose your size:

Small: 10GB database as of 2010: 1GB direct download, or torrent or magnet. Expands to a ~10GB database called StackOverflow2010 with data from the years 2008 to 2010. If all you need is a quick, easy, friendly database for demos, and to follow along with code samples here on the blog, this is all you probably need.
Medium: 50GB database as of 2013: 10GB direct download, or torrent or magnet. Expands to a ~50GB database called StackOverflow2013 with data from 2008 to 2013. I use this in my Fundamentals classes because it’s big enough that slow queries will actually be kinda slow.
Large: current 430GB database as of 2022-06: 54GB torrent (magnet.) Expands to a ~430GB SQL Server 2016 database. Because it’s so large, I only distribute it with BitTorrent, not direct download links.
For my training classes: specialized copy as of 2018/06: 47GB torrent (magnet.) Expands to a ~180GB SQL Server 2016 database with queries and indexes specific to my training classes. Because it’s so large, I only distribute it with BitTorrent, not direct download links.

After you download it, extract the .7z files with 7-Zip. (I use that for max compression to keep the downloads a little smaller.) The extract will have the database MDF, NDFs (additional data files), LDF, and a Readme.txt file. Don’t extract the files directly into your SQL Server’s database directories – instead, extract them somewhere else first, and then move or copy them into the SQL Server’s database directories. You’re going to screw up the database over time, and you’re going to want to start again – keep the original copy so you don’t have to download it again.
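
The keep-a-pristine-copy advice above can be sketched as a small script. This is only a minimal sketch, not something that ships with the download: the function name and both folder paths are hypothetical, and it assumes the extracted files use the .mdf, .ndf, and .ldf extensions mentioned above.

```python
import shutil
from pathlib import Path

# Extensions of the SQL Server files named in the post (MDF, NDFs, LDF).
DATABASE_EXTENSIONS = {".mdf", ".ndf", ".ldf"}


def copy_database_files(staging_dir, data_dir):
    """Copy database files from a staging folder into the data folder.

    The originals stay in staging_dir, so you can start over later
    without downloading the archive again. Both paths are hypothetical.
    """
    staging = Path(staging_dir)
    target = Path(data_dir)
    target.mkdir(parents=True, exist_ok=True)

    copied = []
    for source in sorted(staging.iterdir()):
        # Skip Readme.txt and anything else that is not a database file.
        if source.is_file() and source.suffix.lower() in DATABASE_EXTENSIONS:
            shutil.copy2(source, target / source.name)
            copied.append(source.name)
    return copied
```

Because the script copies instead of moving, the extracted originals stay untouched in the staging folder, which is the whole point of extracting somewhere else first.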

Then, attach the database. It’s in Microsoft SQL Server 2016 format (2008 for the older torrents), so you can attach it to any 2016 or newer instance. It doesn’t use any Enterprise Edition features like partitioning or compression, so you can attach it to Developer, Standard, or Enterprise Edition. If your SSMS crashes or throws permissions errors, you likely tried extracting the archive directly into the database directory, and you’ve got permissions problems on the data/log files.
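
If you would rather script the attach than click through SSMS, the statement involved is CREATE DATABASE ... FOR ATTACH. Here is a minimal sketch that only builds that statement as text. The helper name, the database name, and the file paths are assumptions for illustration, so substitute whatever files your extract actually contains.

```python
def build_attach_statement(database_name, file_paths):
    """Build a CREATE DATABASE ... FOR ATTACH statement as a string.

    file_paths should list the extracted data and log files, with the
    primary .mdf file first. Nothing is sent to SQL Server here.
    """
    if not file_paths:
        raise ValueError("at least one file path is required")

    # Escape ] in the name and ' in the paths so the T-SQL stays well formed.
    quoted_name = "[" + database_name.replace("]", "]]") + "]"
    file_specs = ",\n".join(
        "    (FILENAME = N'" + path.replace("'", "''") + "')"
        for path in file_paths
    )
    return f"CREATE DATABASE {quoted_name}\nON\n{file_specs}\nFOR ATTACH;"
```

For example, calling it with the hypothetical paths D:\MSSQL\Data\StackOverflow2010.mdf and D:\MSSQL\Data\StackOverflow2010_log.ldf returns a statement you can paste into a query window and run yourself.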

As with the original data dump, this is provided under the cc-by-sa 4.0 license. That means you are free to share this database and adapt it for any purpose, even commercially, but you must attribute it to the original authors (not me).

What’s Inside the StackOverflow Database

I want you to get started quickly while still keeping the database size small, so:

All tables have a clustered index on Id, an identity field
No other indexes are included (nonclustered or full text)
The log file is small, and you should grow it out if you plan to build indexes or modify data
It only includes StackOverflow.com data, not data for other Stack sites
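
Growing the log file ahead of a big index build is one ALTER DATABASE ... MODIFY FILE statement. The sketch below only builds that statement as text. The helper name is made up, and the logical log file name is an assumption: check sys.database_files in your attached copy for the real one.

```python
def build_grow_log_statement(database_name, logical_log_name, size_gb):
    """Build an ALTER DATABASE ... MODIFY FILE statement as a string.

    logical_log_name is the log file's logical name inside the database,
    not its path on disk. SQL Server requires the new size to be larger
    than the file's current size.
    """
    if not isinstance(size_gb, int) or size_gb <= 0:
        raise ValueError("size_gb must be a positive whole number")

    # Escape ] in the database name and ' in the logical file name.
    quoted_name = "[" + database_name.replace("]", "]]") + "]"
    escaped_log = logical_log_name.replace("'", "''")
    return (
        f"ALTER DATABASE {quoted_name} "
        f"MODIFY FILE (NAME = N'{escaped_log}', SIZE = {size_gb}GB);"
    )
```

How big to go depends on what you plan to do, so treat any number you pass in as a starting guess rather than a recommendation.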

To get started, here are a few helpful links:

This Meta.SE post explains the database schema.
If you want to learn how to tune queries, Data.StackExchange.com is a fun source for queries written by other people.
For questions about the data, check the data-dump tag on Meta.StackExchange.com.

Past Versions

I keep past versions online too, in case you need to see a specific version for a demo.

2021-02 – 54GB torrent (magnet.) Expands to a ~401GB SQL Server 2016 database.
2020-06 – 46GB torrent (magnet.) Expands to a ~381GB SQL Server 2008 database. This is the last export that can be used with SQL Server 2014 & prior.
2019-12 – 52GB torrent (magnet.) Expands to a ~361GB SQL Server 2008 database.
2019-09 – 43GB torrent (magnet.) Expands to a ~352GB SQL Server 2008 database. This is the last export licensed with the cc-by-sa 3.0 license.
2019-06 – 40GB torrent (magnet.) Expands to a ~350GB SQL Server 2008 database.
2018-12 – 41GB torrent (magnet.) Expands to a ~323GB SQL Server 2008 database.
2018-09 – 39GB torrent (magnet.) Expands to a ~312GB SQL Server 2008 database.
2018-06 – 38GB torrent (magnet.) Expands to a ~304GB SQL Server 2008 database. Starting with this version & newer, the giant PostHistory table is included. As you can probably guess by the name, this would make for excellent partitioning and archival demos. As you might not guess, the NVARCHAR(MAX) datatypes of the Comment and Text fields make those demos rather…challenging.
2017-12 – 19GB torrent (magnet.) Expands to a ~137GB SQL Server 2008 database.
2017-08 – 16GB torrent (magnet), 122GB SQL Server 2008 database. Starting with this version & newer, each table’s Id fields are identity fields. This way we can run real-life-style insert workloads during my Mastering Query Tuning class. (Prior to this version, the Id fields were just INTs, so you needed to select the max value or some other trick to generate your own Ids.)
2017-06 – 16GB torrent (magnet), 118GB SQL Server 2008 database. Starting with this torrent & newer, I broke this up into multiple SQL Server data files, each in their own 7z file, to make compression / decompression / distribution a little easier. You need all of those files to attach the database.
2017-01 – 14GB torrent (magnet), 110GB SQL Server 2008 database
2016-03 – 12GB torrent (magnet), 95GB SQL Server 2005 database
2015-08 – 9GB torrent (magnet), 70GB SQL Server 2005 database

Why are Some Sizes/Versions Only On BitTorrent?

BitTorrent is a peer-to-peer file distribution system. When you download a torrent, you also become a host for that torrent, sharing your own bandwidth to help distribute the file. It’s a free way to get a big file shared amongst friends.

The download is relatively large, so it would be expensive for me to host on a server. For example, if I hosted it in Amazon S3, I’d have to pay around $5 USD every time somebody downloaded the file. I like you people, but not quite enough to go around handing you dollar bills. (As it is, I’m paying for multiple seedboxes to keep these available, heh.)
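
That "around $5" figure is easy to sanity-check. The sketch below multiplies a download size by a per-gigabyte transfer price. The 54GB size is the large torrent listed above, but the $0.09 per GB rate is purely an assumed illustrative number, not a quoted price, and real cloud pricing varies by region and volume.

```python
def transfer_cost_usd(size_gb, price_per_gb):
    """Estimate the outbound transfer cost of one download, in US dollars."""
    if size_gb < 0 or price_per_gb < 0:
        raise ValueError("size and price must not be negative")
    return round(size_gb * price_per_gb, 2)


# Assumed rate for illustration only: 0.09 USD per GB transferred out.
cost_per_download = transfer_cost_usd(54, 0.09)
```

Under those assumptions, one download of the 54GB file costs a bit under five dollars, and the bill scales linearly with every additional downloader.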

Some corporate firewalls understandably block BitTorrent because it can use a lot of bandwidth, and it can also be used to share pirated movies/music/software/whatever. If you have difficulty running BitTorrent from work, you’ll need to download it from home instead.

164 Comments

Dejan Markovic
October 3, 2015 2:00 pm

Thanks

Reply
MisterP
October 4, 2015 6:45 am

Thx for sharing and kudos for choosing torrent.

Reply
- Krutik
  February 16, 2024 11:07 pm
  
  Thanks for sharing! Can I use it on a Postgres database?
  
  Reply
  - Brent Ozar
    February 18, 2024 1:34 pm
    
    No.
    
    Reply
Solomon Rutzky
October 4, 2015 11:07 am

Hi there. For those who want to play with the data on their system, but via the Stack Exchange Data Explorer UI / WebApp and not boring ol’ SSMS, you can run that locally as well :). Just download the WebApp at:

https://github.com/StackExchange/StackExchange.DataExplorer

Reply
Dayton Brown
October 4, 2015 2:03 pm

So I’m curious…and in my googling around, I couldn’t find what I thought was a good answer. Why is the StackOverflow Database set up as case sensitive? It was actually a nice little learning opportunity for me as I didn’t even know you could make column names etc., case sensitive. Thanks.

Reply
- Solomon Rutzky
  October 4, 2015 3:34 pm
  
  Hi Dayton. That would be a good question to ask on: http://meta.stackexchange.com/ (be sure to tag it with [data-explorer] ).
  
  For now, here is a related question from there. It doesn’t answer why case sensitivity was chosen, but it does cover how to deal with it (be sure to read the comments as well 🙂):
  
  http://meta.stackexchange.com/questions/119304/why-is-the-like-operator-case-sensitive-in-data-explorer
  
  Reply
- Brent Ozar
  October 4, 2015 3:52 pm
  
  Dayton – it’s a function of the database importer. You set the collation when you create a database. I use case sensitivity on all of my database servers because I want my scripts to work on everyone’s servers, and there’s a surprisingly large number of case-sensitive servers out there. Forcing my stuff to be case-sensitive from the start means I get fewer support calls on my scripts down the road.
  
  Reply
Dayton Brown
October 4, 2015 8:24 pm

Well, that’s a very good answer then, Brent. I’m curious though: how do you handle text searching? StackOverflow is super fast, but there is no way it would be that fast if you were wrapping all of the text searches in UPPER() or LOWER(). That’s a quick way to defenestrate the sargability. Although, I have to admit, I’m a bit of a padawan when it comes to SQL. Side note: it would be pretty cool if you could include the actual indexes used (as a script). Thanks!

Reply
- Brent Ozar
  October 5, 2015 6:37 am
  
  Dayton – the text search is done in ElasticSearch.
  
  When you say the actual indexes, can you elaborate on what you mean?
  
  Reply
  - Dayton Brown
    October 5, 2015 7:32 am
    
    Thanks Brent. The actual indexes that are on the production DB. I’m guessing there are more than just a clustered index on each table. Something like a script that has all the create index statements.
    
    Reply
    - Brent Ozar
      October 5, 2015 8:00 am
      
      Dayton – ah, no, that wouldn’t be appropriate.
      
      Reply
      - Dayton Brown
        October 5, 2015 8:29 am
        
        Okey Dokey. Thanks for the db/torrent etc.
      - Luis de Santos
        April 6, 2017 4:37 pm
        
        I don’t understand how we are supposed to use this with the stress test Random_Q SP if none of the tables have indexes.
      - Brent Ozar
        April 6, 2017 4:45 pm
        
        Luis – that’s part of your job – to tune them by finding the right indexes during your load tests. (Hey, if I gave you a perfect database with perfect queries and perfect indexes, then you wouldn’t learn anything, hahaha.)
Denis Reznik
October 5, 2015 6:35 am

That’s amazing! Thank you, Brent, for sharing this!

Reply
- Brent Ozar
  October 5, 2015 6:37 am
  
  Denis – you’re welcome, glad I could help.
  
  Reply
Wyatt
October 5, 2015 8:21 am

Why is this a zipped MDF & LDF and not a BAK file? I would have thought a full database backup would have been a more natural way to distribute the data set than an unattached MDF/LDF.

Reply
- Brent Ozar
  October 5, 2015 8:50 am
  
  Wyatt – great question! Because in order to get the file size down, I would still have to compress the backup with 7Z. That means you would have to have enough space for the 7Z, the backup, and the MDF/LDF. The extraction time would also be longer, because it takes you a long time to restore a 70GB database. This way, you need less space (just the 7Z and MDF/LDF), and much less time (extract, then attach).
  
  Reply
pmpjr
October 5, 2015 9:46 am

I’m seeding this now, I’ll let it run as long as the wife doesn’t yell at me for the internet being slow.

Reply
- Brent Ozar
  October 5, 2015 11:55 am
  
  Hahaha, thanks!
  
  Reply
Krzysztof
October 6, 2015 4:08 am

I get the following error during extraction:
“Unsupported compression method error for the file…”
Does it mean that I need to download the file once again?

When I’m checking the file, the method is set to 21.
Listing archive: F:\StackOverflow201508.

Method = 21
Solid = -
Blocks = 3
Physical Size = 9240856556
Headers Size = 228
----------

Path = Readme.txt
Size = 1365
Packed Size = 788
Modified = 2015-09-27 16:06:56
Attributes = ....A
CRC = BC21062A
Encrypted = -
Method = 21
Block = 0

Path = StackOverflow.mdf
Size = 69038243840
Packed Size = 9206452306
Modified = 2015-09-27 15:59:26
Attributes = ....A
CRC = 7E857A08
Encrypted = -
Method = 21
Block = 1

Path = StackOverflow_log.ldf
Size = 524165120
Packed Size = 34403234
Modified = 2015-09-27 15:59:26
Attributes = ....A
CRC = 330FDAC2
Encrypted = -
Method = 21
Block = 2

Reply
- Brent Ozar
  October 6, 2015 6:45 am
  
  Krzysztof – sorry, I can’t troubleshoot that for you. You may want to try a different extraction tool as well.
  
  Reply
KingOfZeal
October 11, 2015 1:25 pm

Added to my 1 gigabit seedbox. I’ll have it up for at least a month, so it should help anyone who wants to get it.

Reply
- Brent Ozar
  October 12, 2015 3:07 pm
  
  Thanks sir!
  
  Reply
Bob
October 13, 2015 12:24 am

I noticed that the most recent users and posts in your data dump are from September 14th **2014**, not 2015. So this snapshot is over a year old, correct?

Reply
Jay Sanati
May 7, 2016 12:56 pm

There are some missing tables, like CloseAsOffTopicReasonTypes and TagSynonyms. Any thoughts?

Reply
- Brent Ozar
  May 7, 2016 5:00 pm
  
  Jay – they’re not in the public data dump, right?
  
  Reply
  - Aaron
    October 5, 2019 6:25 pm
    
    @Brent Ozar.
    There are many missing tables, especially linked/lookup tables.
    I downloaded the 2019 version from your website hoping to find the missing tables, but they do not exist.
    
    I have a script to create all the tables, but it would be nice if those tables were filled with data as well.
    This is a link to my project. The idea is to convert some interesting queries to LINQ to EF.
    https://github.com/codesanook/CodeSanook.StackOverflowEFQuery/blob/master/create-a-database.sql
    
    Thank you so much.
    
    Reply
    - Brent Ozar
      October 5, 2019 6:26 pm
      
      @Aaron – the tables aren’t missing – they’re not given out by Stack Overflow. To see the tables that Stack Overflow makes public, click the related link in the post to see their original data export.
      
      Reply
      - Aaron
        October 5, 2019 8:53 pm
        
        @Brent Ozar
        Thank you so much for your reply, and for your blog, articles, and videos. They are very helpful.
        I am thinking of joining your class soon.
        BTW, this link contains all the StackOverflow tables (schema only):
        https://github.com/codesanook/CodeSanook.StackOverflowEFQuery/blob/master/create-a-database.sql
        
        However, if we want to have some data in those tables, do you mean we need to work that out ourselves?
        
        My goal is to convert some interesting SQL queries from “https://data.stackexchange.com/stackoverflow/queries” to LINQ to EF.
        I think it is very interesting to learn how to create real-world, meaningful queries and pretty complex LINQ to EF.
      - Brent Ozar
        October 5, 2019 8:54 pm
        
        OK, cool, good luck!
Davaya
July 2, 2016 7:12 am

Hello,
Does this mean we have the source code of the website, or is it just the database? What can we do with this database? Thank you.

Reply
- Brent Ozar
  July 2, 2016 7:15 am
  
  Just the database. If you don’t know what you can do with a database, then this isn’t really for you. Thanks!
  
  Reply
JAMES YOUKHANIS
July 20, 2016 11:01 am

I am trying to download the BitTorrent file to follow along with your training videos, but it has been trying to connect to peers for over two hours. Can I also get the same file from https://archive.org/details/stackexchange?

Reply
- Brent Ozar
  July 21, 2016 11:47 am
  
  James – unfortunately we can’t troubleshoot that remotely. It’s working fine here though.
  
  Reply
shaul Bel
August 8, 2016 12:08 pm

I have downloaded the file and extracted it, but when I try to attach it, it asks for a full-text catalog to be added, and there isn’t one.
How can I solve this?

Reply
- Brent Ozar
  August 8, 2016 12:33 pm
  
  Shaul – there isn’t one. You can skip that part.
  
  Reply
  - shaul Bel
    August 8, 2016 1:25 pm
    
    Thanks.
    My mistake.
    
    The real problem was a database size limitation: the instance was SQL Server Express, and its limit is a 10GB database.
    
    I am moving the files to another server.
    
    Reply
James Medlin
August 9, 2016 12:46 pm

There’s two different torrent files linked in the article. The top one is newer (going by the name).

Thanks for compiling and publishing this DB – it’s a great help for learning!

Reply
- Brent Ozar
  August 9, 2016 12:48 pm
  
  James – great catch! Fixed. Thanks!
  
  Reply
SDC
October 19, 2016 3:27 pm

Uh oh, doesn’t look like either of the torrent links works at this time.

Reply
- Brent Ozar
  October 19, 2016 3:34 pm
  
  Working fine here. Your office may be blocking BitTorrent.
  
  Reply
Lucas
December 20, 2016 1:13 am

Seems there aren’t enough seeds? FrostWire is telling me the download will be completely in an infinite number of days, hours, minutes and seconds from now!! I’ve managed to get 57.7KB so far…..

Reply
- Lucas
  December 20, 2016 1:13 am
  
  completely = completed
  
  Reply
Kelly
January 27, 2017 1:14 pm

Still pointing to the old (201603) torrent

Reply
- Kelly
  January 27, 2017 1:14 pm
  
  in the download instructions that is
  
  Reply
  - Erik Darling
    January 27, 2017 1:33 pm
    
    Thanks, fixed.
    
    Reply
M
January 31, 2017 9:07 am

Thanks Brent Ozar team! This is helpful as always. Do you have any recommendations for free sample databases that are a bit smaller, such as 10-20 GB? I like the Stack dump, but it’s now at the point where it’s surpassing the size of many laptop SSDs, which is where I’d like to mess around with it.

Reply
- Brent Ozar
  February 11, 2017 6:28 am
  
  M – you can buy a 1TB laptop SSD for under $250:
  http://amzn.to/2l0bMNz
  
  Reply
wqw
February 1, 2017 1:55 am

Here is a direct download link: http://ovh.to/D4JmSb6 (hosted on Hubic in France)

Reply
- Brent Ozar
  February 1, 2017 7:00 am
  
  WQW – awesome, thanks! I’ve added that to the post.
  
  Reply
Teach Yourself SQL Server Performance Tuning (Dear SQL DBA Episode 12) - by Kendra Little
February 21, 2017 12:46 pm

[…] StackOverflow sample database, shared on BitTorrent by Brent Ozar […]

Reply
Christian
March 24, 2017 6:29 pm

Is it possible to use the stackoverflow database in SQL Server 2016?

Reply
- Erik Darling
  March 24, 2017 7:48 pm
  
  Yep, 2005 and up.
  
  Reply
Kannan GS
August 26, 2017 4:50 am

Hey Brent,
We enjoy and learn more with the StackOverflow DB. Thanks for the great work. After downloading, we rebuilt the clustered indexes as clustered columnstore indexes, and the size was reduced by around 30%. Would it be possible to add that in future torrent releases? It would save us a lot of space.

Reply
- Brent Ozar
  August 26, 2017 5:19 am
  
  Kannan – glad you like it. No, not all versions support columnstore indexes.
  
  Reply
Lukasz
August 30, 2017 3:58 am

Why are there only 9 tables? Did I set up the database the wrong way?
The tables I can see are:
PostTypes
Votes
Users
Posts
Comments
Badges
PostLinks
LinkTypes
VoteTypes

Reply
- Brent Ozar
  August 30, 2017 4:42 am
  
  Yep. What other tables were you expecting to see, just curious?
  
  Reply
  - Lukasz
    August 30, 2017 5:34 am
    
    The original database on Data.StackExchange has many more tables. One of them is the dbo.Tags, which is used in one of your first lessons. I wanted to use these lessons to perform live SQL training for my colleagues.
    I was expecting to see all of them.
    
    I might reconsider using the AdventureWorks database (or the new MS training db?)
    I might use the SE database for some query tuning training or something like that.
    
    Reply
    - Brent Ozar
      August 30, 2017 5:49 am
      
      Hmmm, which lesson? I don’t remember using an existing dbo.Tags table (although I know in one of my lessons, I have you CREATE one.) Can you point me to it? I want to make sure I get that fixed. Thanks!
      
      Reply
    - Brent Ozar
      August 30, 2017 5:49 am
      
      Also, to be clear – the Data.StackExchange.com database is a backup of production, whereas the process to create the public data dump (XML) is a little different, and has never contained all of the production tables/columns.
      
      Reply
      - Lukasz
        August 30, 2017 7:08 am
        
        Brent, it’s this lesson:
        https://www.brentozar.com/learn-query-sql-server-stackoverflow-database/learn-query-part-1-getting-data-select/from-getting-data-table/
      - Brent Ozar
        August 30, 2017 7:13 am
        
        Gotcha. That class is specifically designed for folks using Data.StackExchange.com, not the Stack Overflow data dump. (You’ll notice that the instructions all focus on Data.StackExchange.com.) That’s different from the data dump.
Jakob Bindslet
November 9, 2017 2:57 pm

The StackExchange is an excellent DBA “toy”. Thank you for providing an easy way to access it Brent!
One question that I am curious about: how do you import the XML files into your SQL Server?

Reply
- Brent Ozar
  November 9, 2017 3:47 pm
  
  Jakob – we use the Stack Overflow Data Dump Importer: https://github.com/BrentOzarULTD/soddi
  
  Reply
saad
April 3, 2018 6:33 am

Do these datasets contain the questions and the tags assigned to them?

Reply
- Brent Ozar
  April 3, 2018 6:39 am
  
  Yep!
  
  Reply
BACKUP DATABASE testDB TO DISK='NUL:'
April 27, 2018 11:55 am

[…] database on your machine to play around, check out Brent Ozar’s instructions on how to do it here). My laptop is running Crucial MX200 1TB SSD. My throughput is 490.491 MB / sec (see snapshot […]

Reply
Gert Hauan
May 6, 2018 7:12 am

Hi
Thx for this, just downloaded and attached to a SQL Server 2017 instance.
I see that the nonclustered indexes have been removed, but does that also go for foreign key constraints? Or are there none of those in the prod DB?

Just wanted to mention: when I attached the mdf-file (the first one, there were 3 .mdf-files in the download), it complained if the log-file from the download already was in the log-file-folder. Seems it wanted to create it.

Reply
- Brent Ozar
  May 7, 2018 6:10 am
  
  Gert – the data dump isn’t a direct backup of Stack Overflow’s production database. They export the data to XML, and then we import it into SQL Server format. The tables aren’t necessarily identical in structure to Stack’s live schema – it’s very highly similar, but not identical.
  
  Sounds like you didn’t quite follow the process of normal database attachment, but that’s totally okay. Have fun with it!
  
  Reply
  - Sven Bleckwedel
    June 13, 2023 4:09 pm
    
    Hello all,
    
    Thanks for offering this data for testing. I just downloaded it and wanted to attach it to a SQL Server 2019 Standard instance.
    
    ”
    54GB torrent (magnet.) Expands to a ~401GB SQL Server 2016 database
    
    After you download it, extract the .7Zip files with 7Zip.
    
    Then, attach the database. It’s in Microsoft SQL Server 2016 format (2008 for the older torrents), so you can attach it to any 2016 or newer instance. It doesn’t use any Enterprise Edition features like partitioning or compression, so you can attach it to Developer, Standard, or Enterprise Edition. If your SSMS crashes or throws permissions errors, you likely tried extracting the archive directly into the database directory, and you’ve got permissions problems on the data/log files.
    ”
    
    I’ve done all these steps with the StackOverflow2010 database, and when I attached it to an Express Edition instance, it worked fine.
    
    But when trying to attach the largest sample to a SQL Server 2019 Standard instance, the process hung (apparently) indefinitely…
    
    Could anyone share how long I should wait for this process to complete?
    
    Any advice would be greatly appreciated, of course.
    
    Best Regards,
    Sven
    
    Reply
    - Brent Ozar
      June 13, 2023 7:30 pm
      
      When you’ve got a slow or unresponsive query, you can check out tools like sp_BlitzWho or sp_WhoIsActive to learn what the query’s waiting on. I teach you how to do that in my How I Use the First Responder Kit training class.
      
      Reply
SQL Server / Query Optimization / Merge Join Operator / Sort Operator | SQL Server Blog
May 29, 2018 7:10 pm

[…] How to Download the Stack Overflow Database via BitTorrent […]

Reply
SQL Server / Query Optimization / TempDB Page Splits / Table Pool (Lazy Spool) | SQL Server Blog
July 24, 2018 5:19 pm

[…] How to Download the Stack Overflow Database via BitTorrent […]

Reply
Backup Full Diário, sua melhor opção? – Edvaldo Castro
July 31, 2018 5:38 pm

[…] the examples, I will use the StackOverflow2010 database; click here for more information and to download this database from […]

Reply
Ways To Check For Non-Existence – Curated SQL
August 10, 2018 7:01 am

[…] using the free Stack Overflow database, and I wanna find all of the users who have not left a comment. The tables involved […]

Reply
benedikt
September 11, 2018 4:51 am

Thank you for this great information.

Reply
NYC Yellow Cab Data in Azure SQL Data Warehouse - SQL Hammer | SQL Hammer
October 22, 2018 8:02 am

[…] was using the copy of StackOverflow’s database which Brent Ozar maintains as a BitTorrent, here. Once I spent all the time to download the database and then export it in a format which is easy to […]

Reply
Behold The Power of Dynamic SQL- Part 1 – Cyndi Johnson
November 26, 2018 7:41 pm

[…] It’s a bit painful to get in database form, please see Brent Ozar’s blog for instructions […]

Reply
One reason for using the Estimated Execution Plan – Art's DBA Blog
December 11, 2018 11:06 am

[…] (Download StackOverflow2010 here) […]

Reply
JUAN MUNOZ
January 31, 2019 12:42 pm

Thank you Brent… Firewall at work blocks all links and it makes it look like they are broken.

Reply
IEnumerable mı? IQueryable mı? - Cengizhan BAKIR - Kodlama sahnesindeki çakma assolist
February 20, 2019 12:33 pm

[…] I download the Stack Overflow database covering the years 2008-2010 from here and create it inside SQL Server by attaching the file with the .mdf extension. […]

Reply
Ofir Assif
March 3, 2019 4:35 pm

Hi!
Is there a way to connect this sql server to some sort of web-gui that can be set up on-premise, for an offline stackoverflow.com alternative?

Reply
- Brent Ozar
  March 3, 2019 4:57 pm
  
  Yep! That GUI is SQL Server Management Studio or Azure Data Studio, and you can run whatever queries you like against the database to get your answers while you’re offline.
  
  Reply
SQL Server – StackOverflow Sample Database - DBTUTS
March 20, 2019 8:50 pm

[…] Download | Visit StackOverflow Page […]

Reply
Compliant database development using Redgate SQL Provision Part 2 – JUST:BI
April 2, 2019 4:00 am

[…] The Stack Overflow database is periodically published as an XML data dump and Brent Ozar uses it as part of his performance tuning courses and so being the super helpful guy he is carves it up into several different sized backup files. What I find most helpful about this, is given that the structure is consistent it means I can test stuff out at speed against one of the smaller databases before unleashing it on one of the larger datasets. There are tonnes of versions available, but the ones I have used for this test can be found on this page. […]

Reply
Window Functions vs GROUP BYs - SQL with Bert
April 16, 2019 4:01 am

[…] be using the StackOverflow 2014 data dump for these examples if you want to play along at […]

Reply
Correlated Subqueries vs Derived Tables - SQL with Bert
April 23, 2019 4:01 am

[…] my series to document ways of refactoring queries for improved performance. I’ll be using the StackOverflow 2014 data dump for these examples if you want to play along at […]

Reply
IN vs UNION ALL : Which is better for performance? - SQL with Bert
April 30, 2019 4:00 am

[…] my series to document ways of refactoring queries for improved performance. I’ll be using the StackOverflow 2014 data dump for these examples if you want to play along at […]

Reply
Temporary Staging Tables - Divide and Conquer Querying - SQL with Bert
May 7, 2019 4:02 am

[…] to document ways of refactoring queries for improved performance. I’ll be using the StackOverflow 2014 data dump for these examples if you want to play along at […]

Reply
Ildephonse Gasongo
June 2, 2019 11:38 am

I am running into a virus issue when trying to download the qbittorrent. Any help on this?
Sorry, this file is infected with a virus

Only the owner is allowed to download infected files.

Reply
- Brent Ozar
  June 2, 2019 3:34 pm
  
  Sure, your antivirus client may not like that torrent app, and you may need to try a different one.
  
  Reply
RemingtonBlake
June 29, 2019 6:39 pm

Anyone still seeding the big boy (40GBs)? Want to get the full meal deal but no peers. 🙁

Reply
- Brent Ozar
  June 29, 2019 6:41 pm
  
  Yes, I’ve got a few seed boxes – you may just be firewalled off.
  
  Reply
  - brettoconn
    June 30, 2019 4:46 pm
    
    Switched to a different torrent client. Working now thanks!
    
    Reply
- Jakob Bindslet
  June 30, 2019 3:05 am
  
  I am seeing the same thing right now. Can you download one of the smaller (older) version through torrent as a way of testing for firewalls and other limitations?
  
  Reply
Setting a Custom Variable in an Azure DevOps Pipeline with PowerShell - by Kendra Little
September 2, 2019 8:01 am

[…] a lightweight clone of the “production” database (I’m using a copy of StackOverflow, thanks Brent & the folks at […]

Reply
Setting a Custom Variable in an Azure DevOps Pipeline with PowerShell | My Blog
September 3, 2019 1:06 pm

[…] a lightweight clone of the “production” database (I’m using a copy of StackOverflow, thanks Brent & the folks at […]

Reply
willemhenk
October 14, 2019 12:19 pm

It seems like the 350GB DB is only 92GB when decompressed? Or am I doing something wrong?

Reply
- Brent Ozar
  October 14, 2019 12:19 pm
  
  Sounds like you’re doing something wrong.
  
  Reply
  - Henk
    October 14, 2019 12:25 pm
    
    Thanks for the quick reply. I was already in the process of hitting myself.
    
    Reply
willemhenk
October 14, 2019 12:20 pm

Eh… I am doing something wrong here… so never mind… I will start hitting myself.

Reply
Debezium Introduction: Another Change Data Capture Tool | Hacker Noon – 3D Printing Technology
July 12, 2020 4:01 pm

[…] test this out, I use the Stack Overflow data (~10GB) provided by Brent Ozar with a simple setup as […]

Reply
Debezium Introduction: Another Change Data Capture Tool | Hacker Noon - Coiner Blog
July 12, 2020 9:03 pm

[…] test this out, I use the Stack Overflow data (~10GB) provided by Brent Ozar with a simple setup as […]

Reply
Debezium Introduction: Another Change Data Capture Tool - Disha Technology
July 13, 2020 12:05 am

[…] test this out, I use the Stack Overflow data (~10GB) provided by Brent Ozar with a simple setup as […]

Reply
SQL Server - Hashing an Email Address - Koderly
July 15, 2020 5:32 am

[…] version of the StackOverflow2013 database, which contains 2.3million user records. You can get from here if you want to try this out […]

Reply
Does the Order of Columns in an Index Matter? – VitalFew Blog
August 2, 2020 11:10 pm

[…] used the sample Stack Overflow database (50 GB) for the […]

Reply
mike.g.williams
October 29, 2020 7:18 am

Is it necessary to use 7-Zip to unpack the SO databases? Will WinZip work, or is the compression/decompression specific to 7-Zip only? I ask because the 7-zip.org site is blocked by our URL filters because it’s been identified as having malware, spyware, or phishing.

Reply
- Brent Ozar
  October 29, 2020 8:00 am
  
  I’m not familiar with whether WinZip extracts 7z files, sorry.
  
  Reply
- mike.g.williams
  October 29, 2020 8:01 am
  
  WinZip Pro worked.
  
  Reply
loretta.anabui
November 20, 2020 7:41 am

I am having issues trying to download the Stack Overflow database.

Reply
- Brent Ozar
  November 20, 2020 8:09 am
  
  Okay, can you be more specific, like with error messages?
  
  Reply
  - loren
    November 23, 2020 8:36 am
    
    Never mind, I got it figured out. Thanks.
    
    Reply
loren
November 23, 2020 11:36 am

Brent, I thought I had it figured out. I am able to download the Stack Overflow database, but I can’t unzip it and attach it in Management Studio. Is it possible to get a BAK file?

Reply
- Brent Ozar
  November 23, 2020 11:37 am
  
  Loren – if you have a Live Class Season Pass, you can go to the class prerequisites page for Mastering Index Tuning, follow the instructions there, and get a bak file.
  
  Reply
Bailey
March 3, 2021 4:40 pm

Just a quick heads up. I downloaded the training torrent (specialized copy as of 2018), but instead of 7z files, they are bak files. SSMS is not able to read them to restore, and changing the extension to 7z doesn’t work as 7zip sees it as an invalid archive.

Reply
- Brent Ozar
  March 3, 2021 5:19 pm
  
  You may have a bad download – I downloaded them yesterday and they worked fine. They are indeed backup files and you need to restore all four as a striped set as described in the class instructions.
  
  Reply
FD
March 8, 2021 2:07 pm

Brent, thanks for all of this. I need a large sample database to test a DB interface that I am building; however, the important specification for me is the number of records. I am looking for a “Master” table with 100,000 or more records. Can you suggest which database I should use?

Thanks.

Reply
- Brent Ozar
  March 8, 2021 2:19 pm
  
  Any of them will work, even the 10GB one. 100K isn’t a lot of rows.
  
  Reply
  - FD
    March 8, 2021 2:22 pm
    
    Fantastic. Thank you for the quick reply!
    
    Reply
D.C.
April 6, 2021 2:10 pm

Sorry, but I get a certificate warning when trying to download the torrents, too.

Reply
- Brent Ozar
  April 6, 2021 2:13 pm
  
  Don’t apologize! My bad there – I reshuffled a bunch of the S3 links this weekend. Give ‘er a shot again now. Thanks!
  
  Reply
Viorel Ciucu
April 6, 2021 9:04 pm

Hi Brent,

My name is Viorel and I’m currently writing a blog post/script to help people quickly mount a copy of the StackOverflow database (small and medium only) on an existing instance.
The script will download the archives from the links you provide here and it will extract and attach the database files.
I wanted to reach out and ask if it’s ok with you if I use the direct links from your domain.

Thank you!

Reply
- Brent Ozar
  April 6, 2021 9:08 pm
  
  Hi. Thanks for checking – those links will actually change in the coming month. You can post links to the blog post, but please don’t rely on direct links to the files. Thanks!
  
  Reply
Alex
May 14, 2021 6:52 pm

Can I attach this database to SQL Server 2019?

Reply
- Brent Ozar
  May 14, 2021 7:04 pm
  
  It’s in Microsoft SQL Server 2008 format (2005 for the older torrents), so you can attach it to any 2008 or newer instance.
  
  Reply
za
January 18, 2022 7:48 am

Can I get data on how many answers a user has posted in the database? I could not find that information in the given database.

Reply
- Brent Ozar
  January 18, 2022 7:52 am
  
  Sorry, I don’t do queries as a service. You’ll need to download the database as instructed in the post. Cheers!
  
  Reply
  - za
    January 19, 2022 7:30 am
    
    Thank you all the same! Cheers!
    
    Reply
Sql Dude
February 10, 2022 4:10 am

Is this open to holders of the live class season pass?

Reply
- Brent Ozar
  February 10, 2022 11:08 am
  
  Yes, the database is open to everyone.
  
  Reply
Kevin
February 10, 2022 10:49 pm

Do you recommend any other database besides MS SQL? Maybe PostgreSQL? I don’t have SQL Server, hence I ask.

Reply
- Brent Ozar
  February 10, 2022 10:50 pm
  
  no
  
  Reply
  - kevin
    February 10, 2022 11:14 pm
    
    Thank you for sharing. Much appreciated!
    
    Reply
Chris
March 19, 2022 4:29 pm

Thank you for making this available. Once my copy finishes downloading, I plan to expand it and then repackage it using a form of compression that interests me called deduplication. If successful, I will share the results with you. Instead of distributing large zip archives that still need to be extracted, the deduplication feature that is part of Windows Server 2019 works really well on certain datasets. In the end you are left with a compact deduped/compressed VHD file powered by NTFS internally. No extraction required: just mount it on Win2019 and use it.

Reply
- Brent Ozar
  March 19, 2022 10:32 pm
  
  I’m going to hold off, but feel free to share your own version as long as you keep the license intact.
  
  Reply
  - Chris
    March 22, 2022 2:36 pm
    
    Hi Brent, out of curiosity I did end up performing the test using the deduplication engine. Although I am not exactly sure why, I wrote the results up, which can be found at https://version12.com/beta/Portable Data Deduplication.pdf
    
    Please let me know if my references to you or the dataset license are inappropriate.
    
    Reply
    - Brent Ozar
      March 22, 2022 2:38 pm
      
      Cool, thanks for the heads up! Just FYI in regards to the findings – what most people do with the dataset is that they rapidly create & change indexes on it during the training classes. It’s not something that’s rarely accessed – it’s *heavily* accessed (and modified) during the classes.
      
      Reply
    - Chris
      March 22, 2022 2:38 pm
      
      err url correction https://version12.com/beta/PortableDataDeduplication.pdf
      
      Reply
Henry
May 18, 2022 9:20 pm

Hi,
Do you have a database for this year?

Reply
- Brent Ozar
  May 18, 2022 9:22 pm
  
  No.
  
  Reply
  - Henry
    May 18, 2022 10:54 pm
    
    Thanks
    
    Reply
Koen Verbeeck
August 2, 2022 10:31 am

If you’re having issues attaching the database using SSMS, try running SSMS as an admin.

Reply
jaden.hoch
March 5, 2023 4:45 pm

I cannot restore this database with SQL Server 2022 and SSMS 19.

I get an error.

Reply
- Brent Ozar
  March 5, 2023 4:55 pm
  
  an error, huh
  
  dang
  
  it’s a shame errors didn’t have messages that were more specific, telling you what happened
  
  maybe someday someone will invent something like that
  
  until then, guess we’re just screwed, huh
  
  Reply
J.A
March 13, 2023 4:02 pm

I have downloaded the “Small: 10GB database as of 2010” database and there is a problem: its schema is different from, and is missing tables compared to, the schema explained in “This Meta.SE post explains the database schema.” Can you please help me with this problem? Thanks!

Reply
- Brent Ozar
  March 13, 2023 4:03 pm
  
  Correct, it’s a smaller version. If you want all of the tables, download one of the larger ones.
  
  Reply
  - J.A
    March 14, 2023 12:51 pm
    
    OK, nice, thank you. How can I delete data from the larger “50GB” version so that, after deleting rows, it is only about 1-5GB? Please help me, thanks.
    
    Reply
    - Brent Ozar
      March 14, 2023 12:52 pm
      
      Personalized query support is beyond what I can do for free in the comments here. Hope that’s fair.
      
      Reply
      - JA
        March 14, 2023 1:33 pm
        
        OK, I understand, but I just asked if this is possible. I don’t need the query.
        
        Another question: which version has all of the tables? The medium or the large?
Ant
March 14, 2023 1:46 pm

Which version of the database has the same tables and schema as explained in “This Meta.SE post explains the database schema”? Can you please help me? Thanks!

Reply
Christy
November 15, 2023 7:21 am

Good day. I tried to see if my question was already asked recently and did not find it, so I apologize if this was already asked and answered. I opted to download the database version referenced in the bullet “For my training classes: specialized copy as of 2018/06” as I’m taking your Fundamentals and Mastering courses. Once I finished downloading the files with a torrent app, I see a ReadMe.txt file and 4 .bak files for the training version. I do not see the mdf, ndf, or ldf files you reference. So should I do a restore instead of an attach to my SQL Server 2017 instance? Or should I just go with the file under the bullet “Large: current 430GB database as of 2022-06”? I see that has 5 7zip files in it, unlike the training version. Thank you for your help. Have a great day.

Reply
- Christy
  November 15, 2023 7:24 am
  
  Disregard. Of course, after I post, I find that someone asked a similar question. I apologize. Have a fabulous day!
  
  Reply
JL
December 7, 2023 3:51 pm

Hi, thank you for your kind service — it’s very helpful to the community. I’m wondering if you by any chance have more recent data, especially after ChatGPT was released? I’m interested to know how SO was impacted. Thank you!

Reply
- Brent Ozar
  December 7, 2023 3:53 pm
  
  No, but you can import a more recent version yourself: https://www.brentozar.com/archive/2017/11/updated-stack-overflow-database-sql-server/
  
  Reply
Restaurar sólo una tabla en SQL Server - SoyDBA
December 14, 2023 7:31 pm

[…] las pruebas voy a usar la base de datos StackOverflow2013. Esta es una base de datos de ejemplo, pero podéis usar cualquier base de datos de pruebas que […]

Reply
Query Exercise: Find Foreign Key Problems - Brent Ozar Unlimited®
January 4, 2024 1:16 pm

[…] going to use the Stack Overflow database, and we’ll focus on these 3 […]

Reply
Query Exercise: Find the Best Time for Maintenance - Brent Ozar Unlimited®
January 18, 2024 1:16 pm

[…] open the Stack Overflow database, and for the sake of simplicity for this […]

Reply
How to Provision a Low-Cost SQL 2022 Testing Lab - SQL Server Consulting - Straight Path Solutions
January 18, 2024 5:48 pm

[…] Overflow 10GB – https://www.brentozar.com/archive/2015/10/how-to-download-the-stack-overflow-database-via-bittorrent… (I used the ‘10GB Direct Download’) – this one will need to be unzipped, my machine took […]

Reply
Query Exercise: Improving Cardinality Estimation - Brent Ozar Unlimited®
February 8, 2024 1:16 pm

[…] can test it with any version of the Stack Overflow database. To test it, we’ll turn on a couple of tuning […]

Reply
Query Exercise: Finding Long Values Faster - Brent Ozar Unlimited®
February 15, 2024 1:16 pm

[…] Our developers have come to us with a problem query that isn’t as fast as they’d like. Using any Stack Overflow database: […]

Reply
Query Exercise: Find Recent Superstars - Brent Ozar Unlimited®
February 22, 2024 1:15 pm

[…] this week’s Query Exercise, we’re working with the Stack Overflow database, and our business users have asked us to find the new superstars. They’re looking for the top […]

Reply