New York City: The Data That Never Sleeps

SQL Server

I love living in the city

Blog posts about people’s favorite data sets seem to be popular these days, so I’m throwing my hat in the ring.

NYC has been collecting all sorts of data from all sorts of sources. There’s some really interesting stuff in here.

Another personal favorite of mine is MTA turnstile data. If you’re a developer looking to hone your ETL skills, this is a great dataset, because it’s kind of a mess. I actually had to use PowerShell to fix inconsistencies with the older text files, which I’m still recovering from. I won’t spoil all the surprises for you.

Of course, there’s Stack Overflow.

You can’t go wrong with data from either of these sources. They’re pretty big. The main problem I have with Adventure Works is that it’s a really small database. It really doesn’t mimic the large databases that people deal with in the real world, unless you do some work run a script to make it bigger. The other problem with Adventure Works is that it went out of business a decade ago because no one wanted to buy yellow bikes. I’ve been learning a bit about Oracle, and their sample data sets are even smaller. If anyone knows of better ones, leave a comment.

Anyway, get downloading! Just don’t ask me about SSIS imports. I still haven’t opened it.

Thanks for reading!

Previous Post
When Shrinking Tempdb Just Won’t Shrink
Next Post
Stored Procedure Cached Time vs SQL Statement Cached Time

4 Comments. Leave new

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.