Twitter historical database of my tweeps

In my never-ending attempts to distract you from doing real work, I give you something you have absolutely no use for: a SQL Server database backup with about 100k tweets from people I’ve followed over the last couple of months.

I use Tweet-SQL to cache and analyze a lot of things from Twitter.  This isn’t the actual database I use, just an export of a subset of its tables:

  • Users – the tweeps.  The “id” field is Twitter’s internal number for you and comes straight from their API; it isn’t mine.  The cached_* and subscription_* fields are my own, not Twitter’s.
  • UsersHistory – whenever I fetch results from the Twitter API and someone’s information has changed, I store the old version of their profile in this table.  Typically, the field that’s changing is their followers_count.  The “id” field is my own identity number, not from Twitter’s API.
  • Statuses – the tweets (and yes, Twitter calls them Statuses).  The “id” field is from Twitter’s API.

Sample Queries

This will give you the most loudmouthed tweeps:
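The query itself didn't survive in this copy of the post, so what follows is only a minimal sketch of what such a query could look like, not the original. The post confirms the Users and Statuses tables and their “id” fields; the user_id and screen_name columns are assumptions based on how Twitter's API names things, so check them against the restored schema before running it.

```sql
-- Sketch only: Statuses.user_id and Users.screen_name are assumed column names.
-- Counts cached tweets per user and keeps the ten busiest accounts.
SELECT TOP 10
       u.screen_name,
       COUNT(*) AS tweet_count
FROM dbo.Statuses AS s
INNER JOIN dbo.Users AS u
        ON u.id = s.user_id          -- Users.id is Twitter's own user number
GROUP BY u.screen_name
ORDER BY COUNT(*) DESC;
```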

Resulting in:

[Image: Top 10 Loudmouths]

And this query gives you the hours when people tweet the most (in Central time):
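This query is also missing from this copy of the post, so here is a hedged sketch rather than the original. It assumes Statuses has a created_at datetime column (an assumption, borrowed from the Twitter API field name). The @offset_hours variable is a hypothetical knob: leave it at 0 if the times are already stored in Central time, or set it to something like -6 if they turn out to be UTC and you want Central Standard Time.

```sql
-- Sketch only: Statuses.created_at is an assumed datetime column.
-- DECLARE and SET are kept separate so this also runs on SQL Server 2005.
DECLARE @offset_hours int;
SET @offset_hours = 0;   -- e.g. -6 if created_at holds UTC

-- Bucket tweets by hour of day and list the busiest hours first.
SELECT DATEPART(hour, DATEADD(hour, @offset_hours, s.created_at)) AS hour_of_day,
       COUNT(*) AS tweet_count
FROM dbo.Statuses AS s
GROUP BY DATEPART(hour, DATEADD(hour, @offset_hours, s.created_at))
ORDER BY tweet_count DESC;
```

A fixed offset ignores daylight saving time, which is probably fine for a rough look at a couple of months of data.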

Resulting in:

[Image: Lively Times of Day]

Things to Know About the Data

There are some holes in the data from when my server bombed or the Twitter API didn’t return data correctly, and unfortunately, a lot of those holes are around the PASS Summit.  I wanted to refetch that data before giving you this database, but I’m running out of time and I’ve got other things on my plate, so I figured I’d just let this loose as is.

The database doesn’t include people with protected tweets, and it only includes things I’d see on my home page.  If someone mentioned me but I’m not following them, you won’t see it in this database export.

You can download the SQL Server database backup and restore it onto a SQL 2005 (or newer) server.  If you find anything interesting in the backup, post it here in the comments.  I’d love to see what you find!  And of course, I’d highly recommend Tweet-SQL – it’s a fun little tool if you’d like to analyze Twitter data like who’s following who, who gets retweeted the most, or what you’re missing when you’re gone.
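If you haven't restored a backup that came from somebody else's server before, a sketch along these lines may help. The file paths, the database name TwitterCache, and the logical file names are all hypothetical placeholders for illustration; run the FILELISTONLY step first to see the real logical names inside the backup and substitute them.

```sql
-- Step 1: list the logical data and log file names stored in the backup.
-- The path is a placeholder; point it at wherever you saved the download.
RESTORE FILELISTONLY
FROM DISK = N'C:\Backups\TwitterCache.bak';

-- Step 2: restore, moving each logical file to a folder that exists on your server.
-- 'TwitterCache' and 'TwitterCache_log' stand in for the names reported by step 1.
RESTORE DATABASE TwitterCache
FROM DISK = N'C:\Backups\TwitterCache.bak'
WITH MOVE N'TwitterCache'     TO N'C:\Data\TwitterCache.mdf',
     MOVE N'TwitterCache_log' TO N'C:\Data\TwitterCache_log.ldf';
```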
