In my never-ending attempts to distract you from doing real work, I give you something you have absolutely no use for: a SQL Server database backup with about 100k tweets from people I’ve followed over the last couple of months.
I use Tweet-SQL to cache and analyze a lot of things from Twitter. This database isn’t the actual one I use; it’s just an export of a subset of tables:
- Users – the tweeps. The “id” field is Twitter’s internal number for you, not my own; it comes from their API. The cached_* and subscription_* fields are my own, not Twitter’s.
- UsersHistory – whenever I fetch results from the Twitter API and someone’s information has changed, I store the old version of their profile in this table. Typically, the field that’s changing is their followers_count. The “id” field is my own identity number, not from Twitter’s API.
- Statuses – the tweets (and yes, Twitter calls them Statuses). The “id” field is from Twitter’s API.
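Since followers_count is the field that changes most, it makes for an easy first look at the data. Here's a rough sketch that lists the most-followed tweeps. It assumes dbo.Users has a followers_count column (the name comes from Twitter's API and is an assumption here, so check the table before running it):

```sql
-- Sketch only: assumes dbo.Users has a followers_count column,
-- named the way Twitter's API names it.
SELECT TOP 20
       u.id,
       u.screen_name,
       u.followers_count
FROM dbo.Users u
ORDER BY u.followers_count DESC;
```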
This will give you the most loudmouthed tweeps:
SELECT s.user_id, u.screen_name, COUNT(*) AS tweets
FROM dbo.Statuses s
INNER JOIN dbo.Users u ON s.user_id = u.id
GROUP BY s.user_id, u.screen_name
ORDER BY COUNT(*) DESC
And this query gives you the hours when people tweet the most (in Central time):
SELECT DATEPART(hh, created_at) AS TweetHour, COUNT(*) AS RECS
FROM dbo.Statuses
GROUP BY DATEPART(hh, created_at)
ORDER BY COUNT(*) DESC
Things to Know About the Data
There are some holes in the data from when my server bombed or the Twitter API didn’t return data correctly, and unfortunately, a lot of those holes are around the PASS Summit. I wanted to refetch that data before giving you this database, but I’m running out of time and I’ve got other things on my plate, so I figured I’d just let this loose as is.
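If you want to see roughly where the holes are, counting tweets per day is a quick way to spot them. This is a minimal sketch, assuming created_at is a datetime column (the DATEPART query above suggests it is). It uses CONVERT with style 120 rather than the date type so it should still run on SQL Server 2005:

```sql
-- Tweets per calendar day. Unusually low counts hint at gaps in the capture.
-- Assumption: created_at is a datetime column.
SELECT CONVERT(char(10), s.created_at, 120) AS TweetDate,  -- style 120 gives yyyy-mm-dd
       COUNT(*) AS Tweets
FROM dbo.Statuses s
GROUP BY CONVERT(char(10), s.created_at, 120)
ORDER BY TweetDate;
```

Days with no tweets at all won't show up as rows, so look for missing dates as well as suspiciously small numbers.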
The database doesn’t include people with protected tweets, and it only includes things I’d see on my home page. If someone mentioned me but I’m not following them, you won’t see it in this database export.
You can download the SQL Server database backup and restore it onto a SQL Server 2005 (or newer) instance. If you find anything interesting in the backup, post it here in the comments. I’d love to see what you find! And of course, I’d highly recommend Tweet-SQL – it’s a fun little tool if you’d like to analyze Twitter data like who’s following who, who gets retweeted the most, or what you’re missing when you’re gone.