Twitter historical database of my tweeps

In my never-ending attempts to distract you from doing real work, I give you something you have absolutely no use for: a SQL Server database backup with about 100k tweets from people I’ve followed over the last couple of months.

I use Tweet-SQL to cache and analyze a lot of things from Twitter.  This database isn’t the actual one I use, but it’s just an export of a subset of tables:

  • Users – the tweeps.  The “id” field is Twitter’s internal number for you, not my own – comes from their API.  The cached_* and subscription_* fields are my own, not Twitter’s.
  • UsersHistory – whenever I fetch results from the Twitter API and someone’s information has changed, I store the old version of their profile in this table.  Typically, the field that’s changing is their followers_count.  The “id” field is my own identity number, not from Twitter’s API.
  • Statuses – the tweets (and yes, Twitter calls them Statuses).  The “id” field is from Twitter’s API.

Sample Queries

This will give you the most loudmouthed tweeps:

SELECT s.USER_ID, u.screen_name, COUNT(*) AS tweets
FROM dbo.Statuses s
INNER JOIN dbo.Users u ON s.USER_ID = u.id
GROUP BY s.USER_ID, u.screen_name
ORDER BY COUNT(*) DESC

Resulting in:

Top 10 Loudmouths

Top 10 Loudmouths

And this query gives you the hours when people tweet the most (in Central time):

SELECT DATEPART(hh, created_at) AS TweetHour, COUNT(*) AS RECS
FROM dbo.Statuses
GROUP BY DATEPART(hh, created_at)
ORDER BY COUNT(*) DESC

Resulting in:

Lively Times of Day

Lively Times of Day

Things to Know About the Data

There’s some holes in the data when my server bombed or the Twitter API didn’t return data correctly, and unfortunately, a lot of those holes are around the PASS Summit.  I wanted to refetch that data before giving you this database, but I’m running out of time and I’ve got other things on my plate, so I figured I’d just let this loose as is.

The database doesn’t include people with protected tweets, and it only includes things I’d see on my home page.  If someone mentioned me but I’m not following them, you won’t see it in this database export.

You can download the SQL Server database backup and restore it onto a SQL 2005 (or newer) server.  If you find anything interesting in the backup, post it here in the comments.  I’d love to see what you find!  And of course, I’d highly recommend Tweet-SQL – it’s a fun little tool if you’d like to analyze Twitter data like who’s following who, who gets retweeted the most, or what you’re missing when you’re gone.

19 Responses to Twitter historical database of my tweeps
  1. David Stein
    December 3, 2009 | 8:22 AM

    I didn’t need data mining to guess that Jorge would be at the top of the list. :)

  2. Jorge Segarra
    December 3, 2009 | 8:37 AM

    Brent, should it concern us that you, myself and mrdenny are sandwiched between two bot accounts?

    • Brent Ozar
      December 3, 2009 | 8:39 AM

      HAHAHA, no, @FancyFembot isn’t a bot. She’s a scifi fan in Houston that I ran into in my coworking days.

      • Jorge Segarra
        December 3, 2009 | 8:49 AM

        lol ok that makes me feel a LITTLE better…but still, we’re outtweeting bots. Am I a bot? AM I IN TEH MATRIX??!?

  3. David Stein
    December 3, 2009 | 8:51 AM

    Jorge is Borg. Prepare to be assimilated.

  4. Jen McCown
    December 3, 2009 | 10:05 AM

    Cool, I’m in the top 50, even without SQLPASS!

    I was really disappointed to see PASS wasn’t in there :( I was looking for something specific. If you happen to get the more fuller-est data together, reply in comments or email pretty please.

    Oh, and you inspired me to write a beginner level RESTORE WITH MOVE blog: http://midnightdba.itbookworm.com/midnightdba/blog/post/Restore-Database-WITH-MOVE—A-Review.aspx
    Smoothies,
    Jen

    • Brent Ozar
      December 3, 2009 | 10:59 AM

      Yeah, I’ll definitely keep trying to capture the historical data. The problem is Twitter’s API – it keeps throwing problems about rate limits even for unlimited accounts. I can’t fetch the old data fast enough. (sigh)

  5. Jamie Thomson
    December 3, 2009 | 10:52 AM

    Hey Brent,
    How about sticking it up on SQL Azure and handing out a read-only username/password for us all to use before Microsoft start charging for using the service!!

    If you don’t want to let me know and I might just do it myself!
    -Jamie

  6. K. Brian Kelley
    December 3, 2009 | 10:58 AM

    I made the top 10? Seriously? Wow. At least I’m not Jorge.

  7. André Kamman
    December 3, 2009 | 5:58 PM

    First time connecting to SQL Azure (Thanks Jamie !), impressed by the speed….
    Looks like Tweetdeck is by far the source most used (in this data set anyway)

    SELECT TOP 20
    COUNT(*) #,
    CASE
    WHEN RIGHT(s.source, 4) = ‘‘
    THEN SUBSTRING(source, CHARINDEX(‘nofollow’, source) + 10, LEN(source) – (CHARINDEX(‘nofollow’, source) + 13))
    ELSE source
    END source
    FROM statuses s
    GROUP BY source
    ORDER BY # DESC

    • André Kamman
      December 3, 2009 | 6:06 PM

      hmm, that query won’t work because I can’t post the html : WHEN RIGHT(s.source,4) = ”
      should have this between the quotes : less than / a greater than
      (without spaces)

  8. Andy Leonard
    January 24, 2010 | 8:48 PM

    Hi Brent,

    Interesting database! I ran this to determine who created Twitter accounts when:

    SELECT distinct s.User_Id, u.screen_name
    FROM dbo.Statuses s
    INNER JOIN dbo.Users u ON s.USER_ID = u.id
    order by s.User_Id

    :{> Andy

    • Brent Ozar
      January 25, 2010 | 8:39 AM

      Well, kinda sorta – that also lines up with when I started following them. They may have been tweeting before. But there’s an app for that:

      http://www.whendidyoujointwitter.com/

      That’s one of those fun chest-beating things. “I joined before you, nyah nyah nyah.” ;-)

Trackbacks/Pingbacks
  1. SSIS Junkie : TwitterCache now hosted on SQL Azure
  2. Weekly Link Post 122 « Rhonda Tipton’s WebLog
  3. SSIS Junkie : Tweetpoll and RESTful Northwind go bye-bye
Leave a Reply


Wanting to leave an <em>phasis on your comment?

Trackback URL http://www.brentozar.com/archive/2009/12/twitter-historical-database-of-my-tweeps/trackback/
Sept 30-Oct 2 – SQLBits - York, UK - doing sessions on virtualization & storage.

Nov 8-11 - PASS Summit - Seattle, WA - doing sessions on virtualization & professional development.

More Upcoming Events