Getting Your Twitter Feed into SQL Server with Tweet-SQL

15 Comments

If you’re reading this blog, you’re probably a SQL Server DBA or developer, and you’re probably into Twitter, so you’re probably going to be interested in Rhys Campbell‘s Tweet-SQL.

Tweet-SQL is a set of SQL Server stored procedures that let you work with the Twitter API just by writing T-SQL code. You can do things like:

  • Get the list of people you’re following into a database
  • Retrieve their messages into a table
  • Analyze who’s following who, and figure out who you should be following
  • Compile metrics about what’s going on out there

There are public services like MrTweet that let you do some of this, but being a DBA, I wasn’t quite satisfied with those. There’s also a free SSIS Twitter Suite by Andy Leonard and Jessica Moss, and I started playing around with that, but I suck at SSIS. I’m pretty good with T-SQL, though, and because I didn’t want to spread my skill set any thinner, I just ate the cost of Tweet-SQL and got started.

Download and Install Tweet-SQL

To get started, you’ll need a 30-day eval copy of Tweet-SQL. It’s free to evaluate, and costs £25 at the time of this writing – roughly $37 US.

Evaluate it before you buy it though, because I’ve had some difficulties getting it up and running on some workstations, especially those where I’m running SQL 2008 under a non-admin domain account on Windows 2008. I’ve ended up running Tweet-SQL in its own virtual machine with Windows 2003 and SQL Server 2005 just to keep things simple.

During setup, Tweet-SQL asks for a username and password – that’s your Twitter information, which is used to connect to Twitter. Before you click OK, click Tweet-SQL at the top of the config screen, and click Deploy. That actually deploys the Tweet-SQL stored procedures to the database of your choosing.

Calling the Twitter API with T-SQL

Start a new query in the database where you deployed Tweet-SQL and execute this code:

TweetSQL Status Timeline
TweetSQL Status Timeline

The first line enables recordset outputs (instead of XML) and the second line fetches our friends’ timeline from Twitter. You should see something like the window pictured at right – you can click on that to zoom in.

Ignore the first recordset that only has one line in it. The good stuff is in the second two recordsets.

The second recordset lists the status updates. Each line is a single status update. Note the “status_Id” column, second to last one – it starts at 0 and goes up.

The third recordset lists the users who made each status update. It joins to the second recordset on the “status_Id” column. Yes, in a perfect world, these would already be joined together for us, but we’re big bad database people, and that’s not a problem for us. Let’s create some tables to store these results.

Our Twitter Cache Database Schema

In order to pull our data into a database, we’re going to need some tables. For today’s lesson, we’re only going to create two tables:

  • Users – stores the list of people we’ve seen on Twitter
  • Statuses – stores the things they’ve said.

The Twitter API has a call that gets our status timeline, and it includes some basic information about each user. We’re going to kill two birds with one stone: whenever someone tweets, we’re going to store their user profile data in the User table too. There are separate Twitter API calls that will fetch our followers and our friends, but let’s learn to crawl first.

Here’s the script to create the tables. Be aware that I have zero clue what the actual field types are – these may need to be changed over time from nvarchar(200) to nvarchar(240), for example. (Yes, believe it or not, Twitter fields can be longer than 140 characters, as we’ll show later.)

I’m not doing any fancy indexing here, just the bare minimum to get started.

The Logistics of Importing Twitter Data into SQL Server

Before we go wild and crazy with the code, let’s talk through it in pseudocode. Here’s what we need to accomplish:

  • Call the Twitter API to get our most recent messages in XML format
  • Store that data in a temporary table for faster juggling
  • Insert users into our Users table if they don’t already exist
  • Insert statuses into our Status table if they don’t already exist

We could just write this as a stored procedure, and then call it as a SQL Server Agent job every minute. However, the Twitter API has a couple of limitations that we need to watch out for.

The API only sends up to 200 results back at a time. If we call this every 15 minutes for users who are following a lot of people, we could easily lose messages if more than 200 come through in 15 minutes. We could work around that by calling the API every minute or so, but that leads into our second challenge: we’re only allowed to call the API a certain number of times per hour.

To help work around this, the Twitter API has an extra parameter: we can specify a status ID#, and Twitter can return just the new messages since that ID#. This way, I can call the API once per hour, specify my most recent cached status #, and I’ll only get the new messages. I can also call it with a “page” parameter, so we can get several groups of 200 results one page at a time until we’ve exhausted all of the data.

On our Users table, there’s a cached_friends_status_id field: that’s where we’re going to store the most recent status ID. Why wouldn’t we just query for the max id in the Statuses table? I’m glad you asked, because that shows that you’re smart enough to keep an eye out for Lesson 2, where we’ll get into more advanced uses of these tables.

With that in mind, here’s what our pseudocode will look like:

  • Query the Users table to find the most recent cached status ID for our user
  • Call the Twitter API to get our most recent statuses since the most recent cached status ID
  • Store that data in a temporary table for faster juggling
  • Insert users into our Users table if they don’t already exist
  • Insert statuses into our Status table if they don’t already exist
  • If we retrieved 200 results, there’s probably more, so go back to step 2, but ask for the next page of results.
  • Update the Users table with the most recent cached status ID

The Stored Procedure to Import Twitter Data

With no further ado, here’s the stored procedure:

Are you still with me? Whew, good. I think we lost a few people there.

The stored proc takes an optional @import_users_id parameter – that’s your Twitter numeric user_id. You won’t know it yet, but after you run the stored proc, you can query your Users table to find yourself, assuming you posted a status update recently.

When you run the stored procedure like this:

you should get these messages:

That means, well, everything went awesome. Your Users and Statuses tables should now be populated.

At this point, you can get your user ID:

And then from that point forward, you can call the stored proc with your user id, thereby enabling the last-cached-id code.

When Things Don’t Go Awesome

I’ve been running this code for a while, and it’s very much alpha code. It may deflower your laptop at a moment’s notice.

When things break, it’s usually because the Twitter API is down or because my database schema doesn’t fully handle Twitter’s fields. For example, I’ve had several cases where they’ve delivered data that’s wider than I expected – those NVARCHAR(200) fields didn’t start out quite that wide. Over time, I’ll come back and revise the table schemas to be more closely aligned with whatever Twitter’s actually storing.

Now that I’m thinking of it, I didn’t test this code on a case sensitive server. I’m usually really good about that. Doh!

Stuff To Do When You’ve Got Twitter Data In Your Database

I’ve just scratched the surface of what you can do with Tweet-SQL. The Twitter API is pretty darned powerful, and it enables you to do things like:

  • Find out who your followers are following, which might be people you also find interesting
  • Find influential people in your Twitter network
  • Get recap reports of what links have been sent out recently, or what hot topics are out there
  • Get peoples’ updates without actually following them directly (aka stalking, hee hee ho ho)
  • Build a web-based equivalent to TweetDeck, with multiple user functionality

For sample scripts, check out SQLCLRNews, Rhys’s blog where he posts how-to examples for Tweet-SQL.

If you found this interesting, I’d suggest following rhyscampbell on Twitter, and of course, me – I’m BrentO.

Previous Post
Book Review: Microsoft SQL Server 2008 Management and Administration
Next Post
My First Computer

15 Comments. Leave new

  • Nice write-up Brent! I'm working on v2 and your comments will be taken into consideration. It's been great fun working on Tweet-SQL and it sounds like you've had fun too. I'm probably going to move the Tweet-SQL example scripts over to my new blog http://www.youdidwhatwithtsql.com which I should be launching soon.

    Reply
  • Tom Overton
    June 17, 2009 3:59 am

    Well written. Can’t wait to paste a few of these into Query Analyzer. Do you have v2 you can email? I’ve been trying to download all night at 5KB/s until the server conn resets.

    thanks again for the content.

    Tom

    Reply
  • Hey guys I wrote a free Twitter client for SQL Server.

    It’s very basic, you can only post tweets from SQL Server with it, but it might enough for some people.

    It’s a SQL CLR Assembly.
    Did I mention that it was Free?

    Reply
    • Hey guys I wrote a free Twitter client for SQL Server.

      It’s very basic, you can only post tweets from SQL Server with it, but it might enough for some people.

      It’s a SQL CLR Assembly.
      Did I mention that it was Free?

      Oh sorry I forgot the link:
      http://corevolt.com/blog/post/TwiSQL-Use-twitter-from-T-SQL-(For-free).aspx

      Reply
      • Two things. First, hopefully you code better than you comment. 😉 Second, the user has to figure out how to install it themselves? And it doesn’t consume Twitter data, only post? Okay, that’s three things, but if you’re after “free”, there’s already an open source SSIS project that both sends and receives tweets:

        http://www.codeplex.com/SSISTwitterSuite/

        Reply
        • My apologies for the duplicate entry, it was a mistake. I’m not a corporate entity pushing a product, just a developer that likes technology. This was just a little project that I wanted to share with you and your readers. My project is a SQL CLR assembly, anyone with reasonable SQL and/or .Net experience should be able to use it.

          Regards.

          Reply
          • TOTALLY okay. And you know what, I apologize for overreacting. Somebody was kicking me in the junk yesterday and I took it out on you. I apologize, and it wasn’t fair to you.

            I’ve written a bunch of code that got started just the way you did – you add one feature at a time, over time, for yourself, and you end up with something that everybody loves. Heck, that’s how Linux got started. Good luck with the project!

  • Hi Brent,

    I am having trouble executing usp_ImportStatusesFromFriends. I get Error: The remote server returned an error:(401)Unauthorized. Any ideas on how to fix this? Your help would be greatly appreciated.

    Thanks,
    nrauli

    Reply
  • Hi Brent,

    I stumbled across your Tweet-SQL to SQL page. The test query works, but the SP is just sending back un-authorized messages. Is this something silly I did?

    Reply
  • Wow! Thanks for the quick response.

    Reply
  • Paul Sieberhagen
    September 9, 2015 6:08 pm

    Has anyone tried running tweet-SQL V2 on any of these:

    https://www.brentozar.com/archive/2013/04/cloud-alternatives-to-microsoft-sql-server/

    Same issue as always: Needing to do a quick proof of concept by IT dept. too busy to assist with firewall settings if done on dev server in our farm so thought could do a short term on a cloud VM, install tweet-SQL and test.

    Any comments / advice?

    Regards
    Paul

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.