Getting Your Twitter Feed into SQL Server with Tweet-SQL

If you’re reading this blog, you’re probably a SQL Server DBA or developer, and you’re probably into Twitter, so you’re probably going to be interested in Rhys Campbell‘s Tweet-SQL.

Tweet-SQL is a set of SQL Server stored procedures that let you work with the Twitter API just by writing T-SQL code. You can do things like:

  • Get the list of people you’re following into a database
  • Retrieve their messages into a table
  • Analyze who’s following who, and figure out who you should be following
  • Compile metrics about what’s going on out there

There are public services like MrTweet that let you do some of this, but being a DBA, I wasn’t quite satisfied with those. There’s also a free SSIS Twitter Suite by Andy Leonard and Jessica Moss, and I started playing around with that, but I suck at SSIS. I’m pretty good with T-SQL, though, and because I didn’t want to spread my skill set any thinner, I just ate the cost of Tweet-SQL and got started.

Download and Install Tweet-SQL

To get started, you’ll need a 30-day eval copy of Tweet-SQL. It’s free to evaluate, and costs £25 at the time of this writing – roughly $37 US.

Evaluate it before you buy it though, because I’ve had some difficulties getting it up and running on some workstations, especially those where I’m running SQL 2008 under a non-admin domain account on Windows 2008. I’ve ended up running Tweet-SQL in its own virtual machine with Windows 2003 and SQL Server 2005 just to keep things simple.

During setup, Tweet-SQL asks for a username and password – that’s your Twitter information, which is used to connect to Twitter. Before you click OK, click Tweet-SQL at the top of the config screen, and click Deploy. That actually deploys the Tweet-SQL stored procedures to the database of your choosing.

Calling the Twitter API with T-SQL

Start a new query in the database where you deployed Tweet-SQL and execute this code:

EXEC dbo.tweet_cfg_resultset_send 1;
EXEC dbo.tweet_sts_friends_timeline NULL, NULL;
TweetSQL Status Timeline

TweetSQL Status Timeline

The first line enables recordset outputs (instead of XML) and the second line fetches our friends’ timeline from Twitter. You should see something like the window pictured at right – you can click on that to zoom in.

Ignore the first recordset that only has one line in it. The good stuff is in the second two recordsets.

The second recordset lists the status updates. Each line is a single status update. Note the “status_Id” column, second to last one – it starts at 0 and goes up.

The third recordset lists the users who made each status update. It joins to the second recordset on the “status_Id” column. Yes, in a perfect world, these would already be joined together for us, but we’re big bad database people, and that’s not a problem for us. Let’s create some tables to store these results.

Our Twitter Cache Database Schema

In order to pull our data into a database, we’re going to need some tables. For today’s lesson, we’re only going to create two tables:

  • Users – stores the list of people we’ve seen on Twitter
  • Statuses – stores the things they’ve said.

The Twitter API has a call that gets our status timeline, and it includes some basic information about each user. We’re going to kill two birds with one stone: whenever someone tweets, we’re going to store their user profile data in the User table too. There are separate Twitter API calls that will fetch our followers and our friends, but let’s learn to crawl first.

Here’s the script to create the tables. Be aware that I have zero clue what the actual field types are – these may need to be changed over time from nvarchar(200) to nvarchar(240), for example. (Yes, believe it or not, Twitter fields can be longer than 140 characters, as we’ll show later.)

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Statuses](
	[id] [INT] NOT NULL,
	[created_at] [datetime] NOT NULL,
	[text] [nvarchar](200) NOT NULL,
	[SOURCE] [nvarchar](160) NOT NULL,
	[truncated] [bit] NOT NULL DEFAULT ((0)),
	[in_reply_to_status_id] [INT] NULL,
	[in_reply_to_user_id] [INT] NULL,
	[favorited] [bit] NOT NULL DEFAULT ((0)),
	[user_id] [INT] NOT NULL,
	[cached_date] [datetime] NOT NULL CONSTRAINT [DF_Statuses_cached_date]
        DEFAULT (getutcdate()),
 CONSTRAINT [PK_Statuses] PRIMARY KEY CLUSTERED
(
	[id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF,
  IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON,
  ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]
 
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Users](
	[id] [INT] NOT NULL,
	[name] [nvarchar](50) NULL,
	[password] [nvarchar](50) NULL,
	[screen_name] [nvarchar](50) NULL,
	[location] [nvarchar](200) NULL,
	[description] [nvarchar](160) NULL,
	[profile_image_url] [nvarchar](200) NULL,
	[url] [nvarchar](200) NULL,
	[protected] [bit] NULL,
	[followers_count] [INT] NULL,
	[cached_date] [datetime] NOT NULL 
	  CONSTRAINT [DF_Users_cached_date]  DEFAULT (getutcdate()),
	[cached_friends_status_id] [INT] NULL,
 CONSTRAINT [PK_Users] PRIMARY KEY CLUSTERED
(
	[id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF,
  IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON,
  ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

I’m not doing any fancy indexing here, just the bare minimum to get started.

The Logistics of Importing Twitter Data into SQL Server

Before we go wild and crazy with the code, let’s talk through it in pseudocode. Here’s what we need to accomplish:

  • Call the Twitter API to get our most recent messages in XML format
  • Store that data in a temporary table for faster juggling
  • Insert users into our Users table if they don’t already exist
  • Insert statuses into our Status table if they don’t already exist

We could just write this as a stored procedure, and then call it as a SQL Server Agent job every minute. However, the Twitter API has a couple of limitations that we need to watch out for.

The API only sends up to 200 results back at a time. If we call this every 15 minutes for users who are following a lot of people, we could easily lose messages if more than 200 come through in 15 minutes. We could work around that by calling the API every minute or so, but that leads into our second challenge: we’re only allowed to call the API a certain number of times per hour.

To help work around this, the Twitter API has an extra parameter: we can specify a status ID#, and Twitter can return just the new messages since that ID#. This way, I can call the API once per hour, specify my most recent cached status #, and I’ll only get the new messages. I can also call it with a “page” parameter, so we can get several groups of 200 results one page at a time until we’ve exhausted all of the data.

On our Users table, there’s a cached_friends_status_id field: that’s where we’re going to store the most recent status ID. Why wouldn’t we just query for the max id in the Statuses table? I’m glad you asked, because that shows that you’re smart enough to keep an eye out for Lesson 2, where we’ll get into more advanced uses of these tables.

With that in mind, here’s what our pseudocode will look like:

  • Query the Users table to find the most recent cached status ID for our user
  • Call the Twitter API to get our most recent statuses since the most recent cached status ID
  • Store that data in a temporary table for faster juggling
  • Insert users into our Users table if they don’t already exist
  • Insert statuses into our Status table if they don’t already exist
  • If we retrieved 200 results, there’s probably more, so go back to step 2, but ask for the next page of results.
  • Update the Users table with the most recent cached status ID

The Stored Procedure to Import Twitter Data

With no further ado, here’s the stored procedure:

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
IF  EXISTS (SELECT * FROM sys.objects
  WHERE object_id = OBJECT_ID(N'[dbo].[usp_ImportStatusesFromFriends]')
  AND TYPE IN (N'P', N'PC'))
DROP PROCEDURE [dbo].[usp_ImportStatusesFromFriends]
GO
CREATE PROCEDURE [dbo].[usp_ImportStatusesFromFriends] @import_users_id INT = NULL
AS
  SET NOCOUNT ON
 
  DECLARE  @xml                                XML
           ,@handle                            INT
           ,@page                              INT
           ,@optional                          NVARCHAR(50)
           ,@RowsReturned                      INT
           ,@new_cached_friends_status_id INT
           ,@cached_friends_status_id     INT
 
  /* Table to join statuses and users */
  DECLARE  @imported_statuses  TABLE(
                                     id                      INT
                                     ,created_at             DATETIME
                                     ,text                   NVARCHAR(200)
                                     ,SOURCE                 NVARCHAR(160)
                                     ,truncated              BIT
                                     ,in_reply_to_status_id  INT
                                     ,in_reply_to_user_id    INT
                                     ,favorited              BIT
                                     ,user_id                INT
                                     ,user_name              NVARCHAR(50)
                                     ,user_screen_name       NVARCHAR(50)
                                     ,user_location          NVARCHAR(200)
                                     ,user_description       NVARCHAR(160)
                                     ,user_profile_image_url NVARCHAR(200)
                                     ,user_url               NVARCHAR(200)
                                     ,user_protected         BIT
                                     ,user_followers_count   INT
                                     )
 
  /* Get the last cached status ID if it's available */
  SELECT @cached_friends_status_id = cached_friends_status_id
    FROM dbo.Users
    WHERE id = @import_users_id
  SET @page = 1;
  SET @optional = '?count=200&page=' + CAST(@page AS NVARCHAR(4));
 
  IF @cached_friends_status_id IS NOT NULL
    BEGIN
      SET @optional = @optional + '&since_id='
        + CAST(@cached_friends_status_id AS NVARCHAR(30))
    END
 
  /* Turn off resultsets from Tweet-SQL because we'll be using XML data */
  EXEC dbo.tweet_cfg_resultset_send
    0;
 
  /* Get the first set of statuses */
  WHILE (@page <= 10)
    BEGIN
      EXEC dbo.tweet_sts_friends_timeline
        @optional ,
        @xml OUTPUT;
 
      EXEC sp_xml_preparedocument
        @handle OUTPUT ,
        @xml;
 
      /* Copy the XML data into a single table to make handling easier */
      INSERT INTO @imported_statuses
                 (id
                  ,created_at
                  ,text
                  ,SOURCE
                  ,truncated
                  ,in_reply_to_status_id
                  ,in_reply_to_user_id
                  ,favorited
                  ,user_id
                  ,user_name
                  ,user_screen_name
                  ,user_location
                  ,user_description
                  ,user_profile_image_url
                  ,user_url
                  ,user_protected
                  ,user_followers_count)
      SELECT DISTINCT id
                      ,dbo.tweet_fnc_dateconvert(created_at)
                      ,text
                      ,SOURCE
                      ,truncated
                      ,in_reply_to_status_id
                      ,in_reply_to_user_id
                      ,favorited
                      ,user_id
                      ,user_name
                      ,user_screen_name
                      ,user_location
                      ,user_description
                      ,user_profile_image_url
                      ,user_url
                      ,user_protected
                      ,user_followers_count
      FROM   OPENXML (@handle, '/statuses/status/user', 2)
                WITH (created_at             NVARCHAR(50)  '../created_at',
                      id                     INT           '../id',
                      text                   NVARCHAR(200) '../text',
                      SOURCE                 NVARCHAR(200) '../source',
                      truncated              NVARCHAR(5)   '../truncated',
                      in_reply_to_status_id  INT           '../in_reply_to_status_id',
                      in_reply_to_user_id    INT           '../in_reply_to_user_id',
                      favorited              NVARCHAR(5)   '../favorited',
                      user_id                INT           'id',
                      user_screen_name       NVARCHAR(200) 'screen_name',
                      user_name              NVARCHAR(200) 'name',
                      user_location          NVARCHAR(200) 'location',
                      user_description       NVARCHAR(200) 'description',
                      user_profile_image_url NVARCHAR(200) 'profile_image_url',
                      user_url               NVARCHAR(200) 'url',
                      user_protected         NVARCHAR(5)   'protected',
                      user_followers_count   INT           'followers_count')
 
      EXEC sp_xml_removedocument
        @handle
 
      /* Insert new members that haven't been cached yet.
         The not-exists clause at the end is to avoid adding duplicate users. */
      INSERT INTO dbo.Users
                 (id
                  ,name
                  ,screen_name
                  ,location
                  ,description
                  ,profile_image_url
                  ,url
                  ,protected
                  ,followers_count)
      SELECT DISTINCT imp.user_id
                      ,user_name
                      ,user_screen_name
                      ,user_location
                      ,user_description
                      ,user_profile_image_url
                      ,user_url
                      ,user_protected
                      ,user_followers_count
      FROM   @imported_statuses imp
             LEFT OUTER JOIN dbo.Users u
               ON imp.user_id = u.id
      WHERE  u.id IS NULL
             AND NOT EXISTS (SELECT *
                             FROM   @imported_statuses impNewer
                             WHERE  imp.user_id = impNewer.user_id
                                    AND imp.id < impNewer.id)
 
      /* Import statuses that haven't been cached yet */
      INSERT INTO dbo.Statuses
                 (id
                  ,created_at
                  ,text
                  ,SOURCE
                  ,truncated
                  ,in_reply_to_status_id
                  ,in_reply_to_user_id
                  ,favorited
                  ,user_id)
      SELECT DISTINCT imp.id
                      ,imp.created_at
                      ,imp.text
                      ,imp.SOURCE
                      ,imp.truncated
                      ,imp.in_reply_to_status_id
                      ,imp.in_reply_to_user_id
                      ,imp.favorited
                      ,imp.user_id
      FROM   @imported_statuses imp
             LEFT OUTER JOIN dbo.Statuses s
               ON imp.id = s.id
      WHERE  s.id IS NULL
 
      /* Update the Groups with the most recent messageid */
      SET @RowsReturned = (SELECT COUNT(* )
                           FROM   @imported_statuses)
      IF @page = 1
         AND @RowsReturned >= 1
        BEGIN
          SET @new_cached_friends_status_id = (SELECT   TOP 1 id
                                                    FROM     @imported_statuses
                                                    ORDER BY id DESC)
        END
 
      SET @page = @page + 1
      SET @optional = '?count=200&page=' + CAST(@page AS NVARCHAR(4));
      IF @cached_friends_status_id IS NOT NULL
        BEGIN
          SET @optional = @optional + '&since_id='
            + CAST(@cached_friends_status_id AS NVARCHAR(30))
        END
 
      /* If we retrieved less than 200 rows, then we hit the last page. */
      IF @RowsReturned < 200
        BEGIN
          SET @page = 999
        END
 
      /* Empty out the holding table for the next pass */
      DELETE @imported_statuses
    END
 
  /* Update the Users table with the most recently cached record. */
  UPDATE dbo.Users
    SET cached_friends_status_id = @new_cached_friends_status_id
    WHERE id = @import_users_id
 
  SET NOCOUNT OFF

Are you still with me? Whew, good. I think we lost a few people there.

The stored proc takes an optional @import_users_id parameter – that’s your Twitter numeric user_id. You won’t know it yet, but after you run the stored proc, you can query your Users table to find yourself, assuming you posted a status update recently.

When you run the stored procedure like this:

EXEC [dbo].[usp_ImportStatusesFromFriends]

you should get these messages:

200 OK: everything went awesome.
200 OK: everything went awesome.
200 OK: everything went awesome.
200 OK: everything went awesome.
200 OK: everything went awesome.

That means, well, everything went awesome. Your Users and Statuses tables should now be populated.

At this point, you can get your user ID:

SELECT * FROM dbo.Users WHERE name = 'YourUsernameHere'

And then from that point forward, you can call the stored proc with your user id, thereby enabling the last-cached-id code.

When Things Don’t Go Awesome

I’ve been running this code for a while, and it’s very much alpha code. It may deflower your laptop at a moment’s notice.

When things break, it’s usually because the Twitter API is down or because my database schema doesn’t fully handle Twitter’s fields. For example, I’ve had several cases where they’ve delivered data that’s wider than I expected – those NVARCHAR(200) fields didn’t start out quite that wide. Over time, I’ll come back and revise the table schemas to be more closely aligned with whatever Twitter’s actually storing.

Now that I’m thinking of it, I didn’t test this code on a case sensitive server. I’m usually really good about that. Doh!

Stuff To Do When You’ve Got Twitter Data In Your Database

I’ve just scratched the surface of what you can do with Tweet-SQL. The Twitter API is pretty darned powerful, and it enables you to do things like:

  • Find out who your followers are following, which might be people you also find interesting
  • Find influential people in your Twitter network
  • Get recap reports of what links have been sent out recently, or what hot topics are out there
  • Get peoples’ updates without actually following them directly (aka stalking, hee hee ho ho)
  • Build a web-based equivalent to TweetDeck, with multiple user functionality

For sample scripts, check out SQLCLRNews, Rhys’s blog where he posts how-to examples for Tweet-SQL.

If you found this interesting, I’d suggest following rhyscampbell on Twitter, and of course, me – I’m BrentO.

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

Website - Twitter - Facebook - More Posts

13 Responses to Getting Your Twitter Feed into SQL Server with Tweet-SQL
  1. [...] Brent Ozar has posted an article on Getting Your Twitter Feed into SQL Server with Tweet-SQL. [...]

  2. Rhys
    February 16, 2009 | 6:12 PM

    Nice write-up Brent! I'm working on v2 and your comments will be taken into consideration. It's been great fun working on Tweet-SQL and it sounds like you've had fun too. I'm probably going to move the Tweet-SQL example scripts over to my new blog http://www.youdidwhatwithtsql.com which I should be launching soon.

  3. Tom Overton
    June 17, 2009 | 3:59 AM

    Well written. Can’t wait to paste a few of these into Query Analyzer. Do you have v2 you can email? I’ve been trying to download all night at 5KB/s until the server conn resets.

    thanks again for the content.

    Tom

  4. Abraham
    February 8, 2010 | 5:01 PM

    Hey guys I wrote a free Twitter client for SQL Server.

    It’s very basic, you can only post tweets from SQL Server with it, but it might enough for some people.

    It’s a SQL CLR Assembly.
    Did I mention that it was Free?

    • Abraham
      February 8, 2010 | 5:02 PM

      Hey guys I wrote a free Twitter client for SQL Server.

      It’s very basic, you can only post tweets from SQL Server with it, but it might enough for some people.

      It’s a SQL CLR Assembly.
      Did I mention that it was Free?

      Oh sorry I forgot the link:
      http://corevolt.com/blog/post/TwiSQL-Use-twitter-from-T-SQL-(For-free).aspx

      • Brent Ozar
        February 8, 2010 | 5:03 PM

        Two things. First, hopefully you code better than you comment. ;-) Second, the user has to figure out how to install it themselves? And it doesn’t consume Twitter data, only post? Okay, that’s three things, but if you’re after “free”, there’s already an open source SSIS project that both sends and receives tweets:

        http://www.codeplex.com/SSISTwitterSuite/

        • Corevolt
          February 8, 2010 | 6:03 PM

          My apologies for the duplicate entry, it was a mistake. I’m not a corporate entity pushing a product, just a developer that likes technology. This was just a little project that I wanted to share with you and your readers. My project is a SQL CLR assembly, anyone with reasonable SQL and/or .Net experience should be able to use it.

          Regards.

          • Brent Ozar
            February 9, 2010 | 7:02 AM

            TOTALLY okay. And you know what, I apologize for overreacting. Somebody was kicking me in the junk yesterday and I took it out on you. I apologize, and it wasn’t fair to you.

            I’ve written a bunch of code that got started just the way you did – you add one feature at a time, over time, for yourself, and you end up with something that everybody loves. Heck, that’s how Linux got started. Good luck with the project!

  5. nrauli
    August 25, 2011 | 5:48 AM

    Hi Brent,

    I am having trouble executing usp_ImportStatusesFromFriends. I get Error: The remote server returned an error:(401)Unauthorized. Any ideas on how to fix this? Your help would be greatly appreciated.

    Thanks,
    nrauli

    • Brent Ozar
      August 25, 2011 | 1:42 PM

      NRauli – your best bet is to contact TweetSQL for help.

  6. Rich Buckley
    November 28, 2011 | 8:10 AM

    Hi Brent,

    I stumbled across your Tweet-SQL to SQL page. The test query works, but the SP is just sending back un-authorized messages. Is this something silly I did?

    • Brent Ozar
      November 28, 2011 | 8:12 AM

      Hi, Rich. Unfortunately I can’t diagnose that quickly – your best bet is to contact the authors of TweetSQL.

  7. Rich Buckley
    November 28, 2011 | 8:17 AM

    Wow! Thanks for the quick response.

Leave a Reply

Wanting to leave an <em>phasis on your comment?

Notify me of followup comments via e-mail. You can also subscribe without commenting.