If you’re reading this blog, you’re probably a SQL Server DBA or developer, and you’re probably into Twitter, so you’re probably going to be interested in Rhys Campbell‘s Tweet-SQL.
Tweet-SQL is a set of SQL Server stored procedures that let you work with the Twitter API just by writing T-SQL code. You can do things like:
- Get the list of people you’re following into a database
- Retrieve their messages into a table
- Analyze who’s following who, and figure out who you should be following
- Compile metrics about what’s going on out there
There are public services like MrTweet that let you do some of this, but being a DBA, I wasn’t quite satisfied with those. There’s also a free SSIS Twitter Suite by Andy Leonard and Jessica Moss, and I started playing around with that, but I suck at SSIS. I’m pretty good with T-SQL, though, and because I didn’t want to spread my skill set any thinner, I just ate the cost of Tweet-SQL and got started.
Download and Install Tweet-SQL
To get started, you’ll need a 30-day eval copy of Tweet-SQL. It’s free to evaluate, and costs £25 at the time of this writing – roughly $37 US.
Evaluate it before you buy it though, because I’ve had some difficulties getting it up and running on some workstations, especially those where I’m running SQL 2008 under a non-admin domain account on Windows 2008. I’ve ended up running Tweet-SQL in its own virtual machine with Windows 2003 and SQL Server 2005 just to keep things simple.
During setup, Tweet-SQL asks for a username and password – that’s your Twitter information, which is used to connect to Twitter. Before you click OK, click Tweet-SQL at the top of the config screen, and click Deploy. That actually deploys the Tweet-SQL stored procedures to the database of your choosing.
Calling the Twitter API with T-SQL
Start a new query in the database where you deployed Tweet-SQL and execute this code:
1 2 |
EXEC dbo.tweet_cfg_resultset_send 1; EXEC dbo.tweet_sts_friends_timeline NULL, NULL; |
The first line enables recordset outputs (instead of XML) and the second line fetches our friends’ timeline from Twitter. You should see something like the window pictured at right – you can click on that to zoom in.
Ignore the first recordset that only has one line in it. The good stuff is in the second two recordsets.
The second recordset lists the status updates. Each line is a single status update. Note the “status_Id” column, second to last one – it starts at 0 and goes up.
The third recordset lists the users who made each status update. It joins to the second recordset on the “status_Id” column. Yes, in a perfect world, these would already be joined together for us, but we’re big bad database people, and that’s not a problem for us. Let’s create some tables to store these results.
Our Twitter Cache Database Schema
In order to pull our data into a database, we’re going to need some tables. For today’s lesson, we’re only going to create two tables:
- Users – stores the list of people we’ve seen on Twitter
- Statuses – stores the things they’ve said.
The Twitter API has a call that gets our status timeline, and it includes some basic information about each user. We’re going to kill two birds with one stone: whenever someone tweets, we’re going to store their user profile data in the User table too. There are separate Twitter API calls that will fetch our followers and our friends, but let’s learn to crawl first.
Here’s the script to create the tables. Be aware that I have zero clue what the actual field types are – these may need to be changed over time from nvarchar(200) to nvarchar(240), for example. (Yes, believe it or not, Twitter fields can be longer than 140 characters, as we’ll show later.)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 |
SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE TABLE [dbo].[Statuses]( [id] [int] NOT NULL, [created_at] [datetime] NOT NULL, [text] [nvarchar](200) NOT NULL, [source] [nvarchar](160) NOT NULL, [truncated] [bit] NOT NULL DEFAULT ((0)), [in_reply_to_status_id] [int] NULL, [in_reply_to_user_id] [int] NULL, [favorited] [bit] NOT NULL DEFAULT ((0)), [user_id] [int] NOT NULL, [cached_date] [datetime] NOT NULL CONSTRAINT [DF_Statuses_cached_date] DEFAULT (getutcdate()), CONSTRAINT [PK_Statuses] PRIMARY KEY CLUSTERED ( [id] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] GO SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE TABLE [dbo].[Users]( [id] [int] NOT NULL, [name] [nvarchar](50) NULL, [password] [nvarchar](50) NULL, [screen_name] [nvarchar](50) NULL, [location] [nvarchar](200) NULL, [description] [nvarchar](160) NULL, [profile_image_url] [nvarchar](200) NULL, [url] [nvarchar](200) NULL, [protected] [bit] NULL, [followers_count] [int] NULL, [cached_date] [datetime] NOT NULL CONSTRAINT [DF_Users_cached_date] DEFAULT (getutcdate()), [cached_friends_status_id] [int] NULL, CONSTRAINT [PK_Users] PRIMARY KEY CLUSTERED ( [id] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] GO |
I’m not doing any fancy indexing here, just the bare minimum to get started.
The Logistics of Importing Twitter Data into SQL Server
Before we go wild and crazy with the code, let’s talk through it in pseudocode. Here’s what we need to accomplish:
- Call the Twitter API to get our most recent messages in XML format
- Store that data in a temporary table for faster juggling
- Insert users into our Users table if they don’t already exist
- Insert statuses into our Status table if they don’t already exist
We could just write this as a stored procedure, and then call it as a SQL Server Agent job every minute. However, the Twitter API has a couple of limitations that we need to watch out for.
The API only sends up to 200 results back at a time. If we call this every 15 minutes for users who are following a lot of people, we could easily lose messages if more than 200 come through in 15 minutes. We could work around that by calling the API every minute or so, but that leads into our second challenge: we’re only allowed to call the API a certain number of times per hour.
To help work around this, the Twitter API has an extra parameter: we can specify a status ID#, and Twitter can return just the new messages since that ID#. This way, I can call the API once per hour, specify my most recent cached status #, and I’ll only get the new messages. I can also call it with a “page” parameter, so we can get several groups of 200 results one page at a time until we’ve exhausted all of the data.
On our Users table, there’s a cached_friends_status_id field: that’s where we’re going to store the most recent status ID. Why wouldn’t we just query for the max id in the Statuses table? I’m glad you asked, because that shows that you’re smart enough to keep an eye out for Lesson 2, where we’ll get into more advanced uses of these tables.
With that in mind, here’s what our pseudocode will look like:
- Query the Users table to find the most recent cached status ID for our user
- Call the Twitter API to get our most recent statuses since the most recent cached status ID
- Store that data in a temporary table for faster juggling
- Insert users into our Users table if they don’t already exist
- Insert statuses into our Status table if they don’t already exist
- If we retrieved 200 results, there’s probably more, so go back to step 2, but ask for the next page of results.
- Update the Users table with the most recent cached status ID
The Stored Procedure to Import Twitter Data
With no further ado, here’s the stored procedure:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 |
SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[usp_ImportStatusesFromFriends]') AND type in (N'P', N'PC')) DROP PROCEDURE [dbo].[usp_ImportStatusesFromFriends] GO CREATE PROCEDURE [dbo].[usp_ImportStatusesFromFriends] @import_users_id INT = NULL AS SET NOCOUNT ON DECLARE @xml XML ,@handle INT ,@page INT ,@optional NVARCHAR(50) ,@RowsReturned INT ,@new_cached_friends_status_id INT ,@cached_friends_status_id INT /* Table to join statuses and users */ DECLARE @imported_statuses TABLE( id INT ,created_at DATETIME ,text NVARCHAR(200) ,source NVARCHAR(160) ,truncated BIT ,in_reply_to_status_id INT ,in_reply_to_user_id INT ,favorited BIT ,user_id INT ,user_name NVARCHAR(50) ,user_screen_name NVARCHAR(50) ,user_location NVARCHAR(200) ,user_description NVARCHAR(160) ,user_profile_image_url NVARCHAR(200) ,user_url NVARCHAR(200) ,user_protected BIT ,user_followers_count INT ) /* Get the last cached status ID if it's available */ SELECT @cached_friends_status_id = cached_friends_status_id FROM dbo.Users WHERE id = @import_users_id SET @page = 1; SET @optional = '?count=200&page=' + CAST(@page AS NVARCHAR(4)); IF @cached_friends_status_id IS NOT NULL BEGIN SET @optional = @optional + '&since_id=' + CAST(@cached_friends_status_id AS NVARCHAR(30)) END /* Turn off resultsets from Tweet-SQL because we'll be using XML data */ EXEC dbo.tweet_cfg_resultset_send 0; /* Get the first set of statuses */ WHILE (@page <= 10) BEGIN EXEC dbo.tweet_sts_friends_timeline @optional , @xml OUTPUT; EXEC sp_xml_preparedocument @handle OUTPUT , @xml; /* Copy the XML data into a single table to make handling easier */ INSERT INTO @imported_statuses (id ,created_at ,text ,source ,truncated ,in_reply_to_status_id ,in_reply_to_user_id ,favorited ,user_id ,user_name ,user_screen_name ,user_location ,user_description ,user_profile_image_url ,user_url ,user_protected ,user_followers_count) SELECT DISTINCT id ,dbo.tweet_fnc_dateconvert(created_at) ,text ,source ,truncated ,in_reply_to_status_id ,in_reply_to_user_id ,favorited ,user_id ,user_name ,user_screen_name ,user_location ,user_description ,user_profile_image_url ,user_url ,user_protected ,user_followers_count FROM OPENXML (@handle, '/statuses/status/user', 2) WITH (created_at NVARCHAR(50) '../created_at', id INT '../id', text NVARCHAR(200) '../text', source NVARCHAR(200) '../source', truncated NVARCHAR(5) '../truncated', in_reply_to_status_id INT '../in_reply_to_status_id', in_reply_to_user_id INT '../in_reply_to_user_id', favorited NVARCHAR(5) '../favorited', user_id INT 'id', user_screen_name NVARCHAR(200) 'screen_name', user_name NVARCHAR(200) 'name', user_location NVARCHAR(200) 'location', user_description NVARCHAR(200) 'description', user_profile_image_url NVARCHAR(200) 'profile_image_url', user_url NVARCHAR(200) 'url', user_protected NVARCHAR(5) 'protected', user_followers_count INT 'followers_count') EXEC sp_xml_removedocument @handle /* Insert new members that haven't been cached yet. The not-exists clause at the end is to avoid adding duplicate users. */ INSERT INTO dbo.Users (id ,name ,screen_name ,location ,description ,profile_image_url ,url ,protected ,followers_count) SELECT DISTINCT imp.user_id ,user_name ,user_screen_name ,user_location ,user_description ,user_profile_image_url ,user_url ,user_protected ,user_followers_count FROM @imported_statuses imp LEFT OUTER JOIN dbo.Users u ON imp.user_id = u.id WHERE u.id IS NULL AND NOT EXISTS (SELECT * FROM @imported_statuses impNewer WHERE imp.user_id = impNewer.user_id AND imp.id < impNewer.id) /* Import statuses that haven't been cached yet */ INSERT INTO dbo.Statuses (id ,created_at ,text ,source ,truncated ,in_reply_to_status_id ,in_reply_to_user_id ,favorited ,user_id) SELECT DISTINCT imp.id ,imp.created_at ,imp.text ,imp.source ,imp.truncated ,imp.in_reply_to_status_id ,imp.in_reply_to_user_id ,imp.favorited ,imp.user_id FROM @imported_statuses imp LEFT OUTER JOIN dbo.Statuses s ON imp.id = s.id WHERE s.id IS NULL /* Update the Groups with the most recent messageid */ SET @RowsReturned = (SELECT COUNT(* ) FROM @imported_statuses) IF @page = 1 AND @RowsReturned >= 1 BEGIN SET @new_cached_friends_status_id = (SELECT TOP 1 id FROM @imported_statuses ORDER BY id DESC) END SET @page = @page + 1 SET @optional = '?count=200&page=' + CAST(@page AS NVARCHAR(4)); IF @cached_friends_status_id IS NOT NULL BEGIN SET @optional = @optional + '&since_id=' + CAST(@cached_friends_status_id AS NVARCHAR(30)) END /* If we retrieved less than 200 rows, then we hit the last page. */ IF @RowsReturned < 200 BEGIN SET @page = 999 END /* Empty out the holding table for the next pass */ DELETE @imported_statuses END /* Update the Users table with the most recently cached record. */ UPDATE dbo.Users SET cached_friends_status_id = @new_cached_friends_status_id WHERE id = @import_users_id SET NOCOUNT OFF |
Are you still with me? Whew, good. I think we lost a few people there.
The stored proc takes an optional @import_users_id parameter – that’s your Twitter numeric user_id. You won’t know it yet, but after you run the stored proc, you can query your Users table to find yourself, assuming you posted a status update recently.
When you run the stored procedure like this:
1 |
exec [dbo].[usp_ImportStatusesFromFriends] |
you should get these messages:
1 2 3 4 5 |
200 OK: everything went awesome. 200 OK: everything went awesome. 200 OK: everything went awesome. 200 OK: everything went awesome. 200 OK: everything went awesome. |
That means, well, everything went awesome. Your Users and Statuses tables should now be populated.
At this point, you can get your user ID:
1 |
select * from dbo.Users where name = 'YourUsernameHere' |
And then from that point forward, you can call the stored proc with your user id, thereby enabling the last-cached-id code.
When Things Don’t Go Awesome
I’ve been running this code for a while, and it’s very much alpha code. It may deflower your laptop at a moment’s notice.
When things break, it’s usually because the Twitter API is down or because my database schema doesn’t fully handle Twitter’s fields. For example, I’ve had several cases where they’ve delivered data that’s wider than I expected – those NVARCHAR(200) fields didn’t start out quite that wide. Over time, I’ll come back and revise the table schemas to be more closely aligned with whatever Twitter’s actually storing.
Now that I’m thinking of it, I didn’t test this code on a case sensitive server. I’m usually really good about that. Doh!
Stuff To Do When You’ve Got Twitter Data In Your Database
I’ve just scratched the surface of what you can do with Tweet-SQL. The Twitter API is pretty darned powerful, and it enables you to do things like:
- Find out who your followers are following, which might be people you also find interesting
- Find influential people in your Twitter network
- Get recap reports of what links have been sent out recently, or what hot topics are out there
- Get peoples’ updates without actually following them directly (aka stalking, hee hee ho ho)
- Build a web-based equivalent to TweetDeck, with multiple user functionality
For sample scripts, check out SQLCLRNews, Rhys’s blog where he posts how-to examples for Tweet-SQL.
If you found this interesting, I’d suggest following rhyscampbell on Twitter, and of course, me – I’m BrentO.
15 Comments. Leave new
Nice write-up Brent! I'm working on v2 and your comments will be taken into consideration. It's been great fun working on Tweet-SQL and it sounds like you've had fun too. I'm probably going to move the Tweet-SQL example scripts over to my new blog http://www.youdidwhatwithtsql.com which I should be launching soon.
Well written. Can’t wait to paste a few of these into Query Analyzer. Do you have v2 you can email? I’ve been trying to download all night at 5KB/s until the server conn resets.
thanks again for the content.
Tom
Hey guys I wrote a free Twitter client for SQL Server.
It’s very basic, you can only post tweets from SQL Server with it, but it might enough for some people.
It’s a SQL CLR Assembly.
Did I mention that it was Free?
Hey guys I wrote a free Twitter client for SQL Server.
It’s very basic, you can only post tweets from SQL Server with it, but it might enough for some people.
It’s a SQL CLR Assembly.
Did I mention that it was Free?
Oh sorry I forgot the link:
http://corevolt.com/blog/post/TwiSQL-Use-twitter-from-T-SQL-(For-free).aspx
Two things. First, hopefully you code better than you comment. 😉 Second, the user has to figure out how to install it themselves? And it doesn’t consume Twitter data, only post? Okay, that’s three things, but if you’re after “free”, there’s already an open source SSIS project that both sends and receives tweets:
http://www.codeplex.com/SSISTwitterSuite/
My apologies for the duplicate entry, it was a mistake. I’m not a corporate entity pushing a product, just a developer that likes technology. This was just a little project that I wanted to share with you and your readers. My project is a SQL CLR assembly, anyone with reasonable SQL and/or .Net experience should be able to use it.
Regards.
TOTALLY okay. And you know what, I apologize for overreacting. Somebody was kicking me in the junk yesterday and I took it out on you. I apologize, and it wasn’t fair to you.
I’ve written a bunch of code that got started just the way you did – you add one feature at a time, over time, for yourself, and you end up with something that everybody loves. Heck, that’s how Linux got started. Good luck with the project!
Hi Brent,
I am having trouble executing usp_ImportStatusesFromFriends. I get Error: The remote server returned an error:(401)Unauthorized. Any ideas on how to fix this? Your help would be greatly appreciated.
Thanks,
nrauli
NRauli – your best bet is to contact TweetSQL for help.
Hi Brent,
I stumbled across your Tweet-SQL to SQL page. The test query works, but the SP is just sending back un-authorized messages. Is this something silly I did?
Hi, Rich. Unfortunately I can’t diagnose that quickly – your best bet is to contact the authors of TweetSQL.
Wow! Thanks for the quick response.
Has anyone tried running tweet-SQL V2 on any of these:
https://www.brentozar.com/archive/2013/04/cloud-alternatives-to-microsoft-sql-server/
Same issue as always: Needing to do a quick proof of concept by IT dept. too busy to assist with firewall settings if done on dev server in our farm so thought could do a short term on a cloud VM, install tweet-SQL and test.
Any comments / advice?
Regards
Paul
No, and I’d just contact the author of the app.