Jamie Thomson’s whistlestop tour of SSIS addins

1 Comment

At #SQLbits, Jamie Thomson (Blog – Twitter) demoed the following SSIS add-ins:

  • Kimball SDC
  • Trace File Source
  • XMLify
  • File Watcher
  • Dynamic Data Flow
  • Rank Transform
  • Normaliser
  • Twitter
  • Compression

XMLify

Jamie find this useful when parsing error output from other SSIS steps, which often have several result sets with different columns.  XMLify dumps all of the data into a single XML file for easier error storage.

It’s currently only available for SQL 2005 on Codeplex, but Jamie’s updated it for 2008.  He expects to upload it to Codeplex shortly.

Kimball SDC

SSIS includes a Slowly Changing Dimensions wizard, but Jamie says it has extremely poor performance due to the amount of lookups it does, plus you can’t modify your work.  You have to reuse the wizard.  One workaround is the KimballSCD tool on Codeplex, which handles Type 1 and Type 2 dimensions.  These dimensions track the history of our data – for example, as our customers change from Single to Married to Divorced, we may want to capture those changes so we can view their status over time.

The KimballSCD component takes your incoming stream of data and dumps out separate streams with new records, changed records, deleted records, invalid input, and so forth.  It has a staggering array of configuration choices on how to handle errors.

Twitter Task

Built by Andy Leonard and Jessica Moss, this SSIS component can fetch and retrieve tweets.  Of course, when he went to demo Twitter.com, the page took forever to load.  Ah, the curse of using Twitter and WiFi networks for demos!

He demoed how to send tweets, then how to receive them and pipe the results through the Term Extraction task.  It’s basically data mining for text – it finds the most popular phrases in the stream of tweets.  Jamie asked if attendees found it interesting, and they definitely did.  A couple of attendees wanted to know when Twitter search integration would be included.  Doh!

Normaliser

This component takes a flat source like a list of orders that includes both header and detail info in the same row, and then normalizes it.  You have to pass in the data in the right sort number, because it uses those sorts to determine duplicates.  You pick which fields determine a header versus a detail record using checkboxes.

Rank Transform

The ranking T-SQL functions in SQL 2005 will take a set of data and add a rank column, and this SSIS component does the same thing.  Jamie says it’s still a little buggy, but he’ll work out the bugs in the next few days.  (Honesty is the best policy!)  The bugs are problems with the UI – checkboxes don’t show up as checked when they’re supposed to be.  The data under the covers works though.

It has nice options for rank, dense rank, row number, and row number by partition.  Rank and dense rank are two different ways to handle ties.

Compression Task

Pass in a file – but only one file at a time – and this task will gzip it.  It works for SQL 2005, and there’s a newer version coming on Codeplex to work in SQL Server 2008’s SSIS.  There’s an identical task to unzip.  I can see this as being useful for passing big XML files over networks.

File Watcher

This task watches a directory and waits for files to arrive.  When a file is created, it goes on to the next task in the package.  The File Watcher passes the name of the newly created file to the next task as well, so you can process that file.

Trace Reader

This takes a SQL Server trace file as a source, and then splits it out into three data outputs: short queries that took under 300ms, queries that took 300-1,000ms, and queries that took over a second.  Useful for DBAs who want to automate performance tuning on a large scale.

Dynamic Data Flow

The only non-free tool in the demo, Dynamic Data Flow is by a company called CozyRoc.  You can use one destination in SSIS, but populate different tables based on the contents of your data.

I bailed out just before the last task was shown because I had to get to my own session.

End of the Line

Jamie did an awesome job of demoing a lot of stuff in a short period of time without being overwhelming.  I can really appreciate how much work he had to put into these demos – every single component needed its own set of data.  Fantastic work.

Previous Post
Simon Sabin’s talk on query performance
Next Post
Microsoft MVPs: NDA + Twitter = Yammer

1 Comment. Leave new

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.

Menu