At #SQLbits, Jamie Thomson (Blog – Twitter) demoed the following SSIS add-ins:
- Kimball SDC
- Trace File Source
- File Watcher
- Dynamic Data Flow
- Rank Transform
Jamie find this useful when parsing error output from other SSIS steps, which often have several result sets with different columns. XMLify dumps all of the data into a single XML file for easier error storage.
It’s currently only available for SQL 2005 on Codeplex, but Jamie’s updated it for 2008. He expects to upload it to Codeplex shortly.
SSIS includes a Slowly Changing Dimensions wizard, but Jamie says it has extremely poor performance due to the amount of lookups it does, plus you can’t modify your work. You have to reuse the wizard. One workaround is the KimballSCD tool on Codeplex, which handles Type 1 and Type 2 dimensions. These dimensions track the history of our data – for example, as our customers change from Single to Married to Divorced, we may want to capture those changes so we can view their status over time.
The KimballSCD component takes your incoming stream of data and dumps out separate streams with new records, changed records, deleted records, invalid input, and so forth. It has a staggering array of configuration choices on how to handle errors.
Built by Andy Leonard and Jessica Moss, this SSIS component can fetch and retrieve tweets. Of course, when he went to demo Twitter.com, the page took forever to load. Ah, the curse of using Twitter and WiFi networks for demos!
He demoed how to send tweets, then how to receive them and pipe the results through the Term Extraction task. It’s basically data mining for text – it finds the most popular phrases in the stream of tweets. Jamie asked if attendees found it interesting, and they definitely did. A couple of attendees wanted to know when Twitter search integration would be included. Doh!
This component takes a flat source like a list of orders that includes both header and detail info in the same row, and then normalizes it. You have to pass in the data in the right sort number, because it uses those sorts to determine duplicates. You pick which fields determine a header versus a detail record using checkboxes.
The ranking T-SQL functions in SQL 2005 will take a set of data and add a rank column, and this SSIS component does the same thing. Jamie says it’s still a little buggy, but he’ll work out the bugs in the next few days. (Honesty is the best policy!) The bugs are problems with the UI – checkboxes don’t show up as checked when they’re supposed to be. The data under the covers works though.
It has nice options for rank, dense rank, row number, and row number by partition. Rank and dense rank are two different ways to handle ties.
Pass in a file – but only one file at a time – and this task will gzip it. It works for SQL 2005, and there’s a newer version coming on Codeplex to work in SQL Server 2008’s SSIS. There’s an identical task to unzip. I can see this as being useful for passing big XML files over networks.
This task watches a directory and waits for files to arrive. When a file is created, it goes on to the next task in the package. The File Watcher passes the name of the newly created file to the next task as well, so you can process that file.
This takes a SQL Server trace file as a source, and then splits it out into three data outputs: short queries that took under 300ms, queries that took 300-1,000ms, and queries that took over a second. Useful for DBAs who want to automate performance tuning on a large scale.
The only non-free tool in the demo, Dynamic Data Flow is by a company called CozyRoc. You can use one destination in SSIS, but populate different tables based on the contents of your data.
I bailed out just before the last task was shown because I had to get to my own session.
End of the Line
Jamie did an awesome job of demoing a lot of stuff in a short period of time without being overwhelming. I can really appreciate how much work he had to put into these demos – every single component needed its own set of data. Fantastic work.