Tag Archive: ssis

Jamie Thomson’s whistlestop tour of SSIS addins

At #SQLbits, Jamie Thomson (Blog – Twitter) demoed the following SSIS add-ins:

  • Kimball SDC
  • Trace File Source
  • XMLify
  • File Watcher
  • Dynamic Data Flow
  • Rank Transform
  • Normaliser
  • Twitter
  • Compression

XMLify

Jamie find this useful when parsing error output from other SSIS steps, which often have several result sets with different columns.  XMLify dumps all of the data into a single XML file for easier error storage.

It’s currently only available for SQL 2005 on Codeplex, but Jamie’s updated it for 2008.  He expects to upload it to Codeplex shortly.

Kimball SDC

SSIS includes a Slowly Changing Dimensions wizard, but Jamie says it has extremely poor performance due to the amount of lookups it does, plus you can’t modify your work.  You have to reuse the wizard.  One workaround is the KimballSCD tool on Codeplex, which handles Type 1 and Type 2 dimensions.  These dimensions track the history of our data – for example, as our customers change from Single to Married to Divorced, we may want to capture those changes so we can view their status over time.

The KimballSCD component takes your incoming stream of data and dumps out separate streams with new records, changed records, deleted records, invalid input, and so forth.  It has a staggering array of configuration choices on how to handle errors.

Twitter Task

Built by Andy Leonard and Jessica Moss, this SSIS component can fetch and retrieve tweets.  Of course, when he went to demo Twitter.com, the page took forever to load.  Ah, the curse of using Twitter and WiFi networks for demos!

He demoed how to send tweets, then how to receive them and pipe the results through the Term Extraction task.  It’s basically data mining for text – it finds the most popular phrases in the stream of tweets.  Jamie asked if attendees found it interesting, and they definitely did.  A couple of attendees wanted to know when Twitter search integration would be included.  Doh!

Normaliser

This component takes a flat source like a list of orders that includes both header and detail info in the same row, and then normalizes it.  You have to pass in the data in the right sort number, because it uses those sorts to determine duplicates.  You pick which fields determine a header versus a detail record using checkboxes.

Rank Transform

The ranking T-SQL functions in SQL 2005 will take a set of data and add a rank column, and this SSIS component does the same thing.  Jamie says it’s still a little buggy, but he’ll work out the bugs in the next few days.  (Honesty is the best policy!)  The bugs are problems with the UI – checkboxes don’t show up as checked when they’re supposed to be.  The data under the covers works though.

It has nice options for rank, dense rank, row number, and row number by partition.  Rank and dense rank are two different ways to handle ties.

Compression Task

Pass in a file – but only one file at a time – and this task will gzip it.  It works for SQL 2005, and there’s a newer version coming on Codeplex to work in SQL Server 2008′s SSIS.  There’s an identical task to unzip.  I can see this as being useful for passing big XML files over networks.

File Watcher

This task watches a directory and waits for files to arrive.  When a file is created, it goes on to the next task in the package.  The File Watcher passes the name of the newly created file to the next task as well, so you can process that file.

Trace Reader

This takes a SQL Server trace file as a source, and then splits it out into three data outputs: short queries that took under 300ms, queries that took 300-1,000ms, and queries that took over a second.  Useful for DBAs who want to automate performance tuning on a large scale.

Dynamic Data Flow

The only non-free tool in the demo, Dynamic Data Flow is by a company called CozyRoc.  You can use one destination in SSIS, but populate different tables based on the contents of your data.

I bailed out just before the last task was shown because I had to get to my own session.

End of the Line

Jamie did an awesome job of demoing a lot of stuff in a short period of time without being overwhelming.  I can really appreciate how much work he had to put into these demos – every single component needed its own set of data.  Fantastic work.

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

Website - Twitter - Facebook - More Posts

More Thoughts on SSIS Twitter Integration at #sqlpass

Yesterday, Andy Leonard and Jessica Moss announced that their Wednesday 3pm class on SSIS scripting will cover Twitter integration. In today’s SSIS Boot Camp by Brian Knight, I spoke with some DBAs about the Twitter idea.  The initial reaction has been, “Why would you want to do that?  Twitter is so unreliable.”

True: us database administrators are conditioned to working with extremely reliable tools, logged transactions, mission-critical uptime, yadda yadda yadda.  Twitter is none of those things – so why would we rely on it?

After they stop to think about it for a minute, there’s been some interesting use cases come up:

Tiny daily status reports to the public – say you’re a fundraising group like United Way, and you want to let your followers know how close you are to your fundraising goals.  Or maybe you’re a blood bank, and you want to tell people which branches are open or what types of blood you need most this week.  Tweets would be a great, free answer that scales pretty well, and it’s a pull-based setup where anyone can sign up without you writing a subscription interface.

Mini-reports of yesterday’s metrics for global users – same thing for companies, except you’d protect your Twitter feed.  You might send out a tweet each morning with a recap of whether we were on track for yesterday’s sales.  Imagine an automated tweet from the sales executive recapping yesterday’s sales vs goals.  Of course, keep security in mind – may just want to say, “Sales yesterday was 1.5% over goals!”, not actual sales numbers, even if your updates are protected.

“Interesting record” alerts – if you’re running a kick-ass IT question & answer site, you may end up with users who can’t wait to find out when their question is updated – but they only want to know when they’re online.  I could easily see value in setting up an SSIS/Twitter integration on StackOverflow.com so that when any of my posts are edited, or when someone answers my questions, it sends me a tweet.  It wouldn’t clutter up my inbox, and it’d only show up when I’m online.  (Or if I was a hard-core geek, I could set up Twitter to text me when these come in, but I’m not that hard-core.)

Deal alerts for your customers – Amazon posts sale alerts when items go on sale in their MP3 store.  Check out the AmazonMP3 Twitter feed for an idea.  We could set up automated jobs to take the items in our inventory that have the highest in-stock amount but lowest volume, for example, and discount the price by x% to move it out.  Granted, you’d want to put a lot of business logic in there to make sure it doesn’t discount something that you don’t want to discount, but it’s an interesting way to try to move product.  Yes, you could do this in a separate application, too, but doing it entirely in SQL Server would be a fast way to get the job done.

Building Twitter bots for sales force automation – imagine a Tweet conversation where one of your salespeople could send Tweets to a bot connected to your CRM system or data warehouse:

  • From @BrentO: “@QuestBot whois Jane Doe”
  • From @QuestBot: “@BrentO Jane Doe, ABC Consult., Boston, last order 9/10/08, $10m, rep John Smith”

Of course, you’d probably want to do this via private direct messages, not public messages, but you get the idea.  It’s a fast way of building a web-service-style CRM system without a lot of infrastructure.

Data mining your Tweeps – you can’t really follow more than a couple hundred people without losing Tweets, and even at that level, it takes a lot of time to digest your crowd.  I would love to see a summarized dashboard of activity from yesterday with what keywords were popping up, what links were sent out, who else my followers are talking to (so I can maybe find more interesting people), and so on.  I know I can do that if I can get everything into SQL Server, and this would probably be the first thing I’ll try to accomplish.

I’m way excited about this – can’t wait for Wednesday’s session.  3pm.  Be there or be square.

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

Website - Twitter - Facebook - More Posts

Use SSIS? Like Twitter? Wouldn’t it be great if….

@AndyLeonard: “You got your SSIS in my Twitter!”

@JessicaMMoss: “You got your Twitter in my SSIS!”

That’s right, like peanut butter and jelly, we’ve got a hookup that produces better than the sum of its parts.  Andy and Jessica will be unveiling their magic at their PASS session on Wednesday at 3pm in room 602.  I’ll be there – I’m taking Brian Knight’s SSIS Boot Camp class on Monday, and I will be mashing up Twitter and SQL like crazy.

Brent Ozar

Brent specializes in performance tuning for SQL Server, VMware, and storage. He's one of the very few Microsoft Certified Masters of SQL Server, a published author, and a Microsoft MVP. He likes travel, Jeeps, Apple gear, jokes, and writing about himself in the third person. Read more and contact Brent.

Website - Twitter - Facebook - More Posts