The dangers of building linked servers on the fly

3 Comments

Linked servers let users query different database servers from a central SQL Server.  Some uses for this might be:

  • DBA utility scripts – from a central repository server, we could query all of the servers in our environment to check for things like backup dates or configuration settings
  • Consolidated reporting – we could gather results from multiple databases on different servers and join them together into a single query
  • Moving data from platform to platform – if we had an Oracle, AS400, etc platform and we needed that data in SQL Server, we could fetch it periodically through a linked server query

All of these work great in theory.  The danger comes in when we start scaling up with more data or more linked servers.

As we deal with more and more data, linked servers are less efficient.  Say we’re joining a ten million row sales table on Server A with a ten thousand row customer table on Server B.  That query might run fine if we run it from Server A, with Server B set up as a linked server.  Since Server B only has to pull 10,000 rows and push them all through the network to Server A, we may not see big performance problems.  However, if we try to run that query on Server B, with Server A set up as a linked server, we can have truly horrendous performance depending on how we write the query.

As we scale out to more servers, the temptation arises to build the linked server connection on the fly, like this:

  1. If a linked server connection does not exist to Server B, build it
  2. Run our query
  3. If we added the linked server connection in step 1, then delete it

We’ve now introduced a problem: if our query throws a ginormous error and bails out, we never delete our linked server connection.  We have to be very careful about error checking in step 2, especially around deadlocks.  If a deadlock arises and SQL Server kills our connection, we’re really in trouble – the linked server connection will stick around.

Worse, if the query takes any time whatsoever to run, we suddenly introduce at least two scenarios where this process will fail:

  • If a second automated query fires up while the query is running, it too will check to see if the linked server exists.  When it doesn’t, it will start running a linked server query.  When our original query finishes, it will try to tear down the linked server connection while it’s still in use by our new query.
  • If a human being (damn those meatbags!) sees a linked server connection that shouldn’t exist, they might try to delete it while our query is running.

Another potential problem is that server names change, and databases are moved.  When your query is initially designed, your databases might be on several different machines and require the linked server connection.  Later, when servers are consolidated, these databases may all be on the same server.  If the code isn’t solid enough, you might build linked server connections to servers that no longer exist – or worse, the local server itself.

Bottom line: before building your own linked server connections programatically, whiteboard out every step in the process, and assume that it will fail sooner or later.  Plan for that failure.

Previous Post
Twitter is What You Make It
Next Post
Syndication Hits the Newspaper Business

3 Comments. Leave new

  • I've worked a lot with Linked Servers in a SQL Server & MySQL environment. It's actually pretty solid but best to keep the amounts of data reasonably small. I used a datetime query to pull out recently modified website user records and launch them into salesforce.com. Somehow a mistake on the site cause a null value in my system, MySQL (damn it) happily ran a query that said >= NULL but decided to pull back the entire table of approximately 3.5 millions records. Luckily I caught this in time and managed to kill the process. I would not have been man of the moment had I not!

    Reply
  • Amen, brother. Another issue. Linked server connection sticks around. No one notices the failed job before the next run. Step 1 goes to create linked server connection. Job blows up.

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.