Linked servers let users query different database servers from a central SQL Server. Some uses for this might be:
- DBA utility scripts – from a central repository server, we could query all of the servers in our environment to check for things like backup dates or configuration settings
- Consolidated reporting – we could gather results from multiple databases on different servers and join them together into a single query
- Moving data from platform to platform – if we had an Oracle, AS400, etc platform and we needed that data in SQL Server, we could fetch it periodically through a linked server query
All of these work great in theory. The danger comes in when we start scaling up with more data or more linked servers.
As we deal with more and more data, linked servers are less efficient. Say we’re joining a ten million row sales table on Server A with a ten thousand row customer table on Server B. That query might run fine if we run it from Server A, with Server B set up as a linked server. Since Server B only has to pull 10,000 rows and push them all through the network to Server A, we may not see big performance problems. However, if we try to run that query on Server B, with Server A set up as a linked server, we can have truly horrendous performance depending on how we write the query.
As we scale out to more servers, the temptation arises to build the linked server connection on the fly, like this:
- If a linked server connection does not exist to Server B, build it
- Run our query
- If we added the linked server connection in step 1, then delete it
We’ve now introduced a problem: if our query throws a ginormous error and bails out, we never delete our linked server connection. We have to be very careful about error checking in step 2, especially around deadlocks. If a deadlock arises and SQL Server kills our connection, we’re really in trouble – the linked server connection will stick around.
Worse, if the query takes any time whatsoever to run, we suddenly introduce at least two scenarios where this process will fail:
- If a second automated query fires up while the query is running, it too will check to see if the linked server exists. When it doesn’t, it will start running a linked server query. When our original query finishes, it will try to tear down the linked server connection while it’s still in use by our new query.
- If a human being (damn those meatbags!) sees a linked server connection that shouldn’t exist, they might try to delete it while our query is running.
Another potential problem is that server names change, and databases are moved. When your query is initially designed, your databases might be on several different machines and require the linked server connection. Later, when servers are consolidated, these databases may all be on the same server. If the code isn’t solid enough, you might build linked server connections to servers that no longer exist – or worse, the local server itself.
Bottom line: before building your own linked server connections programatically, whiteboard out every step in the process, and assume that it will fail sooner or later. Plan for that failure.