Why Are Linked Server Queries So Bad?

And your penmanship is terrible.
No, I don’t want to be your valentine.

Remember when you were in school, and you had a crush on someone? You would write a note asking them to be your valentine, and you’d ask a mutual friend to pass the note over to them.

The adult equivalent is linked server queries.

When your query needs to get data that’s actually stored on a totally different SQL Server, it’s tempting to use linked server queries. They’re really easy to write: after setting up the linked server, just put the server and database name at the front of your query, and SQL Server takes care of everything for you by connecting to the server that has the data you want.
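Setting one up is a one-time step, too. A minimal sketch, where every server, provider, and host name is just a placeholder for this post's examples:

    EXEC master.dbo.sp_addlinkedserver
        @server = N'RemoteServer',   -- the name you'll use in queries
        @srvproduct = N'',
        @provider = N'MSOLEDBSQL',   -- or SQLNCLI on older builds
        @datasrc = N'RemoteHost';    -- placeholder remote host name

    EXEC master.dbo.sp_addlinkedsrvlogin
        @rmtsrvname = N'RemoteServer',
        @useself = N'True';          -- pass the caller's credentials through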

I’ll demo an example using the open source Stack Overflow database. In this example, the Users table is stored locally, but I need to fetch data from the much larger Posts table, which is stored on another server.

I’m looking for the most recent posts written by users with a low number of votes. Performance is catastrophic, taking minutes of query runtime – and no rows are even returned. What two problems are causing this?
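For reference, the query's shape is something like this, reconstructed from the variant quoted in the comments below, with RemoteServer standing in for the linked server name:

    SELECT TOP 10 u.DisplayName, p.*
    FROM dbo.Users u
    INNER JOIN [RemoteServer].StackOverflow.dbo.Posts p ON u.Id = p.OwnerUserId
    WHERE u.UpVotes + u.DownVotes < 0
    ORDER BY p.CreationDate DESC;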

Problem #1: linked server queries can copy entire tables across the network.

In this query, SQL Server believes that LOTS of users are going to match, so it decides to just go start fetching rows from the Posts table across the network. Read the query plan from right to left, top to bottom – the thing at the top right was the first thing SQL Server chose to do:

The first thing SQL Server decided to do was a Remote Query – scanning the Posts table across the network, starting from the most recent posts first. The remote SQL Server sends rows, and for each row, the local SQL Server checks to see if the matching user has a low number of votes.

This plan would work great – if SQL Server’s assumptions about a large number of matching Users rows were correct. In some cases, when SQL Server’s assumptions match, queries perform just fine. (That’s really a good chunk of query tuning right there, and as I talk about in my Fundamentals of Query Tuning class, large variances between estimated and actual rows are usually the place to focus on when you’re tuning.)

Unfortunately, that assumption is not correct.

In fact, no users match our criteria.

So the local SQL Server keeps on dragging rows across the network from the remote server, checking each row’s owner one at a time, until it exhausts the entire contents of the Posts table. Eventually, the plan finishes, and here’s the actual plan:

The linked server query copied the entire Posts table across the network wire, one row at a time.

Yes, that’s 33 minutes and 29 seconds.

Problem #2: linked servers don’t cache data.

If you run the same linked server query a dozen times – even if the rows aren’t changing, even if the database is read-only, even if all twelve queries run at the same time – SQL Server makes a dozen different connections over to the linked server and fetches the data from scratch, every single freakin’ time.

This is another great example of a development pattern that works fine in development, especially with small database sizes – but then falls down dramatically at scale, with larger databases and more concurrent queries.

Even worse, it penalizes both servers involved with the linked server query. It’s hard on the local server, and it’s hard on the remote server that holds the single source of truth for the table.

There are many more reasons linked server queries perform poorly – but those two alone are showstoppers.

So when are linked servers okay?

They’re fine for one-off utility queries, things you only have to do a couple of times. For example, say someone really screws up a table, and you need to restore that table’s contents. Since SQL Server still can’t restore a freakin’ table, a common approach is to:

  • Restore the database backup onto a different server
  • Pull or push just the rows you need (or all of them) across a linked server connection

That way, you don’t risk accidentally restoring the backup over the entire production database, and you can take your time picking and choosing the specific rows you want.
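The pull itself can be a one-liner. A sketch, with the linked server, database, and table names all hypothetical:

    -- pull the restored copy into a staging table on production,
    -- then merge rows back at your leisure
    SELECT *
    INTO dbo.Badges_Recovered
    FROM [RestoreServer].StackOverflow_Restore.dbo.Badges;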

Otherwise, if you need data from a different SQL Server, I’ve got some tough advice: connect to the server that has the data you want. If you need fast results, that’s the most surefire game in town.

And for the record, about that note your friend passed to me, the answer is no, I do not want to be your valentine. But if maybe you asked me directly, the answer might be different. There’s a lesson in there.


97 Comments

  • Douglas Coats
    July 15, 2021 1:20 pm

    Random question: does this apply to using EXECUTE AT, which requires a linked server? Like instances where you may have no choice but to use a linked server, “execute at” could run those statements on the remote server (which implies whatever statements are tuned/optimized/dev’d on that server) – what is your opinion on that in regards to this post? (generally curious but I assume I’m going to get a link to paid training lol)
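    Something like this, I mean (names made up):

        EXEC ('EXEC dbo.usp_NightlyArchive;') AT RemoteServer;
        -- the whole batch runs on the remote side; requires the
        -- linked server's RPC Out option to be enabled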

    • Why not just connect to that server and run the query you want?

      (This is me saying, “If you only have one tool and you’re trying to use it for everything, then the answer is to learn more tools, not find creative ways to use a hammer to screw in screws.”)

      • Because there might be a security policy in the way?

        • If you have permissions to query via linked server, but not query directly, then someone doesn’t understand the permissions you have via linked servers.

      • We’ve suffered from the “favourite toy” problem a number of times at our place. We find a new SQL server feature that seems like magic and works really well for the situation where we initially found it and then start using it whenever we can until we realise that it isn’t good for every problem.

      • The ‘learn a new tool’ thing works great until you end up having to support all the people who learned the new tools poorly and the problems they create. Especially Python via Python bootcamps.

        I don’t agree that linked servers are only good for one-off uses. They are OK whenever you can tolerate the poor performance. Resources who can write at least fair-quality SQL are ubiquitous. Resources who can write fair Python, .NET, Java, or SSIS packages that can be supported over time are not ubiquitous, are expensive, and are in an international shortage.

        For me, turning most of the ETLs that used to run in a maze of SSIS flow hell into T-SQL running through linked servers, which 3-4 other staff members could work on without creating a mess, allowed me to actually have the time to work on performance tuning other things where the poor performance was not tolerable. It also meant far less spaghettification from other staff not knowing how to put together something better, so the work can later be appended to and modified without becoming a mess. If it does become a performance problem, it can be dealt with then, no big deal.

        • My mistake, I missed your second bullet point about ‘pull or push just the records you need (or all of them)’.

        • They are ok whenever you can tolerate the poor performance.

          And not just you, but everyone else on the server has to be able to tolerate the poor performance due to blocking, network utilization, TempDB utilization, higher CPU, etc. I say that because sometimes the query author doesn’t mind killing the server – but everyone else has a bit of a problem with that, heh.

          • Exactly, but it’s fairly easy to find when it is tolerable by everyone on the server – or just run them during maintenance windows and disregard.

  • Dave Wentzel
    July 15, 2021 1:51 pm

    Excellent post. Years ago I worked for a huge ISV and my idea for _some_ analytics was to use Linked Server queries from the DW SQL Server back to the OLTP server. Yep, I heard you gasp just now. So did everyone else…but it actually worked…sometimes. I thought if we took necessary precautions we could limit things like table lock escalations and readers blocking writers (we weren’t allowed to turn on RCSI b/c some DBA read that it was bad for tempdb perf).

    Your example query would’ve instantly raised my hackles b/c just thinking through how I would do it if I were the SQL Server optimizer … I’d prolly pull the whole table across the network too. I always told people to never do linked server queries unless you exhausted all other avenues AND you got some code reviewers AND you understood predicate pushdown.

    I’m too lazy to test this myself but I’m wondering if you can structure that query such that you can get PP to work in your favor. BTW, PP and partition elimination is the Number 1 way to performance tune a big data query on Spark/databricks/presto/trino/starburst/synapse…basically, the rules of query engines are all kinda the same. I always found with linked server queries…if you could get PP to work it was usually ok…the trick was more about ensuring a bad query plan didn’t ruin your day. But there’s tricks for that too.

    Thoughts on this simple change:

    SELECT TOP 10 u.DisplayName, p.*
    FROM ( select * from dbo.Users u WHERE u.UpVotes + u.DownVotes < 0) as u
    INNER JOIN [RemoteServer].StackOverflow.dbo.Posts p ON u.Id = p.OwnerUserId
    ORDER BY p.CreationDate DESC;

    In SQL 2008 that was usually enough to force the join ordering and ensure PP worked. Not sure how it works on modern sql versions. If it didn't work I'd sometimes resort to temp tables, CTEs or other tricks to ensure the optimizer didn't defeat my creativity.

    Admittedly, you are right…this is probably a huge foot gun.

    • “I’m too lazy to test this myself but I’m wondering if you can structure that query such that you can get PP to work in your favor.”

      Dave – just to make sure I understand – I took the time to write a post to teach you for free, and I went to the lengths of using open source databases so you could reproduce it yourself, and I EVEN GAVE YOU THE CODE, and you can’t be bothered to test your own ideas?

      Did I capture that correctly? 😉

  • Thank you for your post, but in my opinion, there are more than 2 scenarios in which you might consider using a linked server.

    I haven’t experienced a problem in the following situation. I had to transfer and modify data from one server to another.
    Let’s name server A as the remote server and server B as the local server.

    There was only one table that should be processed on server A.
    There is an index on the column that represents the status of every row. My approach was to query only the rows that had not yet been transferred and copy them into a temporary table on server B. I decided to use chunks of less than 5 thousand records in order to avoid any kind of table locking. After the work was done on server B, I had to update the status of the working table on server A. The first operation, copying data to the temporary table on server B, was done without using a transaction.
    The last operation, updating the working table on server A, was part of a transaction. The working table allows only insert statements.
    And this T-SQL solution works just fine.
    I was able to pull and transform up to 5000 records from one server to another.
    The requirement was to do the job every 15 minutes.
    What would be the alternative for that? And how long should the development process take?

    Thank you for your post again, and I hope my writing is understandable enough.

    • If you need to move data, I’d use a tool designed for that, like SSIS or Azure Data Factory.

      Like I mentioned to another commenter here: (This is me saying, “If you only have one tool and you’re trying to use it for everything, then the answer is to learn more tools, not find creative ways to use a hammer to screw in screws.”)

      • SSIS ends up using T-SQL, so there is no big difference. On the other hand, SSIS involves another level of complexity.
        IMO we are talking about two tools that are basically the same.
        So, I would like to point out that remote queries might be useful, and if you can prove that they do not harm your system, I don’t see why not use them.

        • SSIS does indeed use T-SQL, but not linked server queries. It can use much more efficient ways of moving data, caching changes, change detection, and much more.

          I would strongly, strongly disagree with the statement that linked server queries and SSIS are the same. I’d say that if you truly believe they’re the same, it’s time for you to start learning more about SSIS or Azure Data Factory.

          • I would like to repeat that remote queries might be useful, and if you can prove that they do not harm your system, I don’t see why not use them.
            So, I agree to disagree.

          • Could you point towards some details on how SSIS connection to SQL server differs from linked server? The DBA is insisting that SSIS packages on server A *can’t* truncate staging tables on Server B because cross-server truncate isn’t supported and sends me links about how you can’t truncate across linked servers. I’ve tried explaining that SQL connections in SSIS aren’t the same as linked server with no luck. We have this configuration working in several other servers – it’s just a matter of setting up the package host SQL Agent user on the target database server.
            Is there some reason to avoid using truncate table in an SSIS package when the package is hosted on a different server than the database?

          • I don’t blog about SSIS here, but check with Andy Leonard at EntDNA.com.

  • We have found that using OPENQUERY instead of 4-part naming can help mitigate SOME of these issues.
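    For example, something like this, so the filtering happens on the remote side before anything crosses the wire (names illustrative):

        SELECT p.Id, p.OwnerUserId, p.CreationDate
        FROM OPENQUERY(RemoteServer,
            'SELECT Id, OwnerUserId, CreationDate
             FROM StackOverflow.dbo.Posts
             WHERE CreationDate >= ''20210101''') AS p;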

    Unfortunately, our system architecture migrated from a data warehouse for item information into a full OLTP system for maintaining items as we removed a wretched 3rd-party item maintenance system. Eventually we’ll migrate the OLTP piece back out of the DW database, but of course other things have had higher priorities, so here we still are, doing things wrong in production, because priorities.

    • Openquery involves hard coding connection info into queries, though. Reminds me of the time a client needed to fail over to their disaster recovery system, but their app kept failing because people had used IP addresses in their openquery commands. Good times.

    • Just a side note, you can always replace your four-part name using synonyms.
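      For example (names made up):

          CREATE SYNONYM dbo.RemotePosts
              FOR [RemoteServer].StackOverflow.dbo.Posts;
          -- queries reference dbo.RemotePosts, and repointing the synonym
          -- later doesn't require touching the queries themselves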

    • You can also get bad plan cache bloat through OPENQUERY, sometimes even if you enable ‘optimize for ad hoc workloads’. You can write your dynamic SQL to parameterize the passed query, but it is tough, time consuming, and horribly sensitive to embedded quotes – you may just want to recompile it each time it runs if you are doing that.

  • Hi Brent – do you foresee PolyBase with pushdown optimization helping this at all? Have you seen any adoption of it?

    • No, I haven’t seen any adoption of PolyBase.

      • We are making VERY limited use of Polybase to support an ETL process from a hosted Oracle DB.

        The “legacy” SSIS ETL does an incremental load on a rolling window, but changes are made outside of that window. So SSIS for the ETL, and Polybase to correct any “drift” detected outside of the load window.

  • I agree with your analysis, and SSIS is the go-to tool. I use linked servers occasionally when I have the following need: one quick, no-return read of a complete (small) table that I need to link locally. I construct and use my temp table for my local need, never to return to the shared database again. My reasoning: it is super easy, and once set up it doesn’t require anything more from me. SSIS is an underrated tool that accomplishes amazing feats, which is also its drawback – it looks complicated to those that have not dived in.

    • Its other issue is that while it seems intimidating and complicated at first, once most people learn it just a bit, it is so easy to work with that it enables lazy development that creates a mess later.

  • David Markham
    July 15, 2021 4:32 pm

    I have inherited systems with a host of linked servers for SSRS reports. How would you suggest I start to get the ball rolling to move away from linked servers?

    • Read the second to last paragraph of the post.

      • Just to be clear – I’m not saying you need to go proactively rewrite all of your SSRS reports! That’s hard work. But when people come to you with performance complaints on a report that uses linked servers, try connecting directly to the server that has the data you want, rather than using the phone-a-friend technique, and see if that helps.

  • Simon Holzman
    July 15, 2021 4:45 pm

    The benefit of using Linked Servers is that individual users do not need permissions to run queries on the remote server. We also do not need to run a standalone application to talk to both databases or to have access to the remote database assigned to the Computer running the job – the relationship is Server to Server.

    Linked server queries were the bomb (in a good way) in SQL Server 2008 and (I think) 2012. However, Microsoft borked them in SQL Server 2014. The impact was on sending data to the other end, while receiving data was still fine, so my impression was that Microsoft started doing a permissions check on each record sent rather than doing a single permissions check at the start of the query.

    When I say “borked”, what I mean is that a query that literally took under a minute to run in SQL Server 2008 took over 24 hours to run in SQL Server 2014.

    The result is that, if you are reading data remotely, it still runs blindingly fast, but if you make that read dependent on anything at your end, it slows to a crawl, because that amounts to sending your data to the other end.

    The kludge that I have developed is to have a job running at both ends which basically talk to each other to pull the data that the other side makes available by sending a really small update.

    Thus, if you want to send updated data from Server A to Server B, Server A pulls the current data on Server B, generates a dummy table with the new data, and then tells Server B to pull that dummy table. Server B then pulls that dummy table and merges the data into its own tables.

    Yes, this is far from ideal but it works and is still blindingly fast compared with most alternatives.

    • Check in with your security folks though. It sounds like you’re using permissions set by an individual account, which means that anyone who can run a query on server A gets the same permissions on server B – able to see all kinds of things that they may not be supposed to see. That’s a nightmare when you’re hit by things like SQL injection attacks and privilege escalation attacks.

      • Simon Holzman
        July 15, 2021 6:09 pm

        I appreciate that risk, but the logon used by the linked server can have specific permissions, so it does not have to be any more of a security hole than any other access is.

        • It does. Unless you are using only SQL authentication to map logins to linked servers (which has its own major security problems), anyone with a public logon on the server can use the linked server.

  • Jane Fleming
    July 15, 2021 5:14 pm

    I have a situation where two different SQL servers with databases used as warehouses have some related information.
    I create cache tables on Server A for the information I need that gets updated on Server B.
    When Server B finishes his daily refresh, his SQL Agent refresh job triggers a SQL Agent import job on Server A.
    Server A truncates his cache table and then does a limited select to Server B (a linked server) to repopulate his cache table.
    For the rest of the day thereafter, Server A’s queries run against his cache tables.
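    The refresh step is roughly this shape (table and column names illustrative):

        TRUNCATE TABLE dbo.PostsCache;

        INSERT INTO dbo.PostsCache (Id, OwnerUserId, CreationDate)
        SELECT Id, OwnerUserId, CreationDate
        FROM [ServerB].StackOverflow.dbo.Posts
        WHERE CreationDate >= DATEADD(DAY, -7, GETDATE());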

    Live queries or joins across a linked server, though… nope.

  • Kerry Bossingham
    July 15, 2021 5:15 pm

    Running raw scripts, whether from an app or over a linked server, is bad, I agree. Encapsulate your query in a stored procedure on the target server and return only what you need.
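    A sketch of that pattern, with made-up names:

        -- on the target (remote) server:
        CREATE PROCEDURE dbo.usp_RecentPostsByUser @OwnerUserId INT
        AS
        BEGIN
            SELECT TOP 10 Id, Title, CreationDate
            FROM dbo.Posts
            WHERE OwnerUserId = @OwnerUserId
            ORDER BY CreationDate DESC;
        END;

        -- from the calling server (with RPC Out enabled on the linked server):
        EXEC [RemoteServer].StackOverflow.dbo.usp_RecentPostsByUser @OwnerUserId = 12345;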

  • Todd Gilbert
    July 15, 2021 5:59 pm

    You are wrong about where linked servers are OK. Linked servers can be effective for heavy-duty applications; however, getting linked server queries to perform well is hard. Really hard! Creating fear of a really effective tool is not helpful.

  • Jeff Shervey
    July 15, 2021 6:06 pm

    Great article as always. Truth is… I find them equally knowledgeable and cynical/funny at the same time. You’re a scholar, author/blogger, and comedian all rolled into one.

  • Our IT department has linked servers across many SQL Servers that we report our data out of. We are in a transition state for the time being, until we can get all of our main data sources onto one large server. I don’t know if this was mentioned above, but I have had great performance using linked servers when I do a bulk copy of the data from one server and then create an internal table on the main server I’m running on. If I need, say, Employee Number, Name, State, and Zip, and it’s on another server, I just bulk copy over all the rows for just the columns I need, using mostly no criteria in the WHERE clause. I have had success copying over 10s of millions of rows to my main server within a minute or two, depending on the table and column sizes. Then I JOIN that internal table to whatever data I am using on the main server I’m running on. I discovered that doing a JOIN on tables across the network to other servers (as mentioned above) is not a good idea unless it is a very small volume of data. If you have a small volume, then there is no need for the internal table. I hope this helps.

  • Gabriel Martinez
    July 15, 2021 8:42 pm

    Interesting post. Any comments on when you are linking to an Oracle server? What are the recommendations?

  • ###### AGREE ######
    I use SQL transactional replication to move data where it is needed to for:
    data manipulation for integrations
    lookup values
    reporting
    etc.

    It keeps apps more modular/independent.
    It keeps vendors out of other vendor’s SQL Servers to maintain the vendor’s OLA independently. AKA no finger-pointing.

    It has worked well in replacing linked servers and improved performance all the way around. I know some are reluctant with SQL replication, but it has been a robust product since its inception. I actually used SQL replication on SQL Server 4.2 to move lookup values to SQL Servers across the US in the 90s. Used Access as the front-end to manipulate the data.
    Yep, I’m an old bird!
    😉

    • But then you also have to deal with unreliable transactional SQL replication, and, in the future, all the crappy queries that get written against the raw data structures of the live database.

  • Jeff Huelskamp
    July 15, 2021 9:34 pm

    I had always been told by my trusted DBAs that linked servers and linked server queries were a terrible idea, so we never used them at a previous job I had at a Fortune 100 company. I took over as the App Dev Manager at a different, smaller company a few years ago. It wasn’t long before I realized I was living in the 9 circles of hell, and linked servers were the main reason why. We are just now beginning to get our hands around all the tentacles previous regimes implemented, and I am hopeful before I reach retirement I am able to untangle this mess.

    Having lived both sides, I would say this…save yourself some future heartburn and uncountable amounts of technical debt…just don’t use linked SQL servers…ever.

  • I would like to know what the future of linked servers will be, because they are very practical, but they are performance killers in an environment where you have hundreds of developers crossing information between hundreds of tables. They become a very large administrative burden when they fail because of DNS and WINS records; it is a hassle to make them work. Microsoft should improve them in performance and especially in security.

  • Hi Brent,

    Thanks for your insight. In certain environments it seems to be difficult to work around this problem.
    We, for example, have two main SQL Servers. One of them is dedicated to the dealer management system, which in that case is obligatory and must not be touched (Server A). The other one (Server B) is for the developers and the “good” applications.
    In some cases synced tables from A to B are indeed the only way to keep the execution time low; in other cases OPENROWSET subqueries in joins seemed to be the right answer, but were slower in the end, because we had issues with index usage. https://ibb.co/Tt5r8q3
    So in my experience linked servers are not always a bad choice. But since I am not a professional DBA, you might have an idea on why that might have been our fault 😉

    • If you’re running linked server queries from B to A, then you’re already violating your requirement that server A must not be touched.

      Linked server queries are still queries, period, full stop.

      • Ideally – at least to fix this problem – all DBs should be on the same server.
        To be precise: I meant “must not be touched” in the sense that we can not just move A’s DB to Server B or vice versa.

  • Accidental DBA
    July 16, 2021 6:56 am

    I work in a large global company that acquires new businesses every other day. We run many versions of SQL Server, across lots of domains, connecting through linked servers, using dynamic SQL, xp_cmdshell everywhere. All the things you’re supposed to not do! It’s all legacy, and there isn’t much cooperation or support from the business to improve anything.
    Happy days (not)!

  • Great post mate!

    Question though: I work in a data science team, and we’re using a linked server construction to read data from our DWH to our sandbox machine where we do all our data science. It’s great, because it’s flexible, and flexibility is what we need when we’re exploring the data. Performance is occasionally very poor, and it’s becoming something of a running gag in the team. We can manage, because flexibility is more important than performance. But still… I mean, there’s nothing more annoying than waiting for a 15-minute query to finish, getting a coffee, coming back, and seeing that you’ve mistyped something in the WHERE statement…

    Anyhow, we looked into SSIS, but since most of our analyses are one-offs, it’s more work to build the SSIS package than to wait for a query.

    My question: isn’t there a third option that
    1) is as easy to use as a linked server,
    2) doesn’t involve SSIS packages or Data Factory,
    3) is even the slightest bit faster than a linked server?

    • Yes, restore a copy of the data warehouse to your sandbox. If your production DWH uses SAN snapshots, it can take a matter of seconds. Check with your DBA and SAN admin teams.

      • Sure, that makes sense. It would definitely speed up our queries. I’ll start ordering some extra disks right away 😉

        But what about PolyBase? A few posts back you mention that you haven’t seen any adoption of PolyBase. Could this be a case where PolyBase comes in handy?

        • You don’t have to order extra disks. Do me a favor: hit the brakes on telling me “facts” until you’ve talked with your DBA team and SAN team about what snapshots are and how they work. It doesn’t involve doubling storage. Thanks!

  • How do remote queries differ from external tables in SQL 2019?

    • That’s a great question, but it’s outside of the scope of this blog post. Your best bet might be posting that at https://dba.stackexchange.com where someone might post a detailed comparison.

    • I should add – if you believe they’re better, and you’re considering replacing your linked servers with external tables, it’d be wise to test the performance differences in your specific scenario. For example, set up the same things I did here in this blog post, and compare/contrast the performance differences.

      • Thanks Brent. I reviewed the documentation, and it seems that external tables have the ability to scale out with PolyBase whereas linked servers are single-threaded, but as you pointed out, nothing is better than running a test for specific workloads. As always, much appreciated for the article.

  • Linked servers are so easy to use that it’s no surprise they are misused. But whenever coding involves crossing a boundary (process, machine, internet), we need to stop and think about it. I remember a brief period with .NET when the idea was that we shouldn’t worry about it and should treat local and remote calls exactly the same. Shortly followed up with the concept of “chatty vs chunky” when making remote calls.

    • Absolutely, and it’s been interesting to see people hit this problem again with cross-cloud and hybrid computing. As Jeremiah Peschka said, hhhhheeeellloooooo laaaaatttteeeennnnccccyyyyy.

  • SQL Server CAN restore a table, you just have to put each table in its own filegroup and back them up separately. Simples. 🙂

    But on a serious note, I’ve come across the “copy the entire table across the network” issue myself a few times. To make it worse, those who went before me tried to fix it by breaking the query up into chunks and running it in a loop, so the table was being copied across for the entire duration of the loop. As this was a single-threaded archive process, the solution was to execute a remote query which would identify the rows needed for the batch and store their IDs remotely, then execute a second remote query which returned rows for the stored IDs. Much faster.
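    A rough sketch of that two-step shape (names made up):

        -- step 1: identify the batch's rows remotely, staging just their IDs there
        EXEC ('INSERT INTO dbo.ArchiveBatchIds (Id)
               SELECT TOP (10000) Id
               FROM dbo.Posts
               WHERE CreationDate < DATEADD(YEAR, -5, GETDATE());') AT RemoteServer;

        -- step 2: pull back only the staged rows; both tables are remote,
        -- so the join runs on the remote server
        SELECT p.*
        FROM [RemoteServer].StackOverflow.dbo.Posts AS p
        INNER JOIN [RemoteServer].StackOverflow.dbo.ArchiveBatchIds AS b ON b.Id = p.Id;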

  • George Stocker
    July 16, 2021 1:08 pm

    Interestingly, for me, what I run into is that people assume that once they connect to SQL Server, it’s all magic and there’s no network in the way.

    And people (unnamed people, and certainly not the smart people who read this blog) forget that the fallacies of distributed computing apply with linked servers too.

    https://en.wikipedia.org/wiki/Fallacies_of_distributed_computing

  • Kelly Herald
    July 16, 2021 5:28 pm

    I’ve seen linked server joins behave in 2 ways.
    1) What you mentioned – SQL Server could choose to transfer the whole table over locally and then perform the join to the local table.
    2) SQL Server could choose to send a query for every local record to the remote server, a.k.a. RBAR, Row By Agonizing Row.

    Both of these are serious performance killers. Whenever I’m reviewing someone’s code and I see a join of a local table to a remote table I immediately flag it as an issue to be reviewed further.

  • You’re right. There is a lesson in there. Thanks for this blog post.

    By the way, another problem that I recall with linked queries from way back is that the table on the other side of the link needed to have a clustered index — no heaps or you couldn’t see any data.

  • great post!

    We have applications that need to work with tables from two different SQL Servers.

    Will Java or Hibernate (with 2 data source connections) handle this situation better than a linked server? Or will they do a similar “pull the whole table and do the join on the application layer” kind of thing?

    • Bala – tools like Java and hibernate are just tools. It’s up to you how you use those tools. Nothing’s going to be automatic and fast: you’ve gotta learn your tools.

  • Hello!
    This is not a question and I do not expect an answer.
    I always create a stored procedure on the source server and then call it over the linked server, and there are no performance problems.
    Best Regards!

    • Yes, Mehdi, late post here but if you must use them, that is the way to do it in my experience. Pulling the data from target to source will copy it in one batch. Executing it at the target from the source so that it does a push from source to target breaks it up into smaller batches. You can see this in profiler or other newer ways of tracing.

      Having said that I agree with Brent that not using them is a better option – Just know all the options and know them well. That way there is a bigger tool chest to grab from.

  • WOW…. I have never used linked server queries and after reading this, I will avoid using them like the plague. It’s like built-in, forced RBAR.

    Thanks for the info!

  • Just came across this post, as I set up a solution using a linked server – it was the only way I could get something set up to allow the test team to test a solution.
    The users maintain data on a separate SQL Server instance A via a 3rd-party application, but they want to join this data to create reports from SQL Server B. Getting the data via a batch process was not going to be quick, as the data updates can happen at any time and the users can run the reports at ad hoc frequency.
    For this solution we are just looking at pulling 8000 rows.
    Any other options that anyone can think of?

  • Hello!
    I set up two databases on the same instance, and when I do cross-database queries, the execution plan shows the remote query at 100%; when I do the same with a linked server, it shows the same. What I understand is that the query plan is stored with database scope, and anything out of database scope is considered a remote query.
    So the question is: will there be any performance impact if I have the databases on separate instances rather than the same instance, considering the query plan? One factor could be the network between the servers – is there any other factor to be considered?

  • We just ran into a performance brick wall last night with a change to a linked server query.
    Server A (SQL Server 2012)
    Server B (SQL Server 2005, gasping its last breaths…)
    Server C (SQL Server 2016 & well equipped).
    For various reasons, a nightly query was switched from “Server A join Server B” to “Server A join Server C”. The original join would take less than 10 seconds to retrieve relevant data from Server B, reliably. Unfortunately, the updated query took 4 hours (!), which caused all things downstream to be wrong or incomplete. Apparently my colleague hadn’t tested the query change before putting it into production. Really scary how changing the linked-server join to a “better” server yielded such terrible performance. As you can imagine, the IT team is mulling over next steps this morning…

  • Hi Brent,

    Thank you for this article. It is really informative, and the issues everyone is facing are eye-opening as well. I also have an issue and would like your feedback on it.

    My Issue:
    We have recursive processes that run every night and execute around 40 stored procedures. Each of these SPs has a select query (with a where clause and a linked server connection to the source server DB), and the result of this select query is inserted on the target server DB. The SPs are triggered from the target server. And the whole ETL process takes hours to finish.

    possible solution:
    After reading this article and a couple more, I am thinking:
    >> rather than using an SP to get data from a different server, let’s use SSIS
    >> SSIS can have its connections with both the source and target servers.
    The first task will execute the same select statement, with its where clauses and all, directly on the source server and store the result set in a system object.
    The second task can load this system object onto the target server DB.

    My fear:
    This plan looks fine, but we are fetching and processing more than 100,000 records each night. Will the SSIS system object be able to hold this much data?

    I would really appreciate it if I could get an opinion on this issue.

    Thanks and Regards,
    Mehal

  • David Robertson
    February 28, 2022 6:56 pm

    Hello,
    We have a very large data warehouse covering multiple servers, and straightforward linked server queries are “slower” but work well for us, with 3 caveats:
    1) The query optimizer is “index aware” for the remote tables – but ONLY for tables. Linking to a remote view causes table scans. Convert your local query to hit only remote tables and it is index aware again.
    (BTW, I’ve never seen this simple fact published anywhere, ever.)
    2) Make sure you don’t inadvertently (and unnecessarily) refer to a local asset as a linked asset.
    Even though it is “local”, it will do the slower “remote query”.
    3) Don’t SELECT p.* unless you really need all the columns; “all-the-columns” tends to result in table scans.

  • Hi Brent,

    Thanks for your blog. I stumbled upon it while searching for clues about why our DBA saw a lot of full table scans pounding our network and database. Yes, indeed: a query over a linked server.

    I thought I could outsmart it by adding a very limiting WHERE clause on the remote table. To my surprise, the actual execution plan still included a full table scan on the remote table, and the filter was only applied locally, after the entire table had been transferred. I ended up creating a view on the remote table and using that in the query.

  • Brett Flippin
    August 15, 2022 4:42 pm

    Nothing like a linked server post to get everyone in their feelings. We all build bad solutions. Learn and move on and pretend this feature doesn’t exist in SQL Server. Your life and the lives of those maintaining your solutions after you move on will be better for it.

    If a linked server is the ONLY way you can come up with to achieve your goals you need to buy books and training. I’d rather see a convoluted BCP ETL than linked servers. Or learn SSIS or Python or Powershell or C# or *gestures wildly at massive ecosystem of free and licensed ETL and reporting products*

  • Here is one way to effectively use a remote query: when the whole query (and thus the plan) is run on the remote server, and the local server is just used to store the results in a table.
    That way the query runs on the remote server and only the output is stored locally.
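    A minimal sketch of that pattern, with made-up names (and RPC Out enabled on the linked server):

        INSERT INTO dbo.PostCountsByYear (PostYear, PostCount)
        EXEC ('SELECT YEAR(CreationDate), COUNT(*)
               FROM StackOverflow.dbo.Posts
               GROUP BY YEAR(CreationDate);') AT RemoteServer;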

    You need to set up the linked server properly.

    Sometimes this is quicker than using an SSIS package or an export/import data task, especially since DFTs are annoying as hell when it comes to changing metadata.

  • Consider this: a Power Platform Azure Logic App connection through an on-premises data gateway to a DB2 for i DB takes 13s to execute a 1-row select.

    Same Azure Logic App, same data gateway thing to an MSSQL server, querying the same table via a linked server? 3s.

    Why?

  • Mark Earleywine
    November 29, 2022 4:16 pm

    I just learned about the “REMOTE” join hint that you can use to force the query to make the join on the remote server. So it seems like, in this specific case, that could be something to help improve performance. It took the query that ran 6 minutes in my test environment down to 1 second. I’m sure there are other drawbacks here I’m missing, but I just wanted to share. https://www.brentozar.com/pastetheplan/?id=SJGlxn7wo
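    For reference, the hint goes on the join itself. Reusing this post’s query shape (REMOTE is only valid for inner joins, and the names are stand-ins):

        SELECT TOP 10 u.DisplayName, p.*
        FROM dbo.Users u
        INNER REMOTE JOIN [RemoteServer].StackOverflow.dbo.Posts p ON u.Id = p.OwnerUserId
        WHERE u.UpVotes + u.DownVotes < 0
        ORDER BY p.CreationDate DESC;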

