Database Connection Hazards with Entity Framework

SQL Server
23 Comments

I recently came across a curious case where a SQL Server was suffering a number of long-running queries coming from an application written in Entity Framework. When I measured the average query execution times, I got some unexpected results:

  • CPU time: 12 milliseconds
  • Reads: 273
  • Elapsed time: 6800 milliseconds

Wait, what?

Looking at the wait stats for these queries, I saw there was a lot of ASYNC_NETWORK_IO — often 1000+ milliseconds. That didn’t make any sense either! How can a query with so little CPU time and so few reads take so long to complete? It’s not like the application was asking for millions of rows and couldn’t consume the results fast enough.

I ran the query myself in Management Studio:

  • CPU time: 17 milliseconds
  • Reads: 273
  • Elapsed time: 155 milliseconds

These numbers made a lot more sense, and confirmed that the application was to blame. But what exactly was the application doing for 6645 milliseconds?

Entity Framework will try to be clever about connections

After some research, I found that Entity Framework can be sneaky about how it manages database connections. An article about EF connection management on MSDN proved both enlightening and slightly terrifying:

Entity Framework Connection Management

Entity Framework will handle database connections automatically by default. Note two things here: EF will open the connection if you specify any LINQ or ObjectQuery method, and that connection won’t be closed until the ObjectResult has been completely consumed or disposed.

In this case, the EF code invoked the Where method, then went on to do a number of foreach loops, assuming the connection had been closed when it hadn’t. SQL Server was left holding the phone until EF said good-bye and hung up. The solution here was to open the connection, do as little as necessary, and then close the connection.

UPDATE: EF MVP Julie Lerman (@julielerman on Twitter) mentions in the comments below that the MSDN documentation is outright wrong about some methods opening a connection, and has passed this on to Microsoft so they can correct the error.

This would be the end of the connection management story, except…

Entity Framework would like to announce it will no longer be clever about connections (when you open them yourself)

Another MSDN article about EF connection management points out changes to Entity Framework 6 and later:

EF6+ Connection Management

 

Again, the seemingly innocuous and trivial “Note” is anything but. It’s a reasonable assumption on Entity Framework’s part; if you’re going to open it yourself, you’ll be the one who closes it. Still, it means we have to be careful with Entity Framework code when it comes to database connections. And now, depending on the Entity Framework version, we’ll see one of two different connection management behaviors.

How to spot Entity Framework keeping connections open

The tell-tale signs, as we discovered in this case are:

  • When running the query from the application
    • Relatively low CPU time but high elapsed time when running the query from the application.
    • ASYNC_NETWORK_IO waits for the query
  • When running the query from SQL Server Management Studio
    • Relatively similar CPU time and elapsed time when running the query from Management Studio.
  • Significant amounts of application code that execute in between the connection open event and close event. To prove the connection is left waiting during the open and close events, step through the code in a debugger and pause before the connection is closed. You should see the query racking up ASYNC_NETWORK_IO waits. (Remember, the events that open and close the connection may not be explicitly doing so.)

IT’S NOT A SQL SERVER PROBLEM. IT’S An entity framework problem.

Entity Framework is great for developers who don’t have the spare time or motivation to learn SQL Server querying, but that convenience comes with costs. One of those costs is keeping a suspicious eye on how it manages database connections. It’s tempting to look at a long-running query and blame SQL Server for being slow. However, if this happens to you and your Entity Framework-based application, it’s worth investigating further to see who’s leaving whom waiting.

Brent says: don’t be ashamed of using EF, either! I’m all about getting to market as quickly as possible. After all, if you don’t get your app out there, you don’t get paid, and DBAs need to get paid too.

Previous Post
Logical Query Processing
Next Post
The Easiest Way to Restore Transaction Logs to A Point In Time (Spoiler: Use AmazonRDS)

23 Comments. Leave new

  • Calling Where does nothing. Where is not even an EF method, it’s a BCL method. Starting to enumerate a query is the thing that opens the connection.

    I believe EF has an option to eagerly buffer query results. That seems like a good idea. ToList certainly does this.

    EF has different pitfalls than T-SQL. Both have them.

    Reply
    • Jonathan Allen
      July 24, 2015 11:42 pm

      It wasn’t the Where that caused the problem, but rather the combination of Where and the for-each loops.

      If instead he used Where(…).ToList(), it wouldn’t have held the connection open.

      Reply
  • I wholeheartedly agree with Doug re being very wary of depending on iteration to trigger a query execution because yes, it will hold onto the connection until the data has returned. That’s streaming for ya! Databinding directly to queries is another practice that should come with red flags. I much prefer and always advocate the explicit LINQ query execution methods e.g. ToList mentioned above, FirstOrDefault, etc. Those execute, open connection, grab results into memory and close connection.

    I also am a huge fan of keeping an eye on a profiler. If perf is bad, I don’t blame SQL Server, but the profiler can give hints about what is going on that can lead you to fine tuning your EF code or just using a stored proc or view where it just makes a lot more sense. Re the need for profiling being elevated by using ORMs – well, okay, I get Doug’s point … but that’s one which is probably a never ending back & forth. 😉

    But as the above comment mentions Where etc don’t open a connection. But this is a *documentation problem* — I can’t believe the docs state that Where, OrderBy and Select will open a connection. That is definitely an alarming statement, but it’s not correct. And not Doug’s fault. The connection gets open on execution. I was so surprised to read that in his screenshot that even after many years of working with EF, I went back to profile & debug to make sure I wasn’t going crazy.

    I’ll find out how to get that documentation changed.

    And, no I’m not ashamed to use EF. 😉

    Reply
    • Thanks for pointing out the error (and asking MS to fix the docs). This post isn’t meant to dissuade anyone from using EF — just to be aware that database connections need to be handled thoughtfully, otherwise the database server can easily get blamed for long-running queries.

      Reply
  • Well said Doug… I’d just backup Julie a little bit to say this is not an EF only problem, but a problem with any type of data access technology that is not used carefully. I disagree a little bit with the tone of EF being the problem… It’s really the improper use, poor understanding of the tool.

    Since it’s inception, best practices and recommendations for development with EF (and other ORMs) have included numerous warnings for application programmers to be aware of… None of which were more prevalent than the cautions of persistent open connections. The most ubiquitous being the recommended use of USING statements around EF queries.

    A well designed EF application will have this sort of pitfall dealt with in its architecture, thus steering the naïve developer away from such a bad practice.

    And as Julie states, monitoring queries delivered by your ORM is always advisable.

    And as Doug states, proving the DB is not the problem by comparing the query performance within the application and outside is also necessary to pinpoint bad perf.

    Reply
    • Kendra Little
      July 15, 2015 1:20 pm

      I’m sorry, I chuckled a little bit reading this: “I disagree a little bit with the tone of EF being the problem… It’s really the improper use, poor understanding of the tool.”

      I’ve just heard this argument used to defend every horrible technology in the book. I’m not even saying that you’re right or wrong, I’m just saying that defense doesn’t really do ANY technology any favors due to its tradition of use.

      Reply
      • I see your point and quite agree… I hadn’t thought of it from that perspective, but you’re right. I too have seen many technologies defended with the “improperly used defense”.

        Just ask my opinion of Silverlight… Ha!

        Perhaps the best tactic is to make an educated decision and proceed deliberately…

        Reply
        • The trouble with EF starts when you compare it with other ORMs that do not have such pitfalls *by design*. How come? Why EF punishes “clueless” developers while other ORMs guide them to the pit of success? Which release is current, 6th, and still I have to keep in the back of my mind how not to trip my queries? Why would anyone need such poor abstraction in first place?

          Reply
  • If you nest linq queries that haven’t been .ToList()ed thigs get dramatically worse. With just two levels of nesting a couple of queries in an app I service went to 6 minutes from 140 milliseconds.

    Reply
  • Recently we have to deal with the same issue: a very simply query very slow when is executed in code, but fast (as expected) when executed in SSMS.

    A partner of mine (developer) gave me a hand and saw that the type of entities used in the code (EF too, obviously) was marked as “self_tracking”. Don’t know what is really means (he tries to explain me but I’m not a developer, just a simply DBA), but he observed a high use of CPU (saw with Ants Profiler) in the machine where the code is executed when the query was executed.

    So I don’t think the problem is about open/closed connections, because we saw the issue before of after the query was executed, not during the execution as happen right now.

    Doug, could you check this? I mean, if you still have access to the code (or asking directly to the developer), could you see if the entities are “self_tracking” too?

    Thanks!!

    Reply
    • Thanks for the suggestion. It’s too late to check that code now, but this scenario doesn’t sound like it was the self tracking issue you’re describing because we never encountered high CPU — just a wide margin between CPU and duration.

      Reply
      • No, but the high use of CPU I talked about was on the server (different than SQL Server machine) where the code (.NET) was executed, not query (SQL Server) CPU.

        That’s why I said it seems the same issue, related with self_tracking and not with open/closed connections.

        Reply
  • I’m confused, since when is just sitting on an open connection considered part of elapsed time?

    Reply
  • Doug, the underlying message in this post is a good one, think about how the connection is handled with EF, and beware streaming results vs eager loading, although sometimes you do need to stream.

    However you statement regarding EF being for people who “don’t have the spare time to learn sql server” and being for rapid development I find to be a bit of an unnecessary attack on people who use EF professionally for it’s other benefits, and the irony is you post is full of factual inaccuracies, you haven’t taken the time to learn .NET, the BCL or EF. Do we need to be focusing on blame either?

    Please don’t take this as a troll, I love this blog, I just found some of this article to be alienating to developers.

    Reply
    • Thanks for the comments, Stuart. I wrote this post intending to help others (whether DBA or EF developer) diagnose and understand database connections in EF. It wasn’t a slam on EF developers for being somehow beneath SQL Server developers. We both know coding and shipping functional apps is hard work. What you may not know is that I was a web and desktop app developer for several years before committing full-time to SQL Server work, so I *do* know .net. And I probably would have embraced EF if it existed in 2003. (By the time EF 1.0 was released, I had already moved out of the app dev space.)

      EF being different enough from my old .net experience, I had to consult documentation to make sure I understood what EF was doing with its connection. I think you and I can agree that MSDN entires are trustworthy enough to recite as factual; this wasn’t a random blog post or outdated textbook I quoted. I’ve updated the post to reflect that Microsoft has been made aware of the errors in the page I reference. You say this post is “full of factual inaccuracies”, but I’m unable to spot any others. Can you point those out?

      Contrary to what it may seem, I’m not blaming any person for anything. Rather, I’m going through a troubleshooting exercise in which I had to identify the source of a query long in duration on a server that was reported having performance problems. Troubleshooting is supposed to identify a root cause, is it not? I stand by what I said: this incident was due to an EF problem, not a SQL Server problem. That much is repeatable, observable, and quantifiable. I went on to explain what I discovered in this process: EF developers have to be careful with connections because it can handle connections silently, and differently across versions. A simple, [hopefully] helpful heads-up to those working with EF.

      I stand by this too: EF is a great tool for app developers who don’t have either spare time or the desire to become proficient in a second, fundamentally different language like T-SQL (much less the SQL Server database engine as a whole). I don’t see why that opinion would alienate anyone.

      Reply
  • “Entity Framework is great for developers who don’t have the spare time or motivation to learn SQL Server querying, but that convenience comes with costs.”

    Do people really see EF like this? I never thought of it this way. If anything you need to know more SQL when using EF just because it may generate something you wouldn’t otherwise use. I always saw EF as a convenient way to materialize my queries into objects. I have worked on multiple projects without ORM and EF not only saves time from building a DataAccess layer that hydrates entities but also unifies the way it is done in multiple projects. This is quite valuable in my opinion but the first thing I do when I setup EF in a project is add a line in the data context to dump each and every query to the Output Window of VS when running in debug.

    Reply
    • Stilgar – that’s certainly the perspective I see from developers – they like EF because it generates code fast, including database code, that would ordinarily be painful to generate by hand.

      I don’t see that as a bad thing, either. I think it’s awesome – I’m all about shipping features and getting paid. If you take forever to learn every technology that you need as a developer, you’ll never get an app shipped. (Spoken as a former developer.)

      Reply
      • Painful? Really, SQL has been around for over 40 years, what’s so painful to learn basics and use common sense and good techniques. It’s about solving problems effectively not shipping out crappy software in a hurry. Take a few minutes, learn the basics invest in good techniques if you want robust work. I’ve seen developers lock production databases because they didn’t understand basic database and SQL techniques. EF is another unproductive Microsoft tool, it gets you there fast at dev-time but not fast enough during run-time. Solutions are not about developers, they’re for users. It’s like you want to take a trip to the Bahamas, but decide to go to New Jersey cause it’s faster. Mr. Former developer, you should know better.

        Reply
    • Stilgar,

      That’s a fantastic tip. I wish more devs were proactive like yourself!

      Reply
    • Jonathan Allen
      July 24, 2015 11:50 pm

      I definitely see it that way. By default, EF builds the kind of one-to-one mapping between tables and objects that I discourage.

      Consider a query that returns parent records and children records. There are lots of ways to do this, but EF defaults the way that results in the most network traffic. Every parent row is duplicated for each child row. And if their are grandchildren records, it will duplicate the parent row that many times instead.

      And it isn’t just network traffic either. We also have to pay the cost for serializing that data on one side and derserializing it on the other.

      Reply
  • Thanks for this post Doug. Especially the part about EF6.

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.