Common Entity Framework Problems: N + 1

I wanna dance with common problems

One of the most common issues that I’ve seen with Entity Framework isn’t technically an Entity Framework problem at all. The N + 1 problem is an anti-pattern that is a problem with ORMs in general, which most often occurs with lazy loading. There was a lot going on in that paragraph, so let’s break it down.

The N + 1 problem occurs when an application gets data from the database, and then loops through the result of that data. That means we call to the database again and again and again. In total, the application will call the database once for every row returned by the first query (N) plus the original query ( + 1).

All of those calls could have been accomplished in one simple call. Let’s look at an example in code:

Here’s the SQL generated from this code:

In this example, we’re getting data from the Posts table, and the PostTags table where the Tag equals “sqlbulkcopy”. The problem starts to occur in this line:

Do you see it?

The problem is that in our original query we’re not getting data from the LinkedPosts entity, just data from Posts and PostTags. Entity Framework knows that it doesn’t have the data for the LinkPosts entity, so it very kindly gets the data from the database for each row in the query results.

Whoops!

Obviously, making multiple calls to the database instead of one call for the same data is slower. This is a perfect example of RBAR (row by agonizing row) processing.

This is the SQL generated from our code:

This query is sent to SQL Server 449 times, and the only thing that’s changing it the EntityKeyValue value.

Ugh.

How can we fix it?

There is one fast way. It’s not optimal, but it will be better! Use an Include (also called eager loading) in the LINQ statement. Using an Include will add ALL of the data from the LinkedPosts entity, but it’s a simple fix without much retesting. Who likes testing code? No one. That’s why companies pay through the nose for software that does it automatically.

Now when the LinkedPosts entity is called, the Posts entity will have all of the data for the LinkedPosts entity. It will not make any additional calls to the database. That’s a good thing, right? Databases are cranky. That’s why DBAs are cranky.

Here’s the SQL that’s generated:

See what I mean by it not being optimal? We could rewrite the LINQ statement to have it generate a more optimal query, but that’s not the point of this post. If the performance of the query isn’t satisfactory, you can go down the rewriting the LINQ statement route.

How can we find N + 1 issues?

Not to toot the company horn (but I’m totally going to), one of my favorite ways to find N + 1 problems from the database is by using sp_BlitzCache. After running sp_BlitzCache @SortOrder=’executions’ I get this:

n-plus-1-find-the-issue

 

Look at those tasty executions!

Captain, I think we found the problem. Now, it doesn’t tell me what line of code is causing the issue, but it does give the SQL statement. I’m sure if you work with the devs, you can figure out where the problem is and fix it. Having the problem statement makes searching the code base a little easier, and there’s a good chance someone will recognize where it comes from.

Previous Post
[Video] Office Hours 2018/7/25 (With Transcriptions)
Next Post
New Classes: Dashboard in a Day, Database DevOps, tSQLt, SQL Server Internals, and Avoiding NOLOCK

19 Comments. Leave new

  • Eric St-Georges
    July 30, 2018 8:43 am

    EF lazy loading, while convenient, is often the bringer of pain…

    Last time I worked with EF, I made the controversial (internally anyway) decision to ditch all navigation properties. It made the devs complain, but at least we avoided the N+1 issue for the most part as people were forced to write their entire queries.

    Reply
    • Alex Friedman
      July 31, 2018 3:28 am

      Very interesting, could you explain in more detail for the non-EF-savvy?

      Reply
      • Eric St-Georges
        December 11, 2018 9:07 am

        I just realized I never replied to this. So the “normal” way of working with EF is that you create object models representing each of your tables, and this model has “navigation properties”, basically properties mapping to other objects. For example, a Parent entity could have a property of Children that once accessed, it will fetch all the children for the current parent. The N+1 issue comes when you loop through a list of parents, fetching children for each of them. So 1 query to fetch the parents + N queries to fetch the children of N parents. By removing those navigation properties, I was attempting to force developers to explicitly join Parent and Children at the query level to avoid those issues. It worked well enough for that, but it confused a lot of people familiar with ORMs as to why the navigation properties were omitted.

        Reply
  • Shawn Clabough
    July 30, 2018 11:31 am

    I’m sure you know this already, but for those who read this article and don’t, here’s how I would further optimize the EF statement.

    var posts = context.Posts.AsNoTracking()
    .Where(t => t.PostTags.Any(pt => pt.Tag == “sqlbulkcopy”))
    .Include(p => p.LinkedPosts)
    .Select(p => new {Id = p.Id, Tags = p.TagsVarChar, LinkedPosts = p.LinkedPosts.Select(lp => new {Id = lp.Id, CreationDate = lp.CreationDate}).ToList()}).ToList();

    .AsNoTracking tell EF not to track the results. If this is a read-only process, this can improve EF performance. Do not use if it is not a read-only process.

    .Select into a anonymous object causes EF to only retrieve those columns instead of all columns. This can also be done with DTOs instead of anonymous objects.

    Reply
    • Shawn Clabough
      July 30, 2018 11:33 am

      Also, I would highly recommend LinqPad because it will show you the SQL statement that your linq code produces and you can also see if you have the N+1 issue if you execute your entire code block.

      Reply
    • Eric St-Georges
      July 30, 2018 2:11 pm

      FYI, because you’re using a projection (That Select method), the entity won’t be cached since you’re only pulling part of it, making the AsNoTracking statement moot. It doesn’t hurt, but it doesn’t help either.

      Reply
    • The Include can also be removed. It is not used when using projections.

      Reply
      • Shawn Clabough
        July 31, 2018 1:06 pm

        You’re both correct. I just have a habit of putting those in to be explicit. I’m curious to test performance of having or not having .AsNoTracking() is exactly the same, or by not having it, it still does some work before deciding not to track the projected entities.

        Reply
  • Rémi BOURGAREL
    July 31, 2018 2:28 am

    You can fix it also by using the advanage of unit of work pattern implemented by most ORM : get the post ids from the first result set (posts.Select(p =>p.Id).Distinct()) and then query all the linkedpost matching these ids (db.PostLinks.Where(l => postIds.Contains(l.Id)). Like this You’ll have 2 much cleaner SQL queries. And you can also filter the linked post if you want, something that you cannot do with Include

    Reply
    • Joseph Daigle
      August 1, 2018 9:13 pm

      Be extremely careful with that second query. EF (in EF6 anyway; I haven’t tested in EF Core) produces a SQL query that will make any DBA cry. Specifically, it creates a query with an non-parameterized IN clause containing each literal value from the in-memory collection.

      You’re probably better off with some sort of JOIN.

      Reply
  • Alex Friedman
    July 31, 2018 3:28 am

    This is super useful, thanks!

    Reply
  • Frank Lakmann
    August 1, 2018 4:20 am

    Hi, concerning the “it doesn’t tell me what line of code is causing the issue, but it does give the SQL statement”:
    There is a tool that does that: http://www.hibernatingrhinos.com/products/efprof
    I am not linked to that company, using their equivalent tool “nhibernate profiler” and like it a lot. It saved us many hours.

    Reply
  • I do begin wonder if EF/Hibernate/etc. are more trouble than they are worth.
    Their main selling point is that they allow people to “not worry” about the database, and “avoid boilerplate code”, because the framework will handle it all for them.
    But for data-centric applications they *should* be worrying about the database and how to interact with it.

    Reply
    • EF/nHibernate are NOT worth the trouble, IMNSHO.

      Guruism: Anything that allows developers to slap code together more quickly is inversely proportional to the performance, scalability and concurrency you will get from that code.

      If you just slap stuff together I have seen applications so bad you can’t buy big enough hardware to service SQL Server and if you use your gray matter and write “better” (still often FAR from optimal) database applications with an ORM then you are not getting the biggest benefit from it – which is fast code slappage.

      As a 20+ year SQL Server relational engine only consultant I can tell you that when I go to a new client and I find they write applications in an ORM I go KAAAACCCHHHHIIIIIIINNNNNNNGGGGG!!! I have a client that paid seven figures to a company to write two new applications for them in EF. I spent two weeks fixing stuff as fast as I could just to get bare minimum functionality out of it (and their clients were still NOT happy). They eventually wound up scrapping both applications and redoing from scratch.

      Reply
      • Kevin – I disagree, but probably the best example I can site is StackOverflow.com. It was initially built with LINQ to SQL. They seem to have done okay. 😉

        Sometimes you need to ship features fast to bring in revenue. Then, over time, you can decide which features/queries users use the most often, and spend money/time paying down technical debt on those – rather than trying to ship perfect code for every feature.

        Reply
      • John Obvious`
        August 14, 2018 8:46 am

        It is clear that too many businessmen don’t understand basic economics when hiring you. Your database expertise doesn’t translate into software development knowledge and economics. Most company spend FAR more on developers then the servers. Not that performance shouldn’t be a consideration. But the developer cost you want far exceeds the savings.

        You could further optimize by getting rid of all the business reasons to have the data and then your server can run cheap hardware and sit idle.

        Reply
  • This is a good reason why I don’t use Entity Framework…Because you don’t have control for optimization. Forget EF, write a good SQL query, retrieve it into a DataTable with ADO.NET, and process the results. No extra rows, no extra fluff, and you control exactly what is requested from the database.

    Reply
  • Paulo Morgado
    August 22, 2018 5:56 pm

    Kudos tor the article.

    I always say to my DBA friends that ORMs are not the problem. People are the problem. DBAs and developers.

    You need to know what you’re doing and the misuse of ORMs is done often by developers who most likeley would do even worst without it.

    And DBAs that just dismiss ORMs as bad are as bad as those developers. DBAs shouldn’t care how the queries are built. They should care about what queries are being issued at their databases.

    If a developer can’t make a decent query, that’s her/his problem. Learn how to or chose another tool.

    And a DBA that can demonstrate that the problem is not the tool but the one using it (like you did) is way better than any DBA that just talks about what she/he doesn’t know about.

    But I’m writting to let you know there’s a typo on the TSQL code on the blog post:

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.

Menu
{"cart_token":"","hash":"","cart_data":""}