How to Select Specific Columns in an Entity Framework Query

SQL Server
42 Comments

One of the most frequent complaints that I hear when presenting to DBAs about Entity Framework is that it’s “slow” and that “developers should be endlessly tortured for using it”. Ok, the second part I just made up but the sentiment exists. DBAs just don’t like developers using Entity Framework and with good reason. Entity Framework can make SQL Server work awfully hard if the developer isn’t careful. No, it’s not April Fool’s Day, we’re really going to go over some Entity Framework code. But I promise you it won’t hurt…much.

One of the biggest problems that I’ve seen developers make is retrieving too many columns in a call to the database. I know what you’re thinking, “Why in the world would they retrieve more columns than they need?” Well, because it’s easy.

Let’s try and get all of the rows in a table using Entity Framework.

The context object allows interaction with the database. Stuff like getting data from the database, saving data to the database, and putting data into objects. This line Is where the magic happens:

This one little line tells Entity Framework to go to the Posts table in the StackOverflow database, get ALL of the rows, and put them into C# objects. No SQL statement. No loading of data into business objects. Just one line and we have data from the database in the programmatic objects that we need them in. Super easy.

Of course, returning all of the rows from a table isn’t what your developers are probably doing but let’s see what kind of SQL Entity Framework generates from that one statement.

In case you were wondering, yes, this is every column from the Posts table. So in one simple, statement we generated a query that moves a ton of data that you probably don’t need. And let’s not talk about the additional CPU, I/O and the full scan of the clustered index that probably just happened.

Let’s take a look at a more real world example.

This one’s a bit more tricky but let’s walk through it. We’re getting data from the Posts table where the Tags column equals “<sql-server>” and selecting every column from the Posts table. We can tell because there are no specified properties in the Select. Even though this statement looks more complex it’s only three lines and looks somewhat like a SQL statement. But it’s really a LINQ (Language Integrated Query) statement, specifically a LINQ to Entities statement. This LINQ statement will be translated into this SQL statement:

See what I mean? The real question is “Do we need all of those columns?” Sometimes, the answer is “Yes” and that’s fine. But what if it’s “No”? How can we specify columns in our query? One easy way is to specify an anonymous type. Don’t be confused by the $2 word wizardry. Just think of an anonymous types as a way to put data into an object without defining an object. We can do that simply by using the “new” operator and selecting the properties from the object that we need. In this case, we only want to retrieve the Id and Title columns.

And the SQL generated:

There. That looks better. But what if the developer needed a strongly typed object returned in the query? We can do this seamlessly by defining the class for the object and calling it in the SELECT. These types of objects are commonly referred to as Data Transfer Objects or DTOs.

In case you were wondering, using a DTO will not change the SQL that’s generated.

That’s it. Now you can dig into some code and help tune those pesky Entity Framework queries.

Previous Post
Asynchronous Database Mirroring vs. Asynchronous Availability Groups
Next Post
Partitioned Views: A How-To Guide

42 Comments. Leave new

  • Nice article, thanks! I’ll be sharing it with a few of our development teams.

    Reply
  • Great post sir! But of course you have only just scratched the surface haven’t you? 😉
    When we start using EF to pull “child collections” that’s where things get interesting. Most of the time EF is pretty smart and will use a join to pull the 10,000+ related comment records for your Posts. Then again sometimes it quietly decides to perform 10,000 individual queries to get that same data. As a developer/DBA for my company, I have a love/hate relationship with ORMs.

    Reply
    • Jonathan Shields
      September 21, 2016 2:28 pm

      Yeah. Even with “Lazy loading” in EF we have a situation where all the child data is being returned from a queried table. Love to know how to conquer that one. I would like to use a stored proc instead but “3rd party says no”….

      Reply
    • Using the include syntax should force a join (in the case of a 1:1) or some flavor of a union (in the case of many on either side). Example:

      var results = context.Parents
      .Where(…) // yadda yadda yadda
      .Include(x => x.Child) // or x.Children, whatever the case may be.
      .Select(…); // yadda yadda yadda

      Reply
      • Slight correction: In hindsight, I’m not actually sure what specific conditions cause EF to start going into union territory, so YMMV. Always make sure you profile this stuff 🙂

        Reply
    • It’s just a fraction of a the surface. I think a big part of the problem is that developers (myself included) don’t really know how the ORM of choice really works. In your example, do the developers need the additional 10,000+ comment records? Do they know how or why the additional 10,000+ records were pulled? EF and other ORMs are full of settings to tweak things like query generation. We need to be better devs and understand how the tools we’re using work.

      Thanks for the comment!

      Reply
    • BBD (Big Bad Developer)
      April 12, 2017 11:57 am

      The hate usually comes from a lack of knowledge about your ORM and bad assumptions on how it works. Take the time to gain a deeper understanding of the ORM’s inner workings and you’ll feel much less hate and a bit more love. There’s very little not to love remaining in the big name ORMs like Entity Framework. EF has been around for many years now and if you’re still having problems with it, it’s likely your own fault, not the ORM’s.

      Reply
      • When you write an application, do you get to provide a huge manual and require users to read it… and blame them for not knowing the inner workings of your application well enough when they make mistakes that you basically set them up to make?

        You spends a decade learning to write great SQL… then EF comes along and now you have to learn how to make EF write good SQL. Working with EF is like working with a mediocre employee where you have to detail everything out to the nth degree if you want it done right.

        No… Let’s not victim blame here, lol.

        Reply
  • The following line will allow you to see all of the sql generated by Entity Framework (v6)
    context.Database.Log = s => System.Diagnostics.Debug.WriteLine(s);

    Reply
  • Nice article. I might have found a minor discrepancy.
    You said:
    “” and selecting every column from both the Posts and PostTags tables.”

    But the generated SQL is only pulling from the ‘Posts’ table:
    “….[Extent1].[TagsVarchar] AS [TagsVarchar]
    FROM [dbo].[Posts] AS [Extent1]
    WHERE N” = [Extent1].[Tags]”

    Could you clarify?

    Reply
  • Nice, I have shared this with my development team using Entity Framework. We are seeing just what you are talking about.

    Reply
  • But, if you are going to create DTO objects, then just use dapper with stored procedures and call it a day.

    Reply
    • There are lots of great tools out there, Dapper is definitely one of them. The big boy on the bock is EF. Not because it’s better but because it’s supported by MS. DTOs are pretty standard practice no matter which tech you choose.

      Reply
  • Sinister Penguin
    September 22, 2016 2:17 am

    Nice – worth the price of entry just for the explanation of anonymous types.

    Reply
  • I know it, I have to fix many codes here in company.
    select *.* from every EF object even for a combobox (instead of Id,Name) :/ crying

    Reply
  • Your post does nothing to talk me down from my ORM aversion. Would raw SQL fare better if it was easier to use? With QueryFirst, all your SQL is in .sql files, validated as you type, test run against the DB every time you save your file. Then all the ADO stuff, and the POCO, is generated for you. Running your query takes 1 line of code, and returns a list of POCOs. And if you change your DB schema, broken queries and invalid data accesses show up immediately as compile errors. It feels so obvious to use, I’m pretty much astonished to be the first to stumble on this, in 2016.

    Reply
    • Hey, I’m not here to talk you down. If you and your team want to go full SQL then go for it. The problem is that EF is being used and some devs don’t understand how it works. I’m just trying to give a bit of understanding how you could fix a common problem.

      You mention an interesting project, QueryFirst. Sounds promising. But I have a question, how did you stumble upon QueryFirst when, according to GitHub, you wrote it?

      Reply
      • Sorry, I’m not trying to hide anything. I meant the first to stumble on the approach. I did kinda profit from your post to talk about something completely different, but I thought that was the custom whenever the subject was Entity Framework 🙂

        Reply
  • I’m having some difficulty implementing this solution. In my MVC page when I try to use the .Select syntax the page builds, but I get an iEnumberable error when I attempt to execute the page. Am I missing something?

    Reply
    • Erik Darling
      March 21, 2017 1:24 pm

      Hi there! That’s not much to go on, and blog comments are a terrible place to troubleshoot code issues like that. You should post it to StackExchange.com or another Q&A site better suited to code issues.

      Thanks!

      Reply
  • Apologies, I didn’t mean to be in appropriate. I was hoping that I was missing something due to my unfamiliarity with the Entity Framework (perhaps a shorthand omission that a novice would not have known). I’ll be sure to resubmit this to an appropriate forum with sample code. Thanks.

    Reply
  • Selecting an anonymous object still puts additional un-needed columns with ef core.

    Reply
  • Mars Mayflower
    August 11, 2017 8:16 am

    This is just what I needed. Thanks for sharing!!

    Reply
  • This generate a sub select https://bugs.mysql.com/bug.php?id=75272

    Reply
  • What about saving back? Does the anonymous lose context? Load a few columns, make changes, save them back?

    Reply
    • Daniel Auger
      March 22, 2018 8:40 am

      Jay – Yes, the anonymous objects are “disconnected” from the change tracker and the underlying entity. Therefore this is not a good pattern for something like a desktop app. However, it’s fine for websites because reads and writes are separate requests and have to start with a fresh context on every operation. You’d have the same issue with a web app even if you returned the full entity on a get.

      Reply
  • Divyesh Vaghela
    June 9, 2018 7:26 am

    Thank you, this post cleared my confusion.

    Reply
  • What happens when you need to INSERT few columns than are list listed in the entity?

    Let’s say my entity holds 26 columns, but my particular insert statement only requires 5 columns of data to be inserted.

    How can I limit the number of columns sent through Entity Framework?

    Reply
  • Dylan Nicholson
    February 27, 2019 11:30 pm

    var posts = context.Posts; – sorry but this line doesn’t pull ANY records at all. Until you attempt to enumerate entities in that collection nothing is sent to the database. But it’s certainly true that if you write context.Posts.First().ID – where you only need the ID, it pulls back every column of the first row. Whereas context.Posts.Select(p -> p.ID).First() will pull back only the ID of the first row returned.

    Reply
  • Perfect!! Nice elegant. Thoroughly explained. This is exactly what I was looking for! Thank you Sir!

    Reply
  • Do you have suggestions for accomplishing this if I’m using EF and a Repository model?

    Reply
  • Thanks Brent! Just what I was looking for!

    Reply
  • Hi,
    I’m both a SQL Server DBA and a (begrudging) EF developer. So the DBA me is often very annoyed with the EF dev me, when it comes to the ridiculously inefficient queries that EF generates.

    Ideally I’d write my own custom SQL and cut out EF altogether, but EF has its advantages on the interface dev side, so this tip is very welcome, thanks Brent!

    My one note is that if you use .Select(…) to project EF data-bound entities to the cut-down object as in your example, you can’t do this for ‘Navigation properties’ that you might want to .GroupBy(…) later, as they will lose their original EF entity identity. So in these cases I’m still having to select the entire EF-bound class including all the unwanted columns…

    Reply
  • What if I need to select ALMOST ALL of the numerous columns, except a couple SPECIFIC ones?

    Reply
    • Then you list out the ones you need.

      I know it’s hard, making a list of columns. I understand that it seems like an insurmountable barrier, and that your body feels so weak sitting at the keyboard. It’s tough, your job – most mortals would collapse into a quivering blob of flesh were they faced with such a heroic challenge.

      But dig deep.

      I believe in you. I think you’ve got what it takes to list out the columns you need. I think that when you’re done with that task, poems will be written about you that make Homer’s Odyssey look like a limerick. You, Alexander, will have your image plastered on workplace walls everywhere as an icon of what it looks like to do the impossible.

      Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.