How to Select Specific Columns in an Entity Framework Query

Last Updated February 10, 2018

One of the most frequent complaints that I hear when presenting to DBAs about Entity Framework is that it’s “slow” and that “developers should be endlessly tortured for using it”. Ok, the second part I just made up but the sentiment exists. DBAs just don’t like developers using Entity Framework and with good reason. Entity Framework can make SQL Server work awfully hard if the developer isn’t careful. No, it’s not April Fool’s Day, we’re really going to go over some Entity Framework code. But I promise you it won’t hurt…much.

One of the biggest mistakes that I’ve seen developers make is retrieving too many columns in a call to the database. I know what you’re thinking, “Why in the world would they retrieve more columns than they need?” Well, because it’s easy.

Let’s try and get all of the rows in a table using Entity Framework.

using (var context = new StackOverflowContext())
{
    var posts = context.Posts;

    // Do something with the data returned;
}

The context object allows interaction with the database: getting data from the database, saving data to the database, and putting data into objects. This line is where the magic happens:

var posts = context.Posts;

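This one little line tells Entity Framework that we want ALL of the rows from the Posts table in the StackOverflow database, put into C# objects. Strictly speaking, nothing is sent to SQL Server until the code enumerates posts (with a foreach or a call to ToList(), for example); that’s when the query actually runs. No SQL statement. No loading of data into business objects. Just one line and we have data from the database in the programmatic objects that we need it in. Super easy.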
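
To see when a LINQ query actually runs, here’s a minimal sketch of deferred execution. It doesn’t touch a database or Entity Framework at all: Posts() is a hypothetical in-memory stand-in for context.Posts, and ReadCount is an illustrative counter I made up for the demo, not an EF feature.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Post
{
    public int Id { get; set; }
    public string Title { get; set; }
    public string Tags { get; set; }
}

public static class DeferredExecutionDemo
{
    // Illustrative counter: how many times the "table" has actually been read.
    public static int ReadCount = 0;

    // Hypothetical in-memory stand-in for context.Posts.
    // As an iterator, its body does not run until something enumerates it.
    public static IEnumerable<Post> Posts()
    {
        ReadCount++;
        yield return new Post { Id = 1, Title = "Index tuning", Tags = "<sql-server>" };
        yield return new Post { Id = 2, Title = "LINQ basics", Tags = "<c#>" };
    }

    public static void Main()
    {
        // Defining the query reads nothing.
        var query = Posts().Where(p => p.Tags == "<sql-server>");
        Console.WriteLine($"Reads after defining the query: {ReadCount}");

        // Enumerating it (here with ToList) is what triggers the read.
        var results = query.ToList();
        Console.WriteLine($"Reads after ToList(): {ReadCount}, rows: {results.Count}");
    }
}
```

The same idea applies to an Entity Framework query: building it is cheap, and the database work happens at the point where the results are enumerated.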

Of course, returning all of the rows from a table isn’t what your developers are probably doing but let’s see what kind of SQL Entity Framework generates from that one statement.

SELECT
[Extent1].[Id] AS [Id],
[Extent1].[AcceptedAnswerId] AS [AcceptedAnswerId],
[Extent1].[AnswerCount] AS [AnswerCount],
[Extent1].[Body] AS [Body],
[Extent1].[ClosedDate] AS [ClosedDate],
[Extent1].[CommentCount] AS [CommentCount],
[Extent1].[CommunityOwnedDate] AS [CommunityOwnedDate],
[Extent1].[CreationDate] AS [CreationDate],
[Extent1].[FavoriteCount] AS [FavoriteCount],
[Extent1].[LastActivityDate] AS [LastActivityDate],
[Extent1].[LastEditDate] AS [LastEditDate],
[Extent1].[LastEditorDisplayName] AS [LastEditorDisplayName],
[Extent1].[LastEditorUserId] AS [LastEditorUserId],
[Extent1].[OwnerUserId] AS [OwnerUserId],
[Extent1].[ParentId] AS [ParentId],
[Extent1].[PostTypeId] AS [PostTypeId],
[Extent1].[Score] AS [Score],
[Extent1].[Tags] AS [Tags],
[Extent1].[Title] AS [Title],
[Extent1].[ViewCount] AS [ViewCount],
[Extent1].[TagsVarchar] AS [TagsVarchar]
FROM [dbo].[Posts] AS [Extent1]

In case you were wondering, yes, this is every column from the Posts table. So in one simple statement, we generated a query that moves a ton of data that you probably don’t need. And let’s not talk about the additional CPU, I/O and the full scan of the clustered index that probably just happened.

Let’s take a look at a more real-world example.

using (var context = new StackOverflowContext())
{
    var posts = context.Posts
                       .Where(p => p.Tags == "<sql-server>")
                       .Select(p => p);

    // Do something;
}

This one’s a bit more tricky but let’s walk through it. We’re getting data from the Posts table where the Tags column equals “<sql-server>” and selecting every column from the Posts table. We can tell because there are no specified properties in the Select. Even though this statement looks more complex it’s only three lines and looks somewhat like a SQL statement. But it’s really a LINQ (Language Integrated Query) statement, specifically a LINQ to Entities statement. This LINQ statement will be translated into this SQL statement:

SELECT
[Extent1].[Id] AS [Id],
[Extent1].[AcceptedAnswerId] AS [AcceptedAnswerId],
[Extent1].[AnswerCount] AS [AnswerCount],
[Extent1].[Body] AS [Body],
[Extent1].[ClosedDate] AS [ClosedDate],
[Extent1].[CommentCount] AS [CommentCount],
[Extent1].[CommunityOwnedDate] AS [CommunityOwnedDate],
[Extent1].[CreationDate] AS [CreationDate],
[Extent1].[FavoriteCount] AS [FavoriteCount],
[Extent1].[LastActivityDate] AS [LastActivityDate],
[Extent1].[LastEditDate] AS [LastEditDate],
[Extent1].[LastEditorDisplayName] AS [LastEditorDisplayName],
[Extent1].[LastEditorUserId] AS [LastEditorUserId],
[Extent1].[OwnerUserId] AS [OwnerUserId],
[Extent1].[ParentId] AS [ParentId],
[Extent1].[PostTypeId] AS [PostTypeId],
[Extent1].[Score] AS [Score],
[Extent1].[Tags] AS [Tags],
[Extent1].[Title] AS [Title],
[Extent1].[ViewCount] AS [ViewCount],
[Extent1].[TagsVarchar] AS [TagsVarchar]
FROM [dbo].[Posts] AS [Extent1]
WHERE N'<sql-server>' = [Extent1].[Tags]

See what I mean? The real question is “Do we need all of those columns?” Sometimes, the answer is “Yes” and that’s fine. But what if it’s “No”? How can we specify columns in our query? One easy way is to specify an anonymous type. Don’t be confused by the $2 word wizardry. Just think of an anonymous type as a way to put data into an object without defining a class for it. We can do that simply by using the “new” operator and selecting the properties from the object that we need. In this case, we only want to retrieve the Id and Title columns.

using (var context = new StackOverflowContext())
{
    var posts = context.Posts
                       .Where(p => p.Tags == "<sql-server>")
                       .Select(p => new {p.Id, p.Title});

    // Do something;
}

And the SQL generated:

SELECT
[Extent1].[Id] AS [Id],
[Extent1].[Title] AS [Title]
FROM [dbo].[Posts] AS [Extent1]
WHERE N'<sql-server>' = [Extent1].[Tags]
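
If you’re wondering what you can do with the anonymous objects once you have them, here’s a small sketch. It runs against a hypothetical in-memory array turned into an IQueryable rather than a real context, and the TitlesFor method name is mine, so treat it as an illustration of the shape of the result and not of the SQL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Post
{
    public int Id { get; set; }
    public string Title { get; set; }
    public string Body { get; set; }
    public string Tags { get; set; }
}

public static class AnonymousTypeDemo
{
    // Hypothetical helper: projects to an anonymous type, then formats each row.
    public static List<string> TitlesFor(IQueryable<Post> posts)
    {
        var results = posts
            .Where(p => p.Tags == "<sql-server>")
            .Select(p => new { p.Id, p.Title })
            .ToList();

        var lines = new List<string>();
        foreach (var post in results)
        {
            // post.Body would not compile here: the anonymous type has only Id and Title.
            lines.Add($"{post.Id}: {post.Title}");
        }
        return lines;
    }

    public static void Main()
    {
        // In-memory stand-in for context.Posts (an assumption for this demo).
        var posts = new[]
        {
            new Post { Id = 1, Title = "Index tuning", Body = "...", Tags = "<sql-server>" },
            new Post { Id = 2, Title = "LINQ basics", Body = "...", Tags = "<c#>" }
        }.AsQueryable();

        foreach (var line in TitlesFor(posts))
        {
            Console.WriteLine(line);
        }
    }
}
```

The property names on the anonymous type come from the properties you select, so the compiler still checks every use of post.Id and post.Title.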

There. That looks better. But what if the developer needed a strongly typed object returned in the query? We can do this seamlessly by defining a class for the object and creating an instance of it in the Select. These types of objects are commonly referred to as Data Transfer Objects or DTOs.

public class PostDto
{
    public int PostId { get; set; }
    public string PostTitle { get; set; }
}

static void BlogQueryDto()
{
    using (var context = new StackOverflowContext())
    {
        var posts = context.Posts
                           .Where(p => p.Tags == "<sql-server>")
                           .Select(p => new PostDto() 
                           { 
                               PostId = p.Id, 
                               PostTitle = p.Title 
                           });

        // Do Something
    }
}

In case you were wondering, using a DTO will not change the SQL that’s generated.
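
One practical reason to reach for a DTO over an anonymous type is that a method can name it in its return type. Here’s a hedged sketch of that: PostQueries and GetTitlesByTag are names I made up, and the demo feeds the method an in-memory IQueryable instead of a real context. With Entity Framework you would pass context.Posts instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Post
{
    public int Id { get; set; }
    public string Title { get; set; }
    public string Tags { get; set; }
}

public class PostDto
{
    public int PostId { get; set; }
    public string PostTitle { get; set; }
}

public static class PostQueries
{
    // Hypothetical helper: accepts any IQueryable<Post> and returns only the two
    // fields the caller needs, as a strongly typed list.
    public static List<PostDto> GetTitlesByTag(IQueryable<Post> posts, string tag)
    {
        return posts
            .Where(p => p.Tags == tag)
            .Select(p => new PostDto { PostId = p.Id, PostTitle = p.Title })
            .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        // In-memory stand-in for context.Posts (an assumption for this demo).
        var posts = new[]
        {
            new Post { Id = 1, Title = "Index tuning", Tags = "<sql-server>" },
            new Post { Id = 2, Title = "LINQ basics", Tags = "<c#>" }
        }.AsQueryable();

        foreach (var dto in PostQueries.GetTitlesByTag(posts, "<sql-server>"))
        {
            Console.WriteLine($"{dto.PostId}: {dto.PostTitle}");
        }
    }
}
```

Because the method takes an IQueryable and projects before calling ToList, the projection stays part of the query itself, which is what lets a provider like Entity Framework limit the columns it asks for.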

That’s it. Now you can dig into some code and help tune those pesky Entity Framework queries.

42 Comments

SteveA
September 21, 2016 11:20 am

Nice article, thanks! I’ll be sharing it with a few of our development teams.

- ammad
  March 8, 2018 5:41 am
  
  why few ?
  
Bob McLaren
September 21, 2016 11:28 am

Great post sir! But of course you have only just scratched the surface haven’t you? 😉
When we start using EF to pull “child collections” that’s where things get interesting. Most of the time EF is pretty smart and will use a join to pull the 10,000+ related comment records for your Posts. Then again sometimes it quietly decides to perform 10,000 individual queries to get that same data. As a developer/DBA for my company, I have a love/hate relationship with ORMs.

- Jonathan Shields
  September 21, 2016 2:28 pm
  
  Yeah. Even with “Lazy loading” in EF we have a situation where all the child data is being returned from a queried table. Love to know how to conquer that one. I would like to use a stored proc instead but “3rd party says no”….
  
- Daniel Auger
  September 22, 2016 11:37 am
  
  Using the include syntax should force a join (in the case of a 1:1) or some flavor of a union (in the case of many on either side). Example:
  
  var results = context.Parents
  .Where(…) // yadda yadda yadda
  .Include(x => x.Child) // or x.Children, whatever the case may be.
  .Select(…); // yadda yadda yadda
  
  - Daniel Auger
    September 22, 2016 11:59 am
    
    Slight correction: In hindsight, I’m not actually sure what specific conditions cause EF to start going into union territory, so YMMV. Always make sure you profile this stuff 🙂
    
- Richie Rump
  September 23, 2016 9:14 am
  
  It’s just a fraction of the surface. I think a big part of the problem is that developers (myself included) don’t really know how the ORM of choice really works. In your example, do the developers need the additional 10,000+ comment records? Do they know how or why the additional 10,000+ records were pulled? EF and other ORMs are full of settings to tweak things like query generation. We need to be better devs and understand how the tools we’re using work.
  
  Thanks for the comment!
  
- BBD (Big Bad Developer)
  April 12, 2017 11:57 am
  
  The hate usually comes from a lack of knowledge about your ORM and bad assumptions on how it works. Take the time to gain a deeper understanding of the ORM’s inner workings and you’ll feel much less hate and a bit more love. There’s very little not to love remaining in the big name ORMs like Entity Framework. EF has been around for many years now and if you’re still having problems with it, it’s likely your own fault, not the ORM’s.
  
  - Brandon
    December 23, 2021 6:19 pm
    
    When you write an application, do you get to provide a huge manual and require users to read it… and blame them for not knowing the inner workings of your application well enough when they make mistakes that you basically set them up to make?
    
    You spend a decade learning to write great SQL… then EF comes along and now you have to learn how to make EF write good SQL. Working with EF is like working with a mediocre employee where you have to detail everything out to the nth degree if you want it done right.
    
    No… Let’s not victim blame here, lol.
    
Mark Pearson
September 21, 2016 11:49 am

The following line will allow you to see all of the sql generated by Entity Framework (v6)
context.Database.Log = s => System.Diagnostics.Debug.WriteLine(s);

- Richie Rump
  September 23, 2016 9:15 am
  
  I did not know that. Thanks!
  
Beau D'Amore
September 21, 2016 12:02 pm

Nice article. I might have found a minor discrepancy.
You said:
“<sql-server>” and selecting every column from both the Posts and PostTags tables.”

But the generated SQL is only pulling from the ‘Posts’ table:
“….[Extent1].[TagsVarchar] AS [TagsVarchar]
FROM [dbo].[Posts] AS [Extent1]
WHERE N'<sql-server>' = [Extent1].[Tags]”

Could you clarify?

- Richie Rump
  September 23, 2016 9:28 am
  
  Nice catch! I’ve changed it to ” selecting every column from the Posts table.”
  
  Thanks for the assist!
  
Tom Norman
September 21, 2016 12:48 pm

Nice, I have shared this with my development team using Entity Framework. We are seeing just what you are talking about.

E
September 21, 2016 5:31 pm

But, if you are going to create DTO objects, then just use dapper with stored procedures and call it a day.

- Richie Rump
  September 23, 2016 9:18 am
  
  There are lots of great tools out there, Dapper is definitely one of them. The big boy on the block is EF. Not because it’s better but because it’s supported by MS. DTOs are pretty standard practice no matter which tech you choose.
  
Sinister Penguin
September 22, 2016 2:17 am

Nice – worth the price of entry just for the explanation of anonymous types.

Istvan
September 22, 2016 3:43 am

I know it, I have to fix many codes here in company.
select *.* from every EF object even for a combobox (instead of Id,Name) :/ crying

Simon Boddy
September 22, 2016 10:37 am

Your post does nothing to talk me down from my ORM aversion. Would raw SQL fare better if it was easier to use? With QueryFirst, all your SQL is in .sql files, validated as you type, test run against the DB every time you save your file. Then all the ADO stuff, and the POCO, is generated for you. Running your query takes 1 line of code, and returns a list of POCOs. And if you change your DB schema, broken queries and invalid data accesses show up immediately as compile errors. It feels so obvious to use, I’m pretty much astonished to be the first to stumble on this, in 2016.

- Richie Rump
  September 23, 2016 10:46 am
  
  Hey, I’m not here to talk you down. If you and your team want to go full SQL then go for it. The problem is that EF is being used and some devs don’t understand how it works. I’m just trying to give a bit of understanding how you could fix a common problem.
  
  You mention an interesting project, QueryFirst. Sounds promising. But I have a question, how did you stumble upon QueryFirst when, according to GitHub, you wrote it?
  
  - Simon Boddy
    September 23, 2016 2:30 pm
    
    Sorry, I’m not trying to hide anything. I meant the first to stumble on the approach. I did kinda profit from your post to talk about something completely different, but I thought that was the custom whenever the subject was Entity Framework 🙂
    
TCW
March 21, 2017 12:42 pm

I’m having some difficulty implementing this solution. In my MVC page when I try to use the .Select syntax the page builds, but I get an IEnumerable error when I attempt to execute the page. Am I missing something?

- Erik Darling
  March 21, 2017 1:24 pm
  
  Hi there! That’s not much to go on, and blog comments are a terrible place to troubleshoot code issues like that. You should post it to StackExchange.com or another Q&A site better suited to code issues.
  
  Thanks!
  
TCW
March 21, 2017 3:51 pm

Apologies, I didn’t mean to be inappropriate. I was hoping that I was missing something due to my unfamiliarity with the Entity Framework (perhaps a shorthand omission that a novice would not have known). I’ll be sure to resubmit this to an appropriate forum with sample code. Thanks.

- Erik Darling
  March 21, 2017 6:06 pm
  
  No trouble at all. Just want you to get better help than we can provide!
  
Tonto
July 21, 2017 4:32 am

Selecting an anonymous object still puts additional un-needed columns with ef core.

- Mars Mayflower
  August 11, 2017 8:17 am
  
  Yes but are those columns empty? If so, then at least it saves memory space?
  
Mars Mayflower
August 11, 2017 8:16 am

This is just what I needed. Thanks for sharing!!

Edson
February 14, 2018 8:30 am

This generates a sub select https://bugs.mysql.com/bug.php?id=75272

Jay
March 21, 2018 2:47 pm

What about saving back? Does the anonymous lose context? Load a few columns, make changes, save them back?

- Daniel Auger
  March 22, 2018 8:40 am
  
  Jay – Yes, the anonymous objects are “disconnected” from the change tracker and the underlying entity. Therefore this is not a good pattern for something like a desktop app. However, it’s fine for websites because reads and writes are separate requests and have to start with a fresh context on every operation. You’d have the same issue with a web app even if you returned the full entity on a get.
  
Divyesh Vaghela
June 9, 2018 7:26 am

Thank you, this post cleared my confusion.

ezG
January 14, 2019 2:38 pm

What happens when you need to INSERT fewer columns than are listed in the entity?

Let’s say my entity holds 26 columns, but my particular insert statement only requires 5 columns of data to be inserted.

How can I limit the number of columns sent through Entity Framework?

- Brent Ozar
  January 15, 2019 4:08 am
  
  ezG – that sounds like a great question to post over at https://StackOverflow.com. (Just want to teach you to ask questions at the right places.) Thanks!
  
  - Dylan Nicholson
    February 27, 2019 11:35 pm
    
    You might want to see https://stackoverflow.com/questions/40619319/entity-framework-not-including-columns-with-default-value-in-insert-into-query – if anything, it appears that getting EF to update values when the C# defaults don’t match the DB-defined defaults is more of an issue than your concern.
    
Dylan Nicholson
February 27, 2019 11:30 pm

var posts = context.Posts; – sorry but this line doesn’t pull ANY records at all. Until you attempt to enumerate entities in that collection nothing is sent to the database. But it’s certainly true that if you write context.Posts.First().ID – where you only need the ID, it pulls back every column of the first row. Whereas context.Posts.Select(p => p.ID).First() will pull back only the ID of the first row returned.

Raymond
May 13, 2019 2:51 am

Perfect!! Nice elegant. Thoroughly explained. This is exactly what I was looking for! Thank you Sir!

Matthew
August 8, 2019 8:23 am

Do you have suggestions for accomplishing this if I’m using EF and a Repository model?

Tim
October 17, 2020 6:23 pm

Thanks Brent! Just what I was looking for!

Richard Welsh
September 1, 2021 4:23 pm

Hi,
I’m both a SQL Server DBA and a (begrudging) EF developer. So the DBA me is often very annoyed with the EF dev me, when it comes to the ridiculously inefficient queries that EF generates.

Ideally I’d write my own custom SQL and cut out EF altogether, but EF has its advantages on the interface dev side, so this tip is very welcome, thanks Brent!

My one note is that if you use .Select(…) to project EF data-bound entities to the cut-down object as in your example, you can’t do this for ‘Navigation properties’ that you might want to .GroupBy(…) later, as they will lose their original EF entity identity. So in these cases I’m still having to select the entire EF-bound class including all the unwanted columns…

Alexander
July 15, 2022 11:42 pm

What if I need to select ALMOST ALL of the numerous columns, except a couple SPECIFIC ones?

- Brent Ozar
  July 17, 2022 11:01 am
  
  Then you list out the ones you need.
  
  I know it’s hard, making a list of columns. I understand that it seems like an insurmountable barrier, and that your body feels so weak sitting at the keyboard. It’s tough, your job – most mortals would collapse into a quivering blob of flesh were they faced with such a heroic challenge.
  
  But dig deep.
  
  I believe in you. I think you’ve got what it takes to list out the columns you need. I think that when you’re done with that task, poems will be written about you that make Homer’s Odyssey look like a limerick. You, Alexander, will have your image plastered on workplace walls everywhere as an icon of what it looks like to do the impossible.
  