I Feel Sorry for Untrained Developers Using Entity Framework.
Most of the time, I love Entity Framework, and ORMs in general. These tools make it easier for companies to ship applications. Are the apps perfect? Of course not – but they’re good enough to get to market, bring in revenue to pay salaries, and move a company forwards.
However, just like any tool, if you don’t know how to use it, you’re gonna get hurt.
One classic example popped up again last month with a client who’d used EF Core to design their database for them. The developers just had to say which columns were numbers, dates, or strings, and EF Core handled the rest.
But if you create a string without specifying its length, EF defaults to using NVARCHAR(MAX). That is not a bug. That is by design, and it’s explained in the documentation.
The wording in that documentation really pisses me off because NO, IT IS NOT SQL SERVER DOING THIS MAPPING. There is absolutely nothing in the database engine that’s saying strings are nvarchar(max). This is an Entity Framework problem, and stop pointing the blame at the innocent database engine. (Update 2025/01/10 – the documentation has been updated by Erik Ejlskov Jensen to point out that no, this isn’t a SQL Server problem.)
The documentation goes on to explain how you can manually set the right column lengths, and I’ve sat through several development conference sessions that emphasize how important it is for developers to do that. The problem here is that most folks don’t read the documentation, let alone attend conferences to learn how to use their tools. (I don’t blame the folks – I blame the companies who are pressuring developers to ship quickly without training.)
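If you’re wondering whether your own EF-generated database got this treatment, a rough query like this one (my quick sketch – adjust the filters to taste) lists the MAX-length string columns in the current database:
Transact-SQL

/* Rough check: list (N)VARCHAR(MAX) columns in the current database.
   max_length = -1 is how SQL Server flags the MAX length internally. */
SELECT s.name AS schema_name,
       t.name AS table_name,
       c.name AS column_name,
       TYPE_NAME(c.user_type_id) AS data_type
FROM sys.columns AS c
JOIN sys.tables AS t ON t.object_id = c.object_id
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
WHERE TYPE_NAME(c.user_type_id) IN (N'varchar', N'nvarchar')
  AND c.max_length = -1
ORDER BY schema_name, table_name, column_name;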
Demoing One of the Problems It Causes
Let’s create a table with two string columns – one NVARCHAR(100) and one NVARCHAR(MAX). Then, let’s load a million rows into it, putting the same contents in both the short and long columns.
Transact-SQL

DROP TABLE IF EXISTS dbo.Test;

CREATE TABLE dbo.Test
    (Id INT PRIMARY KEY CLUSTERED,
     ShortString NVARCHAR(100),
     LongString NVARCHAR(MAX));
GO

INSERT INTO dbo.Test (Id, ShortString, LongString)
SELECT value, N'Brent Ozar', N'Brent Ozar'
FROM generate_series(1, 1000000);
Then, we’ll run identical queries against the short & long string versions:
Transact-SQL

SELECT TOP 250 ShortString, Id
FROM dbo.Test
ORDER BY ShortString;

SELECT TOP 250 LongString, Id
FROM dbo.Test
ORDER BY LongString;
And review their actual execution plans:
The bottom query is the one that hits the NVARCHAR(MAX) column. Your first signs of danger are the yellow bang on the SELECT, and the 99% query cost estimate on the second query, indicating that SQL Server thinks the NVARCHAR(MAX) one is going to be quite a bit more expensive. However, as is often the case with SQL Server, the really big danger isn’t even shown visually.
Hover your mouse over each SELECT operator, and you’ll get a popup tooltip. One of the lines on that tooltip will say Memory Grant. Here’s the one for the NVARCHAR(100) query:
When the datatype is NVARCHAR(100), SQL Server allocates 210MB of memory to run the query because it believes it won’t need too much memory to sort that small of a column. However, check the same metric on the NVARCHAR(MAX) query:
Because SQL Server thinks that the contents of NVARCHAR(MAX) columns are larger, this query gets a 5GB memory grant, 24x larger. Depending on your server size and workloads, this can be a real problem because the more queries you have running simultaneously, the quicker your database server will run out of memory.
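If you’d rather watch this happen live on your own server instead of in a tooltip, a quick-and-dirty way (my sketch, not part of the demo) is to query sys.dm_exec_query_memory_grants while the queries are still running:
Transact-SQL

/* Quick check of memory grants for currently running queries:
   requested vs. granted memory, in KB, per session. */
SELECT session_id,
       requested_memory_kb,
       granted_memory_kb,
       used_memory_kb,
       query_cost
FROM sys.dm_exec_query_memory_grants
ORDER BY requested_memory_kb DESC;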
There’s a Fast Fix. It Doesn’t Work Here.
If you frequently search or order by a column, all you have to do is index it, right? Let’s try it:
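The attempts look something like this (the index names are just mine):
Transact-SQL

/* This one works: NVARCHAR(100) is a legal index key column. */
CREATE INDEX IX_ShortString ON dbo.Test (ShortString);

/* This one fails: NVARCHAR(MAX) can't be an index key column,
   so SQL Server throws an error along the lines of
   "Column 'LongString' in table 'dbo.Test' is of a type that is
   invalid for use as a key column in an index." */
CREATE INDEX IX_LongString ON dbo.Test (LongString);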

The ShortString index gets created – but SQL Server can’t create an index on LongString because we can’t use NVARCHAR(MAX) as a key column in an index.
That’s a bummer, and you could argue that it’s a SQL Server limitation that could be fixed. For example, you can create a schema-bound view with a truncated copy of the column, and index that:
Transact-SQL

CREATE OR ALTER VIEW dbo.Test_View
    WITH SCHEMABINDING
AS
SELECT Id,
       CAST(LongString AS NVARCHAR(100)) AS LongString_Truncated
FROM dbo.Test;
GO

CREATE UNIQUE CLUSTERED INDEX CLIX
    ON dbo.Test_View (LongString_Truncated, Id);
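To actually use that index, you can query the view directly – and on editions that don’t automatically match indexed views, add a NOEXPAND hint. A sketch, not part of the original demo:
Transact-SQL

/* Read the truncated column through the indexed view.
   NOEXPAND forces the view's index on editions that don't
   match indexed views automatically. */
SELECT TOP 250 LongString_Truncated, Id
FROM dbo.Test_View WITH (NOEXPAND)
ORDER BY LongString_Truncated;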
There’s nothing to say that SQL Server couldn’t do similar work under the hood to index abridged versions of NVARCHAR(MAX) columns, checking whether any truly long values actually exist in the column. It just doesn’t, though, and I understand that it would require Microsoft to do some work.
It’s much easier for Microsoft to say, “Yo, EF developers, read the docs and you’ll never have problems like this.” To some extent, that’s fair, because I can see how someone would expect people to be well-trained on the tools they use every day to do their jobs. This is only one tiny example of the many, many problems you can hit with Entity Framework if you don’t get trained on how to use it first.
If you’re looking for training resources, start with anything by Julie Lerman in whatever way you enjoy consuming training material – videos, books, etc.
23 Comments
Love it! Nice simple test that leaves no doubt. Thanks, Brent!
Glad you liked it, sir!
What’s even funnier is that some people expect EF will give the system better performance.
EF and performance hardly go hand in hand.
I have seen some projects where they use a GUID as the primary key, and nvarchar to store even integer values in the DB, making the columns very unintuitive.
Then they scale up by hyperscaling the DB in Azure in the hope of better performance.
“Your scientists were so preoccupied with whether or not they could, they didn’t stop to think if they should.”
Just because EF can create a database doesn’t mean you should let it.
EF creating the database schema is not a problem if you’re very explicit in telling it what you want.
“The wording on that really pisses me off because NO, IT IS NOT SQL SERVER DOING THIS MAPPING. There is absolutely nothing in the database engine that’s saying strings are nvarchar(max). This is an Entity Framework problem, and stop pointing the blame at the innocent database engine.”
Channel that anger and create a PR against the docs. There’s a little pencil at the top of the page that lets you do that. Might they reject that for any of myriad reasons? Sure. But might they not? One way to find out.
Or – or, hear me out – or – I could go on with my life. 😉
This was an article long overdue for writing. Great that, in true Brent Ozar fashion, you created a beautiful example that demonstrates the problem, and provided a great resource on how we can better ourselves as developers.
You know what I always say – “When in doubt, engineers will max it out”
Also part of the problem is that database developers are overlooked in favor of “full-stack” developers, who are supposed to know everything about everything 🙂
This is just the tip of the EF/ORM iceberg. My brother (a software developer, C# mostly) and I have seen some atrocious queries coming out of EF (and other ORMs), retrieving thousands or even millions of rows just to check the existence of a single row.
EF/ORMs are great when used properly, and yes, reading the documentation is 100% part of that process.
I have a love/hate relationship with EF.
On one hand, it writes better SQL than most web or full-stack developers, you generally won’t face absolute atrocities in T-SQL with EF, and it helps devs avoid logical bugs. It is also AMAZING for database schema source control.
On the other hand, it doesn’t write particularly efficient SQL, it can be a little difficult to read in complex queries, and when a performance issue turns out to come from an EF query, there is nothing I can do about it and usually nothing I can express to the developer to help make it more efficient if they don’t already know how to tune it.
My willingness to read documentation, buy books, attend conferences, etc is a considerable part of my success as a consultant.
So, I’m grateful that most people can’t be bothered. (Although I do try to get customers to do all of the above. It almost never works.)
Why not replace EF with AI? Dictate to the AI what you want, paste stuff to get code generated, etc. Then do the code review and go through a second pass to refine the stuff. Of course, you need to know what you are doing, but the result is controlled and readable; you can “teach” the AI the naming conventions and the coding style so you do not end up with totally unreadable names and code.
AI is like an advanced secretary, typing lots of code faster than you can. EF is a black box.
lemme just stop you right there
🙂
The critical design flaw here is in EF, and it was in allowing a default mapping of string to a SQL Server data type. When a developer tries to generate a schema from that, they should get an error, and the tool should tell them they HAVE to decorate the string property with an attribute before they can proceed.
To be fair, nvarchar(max) IS the logical mapping of .net string onto sql server data types, but the entirely predictable human behavior is that people will follow the path of least resistance. So when that path leads to somewhere bad, it needs to be blocked.
My guess is that this was internally debated years ago on the EF Core team, and the overriding goal was to maximize adoption. So they didn’t want to make all their demos and examples cluttered with more attributes. In the beginning, it’s worse to have a reputation for having a steep learning curve. But in the long run, it’s worse to have a reputation for being full of foot guns.
[…] Brent Ozar points out one issue you might run into when using Entity Framework: […]
Just FYI, the docs have been updated (I created a PR to calm you down)
I’ll offer a hilarious example of misusing EF from Microsoft’s Identity codebase ( https://github.com/aspnet/AspNetIdentity/blob/main/src/Microsoft.AspNet.Identity.EntityFramework/UserStore.cs ):
public virtual Task<TUser> FindByNameAsync(string userName)
{
    ThrowIfDisposed();
    return GetUserAggregateAsync(u => u.UserName.ToUpper() == userName.ToUpper());
}
To a non-DBA, it’s reasonable to assume that ToUpper() does not impact the algorithmic complexity of the search. However, what actually happens is that EF generates a WHERE clause that looks something like this:
WHERE UPPER(username) = UPPER(@username)
And now SQL cannot use the index on the username column. This can easily turn a 10ms query into a 10s query.
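The difference looks something like this, assuming a table with an index on a UserName column (table and column names hypothetical):
DECLARE @UserName NVARCHAR(100) = N'brento';

/* Non-sargable: wrapping the indexed column in UPPER() hides it from the
   index, so SQL Server has to scan and apply the function to every row. */
SELECT Id FROM dbo.Users WHERE UPPER(UserName) = UPPER(@UserName);

/* Sargable: compare the bare column (on a case-insensitive collation the
   UPPER() calls weren't needed anyway) and the index can seek. */
SELECT Id FROM dbo.Users WHERE UserName = @UserName;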
Dapper Micro ORM is similarly dangerous. In client C# code, you can lazily choose not to specify the column type for passed params, and Dapper will happily declare those as varchar(max) in the generated T-SQL. Wish this would flag a compiler warning.
The DBA can’t really fix these but he can at least educate the developers.
Great blog post.
I had this kind of problem, so I showed the developers the actual max length of the data in the column and proposed they use it as a reference.