The Case of Entity Framework Core’s Odd SQL

Development
17 Comments

Recently, Julie Lerman contacted me about some strange SQL that Entity Framework Core (EF Core) was generating. Now, EF Core is fairly new and works with Microsoft’s newest framework .NET Core. I haven’t had a chance to get my hands on EF Core yet so I was excited to see what it was doing. And of course when I saw the SQL my jaw hit the floor. Here’s an example of the SQL she sent me to accomplish an insert:

If you’re an experienced SQL tuner, you’ll notice some issues with this statement. First off the query has not one but two table variables. It’s generally better to use temp tables because table variables don’t have good statistics by default. Secondly, the statement uses a MERGE statement. The MERGE statement has had more than it’s fair share of issues. See Aaron’s Bertrand’s post “Use Caution with SQL Server’s MERGE Statement” for more details on those issues.

But that got me wondering, why would the EF team use SQL features that perform so poorly? So I decided to take a closer look at the SQL statement. Just so you know the code that was used to generate the SQL saves three entities (Katana, Kama, and Tessen) to the database in batch. (Julie used a Samurai theme so I just continued with it.)

A Deeper Look

The first part of the SQL creates a table variable called @toInsert0. This table variable will have the values that we want to insert into the Weapons table. It also adds a column called _Position, this will keep the order that the row was inserted in. This is needed later so that the WeaponId can returned in the order it was received.
The next bit is pretty straight forward. This inserts the data of the three entities into the @toInsert0 table variable. The @toInsert0 will be later used to insert the data into the Weapons table.
A second table variable, @inserted0, is created but this time with WeaponId and _Position as columns. This table will hold the new WeaponId’s that will be generated by inserting the data into the Weapon table.
This is where the magic happens. EF Core is using the MERGE statement to insert data from the first table variable (@toInsert0) into the Weapons table. The join that it is using is 1=0, which will always be false. So this statement will always run the WHEN NOT MATCHED section and insert the data into the Weapons table. Then, the OUTPUT line will put the newly created WeaponId and _Position into the @inserted0 table variable. This is necessary in order to return the newly created WeaponId to the client. In this case, WeaponId is an IDENTITY column which means SQL Server will automatically assign a value for WeaponId upon insert.
The final statement returns the newly created WeaponId(s) to Entity Framework. A few things to note: you could rewrite this statement to exclude the join and just select the data from the @inserted0 table variable, but I understand why they did it. What they’re doing is insuring that the data returning actually exists in the Weapons table. The other thing to note is the ORDER BY. In general, we want to avoid ORDER BY but the EF team needed to guarantee the order that was returned is the same order that was inputted so that the WeaponIds can be assigned to the correct weapon entity.

With a bit of testing, I realized that this statement wasn’t only being used for batch requests, but also for entity inserts with more than one entity being saved. Also, it appears that the limit per statement is 2,000 attributes. For example, if 6,500 attributes are being initially saved to the database EF Core will create four of these statements: three statements of 2,000 attributes and one statement of 500 attributes . The number of entities that are saved in each statement depends on the number of attributes being saved. In this case, 1,000 entities were saved per statement for a total of 3,250 entities saved to the database.

Rationalizing It All

Now that we’ve taken a look at the generated SQL, shouldn’t we be enraged? Wouldn’t we rip our devs a new one if we saw table variables, merge statements, and unnecessary sorts in their SQL? But let’s take a look at the core problem that the EF team is trying to solve: performance. Yea, I said it. In previous versions of Entity Framework, data was inserted one statement at a time. So if you needed to insert 5,000 rows into a table, you got 5,000 insert statements. This was necessary because the application needed the SQL-Server-generated identity value. But with this new approach, you could insert 5,000 rows of data with one statement.

But wouldn’t it be faster with temp tables? Sure! But you can’t guarantee a temp table name would be unique in your session (because you may have done other things along the way.) You could name a temp table with a guid, but there’s no guarantee that a guid is unique either. So the safest route would be a table variable and not a temp table.

But why use the MERGE statement if it’s so buggy? I think this goes back to the performance question. We could use n number of inserts but that’s not solving the performance problem that we originally had. In short, I think I’m OK with the MERGE statement. It’s only being used for inserts, and it’s faster than 5,000 insert statements.

But what about that ORDER BY in the SELECT statement? You kinda got me there. Could they be doing the sorting in the code but they went the SQL route. But it’s not so bad, since they’re creating a new SQL statement for every 2,000 entities SQL Server will only be sorting 2,000 rows.

To decide if it’s a bad thing, let’s compare the older version of Entity Framework (EF 6) with the new version (EF Core). In my totally unscientific tests, saving 10,000 entities to the database using the EF 6 took 4.06 seconds while saving the same 10,000 entities with EF Core took 1.46 seconds. So yes, the SQL isn’t perfect, but it’s 248% faster.

Brent says: hoo, boy. This is…not ideal.

Previous Post
[Video] Office Hours 2017/05/03 (With Transcriptions)
Next Post
Building a Faux PaaS, Part 1: The SQL Server DevOps Scene in 2017

17 Comments. Leave new

  • Mister Magoo
    May 8, 2017 12:37 pm

    The use of MERGE here is almost certainly because INSERT doesn’t allow you to OUTPUT any columns that are not from the INSERTED virtual table.
    Using MERGE allows you to do exactly this – OUTPUT the generated IDENTITY values along with data from the source (@toInsert0) thus providing a way to link the new rows to the source data.
    There is not likely to be a performance issue from the use of TABLE variables here as the row count is so low.
    It is a fairly common pattern in my experience.

    Reply
    • Mister Magoo – we’re less concerned with the table variable, and more concerned with this hot mess:

      https://www.mssqltips.com/sqlservertip/3074/use-caution-with-sql-servers-merge-statement/

      If Microsoft wants to start relying on the MERGE statement, then I surely hope they fix the bugs the community has pointed out. Some of those bugs will give EF users some serious heartburn.

      Reply
      • Mister Magoo
        May 9, 2017 6:07 am

        Hi Brent, Thanks for the link, but I can’t see any relevant bugs in that list for an INSERT only pattern of this type, but maybe I’m misunderstanding them. Most of the bugs seem to be related to multiple operations in one MERGE or UPDATE/DELETE actions? If you are aware of a relevant bug for a simple INSERT with OUTPUT I would be interested to read it.
        I agree that the bugs make using MERGE for anything else a big issue, but it seems like it not only works OK for a simple INSERT pattern, but is possibly the best option as it allows you to OUTPUT columns from the source data alongside columns from INSERTED – exactly how it works here.

        Reply
        • Mister Magoo – when you read this, imagine me having a smile on my face as I say these words. (When I say something like this, sometimes I need to be really clear that I’m having some fun.)

          MERGE was introduced to accomplish the common upsert problem in data warehouses where you need to do inserts, updates, and deletes in a single operation. As we started using it, we ran across so many bugs that most of us hit the eject button. It didn’t work for the thing it was designed to do.

          Now, you’re coming along and saying, “Yes, but it probably works fine to do something it wasn’t designed to do.”

          Okay. I’ll make my bets, you make yours. Good luck with that.

          Reply
  • Great post! You are mostly right about the motivation, and good point that what could possibly to the ordering on the client (I will create an issue to investigate feasibility and measure the impact).

    We actually decided to adopt this MERGE-based solution after hitting a few issues with our previous (simpler) batching implementations:

    https://github.com/aspnet/EntityFramework/issues/4080 Order of IDENTITY values wasn’t deterministic
    https://github.com/aspnet/EntityFramework/issues/4478 Batch inserts don’t work with uncomparable columns

    Reply
    • Richie Rump
      May 9, 2017 9:31 am

      Thanks for the insight Diego. I’ve been following EF since the alpha of version 1 (now version 4). I actually put an application into production with the EF 1 beta. Since this is a blog for data professionals I wanted to give a bit of understanding of the challenges that EF is trying to solve. Even though MERGE has it’s problems the performance bump is impressive. I’m looking forward to playing with EF Core even more and seeing what other nuggets it stores inside.

      Reply
    • Zac Torkelson
      May 9, 2017 4:42 pm

      Were Table Value Constructors[1] considered instead of the temporary toInsert table variable? I have had good results with this technique in the past and would be interested in your benchmarking results.

      Something like (apologies for the formatting and any syntax errors as I am presently mobile):


      MERGE [Weapons] USING (VALUES
      (@p0, @p1, 0),
      (@p2, @p3, 1),
      (@p4, @p5, 2)
      ) AS i ([IsPointy], [Name], [_Position]) ON 1=0
      WHEN NOT MATCHED THEN
      INSERT ([IsPointy], [Name])
      VALUES (i.[IsPointy], i.[Name])
      OUTPUT INSERTED.[WeaponId], i._Position
      INTO @inserted0;

      [1]: https://docs.microsoft.com/en-us/sql/t-sql/queries/table-value-constructor-transact-sql

      Reply
  • Eric St-Georges
    May 8, 2017 6:44 pm

    Do you have the c# code that generated this? I’m curious to see what caused this.

    Reply
  • Yet another reason I steer clear of all flavours of EF and use a micro ORM.

    I use a single bare metal file one (which should tell you which it is) and that suuports most databases you can name without the fancy syntax thst isn’t worth the limitations and when that fails me I’ll rewrite the guts to a nano ORM of my own design and port and reuse that code add infinitum from application to application.

    My problems are solved and yours can be too! 😉

    Reply
  • Isn’t there a limit of ~2000 parameters in a query? Are the 2000 rows per batch only if there is one value per row?

    Reply
    • Richie Rump
      May 11, 2017 9:35 am

      Great catch. I mistook the number of variables with the number of rows. Whoops. It does limit the batch to the number of variables. In this case, 1,000 rows were loaded which makes perfect sense since we’re only inserting the Name and Is_Pointy values. I’ll update the post to reflect this.

      Thanks!

      Reply
  • After reading this, I found myself wondering why we couldn’t just avoid the temp tables altogether and do a single insert instead of a merge, outputting the multiple autogenerated IDs right from that insert. The answer seems to be that although SQL Server typically inserts the data in the same order as the provided insert statement and typically outputs the ids in that order as well, those behaviors are explicitly not guaranteed. The merge statement is needed to ensure the generated Ids can be matched back up with the correct entities. More detail here:
    https://dba.stackexchange.com/questions/155735/is-it-safe-to-rely-on-the-order-of-an-inserts-output-clause

    Reply
    • Generally speaking, I would recommend against the MERGE statement. https://www.mssqltips.com/sqlservertip/3074/use-caution-with-sql-servers-merge-statement/

      Reply
      • Yes, I know you do because you already put that link in your article above and repeated it in the comments. But recall that the topic of the article is *entity framework’s* use of merge for bulk inserts.

        The intention of my comment was to explain why EF uses [multiple queries, including the controversial merge, and 2 table variables] to accomplish what seems like it could be done via a single insert statement that just outputs the generated ids. The answer is so that it can match the generated ids back up with the inserted entities.

        I just wanted to spare other readers from needing to do dig around elsewhere on google to find that answer, especially since Julie Lermon is still linking to this article from Pluralsight.

        Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.