Another reason why scalar functions in computed columns are a bad idea

As if there weren’t enough reasons

In my last blog post, I talked about different things that cause plans, or zones of plans, to execute serially. One of the items I covered was computed columns that reference scalar functions. We know they’ll make queries go serial, but what about maintenance operations?

Oh no, my index is fragmented

If you’re running Expensive Edition, index rebuilds can be both online and parallel. That’s pretty cool, because it keeps all your gadgets and gizmos mostly available during the whole operation, and the parallel bit usually makes things faster.

That is, unless you have a computed column in there that references a scalar function. I decided to write my test function to not perform any data access so it could be persisted. It’s dead simple, and I’m tacking it on to a column in the PostLinks table of the Stack Overflow database.
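A minimal sketch of the setup, assuming names and logic of my own invention (the function body, the `IsOld` column, and the date cutoff are all illustrative — the important parts are SCHEMABINDING, no data access, and PERSISTED):

```sql
-- Hypothetical scalar function: schemabound, no data access,
-- so a computed column that uses it can be PERSISTED.
CREATE FUNCTION dbo.CreationDateIsOld (@CreationDate DATETIME)
RETURNS BIT
WITH SCHEMABINDING, RETURNS NULL ON NULL INPUT
AS
BEGIN
    RETURN CASE WHEN @CreationDate < '20120101' THEN 1 ELSE 0 END;
END;
GO

-- Tack it on to PostLinks as a persisted computed column.
ALTER TABLE dbo.PostLinks
    ADD IsOld AS dbo.CreationDateIsOld(CreationDate) PERSISTED;
```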

For this one, all we have to do is turn on actual execution plans and rebuild the index, then drop the column and rebuild again.
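Something like this, assuming the column name from my sketch above (ONLINE rebuilds need Enterprise Edition; drop the option on other editions):

```sql
-- Rebuild with the computed column in place: watch the actual plan.
ALTER INDEX ALL ON dbo.PostLinks REBUILD WITH (ONLINE = ON);

-- Drop the (hypothetical) computed column...
ALTER TABLE dbo.PostLinks DROP COLUMN IsOld;

-- ...and rebuild again for comparison.
ALTER INDEX ALL ON dbo.PostLinks REBUILD WITH (ONLINE = ON);
```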

Here are my execution plans. The rebuild I ran when the table had my computed column in it stayed serial.

Hi, I’m garbage.

Parallel, sans computed column:

Dude, you’re getting a parallel.

But there’s a bigger fish in the pond

Probably the most important maintenance item you should be doing, aside from backups, is running DBCC CHECKDB. Seriously, if you’re not doing them both, start today. Ola Hallengren has basically done all the work for you. Back when I had a real job, I used his scripts everywhere.

Before we were so rudely interrupted by a soap box, we were talking about parallelism. This part was a little bit more complicated, but don’t worry, you don’t have to follow along. Just look at the pretty pictures. Sleep now. Yes. Sleep.

The first couple times I tried, the DBCC check never went parallel. Since I’m on my laptop, and not a production server, I can set Cost Threshold for Parallelism to 0. You read that right, ZE-RO! Hold onto your drool dish.
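For the curious, the knob lives in sp_configure. To be clear, zero is a laptop-demo setting, not production advice:

```sql
-- Test instance only: a cost threshold of 0 makes every plan
-- eligible for parallelism.
EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;

EXEC sys.sp_configure N'cost threshold for parallelism', 0;
RECONFIGURE;
```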

With that set, I fire up Ye Olde Oaken sp_BlitzTrace so I can capture everything with Extended Events. You’ll need all three commands, but you’ll probably have to change @SessionId, and you may have to change @TargetPath. Run the first command to start your session up.

With that running, toss in your DBCC command. I’m only using DBCC CHECKTABLE here to simplify. Rest assured, if you run DBCC CHECKDB, the CHECKTABLE part is included. The only checks that DBCC CHECKDB doesn’t run are CHECKIDENT and CHECKCONSTRAINTS. Everything else is included.

Run DBCC CHECKTABLE, add the computed column back, and then run it again. When those finish, run the sp_BlitzTrace commands to stop and read session data. You should see execution plans for each run, and they should be way different.
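The test sequence, again using my assumed column and function names from earlier:

```sql
-- First check: table without the computed column.
DBCC CHECKTABLE ('dbo.PostLinks');

-- Add the (hypothetical) persisted computed column back.
ALTER TABLE dbo.PostLinks
    ADD IsOld AS dbo.CreationDateIsOld(CreationDate) PERSISTED;

-- Second check: same table, now with the computed column.
DBCC CHECKTABLE ('dbo.PostLinks');
```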

Hell Yeah.

Hell No.

So even DBCC checks are serialized. Crazy, right? I’d been hearing about performance hits to varying degrees when running DBCC checks against tables with computed columns for a while, but never knew why. There may be a separate reason for regular computed columns vs. ones that reference scalar functions. When I took the equivalent SQL out of a function, the DBCC check ran parallel.
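To illustrate that last point: the same logic written inline, with no function wrapper, is the version that went parallel for me. Again, the expression and column name are my own stand-ins:

```sql
-- Same logic as the scalar UDF, but expressed inline.
-- In my testing, DBCC checks against this version could go parallel.
ALTER TABLE dbo.PostLinks
    ADD IsOldInline AS
        CASE WHEN CreationDate < '20120101' THEN 1 ELSE 0 END;
```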

Of course, those online index rebuilds running single threaded might be a blessing in disguise, if you haven’t patched SQL recently.

I don’t have much of a grand closing paragraph here. These things can seriously mess you up for a lot of reasons. If you’re a vendor, please get away from using scalar functions, and please please don’t use them in computed columns.

Thanks for reading!


22 Comments

  • Graeme Martin
    January 26, 2016 1:43 pm

DECLARE @Out INT; --?

  • Alex Friedman
    January 27, 2016 7:32 am

    Expensive Edition heh

  • You say “If you’re a vendor, please get away from using scalar functions, […]”
    What do you think about using scalar functions in check constraints?

  • Scalar Functions ALWAYS cause any queries using them to run single-threaded. It’s horrific in computed columns because the function basically causes the query to run RBAR. Why would anyone think this is a smart idea?

    • If the computed column is PERSISTED, it’s not the problem that you would think.

      • Erik Darling
        July 26, 2017 3:33 pm

        The computed column that I use in this demo is persisted.

        As far as regular query use goes, perhaps not, but you should see what happens when you insert a bunch of rows 🙂

        • Agreed but it’s either going to happen from the table or it’s going to happen in code. iSFs will certainly make such a thing faster but if the column is supposed to change when someone updates other columns, it’s one way to ensure that happens.

I DO agree that computed columns can be a royal PITA when it comes to table changes, especially if you’ve made the mistake of functions that call functions, but sometimes the promise of future pain is worth simplicity in the present. 😉 It’s a whole lot better than some nub that wants to build the formula into a regular view and then query on it. 😉

          • Erik Darling
            July 26, 2017 3:54 pm

            It would be really nice if iTVFs were allowed in computed columns as long as they had a TOP 1 in them (with some schemabinding guarantees), since there’s apparently no way to fix scalar functions.

          • What would be even better would be if MS were to fix scalar functions to make them more practical without having to do some trick to make it fast.

          • Jeff – yeah, amen to that!

        • Understood. I was replying to Haris. Computed columns have served me well when I needed them to. It’s like voting for POTUS… sometimes (all the time with POTUS), you have to choose between the lesser of two evils. 😉

  • I was caught out by this today. It’s garbage because now you have to include all of your simple computed columns as part of the table definition, rather than simply referencing the UDF in one place.

    If you have lots of computed columns that reference other computed columns that’s a lot of copy and pasting to create a technical debt / maintenance nightmare.

    Not fixed in SQL 2016 either.

    • And you want to do this because you want to bury business logic so deep that no one can figure out how the application works?

    • As a bit of a sidebar and except when you make the mistake of shrinking a database, fragmentation doesn’t matter. I’ve not rebuilt my indexes since 17 Jan 2016 and performance actually got better over the first 3 months of that.

      • Erik Darling
        July 26, 2017 3:35 pm

        I generally agree that rebuilds shouldn’t be run as regular maintenance for performance, but I would say that for good reasons to rebuild (like changing a setting, definition, or fill factor of an index), this will be a problem.

        • Heh… “Fill Factor”. Prior to 2016, I would have agreed. Now, not so much. I’ve found that indexes will create an effective “natural fill factor” due to page splits if you just leave them alone. The extra space produced by the page splits is normally used quite nicely when other data is added to the table because NCIs are normally not temporal or otherwise ever increasing in nature. Although I have a personal hatred for GUIDs (actually a misnomer since the advent of Type 4 GUIDs), even those benefit from just leaving the indexes alone to do what they were designed to do.

          Again, the exception to that rule is the horrific method they used to shrink a database. I don’t know what else they did there to screw things to the floor so badly but that’s the only time that I’ll rebuild or reorganize indexes anymore and seems to work very nicely whether you have a heavy OLTP, batch, or combination of predominate queries. Brent was definitely sucking the right kind of bong water when he first made that suggestion for SAN based systems. 😉

          • Hey Jeff — yeah, Fill Factor was just an example of an option you can change with a Rebuild. It’s not something I run around changing.

            Thanks!

  • Daniel C O'Brien
    November 27, 2017 2:55 pm

    Is there a way to create a computed column that queries another table without using a UDF? I realize it would not be deterministic and could not be persisted. I’m on SQL Server 2016 if it makes any difference.

    • No, but that doesn’t mean there’s not a better way to do what you want. Try asking the question on dba.stackexchange.com.

