Scalar functions in computed columns cause all queries that hit that table to execute serially. But it gets worse!
Scalar functions in computed columns
cause index maintenance to go single-threaded.
If you’re running Expensive Edition, index rebuilds can be both online and parallel. That’s pretty cool, because it keeps all your gadgets and gizmos mostly available during the whole operation, and the parallel bit usually makes things faster.
That is, unless you have a computed column in there that references a scalar function. I decided to write my test function to not perform any data access so it could be persisted. It’s dead simple, and I’m tacking it on to a column in the PostLinks table of the Stack Overflow database.
CREATE FUNCTION dbo.PIDMultiplier (@pid int)
WITH RETURNS NULL ON NULL INPUT,
DECLARE @Out BIGINT;
SELECT @Out = @pid * 2
ALTER TABLE dbo.PostLinks
ADD Multiplied AS dbo.PIDMultiplier(PostId) PERSISTED;
For this one, all we have to do is turn on actual execution plans and rebuild the index, then drop the column and rebuild again.
ALTER TABLE dbo.PostLinks REBUILD WITH ( ONLINE = ON );
ALTER TABLE dbo.PostLinks DROP COLUMN Multiplied;
Here are my execution plans. The rebuild I ran when the table had my computed column in it stayed serial.
Parallel, sans computed column:
They cause DBCC CHECKDB
to go single-threaded, too.
Probably the most important maintenance item you should be doing, aside from backups, is running DBCC CHECKDB. Seriously, if you’re not doing them both, start today. Ola Hallengren has basically done all the work for you. Back when I had a real job, I used his scripts everywhere.
Before we were so rudely interrupted by a soap box, we were talking about parallelism. This part was a little bit more complicated, but don’t worry, you don’t have to follow along. Just look at the pretty pictures. Sleep now. Yes. Sleep.
The first couple times I tried, the DBCC check never went parallel. Since I’m on my laptop, and not a production server, I can set Cost Threshold for Parallelism to 0. You read that right, ZE-RO! Hold onto your drool dish.
With that set, I fire up Ye Olde Oaken sp_BlitzTrace so I can capture everything with Extended Events. You’ll need all three commands, but you’ll probably have to change @SessionId, and you may have to change @TargetPath. Run the first command to start your session up.
EXEC dbo.sp_BlitzTrace @SessionId = 61 , @Action = 'start' , @TargetPath = 'c:\temp\' , @TraceParallelism = 1 , @TraceExecutionPlansAndKillMyPerformance = 1
EXEC dbo.sp_BlitzTrace @Action = 'stop'
EXEC dbo.sp_BlitzTrace @Action = 'read'
With that running, toss in your DBCC command. I’m only using DBCC CHECKTABLE here to simplify. Rest assured, if you run DBCC CHECKDB, the CHECKTABLE part is included. The only checks that DBCC CHECKDB doesn’t run are CHECKIDENT and CHECKCONSTRAINT. Everything else is included.
DBCC CHECKTABLE('dbo.PostLinks') WITH NO_INFOMSGS, ALL_ERRORMSGS
ALTER TABLE dbo.PostLinks
ADD Multiplied AS dbo.PIDMultiplier(PostId) PERSISTED;
Run DBCC CHECKTABLE, add the computed column back, and then run it again. When those finish, run the sp_BlitzTrace commands to stop and read session data. You should see execution plans for each run, and they should be way different.
So even DBCC checks are serialized. Crazy, right? I’d been hearing about performance hits to varying degrees when running DBCC checks against tables with computed columns for a while, but never knew why. There may be a separate reason for regular computed columns vs. ones that reference scalar functions. When I took the equivalent SQL out of a function, the DBCC check ran parallel.
ALTER TABLE dbo.PostLinks ADD Multiplied AS PostId * 2 PERSISTED;
Of course, those online index rebuilds running single threaded might be a blessing in disguise, if you haven’t patched SQL recently.
I don’t have much of a grand closing paragraph here. These things can seriously mess you up for a lot of reasons. If you’re a vendor, please get away from using scalar functions, and please please don’t use them in computed columns.
Thanks for reading!
DECLARE @Out INT; –?
You’d be surprised.
Expensive Edition heh
You say ” . If you’re a vendor, please get away from using scalar functions, […]”
What do you think about using scalar functions in check constraints?
I feel that you would probably win a marksman badge for shooting yourself in the foot.
Scalar Functions ALWAYS cause any queries using them to run single-threaded. It’s horrific in computed columns because the function basically causes the query to run RBAR. Why would anyone think this is a smart idea?
If the computed column is PERSISTED, it’s not the problem that you would think.
The computed column that I use in this demo is persisted.
As far as regular query use goes, perhaps not, but you should see what happens when you insert a bunch of rows 🙂
Agreed but it’s either going to happen from the table or it’s going to happen in code. iSFs will certainly make such a thing faster but if the column is supposed to change when someone updates other columns, it’s one way to ensure that happens.
I DO agree that computed columns can be a royal PITA when it comes to table changes especially if you’ve made the mistake of functions that call functions but sometimes the promise of future pain is worth simplicity in the present. 😉 It’s a whole lot better than some nub that want’s to build the formula into a regular view and then query on it. 😉
It would be really nice if iTVFs were allowed in computed columns as long as they had a TOP 1 in them (with some schemabinding guarantees), since there’s apparently no way to fix scalar functions.
What would be even better would be if MS were to fix scalar functions to make them more practical without having to do some trick to make it fast.
Jeff – yeah, amen to that!
Understood. I was replying to Haris. Computed columns have served me well when I needed them to. It’s like voting for POTUS… sometimes (all the time with POTUS), you have to choose between the lesser of two evils. 😉
I was caught out by this today. It’s garbage because now you have to include all of your simple computed columns as part of the table definition, rather than simply referencing the UDF in one place.
If you have lots of computed columns that reference other computed columns that’s a lot of copy and pasting to create a technical debt / maintenance nightmare.
Not fixed in SQL 2016 either.
And you want to do this because you want to bury business logic so deep that no one can figure out how the application works?
As a bit of a sidebar and except when you make the mistake of shrinking a database, fragmentation doesn’t matter. I’ve not rebuilt my indexes since 17 Jan 2016 and performance actually got better over the first 3 months of that.
I generally agree that rebuilds shouldn’t be run as regular maintenance for performance, but I would say that for good reasons to rebuild (like changing a setting, definition, or fill factor of an index), this will be a problem.
Heh… “Fill Factor”. Prior to 2016, I would have agreed. Now, not so much. I’ve found that indexes will create an effective “natural fill factor” due to page splits if you just leave them alone. The extra space produced by the page splits is normally used quite nicely when other data is added to the table because NCIs are normally not temporal or otherwise ever increasing in nature. Although I have a personal hatred for GUIDs (actually a misnomer since the advent of Type 4 GUIDs), even those benefit from just leaving the indexes alone to do what they were designed to do.
Again, the exception to that rule is the horrific method they used to shrink a database. I don’t know what else they did there to screw things to the floor so badly but that’s the only time that I’ll rebuild or reorganize indexes anymore and seems to work very nicely whether you have a heavy OLTP, batch, or combination of predominate queries. Brent was definitely sucking the right kind of bong water when he first made that suggestion for SAN based systems. 😉
Hey Jeff — yeah, Fill Factor was just an example of an option you can change with a Rebuild. It’s not something I run around changing.
Is there a way to create a computed column that queries another table without using a UDF? I realize it would not be deterministic and could not be persisted. I’m on SQL Server 2016 if it makes any difference.
No, but that doesn’t mean there’s not a better way to do what you want. Try asking the question on dba.stackexchange.com.
Thanks – here’s the question if you have thoughts – https://dba.stackexchange.com/questions/191843/referencing-another-table-from-a-computed-column.