Scalar functions and multi-statement table-valued functions are notorious performance killers. They hide in execution plans, their cost is under-estimated, the row estimates are way off, they cause queries to go single-threaded, I could go on and on.
Microsoft is bringing a fix in SQL Server 2019, and thanks to a newly published paper, we know more about how they’re doing it. Folks from Microsoft and the Gray Systems Lab wrote Froid: Optimization of Imperative Programs in a Relational Database (17-page PDF, and slightly easier-to-digest 12-page PDF).
It’s just a little technical, but if you’re a query tuner, you’ll probably find it a clearly written, enjoyable read. (I hate most academic papers, but this one was an easy read.)
What Froid Does: Turn Scalars Into Subqueries
If the entire body of an imperative UDF can be expressed as a single relational expression R, then any query that invokes this UDF can be transformed into a query with R as a nested sub-query in place of the UDF.
At first, it sounds like they’re turning single-statement scalar functions into subqueries a la the APPLY statement. That might not sound too exciting because most of the scalars out there are multi-statement. However, keep reading, and they say they’ve already got it working with DECLARE, SET, SELECT, IF/ELSE, RETURN, UDFs (nested/recursive functions), EXISTS, and ISNULL. The 17-page version of the paper includes some decently sized functions as examples of what Froid can tackle.
Froid replaces the scalar UDF operator in the calling query with the newly constructed relational expression as a scalar sub-query.
Froid (French for cold) is the name they’ve given the technique. Not a big fan of that myself – I keep hearing it as either Freud or fraud, depending on my level of enthusiasm. (“Sometimes a subquery is just a subquery.”) Anyhoo, onwards.
That one statement comes with a lot of interesting gotchas that they discuss throughout the paper:
- You won’t see the scalar function in the plan (just like we don’t currently see single-statement inline table-valued functions by name in the plan – we just see their effects, kinda like views)
- The function and the underlying tables might have different permissions (you could have permissions on the function but not the tables, or vice versa, which makes compilation & execution a little trickier)
- Code that doesn’t get used can just get removed outright (just like SQL Server can do join elimination)
- Costs and row estimates are now useful inside the plan
Froid is an extensible framework, designed in a way that makes it straightforward to incrementally add support for more languages and imperative constructs.
They mention C#, Java, R, and Python as other candidates. Given the initial (kinda duct-tape-y) implementation of R & Python in SQL Server 2016/2017, this interests me a lot: if we’re gonna use other languages in the database, I’d much rather have them be first class citizens instead of separate services.
How Froid Was Tested: With Azure SQL DB Data
I absolutely adore Azure SQL DB, but maybe not for the reason you expect. See, now that Microsoft is charging money to host your databases, they’re starting to catch on to the performance bottlenecks. They’re probably looking at their own code and going, “Dang, if we fixed this scalar UDF problem, queries would run X% faster, and we’d be able to cut our hosting bills by 14%, and therefore we’d be able to pass the savings on to…actually, we could just keep the savings.”
An example of that pops up in the paper:
We have analyzed several customer workloads from Azure SQL Database to measure the applicability of Froid with its currently supported constructs. We are primarily interested in databases that make good use of UDFs and hence, we considered the top 100 databases in decreasing order of the number of UDFs present in them. Cumulatively, these 100 databases had 85329 scalar UDFs, out of which Froid was able to handle 51047 (59.8%).
Think about that for a second.
Yeah yeah yeah, the top 100 databases had an average of 853 UDFs in them each. Sure, that’s horrifying. But set that aside – I actually don’t think that’s so bad given that it’s the top 100 databases worldwide.
Focus on this line for a second:
…we considered the top 100 databases in decreasing order of the number of UDFs present in them.
That means Microsoft looked at all Azure SQL DBs, sorted them by the number of UDFs, and then analyzed those customers’ code.
I know sometimes people get all tin-foil-hat-conspiracy, but this does mean that Microsoft developers are reading your databases and your code. Yes, they’re using it to make the product better – but this is a good example of the fact that in the cloud, your hosting partner can (and does) read your data, your code, and your workloads. (Later paragraphs talk about how they replayed customer workloads to gauge performance improvements with Froid.)
It’s a tradeoff: Microsoft reads your data, but they use it to make the product better. Is it worth it? Those poor folks with 853 functions would probably say yes, but obviously, you can’t trust their judgment to begin with, right? Those bozos have 853 functions per database. C’mon. They can’t even be trusted to pick what we’re having for lunch.
No, seriously, the performance improvements are fantastic, of course – the same kinds of order-of-magnitude improvements that you’d normally get with rewriting the code to avoid scalars and multi-statement TVFs altogether. The engine just goes faster without changing the way you drive. That’s pretty awesome.
When Froid Might Ship: Azure, SQL 2017, and SQL 2019
The paper includes this gem:
Froid is implemented in SQL Server 2017 in about 1.5k lines of code. For our experiments, SQL Server 2017 with Froid was run on Windows Server 2012(R2).
Don’t get all excited, now, speedy: that doesn’t mean they’re going to ship it in a 2017 cumulative update, although a performance tuner can dream.
Remember, Microsoft has a track record of being properly paranoid when it comes to shipping dramatic execution changes – like how batch mode is only available for queries with columnstore indexes in them. If query execution could result in incorrect results – and it has – then they’re right to be paranoid. So, it wouldn’t surprise me to see this kind of rollout come very slowly just like it did with batch mode execution. They surely won’t go wild and crazy and just release it in a 2017 CU that affects all scalar UDFs.
They’ll probably ship it in Azure SQL DB first – and hey, we can probably guess which 100 databases will get it first.
To learn more about Froid, head to the 44th International Conference on Very Large Data Bases 2018 on August 27-31 in Rio de Janeiro, Brazil. I’m kidding, of course – the paper was submitted for that, but the sessions haven’t been picked yet. And let’s be honest, if one of you goes to Brazil, you’re not going to step foot in the conference hall. You people are party animals. I’ll see you at the beach. Spring break!
Erik Says: Look, I’m willing to trade in a lot of blog posts and training material about scalar UDFs for them not to kill performance for unsuspecting developers out there anymore. I’d also love to have a very tangible reason to give clients to upgrade to 201(?). For all the things I love about 2016 and especially 2017, most of our clients wouldn’t see a lot of ROI with stuff needing batch mode to work, etc. (even with the tricky workarounds).
Confirmed by Joe Sack that this will ship in Azure SQL DB and vNext: https://twitter.com/JoeSackMSFT/status/955464318002188289
Well… the dates of the conference, are actually “winter” on this side of the globe … but the beach will still be open in Rio lol … xD
Some clarifications, as academic papers are sometimes brief to fit into the publishing rules:
1. We have logic in SQL Azure which determines if UDFs are potentially able to be inlined. Like any other SQL Server/Azure feature, it runs within SQL and just has some metadata property to keep track of this capability. While developing features, we often will add telemetry to measure whether customers would be positively impacted or not. We did not look at customer data or even the schemas when doing this for what happened in the published paper. Instead, we just looked at databases that would potentially benefit if we had such a feature and generically analyzed them. No customer data leaves the compliance zone during this process – we’re just testing our own code to see if it is working.
2. For the two specific databases we examined, we did not use telemetry to do this – we worked with two customers who gave us their schemas directly (without data) under mutual NDAs so that we could do work to validate what impact it would have on their databases if we could inline parts of UDFs.
To be fully transparent, there are activities in SQL Azure where the SQL team can examine the state of a machine in the service. For example, if there is a Watson dump made of memory, a SQL engineer may see customer data as part of that analysis (though it would be incidental – we’re usually just looking at how to fix a bug in such a situation). We can also look at other aspects of a machine for troubleshooting purposes, but overall we have a set of processes and procedures around how we operate that minimize what data we might examine to the smallest set possible and to maintain that the data stays within secure boundaries and does not leave the service. For example, Watson crash dumps are kept within a secure set of machines and have remote debugging + retention policies.
The policies around how we protect customer data in SQL Azure can be read here:
Conor – thanks for the clarification! I’m actually very familiar with that Microsoft document, which includes:
That matches what I’d written in the post. Would it be possible to get that MS doc to be as transparent as your blog post comment? That’d be awesome.
I kept hesring in my head…freudian slips will be in SQL Server vNext.
Just so long as the Froidian slip isn’t a slip of the release date 🙂
[…] might give you some clues about some other things that are coming as well. And this post from Brent Ozar talks about a potential forthcoming fix to scalar user-defined […]