Virtualization used to be a really Big Deal™ for database admins: we had to do a lot of careful planning to get a virtualization project done right. These days, virtualization is more and more of a no-brainer: most apps make the transition just fine. Every now and then, though, an exception pops up – usually after the project has already gone live and failed.
Most user-defined functions don’t virtualize well.
User-defined functions (UDFs) accept parameters and output a value or a table. Here’s an example of a scalar function that calculates how many badges a user has earned in the Stack Overflow database:
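CREATE FUNCTION dbo.ScalarFunction ( @uid INT )
RETURNS BIGINT
WITH RETURNS NULL ON NULL INPUT
AS
BEGIN
    DECLARE @BCount BIGINT;
    -- Count the badges earned by this user (stays NULL if they have none)
    SELECT @BCount = COUNT_BIG(*)
    FROM dbo.Badges AS b
    WHERE b.UserId = @uid
    GROUP BY b.UserId;
    RETURN @BCount;
END;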
And here’s an example of a query that calls that function:
SELECT TOP 1000
    u.DisplayName,
    dbo.ScalarFunction(u.Id) AS BadgeCount
FROM dbo.Users AS u;
User-defined functions are really common because good developers are taught to package their code for easy reusability. Put it in a function, and then call that function from everywhere.
However, most functions have a dark secret: queries that call them are single-threaded. (This starts to get a little better with some types of functions in SQL Server 2017.)
That means that for CPU-bound queries with scalar functions, single-core CPU speed is incredibly important. If a long-running query can only use one CPU core, and that core is suddenly 25% slower, then your query is suddenly 25% slower.
To successfully virtualize these:
- Track SOS_SCHEDULER_YIELD closely with something like the Power BI Dashboard for DBAs
- Get the fastest cores possible (think 3.5GHz or faster)
- Avoid CPU overcommitment – normally, VM admins like putting multiple VMs per core, especially given SQL Server’s licensing costs
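If you don’t have a dashboard handy, a minimal sketch of checking SOS_SCHEDULER_YIELD by hand might look like the query below. It reads the sys.dm_os_wait_stats DMV, and it assumes your login has VIEW SERVER STATE permission. The avg_wait_ms column is just an illustrative calculation, not something the DMV returns.

```sql
-- These numbers are cumulative since the last restart (or since wait stats
-- were last cleared), so compare two snapshots taken some time apart rather
-- than reading a single result in isolation.
SELECT  ws.wait_type,
        ws.waiting_tasks_count,
        ws.wait_time_ms,
        ws.signal_wait_time_ms,
        -- Average wait per occurrence; NULLIF avoids a divide-by-zero
        -- when the wait has not happened yet.
        ws.wait_time_ms * 1.0 / NULLIF(ws.waiting_tasks_count, 0) AS avg_wait_ms
FROM    sys.dm_os_wait_stats AS ws
WHERE   ws.wait_type = N'SOS_SCHEDULER_YIELD';
```

Take one snapshot before the migration and one after, under a comparable workload. A jump in these waits suggests your queries are spending more time waiting for a turn on the CPU.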
IO-latency-sensitive apps don’t virtualize well.
We’ve all been taught that our code should work in sets, not row-by-agonizing-row. However, if you work one row at a time, you can become really sensitive to transaction log file latency.
One of my (least) favorite examples was an app server that had:
- C# app running on an app server
- It called SQL Server to log a row in a logging database table to say it was starting processing (which waited on the log file to harden)
- The C# app would do some processing
- It would call SQL Server back and update that one row to say it was done
- Wash, rinse, and repeat millions of times in a single-threaded fashion
As a result, every added millisecond of latency meant huge time increases for their nightly jobs. They’d long ago understood that it was a problem, so they’d put that database’s log file on really cheap, consumer-grade NVMe SSDs, which meant that they had sub-millisecond latency.
But when they virtualized that application, the log file moved from local SSD out to the shared storage. They’d purchased pretty good storage – but even that couldn’t compete with the extremely low latency they could get locally.
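To get a feel for how quickly that adds up, here’s a back-of-the-envelope sketch. Every number in it is a made-up assumption for illustration (the iteration count, the two log flushes per iteration, and the one extra millisecond), not a measurement from that app.

```sql
-- Hypothetical inputs: swap in your own numbers.
DECLARE @Iterations             BIGINT       = 5000000; -- rows processed per night (assumed)
DECLARE @LogFlushesPerIteration INT          = 2;       -- one insert + one update, each waiting on the log
DECLARE @AddedLatencyMs         DECIMAL(9,2) = 1.0;     -- extra latency per log write after the move (assumed)

-- Serial work means the added latency is paid on every single flush,
-- one after another, with nothing to overlap it.
SELECT  (@Iterations * @LogFlushesPerIteration * @AddedLatencyMs)
        / 1000.0 / 3600.0 AS AddedHours;
```

With those assumed inputs, one extra millisecond per log write works out to roughly 2.8 more hours on the nightly job.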
To successfully virtualize these:
- Look out for single-row, single-threaded processes (and ideally, write those to work in parallel batches)
- Track WRITELOG waits closely before the migration
- Load test the vulnerable processes before going live, making sure your jobs still finish in an acceptable time window
- Consider putting databases like that on separate volumes so their performance characteristics can be tuned separately
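As a rough sketch of tracking this before the migration, the queries below pull WRITELOG waits from sys.dm_os_wait_stats and per-file write latency for transaction logs from sys.dm_io_virtual_file_stats. They assume VIEW SERVER STATE permission. The averages are simple illustrative calculations over counters that are cumulative since startup.

```sql
-- Instance-wide WRITELOG waits: how often, and for how long, sessions
-- waited on a transaction log write to finish.
SELECT  ws.wait_type,
        ws.waiting_tasks_count,
        ws.wait_time_ms,
        ws.wait_time_ms * 1.0 / NULLIF(ws.waiting_tasks_count, 0) AS avg_wait_ms
FROM    sys.dm_os_wait_stats AS ws
WHERE   ws.wait_type = N'WRITELOG';

-- Average write latency for each transaction log file on the instance.
SELECT  DB_NAME(vfs.database_id) AS database_name,
        mf.physical_name,
        vfs.num_of_writes,
        vfs.io_stall_write_ms,
        vfs.io_stall_write_ms * 1.0 / NULLIF(vfs.num_of_writes, 0) AS avg_write_latency_ms
FROM    sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN    sys.master_files AS mf
        ON  mf.database_id = vfs.database_id
        AND mf.file_id     = vfs.file_id
WHERE   mf.type_desc = N'LOG'
ORDER BY avg_write_latency_ms DESC;
```

Capture these on the physical server first so you have a baseline, then run them again during load testing on the VM. That makes any added log latency show up as a number rather than as a missed job window.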
And alert management about technical debt.
Both of these cases involve code that isn’t so great – code that was shipped to get a feature out the door and bring revenue in. That’s technical debt.
Ward Cunningham’s analogy about technical debt is one of the most effective ways I’ve seen to communicate the issue to management.