Two Code Patterns That Don’t Virtualize Well

Virtualization used to be a really Big Deal™ for database admins: we had to do a lot of careful planning to get a virtualization project done right. These days, virtualization is more and more of a no-brainer: most apps make the transition just fine. Every now and then, though, an exception pops up – usually after the project has already gone live and failed.

Most user-defined functions don’t virtualize well.

User-defined functions (UDFs) accept parameters and output a value or a table. Here’s an example of a scalar function that calculates how many badges a user has earned in the Stack Overflow database:

And here’s an example of a query that calls that function:

User-defined functions are really common because good developers are taught to package their code for easy reusability. Put it in a function, and then call that function from everywhere.

However, most functions have a dark secret: queries that call them are single-threaded. (This starts to get a little better with some types of functions in SQL Server 2017.)

That means that for CPU-bound queries with scalar functions, single-core CPU speed is incredibly important. If a long-running query can only use one CPU core, and that core is suddenly 25% slower, then your query is suddenly 25% slower.

To successfully virtualize these:

  • Track SOS_SCHEDULER_YIELD closely with something like the Power BI Dashboard for DBAs
  • Get the fastest cores possible (think 3.5GHz or faster)
  • Avoid CPU overcommitment – normally, VM admins like putting multiple VMs per core, especially given SQL Server’s licensing costs

IO-latency-sensitive apps don’t virtualize well.

We’ve all been taught that our code should work in sets, not row-by-agonizing-row. However, if you work one row at a time, you can become really sensitive to transaction log file latency.

One of my (least) favorite examples was an app server that had:

  • C# app running on an app server
  • It called SQL Server to log a row in a logging database table to say it was starting processing (which waited on the log file to harden)
  • The C# app would do some processing
  • It would call SQL Server back and update that one row to say it was done
  • Wash, rinse, and repeat millions of times in a single-threaded fashion

As a result, every added millisecond of latency meant huge time increases for their nightly jobs. They’d long ago understood that it was a problem, so they’d put that database’s log file on really cheap, consumer-grade NVMe SSDs, which meant that they had sub-millisecond latency.

But when they virtualized that application, the log file moved from local SSD out to the shared storage. They’d purchased pretty good storage – but even that couldn’t compete with the extremely low latency they could get locally.

To successfully virtualize these:

  • Look out for single-row, single-threaded processes (and ideally, write those to work in parallel batches)
  • Track WRITELOG waits closely before the migration
  • Load test the vulnerable processes before going live, making sure your jobs still finish in an acceptable time window
  • Consider putting databases like that on separate volumes so their performance characteristics can be tuned separately

And alert management about technical debt.

Both of these cases involve code that isn’t so great – code that was shipped to get a feature out the door and bring revenue in. That’s technical debt.

Ward Cunningham’s analogy about technical debt is one of the most effective ways I’ve seen to communicate the issue to management.

Previous Post
Computed Columns: Reversing Data For Easier Searching
Next Post
New Speed Controls for Training Videos

13 Comments. Leave new

  • I see this time and time again. Some organization does a round of hardware upgrades. Since they want to virtualize a bunch of servers (most of which aren’t SQL Server), they buy a big host with as many cores as they can get in it, but those cores are slower than if they bought fewer core, faster CPUs. Then they put a VM on the host with SQL Server on it, licensed at the guest level. Except they’re paying the same to license 8 2.0GHz cores as they could have paid to license 8 3.0 GHz cores. They probably could have gotten the same performance with only 6 3.0 GHz cores, which would be cheaper to license.

    Reply
  • Are you saying all UDFs are single-threaded or just scalar UDFs?

    Reply
  • I thought I did read it pretty carefully. Here you say, “However, most functions have a dark secret: queries that call them are single-threaded.” And then you say, “That means that for CPU-bound queries with scalar functions, single-core CPU speed is incredibly important”. Your first statement doesn’t mention scaler and your second does. And as far as reading the link, well, like most of your hard working readers, I’m pretty busy. 🙂

    Reply
    • Randy -no problem! Whenever you’ve got the time to read and learn, the post will be there for ya.

      But just like you, I’m kinda busy myself. I wish I could spoon feed everyone every little piece of knowledge, but….at some point, ya gotta pick up the spoon yourself. 😉

      Reply
  • Randy Datillo
    January 25, 2018 8:41 pm

    Brent, that’s an awfully snarky comment. Do you realize how condescending that is to all of us readers?

    Reply
    • Yep.

      And for someone who claims to be too busy to read, you sure do have a lot of time available to have a discussion in comments. If you invested just a little of that in reading the post and the supporting material in the links, you’d be in much better shape.

      Reply
    • “Do you realize how condescending that is to all of us readers?!”
      How so? Brent was talking to Randy Minder not us.

      Reply
  • This could have just been called “Two Code Patterns That Don’t Scale Well.”
    Use these code patterns on a physical box which is otherwise properly sized for the application, and they’ll bring that box to a halt as their use scales up as well.
    Virtualization really has nothing to do with it — it just highlights the fact that IT has been hiding poorly-performing code by overspending on system resources for decades.

    Reply
    • Steve – interesting, I wouldn’t have called a $250 SSD overspending (especially in relation to a SAN.)

      Reply
    • > it just highlights the fact that IT has been hiding poorly-performing code by overspending on system resources for decades.

      not only poorly performing, it’s poorly designed. This is the root cause. Logging from C# app to a relational, transactional database? That’s idiocy. Perfect example of technical debt.

      Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.

Menu
{"cart_token":"","hash":"","cart_data":""}