The human body is pretty good at adaptation. When you get sick or injured, your body responds and (most of the time) heals itself up. When you stress your body in a certain way, over time, your body will feel less stressed by the same level of that activity. If you do 10 push-ups every day, after a week you’ll barely notice you’re doing them.
One of my favorite sources for information about human body training is Mark Rippetoe. Perhaps it’s a bit of confirmation bias, because I like to eat, sleep, and not do cardio. I do like to lift something heavy a few times in a few different ways, then come back (after eating and sleeping and not doing cardio) and lift something a little heavier. That’s sensible training. To me, anyway. Again, confirmation bias.
If I were a runner, I’d want to try to run some distance faster, or be able to run a longer distance.
If I were into CrossFit, I’d try to juggle kettlebells while riding a unicycle on a balance ball and deadlifting a barbell I’m standing on so next time I can fight a truck tire and drag a fridge up a ramp while wearing a fire hydrant for a hat. Or something? Cool shorts, though.
But that’s enough about the human body. It’s all gross inside. Gross and smelly.
What if your SQL Server adapted?
Azure has something a little bit like that. It will monitor missing index recommendations, test the changes, and then decide whether to keep them or not based on some further analysis. But this is more like a doctor trying different medications on you until one works.
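For the curious, Azure SQL Database exposes a knob for letting the doctor experiment (syntax as of current Azure SQL Database documentation; check what your tier supports before relying on it):

```sql
-- Let Azure try index "medications" on its own: it creates the index,
-- measures the result, and reverts it if things got worse.
ALTER DATABASE CURRENT
SET AUTOMATIC_TUNING (CREATE_INDEX = ON);
```

That covers missing index recommendations, not the hardware underneath.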
There’s also, somewhere in the future, a feature called Adaptive Query Processing coming to a vNext near you. But again, this isn’t training. This is more like you saying “doctor, it hurts when I do this” and the doctor saying “stop doing that”. Even a flatworm turns away from pain.
This also… is not training.
This doesn’t help your CPUs if your app takes on lots of new users, who start running more queries. This doesn’t help your disks as users start committing more transactions and reading more data. It doesn’t help memory when your data doubles or triples past your currently allotted RAM. It doesn’t help tempdb when the 50 GB drive you gave it is all of a sudden 150 GB too small.
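If you want to know how far away from “150 GB too small” you are right now, one quick sanity check is the tempdb space DMV (counts are in 8 KB pages, hence the arithmetic):

```sql
-- How much room is left in tempdb's data files?
-- Page counts are in 8 KB pages; * 8 / 1024 converts to MB.
SELECT SUM(unallocated_extent_page_count) * 8 / 1024 AS free_space_mb,
       SUM(total_page_count) * 8 / 1024 AS total_size_mb
FROM tempdb.sys.dm_db_file_space_usage;
```

But note what this gives you: a number. Not an action.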
Your server faithfully soldiers on with the same hardware, at the same level, with absolutely no adaptation.
Just sits there. Does not change. Despite the fact that you stressed it (workload), and fed it (data), and it got some rest (users went to bed), it did not change.
It had the entire feedback cycle it needed, but it did nothing to react. That part is still up to you.
What would adaptation look like?
The first thing to figure out is where and how this is possible.
Physical servers are pretty much out, but it could work if you had a several-node AG and some kickass ROBOTS that had access to shelves of compatible hardware. You’d also have to trust some internal process to fail over, take a node offline, replace hardware, bring it online, and so on down the line. If you think this sounds ridiculous, you’re right. If you don’t, you might be one of those phone company employees who thought the internet getting popular meant everyone was going to need two phone lines.
Physically situated VMs are… okay. But if you underpowered your host(s), you’re stuck with the same kind of scenario as above. You can’t give a VM more RAM than exists in the host. Again, ROBOTS! And ridiculous.
The cloud is the only place this is really doable. There’s just gobs of hardware and redundancy already.
No offense to AWS or GCE, but Azure is sort of a natural fit. It’s Microsoft, and they can stick their fingers in the source code to add whatever additional feedback a server would need to adapt to workload changes. Granted, any of them could use wait stats and other DMVs, or even Extended Events, and probably do a really good job of automating self-tuning hardware. It would still be a process, but the sweet part is that you could spin up a new VM with powered-up parts wherever deficiencies are detected and fail right over to it. That part of the cloud is spectacular.
Of course, having read some Microsoft RAP reports, I’d be a little nervous about them just making any ol’ change to my prod server.
“What did we find?”
“By God, you know what to do.”
[Sets stats to update async for msdb]
And yet I digress! You probably wouldn’t want this to kick in for a momentary spike. You’d probably want some period of sustained load around 5-10 minutes before the ROBOTS started organizing to upgrade something. Much in the same way you don’t want to fail your entire AG over for a 10 second network blip. You’d want an “Is Waffle House Open?” strategy. You don’t want a reaction until something serious is happening.
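A sketch of what that “Is Waffle House Open?” check could look like, using two samples of scheduler queuing taken five minutes apart (the threshold of 8 runnable tasks is completely made up for illustration — pick yours from your own baselines):

```sql
-- Hypothetical sustained-pressure check: sample runnable task counts
-- twice, five minutes apart, and only react if BOTH samples show queuing.
DECLARE @first INT, @second INT;

SELECT @first = SUM(runnable_tasks_count)
FROM sys.dm_os_schedulers
WHERE status = N'VISIBLE ONLINE';

WAITFOR DELAY '00:05:00';

SELECT @second = SUM(runnable_tasks_count)
FROM sys.dm_os_schedulers
WHERE status = N'VISIBLE ONLINE';

IF @first > 8 AND @second > 8
    PRINT 'Sustained CPU pressure. Wake the ROBOTS.';
```

One blip, no reaction. Two samples in a row, and something serious is happening.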
Any monitoring tool could feed these metrics out, too.
Unless you’re hitting THREADPOOL or some RESOURCE_SEMAPHORE* waits. Then nothin’ is feedin’ nothin’ nothin’.
Which just might be a good time to start thinking about the ROBOT stuff.
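Checking for those starvation waits is at least cheap. The numbers here are cumulative since the last restart, so trend them over time rather than eyeballing the raw totals:

```sql
-- Nonzero THREADPOOL waits mean worker thread starvation;
-- RESOURCE_SEMAPHORE* waits mean queries are queuing for memory grants.
SELECT wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = N'THREADPOOL'
   OR wait_type LIKE N'RESOURCE_SEMAPHORE%';
```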
You also may want to set some thresholds for the ROBOTS to respond to. There are all sorts of business rules that could dictate automated hardware increases: a jump in user counts or data size, a code release, onboarding a new client. The possibilities are… possible. Sorta. I think there’s some “machine learning” fad kicking around that could help with this kind of thing.
We all know how that ends though.
Stuff like this is fun to speculate about. It’s like IT science fiction. Which I guess is regular science fiction. Trans-serverism? I don’t know.
It’s probably already in the works somewhere, maybe under a secret patent.
Sorry if I spoiled anyone’s big reveal by writing this.
Thanks for reading!