There’s no Latin word for robot
I’m all for automation, especially when it comes to the boring stuff. As good as our automation routines for that stuff are, I still see people mess them up.
Right now, there’s no Clippy/HAL mashup glaring at you, telling you that what you’re doing is a bad idea.
You wanna take one log backup a day right after your full backup? Done.
You wanna rebuild and then reorg and then update stats for every index? Say no more.
You wanna set your database to single user mode to take a full backup? Okie dokie.
You wanna stop taking log backups between 8pm and 8am? No worries.
And, hey, you wanna set all this up to run without warnings or notifications for failures, long-running processes, or the dreaded TOO FAST process? I’m not your mom! G’head, slugger.
(Next time you see me, ask me about the time a three-hour import process started taking three seconds.)
Most of our client engagements start with us running our free scripts as part of a health check.
Along the way, we look at maintenance tasks, and pretty often we find that something important is failing. Backups are the most common culprit.
There are some common reasons for this, too:
- Drive filled up
- SAN guy changed a path
- Permissions got hosed
The important thing here: no one was aware of it.
Automation is great, but it’s only as reliable as its monitoring.
Set it and forget it
The trouble I find with most automated processes is that no one is checking in on them.
The automation layer, thankfully, rescues you from having to wake up every X minutes to take a log backup, stay up until midnight to take a full backup, etc. These are all noble ends.
But you do need to verify that automation is working as expected once in a while.
The first thing you want to know about is failures. In SQL Server, that’s easy enough: set up SQL Server Agent job failure emails.
“Stats updates failed with something about dbo.Sort and tempdb, and we got alerts that the T drive is down to 46K of disk space.”
The second thing you want to know about is whether a job has been running longer than usual.
“Say, why’s CHECKDB running for 8 hours? Usually it’s done in three.”
The third thing you want to know about is whether jobs are finishing much more quickly than usual.
“Did one of you forget to change that PRINT to an EXEC in dev?”
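To make those three checks concrete, here’s a minimal sketch in Python of the logic a monitoring job might run against each Agent job. It assumes you’ve already pulled recent run durations (in seconds) for the job from somewhere, such as SQL Server Agent job history in msdb; that gathering step isn’t shown. The function name `check_job` and the `slow_factor`, `fast_factor`, and `min_history` thresholds are hypothetical, picked for illustration rather than taken from any real tool.

```python
from statistics import median


def check_job(name, elapsed_s, history_s, finished, succeeded=True,
              slow_factor=2.0, fast_factor=0.25, min_history=5):
    """Return alert messages for one job run (hypothetical helper).

    elapsed_s: seconds the run has taken so far, or in total if finished.
    history_s: durations in seconds of recent successful runs of this job.
    slow_factor / fast_factor: assumed thresholds relative to the median.
    """
    alerts = []

    # 1. Failures: the easy one, and the most important.
    if finished and not succeeded:
        alerts.append(f"{name}: FAILED")

    # Without enough history there is no "usual" to compare against.
    if len(history_s) < min_history:
        return alerts
    typical = median(history_s)

    # 2. Much longer than usual. This also fires for a run still in flight.
    if elapsed_s > typical * slow_factor:
        state = "ran" if finished else "running for"
        alerts.append(f"{name}: SLOW, {state} {elapsed_s:.0f}s vs typical {typical:.0f}s")
    # 3. "Succeeded", but suspiciously fast (the PRINT-instead-of-EXEC case).
    elif finished and succeeded and elapsed_s < typical * fast_factor:
        alerts.append(f"{name}: TOO FAST, {elapsed_s:.0f}s vs typical {typical:.0f}s")

    return alerts


if __name__ == "__main__":
    checkdb_history = [10_800, 10_500, 11_000, 10_900, 10_700]  # about 3 hours
    print(check_job("CHECKDB", 8 * 3600, checkdb_history, finished=False))
    print(check_job("Import", 3, [10_800] * 5, finished=True))
```

Comparing against the median rather than the mean keeps one freak run from dragging “usual” around. Checking elapsed time on jobs that are still running is what lets you hear about the 8-hour CHECKDB before it finishes, not after.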
Which brings us to performance
Automatic plan correction can force the last known good plan when a query regresses, but I still haven’t gotten much of an answer to “what if performance was never good?”
SQL Server’s missing index requests are plain daffy sometimes. Without having seen the source code, I’m willing to wager that those requests (or some DTA-ish mechanism) are behind the index A/B testing that goes on in Azure.
The automated tuning mechanism in general will only give you the best of all potentially bad plans.
Batch mode memory grant feedback will give up if it can’t find a middle ground (and currently requires columnstore to work).
Batch mode adaptive joins require… well, a join. You don’t even need one of those to make parameter sniffing happen.
I look forward to the day when there’s a process that will explore neat indexing tricks, query rewrites, temp tables, computed columns, and expanded SARGability, but that’s a long way off.
If you told me I could close SSMS today because there were no problems left to solve, I’d happily go do something else.
It’s not me, it’s you
When people talk about <insert role here> being dead, they often mean that the kind of <insert role here> they were is dead.
It says more about them than it does about you and the job you do.