SQL Server isn’t lonely, but it needs a dog. In today’s Consulting Lines series post, I’ll share my very favorite consulting line of all – one that makes people laugh while illustrating a point.
The Situation: Bob the Button Basher
My Blitz script helps me quickly spot SQL Server settings that have been changed from the default values. I always work with the client’s staff in the room, and when I see a changed setting, I turn to them and ask:
Me: “Hey, Bob, I notice that max degree of parallelism has been set to 1. The default is zero. Can you tell me why that was set to 1?” (The key here is to not say anything whatsoever about what the setting does – let Bob explain it to you.)
Bob: “Yeah, I read a blog saying that all OLTP servers should have maxdop set to 1.”
Me: “Ah, yeah, I’ve read those too. Did we measure the before and after impact of the change?”
Bob: “No, it just seemed like a good setting to change.”
Me: “Gotcha. Funny thing there – SQL Server is like an airplane. Modern airplanes are so advanced that there’s only two things in the cockpit: there’s the pilot, and there’s the dog. It’s the pilot’s job to feed the dog, and it’s the dog’s job to bite the pilot if he touches anything.”
Bob: “HA HA HO HO”
Me: “The problem with SQL Server is that it doesn’t have a dog.”
Bob: “Wait, what?”
Me: “SQL Server has a lot of knobs in the cockpit, and there’s a lot of advice out on the web that tells us to push all kinds of buttons to go faster. Thing is, SQL Server ships out of the box with a pretty good set of defaults. Let’s use those defaults until we have a really good reason to change something, and when we do, we need to document what we change so we know how it improved performance or reliability.”
What That Line Does
SQL Server really does have a lot of knobs, but every setting – every single one of them, including the defaults – has drawbacks. I salute Microsoft for doing a really good job of weighing the pros and cons and coming up with a solid set of defaults.
In my line of work, I flip a lot of knobs, but for every one I touch, I try to explain thoroughly what I’m doing. I may casually say, “This server needs Optimize for Ad Hoc Workloads enabled,” but when I do it, I also explain what I’m seeing about that particular server’s workload that justifies the change.
Jonathan Kehayias (Blog – @SQLPoolBoy) recently wrote an excellent article about Lock Pages in Memory, one of those settings people just love playing with. Before he posted the article, we had a great discussion about why we have differing opinions on that recommendation. I understand why he recommends using it as a default, but the reason I don’t recommend it is illustrated beautifully by the fourth comment on the post:
“With LPIM, SQLOS will back down its usage to your ‘min server memory’ value. At that point Windows can try and hard trim the other processes, and potentially page the process requiring the 30GB memory out, or if it can’t respond efficiently this way, you may face a crash due to Out Of Memory for the OS.” – Jonathan
Ouch. I’m not a big fan of giving a default recommendation that might crash the OS. There’s also a bug in SQL2008R2 that causes corruption with LPIM. I understand why Microsoft doesn’t enable Lock Pages in Memory by default, and I agree with them – it’s just not a safe button to push if you aren’t intimately familiar with the consequences and how to monitor for them.
This consulting line helps establish whether the other person understands the real ramifications of pushing buttons. If they’re blindly bashing buttons based on blogs, it helps them understand the risks.
What Happens Next
The success of this line depends on your ability to dive deep and explain the concepts. You have to be able to explain the pros and cons of both settings – the default setting and the one they chose. Jonathan’s post on LPIM is a great example – if you can be comfortable explaining a topic to that level of detail, then you should feel comfortable flipping that switch. (If, on the other hand, you’re not even comfortable reading a post with that level of detail, then you shouldn’t be flipping that switch.)
Bob: “So should we leave maxdop set to 1, or go back to the default? We have no CXPACKET waits, so it must be working.”
Me: “Well, that’s true – no queries are going parallel. But parallelism isn’t always a bad thing – sometimes we want to throw lots of CPU power at queries. Right now, we’ve got queries that aren’t running as fast as they could because they’re not getting the CPU they need.”
Bob: “But the server isn’t CPU-bound – CPU % is usually under 10%.”
Me: “That’s because you’re measuring CPU use as a total, not per thread. We’ve only got a few active queries at a time, but we’ve got a four-socket server with 10 cores per processor, and hyperthreading is turned on. Our CPUs are sitting around bored. If we let some queries go parallel, we’ll be able to use those CPUs more effectively. Based on what I’m seeing in your procedure cache, I recommend setting maxdop to 4 and cost threshold for parallelism to 100. Here’s the evidence that supports it, and here’s how we’ll measure the success of that change…”