
I was cleaning out a folder of presentations and I realized there are a whole slew of problems that we had to worry about ten years ago, and they just hardly ever come up today.
- Virtual Log Files (VLFs) – it’s been years since I’ve run across a server where this was an issue. I don’t know if people are just doing a better job of taking care of this, or if the widespread adoption of tools like sp_Blitz means that people know about it earlier, and fixing it before they call me. Either way, great!
- Partition Alignment – remember when you had to pay attention to how drives were formatted? Remember when you even knew what kind of storage you were running on? These days, between virtualization and the cloud, the underlying storage architecture could change several times over after you install the server.
- PFS/SGAM contention – between the better newer SQL Server installers creating a few TempDB files by default, and the widespread user education about 1 data file not being enough, it’s easily been 2-3 years since I’ve solved a problem by adding TempDB data files.
Hallelujah! Here’s to those problems never coming back.
On one hand, it’s not like our jobs are getting easier, either: there’s just a whole new class of challenges to deal with, whole new layers of abstraction.
But on the other hand, it is kinda disappointing what old problems we still have, and I’m going to link to the feature feedback requests to fix ’em:
- Cost Threshold for Parallelism – it’s still 5, which leads to problems with CXPACKET waits for tiny queries. Just last month, I was dealing with a customer’s 32-core server where they’d neglected to change this default setting. Just that one change alone dramatically cleaned up their performance problems.
- Configure backups during installation – you, dear reader, are a trained (maybe) professional (look, I’m being generous) who takes great care (I know, it’s getting hard to suspend your disbelief) in setting up backups to match your RPO and RTO. Many shops don’t have the luxury of employing a good-looking data person like yourself, though, and they struggle setting up their backups. It’s time to make that easier.
- Proactively look for corruption in the background. Out in the real world, far too many folks just don’t take the time to set up DBCC CHECKDB jobs, and even when they do, they don’t know when to schedule ’em because they’re afraid of taking their servers down. SQL Server knows when it’s idle, and can track the progress of CHECKDB. It needs to regularly scan for corruption just like RAID controllers do.
25 Comments. Leave new
My 2 cents:
PFS/SGAM contention – Funny, I fought this last year on a server with 16 tempdb files. I guess developers can break anything you wish. But to be fair most people don’t realize what can become a problem until it hits production. And I blame developers jokingly, because I let this code reach production in the first place.
Cost Threshold for Parallelism – I would add Max Degree of Parallelism. One query goes parallel and uses all CPUs? No thank you. Default should not be 0.
Good news! After 20 years, Microsoft fixed MAXDOP: https://www.brentozar.com/archive/2019/05/setting-maxdop-during-sql-server-2019-setup/
AFAIK a query goes serial (1 CPU) or parallel (MaxDOP setting), but there’s nothing between those numbers. I would like to have a dynamic MaxDOP based on query complexity
PFS/SGAM contention- is also addressed by Micrsoft by implementing Memory optimized tempdb meta data.
There is a great session from Pam Lahoud, about temp db fixes over the years.
https://www.mssqltips.com/sqlservertip/6230/memoryoptimized-tempdb-metadata-in-sql-server-2019/
and there is one more by Pinal Dave
https://blog.sqlauthority.com/2020/01/29/sql-server-number-of-tables-memory-optimized-in-tempdb-in-sql-server-2019/
Right, but that’s in 2019. I don’t even get a lot of customers on 2019, and I still don’t see the problem. Folks really started using multiple files, which is great.
I guess I need to start dreaming about 2019 then 🙂
When did you last think about the interleave on your hard drive?
My what?
It was in the before time – and, I rest my case!
VLFs: I recently came across a server for a legacy system with over 200 databases in the instance. Most had over 200 VLFs, with some almost at 700. This was far from its biggest problem, but at least this one is now fixed and the AutoGrow settings are far more reasonable. [Yes, I know we should not rely on AutoGrow but I’m the only DBA for over 800 databases so I have to prioritize.]
Yeah, 700 isn’t bad. I don’t think we even alert you in sp_Blitz until you hit 1,000, and even then we’re really just doing it so you can proactively fix it before it really becomes a problem (in the 10k range, you can notice a hit.)
On the “SQL Server knows when it’s idle” thing… absolutely correct! It knows that our prod server is never idle. 😀 Yeee-haaa!!!
Listen, you gotta stop that script that says WHILE 1 = 1 ALTER TABLE REBUILD;
BWWWAAAA-HAAAAA-HAAA!!!! Way too funny! You know that I mostly use the “Brent is sucking bong water through two straws” method (remember?). 😀 Works fine, fails safe, a drains to the bilge. 😀
To be sure (the written word is frequently mistaken), that was meant as a joke. You remember my posts about me adopting the mostly hands-off method that you spoke of way-back-when.
Hahaha, yeah, I knew you’d like that one, hahaha!
I see many servers with the power management set to balanced these days instead of high performance.
PFS & SGAM contention was the first time I saw you present. It was a lightning talk at Summit 2012, you presented it dressed as Bob Dylan and all I was thinking was “who is this guy?”
YOU WEREN’T SUPPOSED TO KNOW IT WAS ME – IT WAS REALLY BOB DYLAN
What about default fill factor = 0, which means actually = 100?
Stupidest settings which causes lots of troubles – still persists.
Sergiy – 0 and 100 are the right defaults. To learn why, watch this: https://www.youtube.com/watch?v=iEa6_QnCFMU
Brent: That talk was SO helpful! Thank you so much!!! You didn’t just answer why 0/100 is the best default, but you also covered what we should be doing every night and gave us tools to do it. I got sp_Blitz yesterday and have been going through the results ever since. We just got a new server. Setting things up right from the start is just such a nice feeling. Thank you for making that amazing tool available for free.
Susan – awesome, glad you liked it!
1hour 28 minutes – seriously?
First 5 minutes contained 0 useful information to me. Can’t make myself to continue listen to that.
Can you outline in short – why do you think they are right?
P.S. the idea of 0=100 is the first clue it’s wrong.
P.P.S. 2nd clue – those are marginal values.
I’m probably not a good fit for you. Best of luck on your learning journey. Thanks for stopping by, and take care.