SQL Server ships with sane defaults, right? Well, sort of. SQL Server ships with pretty sane defaults for a single CPU socket server or for a development work station. Honestly, the defaults are good enough for many things, but there are a few gotchas. Changing these settings is quick, relatively painless, and can result in all around stability and happiness.
Maximum Degree of Parallelism
By default, SQL Server will use all available CPUs during query execution. While this is great for large queries, it can cause performance problems and limit concurrency. A better approach is to limit parallelism to the number of physical cores in a single CPU socket.
If you’re saying “What?” right now, don’t feel bad. Basically – on a SQL Server with two CPU sockets and a 4 core CPU with hyperthreading off, the
maximum degree of parallelism option should be set to 4.
Don’t believe me? Check out Recommendations and guidelines for the “max degree of parallelism” configuration option in SQL Server.
Cost Threshold for Parallelism
Speaking of parallelism – what about that cost threshold for parallelism setting? The default is set to 5, is that a Good Number®? Not so much.
The optimizer uses that cost threshold to figure out when it should start evaluating plans that can use multiple threads. Although there’s no right or wrong number, 5 is a really low setting. It’s appropriate for purely OLTP applications, but as soon as you add a modicum of complexity ka-boom!
I recommend starting with this setting at 50 or so and tuning up or down as appropriate. Make sure you measure for the critical queries in your application, of course.
Instant File Initialization
Technically this is a Windows permission, but whatever. By granting the “Perform volume maintenance tasks” permission to the SQL Server process, we give SQL Server a boost when it comes time to grow out data files.
The default behavior in Windows write a bunch of zeros whenever a user asks for space. If I create a 500MB file, Windows will write 500MB of zeros to disk to properly initialize the file.
By giving the SQL Server process these extra permissions, SQL Server can use a different API when requesting space for data files. These permissions tell Windows to mark the space as used and immediately hand it back to the calling program. In short – you can grow data files faster (which includes TempDB on system start up).
Free compression! If you’re using SQL Server starting with SQL Server 2008R2, you can flip a check box and get yourself some backup compression. This is a no brainer, really – backups are smaller, they take less time, and the restores even take less time. What’s not to love?
The Remote DAC
No, I don’t mean a DACPAC, I mean the dedicated administrator connection. When you connect through the DAC, SQL Server provides a dedicated connection, CPU scheduler, and memory; all of which make it easier to troubleshoot when SQL Server’s gone completely crazy. By default you have to be on the SQL Server’s desktop either physically at the console or else remotely over RDP. When I’ve got a runaway SQL Server, I know that I don’t want to wait for a sluggish GUI to respond to my commands.
Enter the remote dedicated administrator connection.
By enabling the remote DAC, a DBA can make use of the dedicated administrative connection from any computer that’s allowed to connect to the SQL Server – from your workstation, bastion server, or even your home PC over the VPN. It doesn’t matter, as long as you’ve got rights. Remote troubleshooting systems that are stuck at 100% CPU utilization is much easier when you’ve got dedicated CPU resources and memory at your disposal.
What are you waiting for?
Only one of these changes (Instant File Initialization) requires a restart, so get out there and start switching things up. (Your mileage may vary.)
Interested In Learning More?
In our classes, we cover the SQL Server settings you should watch and the tools that will identify even more settings to change. Read more about our training!
Good advice to be sure! I would also add the max server memory setting to your list of must change options. Obviously there are others and but where do you stop???
I do have a question about Cost Threshold for Parallelism. I always had the seat-of-the-pants feeling about 5 being too low, so I have used 10 as my default for the last 4 years. I can’t really defend that number but things have been much smoother since the change. I have a mixed bag of OLTP / heavy batch file processing. I realize that this is a “your mileage may vary” situation but can you elaborate on the 50 setting as being a good starting point?
Sure thing! I recommend 50 for Cost Threshold for Parallelism for a few reasons, but the big one is that 5 was the recommendation back in 1999 when having two CPU sockets made you a king among people deploying Windows. Here in the future, we can buy a cell phone with a quad core chip on it.
50 is an arbitrary number, but it is significantly higher than the default. The upside is that a cost threshold of 50 gives you some wiggle room after you’ve convinced management to let you make the change. If things get worse you can always cut it in half and still be better off than you were before.
I didn’t recommend setting max server memory simply because most people need to buy more memory 😉
So Jeremiah, just to be clear, you recommend leaving max server memory at the default setting?
Just to be clear: no. I merely said that most people have bigger fish to fry.
That is clearer. Thanks.
Just to prove the old adage, test test test, 50 proved too aggressive for me for a real case this morning (which is perfectly OK and expected). Funny thing was, there was a plan that really needed to be parallel and wasn’t. Its cost? 49.2. No joke 🙂
Not settings the max server memory is not good practice. By rule you should set the max memory to 85-90% of total memory leaving 10-15% free for non-sql server operations.
Well, the “rule” does vary by who you ask. In our setup guide, our rule of thumb is to start with leaving 10% free OR 4GB, whichever is greater (because on small instances it may need to be a larger ratio). Other people have more complex rules, and those can be fine, too.
So we do agree that setting it is good, I suspect Jeremiah was just explaining why it didn’t make his top 5 list in this post.
Thank you for the suggestions.
Just a comment: changing sp_configure options wipes out the plan cache, so, even though a sql restart is not required for most of these options, the changes recommended should still be planned carefully in a prod environment: perhaps at the start of a slow period, such as a weekend.
Just add the OPTION (FORCE PARALLELISM) hint!! Oh, wait … we don’t have one of those! 😀
Nice list Jeremiah! 1-4 are super important and pay dividends right from the get-go. 5 is one of those gotta-have-it-when-you-need-it-but-you-hope-you-don’t kinda things.
I really think the list should include a 6th item: Default file growth increments. 1MB for data files and 10% for tlog files has devastated almost every client I see’s IO response time and throughput capabilities. Having hundreds of thousands or even millions of 1MB file chunks is going to hurt your IO performance, I don’t care how many disks on the SAN you have backing up your LUN! 🙂
That’s a great point about the file settings, thanks for adding that.
Great points….How about adding “Optimize for Ad hoc Workloads” to the list if we are using 2008 or above…?
Also, If I recall correctly, Backup Compression is an Enterprise Only feature for SQL Server 2008. I think MSFT made this available to Standard edition as well starting 2008R2…
I can’t recommend “optimize for ad hoc workloads” as a blanket recommendation. While this can control the overall growth of the size of the plan cache, it doesn’t control growth of the count of entries in the plan cache. Since search time in a hash table is O(n) (it grows linearly with hash table size), this can become a big problem when you have a lot of ad hoc queries and a high query volume. Basically – you can end up performing enough plan stub lookups to cause performance problems.
Compression is an EE only feature in SQL Server 2008 but is free in Standard starting with 2008R2. That’s why I mentioned “If you’re using SQL Server starting with SQL Server 2008R2…”
I think I just had a flashback about how ‘Optimize for Adhoc Workloads’ burned me once. That was not amusing.
Kendra, can you elaborate? The “optimize for adhoc workloads” option seems pretty harmless, but knowing possible caveats would help me choose it more carefully.
I’d love to hear more about this too!
Good news! You already heard about it the comment Kendra replied to.
Optimize for Ad Hoc can end up causing huge wait types when you’re looking up plans when you have a sizable plan cache.
Let’s say you have a table and all reads on the table are performed on an indexed BIGINT column. Queries from the application are unparameterized and come in as fast as SQL Server can process them.
In this situation, you can end up with hundreds of thousands (if not more) plan stubs in the execution plan cache. Whenever a new query comes in, SQL Server has to check to see if that plan has ever been seen before. This lookup operation isn’t free, neither is inserting a plan or plan stub into the cache.
As the plan cache grows, you can expect the time it takes to perform a lookup or insertion to grow linearly with the cache. Busy server + optimize for ad hoc = LOLSLOW.
I have had issues with slowness from non-cacheable plans before on a proc that was run about twice a second that brought a good sized server to its knees. However, that was at a point in my career where I was clueless as to using wait types to their full potential.
What wait types would you expect to see in both my case and the case of ‘Optimize for Adhoc Workloads’ gone wild?
BTW, from what I remember, the non-cacheable proc created a temp table, loaded data into the temp table, then created indexes on it. We switched the order of operations to get it to go to cache.
Can you give some detail about how searching through hundreds of thousands of plan stubs is going to be worse than searching through hundreds of thousands of actual plans (which will be much larger and steal away much more buffer pool memory)? I would guess there would be fewer to search through because the real plans are larger, but then you have to balance that with the overhead of storing the new big plan, more flushing out of unused single-use plans and the loss of buffer pool memory leading to more IO which is really slow.
Given the number of people I respect recommending turning on optimize for ad hoc workloads (Tim Chapman, SQLSkills, Bob Pusateri, Grant Fritchey, etc, etc) I would consider it an “edge case” to not want it on.
Note that with ALL of the changes mentioned on this post the reader is obligated to TEST on THEIR SYSTEM to validate that they are making a change for the better!! 🙂
Kevin – that’s actually an interesting question. Often when I see MCM-level people recommending things that *they* would do to *their* servers, I take a step back and ask who their target audience is.
For example, on any server I manage, I like to have a particular vendor app that does a phenomenal job of correlating plan cache changes to database object changes. However, it takes a rocket surgeon to interpret the user interface. I’ve tried getting clients to install that tool, but when they get it, their eyes glaze over.
I’m an MCM – but the way that I manage my systems is different than how I’d recommend the public manage their own systems. I make changes and take risks by default, but only because I know I’m monitoring things really closely, and I can detect when there’s a problem. (I’m sure you’re the same way.)
Another example – out of all of the race car drivers I respect, none of them have airbags in their race cars. Should I take the airbag out of my car too?
Now, reread this post with the context of the reader, not someone whose email address is TheSQLGuru. 😉
Kevin, my take is this – if you don’t have a reason to make a change, you shouldn’t be making it. The settings that we’ve talked about changing all have clear reasons to make the switch.
I can’t make a recommendation without justification. Just saying “turn on optimize for ad hoc because… reasons” doesn’t fly. If there’s a good reason to flip the switch, I’ll flip it. But, in general, I try to address the underlying problem.
JP: “I can’t make a recommendation without justification. Just saying “turn on optimize for ad hoc because… reasons” doesn’t fly.”
Wait a minute – that is EXACTLY what you did with this blog post! 🙂 Change CTFP because 5 is too low because…. Change maxdop because 0 can cause xyz problems. Etc. I merely pointed out that many others (respectable others, just like yourself and Brent) have stated that optimize for ad hoc workloads solves a known and common problem the majority of the time. And like I said, one must still test – even with changes such as CTFP and maxdop – because the defaults COULD be optimal for their environment, however low the probability of that may be.
Anyway, enough of this beleaguered horse! We will just have to agree to disagree. Move along people, there’s nothing to see here … 😉
Kevin – uh oh, can you point me to anything I’ve got showing everyone should turn on optimize for ad hoc across the board? I would definitely want to edit that to make it more clear that it’s not an across-the-board recommendation.
Wow….not what I expected here.
Anything above is worth looking into, putting it through Dev, Beta, and load testing. However, if you’re blindly turning these knobs without knowing what they do or how your environment would act, that’s not good.
Microsoft set most of the defaults because they’ll work ok for default installs on small servers because the larger servers will have DBAs testing other settings. There are several settings that should be tested out and changed on most servers, and Jeremiah is pointing out those settings and where he would start out testing them.
If you read a blog and jump to installing it in Prod without knowing what it does or why you’re doing it, please don’t read my blog. In this case they mentioned Cost Threshold of Parallelism which wasn’t on my list of things to look into, but it is now. I did not change that in prod, but I’m going to learn more about it and determine if that’s the best thing for me now.
Brent: I didn’t say you said to turn on optimize for ad hoc, I said you (and Jeremiah) were respectable sources like others who HAD said to do that. 🙂
Steve: I think everyone here has said to test any changes to ensure it is optimal/appropriate for your system!! You are correct that to do otherwise is foolhardy, REGARDLESS of the “respectability” of the source!! LOL
I do disagree with one small point you make though Steve – that Microsoft has set (some) defaults to work well on default installs on smallish systems. Some of these are the values they are simply from inertia, often well over a decade out of date. They just don’t want to accept the “risk” of changing them, even though they would clearly reflect modern hardware/data sizes, etc, much more helpfully. I think CTFP, MAXDOP, 1MB data file growth and likely others fit that scenario.
Yeah, small and out-of-date are about the same recommendations, so that’s probably all it is. Even then, I can’t imagine 1 MB or 10% growth being a positive thing, same as a flat 300 recommendation for PLE that Microsoft is known for.
I feel that this was a great article by Jeremiah with the exception that I wish it was longer. Most of these things are on my new server build checklist, and I’m always adding to that.
I’m still hoping to meet you in person on Friday for the Pittsburgh SQL Server Saturday presenters’ dinner, and will definitely be seeing you this Saturday!
Well, it WAS a post on just FIVE things!! He picked some good ones. And the best thing is that the nice collection of comments he picked up are now part of the blogosphere for future consumption.
AWESOME about meeting you! I will be there with my 8yo daughter Bean, who will be attending here 10th or 11th SQL Saturday! 🙂
It would be great to see a longer list that includes more features. I can’t say Cost Threshold for Parallelism was on my list, and I’m probably missing more.
The only thing to keep in mind with instant file initialization is that it is specifically data files only. Those log files need to be 0’d out before they can be used since there is no EOF marker on them. See Microsoft’s site for a quickie showing it doesn’t work on log files, but you’ll have to dig through Google to find a descriptive reason. Since logs don’t grow too often, you really only notice this when you start SQL with a huge default log size on TempDB.
IFI is tricky like that, thanks for pointing to Microsoft’s documentation.
Steve – for a longer list, check out our SQL Server setup checklist at https://www.brentozar.com/go/setup. Enjoy!
Now that’s what I’m talking about! I’ll have to find time to go through that and compare to my server build checklist, hopefully I’ll have some comments for you and changes for me.
I’d also add “Check Windows Power Management settings” to that list..
Great post! But I’m missing the code to change the settings – would you please be so kind to provide it for the rookies like me? 🙂 Thanks Jeremiah!
Howdy Frank, all of these settings are documented in clear T-SQL (and SSMS clicky clicky) in SQL Server Books Online.
I’d add a quick check that the OS Partitions are properly aligned with the strip size of your disk array. Not as big an issue as it used to be since Microsoft changed the defaults when creating a new OS Partition, but well worth checking just in case.
On the MAXDOP, I’m slightly confused. You’re suggesting limiting the value to the number of cores on a single socket whereas the document you link to suggests
“For servers that have hyperthreading enabled, the MAXDOP value should not exceed the number of physical processors.”
They don’t appear to mention the single socket restriction, have I read either you or the document incorrectly.
Conan – if we just keep adding one more quick thing, then it turns into a book, hahaha. Gotta draw the line somewhere.
Reread that Microsoft knowledge base article carefully – there’s several caveats, like the number of cores in a NUMA node.
The only complaint you’ll have if your blog posts are as long as books would be “There goes my day….” In fact, we may all join together and sign you up for special surprise.
How would the MAXDOP article apply to a SQL Server hosted on an Amazon EC2 instance. Would it be configured via the vCPUs hosted on that EC2 instance?
Short version – it’s complicated and depends entirely on the hardware that your VM is running on (which can change across reboots).
Longer version – MAXDOP in AWS depends on the size of the EC2 instance that you’re using. You can see all of the instance sizes at ec2instances.info. I’ve put together an eBook on SQL Server in AWS that should help with more SQL Server configuration. And, if you’re really curious, you can always use CPU-Z to figure out which CPU is under the hypervisor and attempt to configure MAXDOP appropriately.
Thanks Jeremiah for the quick list. Great information! I would just reiterate what you briefly mentioned and that is TEST and RE-ADJUST as needed!The beauty of SQL Server is that one size never fits all (and that’s why we all have jobs).
A quick add-on question to Gerardo’s question about Hosted SQL Server. What about VMs in general? What should MAXDOP look like on a VM?
Hi Francis, setting MAXDOP on VMs is a pretty big question. One thing to keep in mind is that your VM can be moved between hosts with different core configurations without you being aware of it. That hardware could have completely different performance characteristics. Under virtualization (and in the cloud), we all need to stop micromanaging our SQL Servers and start thinking of them as appliances or plumbing.
Determine the resources you need to accomplish your task and use that as a guide.
I will disagree a bit with that comment Jeremiah. MOST people can stop micromanaging their SQL Servers. But anyone who wants/needs an optimized service (database, web server, file server, BigDataServer, whatever) simply must micromanage said servers to some varying degree.
I think what Jeremiah is saying is that we should pick a size for our VMs and leave it alone. Revisit that decision after the first month, and possibly quarterly after that. Also, make it easy on yourself by having the hosts configured identically.
When he refers to them as appliances or plumbing, that is a very good analogy. You put them in place, leave them alone as much as possible if you want to rely on them working properly, and make changes very rarely and when necessary. If you do this, MaxDOP becomes one of those things you can stop micromanaging and start spending your time worrying about things that need your attention.
This may involve you investing more upfront to set it up to the point that you don’t need to tweak it at every turn, but it saves you time. With money and time being pretty much the same thing, it’s going to be cheaper to put in a solution that doesn’t need a babysitter.
You’re right on the money, Steve. Virtual/cloud servers are resources that should be sized once and then left alone. Setting MAXDOP in an environment where it’s easy to start the day on 3GHz cores and end it on 1.7GHz cores doesn’t make a lot of sense.
In an environment like that it’s more important to worry about business level metrics (orders per second) vs product level metrics (flanging skrills per hectare of whatever).
One caveat about backup compression – if you’re using TDE (transparent data encryption), you’re going to see no benefit from compression. All it’s going to do for you is make your backups take longer, and possibly even produce a larger backup file.
For that matter, I haven’t found a third-party tool that will compress a backup from a TDE-enabled database either…
That’s a feature of any type of encryption – it renders compression useless by transforming plain text into meaningless garbage.
For many people, they won’t be using TDE. For those that are using this Enterprise Edition only feature, they probably have enough capital available that it’s possible to overcome the lack of backup compression by adding additional files and I/O paths.
Re the comments about changing initial-size and auto-increment size. I’d like to agree, and in fact used to change those defaults (on Model DB). However, I got burned a couple times by app-vendor installation routines that failed. If they hard-code a size that’s smaller than the system default, their install routine will flat-out fail.
Does your recommendation of setting Cost Threshold for Parallelism apply to both physical machines and VM’s? I’m specifically interested in a VMWare VM. Does it depend on host settings at all?
No difference there – an expensive query is an expensive query.
Ok, great. I figured that was probably the case here but thought I’d double check since we’re just getting our feet wet with the caveats of virtualizing SQL Server. Thanks!
Just curios, I have came across this VMWare best practices pdf where the maximum recommendation for [cost threshold for parallelism] is 25, half of what you recommend. Their [MAXDOP] recommendation is also fairly different to this article.
Please give me your views on this.
I have a lot of views on that document and there are far too many to write in a simple blog comment. In the case of that article, VMware are specifically targeting large, dense, VM farms – 1,000 hosts were involved with 10,000 powered on VMs, each using 16 vCPUs and 26GB of memory. I think you could agree that VMware are using a non-standard configuration for this document. As a general set of SQL Server guidelines, I would leave this PDF alone.
Even for virtualization, I can’t recommend following these guidelines.
Completely agree. Thanks for the feedback.
I have revisited this article frequently to make sure I’m choosing reasonable settings as I start troubleshooting performance problems. I have a SQL 2005 server that’s running on a Win2k3 virtual machine. It has 4 CPU, but the config is 4 sockets, each with one core. If I understand your MAXDOP guideline, I think my MAXDOP setting should be 1. Does that sound right?
Thanks – you guys rock.
Bil – I assume you’re talking about a VMware virtual guest, in which case the number of cores and sockets is a bit of an abstraction. Your VMware administrator can set your 4 virtual cores to be either 1 socket with 4 cores, 2 sockets with 2 cores, or 4 sockets with 1 core each. It doesn’t line up to real processors at all.
Thanks, Brent. Trial and error, here we come! At least it’s not PROD. Oh, wait…it is PROD.
Hello, Thank you for your good advice. I agree with all settings recommendations.
Regarding the MAXDOP, I would say that according to the Microsoft KB you should to check how many cores per NUMA Node you have, and then you should set the MAXDOP to this maximum value or less. I mean, the constraint is not the cores per physical socket but by the numer of cores per NUMA node.
KB reads as follow:
For servers that have NUMA configured, MAXDOP should not exceed the number of CPUs that are assigned to each NUMA node.
You’re right, but I won’t be changing the post. Only AMD chips ship with multiple NUMA nodes per socket and I can’t recommend running AMD chips based on recent benchmarks.
Well, an admin could manually configure NUMA (and it isn’t “physical cores” then obviously), but then they best damn well know way more than what 5 things they need to configure on their SQL Server!! 🙂
And Jeremiah is definitely correct about AMD chips. Search out blog posts by Glen Berry on some very good advice on which Intel chips to pick for various SQL Server hardware configurations (and why).
So I am in a VMware environment using SQL 2008R2 Standard. The processors are configured to be presented to the OS as 4 “Physical” CPUs with 2 Cores (8vCPU). I have 8 CPUs showing in NumaNode0. From reading this post I believe I should set the MAXDOP to 8, but a little voice is saying maybe 2. Can someone point in the right direction.
You need to enable NUMA in VMware. This is detailed in the “Performance Best Practices for VMware vSphere” guide – just search for that phrase and the version of VMware that you’re using. From the 5.5 guide:
Just remember that changing this could cause SQL Server to see certain configurations as “illegal” and take some CPUs offline (e.g. 8 x 1 core sockets). Regardless of NUMA, configure SQL Server so that parallelism lines up along physical
I’m in an IAAS environment where I don’t know the actual physical processors being used to serve up the 8 vCPU, so I’m not sure I can “configure SQL Server so that parallelism lines up along physical processor boundaries”.
You can use a tool like CPU-Z to determine the underlying CPU architecture, which will give you better insight into the actual number of cores available to you.
I would just like to bring to the table one more setting that may be worth looking at and perhaps changing: the Network Packet Size.
There are lots of posts out there on why to change the default setting OR NOT, but in my case, I am using SSIS with the Attunity connector to import 50+ GB of data each night from an Oracle DB into SQL Server 2008R2. After switching from the dafault 4096 to 8192, the insert time of my SSIS packages decreased with 25% in average. Of course, I tested it first in a DEV env and only then applied the same changes in PROD which I monitored on a daily basis for a couple of weeks.
That sounds like a fairly specialized case where you saw network contention. Changing packet size can have some far reaching consequences, especially if done across the board in the entire networking stack. There are also considerations to be made for jumbo frames. It can prove beneficial, but it’s not something I’d recommend as a default change. Glad to hear it worked for you.
So than what is the different between max degree of parallelism & cost threshold of parallelism ?
Will both settings need to change accordingly.
Max degree of parallelism is query use 2 CPU if it set to 2 correct ???
Ankit – you can learn more about MAXDOP and CTOP at https://www.brentozar.com/go/cxpacket – enjoy!
I’ve got “Max Degree of Parallelism” set to 4, but consistently see thread counts (with same session ID) much higher than that; right now I’m looking at 36 threads executing the same statement. What is going on here?
Parallelism is trickier than you’d think. Take a look at How it Works: SQL Server Per Query Degree Of Parallelism Worker Count(s) for a great explanation of why you’re seeing more threads than you expect.
Thanks! I can see where that is going.
Awesome, glad it helped. The article does a better job than I would do explaining it in a comment.
I know this is old but there is at least one caveat I’ve come across regarding enabling backup compression (and it aint CPU usage).
Some BI software reads log backups and requires them to be uncompressed, such as Oracle GoldenGate.
Yup, thanks for pointing that out. It’s always important to read the manual.
What would be a good way to measure improvement or degradation of performance when changing MAXDOP and Cost of Threshold for Parallelism? We are planning on setting our MAXDOP to equal to the number of physical cores in one NUMA node and Cost of Threshold to 50.
The one way I can think of right now to through wait stats. But I do understand that this will not eliminate CXPACKET waits. What else can I do to tell whether it made improvements?
I would capture a workload with profiler and run it through Qure. Then make your MAXDOP/CTFP changes and run the same workload and use Qure’s ability to compare profiler runs. Awesome FREE tool from DBSophic.com!
Thank you very much for your response. I appreciate it.
Checking out Qure now.
Anyone tried the Jumbo Frames settings for sql server at the SAN Level. Have a heavily used OLTP server and consultant recommened it (EMC 5600) sql 2008r2
You need to be careful and test that you’ll get an appreciable benefit – every NIC, switch, and router needs to be configured for jumbo frames, too.
Personally – I’ve seen mixed results.
It is pretty much always on I think for iSCSI SANs on the dedicated IO NIC(s).
In our case when we cutover from 5300 to 5600 it was not. Changing tonight, struggling with index rebuilds taking forever (25 hours) for a 1.3 terabyte db, using Ola’s script it did only 640 indexes but slower than old SAN
To every rule, there is an exception. We upgraded from SQL 2005 to SQL 2014. The new server was definitatly more powerful than the old one, but mostly in terms of memory and disk array. We had the same number of CPU/Core licenses as the old machine, but at a bit faster speed.
One of our larger reports that gets run frequently was taking almost twice as long on the new server. When I looked at CPU usage, I saw that one of the cores was pegged and the other 7 were idle. On the old machine, all 8 cores would be busy. I lowered the threshold from 5 to 4, and now all 8 cores are busy and the report runs in about 60% of the time it took on the old machine.
So: you want to raise the Cost Threshold for Parallelism, except for when you want to lower it.
If a large report is getting an estimated cost below 5, something is really wrong… Check the plan and the estimates vs actuals, I’d bet on a statistics or some other cardinality estimation issue.
I agree with Alex’s comment – this sounds like either a stats issue or an optimizer bug. Check out Identifying SQL Server 2014 New Cardinality Estimator issues and Service Pack 1 improvement for more information about finding bugs in the new CE and suggestions of how to work around the bugs in the new CE.
We had to add the 4199 trace flag to the starup parameters. After that, we returned the cost threshold to 5 and performance returned to what we expected.
I have not tried raising it above 5 yet. What exactly is the downside to having it too low? Why would you not want to take advantage of the multiple cores whenever possible?
TF4199 enables the grab bag of optimizer tricks, so you’re likely able to take advantage of some change to the optimizer that will be turned on in a future version of SQL Server.
The downside of having cost threshold for parallelism low is that relatively cheap queries can go parallel. You don’t want a lot of queries going parallel because of concurrency – SQL Server doesn’t perform time sharing in the way that a desktop computer does. Queries will run until they complete or until they wait. If you have 8 CPUs and a query is allowed to use all 8 CPUs and that query doesn’t have to wait for hardware, other queries will have to wait until that initial query finishes.
I think enabling remote dac can be a huge dent in the security for SQL server.
Can you explain a little bit more about how enabling the remote DAC is a security hole?
How do you look at cores in order to set Max Degree of Parallelism for servers running in Hyper-V? Would you still limit it to the physical cores on a single CPU on the host machine or some fraction of logical cores assigned to the virtual machine?
The same rules apply for virtual SQL Servers – don’t allow a query to go parallel in a way that could span NUMA nodes.
So, I’ve turned off NUMA spanning in my Hyper-V setup. I have two CPUs with six cores each. Task Manager shows me 24 logical CPU cores. So I have 6 physical cores / 12 logical cores per NUMA node. I currently have 8 logical CPUs assigned to my database virtual machine (licensed for 8 cores). Would it be best to set MDOP to 6 or 8? Either one will keep it on one NUMA node since the Hypervisor will not allow it to span NUMA nodes. Tha dabase virtual machine has 128 GB of RAM and rarely has to retrieve data from disk once it has been up and running for a few days.
I guess what I’m really asking is how effectively the workload of 8 logical CPUs is spead to 12 threads on 6 physical cores. Is there an advantage to letting it work 8 threads or should I limit it to 6 so each core can just concentrate on one thread?
I recommend reading Recommendations and guidelines for the “max degree of parallelism” configuration option in SQL Server, following their initial guidance, and then tuning for your workload.
Also – each core will only “concentrate” on one thread – a CPU cannot process two instructions at the same time. HyperThreading is a cunning trick devised to process other instructions while waiting on cache, RAM, or other physical resources.
I swear that the guidelines look different than 6 months ago: https://support.microsoft.com/en-us/kb/2806535
So, does this mean that setting maxdop value to the number of physical cores per NUMA is no longer the recommendation? The current KB says to use the number of logical processors if they are under 8…
Am I looking at a wrong KB or has it really changed?
Thank you so much!
You’re looking at the right KB, but I’d validate that the recommendation is actually correct. Typically, enabling hyperthreading sees about a 30% improvement over disabling hyperthreading. I would test at multiple MAXDOP settings which, it turns out, the KB recommends:
Thanks for your response, Jeremiah.
And for Sharepoint, is setting it to 1 accurate? I have just come across articles today that says to set it to 1. I do not know what version of sharepoint is running. I only can see that there is a sharepoint database. will i be able to tell the version based on just the tables?
I’d refer to the SharePoint documentation for the correct way to configure SharePoint.
Can you please help me understand what happened here:
I have a SQL Server 2008 R2 instance with 2 CPUs (4 cores each) – virtual.
I had maxdop set at 4.
I had a query where the engine generated a plan with parallelism. Performance was terrible because a worktable had a TON of scans. The same query (with the same data) on a 2012 instance was fine. Setting maxdop to 2 or 6 caused the scans to disappear and performance as good.
I thought I was done after setting maxdop to 6.
A week later I got another bad query with the same deal (parallelism and high scans). This time I set the Cost Threshold for Parallelism to 50. The engine still used a plan with parallelism, BUT the scans were gone and performance was good!!
You’ve changed a huge number of things about SQL Server here – maxdop, SQL Server version, and cost threshold for parallelism.
In answer to “what happened????” – you changed stuff. All of those things can change the execution that you get. Many SQL Server settings can influence the execution plan chosen by the optimizer.
Thanks for your response! I think I wasn’t as clear as I could have been. I didn’t change the server version. I just tested it on a different server. Let’s disregard that point.
The bottom line is that I changed the cost threshold. The plan STILL had parallelism. However, performance was MUCH better. I’m assuming that is some detail I’m missing.
There’s a lot than can be different. You may have different statistics on the other server, a different amount of memory, or different indexes. Your best bet for an answer to this would be to post both execution plans on dba.stackexchange.com or sqlperformance.com.
If you’re worried about confidential information in the plans, you could use SQL Sentry Plan Explorer to anonymize the plans.
>> I just tested it on a different server. Let’s disregard that point.
How can that possibly be disregarded?? 😀
I meant that point is detracting from what I’m attempting to convey.
Recently I’ve done some testing with a query which pulls quite some data together; on a 4-Numa node server (HP DL580 with 1.5TB memory, FusionIO disks) with 4 cores each.
When I run this query with MaxDOP of 16, it runs 56 minutes. When I run the same query with MaxDOP of 4, it has not finished after having run for 2 hours. So, I think this maxdop advice (# of cores per NUMA-core) needs revision. I’m interested to see what your opinion is, and why.
Jan Willem Borrius
Jan – in order to troubleshoot that, we would need to look at the query plan. This sounds like a great question to post along with the query plans over at http://dba.stackexchange.com.
I disagree that this advice needs revision. The MAXDOP advice is a great place to start, and it applies well for typical SQL Server workloads. A 4 NUMA node system with 1.5 TB of RAM, and FusionIO storage is not typical.
The reason you typically want to limit computation to a single NUMA node is that foreign memory access (reading memory from another NUMA node) can be as slow as going out to disk on my any systems. But that performance varies from server to server. You can verify the timings using coreinfo.
Microsoft’s recommendation of “no more than 8” comes from Amdahl’s Law which, to paraphrase, predicts the maximum speed up by parallelizing a task. Or, to quote Wikipedia, “The speedup of a program using multiple processors in parallel computing is limited by the sequential fraction of the program.”
For most workloads, the original advice holds true. For a few people, it’s different. Some features of SQL Server scale beyond 8 cores. And some operations can take advantage of more cores and more memory.
This is one of those places where you should start with recommendations and then perform experiments to determine the optimum settings for a system.
Why on earth would you be surprised by a 16DOP query running much more quickly than a 4DOP query?!? The guideline for setting MAXDOP to physical NUMA-node cores is simply to try to reduce cross-node memory access, which is relatively slow compared to local. But you are STILL talking microsecond timeframe for memory access and nanosecond timeframe for those extra 12 cores doing processing. Seems obvious it would run faster.
I note that this is exactly the reason we have OPTION (MAXDOP NN) as an override per query.
To answer Kevin’s question, I think Jan’s trying to understand how things work and where variances of best practices fit in. It’s a good learning experience, and I’m glad the question was asked.
Jan, in addition to trying to tune this query, it also sounds like you don’t have hyper-threading turned on with queries that may be able to take advantage of more parallelism. There’s ups and downs to hyper-threading and parallelism across NUMA nodes, so you’d have to test this stuff as much as possible. Like Jeremiah said, your server isn’t typical and some experimentation is going to be necessary. Also, as Kevin said, you can set a single query to use a higher DOP setting where it makes sense while leaving the server as a whole on settings that avoid cross-NUMA inefficiencies.
First of all, many thanks for your reactions. With regard to hyperthreading, we’ve switched that off, because one of the recommendations of the FusionIO people was to do that; also because there are some guidelines floating around the Internet to do so. Since we will go into production shortly, we don’t have a lot of time, so I don’t think hyperthreading will be switched on again.
With regard to the MaxDOP setting at server level; I’m not sure what I’m going to do; some more testing is necessary (this is a server on which we have a datawarehouse of around 2TB compressed; we load data and users are going to query the datawarehouse). I’ll keep you posted.
What about 2 CPUS with 1 core each on a VM. I do see parallelism. The MAXDOP is set to 0. I did set it to 1 and monitor the server and it there was no difference. But server team will not warrant adding more CPUs.
Update: based on some more testing, I’ve now decided to set MaxDop on server level on 4 (= number of cores in NUMA core). It turned out that for simple queries a higher MaxDop level had a detrimental effect (over-parallellisation). For more complex queries, we’ll use the MaxDop query hint.
Is there a script you can run to get the recommended MAXDOP setting for a server?
Adrian – not exactly, but check this out:
Yes there is. http://dba.stackexchange.com/questions/36522/what-is-a-good-repeatable-way-to-calculate-maxdop-on-sql-server
If the server has exactly 8 physical (16 logical) cores. Would you recommend set MAXDOP to 8?
Adrian – check out the link in the post for more details.
You receive requirements for an application that does not support parallelism. What setting(s) would you adjust to support the application?
What does application support of parallelism have to do with SQL Server parallelism settings?
If vendor support hinges on queries not running in parallel, just set MAXDOP to 1. That’s what people do with SharePoint.
I understand what is said in the MS link and the post but would all of this change if I were to run 2 instances of SQL on a single SQL 2014 SP2 cluster? My servers have 2 processors, 8 cores each for a total of 16 cores. (32 Logical Processors) If I set MAXDOP on both instances to 8 will that be potentially assigning all processors to be accessed in parallel if a complex SQL operation were requested by both instances at the same time? Or does MAXDOP refer to logical processors and not cores? Thanks in advance!
Jason – generally speaking, we don’t recommend instance stacking:
Once you start packing clowns into the car, it’s less about performance and more about, uh, clowning.
Brent- Thanks for the reply. I generally do not install more than one instance of SQL on a cluster but in this case it was warranted for various reasons. So in my case where I need to install 2 named instances of SQL on one cluster and given the specs I posted earlier what would you set MAXDOP to? “My servers have 2 processors, 8 cores each for a total of 16 cores. (32 Logical Processors)”
Hey Jason, that’s tough to answer just from the specs.
Why don’t you try using sp_BlitzFirst to gather wait stats on your servers, and see what they’re actually up to?
You may find that you don’t need to change anything at all, or that CPU isn’t what you need to focus on.
Hi Erik… This is a new SQL server and I just wanted to set the initial MAXDOP to the best possible setting before we have load on the server. With the understanding that we may need to adjust it later. We usually set MAXDOP to 8 on our SQL servers that have the same specs with only one SQL instance. So that is my baseline for setting MAXDOP to 8. Thanks again!
[…] In the words of Brent Ozar https://www.brentozar.com/archive/2013/09/five-sql-server-settings-to-change/ […]
“Basically – on a SQL Server with two CPU sockets and a 4 core CPU that does hyperthreading, the maximum degree of parallelism option should be set to 4.”
So if I have interpreted this correctly on a 4 socket, 4 cores per socket setup we should set maxdop to……… 4?
Glenn – yep!
I have been reviewing this document from Microsoft that is mentioned in this article:
This document has been updated a few times after this post. For SQL Server 2016 it says:
“On a Server with multiple NUMA nodes, having less than or equal to 16 LOGICAL processors per NUMA node, keep MAXDOP at or below # of LOGICAL processors per NUMA node”
If I understand this correctly, the directive from Microsoft is to take into consideration the number of LOGICAL nodes (not physical ones) so this is different from what this posts pointed out in 2013.
Is this a case of outdated information or is it me, not interpreting the information correctly?
Ooo, that’s new! That’s cool that they added more info! Neato.
Hi. If I have SQL Server Dev edition on my lap top (16g memory), is this something that I need to be concerned about, since my laptop basically has just one CPU?