You’ve got great ideas, don’t lie. We want to help you get those ideas out there… and into our scripts. It’s great to say that our tools (sp_AskBrent®, sp_Blitz®, sp_BlitzIndex®, and sp_BlitzCache™) are as good as they are today because of your help. We’re adding two new ways to make it easier to get a chance to share code and collaborate on some awesome tools.
Starting right now, you can head over to http://support.brentozar.com and submit new feature requests or vote on existing feature requests. You only get so many votes until we implement your feature, so make ‘em count!
This is also your chance to interact with us during the development of the feature.
A number of you have done this already! You’ve contributed a bunch of fixes, enhancements, and new checks for sp_Blitz®.
In the past you’ve had to email code contributions. We’ve made it easier – there’s now a form with a place for you to dump sample code. In fact, your sample code gets turned into a feature request in our version control system which, in turn, emails everyone. That really means we’ll get to your changes faster.
Ready to share your code? Head over to http://brentozar.com/contributing-code/ and get started.
Don’t freak out at the licensing agreement – the lawyers caught wind of what we were doing and said we should get you to sign it. All it says is that you’re the person who owns the code and you’re giving us the right to use your code; you still own the code.
What Are You Waiting For?
There’s an important rule for tuning stored procedures that’s easy to forget: when you’re testing a query from a procedure in SQL Server Management Studio, execute it as a stored procedure, as a temporary stored procedure, or using literal values.
Don’t rewrite the statement you’re tuning as an individual TSQL statement using local variables!
Where it goes wrong
Stored procedures usually have multiple queries in them. When you’re tuning, you usually pick out the most problematic statement (maybe from the execution plan cache) and tune that.
You don’t always want to run all the other queries. Maybe some of them insert, update, or delete records. Or maybe they just take too much time to run while you’re testing.
How Local Variables Cause Trouble
Let’s say the first statement in this stored procedure is the #1 query on our server.
IF OBJECT_ID('dbo.GetCities') IS NULL
    EXEC ('CREATE PROCEDURE dbo.GetCities AS RETURN 0')
GO
ALTER PROCEDURE dbo.GetCities
    @StateProvinceID int
AS
SELECT DISTINCT City
FROM Person.Address
WHERE StateProvinceID=@StateProvinceID;
/* imagine lots of other statements here */
GO
We’ve found the statement’s execution plan in the cache:
I right-click on the plan and find, near the bottom of the plan XML, that it was compiled for the value @StateProvinceID=80.
I want an easy way to test it, so I quickly modify it to run as a single statement, using a local variable:
DECLARE @StateProvinceID int = 80

SELECT DISTINCT City
FROM Person.Address
WHERE StateProvinceID=@StateProvinceID;
GO
And I get a totally different execution plan! The original plan from the cache had nested loops, while now I’m seeing a nonclustered index scan:
This is confusing because I very carefully pulled the value it was compiled for from the original execution plan.
The problem is not the value I’m using for @StateProvinceID. The problem is not my statistics. The problem is local variables.
Local Variables are Weird
Local variables effectively “anonymize” the value being passed in on a parameter in SQL Server. Instead of sniffing the value and using the statistics histogram, the optimizer estimates row counts from the density vector, so you get a plan built for an “average” value. It’s like you’re using Optimize for Unknown all the time.
That makes your local code behave differently than your stored procedure. It’s frustrating and confusing. It can lead you to create the wrong indexes, resort to index hints, or start using RECOMPILE hints everywhere.
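Want to prove that equivalence to yourself? Here’s a minimal sketch: run the same statement as a parameterized query with the OPTIMIZE FOR UNKNOWN hint, which tells the optimizer to use the density vector instead of sniffing the value, and compare the plan to the local-variable version:

EXEC sp_executesql
    N'SELECT DISTINCT City
      FROM Person.Address
      WHERE StateProvinceID=@StateProvinceID
      OPTION (OPTIMIZE FOR (@StateProvinceID UNKNOWN));',
    N'@StateProvinceID int',
    @StateProvinceID = 80;
GO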
Better Option: Temporary Stored Procedures
Here’s an easy alternative that not many people know about: Create a temporary stored procedure, like this:
CREATE PROCEDURE #GetCities
    @StateProvinceID int
AS
SELECT DISTINCT City
FROM Person.Address
WHERE StateProvinceID=@StateProvinceID;
GO

exec #GetCities @StateProvinceID=80;
GO
Hey, look, my nested loop plan is back, just like I found in the execution plan cache:
This is much like a temporary table: it’s scoped to my session. It lets me quickly and easily test and execute this statement in the context of a stored procedure without modifying my application database.
I prefer this option for testing because it helps me remain conscious that I need to test the procedure with multiple values of the parameter and make sure it stays fast with common values.
(Note: I’m not saying you should use temporary stored procedures in your production code. These are just useful for the tuning process when you don’t want to modify the original application code.)
Another Option: Literal Values
If I’m in a hurry and I just need to tune the procedure for one value, I’ll simply do something like this:
SELECT DISTINCT City
FROM Person.Address
WHERE StateProvinceID=80;
GO
By removing the parameter altogether I get a plan optimized for this specific StateProvinceID, and it’s my nested loop plan again:
Each time I change the StateProvinceID value, I’ll get a new execution plan optimized for that specific value.
Local Variables Sneak Up on You
Trust me: this is hard to remember out there in the real world. I’ve caught myself more than a few times rewriting a statement with a local variable, only to discover that I’m getting a different execution plan than I expected.
Whenever you’re rewriting a problem statement that’s run as part of a larger parameterized query, remember that temporary stored procedures can help your tuning process.
I sometimes hear database administrators say, “This database must be involved in our performance problem: it has too many connections.”
Lots of connections might cause a problem for your SQL Server, or it might not. The good news is that there’s a way to clearly tell if they’re dragging down performance or not.
Lots of connections can be confusing
If you use old built-in tools like sp_who2, lots of connections make it hard to see what’s going on in your SQL Server. Firing up my test instance, I get back more than 45 rows from sp_who2 without any applications running at all. Sorting through that can be a mess.
For this reason, I’m not a fan of sp_who2. My usual question for the SQL Server isn’t “who’s connected?” I usually want to know other things:
- What’s making SQL Server wait right now? I like sp_AskBrent® to answer that.
- What’s running right now, and how long has it been running? I like sp_whoisactive by Adam Machanic to answer that.
Both of these tools help keep lots of connections from slowing YOU down as you look at your SQL Server.
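If you haven’t tried them, the basic calls are one-liners. Both procedures take plenty of optional parameters, but you don’t need any of them to get started:

EXEC sp_AskBrent @Seconds = 5;   /* samples waits for 5 seconds */
GO
EXEC sp_whoisactive;             /* what's running right now */
GO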
How can I prove that lots of connections are causing a performance problem?
SQL Server can manage a lot of connections without too many issues. Yes, it’s better if an application cleans up its connections over time. We recommend tracking the User Connections performance counter, as patterns in this counter may coincide with other events and be useful information.
But a high number of connections alone doesn’t prove that they caused the performance problem. For a smoking gun as to why SQL Server is slow, look in two places:
- SQL Server Wait Statistics (look for THREADPOOL waits; there’s a query for this right after the list)
- The SQL Server Error Log (look for the creepy error below)
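Checking the first is a single query against the wait stats DMV. These numbers accumulate from startup, so any significant THREADPOOL time deserves a closer look at when it happened:

SELECT wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = 'THREADPOOL';
GO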
Threadpool Waits: a sign of trouble
When lots of connections want to run queries at the same time, SQL Server may need to allocate more worker threads. This process can be slow, and at a certain point SQL Server may reach the maximum worker threads allowed for your configuration.
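You can see that ceiling, and how many workers you’re using right now, with a quick query against the DMVs:

/* Configured ceiling: 0 in sp_configure means SQL Server
   calculates max worker threads from CPU count and architecture */
SELECT max_workers_count
FROM sys.dm_os_sys_info;

/* Workers currently attached to the visible schedulers */
SELECT SUM(current_workers_count) AS workers_in_use
FROM sys.dm_os_schedulers
WHERE status = 'VISIBLE ONLINE';
GO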
“New queries have not been picked up by a worker thread”: Thread Starvation
When this gets really bad, you may see a nasty error message in your SQL Server Error Log. This message says something like this:
New queries assigned to process on Node 0 have not been picked up by a worker thread in the last 60 seconds. Blocking or long-running queries can contribute to this condition, and may degrade client response time. Use the “max worker threads” configuration option to increase number of allowable threads, or optimize current running queries.
The message neglects to mention a few critical things:
- Microsoft doesn’t recommend that you change max worker threads, as a general rule
- The number of worker threads needed has a lot to do with your parallelism settings (and changing parallelism settings is far more common than changing max worker threads)
- Periodic problems like locking and blocking can also drive up worker threads. Raising worker threads might just mask the real root cause of a blocking problem!
What to do if you have THREADPOOL waits or Thread Starvation
First, don’t panic. Avoid changing any settings before you know exactly what they’ll do.
Here’s where to start:
How’s your parallelism? If you have the “max degree of parallelism” setting at the default value of 0 (or a very high number), a good first step is to learn about CXPACKET waits and options for configuring parallelism.
Do you have lock waits? If you have high lock waits, look into which tables have the most blocking and how to track down the queries involved; there’s a starter query after this list.
Have you identified the most frequently run queries in the problem period? Good indexing or improved TSQL for those queries can dramatically reduce the threads needed.
What do the perf counters say? Take a look at those performance counters we recommend to see if there’s a jump in the number of connections when the problem occurs or not.
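For the lock-wait question above, here’s a starter query. Like all wait stats, these accumulate from startup, so compare samples taken before and after the problem period:

SELECT wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type LIKE 'LCK[_]%'
ORDER BY wait_time_ms DESC;
GO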
Approaching the problem this way dispels the mystery of whether all those connections really are the problem (or not).
DBCC CheckDB: Frequently Asked Questions
Q: Do I really need to run DBCC CheckDB? Yes. If you’re even partially in charge of a SQL Server and you’d like to keep your job, you should make sure CheckDB is being run regularly. (Seriously!)
Q: But WHY do I need to run CheckDB? CheckDB helps alert you when data becomes corrupt. Data corruption can cause your users to get incorrect data, cause queries to fail, or can even take your entire SQL Server instance offline. Corruption can happen anytime. It could be caused by a problem in SQL Server, an issue with Windows, or an issue with your storage. It could also happen due to other types of software – something like a filter driver (replicating data, defragmenting drives, etc.) or a virus scanner.
Q: How often do I need to run CheckDB? Every day you’d like to keep your job, you should run CheckDB. OK, maybe that’s a bit extreme.
Most people run CheckDB against user databases every weekend. That’s “normal”. Is normal good enough for you? Well, maybe not! When availability is important, people choose to run CheckDB more frequently: often once per day.
For system databases, why not check them every night?
Q: How can I make CheckDB faster? First, do you really need to make it faster? If you have a regular maintenance window where decreased performance is OK, that’s often good enough.
The best way to speed up CheckDB is to offload the work to a totally different server. Do this by automating a process where you regularly restore a full backup, then run CheckDB. This approach has multiple benefits:
- Tests your restores
- Doesn’t impact the production database (especially if it uses independent storage)
- The secondary instance can be configured as a “warm standby” in case the primary server fails
And sure, you can potentially offload CheckDB work to a virtualized SQL Server instance. Just make sure that it has enough resources to run CheckDB fast enough.
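Here’s a minimal sketch of the restore-then-check pattern. The database name, backup share, and file paths are made up for illustration; in real life you’d wrap this in an Agent job with error handling and alerting:

RESTORE DATABASE SalesCopy
FROM DISK = N'\\backupshare\Sales_full.bak'
WITH MOVE 'Sales' TO N'D:\Data\SalesCopy.mdf',
     MOVE 'Sales_log' TO N'L:\Log\SalesCopy.ldf',
     REPLACE;
GO
DBCC CHECKDB ('SalesCopy') WITH NO_INFOMSGS, ALL_ERRORMSGS;
GO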
Q: Are there any shortcuts with LogShipping, Mirroring, or AlwaysOn Availability Groups?
Bad news: None of those technologies mean you don’t have to run CheckDB against your main production database. The only valid way to offload CheckDB is to run it against a restored full backup.
In fact, AlwaysOn Availability Groups may mean you have to run more CheckDB, not less.
Q: Do some features make CheckDB slower? Yep, some indexes check slower than others. Having nonclustered indexes on computed columns can make CheckDB crawl, as can having indexed sparse columns. If you’re using those features and CheckDB is painfully slow, you might consider disabling the nonclustered indexes before running the job – but that means writing custom code and making sure that you handle any error situations when the job fails. (Kind of gross, right? That could lead to a bad day. Head on back up and see “How can I make CheckDB faster” for a better answer.)
Q: What about “PHYSICAL_ONLY” or “NOINDEX”? Can I use those? Well, you can, but then you might not detect corruption as quickly. That could be really bad, right? I’m not a fan of either of these options because they turn off valid features in CheckDB that could really save your bacon. I’d much rather offload CheckDB to another server by restoring a backup as described above – it has much greater benefits and is even better for the performance of your production database.
Q: Should I run CheckDB against TempDB? Yes – you should. CheckDB can’t do every single check against TempDB that it can against other databases (the output will let you know that it can’t create a snapshot), but it can still run a variety of checks against TempDB, and running it there is fully supported. Don’t skip it! You should be checking master, model, msdb, and tempdb, plus all your user databases.
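The check itself is the same one-liner you’d use anywhere else; expect an informational message about the snapshot:

DBCC CHECKDB ('tempdb');
GO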
Q: How can I automate running DBCC CheckDB? You can do this by using the Check Database Integrity task in a Maintenance Plan. You can also use a SQL Server Agent job and a free script to help with database maintenance.
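If you just want a quick-and-dirty sketch of the idea, this loops over every database using sp_MSforeachdb (undocumented but widely used; a tested maintenance script is the better long-term answer):

EXEC sp_MSforeachdb N'DBCC CHECKDB (''?'') WITH NO_INFOMSGS;';
GO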
Q: Is CheckDB all I need to protect me from corruption? Running CheckDB can help you find out about corruption, but you also need:
- Page verification settings for databases set to the right level. For anything SQL Server 2005 and higher, you want to use ‘Checksum’ for page verification (there’s a quick check after this list).
- SQL Agent Alerts configured to let you know if corruption errors strike (even while CheckDB isn’t running).
- The jobs running CheckDB should notify you when they fail
- You need to be prepared! Practice responding to database corruption by learning to cause it, detect it, and correct it in a test environment. (You’ll learn how to do this in our “SQL Server Backup and Recovery, Advanced” course)
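For the page verification item above, here’s a quick check for databases that aren’t using checksums, plus the fix (the database name is illustrative):

SELECT name, page_verify_option_desc
FROM sys.databases
WHERE page_verify_option_desc <> 'CHECKSUM';
GO
ALTER DATABASE YourDatabase SET PAGE_VERIFY CHECKSUM;
GO

Keep in mind that checksums are only written as pages are modified after the change, so existing pages aren’t protected until they’re written again.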
To allow alerts and jobs to notify you, you need to configure Database Mail, enable the database mail profile on the SQL Server Agent (and then restart the SQL Server Agent service), configure an operator, and then set the operator on the alerts and jobs.
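Here’s a minimal sketch of one such alert, assuming Database Mail works and an operator named ‘DBA Team’ already exists (you’d create similar alerts for severities 16 through 25 and for errors 823, 824, and 825):

EXEC msdb.dbo.sp_add_alert
    @name = N'Severity 024 - Hardware Error',
    @severity = 24,
    @delay_between_responses = 60;
GO
EXEC msdb.dbo.sp_add_notification
    @alert_name = N'Severity 024 - Hardware Error',
    @operator_name = N'DBA Team',
    @notification_method = 1; /* 1 = email */
GO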
Q: What if I find corruption? Corruption happens – that’s why this is so important. The first thing you should do is sign up for our online DBA Interview Training to find that next job. (Awkward laugh.)
All kidding aside (and that was a joke! stop updating your resume!), you must respond to the issue immediately. The first thing to do is to alert others on your team to the issue and make a plan. You need to make sure that a full CheckDB has been run against the database, and that you’ve read the output of CheckDB carefully, along with the SQL Server Error Log.
Gail Shaw has posted an article with a great list of do’s and don’ts for when you hit corruption here (login required, sorry). If the database is important to your business, I urge you to spin up a support call with Microsoft Support right away – even while you read Gail’s article. At around $500 USD, Microsoft Support is your absolute best value when it comes to getting help right away for corruption. (Yep, I’m an independent consultant and I’m telling you that!)
Once you have the specific incident resolved, make sure you follow up. How long has the issue existed? Can the root cause of the original corruption be identified? Could you have the issue again, or somewhere else in your environment? It’s rare for corruption to strike once and not repeat itself, so it’s important to follow up.
64GB RAM, no partitioning, no AlwaysOn Availability Groups, no encryption – what’s a DBA to do? Brent Ozar gives you a few easy workarounds in this 30-minute video.
Be aware that audio on this one sucks pretty bad – it accidentally recorded from the wrong microphone instead of our good professional ones. Sorry about that!
You may have read that you need to have more than one data file in SQL Server’s tempdb. If you create a lot of tiny objects in tempdb, you may hit a bottleneck on the special pages that SQL Server uses internally to allocate all those objects; that can happen even if you’re using blazing fast storage. For certain workloads, adding more tempdb files speeds up SQL Server.
Most database administrators aren’t sure how to tell when they need to add more tempdb files. Here’s how you can find out for yourself.
Tempdb stress: my virtual machine setup
I stressed out my tempdb on a virtual machine with 6GB of memory, 4 virtual CPUs, Windows Server 2012 R2 and SQL Server 2014. The virtual machine is backed by solid state storage. My tempdb was configured with a single data file.
How I ran my stress test against tempdb
I used ostress.exe. This command line utility lets you easily run a query multiple times across many threads. You can get ostress.exe for free by downloading the RML utilities.
Here’s the ostress.exe command that I called:
ostress.exe -E -d"tempdb" -Q"exec dbo.tempdbstress" -n5 -r300 -b -q
The dbo.tempdbstress procedure looks like this:
use tempdb;
GO
CREATE PROCEDURE dbo.tempdbstress
AS
SET NOCOUNT ON;
SELECT TOP(5000) a.name, replicate(a.status,4000) as col2
into #t1
FROM master..spt_values a
CROSS JOIN master..spt_values b
OPTION (MAXDOP 1);
GO
Finding GAM and PFS contention in tempdb with sp_AskBrent® and Wait Stats
While this is running, look at waits in two ways. In one session, take a sample of waits with sp_AskBrent®:
exec sp_AskBrent @ExpertMode=1, @Seconds=10;
GO
I see PAGELATCH_UP, PAGELATCH_SH and PAGELATCH_EX waits:
I also have a lot of just plain old CPU wait (the SOS_SCHEDULER_YIELD), because I’m bashing at only 4 virtual CPUs (and I’m running ostress.exe on the same VM as SQL Server).
Finding GAM and PFS contention in tempdb with sp_whoisactive
exec sp_whoisactive;
GO
This lets you see the contention if you catch it at the right instant. Bonus: you can confirm exactly what type of pages it’s occurring on! I can see here that this is explicitly PFS waits. “PFS” means “Page Free Space”: poor tempdb is hitting a hot spot on one of those special pages just recording how much space is available and where.
And here I can see the GAM waits. “GAM” means “Global Allocation Map”. That’s another one of those special pages that can get hot and slow things down in tempdb. It’s tracking where objects are allocated. (There’s also “SGAM” or “Shared Global Allocation Map” pages that you might see.)
Both of these types of waits can be alleviated if you add more data files to tempdb, because then you’ll get more PFS pages and GAM/SGAM pages across the files.
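You can also catch the contention live without either tool by querying the waiting tasks DMV. The resource_description tells you the exact page: 2:1:1 is the PFS page in tempdb (database_id 2), 2:1:2 is the GAM, and 2:1:3 is the SGAM:

SELECT session_id, wait_type, wait_duration_ms, resource_description
FROM sys.dm_os_waiting_tasks
WHERE wait_type LIKE 'PAGELATCH%'
  AND resource_description LIKE '2:%';
GO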
How to tell if you need more tempdb data files
First of all, if you just have one tempdb file, consider adding more files as preventive medicine. Don’t go crazy with this. The formula recommended by Microsoft in KB 2154845 to use one tempdb data file for each logical processor up to 8 processors is a good place to start.
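Adding a file is one statement per file. This sketch assumes a drive letter, size, and growth setting that match your existing tempdb data file; keep all the data files the same size:

ALTER DATABASE tempdb
ADD FILE (NAME = N'tempdev2',
          FILENAME = N'T:\TempDB\tempdev2.ndf',
          SIZE = 4GB,
          FILEGROWTH = 512MB);
GO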
Look at waits that have occurred since startup. Do you have PAGELATCH_UP or PAGELATCH_EX waits? (Note that these don’t have an “IO” in them. PAGEIOLATCH waits are different.)
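Here’s the since-startup check. Note again that these are latch waits on pages in memory, not PAGEIOLATCH waits:

SELECT wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type IN ('PAGELATCH_UP', 'PAGELATCH_EX', 'PAGELATCH_SH')
ORDER BY wait_time_ms DESC;
GO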
If you do, that’s not necessarily 100% proof that the issue is in tempdb, but it’s a good indicator. If you don’t, well, it’s probably not a big issue.
If you do find these waits, dig deeper:
- Identify when the PAGELATCH_% waits are growing the most, and dig into those periods with sp_whoisactive. You can log sp_whoisactive results to a table.
- Look at your top queries. The queries that are part of the issue might be suitable for caching in an application tier.
Tempdb file configuration reminders
Remember: this only applies to tempdb data files. You only need one tempdb log file.
Also remember that the tempdb data files will be used proportionally with their size. This means that you should keep the files equally sized. You can use trace flag 1117 to force all tempdb files to grow at the same time when one grows, but this will apply that behavior to every database on the instance (not just tempdb).
You can add tempdb files without restarting the SQL Server instance. However, we’ve seen everything from Anti-Virus attacking the new files to unexpected impacts from adding tempdb files. And if you need to shrink an existing file, it may not shrink gracefully when you run DBCC SHRINKFILE. For best results, deploy the configuration first to your pre-production environments, use a change request, and perform the change in a planned maintenance window.
A frequently asked question is “How much memory does SQL Server need?” It’s difficult to figure out how much memory you should allocate to SQL Server.
Sizing Memory for a New SQL Server
Sizing memory for a new SQL Server application sounds like it’s going to be tricky. But there is one thing that makes this simpler: SQL Server Standard Edition.
SQL Server Standard Edition limits available memory; there are two options based on the version of SQL Server:
- SQL Server 2008 and earlier: You get unlimited memory. Rejoice in this, it’s about the only good thing your AMC Pacer of a database has going for it. Plan for an eye-opener when you look at licensing changes.
- SQL Server 2008R2 – 2012: Put 96GB of memory in the server. There is a limit of 64GB of memory on these versions of SQL Server: just give SQL Server 64GB and walk away.
- SQL Server 2014 or newer: Put 192GB of memory in the server. SQL Server 2014 allows up to 128GB of memory for Standard and BI Edition. Give SQL Server the maximum and walk away. This can get more complex if you’re using SQL Server Analysis Services, but have you ever known me to talk about SSAS?
Sizing memory is more difficult with SQL Server Enterprise Edition. Experience with capacity planning makes it easier to size SQL Server. Let’s assume you don’t have a capacity planning team. What should you do?
Get the cheapest, biggest DIMMs that money can buy. What does that mean? Well, there’s a significant price difference between 16GB and 32GB DIMMs (that’s memory sticks to you and me). For example, a Dell 820 can accept up to 24 DIMMs. What’s the cost look like?
- 16GB DIMMs - 384GB of system memory, $5,849.28
- 32GB DIMMs - 768GB of system memory, $19,779.60
- 64GB DIMMs - 1536GB of system memory, $108,310.32
The 32GB DIMMs fall into the “cheapest, biggest DIMMs that money can buy” category. Besides, the cost of RAM is likely to be a rounding error compared to the cost of licensing SQL Server Enterprise Edition.
Sizing Memory for an Existing SQL Server
This is where things start to get tricky. There are a few different metrics available to measure the amount of memory that SQL Server is working with.
Remember, we’re trying to measure the size of the active working set of data. This is difficult to do, but there are a few easy ways that we can get most of the way there.
Sizing Memory with Page Life Expectancy
On a single CPU socket system, this is easy:
SELECT object_name, counter_name, cntr_value AS [value]
FROM sys.dm_os_performance_counters
WHERE LTRIM(RTRIM(object_name)) = 'SQLServer:Buffer Manager'
  AND LTRIM(RTRIM(counter_name)) = 'Page life expectancy';
If you’ve got more than one CPU socket (or you’re on a system with multiple NUMA nodes), then you might want to get a bit more specific. The previous query averages memory consumption across all CPU sockets/NUMA nodes. This query looks at each one individually:
SELECT object_name, counter_name, instance_name AS [NUMA Node], cntr_value AS [value]
FROM sys.dm_os_performance_counters
WHERE LTRIM(RTRIM(object_name)) = 'SQLServer:Buffer Node'
  AND LTRIM(RTRIM(counter_name)) = 'Page life expectancy';
Why track the page life expectancy at all? PLE can be a simple indicator of memory and I/O health. If PLE ever decreases, then you know you’re reading from disk. That doesn’t mean the active working set of data is bigger than memory, it just means that you’re reading from disk at some point.
Track page life expectancy over a period of time. Make sure to measure PLE at the same time of day and at regular intervals. Graph your PLE and use it to determine trends in memory use. If PLE is trending down, you may need more RAM.
By the way, here are a few things that could need a read from disk (and decrease PLE):
- Rebuilding all indexes
Measuring Disk Activity with Performance Monitor
Perfmon is a great way to track physical disk activity. You don’t need to do any complicated math to determine if disks are being accessed. Once you’re ready to measure, fire up perfmon and add the following counters for every disk that contains non-system SQL Server data files; don’t include log file drives.
- Physical Disk \ Avg. Disk sec/Read
- Physical Disk \ Disk Reads/sec
- SQLServer: SQL Statistics \ Batch Requests/sec
Monitor these counters for at least a week, just like you should monitor PLE for at least a week. You don’t need to collect this information at a particularly fine granularity – even once an hour will do. Just make sure you take into account when regular maintenance (index rebuilds, DBCC, etc) is running so you can account for any reads and writes during those periods.
If you don’t see heavy read activity during business hours, then you can assume you’re not under memory pressure.
By including the Batch Requests/sec counter, we can tie disk activity to SQL Server activity. We don’t need to worry about system-level reads that happen when there’s no SQL Server activity. Also make sure anti-virus is configured to ignore SQL Server files.
If reads increase alongside Batch Requests/sec, then we know there’s some correlation – as user activity increases, so does disk activity. If you want to figure out the exact cause, you’ll need to get more invasive and use SQL Server Profiler or Extended Events.
For more information on collecting and analyzing perfmon counters, take a look at SQL Server Perfmon (Performance Monitor) Best Practices.
Measuring with the DMVs
We can get even more scientific using the DMVs. This is where things can get really tricky and it’s important to exercise a great deal of caution – not because the DMVs are inaccurate, but because they contain a great deal of information from many sources.
While we can use sys.dm_io_virtual_file_stats to measure I/O against specific files, it’s important to take into account regular maintenance on those database files. Database backups (both full and differential) will register reads against the data files. In order to determine how many reads are happening against a file, we need a quick and dirty measure.
DECLARE @startup DATETIME,
        @tbb DECIMAL(38, 0),
        @total_reads DECIMAL(38, 0),
        @total_data_size DECIMAL(38, 0),
        @mb DECIMAL(38, 0),
        @gb DECIMAL(38, 0);

SET @mb = 1024.0 * 1024.0;
SET @gb = @mb * 1024.0;

SELECT @startup = create_date
FROM sys.databases
WHERE name = 'tempdb';

SELECT @tbb = SUM(CAST(COALESCE(backup_size, 0) AS DECIMAL(38,0)))
FROM sys.databases d
LEFT JOIN msdb.dbo.backupset b ON d.name = b.database_name
WHERE b.backup_start_date >= @startup
  AND b.type <> 'L'
  AND d.database_id > 4;

SELECT @total_reads = SUM(num_of_bytes_read)
FROM sys.dm_io_virtual_file_stats(NULL, NULL) vfs
WHERE vfs.database_id > 4;

SELECT @total_data_size = SUM(CAST(size * 8 AS DECIMAL(38,0))) / @mb
FROM sys.master_files
WHERE [type] <> 1
  AND database_id > 4;

SELECT (@total_reads - COALESCE(@tbb, 0)) / @gb AS [Non-backup reads (GB)],
       ((@total_reads - COALESCE(@tbb, 0)) / @gb) / DATEDIFF(DAY, @startup, CURRENT_TIMESTAMP) AS [Non-backup reads / day (GB)],
       @total_data_size AS [Total Data Size (GB)];
This query, while not exact, will at least take into account the reads and writes against the file but will exclude regular backups. If you’re performing index maintenance, that index maintenance will also be included. There’s no easy way to measure the load of index maintenance apart from taking a snapshot of sys.dm_io_virtual_file_stats before and after the database maintenance occurs.
Again – this method isn’t exact, but it will give you a better understanding of the physical I/O requirements of your server since the last SQL Server start up.
There’s one other thing to keep in mind – if you have the right indexes in place, SQL Server won’t need to read from disk and you won’t need to keep the entire database in memory. Adding memory until you don’t perform additional I/O is one way to solve the problem, but tuning indexes is another. Check out sp_BlitzIndex® for ways to tune up your indexes.
Summarizing our Sizing Journey
For new SQL Servers – fill them with memory.
For existing SQL Servers there are three ways to check memory:
- Page life expectancy.
- Measuring disk activity with Performance Monitor.
- Measuring disk activity through the DMVs.
None of these are precise – they all miss certain key measurements – but they all give us a good enough idea without requiring heavyweight monitoring.
Which SQL Servers should you virtualize first? How should you license them? How many virtual CPUs and how much memory should you start with? How do you know when that’s not enough, and how should you react?
I cover these questions in this 30-minute webcast video:
We’ve got more virtualization best practices too.
If you’ve been following the blogosphere, or heading to SQL Saturdays, you’ll know that many folks are excited about the brand new SQL Server cardinality estimator (CE for short). As Kendra showed in her article The SQL 2014 Cardinality Estimator Eats Bad TSQL for Breakfast, the new CE provides dramatically improved query plans.
What if you want to know more about the new CE? What are your options? Well, a lot of bloggers, presenters, and crazy people (I’m looking at you, Paul White) will observe how the CE behaves by building carefully contrived scenarios and verifying the outcome. We can observe the CE in the same way that we observe the optimizer – by observing the side effects.
What if you want to know why something happened? To get an answer to that, you’d have to go directly to the source. Thankfully, the source has published their work in the paper Testing cardinality estimation models in SQL Server. Warning – this paper does cost money if you aren’t a member of the ACM.
Why would you want to go and read an academic paper like this one? Well, it contains a lot of juicy information about what the SQL Server product team considered in terms of possible optimization paths for the CE. One item of interest is the concept of a simple join optimization. To summarize – the simple join ignores histograms and assumes that all distinct values from the child are contained in the other table. This could cause some query estimation issues, but optimization will be faster. The authors note that for some workloads, this trade-off will be much more desirable.
If you’re not up for an academic paper or two, Joe Sack has put together an in-depth look in Optimizing Your Query Plans with the SQL Server 2014 Cardinality Estimator. It’s well researched, peer reviewed, and based on observable, verifiable information that’s in the product right now.
One of the questions we ask every new client is, “How much data is this application allowed to lose, and how long is it allowed to be down for?”
This leads to some really fun conversations.
The business users assume that the database will never lose data and cannot go down. The IT admins assume that the business understands all systems can lose data, and that the business will be okay with some data loss during a crash.
To get the business users and the IT team onto the same page, I had to build that page for them to get onto! And here it is:
Step 1: The IT team fills out the current state.
We start by filling out the “current” column for each of the three scenarios, documenting how much data the business could lose in the event of an outage, and how long the system could be down.
Say we’ve got a single physical standalone SQL Server, and we’re doing transaction log backups every hour to a network share. That means we could lose at least an hour’s worth of data, and be down for several hours (if not a day). To get the system back online, we would need to:
- Provision hardware and storage space
- Install the right version/edition of Windows, plus patches
- Install the right version/edition of SQL Server, plus patches
- Set up the same sp_configure options (maxdop, any trace flags, etc)
- Set up security (users, passwords)
- Restore the databases
Often I find that just getting everybody to agree on the risks in the “current” column can take an hour, and the business users are horrified by how much work is involved. It’s nobody’s fault – this is just the current state of the system.
Step 2: The business users set a goal.
They’ll say, “We’re not allowed to lose any data and we’re never allowed to go down”, and that’s where page 2 of the worksheet comes in:
Down the left side, pick how much data we’re allowed to lose.
Across the top, pick how long we’re allowed to be down.
Where they meet, that’s a rough cost of the solution, and the list of options that can meet the business goals. These costs are wide ranges that give the business a rough idea of what it’ll take to achieve their objective – don’t get hung up on the exact dollar amounts. The business users just need a rough idea to help guide their choice.
I strongly, strongly encourage business users to pick one of those 4 categories, not something like 15 minutes or 4 hours. SQL Server’s high availability and disaster recovery methods line up really well with these 4 buckets, and if they choose 15 minutes, they’re not really doing me any favors – I’m going to need to design a 1-minute solution.
Step 3: The IT Team designs a solution to meet the business goals.
In our How to Be a Senior DBA training class, we coach DBAs on:
- How to get real answers in writing from business users
- How to set the “IT Goal” column of this worksheet
- How to give the business users the right recommendation
- How to handle cost objections
- How to build a solution that meets the business users’ needs
When I was a DBA, that was the hardest part of my job. The business kept telling me, “You’re not allowed to lose data and you’re not allowed to go down, but you don’t have a budget.” Now, in our classes, we arm DBAs with worksheets and guides to go toe-to-toe with the business. You look like a pro because you get to leverage all these cool worksheets that we built for our own consulting projects.
You can grab this planning worksheet plus much more in our First Aid resources. Enjoy, and hope it makes your next HA/DR project easier!