Blog

Trace Flag 2330: Who needs missing index requests?

Hey, remember 2005?

What a great year for… not SQL Server. Mirroring was still a Service Pack away, and there was an issue with spinlock contention on OPT_IDX_STATS or SPL_OPT_IDX_STATS. The KB for it is over here, and it’s pretty explicit that the issue was fixed in 2008, and didn’t carry over to any later versions. For people still on 2005, you had a Trace Flag: 2330.

Like most things, there are trade-offs. When you enable it, SQL stops collecting missing index requests. Probably not a great idea unless you’re working with extreme edge cases where you’ve already tuned your indexes really well. Most people will never fall into that category, though many of them will think they have.

The issue that it quite commonly addressed was around the creation of many objects in tempdb. You probably don’t need missing index details on a bunch of temp tables. What a lot of people didn’t realize was that it also made SQL stop collecting them for every other database.

Here in 2015

That Trace Flag can still be enabled, with the same effect. Some people may not be aware that on modern versions it’s not fixing anything – it’s just hurting things. Below is a script to reproduce the lousiness in a user database.

First we’ll clear out missing index requests by rebuilding the index (you know this happens, right?), and verify that there are no missing index requests with sp_BlitzIndex®. Running it in @mode = 3 will generate only missing index request details.

Once we verify that’s empty, we’ll run a query that will generate a missing index request in the Stack Overflow database, and verify it registers with sp_BlitzIndex® again.
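The repro looks roughly like this. The specific query is just an illustration I’ve made up – any query against the Stack Overflow database that wants an index you don’t have will do:

```sql
/* Rebuilding the index clears out missing index requests for the table */
ALTER INDEX ALL ON dbo.Users REBUILD;

/* Verify nothing is registered -- @Mode = 3 returns only missing index details */
EXEC dbo.sp_BlitzIndex @DatabaseName = 'StackOverflow', @Mode = 3;

/* Run a query that wants an index SQL Server doesn't have */
SELECT DisplayName, Reputation
FROM dbo.Users
WHERE Reputation > 500000;

/* Check again -- the missing index request should now show up */
EXEC dbo.sp_BlitzIndex @DatabaseName = 'StackOverflow', @Mode = 3;
```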

Whaddya know? We got ourselves a missing index.

You look helpful! What’s your name?

Now comes the ugly

We’ll enable TF 2330 globally, and use some GO magic to run the same query 5x consecutively, then check back in on sp_BlitzIndex®.
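In T-SQL, that’s roughly this (using the same made-up demo query as before):

```sql
/* Turn on Trace Flag 2330 globally */
DBCC TRACEON (2330, -1);
GO

/* GO 5 repeats the preceding batch five times in a row */
SELECT DisplayName, Reputation
FROM dbo.Users
WHERE Reputation > 500000;
GO 5

/* Check back in on the missing index requests */
EXEC dbo.sp_BlitzIndex @DatabaseName = 'StackOverflow', @Mode = 3;
```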

The compiles ticked up one, but SQL stopped counting requests for the index.

Bummer.

There is a light

But hey! We can turn that off. And… Well, you know the drill.
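Turning it back off is a one-liner:

```sql
/* Turn Trace Flag 2330 back off globally */
DBCC TRACEOFF (2330, -1);

/* Verify it's no longer enabled */
DBCC TRACESTATUS (2330);
```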

We have stopped breaking SQL!

1 + 5 = 6, FYI

What did we learn?

2005! What a lousy year for hair, and Trace Flags, and, uh, I don’t remember much else about 2005. Feel free to tell me what you hated about 2005 in the comments. Just check all your servers to make sure no one turned on this Trace Flag as an April Fool’s Day joke and then never came back from lunch, first.

Thanks for reading!


Angie Walker Joins Brent Ozar Unlimited

Company News
23 Comments
The newest cartoon face at Brent Ozar Unlimited: Angie Walker

Brent says: remember a couple months back when we announced that we were hiring a SQL Server Triage Specialist? We’re really proud to welcome Angie Walker to our funky little medical team. When you bring us in for a SQL Critical Care®, Angie might be the first person you work with to make sure the server’s in a safe place. Let’s hear how she got here….
Angie says: WHERE TO START…?

As with many IT professionals I’ve met, I didn’t intend to find a career in IT, but when an opportunity came along a few years back, I jumped at the chance. After a couple of years, I moved to a Junior DBA position and I’ve been learning SQL Server ever since. When I first started, I couldn’t tell you what a database was. I remember the first time I forgot the WHERE clause on an UPDATE statement, blindly deploying code into production, and asking why we take backups.

The first session of the first SQL Saturday I attended was so packed, I sat on the floor, and watched Kendra present “How to Find Bad Queries, Fast”.  I didn’t know what she was talking about half of the time (What’s Perfmon?  Activity Monitor is the graphs, right?  She has pink hair!), but I hoped that if I took enough notes, I might be able to absorb some of what I heard.  And I keep going to SQL Saturday and SQL user groups for that exact reason.  I might not understand today, but one day it might click when I see the issue or feature firsthand.

The world of SQL Server is vast, murky, and sometimes scares me, but knowing where to look for help has been vital.  I’ve been following Brent Ozar Unlimited for almost three years; signing up for the Accidental DBA 6 Month Training Plan, taking advantage of Kendra’s “DBA Job Interview Question and Answer Kit” for my first DBA interview, trying (and failing) to score killer Black Friday deals last year, and of course, following the blog every day.

So when I saw the open Triage Specialist position, I knew I HAD to apply. The morning I woke up, checked my email, and saw Kendra Little was emailing me, I was bouncing around as if I’d drunk three cups of coffee. I mean, I read the blog and use the tools, and even though I know they’re real people, the possibility of working alongside some of the most well-known folks in the business seemed too good to be true! When I met with Kendra she was so nice and I thought it went really well, but when she said it would take about two weeks to hear back, I had to stop myself from thinking that she was trying to let me down easy. I didn’t know just how badly I wanted the job until I got the invite for a second interview – it didn’t matter that I didn’t know everything about SQL Server; I’d have to learn it, and who better to learn from than Brent Ozar Unlimited?! So I prepared myself for the interview with Kendra, Jeremiah, and Brent… and I must have done something right, because here I am!

I’m super excited for this new adventure as the newest member of the Brent Ozar Unlimited team!


SQL Server 2012 SP3 Adds Memory Grant and Performance Features

New ways to measure your queries. And fish.

SQL Server 2012 SP3 has just been released! Not only does this release contain bug fixes, it also contains a slew of performance-tuning features.

Many of these features are about memory grants. This is NOT the memory used in the buffer pool/data cache by the query — it’s the memory also known as Query Workspace Memory. Start reading more here.

Check these out:

KB 3107398 – Improvements to sys.dm_exec_query_stats

This is kind of a big deal. The sys.dm_exec_query_stats DMV will now report on total, last, min, and max statistics for granted workspace memory, ideal workspace memory, degree of parallelism, and reserved threads. These columns are in SQL Server 2016 CTP3, but it’s really awesome that they’ve been backported down to 2012!
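Here’s a quick sketch of what you can do with the new columns once you’re patched (column names per the KB; the sort order is just one example):

```sql
/* Find the queries that wanted the biggest workspace memory grants */
SELECT TOP 10
       qs.max_dop,
       qs.total_grant_kb,
       qs.last_grant_kb,
       qs.max_ideal_grant_kb,
       qs.total_reserved_threads,
       SUBSTRING(st.text, 1, 100) AS query_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.max_ideal_grant_kb DESC;
```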

KB 3107397 – “Actual Rows Read” Added to Query Execution Plans

This is really exciting! It’s hard for people to read execution plans. You may have a very large index seek or scan that reads a ton of rows because of a hidden filter in the operator, but that’s hard to diagnose, because you only see the number of rows that come OUT of the seek or scan operator. This lets you know not only the rows that exit the operator, but how many it had to consume, too. It’s sorta like having a little bit of STATISTICS IO in your execution plan!

KB 3107401 – New Query Hints for min_grant_percent and max_grant_percent

Previously, if you identified that a query was asking for a problematic memory grant, your options were pretty limited – code changes, index changes, Resource Governor, or server-wide settings. These hints let you shoot yourself in the foot – er, tune this – for just a single query. Check out the KB – there’s a safety built into the feature: if the minimum required grant is higher than max_grant_percent, the query will still get the minimum.
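The hints go in an OPTION clause. Here’s a sketch against a hypothetical Stack Overflow-style schema (the table names are just for illustration):

```sql
/* Floor this query's grant at 1% of max workspace memory, cap it at 10% */
SELECT u.DisplayName, COUNT(*) AS PostCount
FROM dbo.Users AS u
JOIN dbo.Posts AS p ON p.OwnerUserId = u.Id
GROUP BY u.DisplayName
OPTION (MIN_GRANT_PERCENT = 1, MAX_GRANT_PERCENT = 10);
```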

KB 3107172 – hash_spill_details Added to Extended Events to Diagnose tempdb Spills, Improvements to sort_warning and hash_warning

Ever seen sort or hash spill warnings in execution plans and wondered how much of a difference they were making to the runtime of the plan? That may be less of a mystery with these improvements.

KB 3107173 – query_memory_grant_usage added to Extended Events

This XEvent allows tracing for the ideal memory grant for queries, what they actually got, and the degree of parallelism they used.

What About SQL Server 2014?

I don’t believe any of these are currently available for SQL Server 2014, but looking at past release patterns it’s likely they’ll be available soon.

PSA: Don’t Forget to Test Your Service Packs

Even the best laid plans can sometimes go awry.


Brent’s Bad Idea Jeans: Stop People From Changing Tables with This One Trick

Bad Idea Jeans, Humor, SQL Server
21 Comments

Developers hate his bad idea jeans:

Presto, the schemabinding option means no one can change the underlying tables. (At least, until they figure out about your view.)

Obligatory disclaimer: seriously, this is a really bad idea. But it’s hilarious.


Why is SQL Server Slow Sometimes – But Only Sometimes?

SQL Server
11 Comments

Have you got a SQL Server that usually performs just fine, but every now and then, everything falls to pieces? Users complain about performance, and then minutes – or hours – later, the problem just mysteriously solves itself?

Here’s a few things to look for:

Are data or log files growing? When SQL Server needs to grow a file, it can bring all transactions in that database to a grinding halt. You can mitigate this issue by turning on Instant File Initialization, pre-growing your data and log files, and using smaller autogrowth sizes. To figure out if this is your root cause and which cure makes the most sense, run sp_Blitz® and look for the warning about data and log file growths that have been taking more than 15 seconds.
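To eyeball your current growth settings across the instance, something like this works (percent growth and tiny fixed increments are the usual suspects):

```sql
/* Growth settings for every file on the instance.
   When is_percent_growth = 0, growth is stored in 8KB pages. */
SELECT DB_NAME(database_id) AS database_name,
       name AS file_name,
       CASE WHEN is_percent_growth = 1
            THEN CAST(growth AS varchar(10)) + '%'
            ELSE CAST(growth * 8 / 1024 AS varchar(10)) + ' MB'
       END AS growth_setting,
       size * 8 / 1024 AS current_size_mb
FROM sys.master_files
ORDER BY database_name, file_name;
```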

Are queries being blocked? If one query takes out a lock that prevents other queries from getting their work done, it presents interesting symptoms. Your SQL Server’s CPU use might look low – but it’s because so many queries are sitting around idly waiting. To diagnose, start by running sp_BlitzFirst® with the @Seconds = 0, @ExpertMode = 1 parameters to check your wait stats since startup. If LCK* waits are significant, then start digging at the database level. Run sp_BlitzIndex® in each database, and look for the Aggressive Indexes warnings about indexes that are a frequent blocking sore spot. Consider dumping indexes you don’t need on those tables (to speed up deletes/updates/inserts), and adding the right indexes that you do need (to speed up other queries).

Sometimes I choose the wrong coffee tap, and I’m predictably slow.

Are queries being rolled back? If a query fails – either due to being canceled, or running into an error, or hitting a deadlock – SQL Server has to undo their work. This may cause blocking, or it just may hammer storage. This one’s a little trickier to catch. If you run into it live, then you can see it with sp_BlitzFirst® – it warns about queries that are currently rolling back.

Did a bad plan get into cache? When you run parameterized SQL, SQL Server builds an execution plan based on the first set of parameters it sees. Normally, parameter sniffing is a good thing, but sometimes it goes awry. You can have a plan get pushed out of cache (due to memory pressure or statistics changes) and then get compiled based on the first – possibly unusual – set of parameters that come in. That plan might not work so well for other sets of parameters, and it can degrade performance on the whole box until a new plan is generated for it. You can learn more about solutions in this blog post about parameter sniffing, and to diagnose it, run sp_BlitzCache®. Look at the top resource-consuming queries, and see if an unusual one has suddenly jumped to the top of the charts.

Are shared resources under pressure? In virtualization environments, other guests may be using an increased amount of CPU, memory, or storage throughput, giving your guest less horsepower. In shared storage environments, other servers might be doing backups or disk-intensive workloads. In multi-instance servers, another instance on the same box might be the real culprit. To identify these, check out wait stats with sp_BlitzFirst®.

Looking back at what I’ve written, I’m sure someone’s going to say, “Brent, you’re just trying to pitch tools.” And sure, I am – they’re free. I want you to use the same powerful tools that we use every day in our consulting. Your job is hard enough without reinventing the wheel!


Is My SQL Server Too Big for Virtualization or the Cloud?

SQL Server
15 Comments

Let’s set size aside for a second. Here’s the order in which you should try new technologies – whether it’s virtualization, cloud, storage, or whatever:

  1. Development servers with less than 100GB of data
  2. Production servers with less than 100GB of data
  3. Development servers with <1TB
  4. Production servers with <1TB
  5. Development servers with >1TB
  6. Production servers with >1TB

It’s simple: start with the lowest risk, easiest-to-manage servers first. Learn your lessons on smaller servers, then gradually use the technology on larger and larger servers.

If at any level, your users complain about performance and you can’t figure out how to fix it, stop at that level. Don’t go farther.

It’s not that 1TB of data is too big to virtualize or to move into the cloud – it’s not. But your skills may not be up to the task, and you need to sharpen those skills before you attack a larger performance problem.

It all comes back to my post, “How Many Databases Can I Put on a SQL Server?”


What kCura Relativity Best in Service Means for SQL Server DBAs

kCura Relativity, SQL Server
2 Comments

Best-in-Service

kCura Relativity is a legal e-discovery application that hosts its data in SQL Server. You can host it yourself, or hire a hosting partner who specializes in hosting it for you. The best hosting partners compete for Best in Service status, an award that means what it says on the tin.

Starting this year, kCura’s toughened up the requirements for Best in Service. It ain’t Average in Service, or Good Enough in Service. As a result, database administrators have some extra work to do.

Best in Service requires a disaster recovery plan.

Deep, calming breaths. I didn’t say a warm standby data center – I just said a plan. It might be perfectly okay for your plan to be, “Step 1: panic. Step 2: call Brent to bring our databases back from the dead.”

Here’s the thing, though: at some point, one of your customers is going to ask to see the plan. (I know, because I’ve been involved in some of those conversations.) You don’t want to show them a plan that looks that way.

Use these resources to get started:

Best in Service Requires Transparency

kCura’s all about transparency, right down to the Trust portal that shows Best in Service partner service status. Today, publishing your scores is optional – but come summer, it’s going to be mandatory.

That means if you miss backups or DBCCs, the whole world’s gonna know.

And it’s going to cost your business real money.

I’m wildly excited about this because it brings business focus to a very important DBA job duty. It’s your job to make sure the data is safe and secure. Start measuring it and checking today before you have awkward management conversations this summer.


#FreeCon 2015 PASS Summit Recap

SQL Server
4 Comments

We believe that when you’re at a community event, the most valuable thing isn’t what’s up on the screen. Instead, it’s building relationships with the people around you.

With that in mind, over the last five years, we’ve run a series of FreeCons – totally free networking events for the SQL Server community.

This year’s FreeCon at the PASS Summit in Seattle was our most ambitious yet. We rented out a movie theater for 50 folks and shared:

  • Identifying what you’re proud of now, and sharing that with those around you
  • Setting goals for yourself over your next year
  • Learning about personal branding
  • Understanding the difference between outbound and inbound marketing
  • Planning which inbound marketing tasks to do, in which order, and how long they’ll take
  • And of course, throughout the event, sharing all of that with those around you, and building meaningful lifelong relationships

We’re really proud of the results and the reactions:

Ed Leighton-Dick wrote: “For many years, I focused primarily on the technical aspects of my profession. That focus served me well, but I realized several years ago now that I needed to start developing some soft skills, also. Skills such as marketing have continued to elude me, but yet, they have become critical as I build my consulting practice. Thanks to Brent and Kendra for all their work in putting on the FreeCon, and thanks to all the rest of the attendees for a fantastic day – I learned a lot from all of you!”

Chris Yates wrote: “One thing that stuck out to me was the authenticity shown by both Brent, Kendra, and Doug along with the 50 or so of us who were asking questions.”

https://twitter.com/CraigPurnell/status/658804548941828096

And Richie Rump tried to follow along:

Thanks to everyone who joined us – you were really the secret to making this event a lasting success.


Interviewing: How to Test TSQL Writing Skills

SQL Server
12 Comments
The candidate at work
“Sure, I write great queries!”

We sometimes help clients interview DBAs and Developers for their team. Requirements vary depending by position, but sometimes it’s important to test a candidate’s TSQL skills.

While I can test candidates thoroughly on strategies for TSQL (when they’d use certain constructs, the pros and cons of various approaches), if what you care about is skills writing complex TSQL there’s one surefire test: a lab.

The first step is to pick your dataset and environment, then dream up the TSQL query you’d like your candidate to write. Next, test it with one of your own staff members and make sure the problem is suitable for the job you’re trying to fill.

But you may not want to give candidates access to your production, or even development data — after all, they don’t work for you yet.

Good news, there’s an easy way to do this online! Check out the Stack Exchange Data Explorer. You can assign the candidate a query to write using this tool online.

How to make it work:

  • This only works for in-person interviews. Because human nature.
  • You’re going to need a moderator and at least 30 minutes, because the candidate will need to figure out the schema.
  • A history of queries is kept in the tool. You’ll need to let them know they can’t search that history, and it’s up to you if you’re going to let them use Books Online (hey, we all use it in the real world).

If you don’t want to deal with the hassle of the query history online, you can always download a copy of the Stack Overflow database and set up your problem on a locally installed test instance of SQL Server, too.

The secret test: if your candidate likes writing TSQL, they’re going to love the challenge and it will energize them.

Happy interviewing!

 


The case for Query Store in tempdb

Query Store is so cool

Billed as a flight data recorder for SQL Server, the Query Store is a repository of execution plan information, like the plan cache, except a bit more actionable. And it has a GUI.

“Try looking into that place where you dare not look! You’ll find me there, staring out at you!”

You can read all about what it does and what you can do with it around the internet. You can be suitably impressed and enchanted by the promise of data that’s persisted between restarts, being able to quickly and easily address plan regression issues, and so forth.

I’m here to bum you out when you’re done with all that.

You can’t use it in tempdb

On tempdb? Whatever.

What? Really? With all the stuff that goes on there? Aside from the many selfish demo reasons I have for wanting it, I know lots of people who make tempdb the default database for user logins so they don’t go messing things up in the real world – or in more fragile system databases. I’m also aware of several vendor apps that dump users into tempdb and then use fully qualified names to point them to application data.

I found this all out by accident. I had just nabbed CTP 3 of SQL Server 2016, and was settling in for an evening of staring contests with execution plans and new features.

That’s when tragedy struck – I got an error message:
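The attempt looked something like this (error wording paraphrased from memory; the user database name is just an example):

```sql
/* This works fine for a user database... */
ALTER DATABASE StackOverflow SET QUERY_STORE = ON;

/* ...but against tempdb it fails with an error along the lines of
   "Query Store cannot be enabled on system database tempdb." */
ALTER DATABASE tempdb SET QUERY_STORE = ON;
```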

And yeah, it’s an official no-go

Oh, but model and msdb are fair game?

I’m all disillusioned and stuff, man.

This is going to be frustrating out in the wild for a lot of people. Not just fancy-pants camo shorts consultants who like to make demos where they treat tempdb like it owes them money. Real people. Who pay real money, for real servers.

Anyway, there’s a Feedback request to get this changed. Vote for it. Or don’t. Just don’t vote more than once, or if you’re dead.


My Favorite Moments from the 2015 PASS Summit WIT Lunch

#SQLPass
0 Comments
The speakers are great, but the audience is great, too!

One of the great things about the SQLPASS Summit is the annual Women in Technology lunch. Not only does the lunch feature an interesting presentation, but you get to talk to men and women at your table about diversity, how to build better teams, and how we can help others.

Many of the best moments at conferences happen when you connect with your peers. The two things I found most fascinating at the WIT lunch this year came from my table and from audience questions.

We can give back. Kathi Kellenberger’s doing it!

I was lucky to sit at a table next to Kathi Kellenberger. She shared with me that she volunteers as a mentor for LaunchCode. Kathi volunteers as part of this free, non-profit program to inspire and teach talented people and help them land jobs in IT.  She trains her group regularly in person and provides awesome, free online resources for CoderGirls.

I’m really inspired by the work Kathi is doing. Rock on, Kathi!

Harvard offers free tools to test for unconscious bias

Most of us like to think that we aren’t biased. But we may have unconscious tendencies that we don’t realize. An audience question led to Denise McInerney giving a shout out to Cindy Gross.

Cindy is on the Azure CAT team, and has recently begun presenting to people and encouraging us all to take free online tests to help us identify our unconscious bias as a first step to building better, happier teams.

Thanks PASS, for hosting such a terrific lunch. I’m already looking forward to attending next year.


SQL Server Features I’d Like To See, PostgreSQL Edition

SQL Server
22 Comments

BUT FRANCE HAS A FREE PONY

That’s right. PostgreSQL is basically free. The PostgreSQL License is like the Church of England of licenses. They do not care what you do.

But I care what they do! If you read the first part of this article, you saw some Oracle features that I wish SQL Server had. Over here, I’ll be talking about some cool stuff PG does that SQL doesn’t do. At least not ‘natively’. There are workarounds, but we’re still getting some circles run around us.

So here goes!

Unlogged Tables! Forget wrestling with minimal logging, which sometimes just doesn’t work, no matter which Trace Flags and hints you throw at it. You can just tell the engine you don’t care about this table and it won’t log any transactions for it. Yeah, put that in your ETL and smoke it.

Generate_series! This is one of those things that I used, and then spent hours playing with. You know all that crazy stuff we do with Tally Tables and Date tables? Yeah, PG users just use generate_series, and it spits out the range of values they want. It works with numbers and dates, and even handles intervals. It’s pretty wonderful. Sick burn.
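A couple of quick PostgreSQL sketches to show it off – a number range and a date range, no tally table in sight:

```sql
-- A range of numbers, one row each
SELECT n
FROM generate_series(1, 10) AS t(n);

-- A range of dates, stepping by a one-day interval
SELECT d::date
FROM generate_series('2015-01-01'::date,
                     '2015-01-31'::date,
                     interval '1 day') AS t(d);
```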

Arrays! Between array_agg and UNNEST, PG offers simple ways to handle a couple things that SQL Server users spend a lot of time trying to hack together T-SQL and CLR methods to deal with. If you’ve ever been mortified and confused by those SELECT STUFF FOR XML PATH shenanigans we use to create lists/arrays, or read one of the bounty of articles and arguments about splitting strings in SQL Server, you’d probably buy features like this a steak dinner.
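For the flavor of it, here’s a PostgreSQL sketch against a hypothetical posts table (the table and column names are made up for illustration):

```sql
-- Roll each user's post titles up into an array, no FOR XML PATH in sight
SELECT owner_id, array_agg(title) AS titles
FROM posts
GROUP BY owner_id;

-- And UNNEST explodes an array back out into rows
SELECT UNNEST(ARRAY['alpha', 'beta', 'gamma']) AS element;
```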

BRIN Indexes! As a guy who has dealt with a lot of large tables, and large indexes, something like this is really awesome. A Block Range Index basically stores high and low keys for each, you guessed it, Block Range within the index, and decides which blocks it needs based on those values. The upside is that indexes of this type are TINY compared to traditional indexes. It’s a lot more like scanning the statistics histogram to figure out which steps you need, and then retrieving those steps.

Multiple language stored procedures! Yep. Write a stored procedure in a language you’re comfortable with. Tell PG what the language is at the end of the stored proc, and it will use it. If you’ve ever used CLR, and struggled with .dlls and trustworthy and blah blah blah, this probably sounds like a dream.

MINDOP! Just kidding. But I do wish we had this. It’d be way more useful than, like, Service Broker, or Resource Governor, or Affinity Masking, or Priority Boost, or… I’m getting carried away. Apologies; Parallelism riles me.

Anyway, I hope you enjoyed these, and I hope that you’ll give other database systems a look. There’s some pretty cool stuff out there.


Free SQL Server Magnetic Poetry Kits

SQL Server
9 Comments

We were looking at magnetic poetry words on a public bathroom wall (true story, insert TempDB joke here) and thought, “You know what would be really awesome? If there was one of these for databases.”

So we made ’em.

SQL Server Magnetic Poetry

Each of us put together our own favorite database words, so we each have our own sheet.

To get one, just track us down at a conference or user group. This week, Doug & Kendra will be at the PASS Summit in Seattle, and I’ll be at SQL Intersection in Vegas. The more of us you meet, the more words you’ll be able to work with, since we’ll each carry our own sheets.

Share your favorites with us on Instagram or Twitter with the tag #SQLpoetry. We’ll pick out our favorites and send you a little something special.


Announcing sp_BlitzCache v2.5.0

SQL Server
14 Comments

Woohoo, it’s another half of the year, which means it’s time for another version of sp_BlitzCache®. The last version came out on June 18th, about 4 months ago. In that time frame, you’ve requested some new features, and they’ve slowly trickled into sp_BlitzCache®.

If you don’t want to wait, head over to the download page and get yourself an sp_BlitzCache®!

In no particular order:

Default Values

Raul Gonzalez pointed out that if you set all of the parameters to NULL, sp_BlitzCache® will fail. This happens by default when you script execution through SSMS. Why SSMS does this is beyond me, but there you have it. sp_BlitzCache® will now get really upset and fail to run with a nice error message if you set any of the following to null:

  • @top
  • @sort_order
  • @query_filter
  • @reanalyze

Yay, errors!

Expert Mode

You guys are smart. I mean really smart. To help you keep being smart, I added a bunch of columns to expert mode. Ready?

  • Compilation memory
  • Compilation CPU time
  • Compilation wall clock time
  • Query hash and query plan hash
  • Cached plan size
  • Serial desired and required memory

Lean Options

sp_BlitzCache® used to pull back a lot of data by default. I mean… a lot of data. This could cause execution time to go through the roof on busy systems and systems with big plans. The default value for @top has been reduced to 10. You can still set it to whatever you want if you’d like the old behavior back.

There’s also an option to @skip_analysis. This bypasses the XML shredding that goes on in the execution plans. When you skip the XML analysis, you won’t find out if you have plan warnings or missing indexes, but your executions of sp_BlitzCache® will be much faster and you’ll use less CPU. Use this on heavily resource constrained systems.
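Usage is as simple as it sounds (parameter names per the release notes above):

```sql
/* Lighter-weight run: top 10 results, skip the XML shredding */
EXEC dbo.sp_BlitzCache @top = 10, @skip_analysis = 1;
```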

Speaking of XML – the XML checks were re-written to be smarter and require less CPU time. We still have to shred the XML, but I’ve re-written the XQuery in ways that shouldn’t need anywhere near as much CPU time as before. The upside is that this also makes it easier to add more checks to sp_BlitzCache®.

Sorting Results

There are two big changes to how results are sorted. One for speed and one for functionality.

An additional sort order option has been added for recent compilation. If you want to sort by recent compiles, just run EXEC sp_BlitzCache @sort_order = ‘compiles’

Some sort orders have always been slow (I’m looking at you, “executions per minute”). New changes push the sort earlier in the query and, in testing, make performance faster, especially on busy systems.

Making Big Numbers Smaller

There was a potential for the totals temporary table to have integer overflows. I fixed that.

Basically, there was a significant chance that you might run into problems once you’d burned through nearly 600 hours of CPU time (that’s 2,147,483,647 milliseconds, BTW). To help you avoid errors when you’re actually using your CPUs, I added a bunch of math to make numbers smaller. No, really, that’s about all I did.


In Review: SQL Server 2005 Waits and Queues

Wait Stats
7 Comments

Back in November of 2006, Microsoft released “SQL Server 2005 Waits and Queues” as a best practices article. Many people in the SQL Server community have referenced this article as a great place to get started with understanding waits in SQL Server. Heck, I’ve recommended it on a regular basis.

The last time I recommended it I told the recipient, “This is great, but only read it from page X to Y.”

After nearly 10 years, this paper is showing its age. As you read this critique, and if you read the paper, it’s important to consider the historical context. “SQL Server 2005 Waits and Queues” is the first of its kind for SQL Server.

High Level Overview

The idea of the article is that developers and DBAs should use the waits information to identify areas of slow performance in SQL Server (waits) and then associate that with performance monitor counters (queues). The authors imply a methodology that starts with waits, correlates through other data, and, if followed correctly, eventually arrives at a solution for your performance problems.

The core concepts of “SQL Server 2005 Waits and Queues” are solid; queries wait, SQL Server records the waits, you interpret the waits. The overall methodology that the authors propose is a good one, too. Gathering evidence and correlating the evidence with other signifiers is a sound way to approach problem solving.

The devil is in the details. And, in this case, the devil has a hangover.

In the Beginning…

The “SQL Server 2005 Waits and Queues” paper starts off benignly enough. There are places where it sounds downright revolutionary. Even today very few people consider things like:

We outline a methodology that considers performance from the perspective of application, SQL Server, and correlates this to the system or resource perspective.

The Purpose section goes on to mention “This paper outlines a best practice methodology and provides guidelines and thought processes to identify poorly performing applications”. If “SQL Server 2005 Waits and Queues” stuck to defining process and high-level guidelines, this paper would be gold. The process that the authors outline takes a top-down approach that allows an observer to start at a system level, rapidly identify problem areas, drill down to the root cause, and then solve the problem.

This Is How We Do It

Immediately after the Purpose section, the authors dive into their performance methodology. Waits are described as the observed symptom of a problem and perfmon counters are the proof. Or, to look at it another way, waits are what the application observes and perfmon counters are resource waits.

In a stroke of genius, there’s a lengthy example where the authors explain how correlating between lock waits and perfmon counters can help an observer determine if there’s really a problem going on. The crux of the argument is that “associations or correlations allow us to determine relevancy to overall performance”.

Averages are all that we have to go on when passively observing SQL Server. But in many cases, averages are misleading. In the example given, the authors mention looking at average lock time over a given window (the use of averages comes up a lot in this paper). It’s important to remember that averages can be heavily skewed by outliers.

Remember, the methodology described in this paper is a good first glance, but you can’t use it to rule out a specific problem. You can only use this methodology to determine which areas of SQL Server are a performance problem.

The methodology section doesn’t skimp on details. It provides a description of the SQL Server query wait model (running, waiting, runnable) and examples of how a query can move through this.

Unfortunately, it’s in this section where the first hints that something isn’t right show up. “A low signal (where signal is less than 25% of the total waits) to resource wait ratio indicates there is little CPU pressure.” The problem isn’t with the statement; if queries are spending 25% of their time waiting to get on a CPU, you’ve got problems. The problem is that the statement comes with no context – there’s no information about why the signal:resource wait ratio is bad and what should be done about it. This problem persists throughout a lot of the paper, too. It’s assumed that the reader knows where to go for verification of the absolute ratios that are provided.
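For readers who want to check that ratio on their own systems, here’s a minimal sketch. Keep in mind the numbers are cumulative since the last restart (or wait stats clear), so baseline before you judge:

```sql
-- Signal waits = time spent waiting on the runnable queue for a CPU.
-- Resource waits = everything else (locks, IO, latches, and so on).
SELECT
    SUM(signal_wait_time_ms) AS signal_wait_ms,
    SUM(wait_time_ms - signal_wait_time_ms) AS resource_wait_ms,
    CAST(100.0 * SUM(signal_wait_time_ms)
         / NULLIF(SUM(wait_time_ms), 0) AS DECIMAL(5, 2)) AS signal_pct
FROM sys.dm_os_wait_stats;
```

A high signal percentage points at CPU scheduling pressure; what counts as “high” depends on your workload, not on a magic 25%.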

Blueprints

Building a better SQL Server future

The authors of “SQL Server 2005 Waits and Queues” attempt to provide performance tuning blueprints for both OLTP and data warehouse applications. If you’ve ever built a house, you know that blueprints change the minute you go out in the real world, and these blueprints are no exception. When I was reviewing this article, I scrawled in the margin “here’s where things go south for a while”.

The blueprints attempt to set up a list of scenarios that DBAs need to watch out for. These are typically based on hard and fast thresholds for perfmon counters and DMVs – more on that in a minute.

Although many people have pure data warehouse systems, it’s increasingly rare to find a purely transactional system. A pure transactional system is entirely single item select, insert, update, and delete; there are no reports. This was uncommon back in 2005/2006 and it’s even less common today – business users insist on viewing live data and we have no choice but to deliver. It’s left up to the reader to determine how to synthesize appropriate metrics based on the pure OLTP and pure data warehouse blueprints.

Here’s my first major issue with this article: hard and fast advice is rarely correct. The OLTP blueprint contains a huge amount of advice that could put users in a lot of trouble. In no particular order, developers will run into advice like:

  • Have three or fewer indexes on frequently updated tables.
  • Queries with more than 4 joins can be too normalized for high OLTP scalability.
  • Avoid scans on tables since a missing index flushes the cache.
  • Page Life Expectancy should always be over 300.
  • Page Life Expectancy should never drop by more than 50%.
  • Reads should take 4-8 ms without any IO pressure.

Take that in for a minute.

All of these statements are listed in the “SQL Server 2005 Waits and Queues” article. They’re all presented as hard rules to follow and that indicate performance problems. But there’s no evidence given for any rule being true, nor is there any condition given for re-evaluating them. The authors do provide a single line caveat “the values in Value column are good starting point. The actual values will vary.” At least their bases are covered for being wrong.
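If you want to see where your own server stands on the most abused counter in that list, Page Life Expectancy is exposed through a DMV. A quick sketch – on NUMA hardware, look at the per-node Buffer Node values too, because the server-level number is just an average of the nodes:

```sql
-- Page Life Expectancy, server-wide and (on NUMA boxes) per node.
-- The counter_name column is fixed-width nchar, hence the LIKE.
SELECT [object_name], counter_name, cntr_value AS ple_seconds
FROM sys.dm_os_performance_counters
WHERE counter_name LIKE N'Page life expectancy%';
```

Whether any particular value is a problem depends on how much memory sits behind each second of life expectancy – which is exactly the context the flat “300” rule throws away.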

Computers have changed significantly in the last 10 years. SQL Server has seen 4 releases since this paper was published and we’re about to be on SQL Server 2016. The query optimizer has seen improvements, storage devices have seen improvements, and memory has seen improvements. When this paper was released, a dual core Xeon was a high end server CPU. In 2015 I have a quad core processor in my phone; it’s not uncommon to see 4, 6, or 8 core processors in modern servers.

As you read through the blueprints, it’s important to remember how things have continued to change in the SQL Server world. The first rule for data warehousing insists that data warehouses should have more indexes, not fewer, while a lot of modern thinking on data warehouses runs counter to this and focuses on making sure disk throughput is fast enough.

Other gems from the data warehouse blueprint include:

  • Avoid plan re-use.
  • If top 2 sys.dm_os_wait_stats values contain IO, there’s a blocking bottleneck. (Never mind that a data warehouse will usually have more data than RAM and will need to read from disk)
  • Avoid big scans because a missing index flushes the cache.
  • “The time taken for plan generation of a large query is not nearly as important as having the right plan”.

It’s noble that the authors make these attempts, but the blueprint requires a significant amount of customization and modification to make it worth considering, much less using in production.

The real value of the blueprint comes from suggested correlations between counters and waits, but you need to have solid baselines and a good understanding of how different parts of SQL Server work in order to synthesize this into something meaningful.

By the way, this only gets us to page 16 of 94.

Just eat the candy, already.

What’s Stopping You?

The next major section deals with finding the typical resource bottlenecks in a SQL Server. This is a pretty boring section, truthfully. It contains descriptions of problem areas (“memory”, “IO stalls”) and a sample query that will help the reader look into the problem.

The biggest specific complaint I have about this section is that the authors insist on using Page Life Expectancy as a meaningful performance indicator.

Outside of that, there’s a glaring general concern – at no point does this paper tell the reader what to do with the information gathered by these queries. It’s left up to the reader to interpret IO stalls, memory pressure, missing indexes, and query plan problems. During a presentation I’ll sometimes answer an unforeseen question with “that’s an exercise left up to the reader”, but readers of 100 page best practices articles come expecting best practices, not homework.

It’s also in this section where vague guidelines start to creep in. Readers are instructed to “make sure that the [blocked process] threshold is not too low” – the option is off by default (0), values under 5 seconds are discouraged, and earlier in the article a 30 second threshold is recommended. But no guidance is provided to help a DBA determine the right threshold.
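For reference, the threshold lives in sp_configure. Here’s a sketch of enabling it at the 30 second value the article recommends elsewhere – pick your own number based on how long blocking is tolerable in your environment:

```sql
-- 'blocked process threshold (s)' is an advanced option; 0 means off.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'blocked process threshold', 30;  -- seconds
RECONFIGURE;
```

Once enabled, SQL Server raises a blocked process report event you can capture with a trace or Extended Events session.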

And, ultimately, that’s my chief complaint – there’s very little reasoning given around the numbers presented in this best practices article. It’s assumed that the reader knows SQL Server well enough to read between the lines, refine existing numbers, and synthesize additional performance metrics. When support information is provided, it’s often in the form of another best practices article. These vary in length from 20 pages to 100 pages – I don’t know about you, but the idea of reading War & Peace: Best Practices Edition over the weekend doesn’t sound that fun.

The “Good” Part

Starting on page 22, this best practices paper turns into a list of SQL Server wait types and correlating performance monitor counters. This goes from page 22 – 79. At that point, the “queues” portion kicks in and then you’re looking at performance monitor counters. I used to recommend that people read through the list of waits to get a better idea of what’s going on inside SQL Server.

Don’t use this part of the article.

Look at the Books Online entry for sys.dm_os_wait_stats instead. It has a much better explanation of individual waits.

Nowhere in this article do the authors explain which waits are ignorable and which aren’t. SOS_SCHEDULER_YIELD is given just as much weight as QPJOB_KILL. It’s left to you to determine if any of these waits are significant. Thankfully, we’ve got a wait stats script that will ignore many waits for you.
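The idea behind those scripts is simple filtering. A deliberately abbreviated sketch – real scripts exclude dozens more wait types than this illustrative list:

```sql
-- Top waits with some well-known background/benign waits filtered out.
-- This NOT IN list is illustrative, not exhaustive.
SELECT TOP (10)
    wait_type, wait_time_ms, signal_wait_time_ms, waiting_tasks_count
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN
    (N'SLEEP_TASK', N'LAZYWRITER_SLEEP', N'BROKER_TASK_STOP',
     N'SQLTRACE_INCREMENTAL_FLUSH_SLEEP', N'REQUEST_FOR_DEADLOCK_SEARCH',
     N'XE_TIMER_EVENT', N'CHECKPOINT_QUEUE', N'WAITFOR')
ORDER BY wait_time_ms DESC;
```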

What’s worst of all is that some of the advice in this section is just plain crazy.

  • ASYNC_IO_COMPLETION – This is your SQL Server asynchronously writing to disk. To solve the problem, the authors suggest you check for memory pressure and bad query plans.
  • CXPACKET – The guidance for this wait type has been superseded by KB 2806535
  • DBTABLE – This wait is labeled “internal only”, but additional information is given on correlating it to performance problems. Should I even worry if it’s internal only? Can I even do anything about it?
  • Failpoint – if you made it to the Failpoint, you clearly haven’t reached your own internal fail point.
  • LATCH_x – SQL Server 2005 introduced sys.dm_os_latch_stats which helps a DBA determine which latches are causing problems. Does this paper point the reader to the right DMV? Nope. It just says “Possibly the most common case is contention on internal caches (not the buffer pool pages), especially when using heaps or text.”
  • LATCH_x + PAGELATCH_UP means you should partition the table, BTW.
  • If you see LCK_x you should check for memory pressure. Unfortunately, we’re left to our own devices to figure out how. If you’ve been taking this article at its word, you’re probably running for the First Church of Page Life Expectancy to find everything else.
  • OLEDB is explained to largely be caused by traces and linked servers. The mitigation: set up more traces, figure out where your client apps sit and whether or not they’re loading files, or query sys.dm_exec_requests to find out which SPIDs are seeing this wait. Nowhere does it mention stopping your traces and not using linked servers.
  • WRITELOG waits suggest looking in sys.dm_io_virtual_file_stats (good) and query plan output (bad). Raise your hand if you’ve ever successfully used a query plan to spot I/O problems.
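Since the paper at least gets sys.dm_io_virtual_file_stats right, here’s a minimal sketch of using it to compute per-file latency – a far more direct way to chase WRITELOG pain than squinting at query plans:

```sql
-- Average read/write latency per database file since startup.
-- NULLIF guards against divide-by-zero on files with no activity.
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.physical_name,
       vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_ms,
       vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id = vfs.file_id
ORDER BY avg_write_ms DESC;
```

High write latency on the log file lines up with WRITELOG waits; high numbers elsewhere point at the data files instead.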
Context switches are an important measurement of... something.

Surely the queues section must be better… He said, hopefully.

  • Queue lengths are regarded as important. Disk queue length is especially spurious given modern storage technology. Formerly alarming numbers are now considered anemic – it’s not uncommon to see recommended queue depths as high as 128 or 256.
  • “Under typical circumstances reads should take 4-8ms” – they go on to remind you to check with your vendor. Which is good, because 4-8ms of storage latency is either some kind of local solid state storage or else the time it takes your HBA to acknowledge a write.
  • Page Faults/sec – this metric is trash. It measures when your CPU can’t find a page in memory OR when the page isn’t in the cache the CPU thought it should’ve been in. This is normal. Also – you can’t control it. Even with all of your data in memory, your server will still show page faults occurring.
  • Pages/sec – this measures reading data to and from memory. Once again, this is junk. Microsoft PFE Clint Huffman debunked Pages/sec as a meaningful counter back in 2009.
  • Page Splits/sec is a counter to watch, right? Well, any addition of a page to the end of an index is technically a page split. So is splitting a page. So we don’t know if these are good or bad, but they’re happening!
  • Context Switches/sec is mentioned as something to watch, too. According to the paper, if this goes over 20,000 there’s a problem and if it goes over 50,000 there’s a big CPU problem. One time I opened a program and it produced 33,000 context switches per second. Time to throw this laptop away.

At this point, I literally can’t even.

Conclusion

If you’re reading “SQL Server 2005 Waits and Queues” for the first time, read pages 1-7 and then skip to the conclusion. Ignore everything in between.

It’s important to consider this article in a historical perspective – it’s the first serious attempt at documenting a tuning process for SQL Server that combines perfmon counters and wait stats. In today’s context, the concrete advice from the article doesn’t hold up; but the concepts and broad methodology are still useful for identifying SQL Server problems.

If you are looking for guidance on tuning SQL Server using wait stats, more accurate information can be found in SQL Server Books Online and third party sites (like this blog).


The Five Stages of Dynamic SQL Grief

Development
18 Comments

Dynamic SQL can be an incredibly powerful tool in the pocket of a SQL Server developer. Developers frequently express amazement at the level of flexibility dynamic SQL offers, then astonishment at how quickly things get out of hand, and finally the humbling realization that such machinery nearly tore their limbs off. This process can be broken into five stages: The Five Stages of Dynamic SQL Grief. Here’s how to recognize where you, or someone you know, is at in their relationship with dynamic SQL.

Stage One: Denial

"Here's the flow control for our new switchboard proc."

“THIS IS INCREDIBLE AND I’m going to use it everywhere!”

Stage one is where the developer will stare slack-jawed in amazement as the statement they just glued together like a words-cut-from-magazines collage not only compiles but returns results as expected. Suddenly, an entire universe of possibilities presents itself. What if we have one code path for admins and a separate one for power users? What if we just roll our INSERTs, UPDATEs, DELETEs, and SELECTs into one stored proc? That way we only have to make one proc per table.

But why stop there? What if we allow the user to pass in the name of the table they want to use? Then it’s one proc for the whole database! Oh this is going to rock people’s worlds when we roll it out.

SYMPTOMS OF DENIAL:

  • Over-exuberance
  • Disregard for/obliviousness to code performance, security, and the general lunacy of contemplating a universal stored procedure
  • Not yet tired of typing single quotes (‘)

Stage Two: Anger

“HOW MANY SINGLE QUOTES do i need?”

Stage two sees the developer having struggled to get their code to production due to a multitude of issues, including but not limited to:

  • Coding each of the many branches created by nested IF statements
  • Testing and debugging each branch of the code
  • Trial and error to figure out how many single quotes it takes to get the statement to compile

During this stage, the developer is irritable, but still resistant to the idea of simplifying anything. Their most important task is to get the code to run, and then ship.

If you see errors like these over the developer's shoulder, walk away slowly and quietly.

SYMPTOMS OF ANGER:

  • Refusal to believe complexity is part of the problem
  • Profanity-laced tirades about single quotes
  • Unwillingness to discuss the code with others
  • Disinterest in security flaws
  • Unfamiliarity with Erland Sommarskog
  • Occasional lower back pain
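The cure for the quote-counting rage, for what it’s worth, is parameterization: pass values through sp_executesql instead of concatenating them into the string. A sketch against a hypothetical dbo.Users table:

```sql
-- Values travel as parameters, not string fragments: one layer of
-- quotes to manage, plan reuse, and no injection hole.
DECLARE @sql NVARCHAR(MAX) = N'
    SELECT UserId, DisplayName
    FROM dbo.Users
    WHERE DisplayName = @DisplayName;';

EXEC sys.sp_executesql
    @sql,
    N'@DisplayName NVARCHAR(40)',
    @DisplayName = N'O''Brien';  -- the only doubled quote in sight
```

Object names still can’t be parameterized this way – that’s where QUOTENAME and a whitelist come in, and where Erland’s writing (see Stage Four) earns its keep.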

Stage Three: Bargaining

“WHAT DO I HAVE TO CHANGE TO GET THIS TO run?”

Stage three begins when the developer can no longer deny being overwhelmed, and they begin to simplify their code. They start looking for areas of obvious frivolity to remove in order for it to test successfully. It is a turning point; they have reached high tide of complexity and must send their wildest ideas back out to sea in order to make their code viable again.

SYMPTOMS OF BARGAINING:

  • Willingness to discuss code with others
  • Receptivity to feature-cutting suggestions
  • Continued disinterest in security flaws
  • Admission that they have been overly ambitious in the application of dynamic SQL

Stage Four: Depression

“WHAT WAS I THINKING?”

Stage four is the low point for the developer writing dynamic SQL. At this stage, refactored code now works as designed. However, the security issues which the developer refused to address are now at the forefront, and cause a second round of rewrites. Forcing the developer to undergo a second set of rewrites is the proverbial kicking them while they’re down, and their spirits are at an all-time low.

SYMPTOMS OF DEPRESSION:

  • A sudden interest in Erland Sommarskog
  • Realization that many of their other dynamic SQL patterns were ill-conceived
  • Extended work hours to fix their many, many bad ideas and security holes
  • Increased interest in large quantities of ice cream and Netflix

Stage Five: Acceptance

“I was a fool, but now I know better.”

Stage five is the rebound stage for a developer’s initial encounter with dynamic SQL. The fog of depression begins to lift as they work through the second rewrite process. This may seem counter-intuitive. It is not the rewrite that lifts the spirits of the developer, but rather the transformation from uneducated caterpillar to educated butterfly that improves their mood. They now have clearer insight into the consequences of their code decisions. They realize dynamic SQL is not a universal translator for turning business logic into a single script, and they will not make that mistake again.

SIGNS OF ACCEPTANCE:

  • Willingness, if not eagerness, to discuss the topic of dynamic SQL
  • Close familiarity with Erland Sommarskog
  • Interest in other dynamic SQL references
  • Sense of humility about their code abilities
  • Return to nominal interest in ice cream and Netflix

If you encounter a SQL Server developer writing dynamic SQL, use this guide to identify which stage they are progressing through and treat them with empathy. They are going through a difficult time.


Introducing Our 2016 Scholarship Program for SQL Server Training

Company News, SQL Server
32 Comments

You work with Microsoft SQL Server – maybe as a developer, report writer, analyst, sysadmin, or DBA.

You want to learn more about how to make it faster and more reliable.

But your company can’t afford training. Maybe it’s a charity that needs to save every dollar to make a difference in the lives of others. Or maybe it’s a non-profit arts foundation, or a small startup that’s barely making ends meet.

We want to hear from you. We’re going to pick 25 SQL Server professionals to get access to a tailored set of training classes that focus on issues relevant to each person’s pain points.

The fine print:

  • You must already have a job working with SQL Server.
  • You must work for a foundation, non-profit, charity, or similar company that’s doing good work. It can totally be a for-profit company, just as long as they’re making a difference. (If you work for Ginormous Profitable Global Corporation, forget it.)
  • Your company or government rules must allow you to receive free or discounted training. (Some companies prohibit their employees from accepting gifts.)
  • You can be anywhere in the world. (Some of our scholarships will involve in-person training, and some will be online video training.)

Apply now – applications close Friday, November 6th.


Announcing Brent’s SQLSaturday DC Pre-Con: Performance Tuning When You Can’t Fix the Queries

SQL Server
11 Comments

Your users are frustrated because the app is too slow, but you can’t change the queries. Maybe it’s a third party app, or maybe you’re using generated code, or maybe you’re just not allowed to change it. Take heart – there’s still hope.

I do this every week, and I’ll share my proven methodologies to performance tune with indexes, SQL Server configuration switches, and hardware.

In this one-day workshop on Friday, December 4 at the Microsoft office in Chevy Chase, you’ll learn:

  • Simplified internals – how SQL Server turns your query into an execution plan
  • Wait stats solutions – a list of fixes for most common bottlenecks
  • Plan cache analysis – how to discover the most resource-intensive queries so you know if indexes and hardware will work
  • Monitoring metrics – what DMVs and Perfmon counters to monitor, and what thresholds to alert on
  • Hardware warning signs – how to prove your server is undersized for your workloads
  • Tools and scripts – the best free and paid tools to make your tuning job easier

This session is for developers, DBAs, and consultants who have to make SQL Server go faster. You should be comfortable writing queries and creating tables, but not as confident about interpreting SQL Server’s DMVs and diagnostic data. I’ll train you the same way I train my own clients.

This session will be demo-heavy, and you can play along. Bring a laptop running SQL Server 2005 or newer, with at least 100GB of free drive space. We’ll distribute the Stack Overflow demo databases on USB drives for you to restore, and you can follow along with the demos. You’ll get the load test scripts, run them against your laptop, and see how we solve various challenges.

https://www.youtube.com/watch?v=MMX9uh9Fn58

Space is limited to 100 attendees. Get your spot now for $199, and then register for SQLSaturday DC too. See you there!


SQL Server Features I’d Like To See, Oracle Edition

Oracle, SQL Server
55 Comments

BUT FRANCE HAS A PONY

I really like SQL Server. Most of the time. Okay, so most of the time I like SQL Server most of the time. Don’t get me wrong, if I had to go back through the career-time continuum and pick a RDBMS to work with, I’d probably still choose it over Oracle. Probably. And, because I don’t exclusively grow facial hair from my neck, I wouldn’t be allowed to choose PostgreSQL. They’d kick me off the mailing list.

Just kidding. You’re all handsome rogues. We could have had a nice life together, staring longingly into each other’s shoes and trying to implement parallelism.

I’d have DB2 here, but the cost of entry to the Developer Edition is rather steep. So, you know, I’m sure it’s great! But no. Though I would be really happy if Microsoft implemented ANSI Standard constructs into T-SQL half as fast as IBM does.

I have poked at Oracle and PostgreSQL a bit, and found they have some really cool stuff. Heresy, right?

Check out some of these Oracle gadgets and tell me they wouldn’t make your life a whole lot easier.

In no particular order:

Table restores! Built in! I’m very surprised we never got a feature like this. You can do it with a 3rd party tool like Dell LiteSpeed.

Adaptive Plans! Go to the link and read the second paragraph. Read it twice. Wipe the drool off your face.

UPDATE: Geoff Patterson has created a Connect item to get Adaptive Plans for SQL Server.

In-Database Row Archiving! You know all that stuff you do with partitions that Oracle already does better? Where you’re basically praying for partition elimination to not undo the two weeks of work you put in to setting up this partitioned table that developers are writing horrible MERGE upserts to? Yeah. You can just tell the engine to not pay attention to rows you don’t care about anymore when it accesses the index. Fancy that.

Bitmap Indexes! It’s kind of like a filtered index, except for all values of a highly non-selective column.

Materializing CTEs! Even though it’s undocumented, we use plenty of undocumented stuff in SQL Server to get the job done. This is really cool to me, since I’ve discussed this limitation in CTEs before. I’d love to see a way to do this in SQL Server with the same behavior, without having to create temp tables. It would be a nice way to get around issues with caching statistics for temp tables, especially since MS is still fixing bugs around temp tables.
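The closest T-SQL stand-in today is doing the materialization by hand: run the expensive expression once into a temp table (which also gets real statistics), then reference that instead of repeating the CTE. A sketch using hypothetical Stack Overflow-style table names:

```sql
-- What Oracle's MATERIALIZE hint does for you, done manually:
-- the aggregate runs once, and later references hit the temp table.
SELECT OwnerUserId, COUNT(*) AS post_count
INTO #PostCounts
FROM dbo.Posts
GROUP BY OwnerUserId;

SELECT AVG(post_count * 1.0) AS avg_posts FROM #PostCounts;
SELECT MAX(post_count)       AS max_posts FROM #PostCounts;

DROP TABLE #PostCounts;
```

With a plain CTE, each of those two references would re-run the aggregate.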

Are there more? Yeah, but this is a blog. Go grab a VirtualBox and read the documentation if you’re interested in learning more.


Training Plan for a SysAdmin to Manage Availability Groups

How hard is it for a systems administrator who’s used to running SQL Server on Windows Clusters to tackle Availability Groups? Our example system administrator knows a bit of TSQL and their way around Management Studio, but is pretty new to performance tuning.

Well, it might be harder than you think. First, let’s look at the skills needed to succeed. Then let’s tackle a sample training plan to get those skills.

Do You Have the Skills to Manage AGs? Take this Test!

sp_Blitz® Sanity Test

Incident Management Skills Test

  • Do you regularly use techniques to learn what’s running and how long it’s been waiting at any given point in time that are NOT SQL Server’s Activity Monitor or sp_who/sp_who2?  (Sorry for the bias, but Activity Monitor is full of lies, and those SQL Server 2000 tools don’t really help anymore.)
  • Do you have a documented incident response process to collect data when performance or availability suffers that at least 2 people can run?
  • Do you regularly execute a process to perform root cause analysis when performance is bad or the SQL Server is offline? (Only answer “yes” if it has successfully led to a real root cause 3 times in the last year.)

Performance Tuning Skills Test

  • Do you regularly analyze your SQL Server wait statistics and baseline them? (A “yes” means that you know what your top 10 bottlenecks mean, what “poison” waits are and what they mean if they show up, and you could quickly identify any changes in your top waits and react.)
  • Do you know the top 10 queries that run against your SQL Server by CPU, and have you tuned their indexes?

SQL Server Scripting/Configuration Skills Test

  • Do you have the scripting skills to be able to handle managing jobs across all nodes? Jobs must all run everywhere, and be smart enough to know who is the primary replica.
  • Do you have the scripting skills and security knowledge to be able to manage linked servers and logins across all replicas? (Don’t say yes if you haven’t done it before– it’s trickier than you think!)
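For the jobs question, one common pattern is to deploy every job to every replica and have the first step bail out on secondaries. A sketch assuming SQL Server 2014 or later (where sys.fn_hadr_is_primary_replica is available), with a hypothetical database and proc name:

```sql
-- First step of every job: only proceed if this instance currently
-- hosts the primary replica for the database the job works on.
IF sys.fn_hadr_is_primary_replica(N'StackOverflow') <> 1
BEGIN
    PRINT 'Not the primary replica - exiting.';
    RETURN;  -- the job step ends quietly on secondaries
END;

EXEC dbo.SomeMaintenanceProc;  -- hypothetical; your real work goes here
```

Because the job exists everywhere and checks at runtime, a failover doesn’t require touching any job definitions.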

Windows Clustering Skills Test

  • Have you successfully executed basic Windows clustering management tasks with the version of Windows you plan to go live on? These tasks are: configuring quorum, getting clean cluster validation reports, and using the Cluster Failover Manager.
  • Have you practiced advanced cluster rescue scenarios at least twice within the past year? (These tasks are: forcing quorum, generating and reading cluster logs)

Testing and Patching Skills Test

  • Are you subscribed to and do you actively review all hotfixes that come out for your version of Windows Failover Clustering and SQL Server?
  • Do you test and apply hotfixes and selective updates to a test environment on a regular basis as part of a patching deployment for production?

Here’s the deal: to do well with Availability Groups you need to honestly say yes to every single question on this list.

Closing the Gap: A Training Plan

The right training plan for Availability Groups includes learning from others as well as hands-on experience. To make this work, you need to be devoting at minimum 12 full days per quarter to learning. That’s at least one full day a week, and even then, your time may be very, very tight.

First Quarter: Learn the Pitfalls of AGs and Build your Incident Response Process

If you didn’t score two “yes’s” on the sp_Blitz® sanity test, that’s priority #1. Make sure you have completely covered that section before moving on.

Next, make sure you understand why you need to learn so much, and also make sure you really need AGs! Dig in and learn the basics about Availability Groups and why they’re so tricky, and how to choose the right HA/DR solution in SQL Server. Options:

Next, get working in production and build your incident response kit.  It will take work and time to get incident response and root cause analysis really working, but this is an absolute pre-requisite for working with AGs.

Second Quarter: Baseline and Analyze Wait Stats and top Queries

Focus on establishing lightweight monitoring and baselining your top waits and queries in SQL Server.

Third Quarter: Build a Prototype and Get Advanced Training

At this point, you’ve learned a lot about how to understand what SQL Server is telling you. It’s time to start understanding Availability Groups specifically.

  • Review your notes on what to look out for in AGs – and make sure you pick the version of Windows and SQL Server that’s really right to use.
  • Set up a prototype environment. You need the same hardware and storage that you’re going to use in production. This isn’t a temporary environment, either: to live well with AGs you need a pre-production environment!
  • Restore full sized production databases to your AG environment and generate activity against it.
  • Make the environment fail! You need to cause failures in multiple places and validate that your incident and performance management tools really work in this brave new world.
  • Oh, and work on all those scripting skills, too.

At this point in the process, it’s time to learn from others and really make sure you’re on the right track. You can do this in multiple ways:

  • Attend a SQL Server conference that lets you attend “lessons learned” style panels– such as the SQL PASS Summit
  • Attend an advanced SQL Server DBA training class like our Senior DBA course

Fourth Quarter: Reassess Your Progress and Fill the Gaps

You’ve come a long way. But you still had a job to do with other tasks in it.

Step back and re-take the quiz. How are you executing on all of those tasks, and how many people can execute on them? Return to the areas where you’ve scored the weakest and build those skills up.

Yep, it Takes a Year

Availability Groups are a very cool, cutting edge tool in SQL Server. They’re expensive, but people are willing to pay the price because availability and performance really matter for them.

But if you’re not careful, the tool that you’re using to try to improve performance can slow you down– or take you offline entirely. And if you don’t have really strong incident management and performance tuning skills, you’ll have no idea if the problem is the AG or something completely unrelated.

The good news is that if you have a foundation in system administration, you know some TSQL, and you really focus your efforts, you can learn these skills in a year!

Not Sure Where to Find the Time?

Show your manager this post and talk about your answers to the questions above, and the fact that you need to fill the gap. Be clear about areas where you and your team are weak and strong, and what risks you’re concerned about if you take on more complex technology that you can’t handle. It may not always be obvious, but usually your manager wants you to succeed as much as you do!