Are your CPU-intensive queries running slowly? Check your VM type.

When you build a VM for SQL Server in Amazon EC2, Azure VMs, or Google Compute Engine, it’s all too easy to overlook the CPU speeds that you’re getting. Most of the time, it doesn’t really matter because SQL Server isn’t usually CPU-bottlenecked: your queries are doing table scans with 16GB RAM, starving for IO throughput.

But if you’re seeing high CPU usage, or you’re seeing SOS_SCHEDULER_YIELD waits on your queries, stop and check the VM type you used when building this VM. Once you’ve found it, look up the CPU speed you’re getting for that instance type.
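
One quick way to check the CPU model (and its rated clock speed) from inside SQL Server is to read it from the Windows registry – a sketch using the undocumented xp_regread extended procedure, which requires elevated permissions:

```sql
-- Read the processor name string (which includes the rated clock speed)
-- from the Windows registry. xp_regread is undocumented; sysadmin required.
DECLARE @ProcessorName NVARCHAR(200);

EXEC master.dbo.xp_regread
    @rootkey    = N'HKEY_LOCAL_MACHINE',
    @key        = N'HARDWARE\DESCRIPTION\System\CentralProcessor\0',
    @value_name = N'ProcessorNameString',
    @value      = @ProcessorName OUTPUT;

SELECT @ProcessorName AS ProcessorName;
```

Cross-check what that returns against the clock speeds published for your instance type.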

In AWS, for example, I run across a lot of customers in the 2xlarge tier: 8 cores and 61GB RAM. Most of the CPUs in that price range are tragically slow:

From EC2instances.info

That means if you’ve got queries that suffer from a CPU bottleneck, and they’re not able to go parallel (like because they call scalar functions), you’re not going to have a great time with these instance types.

To pick a better instance type, do a little more digging. EC2instances.info shows some clock speeds as “unknown” when the processors are custom silicon built for Amazon:

Top secret

To learn more about those models, hit the AWS documentation pages:

  • r5.2xlarge: custom Xeon with “sustained all core Turbo CPU clock speed of up to 3.1 GHz”
  • z1d.2xlarge: custom Xeon with “sustained all core frequency of up to 4.0 GHz”

The prices aren’t much higher than the instance types you’re already running, either. In fact, it’s been just the opposite for me on two clients in a row! Both had built their SQL Server VMs a couple years ago, and newer instance types not only gave them faster CPU speeds, but reduced costs as well. On a mature application, once it’s been up for a year or two with a stable user base, it’s easy to measure the SQL Server’s performance needs to figure out if it’s constrained by CPU, memory, storage, locks, or something else. Then, you’re better equipped to build the right VM type to save yourself money while making queries go faster.

And before you say, “But I wanna tune the code and indexes!” – you can totally do that too. Why not both? (Especially if changing the VM type saves you money.)

This is also a good reason to touch base with management to see if you’re using reserved instances, and if so, when those reservations expire. If you’ve had a reservation for more than a year, it’s probably time to revisit the latest instance types for a free speed boost.

Pop Quiz: SQL 2000 Bug or SQL 2017 Bug?

2002: Me and Bailey, mom’s Great Dane

When I was thinking about this site’s 17th birthday, I started wondering: back in 2002, what bug might I have been working with in SQL Server?

I started reading back through the list of bugs fixed in SQL Server 2000 Service Pack 1, and it hit me: these bugs aren’t all that different from the ones we’re facing today.

Before you click on each bug, guess whether it was in SQL 2000 Service Pack 1, or last month’s SQL Server 2017 CU14. When you click on each one, you’ll see the release notes for it. (Unfortunately, the 2000 bugs all point to the SP1 page – the individual bug pages have been lost to the sands of time.) Good luck!

  1. CREATE for existing object with IDENTITY column causes duplicate IDENTITY values
  2. Statistics maintenance causes significant bottleneck on SQL Servers that use 16 or more CPUs
  3. Assertion failure occurs when you try to back up database in limited disk space
  4. TCP Timeout or login time-out error occurs when you connect to SQL Server using Integrated Authentication
  5. Error removing log shipping on secondary when database name has a quote
  6. Upgrade fails with an error when sysadmin account “sa” is renamed
  7. SQL Server resource DLL may fail to log to the event viewer in a cluster
  8. Configuration option network.enablekdcfromkrb5 now set to false instead of true
  9. SELECT may not return all rows if it contains a large number of values in an IN clause on a NUMERIC column
  10. Query results are not as expected when you run a particular query from Excel
  11. Complex DISTINCT or GROUP BY query can return unexpected results with parallel execution plan
  12. “Non-yielding” error occurs when there is a heavy use of prepared statements
  13. INSERT statement incorrectly conflicts with constraint
  14. DBCC STACKDUMP doesn’t generate dump file
  15. Combination of multiple EXIST, OR, and subquery clauses may give a sub-optimal plan (you know, I’m gonna go out on a limb and say that still happens)
  16. Incorrect record is deleted or updated when a clustered index is in descending order
  17. Assertion occurs when a parallel query deletes

Let me know how you did in the comments.

Updated First Responder Kit and Consultant Toolkit for April 2019

This month, lots of improvements and bug fixes, including checking for paused resumable index operations, Evaluation Edition expiration dates, sp_BlitzCache shows the number of duplicated plans for each query, sp_BlitzIndex will skip a list of databases for you, and more.

To get the new version:

sp_Blitz Changes

  • Improvement: check for First Responder Kit stored procs that are out of date (in case you’re updating sp_Blitz, but not its relatives.) (#1994, thanks JeffChulg!)
  • Improvement: check for resumable index operations that have been paused. (#2010)
  • Improvement: warn on Evaluation Edition expiration dates. (#2017)

sp_BlitzCache Changes

  • Improvement: shows the new air_quote_actual plans from sys.dm_exec_query_plan_stats on SQL Server 2019 and Azure SQL DB. (#2019)
  • Improvement: show the number of multiple plans for a query in the Warnings column, like “Multiple Plans (105)”. (#2026)
  • Fix: arithmetic overflow when calculating terribly bad total reads for a plan. (#2018, thanks IanManton for the bug report.)
  • Fix: Azure SQL DB wasn’t allowed to sort by memory grants or spills because we were checking version numbers rather than sys.all_columns to check if the necessary columns exist. (#2015)
  • Fix: the output table wouldn’t get created when called with @SortOrder = ‘all’. The table would get populated if it already existed – it just wouldn’t get created the first time. (#2009)

sp_BlitzFirst Changes

  • Improvement: when we call sp_BlitzCache in the 15-minute Agent job for the Power BI Dashboard, we now run it with @SkipAnalysis = 1, @SortOrder = ‘all’. This runs faster (because it doesn’t analyze the plan XML looking for anti-patterns), and simultaneously gets you more query plans because we sort by all methods, every time. (#2009)

sp_BlitzIndex Changes

  • Improvement: new @IgnoreDatabases parameter lets you skip a comma-delimited list of database names. We use that same logic to skip databases with over 100 partitions, which means @GetAllDatabases = 1 works normally, but skips databases with over 100 partitions (and shows them in the output list with the reason they were skipped.) To examine those, use @BringThePain = 1. (#1944, thanks Richard Hughes for the idea and Rich Benner for the starting point for the code.)

sp_BlitzLock Changes

sp_BlitzQueryStore Changes

  • Fix: fixed error “The multi-part identifier “s.is_cursor” could not be bound.” (#2024)

sp_ineachdb Changes

  • Fix: now it actually does stuff even if you don’t specify @Help = 1. I tell you what, people, this is how I can tell you’re not using this proc yet, hahaha. (Pull #2035, thanks Aaron Bertrand.)

For Support

When you have questions about how the tools work, talk with the community in the #FirstResponderKit Slack channel. If you need a free invite, hit SQLslack.com. Be patient – it’s staffed with volunteers who have day jobs.

When you find a bug or want something changed, read the contributing.md file.

When you have a question about what the scripts found, first make sure you read the “More Details” URL for any warning you find. We put a lot of work into documentation, and we wouldn’t want someone to yell at you to go read the fine manual. After that, when you’ve still got questions about how something works in SQL Server, post a question at DBA.StackExchange.com and the community (that includes us!) will help. Include exact errors and any applicable screenshots, your SQL Server version number (including the build #), and the version of the tool you’re working with.

SSMS v18 and Azure Data Studio: The Golden Age of SQL Server Tooling

Yesterday, SQL Server Management Studio 18.0 shipped. Here’s the official announcement, the download page, and the release notes.

Yes, it’s still free, and yes, they’re still adding features. Yes, they removed database diagrams and the debugger. (Update Apr 27: if you want the debugger, use Visual Studio.) No, SSMS 18 doesn’t run on Windows 8 or older. Yes, it still has bugs (all software does), but they’ve been working hard on making it less buggy, as evidenced by the release notes:

“Crashes And Freezes Into Me” is the name of my Dave Matthews Band cover band

You, dear reader, are so lucky. Every time there’s a new release of SQL Server or SQL Server Management Studio, you can grab the latest version of SSMS and keep right on keepin’ on. Your job still functions the same way using the same tool, and the tool keeps getting better.

And it’s free. You don’t have to ask the boss for upgrade money. You can just download it, install it, and take advantage of things like the cool new execution plan est-vs-actual numbers (which also cause presenters all over to curse, knowing that they have to redo a bunch of screenshots.)

I spend a lot of time jumping back & forth between SQL Server and Postgres, and lemme just tell you, the tooling options on the other side of the fence are a hot mess. It’s a sea of half-ass, fully-expensive tools with a mishmash of feature coverage. Pick one, and you just pray that the maker keeps updating it and adding new features. Sometimes they don’t. Often, they don’t.

Microsoft isn’t just building one good free tool.

In April 2019, most people who work with Microsoft SQL Server should be using SQL Server Management Studio (SSMS.) It’s free, very robust, and it’s easy to find tutorials that show you how to do what you want to do.

However, for some of us – a very small minority, and I’m in that group – mid-2019 marks the time to add a new tool to your toolkit. Azure Data Studio (ADS) is kinda like SSMS, but for developers. It’s less about management, and more about code authorship. As of this writing, it’s best suited for people who:

  • Use Macs or Linux as their daily workstation
  • Query both SQL Server and PostgreSQL
  • Don’t need to tune those queries to make them go faster
  • Need to source-control queries, say with Github
  • Want to share some queries in an easy-to-follow-along format (Jupyter notebooks)
  • Don’t mind using a less mature, less documented, rapidly changing tool

I’m being exceedingly careful here about marking a narrow audience. People who fall slightly outside of that audience – say, Windows users who need to performance tune queries – may also use ADS, but have a less satisfactory user experience. They might download & install it, use it a few times, and go, “What the hell, this tool is garbage.” The tool just isn’t ready for those users yet – maybe it will be in time, but right now, let’s focus on who it’s ready for.

Azure Data Studio is ready for me.

I’ve used Macs for over a decade, and for the last couple of years, I’ve been splitting time between SQL Server and PostgreSQL. Recently, I’ve been spending more and more time in Azure Data Studio accomplishing my work on both databases.

You still won’t see me using it in most of my training classes or presentations yet. As a trainer, I need to meet my students where they are, and the vast, vast majority of the SQL Server world is still in SSMS – and that’s totally fine. I’m not on a mission to migrate you.

If you have job duties like me, go give Azure Data Studio a shot, and if you need help, join the #azure-data-studio channel in the SQL Server community Slack.

If you’re more mainstream, go get the new release of SSMS 18.

Either way, it’s free, and you’re lucky.

BrentOzar.com is Turning 17 Years Old.

I registered the domain before then, and it used to be a static HTML site, but I consider the site’s birthday to be the first published blog post from May 7, 2002. A few fun stats:

2002 Brent: at home with a Thinkpad with a webcam module on top, reading HAL-PC Magazine

To celebrate, take 50% off your first year with coupon code Happy17th:

It’s good for the next 17 days. Here’s to another year of fun!

Fixing Performance By Throwing Cloud At It

Your databases are hosted in the cloud – either in VMs or in a database-as-a-service – and you’re having a performance emergency that’s lasted for more than a day. Queries are slow, customers are getting frustrated, and you’re just not able to get a fix in quickly.

Just ask management if they want to throw hardware at it.

Here’s what your email should look like:

Right now, everyone’s unhappy with performance, and I’m in the middle of researching the issue. I’m not going to have a definitive answer – let alone a solution – for at least a few days.

In the meantime, do you want to spend $1,500 per week to temporarily increase performance?

If so, I can change our VM from an r5.4xlarge (16 cores, 128GB RAM) to an r5.24xlarge (96 cores, 768GB RAM.) Our VM costs will go from $293/week to $1,757/week. Those costs don’t include licensing because I’m not familiar with how we’re licensing this VM – that would be a separate discussion for Amy in Engineering, who manages the licensing.

Let’s buy stuff

The email is short and to the point because:

It ignores blame – I don’t care whose fault it is, whether it’s my own fault for being unable to fix it faster, or a growing customer base, or bad code, or maybe the database server should have been larger in the first place.

It doesn’t include an end date – maybe I’m going to be able to solve it in 3 days, but maybe it’ll take a month. This open-ended temporary solution buys us breathing room to do a really good job on the fix rather than duct taping something crappy.

It doesn’t quibble over VM sizes – because during an unforeseen, unsolved performance emergency, you simply don’t know which exact VM size is the best bang for the buck. Just go with the largest size possible, end of story. If someone wants to quibble over sizes, then they’re welcome to do the research to figure out what the right size is. You are not the person to do that research – you need to keep your head down solving the problem. If you don’t even know the root cause and the software fix yet, then you certainly can’t make an exact determination on which VM size will give you the right fix.

It translates technical debt into real dollars – because sometimes the business makes the conscious decision to ship less-than-optimal code in order to bring in revenue. Sometimes, they also want the ability to continue shipping that less-than-optimal code, and they’re willing to spend money to keep bringing in that revenue. That might frustrate the anal-retentive perfectionist in me that wants every line of code to be flawless, but the reality is that sometimes we just need to keep shipping and taking customer money.

It gives the business a choice – and choices help take heat off you. When a manager is screaming at me that we have to fix things faster, I love being able to say, “It sounds like this is really important to you, and you want to move quickly. Here’s a way you can make that happen right away. Oh, what’s that you say? You’re not really that interested? I see. Well, in that case, I’m not going to work 9AM-9PM 6 days per week to fix this, because I have a family at home.”

When your company chose to migrate to the cloud, I bet one of the reasons was to gain flexibility. Let that be your friend.

Free SQL Server Load Testing Tools

So you wanna run a load test against SQL Server.

One thing I need you to understand first: you have to provide the database and the queries. Almost all of the tools in this post, except the last one, are designed to help you run queries, but they don’t include the queries. The whole idea with load testing is that you’re trying to mimic your own workloads. If you’re just trying to test a server with generic workloads, start with my post, “How to Check Performance on a New SQL Server.”

Single-Query Load Tests

These tools let you write a query, and then run that query lots of times to stress load. Of course, this doesn’t really mimic production workloads, which will run lots of different queries concurrently. You can kinda mimic that with my technique to call random queries from a stored procedure, but even with that, we’re talking about a pretty low-tech technique here.
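
The random-query technique can be sketched like this – a wrapper proc that picks one of several canned queries at random on each call (the usp_Report procs are hypothetical stand-ins for your own workload’s queries):

```sql
-- A sketch of the random-query technique: each call to this proc
-- executes one of three workload queries chosen at random, so a
-- single-query load test tool generates a slightly more varied load.
CREATE OR ALTER PROCEDURE dbo.usp_RandomQuery
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @Pick INT = ABS(CHECKSUM(NEWID())) % 3;  -- 0, 1, or 2

    IF @Pick = 0
        EXEC dbo.usp_Report1;
    ELSE IF @Pick = 1
        EXEC dbo.usp_Report2;
    ELSE
        EXEC dbo.usp_Report3;
END;
```

Then point your load testing tool at `EXEC dbo.usp_RandomQuery;` instead of a single hard-coded query.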

Microsoft RML Utilities and OStress (download / instructions) – very barebones command-line tool that lets you run a query across a number of simultaneous sessions, for a number of iterations each. No graphical user interface here – it’s very stripped-down and non-intuitive, but that’s also kind of the draw: it’s fast and simple.

(Open source) SQLQueryStress (download / instructions) – graphical utility originally written by Adam Machanic, and now maintained by Erik Ejlskov Jensen. I find this way easier to use, but it’s also slower than OStress. When you have a really well-tuned workload that returns hundreds or thousands of rows per query, it’s not unusual to hit ASYNC_NETWORK_IO waits with SQLQueryStress when it’s unable to keep up with digesting the results, something I don’t see as often with OStress.

Commence the load test

Load Replay Tests

These tools capture a workload (either from production, or from a group of users clicking around in dev/QA), and then help you replay that same workload again and again, either on the same SQL Server or different servers.

(Open source) WorkloadTools (download and instructions) – Gianluca Sartori’s command line tools to collect a workload (either with tracing or XE), analyze the data, and replay the workload in real time. If I was a full time DBA in charge of SQL Server load testing at a company, this would be the one I’d start with.

Microsoft Profiler (built in) – in theory, you can use Profiler to capture a trace of queries in production, and then replay that trace against another server. In practice, it doesn’t work well because it doesn’t accurately mimic the concurrency issues you hit in production. I don’t recommend this as a load testing solution.

Microsoft Distributed Replay (instructions) – this tool aimed to solve Profiler’s weaknesses by capturing a Profiler-style trace, but then replaying that trace in a coordinated fashion across multiple load test servers, all aimed at the same SQL Server. Before you go too far down this road, watch Gail Shaw’s Pluralsight course on it, and read the tales of woe in the Microsoft forums. The learning curve on this is pretty steep – think days and weeks of setup & experimentation time to get this working really well, not hours.

Microsoft Database Experimentation Assistant (download / instructions) – this builds atop Distributed Replay to let you run A/B tests with different server settings, indexes, even SQL Server versions, and then tells you which queries got better (and which ones got worse.) The drawback here is that you gotta get Distributed Replay working before you can progress to this point.

(Open source) HammerDB (download / instructions) – can be used either to run a standard synthetic workload (like a TPC benchmark), or your own custom queries. This one can test not only SQL Server, but also MySQL, Oracle, PostgreSQL, and more. I would only go down this road if I was a performance tuning specialist in a large company with lots of databases – this thing is really specialized.

Regardless of which tool you choose, remember to use the tips from my post, “How to Load Test a Database Application.”

15 Reasons Your Query Was Fast Yesterday, But Slow Today

SQLSaturday Israel 2019 speaker shirt

In rough order of how frequently I see ’em:

  1. There are different workloads running on the server (like a backup is running right now.)
  2. You have a different query plan due to parameter sniffing.
  3. The query changed, like someone did a deployment or added a field to the select list.
  4. You have a different query plan due to a statistics change.
  5. You’re getting a different share of hardware (perhaps someone changed a hardware setting, introduced a noisy neighbor VM, or you have hardware competition in the cloud.)
  6. Someone added or removed an index.
  7. Someone added or removed a trace flag or other server-level config setting.
  8. Your application itself changed, and it’s not able to digest the results as quickly.
  9. Someone patched, and it had an unexpected side effect.
  10. You have the same plan, but different memory grants.
  11. Someone’s modifying more rows at a time, so you’re hitting lock escalation.
  12. Your query is being throttled by Resource Governor.
  13. It never really was that fast to begin with.
     
    And starting with SQL Server 2017 (if you have columnstore indexes) and SQL Server 2019 (for everyone):
     
  14. You’re getting a different memory grant due to adaptive memory grants.
  15. You’re getting a different join type due to the adaptive join threshold changing.

 

Which Database is Causing the Problem?

Pick a bottle, any bottle (Moet & Chandon)

You’ve got one SQL Server hosting a bunch of databases, and performance is bad.

Which database should you focus on first?

Here are a few ways you can pick – but keep in mind they may produce different answers, especially when you’ve got several databases misbehaving at once, or depending on how you sort the culprits.

Option 1: check the plan cache.

As SQL Server executes queries, it tracks which queries it sees the most often, plus helpful metrics like how much CPU time they used, how much data they read, number of times they spilled to disk, etc.

To find out which queries are using the most CPU power, install sp_BlitzCache, and then run it:

By default, it sorts by CPU, finding the most CPU-intensive queries. You can also use different sort orders like these, and check the documentation for more:
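
For example – the default call, plus a few of the other sort orders (these parameter values come from the sp_BlitzCache documentation):

```sql
-- Default: find the most CPU-intensive queries.
EXEC sp_BlitzCache;

-- Other sort orders – see the documentation for the full list:
EXEC sp_BlitzCache @SortOrder = 'reads';
EXEC sp_BlitzCache @SortOrder = 'duration';
EXEC sp_BlitzCache @SortOrder = 'memory grant';
```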

The results will show you the most resource-intensive queries, and the very first column shows the database name.

I just know the answer is down here somewhere, give me more time

Drawbacks:

  • The database name is just the context where the query ran. If you’ve got cross-database queries, or people are running queries from tempdb, you won’t really see which objects they’re accessing.
  • The data can be somewhat transient when your server is under heavy memory pressure, like trying to host 1TB of data on a VM with 16GB RAM.
  • If your app is using unparameterized SQL, the results won’t be as helpful.

Option 2: check the file statistics.

SQL Server tracks which files get the most reads & writes, and you can query that with the system function sys.dm_io_virtual_file_stats:
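
The exact query from the post isn’t shown here, but a basic version looks like this:

```sql
-- Which files have had the most data read since the server restarted?
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.physical_name,
       vfs.num_of_bytes_read,
       vfs.num_of_bytes_written
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON vfs.database_id = mf.database_id
 AND vfs.file_id     = mf.file_id
ORDER BY vfs.num_of_bytes_read DESC;
```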

Here, I’m sorting them by which files have had the most data read since the server was restarted. (It can also be reset when the database is restored, taken offline/online, etc.)

Drawbacks:

  • These numbers don’t include data retrieved from cache.
  • These DO include system tasks like backups, corruption checks, and index rebuilds.
  • The query above is pretty vanilla – for more details like latency, check out sp_BlitzFirst.

Option 3: check which pages are cached.

If one database’s pages are dominating SQL Server’s memory, then queries in that database are probably the ones repeatedly reading that same data over and over.

To find out which database’s data is using up the most cache, use the sample query from Books Online:
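
The Books Online sample boils down to grouping sys.dm_os_buffer_descriptors by database – a sketch:

```sql
-- How much of the buffer pool does each database occupy?
-- Pages are 8KB, so COUNT(*) * 8 / 1024 gives megabytes.
SELECT CASE database_id
            WHEN 32767 THEN 'ResourceDb'
            ELSE DB_NAME(database_id)
       END AS database_name,
       COUNT(*) * 8 / 1024 AS cached_mb
FROM sys.dm_os_buffer_descriptors
GROUP BY database_id
ORDER BY cached_mb DESC;
```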

dbo.Champagne is both the problem and the solution

Drawbacks:

  • Querying this data is sloooow, and gets slower the more memory that you have. Querying this data on >64GB RAM can take several minutes.
  • This data is extremely transient – it changes all through the day as people run different queries, causing different pages to go in & out of cache.
  • This doesn’t necessarily mean that the queries reading the most data are the ones causing the biggest problems.

Index scans aren’t always bad, and index seeks aren’t always great.

Somewhere along the way in your career, you were told that:

  • Index seeks are quick, lightweight operations
  • Table scans are ugly, slow operations

And ever since, you’ve kept an eye on your execution plans looking for those performance-intensive clustered index scans. When you see ’em, you go root ’em out, believing you’ve got a performance problem.

Thing is, … they lied to you. Seeks aren’t necessarily good, nor are scans necessarily bad. To straighten you out, we’re going to walk through a series of demos.

First up: a “bad” index scan.

Let’s start with this query – using the Stack Overflow 2013 (50GB) database:
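
The original query was shown as a screenshot; based on the description (counting all 2.5M rows in the Users table), it was along these lines:

```sql
SET STATISTICS IO ON;  -- report logical reads in the Messages tab

SELECT COUNT(*)
FROM dbo.Users;
```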

The resulting execution plan has a clustered index scan – reading the whole table, 2.5M rows:

That first line, SET STATISTICS IO ON, turns on a set of messages that will show you the number of logical reads SQL Server performs when executing your query. Those are measured in 8KB pages. The output shows up in the Messages tab:

SQL Server had to read 44,530 8KB pages (the whole table) to count all of the rows.

Okay, so that’s a “bad” scan – because it reads all of the pages – although I could argue that it’s really more of a bad query. (Do you really need an accurate, up-to-the-second count of the number of rows? Probably not – you could probably cache that sucker in the app tier and use the same results for hours at a time. But I digress.)

But here’s a “good” index scan.

Take this query:
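
The screenshot is gone, but from the description – ten rows, no ORDER BY – the query was along these lines:

```sql
-- Any ten rows will do: no WHERE clause, no ORDER BY.
SELECT TOP 10 *
FROM dbo.Users;
```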

We’re telling SQL Server, “Just give me the first 10 rows – I don’t care whether they’re the first ten, or the last ten, or just some random ten you pulled out of your back end.” The execution plan for this one is also a table scan:

See the teeny tiny arrows? That means only a limited number of rows came out – which sounds like a good thing, but here’s the real indicator that it was a good thing:

SQL Server only had to read 5 8KB pages – that’s 40KB for those of you with my math skills – and that’s hardly anything at all! Heck, that’s in the same range as our next demo.

Here’s a “good” index seek.
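
The query itself was a screenshot; a single-row lookup on the clustering key matches the description (the Id value here is made up for illustration):

```sql
-- Seek straight to one user by the clustered index key.
SELECT *
FROM dbo.Users
WHERE Id = 26837;
```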

The plan shows us dive-bombing into one specific user, and only reading out one row:

It’s a super-efficient operation, only reading 3 8KB pages – not that far off the “good” index scan we did a second ago!

That’s the kind of operation you think about when you see the word “seek” on a plan – but that’s only one kind of example. Try this one on for size:

Here’s a “bad” index seek.

As you mentally build an execution plan for this, you should know that the lowest Id in the StackOverflow.Users export is -1. They all go positive from there.
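
Given that, a query along these lines produces the seek in question (the exact predicate is an assumption based on the description):

```sql
-- "Seeks" to the lowest Id (-1), then reads every row from there.
SELECT *
FROM dbo.Users
WHERE Id > -2;
```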

That query reads back every single user in the table.

You would probably consider that a table scan, but that’s not what the plan shows:

It’s technically a “seek” because SQL Server is seeking to a particular value, and then reading out the rows from there. Logical reads shows more proof that it’s reading the entire table:

That’s a lot of pages. That’s not what you think of when I say “seek.”

Here’s what “seek” and “scan” really mean.

Seek means SQL Server knows what specific row it’s going to start at. That’s it.

  • It might be the first row in the table
  • It might be the last row in the table
  • It might read out just one row
  • It might read out all of the rows
  • It might have been performed multiple times (hover your mouse over the operator in the plan and look at “number of executions” for an indication)

Scan means SQL Server is going to start at either end of the index (min or max) and start reading from there. That’s it.

  • It might read out all the rows (which is what you usually expect, but…)
  • It might only read out one row

An index seek operation isn’t necessarily good, nor is an index scan inherently bad.

To cast judgment, dig deeper.

Let’s take this query:
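
The query was shown as a screenshot; based on the plan description, it was something along these lines (the Id predicate is an assumption to force the seek):

```sql
SELECT *
FROM dbo.Users
WHERE Id > -2          -- seek predicate: jump to the first row (Id -1)
  AND Reputation < 0;  -- residual predicate: checked against every row
```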

Its execution plan does a clustered index seek – it’s going to jump to what happens to be the first row in the table (Id -1) and read through all of the rows in the entire table, looking for ones who have a reputation < 0:

So it’s a seek – but is it worthy of attention? Do I need to do something about that operator? To figure out, hover your mouse over the seek operator and check out the tooltip:

And look at:

  • Number of Rows Read: 2,465,713 – meaning SQL Server had to examine this many rows in order to find the juicy rows that you actually wanted.
  • Actual Number of Rows: 0 – meaning its search was fruitless.

When SQL Server has to examine a lot of data to find not a lot of results, that’s a slow way to access the data. You might be fine with that – this might be a query that never runs – but if you want it to run faster, this is a place you might want to focus your indexing improvements. It doesn’t matter whether it’s a seek or a scan – if you have to read a lot of noise to get a little bit of signal, that’s a problem.

Finding Froid’s Limits: Testing Inlined User-Defined Functions

This week, I’ve been writing about how SQL Server 2019’s bringing a few new features to mitigate parameter sniffing, but they’re more complex than they appear at first glance: adaptive memory grants, air_quote_actual plans, and adaptive joins. Today, let’s talk about another common cause of wildly varying durations for a single query: user-defined functions.

Scalar UDFs and multi-statement table-valued functions have long been a bane of performance because as your data quantity grows, they still run row-by-agonizing-row.

Today, SQL Server hides the work of functions.

Create a function to filter on the number of Votes rows cast by each user, and add it to the stored procedure we’re working on this week. Eagle-eyed readers like yourself will notice that I’ve changed the TOP to just a TOP 100, and removed the order by, and the reason why will become apparent shortly:
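
The original function was shown as a screenshot; a scalar UDF along these lines matches the description (the names are illustrative):

```sql
-- Scalar UDF: counts the Votes rows cast by one user.
-- Runs once per row it's called against - row by agonizing row.
CREATE OR ALTER FUNCTION dbo.CountVotes (@UserId INT)
RETURNS BIGINT
AS
BEGIN
    DECLARE @VoteCount BIGINT;

    SELECT @VoteCount = COUNT(*)
    FROM dbo.Votes
    WHERE UserId = @UserId;

    RETURN @VoteCount;
END;
```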

Without the function, this query takes under a second. Add in the function, and it flat out kills performance: in compatibility level 140 (2017), the query takes about a minute. SET STATISTICS IO ON doesn’t show anything about reading from the Posts table:

Note that CPU time is much, much higher than elapsed time – indicating that something in the query went parallel. (That’ll be important later.)

As query tuners know by now, the query plan doesn’t show the work involved with the function:

Thankfully, sp_BlitzCache shows the ugly truth (and you’ll have to click to zoom for sure on this one):

The function runs 102 times, each time burning 2.5 seconds of CPU time in ~600ms of duration, doing 248K logical reads. That’s a lot of row-by-agonizing-row work. Okay, well, that’s not good, and that’s why user-defined functions have caused much gnashing of teeth.

SQL Server 2019 inlines the work, so
it goes slower. Yes, you read that right.

In the Froid white paper, Microsoft talked about how they were working on fixing the function problem. When I read that white paper, my mind absolutely reeled – it’s awesome work, and I love working with a database doing that kind of cool stuff. Now that 2019 is getting closer to release, I’ve been spending more time with it. Most of the time, it works phenomenally well, and it’s the kind of improvement that will drive adoption to SQL Server 2019. Here, though, I’ve specifically picked a query that runs worse only to show you that not every query will be better.

To activate Froid, just switch the compatibility level to 150 (2019), and the query runs in 1 minute, 45 seconds – almost twice as slow. Dig into the actual plan to understand why:

There’s something present that we DON’T want: an eager index spool. Like Erik loves to say, that’s SQL Server passively-aggressively building a sorted copy of the data in TempDB, accessing it over and over, and refusing to build a missing index hint. “No no, it’s okay, you’re busy – I’ll take care of it for you. I know you don’t have time to build indexes or call your mother.”

There’s something missing that we DO want: parallelism. Remember how I kept harping about the 2017 plan going parallel, using more CPU time than clock time? Yeah, not here – this query is single-threaded despite a cost over 1,000 query bucks. Even if you clear the plan cache and try again with @Reputation = 1, you still get a single-threaded plan:

And it’s not that you can’t get a parallel plan with a scalar UDF – that limitation has been lifted for Froid-inlined functions, but there’s something wonky happening here in the plan generation. Sure, I understand that the eager spool is still single-threaded, but… what about the rest? Hell, even a COUNT(*) from Users goes parallel here, and that finishes in under a second:

<sigh>

So, to recap: the query runs slower, doesn’t generate missing index hints, and is still single-threaded. For this particular query, Froid isn’t getting us across the finish line.

Let’s try inlining the function ourselves.

Take the function and inline it:
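As a sketch – assuming a hypothetical scalar UDF that counts a user’s posts, not the exact demo function – the manual rewrite moves the function’s body into the outer query as a correlated subquery:

```sql
/* Before: SELECT u.DisplayName, dbo.fn_PostCount(u.Id) FROM dbo.Users u ...
   After: the function's body becomes a correlated subquery, so the
   optimizer can see all the work at once and parallelize it. */
SELECT u.DisplayName,
       (SELECT COUNT(*)
        FROM dbo.Posts p
        WHERE p.OwnerUserId = u.Id) AS PostCount
FROM dbo.Users u
WHERE u.Reputation = 2;
```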

And the difference is staggering: it runs in under a second, and everything in the plan goes parallel:

So to recap: am I saying Froid is bad? Absolutely not – I’m loving it. Just like adaptive memory grants and air_quote_actual plans, Froid is one of my favorite features of SQL Server 2019 because when it helps, it REALLY helps. The example in this blog post is a fairly unusual case in my testing so far.

However, like I wrote yesterday, you’ve really gotta test your code to be able to say with confidence that you’re going to see a difference, and that the difference won’t be adverse. It’s just not fair to expect Microsoft to deliver an update to SQL Server that makes every single edge case go faster. When a few queries regress, you’ve gotta have a history of their prior plans in Query Store, and the knowledge to use that past plan to get an older version in place long enough for you to fix the query for the long haul. (Remember, as developers deploy a tweaked version of the query, like adding a field to it, the old Query Store forced plan will no longer apply as the query text changes.)

And again – this is just like the 2014 release when 99% of your code ran faster, but 1% ran slower, and that 1% came as a really ugly surprise that required hard, hard troubleshooting to narrow down. Joe Sack giveth, and Joe Sack taketh away. (Well, I guess even when he taketh, he giveth to uth consultanth.)

Updated April 12 – had the wrong inlined function in the “after” results, and was showing a manually-tuned version as running in 13 seconds. My bad – that was from a prior draft of the post. Props to Bryan Rebok for catching that in the comments.

Parameter Sniffing in SQL Server 2019: Adaptive Joins

So far, I’ve talked about how adaptive memory grants both help and worsen parameter sniffing, and how the new air_quote_actual plans don’t accurately show what happened. But so far, I’ve been using a simple one-table query – let’s see what happens when I add a join and a supporting index:
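A sketch of what that setup might look like – the proc and index names here are my guesses, not the exact demo code:

```sql
/* Hypothetical reconstruction: a supporting index on the join key,
   plus the proc with a Users-to-Posts join added. */
CREATE INDEX IX_OwnerUserId ON dbo.Posts(OwnerUserId);
GO
CREATE OR ALTER PROC dbo.usp_UsersByReputation
    @Reputation INT
AS
SELECT u.DisplayName, p.Title
FROM dbo.Users u
INNER JOIN dbo.Posts p ON p.OwnerUserId = u.Id
WHERE u.Reputation = @Reputation;
GO
```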

(Careful readers will note that I’m using a different reputation value than I used in the last posts – hold that thought. We’ll come back to that.)

The execution plan leverages a new 2019 feature, adaptive joins for rowstore tables. They were available to queries with a columnstore index in 2017, but this is a new icon to see in a rowstore-only plan:

In this query plan:

  1. SQL Server started with a clustered index scan on Users, and produced 50k matching rows.
  2. It then had to decide how to use the index on Posts.OwnerUserId. For low numbers of rows (in this case, 36,700 or less, a threshold visible in the plan’s tooltips), it would have preferred an index seek for each matching row. For higher numbers, it would prefer to scan that index.
  3. Because our number of rows exceeded the adaptive threshold, it went with a scan.

I can hear you cheering because this is awesome. It means your plans are less susceptible to parameter sniffing problems because now SQL Server can cache one plan that adapts to more input parameters.

Narrator: “Sadly, the reader was incorrect.”

Now run it for @Reputation = 1. As we saw in earlier posts, @Reputation = 1 produces a lot more data – about 1M rows in this case – so it’s not surprising that something goes wrong. In this case, the adaptive join didn’t grant enough memory, so it spills 12,880 pages to disk:

That in and of itself isn’t really all that bad, though – and if I continue executing @Reputation = 1 several times in a row, adaptive memory grants kick in and stabilize memory at 288MB. The query stops spilling to disk, and it goes faster.

Until I run it just once for @Reputation = 2, at which point the memory grant starts dropping – and it can drop down to just 7MB. You saw in the first post how that story ends: the memory grants swing back and forth, constantly basing a query’s memory grant on the parameters used the last time the query ran, not the current parameters.

Heaven forbid I run it for @Reputation = 1 right after I run it for @Reputation = 2:

Spill City

Welcome to Spill City, population 212,785

And remember: if you try to diagnose why this query took >10 seconds, you can’t see spills in the new air_quote_actual plans in 2019 yet – but they’re hopefully coming for RTM.

So what’s a good plan for @Reputation = 1, then?

Oh, dear reader. Oh, sweet friend. Let’s run it for @Reputation 1, 2, and 3, all with recompile on so we can see what plan will get put into cache when each variation happens to run first:
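The harness looks something like this (proc name from this week’s demos; WITH RECOMPILE forces a fresh plan for each call):

```sql
EXEC dbo.usp_UsersByReputation @Reputation = 1 WITH RECOMPILE;
EXEC dbo.usp_UsersByReputation @Reputation = 2 WITH RECOMPILE;
EXEC dbo.usp_UsersByReputation @Reputation = 3 WITH RECOMPILE;
```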

1 query, 3 parameters, 3 different plans

For a simple one-join query with only one parameter, three different-shaped plans are cached for three different input values, and they effectively produce 4 different plans since one of those is an adaptive join.

  • @Reputation = 1 caches a plan that starts with a Posts index scan in batch mode
  • @Reputation = 2 caches a plan that starts with a Users index seek, and does everything but the sort in rowstore mode
  • @Reputation = 3 caches an adaptive join plan that either scans a Posts index in batch mode, or seeks it in rowstore mode

Deploying the new Charcoal Estimator (CE)

This isn’t a new phenomenon because complex queries can often produce wildly different plans, but SQL Server 2019 just makes it happen more often, with even simpler queries, because now we’ve got the option of adaptive joins and batch mode. SQL Server has exponentially more plans to choose from, and in the early days of 2019, some of these are going to backfire, hard.

Parameter sniffing isn’t getting easier with 2019.

It’s getting harder.

Much, much harder, to the point where it’s getting to be time for a mind shift amongst query authors. In theory, we have to stop asking, “Why was this query slow the last time I ran it?” and start asking, “How can I change this query to be more predictably fast?” I just don’t see that mind shift happening anytime soon, though. That’s not the way humans work. We’re all about shipping a good enough query, quickly. It’s a miracle that our queries even compile, let alone produce accurate results, let alone perform well.

SQL Server tuning work ebbs and flows.

In some SQL Server versions, the job of performance tuning gets easier. SQL Server 2016 comes to mind: across many SPs and CUs, several of which got backported to 2012 and 2014, we were blessed with a continued stream of diagnostic data improvements that made it way simpler to diagnose complex problems like memory grants, spills, and poison waits like RESOURCE_SEMAPHORE and THREADPOOL.

In other versions, SQL Server adds new features faster than the supporting diagnostic infrastructure can keep up. SQL Server 2014’s new Cardinality Estimator, In-Memory OLTP (Hekaton), and updated columnstore indexes were good examples of that: you could use ’em, but boy, they could backfire without warning.

Microsoft’s obviously going to say that SQL Server 2019 is in the make-life-easier category because the robots do more for you. However, the robots have brand new, untested, not-very-well-documented choices that they’ve never had before, and it’s much harder for you to see what they did after the fact.

Based on my time with 2019 so far, I think it’s going to be a lot like 2014. People are going to:

  1. Put 2019 in their testing environment
  2. Test their worst queries under 2019 compatibility level, and be pleasantly surprised
  3. Go live with 2019 (but they won’t enable Query Store because they don’t know why or how, and the defaults don’t make sense)
  4. Have their server catch fire with unexpected query plan regressions
  5. Not have the time to figure out if the problem is adaptive grants, adaptive joins, or batch mode (and especially not have the time to understand that any of those could cause problems with any query)
  6. Switch their databases back to 2017 compatibility level (which is thankfully easy)

But instead, I really wish they would:

  1. Enable Query Store today, and start building up a history of good query plans that you can use to fix regressions
  2. Go live with 2019, but stay in 2017 compatibility level, again, building up a good history
  3. After a couple/few weeks, try 2019 compatibility level – but don’t be afraid to switch back if the work is overwhelming
  4. If the work isn’t overwhelming, use Query Store to force prior plans for the queries that regressed – but inventory them
  5. For each regressed plan, start working with your developers to identify what behavior is causing a problem, and modify those queries or their supporting indexes to remove the requirement for the plan guide
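Step 1 takes about ten seconds. A minimal sketch – the database name and sizing numbers here are placeholders, so tune them for your environment:

```sql
ALTER DATABASE StackOverflow2013 SET QUERY_STORE = ON;
ALTER DATABASE StackOverflow2013 SET QUERY_STORE
    (OPERATION_MODE = READ_WRITE,
     MAX_STORAGE_SIZE_MB = 1024,           -- placeholder sizing
     QUERY_CAPTURE_MODE = AUTO,            -- skip trivial one-off queries
     CLEANUP_POLICY = (STALE_QUERY_THRESHOLD_DAYS = 30));
```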

That last step is so important because sooner or later, your developers are going to ship a new version of that query. They’re going to add a comment, tweak the query, or add a join, and as soon as they do, the plan guides will no longer be relevant. Remember, Query Store’s plan guides only work when the query text matches exactly. SQL Server can’t automatically regress a query to a prior plan if you’ve never had a good plan for that exact query.

And oh, are your plans about to change, bucko. Tomorrow, we’ll discuss the biggest change of them all: automatic inlining of user-defined functions.

Parameter Sniffing in SQL Server 2019: Air_Quote_Actual Plans

My last post talked about how parameter sniffing caused 3 problems for a query, and how SQL Server 2019 fixes one of them – kinda – with adaptive memory grants.

However, the post finished up by talking about how much harder performance troubleshooting will be on 2019 because your query’s memory grant is based on the last set of parameters used, not the current set.

So let’s imagine that you just got an emergency phone call: the boss says their query was slow just now, and they wanna know why.

In theory, you’ll use the new sys.dm_exec_query_plan_stats.

Microsoft’s Pedro Lopes unveiled this feature last week, and in theory, it sounds pretty easy:

  1. Enable trace flag 2451
  2. Get the query’s plan handle
  3. Get the “actual” plan by running SELECT query_plan FROM sys.dm_exec_query_plan_stats (your_plan_handle)
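Strung together, the three steps look something like this sketch – here I’m finding the plan handle by searching the plan cache for the proc name instead of using sp_BlitzCache:

```sql
/* 1. Enable the trace flag instance-wide */
DBCC TRACEON(2451, -1);

/* 2 & 3. Find the plan handle and pass it to the new DMF */
SELECT qps.query_plan
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
CROSS APPLY sys.dm_exec_query_plan_stats(qs.plan_handle) qps
WHERE st.text LIKE N'%usp_UsersByReputation%';
```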

Let’s start our adventure by running usp_UsersByReputation for @Reputation = 2, which gets the tiny memory grant. Here’s the real actual plan from running the query:

Real actual plan for @Reputation = 2

The real actual plan is full of rich details: the degree of parallelism, how long the plan compilation took, and – most important for this particular issue – the memory grant. In this case, the query desires 11.7MB.

Normally, all those details will be lost in time, like tears in rain. Enter sys.dm_exec_query_plan_stats – we’ll use sp_BlitzCache to show the plan handle, and then pass that to the new DMF to get the most recent “actual” plan:

And at first, things look promising – click on the query_plan, and you get:

“Actual” plan from sys.dm_exec_query_plan_stats for @Reputation = 2

IT’S AN ACTUAL PLAN! FOR A QUERY THAT RAN IN THE PAST! You can tell it’s an actual plan because the numbers below each operator are the actual numbers of rows that returned from each operator. This is a huge step forward because it enables you to see where the query plan’s estimates went wrong: where it thought that only a few rows would come back, but in reality, way more did. But before you pour the champagne, dig a little bit deeper.

But…there’s a whole lot missing from the details.

And one of the most important things there, desired memory, is completely wrong.

The memory grant numbers don’t show what the query executed with – they show the desired memory grant for the query’s next execution! To see it in a painful way, run it again, this time for @Reputation = 1, and here’s the real actual plan:

Real actual plan for @Reputation = 1

Again – rich details, especially for the sort. That yellow bang warning on the sort is absolutely priceless, and when you hover your mouse over it, you get a tooltip showing how many pages spilled to disk.

And a whole lot of them did – because the desired memory for this query is just 1.5 MB! Ring a bell? That’s the desired memory from the supposed last “actual” plan, which wasn’t actual at all. So let’s go query sys.dm_exec_query_plan_stats and see what this query supposedly had for an “actual” plan:

“Actual” plan from sys.dm_exec_query_plan_stats for @Reputation = 1

“It desired 209MB.” THE HELL IT DID. That’s the grant for the next time this query runs!

Henceforth, I shall refer to these plans as air_quote_actual plans.

In practice…well, it’s only CTP 2.4, but…

To recap what I’ve shown so far this week:

  • Adaptive memory grants mean your query’s memory is based on the prior execution’s parameters
  • The air_quote_actual grants in sys.dm_exec_query_plan_stats are based on the query’s next execution

I can see why this is hidden behind a trace flag, and isn’t practical for mainstream distribution. If this shipped to the public as actual plans (no airquotes), it would hurt more than it would help. They don’t include:

  • A lot of details about memory grants (granted, wait time, used, etc) – and what it does show is wrong
  • Degree of parallelism
  • IO statistics (reads)
  • Wait stats
  • TempDB spills – although Pedro reports those are coming soon:

That means the air_quote_actual plans really just serve as clues that will require a trained performance tuner detective to solve. Just like you’ve seen for years in Watch Brent Tune Queries, your mission is to start at the top right of the query plan, review the estimated versus actual numbers to see where they went awry, and then change the query or the supporting indexes to guide the optimizer down a better path. Air_quote_actual plans serve as a tool in that investigation, but you have to know which parts are true – and which parts are lies.

<sigh>

I do like it – just like I like adaptive memory grants – but it’s another sign that this is going to be a 2014-style release. More on that in the next post when we explore another new SQL Server 2019 feature to mitigate the perils of parameter sniffing: adaptive joins. This one really does work as designed, eliminating parameter sniffing issues while building a resilient plan that works for all kinds of parameters.

(What? That was too obvious? You really are sharp. You’ve always been my favorite reader, you. We really get each other.)

Parameter Sniffing in SQL Server 2019: Adaptive Memory Grants

This week, I’m demoing SQL Server 2019 features that I’m really excited about, and they all center around a theme we all know and love: parameter sniffing.

If you haven’t seen me talk about parameter sniffing before, you’ll probably wanna start with this session from SQLDay in Poland. This week, I’m going to be using the queries & discussion from that session as a starting point.

In this week’s posts, I’m using the 50GB StackOverflow2013 database (any size will produce the same behaviors, just different metrics.) I’ve got an index on Reputation, and I’m using this stored procedure:
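The setup looks roughly like this – an approximate reconstruction based on the numbers later in this post (the TOP and the DisplayName sort), not a copy-paste of the screenshot:

```sql
CREATE INDEX IX_Reputation ON dbo.Users(Reputation);
GO
CREATE OR ALTER PROC dbo.usp_UsersByReputation
    @Reputation INT
AS
SELECT TOP 10000 *
FROM dbo.Users
WHERE Reputation = @Reputation
ORDER BY DisplayName;
GO
```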

Setting the stage: how SQL Server works today

When you call it for @Reputation = 2, you get an index seek + key lookup plan optimized for tiny amounts of data.

When you call it for @Reputation = 1, you get a clustered index scan plan optimized for large amounts of data.

Plans optimized for different reputations

The easiest way to tell the sad story of plan reuse is with a table:

Parameter sniffing metrics

Neither the @Reputation = 1 nor the @Reputation = 2 plan is good for the other value. When SQL Server caches the scan plan, it runs into RESOURCE_SEMAPHORE issues due to unused large memory grants when other parameters run. When it caches the seek + key lookup, @Reputation = 1 performs poorly.

Note that this works the same way in SQL Server 2019, too, as long as your database is in 2017 compatibility level. (All of these screenshots & data were done on SQL Server 2019.)

This query presents three problems.

There’s no one right way to read the data. A nonclustered index seek + key lookup is fastest for small numbers of rows, but there’s a tipping point where a clustered index scan is more appropriate.

There’s no one right memory grant. The query uses between 296KB and 160MB, a huge range. Even worse, because the data types are much larger than what the user population actually stores – and because SQL Server doesn’t know that until runtime – the grant varies between 12MB and 6.7GB.

The index recommendation is flat out wrong. Clippy suggests that the user create an index on Reputation, and then just include all of the columns on the table. If that’s created, it does solve the read problem, but we just doubled the size of our table, and it doesn’t fix the memory grant problem because we still have to sort 1,090,040 rows by DisplayName every time the query runs.

Until SQL Server 2019, the typical solution was to create an index on Reputation, DisplayName, and then live with the 10,000 key lookups (which really isn’t a big deal.) However, most people just can’t figure that solution out because Clippy’s missing index recommendation is on Reputation, and then just includes all the fields.
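That hand-crafted index is a one-liner (the index name here is mine):

```sql
CREATE INDEX IX_Reputation_DisplayName
    ON dbo.Users(Reputation, DisplayName);
```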

SQL Server 2019’s adaptive memory grants aim to solve one of them.

I wrote about adaptive memory grants a few months ago when the announcement first dropped, and the short story back then was that I liked it. In SQL Server 2019, when databases are in the 2019 compatibility level, the memory grant adapts over time as the query runs repeatedly.

I love this feature so much because it’s the kind of thing end users kind of expect that the database is doing already. When I demonstrate parameter sniffing to them, their usual response is, “Wait, what the hell do you mean that the database server doesn’t allocate different amounts of power to work with different parameters?” That’s an awkward moment. Adaptive memory grants help SQL Server catch up to user expectations.

In my work since then, I’ve gotten a little more nervous. I still like it, but…I’m getting nervous for the end user questions that are going to start coming in.

Your query’s memory grant is based on the LAST PERSON’S PARAMETERS, not yours.

I’ll demonstrate by running the proc a few times in a row, documenting its memory usage each time, showing how it adapts:
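If you want to play along, one way to watch the grants move is the memory grant columns in the plan cache – a sketch, not the exact queries behind the screenshot:

```sql
EXEC dbo.usp_UsersByReputation @Reputation = 2;

SELECT qs.execution_count,
       qs.last_grant_kb,        -- what the last execution got
       qs.min_grant_kb, qs.max_grant_kb,
       qs.last_ideal_grant_kb   -- what it wanted
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
WHERE st.text LIKE N'%usp_UsersByReputation%';
```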

Changing memory grants

I’ll translate this into a support call:

  • User: “Hi, I just ran the sales report and it was really slow.”
  • Junior Admin: “What parameters did you use?”
  • User: “I ran the national report for the last 3 years.”
  • Junior Admin: “Oh yeah, you’re working with large time spans, it’s slow the first time you run it, but it’s really fast after that.”
  • User: “What?”
  • Junior Admin: “Yeah. I’m not sure why it seems to really pick up speed after the first time. But if you wait a while, it slows back down again. Maybe it has something to do with caching.”

Or maybe a more senior admin handles the call:

  • User: “Hi, I just ran the sales report and it was really slow.”
  • Senior Admin: “What parameters did the person before you use?”
  • User: “What?”
  • Senior Admin: “Yeah, see, your report speed is based on the parameters the last person used. SQL Server gives you the right amount of memory for their parameters.”
  • User: “What?”
  • Senior Admin: “If you want it to go fast, you have to run it a second time. The second time you run it, it’ll have the right amount of memory for what you asked for last time.”
  • User: “What?”
  • Senior Admin: “Unless someone else sneaks in and runs it at the same time. So yeah, your best performance will be if you run it twice when no one else is around.”
  • User: “Okay.”

Or maybe it gets escalated all the way to our best staffer:

  • User: “The other two admins don’t know why my query is slow. Can you run it and see what’s going on?”
  • Señor Admin: “No.”
  • User: “What?”
  • Señor Admin: “Every time you run a query in SQL Server 2019, it can get a different memory grant based on the prior execution or lack thereof. If I run it for you now, it might be blazing fast, or unfairly slow.”
  • User: “Was SQL Server 2017 like this?”
  • Señor Admin: “Silence. Rather than looking at SQL Server’s behavior, we need to look at your query’s design and the supporting indexes. We need to look at the shape of the plan, understand why it’s getting that shape, and see what we can do to change your query to get a predictably better plan shape.”
  • User: “That sounds like a lot of work. Can’t we just go back to 2017 compatibility mode where the grants were predictable?”
  • Señor Admin: “I liked you a lot better before you went to SQLSaturday.”

Fortunately, Microsoft saw this coming, and adaptive memory grants give up when grants keep swinging around with no sign of relief, settling back on the original grant that they started with for the first compilation. However, the clock resets whenever you rebuild indexes, update statistics, or the plan gets dropped out of cache – which means that support call will get even worse. That poor admin is probably going to be right back to the old recommendation of, “Rebuild the index and then run the query again, and it’ll be super-fast” – all because it’s getting a plan customized for that incoming set of parameters.

<sigh>

Like I said – I love this feature, but I’m getting nervous. I’m going to have to put some serious thought into the First Responder Kit scripts to make this easier to diagnose. In the next post, I’ll cover a new SQL Server feature that’s designed to help.

What You Said You’re Going to Do About SQL Server 2008

Last week, I asked you what you were going to do about SQL Server 2008’s fast-approaching end-of-support date.

Countdown clock at SQLServer2008.com

Countdown clock at SQLServer2008.com

Here were some of my favorite responses from blog comments and Twitter. Terry’s been doing a good diligent job, but struggling to keep up:

Like Andrew we’ve been working on retiring/migrating off 2008 (to mostly 2014 or 2016) for 1.5 years. We started with 25 instances and are down to 3. They’ll be done by the EOL date. Then a couple year breather before we rinse and repeat with 2012 (EOL 7/12/2022).

Trav wrote:

if you’re my company you ignore the dbas who have been warning about this for 2 years and then get mad when we point out that time is up.

But Pittsburgh DBA responded with something that I heard from a lot of commenters:

Who cares? It works fine, and it’s been working fine for a hell of a long time. This panic attack going on around the country about 2008 is hilarious.

I don’t know that I’d say it works fine because I usually reserve the word fine for things like art and dining. If SQL Server 2008 was a restaurant, it’d be more Mickey D’s and less Top Chef – but I get where he’s coming from.

Similarly, Alen points out:

my old employer still has some 2005 servers. They still work. there is no money to upgrade them. When i worked there i called MS maybe 2-3 times in many years for SQL support.

I’ve heard that remark about support from several clients, too, as in, “We don’t call them for support anyway. That’s why we have you, because we got burned by a few bad support experiences.” I try to gently remind them that I don’t have the capability to write patches for bugs.

Mark Freeman’s servers are on life support:

We have four 2008 R2 instances that I’ve been asking to upgrade for almost 2 years. The answer has always been, and continues to be, that they are supporting a legacy application that we intend to retire Very Soon Now and therefore we don’t want to put any resources into an upgrade project. That legacy application is, in fact, being replaced and the replacement (using Azure SQL Database) is rolling out over the next two years. Meanwhile, the legacy version (not in Azure) is responsible for supporting 80% of revenues. That going out of support doesn’t seem like an optimal situation.

John Cas points out that even Microsoft’s apps have those problems:

Greg Low reminded me of something I’d (happily) forgotten about:

Core based licensing shock was the #1 reason most don’t want to move from 2008 R2.

And on a related note, Jeff G reminds you that the upgrade money might come from the same pocket that pays your salary:

In the industries I have worked in (casino vendor, transportation, construction), the idea of upgrading because Microsoft stopped supporting the software is laughable (as in I was laughed at when it was suggested to upgrade). But on the same token, these guys do not see technology as a revenue generator but more as a revenue black hole. No matter how I explain it, the sticker-shock is too great. Since these were smallish companies, I don’t necessarily blame them. “As vote of hands, who would rather have bonuses this year and who would rather run software that is supported by Microsoft?” Software always loses in that case. Now if I am a 1000+ person company with multiple data centers (and cloud instances) then the sell is far easier (at least it was for me). This was due to rolling it into the master licensing agreement (you know for all the Windows licenses, Office 365 licenses, and server licenses) which was a yearly expense that was already budgeted for. To give you my opinion, if the support of 2008/2008 R2 doesn’t cost too much, then I would leave it in.

Tony’s company is thinking big because they’re tired of this hamster wheel:

Expired versions of ERP & integrated engineering systems (running on 2008R2) have to be dealt with first before turning off old OS & SQL is possible. Work is *finally* underway (6 years ago I came to this job to do this), but these systems will probably hang on for another 2 years. And maybe longer, if it’s decided to keep them running as reference. Infrastructure team wants to airlift the data center into cloud, which at least takes hardware failure off the risk list (7 years old and counting).

John C wrote:

we plan to make the leap to 2019 by end of Q3… but considering 2017 in case the RTM launch of 2019 delays to end of Q2.

It’s easy for me to say this from an armchair perspective, but I think Microsoft made a mistake with the 2019 launch timing. They used to make a lot of noise that they were going to ship a new version every year on a “fast train” of releases, but 2019 is taking longer to ship than 2017 did. I know the MS folks will say, “But 2019 has so many big features and we can’t rush it” – that just means someone got the scoping wrong, and should have shipped a smaller incremental set of features earlier.

On a related note, Rowdy Vinson tweeted:

I had to laugh at that. I love 2017 and I use it every day, but I gotta confess that when I wrote the post Which Version of SQL Server Should You Use, the 2017 advantages weren’t quite as awesome as I’d have hoped. But wasn’t I just saying MS should ship a smaller set of improvements, more often? 2017 is what that would look like, and it’s fine. No, not fine as in…<sigh> Moving on.

Shaun Austin wrote:

I think the apathy from some organisations comes from not really understanding what a lack of support means. I work for an ISV and our software runs on SQL Server 2008 or later. I’ve been encouraging our customers to upgrade SQL Server for the last 12 months or so. The typical conversation goes like this:

Customer: “what do you mean when you say our version of SQL Server won’t be supported from July?”

Me: “I mean Microsoft won’t provide critical security updates and bug fixes. Nor will they provide technical support in the event of a serious problem, nor will it be guaranteed to run on later versions of Windows Server.”

Customer: “Huh….but your software will still work…right?”

Me: “Well…as it stands….yes…so long as SQL Server is able to run…our software will continue to work. There are other performance and feature benefits too of course”

Customer: “Hmm…ok…well…I don’t have a problem now, and an upgrade to SQL Server seems really expensive…I’ve also heard about that cloud thingy which is going to save me a chunk of cash and I’m definitely going to look into that…..hmm….I’ll get back to you.”

Or as Dcariboo summed up:

Business dictates to IT what technology we use, hence why we still have an app around that requires sql server 2000. 8\

Andrew Miller summed up the life of an admin:

We’ve been migrating DBs from 2008/2008R2 to 2016(preferred) or 2012 for the past 1.5 years. We are down to 4 applications still running on 2008/2008R2. We plan to be complete by next week. At which time, we’ll start planning the migrations from 2012 to 2019.

But I think Jeff Moden’s comment was my favorite:

I’m tickled!!! It means that 2008 is finally STABLE!!!

Building a SQL Server app? Start with Azure SQL DB.

No no, April Fool’s was yesterday.

Hear me out.

Azure SQL DB’s feature limitations are a good thing. It doesn’t support everything that SQL Server supports, like CLR, cross-database transactions, database mail, linked servers, and OPENQUERY, but…frankly that feature-limitation page reads like a list of my favorite things to tell developers to avoid. You shouldn’t be using most of that stuff in a new application today. You might need it down the road, but…if you don’t, don’t get started using it on day 1. Stick with the basics, and Azure SQL DB supports those, plus more.

Azure SQL DB has optimistic concurrency. You get Read Committed Snapshot Isolation (RCSI) by default, which avoids most common readers-blocking-writers and writers-blocking-readers problems. Sure, you can still get into blocking problems – but this just avoids a lot of ’em right out of the box.

Azure SQL DB will force you to handle errors. A good developer will listen for things like deadlock errors and build retry logic into her application – but let’s face it, most of us aren’t that diligent. Azure SQL DB kinda forces you to do it by giving you transient errors when your database is under too much load. The better our developers realize that yes, even the best databases can throw transient errors, the better.
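The retry pattern, sketched in T-SQL just to show the shape – in a real app this logic lives in the data access layer, and the table, Id, and retry count here are made up:

```sql
DECLARE @Retries INT = 3;
WHILE @Retries > 0
BEGIN
    BEGIN TRY
        BEGIN TRAN;
        UPDATE dbo.Users SET Reputation += 1 WHERE Id = 26837;
        COMMIT;
        BREAK;   -- success: stop retrying
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0 ROLLBACK;
        /* 1205 = deadlock victim; 40501, 40613 = Azure SQL DB
           throttling / transient-availability errors */
        IF ERROR_NUMBER() IN (1205, 40501, 40613) AND @Retries > 1
            SET @Retries -= 1;
        ELSE
            THROW;   -- out of retries, or a non-transient error
    END CATCH;
END;
```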

No, seriously, not a data dumpster fire, stop saying that

And stop seeing the official icon as a data trash can where the inside contents are on fire

Azure SQL DB confines the blast radius of bad queries. In a traditional shared SQL Server environment, a poorly written application can wreak havoc on the performance of unrelated applications that share the same guest or host. With Azure SQL DB, your performance really is confined to just your database – so poorly-written queries show up on the radar, fast. Managers get a much better idea of how much bad code will cost on a monthly basis.

In fact, if you’re a database administrator reading this, I bet Azure SQL DB is exactly the kind of system you’d want to force your developers to use when building a new app.

Stop thinking of Azure SQL DB as a limited product, and think of it as a database platform managed by strongly opinionated DBAs.

DBAs just like you.

Registration Open for My SQLSaturday Sacramento Workshop

I’m coming to Sacramento this year for SQLSaturday #885. On Friday, I’ll be teaching a one-day pre-con:

Performance Tuning in 21 Demos – You’re a developer or DBA who wants to tune indexes and queries to make your application go faster. You don’t like learning by watching slides or reading white papers – you want to jump right into the code and see what happens. You’re the kind of person who loves watching YouTube videos that start with, “Hold my beer and watch this.”

Attendees get their choice of a Recorded Class Season Pass or a Consultant Toolkit – so the pre-con pays for itself! Learn more and register now.

Can’t make it to Sacramento? Here’s where to find me this spring:

Should we use stored procedures or queries built in the app?

A client asked a great architecture question that deserved its own blog post:

Should we use more “stored procedures” for select, insert, update, delete or should that just be sent as queries from the application?

If you ask a junior database administrator where to put something, she’ll probably say “in the database” because that’s the thing she has control over. That’s fair – DBAs love having control over stuff. However, let’s step back a little.

If your shop lacks T-SQL expertise, but has a lot of sharp senior application developers, then it may be easier to keep the queries in the app. If you simply make a mandate that “all queries come from stored procedures,” but you don’t give your staff the training necessary to write fast, scalable stored procedures, then the result won’t perform well. Database administrators love to bash the queries written by ORMs like Entity Framework and nHibernate, but the reality is that these days, those tools write better queries than untrained developers. Sure, in a perfect world, we’d all be perfectly trained on all of the tools we use, but….

If your shop is good at testing & source controlling app code, but doesn’t have that same level of expertise for testing & source controlling stored procedures, then you’ll probably find versioning and deployment easier when the queries are built into the app code. In the year 2019, testing & source controlling database objects like stored procedures and functions still isn’t quite as easy as app code. It’s getting better, but it’s not there yet.

If you’re good at measuring app code performance, like if you’re good with code speed tools like New Relic that instrument every step of your application code, then you’ll be able to more naturally track down slow queries with those tools. However, if you’re not good at app performance monitoring, you may find it easier to track down slow queries with plan cache tools like sp_BlitzCache.
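If you go the plan-cache route, a typical sp_BlitzCache call looks like this (it's part of the free First Responder Kit; the sort orders shown in the comment are the commonly used ones):

```sql
-- Top resource-consuming queries from the plan cache, sorted by total CPU.
-- Other common sort orders: 'reads', 'duration', 'executions'
EXEC sp_BlitzCache @SortOrder = 'cpu', @Top = 10;
```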

If you have predictable performance needs, like your business has the same exact workload for months straight, then you can keep code in the application tier. When you start to see performance issues, you can make gradual changes to the code, test that as part of your regular testing processes, and gradually deploy those changes out through your normal release process.

However, if your business has extremely unpredictable performance needs, like a sudden high volume of transactions every few months, and if you don’t do a great job of performance testing ahead of time, then stored procedures can be a better fit. In the heat of a performance emergency, it can be easier to change and deploy a new stored procedure than deploy an entire new build of your application. For example, during one client’s emergency, I implemented this bizarro stored procedure caching technique because it meant the difference between making money, versus their online store being…not online.

In this client’s case, with a team of sharp app developers, none of whom had database training, and no database developers or DBAs on the team, the answer was clear: keep the queries in the code. I do try to give folks an avenue to up their game if they want, though – and in your case, if you’ve got developers who want to level up, start them on Itzik’s classic, T-SQL Fundamentals.

What Are You Going to Do About SQL Server 2008?

Last week, you used SQL ConstantCare® to analyze 2,396 SQL Servers across a variety of versions:

Alas, poor SQL Server 2008

  • 15% (362 instances) SQL Server 2008 and 2008 R2
  • 18% (438) SQL Server 2012
  • 27% (650) SQL Server 2014
  • 32% (777) SQL Server 2016
  • 7% (169) SQL Server 2017

That means 15% of your servers are going to be out of support in about 106 days: SQL Server 2008 & R2 both end support on July 9th.

So I’m curious:

  1. What does your company plan to do about that?
  2. If the answer is “nothing,” what have you tried to convince them otherwise? Or have you?
  3. What’s the single biggest thing stopping you from moving to a supported version?

I’m asking because it influences how we support these older versions in the First Responder Kit, SQL ConstantCare, and Consultant Toolkit. For years, I’ve said, “If Microsoft can’t support it, neither can we,” but given the relatively high market penetration of these old versions, I’m not sure I can still say that. (After all, if I look at the market numbers, it’s more profitable to support 2008 and R2 right now than it is to support 2017!)

What happens when you cancel or kill a resumable index creation?

SQL Server 2019 adds resumable online index creation, and it’s pretty spiffy:
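The statement under discussion looks something like this (the table and index names are placeholders):

```sql
CREATE INDEX IX_LastAccessDate
    ON dbo.Users (LastAccessDate)
    WITH (ONLINE = ON, RESUMABLE = ON, MAX_DURATION = 1);
```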

Those parameters mean:

  • ONLINE = ON means you’ve got the money for Enterprise Edition
  • RESUMABLE = ON means you can pause the index creation and start it again later
  • MAX_DURATION = 1 means work for 1 minute, and then gracefully pause yourself to pick up again later

If you decide you want to pause the index, run this command in another session:
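The pause command itself, using the same hypothetical table and index names:

```sql
ALTER INDEX IX_LastAccessDate ON dbo.Users PAUSE;
```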

When you do that, the index creation’s session fails with a bit of an ugly error:

Aborted index creation

Okay, well, that’s not exactly what I’d expect from “pausing” something, but that’s okay. There’s a new DMV, sys.index_resumable_operations, that shows the paused index creation:

sys.index_resumable_operations
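Querying the DMV is straightforward; these are a few of its more useful columns:

```sql
SELECT name, sql_text, percent_complete, last_pause_time, state_desc
FROM sys.index_resumable_operations;
```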

It shows the syntax used to create the index, the percentage complete, when it was last paused, and more. If you want to give it a happy ending or abort it, it’s your choice:
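The resume and abort commands, again with the placeholder names:

```sql
-- Finish the index creation where it left off:
ALTER INDEX IX_LastAccessDate ON dbo.Users RESUME;

-- Or throw away the partial work:
-- ALTER INDEX IX_LastAccessDate ON dbo.Users ABORT;
```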

So far, so good.

The first catch: a paused index is stealthy.

It doesn’t show up in sys.indexes, nor does it show up in the Object Explorer list of indexes on a table. That makes it sound like it’s harmless, but it’s not. I can’t drop the table:
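The failing statement is just an ordinary drop (table name hypothetical) – SQL Server blocks it while a resumable index operation is paused on the table:

```sql
DROP TABLE dbo.Users;   -- fails while a resumable index operation is paused
```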

Resumable index operation error

Nor can I create any other index – same error:

Index creation error

Tricky. New stuff for developers to watch out for in their deployment scripts.

The next catch: killing or canceling an index creation, doesn’t.

In SSMS, you’re used to being able to click the “Cancel” button on your query, and having your work rolled back.

You’re also used to being able to kill a query, and have it automatically roll back.

Neither of those are true with resumable index creations. In both cases, whether you kill the index creation statement or just hit the Cancel button in SSMS to abort your request, your index creation statement is simply paused until you’re ready to come back to it. (Or, it’s ready to come back to haunt you, as we saw above.)

So you’ve learned your lesson, right? Don’t use the resumable parameter if you don’t specifically need it. As long as you don’t ask for it, you should be fine…right? Wrong.

The final gotcha: you might get resumable even when you don’t ask for it.

SQL Server 2019 adds these database-level configuration options:
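The options in question are database-scoped configurations:

```sql
ALTER DATABASE SCOPED CONFIGURATION SET ELEVATE_ONLINE = WHEN_SUPPORTED;
ALTER DATABASE SCOPED CONFIGURATION SET ELEVATE_RESUMABLE = WHEN_SUPPORTED;
```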

This means whenever you create an index, it’ll default to online creation and resumable even when you don’t specify it as part of your index creation statement! Translation: whenever you go to create any index, if you decide it’s a bad idea and click Cancel, your index creation is still left online partway done, lying in wait to disrupt other operations.

That’s why I’ve added an sp_Blitz check for resumable index operations that have been left behind. It’s in the dev branch now, and it’ll be in the April release. Enjoy!
