There’s a 6-Month Statute of Limitations on “The Last Person.”

“The last person must have set it up that way.”

“The last person wrote that code.”

“The last person just didn’t configure it right.”

You can use that excuse for 6 months.

For six months, you’re allowed to get up to speed on the company’s politics, the intricacies of the app, and the responsibilities of different departments. You’re allowed to prove yourself to your peers, building up social capital so that they’ll take your recommendations and run with ’em.

You can take your time figuring out whether the last person knew what they were doing or not. You should start by assuming that they did, and give them the benefit of the doubt, because they were under time pressure just as you are now. If you’re generous and curious, you can even try to contact them through unofficial channels, offer to buy them lunch, and chat about their experiences at the company.

But after 6 months, the statute of limitations is up.

TIME’S UP

After that, you’re not allowed to blame “the last person” anymore. 

The last person can no longer be prosecuted for their crimes against the database, and they are absolved of any guilt. It doesn’t matter who was originally responsible last year: the person responsible is now you.

So pull yourself up by your bootstraps, write up a health check with sp_Blitz, and start working through the problems. Put the correct backups and corruption checking in place. Schedule that outage. Apply the patches. Fix those linked servers using the SA login.
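As a rough sketch of that first health check – assuming you’ve installed the First Responder Kit in the database you run it from, and using parameters from its documentation:

```sql
/* A minimal health-check starting point. Parameters shown here come from
   the First Responder Kit docs; adjust to taste. */
EXEC dbo.sp_Blitz
    @CheckServerInfo = 1,          /* include server hardware/config details */
    @CheckUserDatabaseObjects = 1; /* also inspect objects in user databases */
```

Work the results from Priority 1 downward – backups and corruption checks first.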

Because after 6 months, if you’re not fixing these problems, the clock is already starting to tick down to when you will be referred to as “the last person”, and they’re going to roll their eyes when they talk about your inability to get the job done.

Just like you’ve been doing about “the last person” for over 6 months.


[Video] Office Hours in a Hotel Room

Videos
1 Comment

While in Nashville for a creators conference, I went through your top-voted questions from https://pollgab.com/room/brento.

Here’s what we covered:

  • 00:00 Start
  • 00:38 Bisal Basyal: What is the best way to manage roles in sql server in Azure VMs (multiple). We want separate logins for each users but it should be same on all Azure SQL VMs. We are currently using windows Cred. setting credentials on credential manager for that server IP but it is too slow.
  • 01:06 MyTeaGotCold: I’ve only ever heard bad things said of MySQL, but it’s often above SQL Server in surveys. What am I missing?
  • 02:27 FIN7: What are the top gotcha’s you have run into when migrating SQL onprem to Azure SQL Managed instance?
  • 03:39 Unspoiled: is there any database level statistic you would recommend to monitor to know the impact or pressure a single database is generating. the goal being to understand how much pressure a single database is generating with multiple on a the server.
  • 04:59 Poul J: Hi Brent. I was wondering if you are using a dedicated tools for investigating complex query plans?
  • 05:08 SteveE: Hi Brent, I’m looking at your SQL ConstantCare® Population Report: Spring 2024 and can see a spike for Azure SQL DB in 2022 Q3 which then drops back to approximately the prior level in the next quarter. Are you able to offer an insight as to why this is please?
  • 05:41 Kansas4444: Do you often see CLR function / procedure used and what impact it has on performances ?
  • 06:08 Mattia Nocerino: Hi Brent! I’ve inherited a 3 node cluster (2 nodes FCI + 1 node AG) but it keeps going down for all the wrong reasons. I’ve never built nor managed a similar infrastracture. Could you point to some resources to get me started figuring out what’s going on! Hope you’re doing good!
  • 07:50 BackupsAreImportant: What’s the drawback to implementing a backup strategy that only used the read-only node of an AG? You can take COPY_ONLY backups and log backups from there. I know the COPY_ONLY won’t affect the log chain but you can still do restores and apply logs. Seems wrong but why?
  • 09:02 ThatSteveCena: With file system advancements on Windows, should we consider formatting MDF/LDF drives using ReFS or stick with NTFS?
  • 09:56 Peter: Hi Brent, some devs have started slapping OPTION (NO_PERFORMANCE_SPOOL) on their queries. This is not something you covered in your courses. Is it a real solution to a genuine problem or a workaround to a problem they could or should be fixing a better way.
  • 11:23 DATA cow: ALTER DATABASE DBname SET MEMORY_OPTIMIZED = ON; Version : 2019 what benefit we get by enabling ?I understand frequent data will loaded to memory. is that mean are we end up with run out of memory or memory or server pressure ?

Two Tweaks for Faster Backups with Ola Hallengren’s Scripts

Ola Hallengren’s free maintenance solution is widely used as a replacement for SQL Server maintenance plans. It’s a more powerful, flexible tool for backups, corruption checking, and index & statistics updates.

If you’re using it for backups, there are two quick, easy changes that can dramatically reduce your nightly job runtimes.

First, set @NumberOfFiles = 4.

SQL Server has internal bottlenecks for single-file backups, and striping your backups across multiple files removes that bottleneck. I’ve written about testing the number of files before, and in a perfect world you’d have the time to do that testing, but for starters, just try writing your backups across 4 files.

You don’t need separate drive targets or network targets – even writing the 4 files to the same volume usually produces pretty dramatic performance improvements for free.

I only do this on my user databases because the system databases are generally so small that it doesn’t matter. To set it up, just right-click on the Agent job for user database full backups, go into the step properties, and add a line for @NumberOfFiles = 4, like this:

Ola Hallengren job configuration with @NumberOfFiles = 4 parameter
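In script form, the job step ends up looking something like this – the target path and the other parameters here are just placeholders, so keep whatever your existing job already uses and simply add the @NumberOfFiles line:

```sql
/* Sketch of an Ola Hallengren full-backup job step for user databases.
   The UNC path is hypothetical - the only change from a stock job
   is the @NumberOfFiles = 4 line. */
EXECUTE dbo.DatabaseBackup
    @Databases = 'USER_DATABASES',
    @Directory = N'\\BACKUPTARGET\SQLBackups',  -- hypothetical path
    @BackupType = 'FULL',
    @CheckSum = 'Y',
    @NumberOfFiles = 4,  -- stripe each backup across 4 files
    @LogToTable = 'Y';
```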

This does mean you need all 4 files at restore time, and it means your restore scripts will be a little bit more complicated. But who cares? You’re not writing those scripts by hand in the year 2024, are you? You’re using sp_DatabaseRestore from the First Responder Kit, like a boss, just pointing it at a folder and letting it grab the most recent full, diff, and log backups:
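For example, pointing it at the backup folders looks something like this – the server, database name, and paths are made up, but sp_DatabaseRestore finds the most recent files in each folder and handles the multi-file striping for you:

```sql
/* Hypothetical restore - database name and UNC paths are placeholders. */
EXEC dbo.sp_DatabaseRestore
    @Database = N'StackOverflow',
    @BackupPathFull = N'\\FILESERVER\Backups\SQL1\StackOverflow\FULL\',
    @BackupPathDiff = N'\\FILESERVER\Backups\SQL1\StackOverflow\DIFF\',
    @BackupPathLog  = N'\\FILESERVER\Backups\SQL1\StackOverflow\LOG\',
    @RunRecovery = 1;  -- bring the database online after the last log restore
```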

Good job. Moving on – eagle-eyed folks will notice another config change in that screenshot. This next one’s going to be a little more controversial, but hear me out.

Next, set @Verify = ‘N’.

By default, Ola’s backup jobs have @Verify = ‘Y’, which means after the backup command finishes, SQL Server reads the whole backup file back to make sure it’s okay. This can massively extend your backup job times.

I’m not saying you shouldn’t verify your backups! I’m saying don’t verify them from the production box. Instead:

  • Write your backups to a network share, like a UNC path
  • Use sp_DatabaseRestore to verify them from a separate server, like DR
  • Bonus points for using sp_DatabaseRestore’s @RunCheckDB = 1 parameter, which is way better than @Verify = ‘Y’
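A sketch of what that nightly DR-side job might run – names and paths are hypothetical, and @RunCheckDB is the parameter doing the real verification work:

```sql
/* Restore-and-verify on a DR server, not production.
   Database name and paths are placeholders. */
EXEC dbo.sp_DatabaseRestore
    @Database = N'StackOverflow',
    @BackupPathFull = N'\\FILESERVER\Backups\SQL1\StackOverflow\FULL\',
    @BackupPathLog  = N'\\FILESERVER\Backups\SQL1\StackOverflow\LOG\',
    @RunCheckDB = 1,   -- run DBCC CHECKDB after the restore finishes
    @RunRecovery = 1;
```

This both proves the backup restores and proves the restored data passes corruption checks – far stronger evidence than a verify pass on the raw backup file.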

This has a few important performance & reliability benefits:

  • Your production server’s nightly jobs finish faster
  • You really test the hell out of the backup, making sure it’s free of corruption
  • You offload that resource-intensive check work to a secondary server, not production
  • Hell, you don’t even have to pay licensing for that, because as long as you’ve got Software Assurance, you can do CHECKDB on secondaries for free, regardless of what SQL Server version you’re using

Presto: those two free, simple changes might cut your backup times by 1/3 or more. You’re welcome!


Who’s Hiring in the Microsoft Data Platform Community? June 2024 Edition

Who’s Hiring
13 Comments

Is your company hiring for a database position as of June 2024? Do you wanna work with the kinds of people who read this blog? Let’s set up some rapid networking here.

If your company is hiring, leave a comment. The rules:

  • Your comment must include the job title, and either a link to the full job description, or the text of it. It doesn’t have to be a SQL Server DBA job, but it does have to be related to databases. (We get a pretty broad readership here – it can be any database.)
  • An email address to send resumes to, or a link to the application process – if I were you, I’d include an email address, because you may want to know which applicants are readers here: they might be more qualified than the applicants you regularly get.
  • Please state the location and include REMOTE and/or VISA when that sort of candidate is welcome. When remote work is not an option, include ONSITE.
  • Please only post if you personally are part of the hiring company—no recruiting firms or job boards. Only one post per company. If it isn’t a household name, please explain what your company does.
  • Commenters: please don’t reply to job posts to complain about something. It’s off topic here.
  • Readers: please only email if you are personally interested in the job.

If your comment isn’t relevant or smells fishy, I’ll delete it. If you have questions about why your comment got deleted, or how to maximize the effectiveness of your comment, contact me.

Each month, I publish a new post in the Who’s Hiring category here so y’all can get the latest opportunities.


[Video] Office Hours: 16 Questions

Videos
0 Comments

Like Sixteen Candles, but different: I go through your top-voted questions from https://pollgab.com/room/brento. Strangely, my video and audio are ever so slightly out of sync in this episode.

  • 00:00 Start
  • 01:26 GuaroSql: Hey Brent! How are you? It is necessary to enable ADR in sql 2019 in order to row versioning work better? Or it is optional? I just realized it is turned off in my sql 2019 databases. Thanks!
  • 02:42 RoJo: If I’m upgrading a major version (e.g. 2016 to 2019) and build a brand new install. Can I simply Detach the old DB and attach to the new one? or is there file structure changes that Backup / Restore handle (or any other reason). Goal to save time on big DB.
  • 03:54 Nestrus: Why are views sometimes much slower than the same query executed directly in the stored procedure and what can be done to make them equally fast? The only difference I can see is, that the query directly in the SP gets parallelized and the view is not.
  • 04:58 MyTeaGotCold: Do you know any good mailing lists for peer reviewing large writings about SQL Server?
  • 05:52 Bonnie Tyler: Did you experience a total eclipse?
  • 06:00 Jökull: What are your thoughts on remoting into the production SQL Server to run SSMS? We have several power users that like to do this. What’s the best way to deal with this?
  • 07:09 CreditToMyTL: Recently, a table with wrong data caused the business to go down for a few hours. We planned to create a trigger to check who was behind these. My TL prefers creating dynamic triggers so that new tables automatically have audit logs. Is a dynamic trigger the best option here?
  • 08:13 Garreth: How do change the culture so that production dba role is less of reactive role and more of a proactive role?
  • 09:00 Jökull: Many users have permissions on the production boxed SQL Server to create SQL agent jobs. How should you track down who to contact when one of these jobs starts to fail?
  • 10:29 Ignacio: In the DBA field, is deep or wide knowledge more valuable as a job seeker?
  • 11:19 DemandingBrentsAttention: Knowing your opinion on linked servers (go fetch the data directly), what are some uses cases that you begrudgingly tolerate, or enthusiastically support?
  • 12:17 Miles: Hi Brent, Could you please give some advice (especially when it comes to introverts) on how to network with people so that, people can genuinely help in one’s career instead of having 1000’s of LinkedIn connects?
  • 13:03 Jökull: PostgreSQL now supports MySQL wire protocol. Is it inevitable PostgreSQL will do something similar for Oracle?
  • 13:52 Gabbpoll: To fabric or not to fabric. Is it worth exploring?
  • 14:34 PAul: I always put DECLAREs at the head of my code, because in VFP they were not evaluated in-line in the code flow, say in a loop or IF statement. Does SQL server do the same thing?
  • 15:23 Nostradamus: Does parameter sniffing ever happen with smaller tables or just larger tables?

Coming in Entity Framework 9: Better Query Parameterization

Development
22 Comments

Hallelujah. With current versions of Entity Framework, when developers add a mix of parameters and specific values to their query like this:

See how part of the filter is hard-coded (“.NET Blog”) while the other part – the ID the user is looking for – is dynamically generated? That causes Entity Framework to generate a query that is partially parameterized:

This is the worst of both worlds for SQL Server. If the query was fully parameterized, it’d get plan reuse. If the query wasn’t parameterized at all, we could turn on Forced Parameterization and get plan reuse. However, Forced Parameterization won’t do anything for the above query because SQL Server looks at it and says, “That query’s already parameterized – see, it has a parameter for @__id_0 right there! I’ll just skip it.”

In Microsoft’s example above, ‘.NET Blog’ is a hard-coded string, but the situation is much worse when that is dynamically generated. For every variation of the parameter, SQL Server sees a “new” query coming in. End result: increased CPU to compile “new” query plans, unpredictable plans, plan cache bloat, problems with monitoring tools and Query Store, and more.
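To make the problem concrete, here’s roughly the shape of SQL that a partially parameterized query produces – table and column names here are illustrative, modeled on Microsoft’s example rather than copied from EF’s exact output:

```sql
-- Illustrative only: a hard-coded literal and a parameter mixed in one query.
SELECT [p].[Id], [p].[Title]
FROM [Posts] AS [p]
INNER JOIN [Blogs] AS [b] ON [p].[BlogId] = [b].[Id]
WHERE [b].[Name] = N'.NET Blog'   -- literal: each different string = a "new" plan
  AND [p].[Id] = @__id_0;         -- parameter: this part reuses plans just fine
```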

Good news! The What’s New in Entity Framework 9 rundown shows a new EF.Parameter method to force parameterization:

Which uses a parameter now instead of a hard-coded string:
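The fully parameterized version looks something like this – again an illustrative sketch, not EF’s verbatim output:

```sql
-- Both filter values are now parameters, so one cached plan gets reused
-- no matter what blog name or ID the user asks for.
SELECT [p].[Id], [p].[Title]
FROM [Posts] AS [p]
INNER JOIN [Blogs] AS [b] ON [p].[BlogId] = [b].[Id]
WHERE [b].[Name] = @__name_0
  AND [p].[Id] = @__id_0;
```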

Yay! Lower CPU for plan compilation, more reused plans, less memory consumed by redundant plans, and better monitoring tools.

It is a bummer that we have to wait until an estimated November 2024 for EF9 (according to that same What’s New doc), and that developers will have to touch code in order to fix it. I can’t really complain about that, though, because I’m just happy that Microsoft is adding it. EF’s query generation keeps gradually getting better, and that’s awesome.


This Is It! The Last 3 Days of My Anniversary Sale.

Company News
0 Comments

Every year in May, I look back at when I first registered BrentOzar.com way back in May 2001.

Things were so different back then. I struggled trying to find fun, interesting ways to learn about Microsoft SQL Server. My learning options were really dry books, droning videos sold on DVD, or – brace yourself – the documentation. There weren’t blogs or YouTube videos or open source community scripts on Github.

Today, you have a bewildering number of choices when it comes to learning SQL Server. You can – and should – exhaust all of your free options first, like Google, blog posts, webcasts, and YouTube videos. Take your career as far as you can for free first – the SQL Server community is amazing!

Then, when you’re ready to take your skills and career to the next level, I’m ready for you:

Fundamentals, Yearly
$195Save $200
Fundamentals + Mastering, Yearly
$995Save $300

The sale ends May 31. To get these deals, you have to check out online through our e-commerce site by clicking the buttons above. During the checkout process, you’ll be offered the choice of a credit card or buy now pay later.

Can we pay via bank transfer, check, or purchase order? Yes, but only for 10 or more seats for the same package, and payment must be received before the sale ends. Email us at Help@BrentOzar.com with the package you want to buy and the number of seats, and we can generate a quote to include with your check or wire. Make your check payable to Brent Ozar Unlimited and mail it to 9450 SW Gemini Drive, ECM #45779, Beaverton, OR 97008. Your payment must be received before we activate your training, and must be received before the sale ends. Payments received after the sale ends will not be honored. We do not accept POs as payment unless they are also accompanied with a check. For a W9 form: https://downloads.brentozar.com/w9.pdf

Can we send you a form to fill out? No, to keep costs low during these sales, we don’t do any manual paperwork. To get these awesome prices, you’ll need to check out through the site and use the automatically generated PDF invoice/receipt that gets sent to you via email about 15-30 minutes after your purchase finishes. If you absolutely need us to fill out paperwork or generate a quote, we’d be happy to do it at our regular (non-sale) prices – email us at Help@BrentOzar.com.


Updated First Responder Kit and Consultant Toolkit for May 2024

If you’ve hard-coded installer file names, there’s a big change in this release. There are now just 2 installer scripts: Install-All-Scripts.sql, and a new Install-Azure.sql, which only installs the scripts that are compatible with Azure SQL DB. The old Install-Core scripts are gone because we’ve deprecated sp_AllNightLog, sp_BlitzInMemoryOLTP, and sp_BlitzQueryStore. Read on for why.

How I Use the First Responder Kit
Wanna watch me use it? Take the class.

To get the new version:

Consultant Toolkit Changes

This app has been awesomely stable & useful for years, but this month we finally had to pop the hood open to make a few big changes:

  • Supports Microsoft Entra multi-factor authentication (MFA)
  • Automatically retries failed connections, allowing you to more easily gather data from Azure SQL DB Serverless databases that auto-paused
  • No longer requires create-table permissions, so it works better in environments where you can pull diagnostic data but not see database contents
  • Requires .NET Desktop Runtime 7 or higher

That last bullet point refers to the machine you’re collecting data from, typically your jump VM or laptop. I really do hate minimum requirements, but that desktop runtime is a really fast & easy install that doesn’t require a reboot. It was necessary to get the MFA authentication working.

Just to be super-safe, I’d keep the prior version around so that if you hit a weird client issue, you can still gather data. If you run into any issues, shoot us an email at help@brentozar.com and we’ll work through it with you.

I also updated it to this month’s First Responder Kit, but no changes to querymanifest.json or the spreadsheet. If you’ve customized those, no changes are necessary this month: just copy your spreadsheet and querymanifest.json into the new release’s folder. However, in the Dec 2023 release, we did update those files, so if you haven’t updated the Consultant Toolkit in a while and you’ve customized it, read that version’s release notes about updating.

sp_Blitz Changes

  • Enhancement: new checks for SQL Server service & Agent service accounts that have too many permissions. (#3481, thanks Vlad Drumea.)

sp_BlitzFirst Changes

  • Enhancement: add a new @ExpertMode = 2 option to skip sp_BlitzWho output. You could already do this with the @OutputResultSets option, but this is much easier during emergencies. (#3486)
  • Enhancement: better startup/failover time detection in Azure SQL DB. (#3504)
  • Enhancement: new warning about approaching max worker threads. (#3491)
  • Fix: under heavy load, the headline news wait stats lines would underreport the number of seconds that the sample took. (#3507)
  • Fix: when the Hekaton “Garbage Collection in Progress” warning fires, mention that it can also be caused by a stressed-out memory-optimized TempDB. (#3488)
  • Fix: don’t show the plan cache result set if it’s not asked for in @OutputResultSets. (#3492)
  • Fix: incorrect URL for slow data file reads. (#3483, thanks Matt Mollart.)

sp_BlitzIndex Changes

  • Enhancement: performance tuning by skipping queries we don’t need depending on the mode. (#3462, thanks Erik Darling.)
  • Enhancement: skip filtered indexes checks if they don’t have permissions to query sys.sql_expression_dependencies. (#3522)
  • Fix: restore SQL Server 2014 compatibility. (#3452, thanks Brianc-DBA and others for reporting.)
  • Fix: unreachable databases are now excluded from the total database count. (#3516, thanks Gary Hunt.)
  • Fix: using DB_NAME() in more places for consistency. (#3517, thanks Gary Hunt.)
  • Fix: fixed typos. (#3519, #3515, #3512, thanks Gary Hunt.)

sp_DatabaseRestore Changes

  • Enhancement: during installation, if they don’t already have Ola’s scripts, explain why they’re about to see warnings. (#3499)

Deprecating sp_BlitzInMemoryOLTP, sp_BlitzQueryStore, and sp_AllNightLog

sp_BlitzQueryStore was originally written by Erik Darling when he worked here. He’s moved on to start his own excellent company, plus his own sp_QuickieStore. You should be using that instead.

sp_BlitzInMemoryOLTP was always kinda distributed as a courtesy – the real home for it is in KTaranov’s Github repository, and you can still find it there. It hasn’t been updated in over 6 years, and I’ve never seen anyone using it, so I’m removing it to streamline support issues.

sp_AllNightLog was a ton of fun when we built it several years ago, but it’s consistently had a problem. Companies start using it, then decide they want to build something even more ambitious, typically a C# service with robust error handling and scheduling. sp_AllNightLog isn’t the kind of thing I want to encourage beginners to use – it’s complex.

So when Reece Goding started working on cleaning up sp_BlitzQueryStore’s code, I decided that now was the time to deprecate stuff we no longer use or recommend. You’re definitely welcome to continue to use ’em if you get value out of ’em! I’m going to move these procs into the Deprecated folder, plus simplify the installation scripts. For the rest of 2024, the only installer script will be Install-All-Scripts.sql.

For Support

When you have questions about how the tools work, talk with the community in the #FirstResponderKit Slack channel. Be patient: it’s staffed by volunteers with day jobs. If it’s your first time in the community Slack, get started here.

When you find a bug or want something changed, read the contributing.md file.

When you have a question about what the scripts found, first make sure you read the “More Details” URL for any warning you find. We put a lot of work into documentation, and we wouldn’t want someone to yell at you to go read the fine manual. After that, when you’ve still got questions about how something works in SQL Server, post a question at DBA.StackExchange.com and the community (that includes me!) will help. Include exact errors and any applicable screenshots, your SQL Server version number (including the build #), and the version of the tool you’re working with.


[Video] Office Hours: I Feel Like an Amateur Edition

Videos
0 Comments

I started with audio problems, then had video problems halfway in. It’s amusing because we also discuss feeling like a SQL amateur!

Here’s what we covered:

  • 00:00 Start
  • 01:52 Does Time Really Exist: Hi Brent! What is the best way to ETL? web application, SSIS, linked server, …
  • 03:55 MyTeaGotCold: What is your vision for what SQL Server will be like in 2030? I feel like a fool for thinking it will involve In-Memory OLTP.
  • 05:49 mailbox: Hey Brent, What’s your take on switching to specializing in Open Source Database platforms( on-prem or cloud)? What would the career impact be? Seems like there are a lot of new jobs focusing on open source technologies, such as python, linux, postgreSQL,mariaDB, mongo, etc..
  • 07:03 DBANoob: Hi Brent, our company has been adopting Objective Key Results (OKR) and each team is tasked with creating new OKRs. Are you familiar with OKRs? Do you have any suggestions for a database administration team?
  • 10:38 mailbox: What’s the value of high reputation/achievement on dba.stackexchange.com or sqlservercentral.com? Have you ever seen that land someone a job or negotiate a raise?
  • 11:58 Joe Gleason: Hello Brent! Since you have been performing PolGab for many years now, do you perceive the quality of the SQL Server questions being asked going up or down? I scratch my head sometimes, but I have not been at this for a very long time.
  • 18:57 Carlini Tassio: Recently start tu learn about IA and saw about copilot being used in many microsoft tools, you think studio in the future will be launched with copilot and can help people work with data to use T-SQL?
  • 20:31 Chevy 409: What’s your favorite vintage American car?

The Future of DBAs: History Rhymes

Eitan Blumin recently wrote a post about the future of the DBA career in which he talked about the rise of jack-of-all-trades IT professionals. It’s a good post. You should read it. I did, and over the last 20 years, I’m pretty sure I’ve read a dozen variations of that post.

There’s a saying: “History doesn’t repeat itself, but it often rhymes.”

To give an example, let’s rewind the clock back to 2008. Business Intelligence was all the rage. SQL Server shipped with Analysis Services, Integration Services, and Reporting Services all inside the box. I remember hearing, “You have to learn all of these tools, or else you’re gonna be a dinosaur. Nobody’s gonna only know the engine anymore. You have to know everything else too, or you won’t get a job.”

So for a while, people were misled into believing they needed to know a little bit about everything. Unfortunately, you only have so many hours per month for learning, and a lot of people wasted a lot of time learning things – only to never actually use those tools.

It took a few years, but the ‘experts’ saying that stuff finally quieted down. The world made it clear that business intelligence and database administration are two different careers – I mean, really, they’re lots of different careers. They just happen to work with data, in the same way that gynecologists and cardiologists both work with the human body.

Today, those ‘experts’ are chanting a new song that just happens to rhyme with the old ones:

  • “You have to learn every database, or else you won’t be able to find a job working with just one!”
  • “You have to learn the cloud, or else you’ll never find a job working on-premises!”
  • “You have to learn to use AI as a tool, or else you won’t be able to get a job!”

Should you learn? Absolutely. But can you learn everything? Absolutely not. Given your limited time, you need a strategy for where your career is going. If you don’t have one, watch my 300-Level Guide to Career Internals.

You have a future. It can be as a database administrator, if that’s what you’re passionate about. It can also be as something else! But don’t let some talking head “expert” tell you what specific technologies you have to learn to stay relevant, because they can’t see the future any better than you can.


[Video] Office Hours: Answers for Your Database Questions

Videos
1 Comment

Y’all post questions and upvote the best ones at https://pollgab.com/room/brento, and I discuss ’em in my home office before tackling a client’s performance issues:

Here’s what we covered:

  • 00:00 Start
  • 00:45 Eve: When should you/should you not execute the SSMS recommended Missing Index in the execution plan?
  • 02:42 MyTeaGotCold: Logging sp_WhoIsActive and sp_BlitzFirst to a table has come up on your blog a few times. Do you still bother with it on versions that support Query Store?
  • 04:03 Frozt: Hi Brent, There is a upcoming DB we have that they configured where AG is not in automatic failover because of the dependecies of Reporting Services. Do they have other options to make AG highly available and reporting services as well?
  • 05:25 GrumpyOldDBA: Why does this variable @S get declared, even though IF evaluates as false:
  • 06:24 Luis C.: Hi Brent, Why sometimes when you use Statistics time, the CPU time is greater than the Total Exec Time?
  • 07:12 Mean Gene: Do physical reads ever matter when query tuning or just logical reads?
  • 08:57 bbk0919: Which performs better, a materialized view for a specific query? Or a non-clustered index with included columns for the same query?
  • 09:36 Luis C.: Hi Brent, Is it good or bad to put the same column in more than one index?
  • 10:04 Chris: Hey Brent, in a previous office hours you hinted at a potential Mastering Columnstore class, is this still on the horizon?
  • 10:24 Ignacio: What is your opinion of Azure SQL database watcher?
  • 11:58 Poul J: My friend asked me if it will make sence to delete autogenerated statistic on a regular basis. He is assuming that most of them are created due to adhoc queries, and he would like to save the ressources spend on updating them. I’m not sure what to respond… Any thoughts?
  • 12:40 Jökull: What is your opinion of Microsoft certifications relating to SQL Server? It seems like an opportunity to learn then forget.
  • 13:50 mailbox: Hey Brent, Do you think the avg DBA salary has decreased since 2005? I recall many DBA jobs being minimum $100,000, and now I’m seeing the average come down to like $78,000 in my area. Just curious what your thoughts are on this. Thanks!
  • 14:48 mailbox: Hey Brent, Given the news about Broadcom discontinuing free version of ESXi, what do you think the impact will be to the on-premise, SQL Server DB crowd? I’ve only ever supported SQL Server instances on vmware VMs or physical server.
  • 15:28 DadJokerDetroit: Recently, the brilliant Steve Sanderson did a great demo about Vector Search. Have you seen more demand for Semantic Search in SQL Server?
  • 15:43 Poul J: Is there a way to get information about how often the statistics of a table have been updated and how much time was spent on the updates?
  • 18:20 chandwich: Hey Brent! When do we get a tour of the office/home/garage?
  • 20:08 MustangKirby: I have hundreds of customer databases on several servers. Company policy says the dbs must be deleted if not being currently used. Other than taking all of them offline and seeing who yells, is there a way to determine the last time a db was accessed?

Getting Data Out of an Audit Table: Answers & Discussion

In last week’s post, I gave you a trigger that populated a history table with all changes to the Users.AboutMe column. It was your job to write a T-SQL query that turned these Users_Changes audit table rows:

Into the full before & after data for each change, like the earlier query from the blog post:

The trigger will always log the “before” values into the Users_Changes table, but the “after” value could be in a few different places:

  • If other changes have been made to that row, then the “after” value will be the next change for this Users.Id in the Users_Changes table
  • If no other changes have been made to that row, then the “after” value is the current state of the row in the Users table
  • If the row has since been DELETED, the “after” value isn’t anywhere!

To test your own code alongside mine, run these changes:
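The original test script was shown as a screenshot; a hypothetical stand-in that exercises the same three scenarios – a plain update, an update that sets AboutMe to NULL, and then a delete – would look something like this (the Id values and strings are made up):

```sql
/* Hypothetical test changes - table and column names follow the post's
   trigger (Users, AboutMe); the specific rows and values are invented. */
UPDATE dbo.Users SET AboutMe = N'First edit'  WHERE Id = 1;
UPDATE dbo.Users SET AboutMe = NULL           WHERE Id = 1;  -- genuine NULL "after"
UPDATE dbo.Users SET AboutMe = N'Third edit'  WHERE Id = 2;
DELETE dbo.Users                              WHERE Id = 2;  -- the edge case
```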

I didn’t explicitly call the DELETE scenario out in the challenge, and to be honest, I didn’t expect anyone to think of it when they were writing their answers. The post was all about catching mangled changes to the Users.AboutMe column, and the post’s trigger focused on updates of Users rows, not deletions. However, it’s still perfectly valid for someone to delete a row – and ideally, our query should be able to handle that situation. We didn’t get an answer that returned accurate data in that scenario, and I don’t blame y’all.

It’s an edge case – and it’s an even harder edge case to handle than it might seem once you factor in NULLs. Let’s think about possible changes:

  • The UPDATE might have set the row’s contents to NULL, in which case we want the AboutMe_After to show NULL.
  • The UPDATE might have set the row’s contents to something, but then the row was subsequently deleted (and our trigger didn’t catch that) – in which case we want the AboutMe_After to reflect that we don’t know what the UPDATE did. NULL isn’t a valid answer here because it implies that the UPDATE set the AboutMe to NULL, which isn’t necessarily true. We want to show something like a “(Unknown, Row Has Been Deleted)” warning.

So that means we can’t just slap a COALESCE in there and call it a day. We’re gonna need some tricky CASE statements.
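To make the shape of the solution concrete, here’s a rough sketch of the idea – not the exact query from the post, and the table & column names (Users_Change_Id, UserId, AboutMe_Before) are assumptions. It uses LEAD() to grab the next change’s “before” value as this change’s “after” value, falls back to the live Users row, and flags deleted rows explicitly:

```sql
/* Sketch only - assumed column names, not the post's actual query. */
SELECT uc.Users_Change_Id,
       uc.UserId,
       uc.AboutMe_Before,
       CASE
            /* If a later change exists for this user, its "before"
               value is this change's "after" value - even if that
               value is NULL, which is why we test the Id, not the value: */
            WHEN LEAD(uc.Users_Change_Id) OVER
                 (PARTITION BY uc.UserId ORDER BY uc.Users_Change_Id) IS NOT NULL
            THEN LEAD(uc.AboutMe_Before) OVER
                 (PARTITION BY uc.UserId ORDER BY uc.Users_Change_Id)
            /* No later change, and the row is gone from Users:
               we genuinely don't know what the last update did. */
            WHEN u.Id IS NULL THEN N'(Unknown, Row Has Been Deleted)'
            /* No later change, row still exists: current state wins. */
            ELSE u.AboutMe
       END AS AboutMe_After
FROM dbo.Users_Changes uc
LEFT OUTER JOIN dbo.Users u ON uc.UserId = u.Id;
```

Note the ordering of the CASE branches: checking LEAD() on the Id column (which can’t be NULL) rather than on AboutMe_Before is what lets a legitimate update-to-NULL show as NULL instead of falling through to the “deleted” warning.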

Here’s the query I came up with, and note that I’m including the different AboutMe_v2 and AboutMe_Current data just for diagnostic purposes so you can see how NULL handling is tricky:

Which, when auditing this series of changes:

Comes up with these results:

Notice how rows 2 & 3 both have NULLs for the v2 and Current columns, but they have different results for AboutMe_After? That’s because the 2nd update statement actually did set AboutMe to NULL, and we do have a Users_Changes row that knows about it (from Users_Change_Id = 3).

In production, if I were faced with this problem, I’d probably want to change the trigger itself to log deleted rows as well as updated ones. However, I found this to be a really fun exercise because I faced this exact problem at a client: we were auditing data whose changes had already happened, and I couldn’t rebuild the historical data retroactively.


Why Generative AI Scares the Hell Out of DBAs

I was chatting with a client’s DBA about this thought-provoking blog post about data governance in the age of generative AI. The DBA’s concern was, “What if we hook up generative AI tools to the production database, and someone asks for, say, recommended salary ranges for our next CEO based on the current CEO’s salary? Will the AI tool basically bypass our row-level security and give the querying person information that they shouldn’t be allowed to see?”

And she had a really good point.

If you haven’t worked with AI tools yet, I don’t blame you. You’ve got a real job, dear reader, unlike me who’s just a consultant who gets to travel all over the world while reading cool blog posts on planes. Anyhoo, since I get to play around a lot more, I’ll give you a quick recap:

  • Large language models (LLMs) like ChatGPT take plain English requests and turn them into results
  • Those results aren’t just text: they can include tables of results, scripts with T-SQL commands, or JSON files
  • The LLMs are trained on publicly available data
  • You can enhance their training by providing additional data, like the contents of your file shares and databases
  • There’s almost zero security on LLM requests and result sets

So the scenario that scares the hell out of me is:

  • A power user signs up for an LLM service or runs one on their computer
  • The power user wants better results, so they provide additional training data: they point the LLM at the company’s data set and load it with financial results, customer info, employee salaries, etc
  • The power user loves the report results, so they give other people access to the LLM

And the end result is that anyone who queries the LLM suddenly has access to everything that the power user had access to at the time of the LLM training. Anyone with access to the LLM might ask for a list of customers, employee salaries, or whatever.

Me, as pictured by Copilot

That would be what we call “bad.”

Sure, in a sense, this is the same problem data professionals have been struggling with for decades: people can export data, and once it’s out of the database, we don’t have control over who gets to see it. This isn’t new. It’s been the same story ever since Larry Tesler invented copy/paste.

But what’s new is that large language models:

  • Don’t make it clear where their data comes from
  • Aren’t easily reverse-engineered
  • Have damn near zero security on inputs and outputs

So now, the Amazon blog post explains why smart people are going to burn cycles reinventing row-level security. I’m not saying the blog post is bad – it’s not! It’s just a damn shame that the next 10-20 years of data governance are going to look exactly like the last 10-20 years of data governance.

Spoiler alert: we’ve sucked at this kind of security for decades, and we will in the next decade, too.


I’m Getting Antsy for SQL Server vNext.

SQL Server 2025
24 Comments

Historically, Microsoft publicly announces the next version of SQL Server about a year before it ships. For example:

This was pretty consistent with past releases: about a year before the release, Microsoft would go on record with the features included with the next version, and some companies would be able to start using it in production to prepare for the eventual release.

<checks calendar>

Well, it’s been about 2.5 years since SQL Server 2022’s announcement, so we’re looking at at least a 3-year release cycle. We might be coming up on the announcements for SQL Server 2025.

They could either announce the next version’s features at Microsoft Build on May 21-24 or at the PASS Summit on Nov 4-8. I’ll be watching the Build session catalog as Microsoft updates it because when Microsoft announces a new product, the related sessions are hidden until after the product is officially announced. It’s free to attend Build online, so if they announce 2025 during a keynote, I’ll tune in to the related general sessions.

I personally love a 3-year release cadence, and I don’t want new versions any more frequently than that. People have a hard time upgrading, and frankly, Microsoft has a hard time shipping that quickly. Just release every 3 years, and make sure the product is actually ready at the time of release. That’s good enough.

I’ve heard some folks say, “Maybe Microsoft is done with SQL Server because they want everyone to move to Azure.” Well, I’m sure they do, but they also acknowledge the reality that many companies host their infrastructure in Amazon and Google. Those clients need SQL Server licensing, because you can’t practically run Azure SQL DB in other clouds.

Microsoft also has to keep releasing new versions of SQL Server because they need to keep cashing your sweet, sweet Software Assurance checks. While SA does come with other benefits, like free licensing for a couple/few passive standby servers for HA and DR, those benefits aren’t priceless. If Microsoft didn’t ship a new version for, say, 5 years, I know some companies that would simply stop paying Software Assurance. Even if these companies don’t actually upgrade that often, they pay for the ability to know that they could upgrade – which means Microsoft has to keep bringing out versions & features.

What might Microsoft add in 2025? Clearly, AI is the buzzword du jour, and Microsoft’s Joe Sack has already demoed some of the Copilot things they’re working on in Azure SQL DB. Copilot integration requires internet connectivity, something that’s easier to integrate in cloud databases like Azure SQL DB. If you try to do large language model work on-premises, you’ll quickly learn that it’s extremely CPU and GPU intensive, and that doesn’t really make sense for a product like SQL Server that’s licensed by the CPU core. I’ll be curious to see how they package AI services to a crowd that sometimes screams in horror at the idea of giving their database servers access to the Internet.

Would you be okay giving your SQL Servers access to the Internet? And what AI-related features would you love to see included?


Who’s Changing the Table? Answers and Discussion

Your challenge for this week was to find out who keeps mangling the contents of the AboutMe column in the Stack Overflow database.

Conceptually, there are a lot of ways we can track when data changes: Change Tracking, Change Data Capture, temporal tables, auditing, and I’m sure I’m missing more. But for me, there are a couple of key concerns when we need to track specific changes in a high-throughput environment:

  • I need to capture the login, app name, and host name that made the change
  • I need to capture a subset of table columns, not all
  • I need to capture a subset of the before & after changes, not all

For example, in this week’s query challenge, the Users table has a lot of really hot columns that change constantly, like LastAccessDate, Reputation, DownVotes, UpVotes, and Views. I don’t want to log those changes at all, and I don’t want my logging to slow down the updates of those columns.

Furthermore, I probably don’t even want to capture the entire before or after values of the AboutMe column, either. It’s NVARCHAR(MAX), which can be huge. Depending on what “mangling” means, I might only need to grab the first 100 characters, or the length of the before/after changes. That’ll reduce how much I store, too, which is important in high-transaction environments that push the data to several Availability Group replicas.

Let’s say here’s the data I decided to gather:

You could refine that further by capturing less data if you knew exactly how the data was being mangled. For example, if the mangler always sets the rows to null or to short strings, you wouldn’t have to gather the before/after contents – just the length would be enough.

You could also get a little fancier by only storing the “before” data, but for simplicity’s sake, I logged both before & after here. (Spoiler alert: the next Query Exercise at the end of this post is going to be a related challenge.)

To populate this data, I love triggers. Here’s the one I came up with after a little testing:

The actual query plan on an update is nice and quick. A few things about the code that may not be intuitively obvious:

  • I didn’t use the UPDATE() function because it doesn’t handle multi-row changes accurately
  • The SET NOCOUNT ON stops the trigger from reporting back the number of rows affected by the insert, which can break apps that weren’t expecting to see multiple messages back about how many rows just got inserted/updated
  • The COALESCEs in the WHERE are to handle situations where someone sets the AboutMe to null (or changes it from null to populated)
  • The COALESCEs in the SELECT are to properly set the length columns to 0 when the AboutMe is null
  • I only fire the trigger on updates, not inserts or deletes, because the business request was about mangling existing AboutMe data

If you knew the exact kind of mangling that was happening, you could refine the WHERE clause even further, looking for specific data patterns.
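Putting those bullets together, a trigger along these lines would do the job – this is a hedged sketch with assumed table & column names (Users_Changes, AboutMe_Before, etc.), not the exact code from the post:

```sql
/* Sketch only - the audit table name and columns are assumptions. */
CREATE OR ALTER TRIGGER dbo.Users_Audit_AboutMe
ON dbo.Users
AFTER UPDATE  /* updates only - the business request was about mangled existing data */
AS
BEGIN
    /* Stop the trigger from sending an extra rowcount message
       that could break apps expecting just one: */
    SET NOCOUNT ON;

    INSERT INTO dbo.Users_Changes
        (UserId, ChangeDate, LoginName, AppName, HostName,
         AboutMe_Before, AboutMe_After, Len_Before, Len_After)
    SELECT d.Id, GETUTCDATE(), SUSER_SNAME(), APP_NAME(), HOST_NAME(),
           LEFT(d.AboutMe, 100), LEFT(i.AboutMe, 100),
           /* COALESCE so a NULL AboutMe logs a length of 0: */
           COALESCE(LEN(d.AboutMe), 0), COALESCE(LEN(i.AboutMe), 0)
    FROM deleted d
    INNER JOIN inserted i ON d.Id = i.Id
    /* COALESCE so null-to-value and value-to-null changes are caught: */
    WHERE COALESCE(d.AboutMe, N'') <> COALESCE(i.AboutMe, N'');
END;
```

Joining the inserted and deleted pseudo-tables on Id, rather than relying on the UPDATE() function, is what makes this behave correctly when a single statement touches many rows at once.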

Here’s what the output table contents look like after changing a few rows:

Nice and simple, and makes for really easy investigations. Just make sure to drop the trigger and the change table when you’re done! I had a really hearty laugh at one client when I returned a year later and they still had both in place.

Other Solutions

Tom aka Zikato aka StraightforwardSQL commented with a pointer to an awesome blog post he’s written on this very topic! He compared & contrasted a few different solutions, and then ended up with a hybrid solution involving a trigger, XE, and Query Store.

Connor O’Shea wrote an extended events session that filters query text based on its literal contents. I get a little nervous about that kind of thing, because they often miss weirdo situations (like synonyms and aliases), and they’re a little tricky to debug. For example, his first iteration also caught selects – something that would be terrible on a production system. It’s a good starting point though.

Erik Darling used temporal tables to capture the changes, and I especially love his security disclaimer. I don’t have any experience with login impersonation either – it’s funny, but my contract actually prohibits any security work whatsoever. I hate security work.

ChatGPT is helpful for tasks like this, even for a starting point if you don’t end up actually using the code. I asked ChatGPT to do this for me, and it came up with different results, but a good starting point nonetheless.

Your Next Query Challenge:
Getting Data Out of a Fancier Trigger

The above process works great if we’re willing to store both the before AND after values, but what if we’re dealing with a really high-throughput system with a ton of changes per second, and we want to avoid storing the “after” values each time? The table & triggers would look like this:
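Purely as an illustration (again with assumed names, not the post’s actual code), a before-only logging trigger might look something like this – the only real change is dropping the “after” columns from the insert:

```sql
/* Sketch only - assumed table & column names. */
CREATE OR ALTER TRIGGER dbo.Users_Audit_AboutMe_BeforeOnly
ON dbo.Users
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    /* Log only the "before" value - the "after" has to be
       reconstructed at query time from the next change or
       from the live Users row: */
    INSERT INTO dbo.Users_Changes
        (UserId, ChangeDate, LoginName, AppName, HostName, AboutMe_Before)
    SELECT d.Id, GETUTCDATE(), SUSER_SNAME(), APP_NAME(), HOST_NAME(), d.AboutMe
    FROM deleted d
    INNER JOIN inserted i ON d.Id = i.Id
    WHERE COALESCE(d.AboutMe, N'') <> COALESCE(i.AboutMe, N'');
END;
```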

Now, when we want to get data out to see the before & after values, it gets a little trickier. Say we run 3 update statements, and then check the value of the change table:

The resulting query only shows the before data, not the change that was actually applied during that update statement:

For example, notice that the last update above set the contents to Update Part 3, but that value doesn’t show in the screenshot. That’s your challenge: I want you to write a SELECT query that reproduces the full before & after data for each change, like the earlier query from the blog post:

Share your answer in the comments, and feel free to put your queries in a Github Gist, and include that link in your comments. After you’ve worked on it for a while, check out the answers discussion post.


SQL ConstantCare® Population Report: Spring 2024

The short story: SQL Server 2019 appears poised to swallow the SQL Server market altogether, hahaha.

The long story: ever wonder how fast people are adopting new versions of SQL Server, or what’s “normal” out there for SQL Server adoption rates? Let’s find out in the spring 2024 version of our SQL ConstantCare® population report.

Out of the thousands of monitored SQL Servers, SQL Server 2019 is now at 49% of the market! That’s the highest percentage we’ve seen for any version in the 3 years that we’ve been doing this analysis. Here’s how adoption is trending over time, with the most recent data at the right:

SQL Server 2019 is holding steady at about half the market, SQL Server 2022 keeps growing slowly, and the older versions continue to shrink:

  • SQL Server 2022: 7%, up from 5% last quarter
  • SQL Server 2019: 49%, holding pretty steady
  • SQL Server 2017: 15%, holding steady
  • SQL Server 2016: 18%, down from 22%
  • SQL Server 2014: 6%, steady – and goes out of support in just 2 months!
  • SQL Server 2012 & prior: 4%
  • Azure SQL DB and Managed Instances: 1%
  • New metric this month: 4% of the SQL Servers here are Amazon RDS. They’re included in the SQL Server 2019/2017/2016 metrics above, since they’re technically part of those crowds.

How big are typical SQL Servers?

Let’s group the market together roughly into quarters:

For a long time in the SQL Server space, people have used the term VLDB to denote a very large database, and we’ve usually marked 1TB as the town borders of VLDBville. Today, given that about 1/4 of all SQL Servers host 1TB or more, and given how fast modern storage is able to back up 1TB databases, that 1TB threshold is less meaningful.

That large number of small servers means the CPU distribution is also fairly small:

As is the memory target distribution:

Although if you slice & dice by data size, the CPU & memory numbers change quite a bit. The number of CPU cores people typically use for >3TB SQL Servers is very different than the core count for, say, 50-250GB servers. And now if you’ll excuse me, I gotta do a whole bunch of slicing and dicing, because I share that data with clients when we’re analyzing their SQL Servers during a SQL Critical Care®.


[Video] Office Hours: Ask Me Anything About Azure and Microsoft Databases

Videos
4 Comments

Back at home in the office, time to settle in with a nice caffeine-free Diet Coke and go through your top-voted questions from https://pollgab.com/room/brento. Why caffeine-free? Because I slug multiple coffees first thing in the morning when I wake up (usually around 3am-4am), and by the time I stream with y’all, I don’t need any more go juice.

Here’s what we covered:

  • 00:00 Start
  • 02:30 MyTeaGotCold: Has your opinion of Lock Pages in Memory changed over the past 10 years?
  • 03:48 MustangKirby: How can I check what data or pages are in cache? I woke up last night wondering if data I’m writing takes up cache memory.
  • 05:50 DBADoug: Why is SELECT INTO faster than INSERT INTO (of the same exact schema) with no indexes? When testing I notice about 9x more time on the “Table Insert” block in the plan) when the table is pre-defined.
  • 06:30 John R: Does SQL Server 2014+ have stored proc execution plans by user/spid? We saw 1 stored proc get slow for just 1 user (timing out after 30 sec, but 300ms for other users). Once stats were updated, stored proc was fast for that user again. I
  • 07:31 DemandingBrentsAttention: Does order of “include” index columns matter, like it does for key columns?
  • 08:40 Newtopg: Hi Brent , are there any tools You recommend for MSSQL to PostgreSQL migration ? Any open source tool that does both schema and data migration ? We are testing Pgloader and wanted your opinion on if there is any popular tool you recommend free or paid . Thank you
  • 09:36 GrumpyOldMan: I have a query that is causing a lot of blocking. SQL is recommending an index, which I agree should be used. Problem is, index already exists, exact index that SQL is recommending, but not being used. What’s up with this?
  • 10:35 bamdba: PowerShell for SQL Server? Love T-SQL & its effectiveness, but surprised by PowerShell’s popularity. Am I missing automation & scripting benefits?
  • 11:30 Sergio B: Can the tools sp_blitz and especially sp_BlitzWho capture the statements executed with sp_executesql?
  • 11:45 Chris Blain: I recently rebuilt indexes with compression. If I take a full backup, and logs, and restore to a brand new instance, will the restored database’s indexes have compression applied, or will I have to manually run the “alter index ” again on each one to apply data compression?
  • 12:35 Chris Blain: We have a database that is a single partition, some of the tables specify TEXTIMAGE_ON [PRIMARY]. this is stopping applying data_compression to the indexes. is it needed? if it was removed, could it cause issues for the underlying table, could I then apply index compression?
  • 13:13 SickieServer: Have you ever encountered databases (that maybe have been in service for decades) where the queries, while potentially fixable, are so numerous, and so bad, that the only practical solution is for the organisation to license more cores and buy more RAM?
  • 16:30 Gustav: What’s your opinion of the new regex support in Azure SQL DB? Will we see this flow down to canned SQL Server?
  • 18:33 I’m a potato ?: What’s different from Hong Kongs Databases to the rest of the world?
  • 23:06 Miles: Hi Brent,while tuning queries, which ones to be tuned first? is it high logical reads queries or high cpu queries or high duration queries? which one to be focused first?

Who’s Hiring in the Microsoft Data Platform Community? May 2024 Edition

Who’s Hiring
8 Comments

Is your company hiring for a database position as of May 2024? Do you wanna work with the kinds of people who read this blog? Let’s set up some rapid networking here.

If your company is hiring, leave a comment. The rules:

  • Your comment must include the job title, and either a link to the full job description, or the text of it. It doesn’t have to be a SQL Server DBA job, but it does have to be related to databases. (We get a pretty broad readership here – it can be any database.)
  • Include an email address to send resumes to, or a link to the application process. If I were you, I’d use an email address, so you know which applicants are readers here; they may be more qualified than the applicants you regularly get.
  • Please state the location and include REMOTE and/or VISA when that sort of candidate is welcome. When remote work is not an option, include ONSITE.
  • Please only post if you personally are part of the hiring company—no recruiting firms or job boards. Only one post per company. If it isn’t a household name, please explain what your company does.
  • Commenters: please don’t reply to job posts to complain about something. It’s off topic here.
  • Readers: please only email if you are personally interested in the job.

If your comment isn’t relevant or smells fishy, I’ll delete it. If you have questions about why your comment got deleted, or how to maximize the effectiveness of your comment, contact me.

Each month, I publish a new post in the Who’s Hiring category here so y’all can get the latest opportunities.


Query Exercise: Who’s Changing the Table?

Query Exercises
27 Comments

For this week’s Query Exercise, your challenge is to find out who keeps messing up the rows in the Users table.

Take any size version of the Stack Overflow database, and the Users table looks like this:

People are complaining that from time to time, the contents of the AboutMe column in some – but not all – of the Users rows are getting mangled. We’re not sure if it’s an app bug, an ETL problem, or someone goofing around in T-SQL, and we need you to find out.

At the same time, this is also a busy production database, and we want to minimize impacts to the end users. The site still needs to go fast.

For this exercise, I’m not expecting a specific “right” or “wrong” answer – instead, for this one, you’re probably going to have a good time sharing your answer in the comments, and comparing your answer to that of others. Feel free to put your queries in a Github Gist, and include that link in your comments. After you’ve worked on it for a while, check out the answers discussion post.


Join Me at the PASS Data Community Summit in November!

#SQLPass
4 Comments

On November 4 in Seattle, I’m presenting a new pre-conference workshop!

Tuning T-SQL for SQL Server 2019 and 2022

You’ve been working with SQL Server for a few years, and you’re comfortable writing queries and reading execution plans.

Your company is now using SQL Server 2019 or 2022 in production, and you want your queries to go as fast as possible. You want to know what changes to make to your existing queries in order to speed them up dramatically. You don’t want a “what’s new” session – you want practical information you can use to identify T-SQL that used to work fine in older versions, but now needs attention in newer versions.

In this 1-day session, Brent Ozar will use the same practical before-and-after techniques that he uses in his Query Challenges blog series in order to demonstrate what parts of your skills need to change as you modernize your databases.

You should be comfortable using SSMS to write multi-page queries, functions, and stored procedures. You should be comfortable identifying common query plan operators like index seeks & scans, key lookups, sorts, and parallelism, and comparing plans.

In this session, you’ll:

  • Learn what kinds of T-SQL should be rewritten to aim for batch mode
  • Understand the effects of cardinality estimation feedback & parallelism feedback, and how to improve them
  • Discover new monitoring potential in query plan DMVs to troubleshoot bad plans and blocking

Attendees will get one year free access to my Fundamentals of Query Tuning and Mastering Query Tuning classes, both of which will be updated with content from the pre-con.

Register here, use coupon code BRENTO24 to save $150 on your full conference registration, and see you in Seattle!