Blog

Office Hours Speed Round: Text Edition

Last Updated July 13, 2022

Not all of the questions you post at https://pollgab.com/room/brento are hard. Some of ’em can be answered in one line:

Q: accidentalDBA: Hi Brent, My friend is moving from two physical servers (production + failover) to VMs on the same host and SAN. His manager wants to keep log shipping as a DR solution. I advised against this as most disasters would affect both VMs equally. Am I missing something?

If both your servers are in the same physical location, you don’t have disaster recovery. You just have disaster.

Q: Efraim: When refactoring a clustered index for a multi-terabyte table, is it recommended to take the db out of full recovery and put it into simple recovery mode first? Concerned about maxing out the transaction log during the refactoring.

Will the business allow you to lose point-in-time recovery while this operation is happening? If no, then no.

Q: Doug E: Is Azure Data Studio ready for SSMS users to make the switch? If not, what functionality needs to be added to Azure Data Studio?

Why do you need to make the switch? Your answer to that question will determine whether Azure Data Studio has what you need in order to switch.

Q: Pramod A: On one of the servers, BPE is enabled and the SQL max memory is set to 4x the physical RAM. Does this make sense ? Are there any best practices for SQL max memory configuration when BPE is enabled specifically ?

I haven’t seen a good use case for BPE. Read this. I’m sure there *is* one somewhere, I just haven’t seen it.

Q: SaveTonight: You always say that once we move to the cloud we cannot move back to on-prem. Azure doesn’t let you manage your backups, but there must be a way to go back to on-prem from Azure SQL Database. Azure/AWS cannot steal your data, that would be a law infringement, right?

They didn’t steal it. You gave it to them.

Q: AlwaysLearningDBA: Hello Brent, a friend of mine : ) has been asked to implement TDE on two node AlwaysOn with Replication. He is starting with Failover Cluster with Replication for now.He knows that you don’t work with Replication.Google didn’t help much.Do you recommend any resources about this?

You’re asking me for good learning resources for things that you know I don’t use? Okay, uh, sure, try this. That video is everything I know about 2-node AGs with TDE and replication.

Q: Jack McCoy: Which job provides more satisfaction / income, development DBA vs production DBA?

For me, development DBA because I’m not on call, and I can provide a high value in a short period of time. Production DBA work is more grueling, involves a lot of on-call and weekend time, and isn’t as financially rewarding because you’re seen as the cost of doing business to keep the plumbing flowing.

Q: Anthony DiNozzo: What percentage of work time should DBA’s commit to continuing technical education?

Learning the right stuff pays off in higher future earnings. How much more would you like to earn in the future, as opposed to today? Figure out what you want to earn, and then learn what it would take to earn that. If your goals are minor and incremental, your current employer might sponsor it. If your goals are major and involve leaving your current employer, well, you’re gonna have to figure it the %$&* out.

Q: Joe Friday: Is clustered index fragmentation any more / less detrimental to query performance than non clustered index fragmentation?

Watch this.

Q: Pramod A: A table has 80 statistics ( Index + system ) on table row count of 160 million. To improve the performance of the query, if I have to pick few stats specifically and update those stats with full scan, how would I know which stats to pick in order to expect a performance gain ?

The ones the query is using for its cardinality estimation decision. We cover how to do that in my Fundamentals of Query Tuning and Mastering Query Tuning classes.

Q: Omer: Did DBA Brent have a outlook rule / folder strategy for classifying / triaging the various SQL errors encountered on a daily basis during their career? If so, what strategy was used?

As soon as you start putting alert emails into a folder, you’re screwed. You’re not looking at those folders. Go change the alert thresholds so you only get actionable emails you’d want to have in your inbox.

Q: Ned B: Is there any harm / benefit in using SQL page compression if the SAN is also compressing the data?

Compressed pages can help you can store more in memory and get faster throughput to storage since less 8KB pages go across the wire.

Q: DBAInHiding: If you’ve been employed as a staff employee for various companies over the years but feel your next gig as a DBA or developer will require you to be a 1099 contractor, what’s the essential checklist to prepare for such a move and what gotchas should you be on the lookout for?

I don’t have an easy answer for that one. I’d pick up a book on how to start freelancing.

Q: Anul: What is the ideal cluster and sector size for new SAN drives hosting SQL data files and transaction log files?

Ask your storage vendor.

Q: Kurt Wagner: What are the monitoring hurdles / challenges when moving bare metal SQL from onprem to Azure SQL? Do we throw away our current SQL monitoring software?

Not all of the same monitoring data is available, and when it’s available, it has to be captured in different ways.

Q: Hello Brent, my friend need some help to understand apply, cross apply and outer apply operator. He understand joining table and function using those, but not joining 2 table using cross or outer apply. Probably he is missing some basic understanding.

Pick up this book.

Q: Andrew: Hi Brent, On daily basis we keep our data on a SAN storage, so does the SQL servers, not so long ago we made a SQL RAS, and i asked the Microsoft Premier Field Enginer about the virtual disk separation for the SQL data files. What do you think about it?

What’s the problem you’re trying to solve?

Q: Raphra: Hi Brent, I thought I knew how a recursive CTE works until I saw the execution plan of a recursive CTE (that was to return the last 100 calendar years) and I saw a Nested Loops (inner join). Can you explain what a Nested Loop operator is doing there?

Not without seeing the plan, but I’m not sure why you think nested loops are bad. They can make perfect sense for some operations.

Q: Hany: Hello Brent, what is the best Azure video learning resources out there?

Read this.

Q: Bob: Why is an Int primary key on a date dimension recommended instead of a date? All tables that link to the dim keep their native type and can use built in date functions. This can also avoid joining the Date dim for simple queries.

Ask whoever recommended that to you.

Q: Quartz: Why does CAST(‘2022-06-13’ AS DATETIME) fail for SET LANGUAGE british but work for SET LANGUAGE us_english?

I didn’t know British was a language.

Q: DBABA (database administrator by accident): What are good alternatives to dacpac deployments? (Not having to write manual migration scripts but defining the target state of the database)

Check out Redgate Schema Compare.

Q: I Inherited It: I have a server that has a 14TB Data Warehouse database on it. For a couple of days each month, it get’s hammered, often by very poorly written queries and performance suffers. It never recovers, even when the traffic backs back down, performance remains bad until reboot. Why?

Analyze its wait stats like we talk about in Mastering Server Tuning.

Q: YourbiggestFan: Hi Brent, If you are asked to trbshoot perf issues on az paas database, how would you go abt finding the main bottleneck. Would you use sp_blitzfirst(log to table) or use query store to get reliable wait stats info or via any other way? (Account for DB scale down during offhours)

I’d log wait stats to table.

Q: SteveB: In a recent office hours you said the best way to upgrade an AG was with a Distributed AG. Is there a reason you would not add the new servers to the current AG and just failover to them? We done this and it worked great but I might have just gotten lucky.

Yeah, whenever I have new servers in an AG, I prefer testing their failover processes, doing cluster validation, etc before go-live, and that’s really dangerous when you have an existing AG.

Q: Yabba DB Doo: We have a web app in Azure that has individual databases for each of our customers. What’s the best way to manage DB changes during development and then deploy them out to multiple production databases?

See above.

Q: MikeO: My vendor friend is using sp_prepexec. When passing the query with the recompile option the plan shows using an index seek. When passing the same query without the recompile option the plans shows using an index scan. Is this a parameter sniffing issue or a quirk with sp_prepexec

I can’t tell without seeing the queries and plans.

Q: mr_Arturo: hi Brent, a friend of mine (: recently changed the job and “he” (: noticed that the new company use lots of views, but even stranger is that some of views are request data from other views! What about performance and should it be avoided request the view data of the view data?

Read this.

Q: Ozymandias: Is there a DB size where native SQL backup / restore becomes a concern?

Generally by the time you hit a terabyte, full restore times are tough.

Q: mukesh chaurasia: Dear i am new in sql administrator so i want to know about which book is best to guide about sql administrator real time problem.

Read this.

Q: Daniyyel: Is it ok to run perfmon directly on the SQL server?

No, try this DMV instead.

Q: Konstantin: How does Jorriss combat oxidative stress?

He’s made of carbon fiber, not metal.

Q: Odafin: Any noteworthy columnstore improvements in SQL2022?

Read this.

Q: Jethro: Is the OPTION Label hint ever helpful?

I’ve never used it.

Q: Experto no en la materia: What is the largest number of active databases you have seen hosted by a single SQL instance?

10-12,000.

Q: Arslan: When troubleshooting a SQL agent job step failure, is it better to “Include step output in history” or to output job step results to an Output file?

Which one is easier for you to parse when it’s a big long output? There’s your answer.

Q: David: Who is the “Brent Ozar” of the SSIS world?

Andy Leonard.

Q: SqlPadawan: Hi Brent! A part from grabbing parameters and good/bad plans (as you teach in MPS), how have you used Query Store during your 2-day emergency rescue? Any other practical uses in terms of performance tuning?

Not during my 2-day SQL Critical Care, no, but if I had a full time DBA job again, I’d use it.

Q: Dev on the Dark Side: What are the top 10 (or even 5) things that you can “teach” a development team to do / not do in regards to a mix of stored procs, entity framework, & CF ad hoc queries. So many problems… so little time…

If your developers can only learn 5-10 things, they’re doomed.

Q: Kunoichi: Hi Brent! My question is related to CDC : Why does MS SQL Server allows ONLY TWO capture instances on any given table at a given time?

For “why”, ask Microsoft.

Q: Steve E: Hi Brent, All of our SQL Servers are currently on premise but I feel I should gain some awareness of cloud services such as Azure Managed Instances or Azure SQL Databases. Is there a way I can do this as a personal user? Most of the plans seem to be 00’s of (British) pounds p/m

Read articles or watch videos rather than running your own server.

Q: Eric Beaumont: My friend read a blog that write once file shares are good at stopping potential ransomware from infecting sql backups. Have you ever implemented at write once file share for sql backups? What all is involved?

The DBA shouldn’t be the one implementing it. The sysadmin team should be. This kind of thing is usable across all servers, not just the database servers.

Q: DBA by addiction: What kind of malware / virus protection can or should be installed on a SQL-Server? Can you give some advice if things around this topic are pretty much senseless?

Whatever your security team requires.

Q: Have you used SQL Server with docker? If so, please elaborate.

Yes, in an early attempt to do SQL Server development on my Mac. It was a huge pain in the rear. I hated it. I can see why people would use it for continuous integration, but it’s not for me, especially with the new Apple Silicon processors.

Q: Hi Brent! A CLR function uses a temporary table. I don’t want to use forced parameterization at the database level at this time. In this case, how to deal with the bloated plan cache?

I’m not a fan of using CLR in SQL Server for a whole bunch of reasons, but if you insist on using it, then I’ll assume you can solve the problems you run into.

Q: Emelio: Does page level compression affect backup performance or size?

Read this.

Q: Fyodor: Do you have a File Group strategy you recommend when creating new SQL databases?

If it’s going to be 1TB or larger within 30 days, create a new filegroup with 4 files spread across 4 volumes, and make that the default filegroup.

Q: Does Basically Anything: Hi Brent, I’m having issues with the T-log not freeing space after log backups and growing a lot. There are no runaway queries, but there are 200+ sleeping SPIDs with is_implicit_transaction = 1. Are they the likely culprit? What troubleshooting path do you recommend? It’s ISV.

Read this.

Q: Kinneret: What is your opinion of SQL 2022 Buffer Pool Parallel Scan feature? Who will this benefit the most?

I haven’t looked into it at all, but the Microsoft post on it implies that it focuses on people with a lot of memory, say 512GB and above.

Q: Jayden: What are top gotcha’s you see when using implicit transactions in TSQL and C#?

Read this.

Q: Biz: Is there a good way to view cache hit / cache miss ratios for when SQL has to load data from slow disk vs fast memory?

By going back in a time machine to 1999 when that was how we did performance tuning. Today, use wait stats instead.

Q: Bart: Have you consulted on Microsoft Dynamics AX (2012) database query/index optimizations?

Yes.

Q: Lincoln: What bad things can happen when data warehouse and OLTP mixed workloads are run against the same SQL 2019 enterprise instance?

Slow performance.

Q: RoJo: We have Dev, Stage, Prod environments. Sometimes the config (for example) points to the Dev from Stage or worse. To isolate them from each other, I’m thinking white listing inside Windows Firewall. Is this too slow, or any best practice to keep collateral damage from near servers

I don’t think I even understand this question. It’s probably too big of a scope to ask for Office Hours – try dba.stackexchange.com.

Q: Henrique Almeida: Hi Brent, how are you, I hope so. Which free SSMS add-ins do you recommend and if not, why?

You hope so what? What the hell does that even mean?

I don’t use any because I can’t usually install apps on the client machines I work with. I’m sure there are good ones out there, but I just don’t have experience on what they’d be.

Q: Raphra: I like notebooks in Azure Data Studios and I can see how we can add comments to stored proc, by replacing the stored proc’s text with a notebook file (with markdowns for comments). However, do you know how I can call this notebook from SQL Agent instead of the stored proc?

I adore notebooks too. I wouldn’t use them as scheduled jobs though.

Q: Maani: Hi Brent! how come select from a view executed by ‘sp_executesql’ is much slower than executing it as a query (it took 4 mins to be executed)?I wrote option(recompile) at the end of the select clause from view, and the request was executed in less than a second.(was no parameter)

I would need to see the query and the execution plans to answer that.

Q: Neil: i have a job that uses OLE Automation to pull some json stuff. can i enable and disable OLE automation at the beginning and end of the job to prevent anyone else from using it ?

No, it’s completely impossible to enable and disable OLE automation, as shown by this link.

This Is Your Last Chance to Attend My Mastering Classes Live This Year.

Last Updated July 9, 2022

Brent Ozar

Conferences and Classes

You’ve conquered my Fundamentals classes, and you’re ready to take it up a notch.

You know you can get the recordings of my Mastering classes, but you prefer the interaction of live classes – the ability to ask questions live, have me go off-topic for demos, and block out your calendar to make sure you conquer the material.

You have one last chance for 2022:

Mastering Index Tuning, $695: Aug 1-3 in Europe-friendly times, Aug 3-5 in Americas-friendly times
Mastering Query Tuning, $695: Aug 29-31 Europe, Aug 31-Sept 2 Americas
Mastering Parameter Sniffing, $695: Sept 26-28 Europe, Sept 28-30 Americas
Mastering Server Tuning, $695: Oct 24-26 Europe, Oct 26-28 Americas

Or attend the whole set for $1,995.

Once this round of live online classes finishes, I won’t be teaching classes until at least mid-2023. I totally love doing it – but after the pandemic marathon of teaching weeks of online classes every month for years on end, I’m ready to take a break for a while! So if you wanna hang out live, come on down. See you in class!

Here’s What I’m Teaching at the 2022 PASS Summit.

Last Updated July 7, 2022

Brent Ozar

#SQLPass

The lineup for the PASS Data Community Summit is out, and in addition to my pre-conference workshop on Mastering Parameter Sniffing, I’m also teaching two sessions:

Everything You Know About Parallelism Is Wrong. Just because you see parallelism icons on execution plans doesn’t mean the query went parallel. SQL Server doesn’t load-balance work across CPU cores. CXPACKET isn’t a problem you fix by changing MAXDOP. Hell, even the term “MAXDOP” doesn’t mean what people think it means. It’s not that SQL Server’s parallelism is bad, it’s just not what you think it is.

Fragmentation Explained in 10 Minutes. Short, fast-paced lightning talks require difficult choices for presenters. If things go even the slightest bit off the rails, you are completely screwed. You have to viciously edit down your material to get the right points across, and no more than that. Then, you have to rehearse, rehearse, rehearse to get the timing right. The last lightning talk I did at PASS, I was wearing a Bob Dylan costume, a wig, and a harmonica. This year, no costumes – just aiming for ambitious delivery.

I’m really excited because I still think of the PASS Summit as the best-of-the-best, the place where the sharpest presenters show their top work. I wanna bring my A-game to this premier event, and I’m so psyched to share it with you.

See you in Seattle!

[Video] Office Hours Speed Round

Last Updated July 7, 2022

Brent Ozar

Videos

In ten minutes, let’s cover as many questions as possible from https://pollgab.com/room/brento:

Here’s what we raced through:

00:00 Introductions
00:17 dbacat: We’ve been fighting an AG issue for a while. Sometimes during failover, one or a few databases go into initializing/recovery state on the new secondary. It’s random, never consistent. Only fix is to drop DB from AG and add back. MS support was not helpful. Have you seen this?
00:38 Gaspard: Is performing RESTORE WITH VERIFY_ONLY enough to test backup health? If not, what could go wrong between this and a full database restore?
01:22 RoJo: It used to be common to have a dedicated IO channel for App, Data, Logs. Now that SAN is common can everything go to the same channel or would it be better to keep separate? Would it ever be better to keep SAN as one channel and others on say local RAID?
01:57 MyFianchetto: LPIM. Is it a must have after setting MAX SERVER MEMORY?
02:50 Steve McGarrett: Do you have any sizing guidelines / tips when migrating an onprem SQL 2019 Enterprise instance to Azure SQL?
03:37 Dave: Hi Brent, we have an on-prem sql server used as a DWH, my company is starting to move some of the workload to AWS, what is your take on Amazon Aurora as an alternative (we can change the application to support it)?
04:34 Anatoli: How do you know if your SQL server physical RAM is going bad?
04:55 Mirza: Hey Brent, you had mentioned that PLE metric is useless. However, I had a talk with someone from Microsoft and that person says that PLE is a good way to measure memory pressure. How do I convince him to stop paying attention to PLE? Thanks in advance.
05:34 Gerardo: sp_BlitzFirst is showing several hours of CMemThread waits on our 64 core SQL 2019 Enterprise CU16 instance. Trace flag 8048 did not help. How do we use sp_blitzcache to find the top queries causing this wait?
07:00 Hiroto Katagiri: What is your opinion of Azure Synapse and Snowflake cloud data ware houses?
07:50 Danny Reagan: What are your pros and cons for using SQL tables to implement queuing?
08:25 Levi: Is SQL page level restore a practical solution for dealing with data corruption? Have you seen it successfully used?
09:17 chandwich: Will we ever get a, “Watch Brent upgrade SQL Server”?
09:59 Bruno Luis: We use PARSENAME as a sneaky way to split strings in 2017 (eg IP addresses). However, we can’t create an index with it because it’s non-deterministic? But… WHY? Please, WHY?

Who’s Hiring in the Database Community? July 2022 Edition

Last Updated August 1, 2022

Brent Ozar

Who’s Hiring

Is your company hiring for a database position as of July 2022? Do you wanna work with the kinds of people who read this blog? Let’s set up some rapid networking here. If your company is hiring, leave a comment.

The rules:

Your comment must include the job title, and either a link to the full job description, or the text of it. It doesn’t have to be a SQL Server DBA job, but it does have to be related to databases. (We get a pretty broad readership here – it can be any database.)
An email address to send resumes, or a link to the application process – if I were you, I’d put an email address because you may want to know that applicants are readers here, because they might be more qualified than the applicants you regularly get.
Please state the location and include REMOTE and/or VISA when that sort of candidate is welcome. When remote work is not an option, include ONSITE.
Please only post if you personally are part of the hiring company—no recruiting firms or job boards. Only one post per company. If it isn’t a household name, please explain what your company does.
Commenters: please don’t reply to job posts to complain about something. It’s off topic here.
Readers: please only email if you are personally interested in the job.

If your comment isn’t relevant or smells fishy, I’ll delete it. If you have questions about why your comment got deleted, or how to maximize the effectiveness of your comment, contact me.

Each month, I publish a new post in the Who’s Hiring category here so y’all can get the latest opportunities.

Columnstore Indexes are Finally Sorted in SQL Server 2022.

Last Updated July 12, 2022

Brent Ozar

Columnstore Indexes

There’s a widespread misconception that SQL Server’s columnstore indexes are like an index on every column.

I debunk that myth in the first 30 minutes of my Fundamentals of Columnstore class, where I explain that a better way to think of them is that your table is broken up into groups of rows (1M rows or less per group), and in each group, there’s an index on every column.

So essentially, every column has a whole bunch of indexes on it.

But there’s no order whatsoever as to which rows end up in which index.

This isn’t a problem for relatively small tables, but as you get to billion-row data warehouse fact tables where columnstore should really shine, performance gradually degrades. In data warehouses, fact tables often have a commonly filtered column, like SaleDate. However, until SQL Server 2022, even if you wanted a small SaleDate range, your query would likely check hundreds or thousands of row groups, each of which had a huge range of data.

Here’s the problem before SQL Server 2022.

Take the Users table from the Stack Overflow database – it’s a bad candidate for clustered columnstore for reasons that we discuss in the class, but it’s the table most of y’all are familiar with, so we’ll start there. We’ll create a new table and put a clustered columnstore index on it:

DROP TABLE IF EXISTS [dbo].[Users_columnstore];
GO
CREATE TABLE [dbo].[Users_columnstore](
	[Id] [int] IDENTITY(1,1),
	[AboutMe] [nvarchar](4000) NULL,
	[Age] [int] NULL,
	[CreationDate] [datetime] NOT NULL,
	[DisplayName] [nvarchar](40) NOT NULL,
	[DownVotes] [int] NOT NULL,
	[EmailHash] [nvarchar](40) NULL,
	[LastAccessDate] [datetime] NOT NULL,
	[Location] [nvarchar](100) NULL,
	[Reputation] [int] NOT NULL,
	[UpVotes] [int] NOT NULL,
	[Views] [int] NOT NULL,
	[WebsiteUrl] [nvarchar](200) NULL,
	[AccountId] [int] NULL)
GO
CREATE CLUSTERED COLUMNSTORE INDEX CCI 
	ON dbo.Users_Columnstore;
GO
SET IDENTITY_INSERT dbo.Users_columnstore ON;
GO
INSERT INTO dbo.Users_columnstore([Id], [AboutMe], [Age], [CreationDate], [DisplayName], [DownVotes], 
		[EmailHash], [LastAccessDate], [Location], [Reputation], [UpVotes], [Views], [WebsiteUrl], [AccountId])
	SELECT [Id], LEFT([AboutMe], 4000), [Age], [CreationDate], [DisplayName], [DownVotes], 
		[EmailHash], [LastAccessDate], [Location], [Reputation], [UpVotes], [Views], [WebsiteUrl], [AccountId]
	FROM dbo.Users;
GO
SET IDENTITY_INSERT dbo.Users_columnstore OFF;
GO

DROP TABLE IF EXISTS [dbo].[Users_columnstore];

CREATE TABLE [dbo].[Users_columnstore](

[Id] [int] IDENTITY(1,1),

[AboutMe] [nvarchar](4000) NULL,

[Age] [int] NULL,

[CreationDate] [datetime] NOT NULL,

[DisplayName] [nvarchar](40) NOT NULL,

[DownVotes] [int] NOT NULL,

[EmailHash] [nvarchar](40) NULL,

[LastAccessDate] [datetime] NOT NULL,

[Location] [nvarchar](100) NULL,

[Reputation] [int] NOT NULL,

[UpVotes] [int] NOT NULL,

[Views] [int] NOT NULL,

[WebsiteUrl] [nvarchar](200) NULL,

[AccountId] [int] NULL)

CREATE CLUSTERED COLUMNSTORE INDEX CCI

ON dbo.Users_Columnstore;

SET IDENTITY_INSERT dbo.Users_columnstore ON;

INSERT INTO dbo.Users_columnstore([Id], [AboutMe], [Age], [CreationDate], [DisplayName], [DownVotes],

[EmailHash], [LastAccessDate], [Location], [Reputation], [UpVotes], [Views], [WebsiteUrl], [AccountId])

SELECT [Id], LEFT([AboutMe], 4000), [Age], [CreationDate], [DisplayName], [DownVotes],

[EmailHash], [LastAccessDate], [Location], [Reputation], [UpVotes], [Views], [WebsiteUrl], [AccountId]

FROM dbo.Users;

SET IDENTITY_INSERT dbo.Users_columnstore OFF;

After loading the data, visualize the columnstore contents with sp_BlitzIndex, and at first, it looks like the table is broken up into randomly sized rowgroups, sorted by their Id:

But that’s simply because our source, the Users table, happened to be sorted by Id. Over time, as you delete/update/insert rows in this table, you’re going to need to do index maintenance on it. (I know I talk a lot of smack about people who do too much index maintenance on rowstore tables, but columnstore tables are very different – they actually need maintenance for the reasons we discuss in the class.)

After a couple of rounds of index rebuilds, check the contents again:

ALTER TABLE dbo.Users_columnstore REBUILD;
ALTER TABLE dbo.Users_columnstore REBUILD;
sp_BlitzIndex @TableName = 'Users_columnstore';

ALTER TABLE dbo.Users_columnstore REBUILD;

sp_BlitzIndex @TableName = 'Users_columnstore';

And the rowgroups are literally sorted in random order:

Meaning that if you want to find Id = 26837, you’re going to be checking multiple rowgroups – and more and more over time, as your data’s locations continue to randomize. This isn’t such a big problem for a tiny table like the 1GB Users table – but it’s a terrible problem for real-life sized tables, like the ones that really need columnstore.

You could work around this –
emphasis on work.

You could reload the entire table regularly. It sounds extreme, but if the table isn’t large, and your hardware is fast enough to make a copy of the table sorted in the order where you want rowgroup elimination, it works. Copy the data into a new structure with a clustered index on that column, then create a clustered columnstore index again. The data will be ordered by the rowstore clustered index columns.

That solution doesn’t scale well for the kinds of shops that really need columnstore, though, because we’re talking about blowing the table back up to its full uncompressed size, and then compressing it all back down again – and all of this is a logged operation.

A better option is to partition the data by your commonly filtered column. I like this solution a lot at the billion-row tier because it also enables much better index maintenance strategies. I frequently point out that daily rowstore index maintenance is usually a waste of resources, even suggesting to scale back to monthly, but with columnstore indexes, things get a lot more complicated depending on your workload. We talk about that in class, too.

In theory, I’m a fan of partitioned views, too – using a table per year or per quarter – and then unioning them together. I just haven’t seen that solution implemented in the last several years though. I’m sure it’s out there – I just haven’t seen it lately.

SQL Server 2022 fixes this
with ordered columnstore indexes.

Hey, go figure, indexes need to be put in order! Who knew? I mean, aside from all of us. Literally, every one of us. Here’s the syntax from Books Online:

CREATE CLUSTERED COLUMNSTORE INDEX CCI
ON dbo.Users_columnstore ORDER (Id)
WITH (DROP_EXISTING = ON);

CREATE CLUSTERED COLUMNSTORE INDEX CCI

ON dbo.Users_columnstore ORDER (Id)

WITH (DROP_EXISTING = ON);

Rejoice! The data is now sorted by Id, so if you’re looking for a particular Id, SQL Server can narrow its search down to just one rowgroup:

Okay, I lied. It’s still not aligned. In fact, if anything, it’s even worse. You might argue that it’s because I didn’t specify a single-threaded index creation, but…those ranges overlap across lots of rowgroups, not just the 4 to represent the 4 cores in my VM.

When I try it again with MAXDOP 1:

That doesn’t work either, and even if it did, MAXDOP 1 isn’t really doable in real-world table sizes – even the 1GB Users table took over a minute to do the above work.

In theory, when the feature finally works, it’d be useful if many of your reporting queries share a common filter – like in data warehouses, a SaleDate column. Again, definitely doesn’t make sense for the Users table – it’s not a good fit for columnstore at all – but it does fix the problem of the data being randomly ordered between rowgroups.

When examining your own data and reporting queries to figure out which column(s) to order by, check out the guidance for Synapse Analytics, which already has this feature. And I assume it actually works out there. Although you know what they say about assume…

Update 1, 2022-07-05: the command actually executes in CTP 2.0 when you use the right syntax, as pointed out by Adam Machanic in the comments.

Update 2, 2022-07-05: no, it doesn’t work.

Based on suggestions by Joe Obbish in the comments, I spent hours today trying repeated testing with columnstore indexes on tables up to 100GB on a VM with 30GB RAM. I even tried with MAXDOP 1, and still no dice – they’re not sorted:

Note the overlap in CreationDate ranges. I give up – until Microsoft has a demo showing this feature actually works, I’m going to hold off on further testing.

Update 3, 2022-07-11: it kinda works.

Ryan Stonecipher (Microsoft) reached out to me and we talked through it.

The data is kinda sorted, and they’re referring to it as a “soft sort.” The sort order is maintained in-memory as the index is being built, but if the sort runs out of memory, the currently sorted data is flushed to the next operator in the index build plan. Joe Obbish reverse engineered this in this well-written blog post.

The benefit of this design decision is that it avoids spilling to TempDB – that’s good.

The drawback is that the rowgroups aren’t perfectly sorted. There are going to be overlaps between rowgroups. The idea is just that there’s going to be way less overlaps than there would be with a completely unsorted set of rowgroups.

I’m totally fine with this. At the end of the day, it’s way better than the ALTER INDEX REBUILD behavior of columnstore, except for the fact that it’s offline only. The docs just need to better reflect that “order” is a best effort thing, not the kind of exact stuff that we usually expect from databases. I kinda jokingly think of it as a MongoDB simulator.

[Video] Office Hours: Professional Development Questions

Last Updated June 29, 2022

Brent Ozar

Videos

Most of the questions y’all post at https://pollgab.com/room/brento are technical ones, but there were a handful of interesting professional development and career ones in the queue, so I cherry-picked those for today’s episode:

00:00 Introductions
00:20 GI Joe DBA: Have you ever refused a consulting project \ opportunity and why?
03:35 Doug E: How do you prefer building up your SQL vm’s in AWS? Terraform, docker, chef, etc?
04:46 Hany: Hello Brent, Who is “Brent Ozar” in the Azure world?
07:00 GI Joe DBA: What do you do when you’ve inherited a project and the business logic and institutional knowledge is overwhelming and it makes your “eyes glaze over”?
08:35 DBA_preparing_for_jobsearch: what is your opinion on interviewers asking SQL Server internals questions instead of practical day-to-day activity based questions?
10:26 Trushit: How would you handle clients/managers who want everything “ASAP”, especially their “ASAP” expectation is weeks and somethings months apart from my “ASAP”? Usually they come from business background and think that if something is easy to use, it must be easy to build.

Is Remote Work the New Normal for DBAs?

Last Updated June 28, 2022

Brent Ozar

Professional Development

An intriguing Office Hours question came in through my PollGab room.

Anatoli asked, “Is remote work the new normal for DBAs? What are the pros / cons?”

I happened to be teaching an online training class yesterday, so I asked my attendees where they were attending the class from.

The vast majority – 27 out of 30 students – were attending the class from home. Now, naturally the audience represents a biased sample, because not everyone’s employer is generous enough to pay for my training classes. It’s possible that this audience represents a luckier group than average.

Just out of curiosity – not for any scientific purpose – I also posted a poll on Twitter that specifically called for folks who are DBAs:

If your job title is Database Administrator, tell me where you work most of the time in this poll:

— Brent Ozar (@BrentO) June 28, 2022

I’ve been working remotely for almost two decades, so I don’t want to answer the pros & cons part – but I’ll leave it to you, dear reader, in the comments. What have been the pros and cons of doing DBA work from home?

Now is the Right Time for the 2022 Brent Ozar Unlimited Scholarship Applications.

Last Updated June 27, 2022

Brent Ozar

Company News

We normally do this in the fall, but given last week’s news, I bet you can understand why we’re doing this a little early, dear reader.

You work at a charity or non-profit, helping them make a difference with data, fights for the rights of the underrepresented, or cares for them.

Maybe you write reports to help fundraisers do a better job of raising money to protect voting rights. Or maybe you’re a developer at an organization who prevents suicide amongst lesbian, gay, bisexual, transgender, queer, and questioning youth. Or maybe you’re a sysadmin at an organization that provides reproductive health care, but you can’t get training because your organization desperately needs to devote all their financial resources to protecting women right now.

That’s where I come in. I wanna help.

I want to empower you to continue making a difference. My scholarship program is simple: recipients get a Level 2 Bundle, which includes all of my recorded training, SQL ConstantCare®, and the Consultant Toolkit.

To give you an idea of the kinds of organizations I’ve supported over the years:

The American Institute of Physics is committed to the preservation of physics for future generations, the success of physics students both in the classroom and professionally, and the promotion of a more scientifically literate society.
UNOPS helps people build better lives and countries achieve peace and sustainable development.
International Justice Mission – a global organization that protects the poor from violence in the developing world.
Elizabeth Glaser Pediatric AIDS Foundation – 400 children are infected with HIV every day. I don’t know how to type those words without crying and taking a break from the keyboard.
Mencap – improving the lives of UK people with a learning disability.
The Smith Family is an Australian charity helping disadvantaged children get the most out of their education so they can create better futures for themselves.
Easter Seals-Goodwill Northern Rocky Mountain serves children and adults with autism and other disabilities, plus disadvantaged families in Idaho, Montana, Utah, and Wyoming.

The fine print:

You must work for a foundation, non-profit, charity, or similar company that’s doing good work. (If you work for Ginormous Profitable Global Corporation, you’re not going to make the cut, so don’t waste my time or yours.)
Your organization or government rules must allow you to receive free training. (Some companies prohibit their employees from accepting gifts.)
You must already have a job working with SQL Server. (This isn’t about getting a new job.)

Apply now. Applications close July 4th, aka Independence Day in the United States. I’ll just leave that there. In the words of Theodore Roosevelt, do what you can, with what you have, where you are.

I’m Coming Back to Israel for the Data.TLV Summit!

Last Updated June 26, 2022

Brent Ozar

Conferences and Classes

The Data.TLV Summit on September 15th is a big free conference on data engineering, business intelligence, data analysis, and … beer.

I’m excited to announce that I’ll be back in Tel Aviv again for it, and this time I’m teaching a pre-conference workshop on the cloud.

Running SQL Server in AWS & Azure

You’re used to managing your own SQL Servers in VMs, but now they’re asking you to work with the cloud.

You’re already comfortable installing, backing up, and performance tuning “regular” SQL Servers, and you just need to catch up with what’s different about the cloud.

You don’t want hype – you just want honest opinions from someone who doesn’t work for a cloud vendor.

In this one-day in-person workshop before the Data.TLV Summit, you’ll learn:

How to assess your current workloads
How to pick the right VM type for your workload
How to size storage for performance, not size
How to assess wait stats in cloud VMs
How to back up directly to cloud-native file systems
Where Azure Managed Instances & AWS RDS make sense

I’m Brent Ozar, and I’ve been managing client VMs in the cloud for over a decade. I’ll get you up to speed with unbiased facts, not hype, and save you valuable time as your career transitions to the next level.

There are only 100 seats available – register now, and then grab your seat at the free Data.TLV Summit too. See you in Tel Aviv!

I Wish SQL Server Warned About Hard-Coded Estimates.

Last Updated June 23, 2022

Brent Ozar

Execution Plans

I wish we got a yellow bang on execution plans when SQL Server was making up an estimate out of thin air.

I’ll give you an example: if you compare two columns on the same table, looking to find rows where they’re equal, SQL Server has a hard-coded estimate that 10% of the rows will match.

I’ll query the Stack Overflow database’s Users table to find people whose DisplayName is the same as their Location. To give SQL Server a fighting chance at estimation, I’ll create indexes on both columns, in both orders, just to preemptively head off the folks in the comments section who will suggest it’s a stats problem:

CREATE INDEX DisplayName_Location
	ON dbo.Users(DisplayName, Location);
CREATE INDEX Location_DisplayName
	ON dbo.Users(Location, DisplayName);
GO
SELECT COUNT(*)
FROM dbo.Users
WHERE DisplayName = Location;

CREATE INDEX DisplayName_Location

ON dbo.Users(DisplayName, Location);

CREATE INDEX Location_DisplayName

ON dbo.Users(Location, DisplayName);

SELECT COUNT(*)

FROM dbo.Users

WHERE DisplayName = Location;

Look at the far right operator of the actual execution plan and compare estimated versus actual rows:

SQL Server brought back 243 rows of an estimated 891,751:

That 891,751 sounds awfully scientific, doesn’t it? Like SQL Server put some serious thought into it? Well, that idealistic dream is shattered when you discover that the full table size just so happens to be 8,917,507 rows:

SQL Server’s estimating that exactly 10% of the rows will have a matching DisplayName and Location. That’s a ridiculous, completely made-up number that has no bearing on reality.

This is hard to spot
even in simple queries.

Let’s say you wanted to find those users with matching DisplayNames & Locations, and find the top 100 reputations amongst them:

SET STATISTICS IO ON;

SELECT TOP 100 u.Reputation, u.DisplayName, u.Location, u.Id
FROM dbo.Users u
WHERE u.DisplayName = u.Location
ORDER BY u.Reputation DESC;

SET STATISTICS IO ON;

SELECT TOP 100 u.Reputation, u.DisplayName, u.Location, u.Id

FROM dbo.Users u

WHERE u.DisplayName = u.Location

ORDER BY u.Reputation DESC;

Let’s also say your Users table has these three indexes:

CREATE INDEX DisplayName_Location
	ON dbo.Users(DisplayName, Location);
CREATE INDEX Location_DisplayName
	ON dbo.Users(Location, DisplayName);
CREATE INDEX Reputation
	ON dbo.Users(Reputation);

CREATE INDEX DisplayName_Location

ON dbo.Users(DisplayName, Location);

CREATE INDEX Location_DisplayName

ON dbo.Users(Location, DisplayName);

CREATE INDEX Reputation

ON dbo.Users(Reputation);

Note that none of those three indexes cover the query – because the query needs both DisplayName & Location, AND Reputation.

One approach SQL Server could use would be to scan the clustered index, like this:

That approach does 141,970 logical reads:

Or, because so few users match, SQL Server could scan the DisplayName_Location index, make a list of the 243 users with matching values, and then only do key lookups for those 243 users, like this:

That version of the execution plan does just 49,633 logical reads.

But SQL Server doesn’t do either of those. Let’s take the index hints off and watch the smoke rise:

SQL Server thinks, “Hey, about 10% of the rows have matching DisplayNames and Locations! I’ll use the Reputation index, and I’ll scan it from top to bottom. For each high-ranking User, I’ll go do a key lookup, and I won’t have to look at too many rows before I find 100 that have matching DisplayNames and Locations! Hell, I don’t even have to allocate multiple CPU cores – I can do all this single-threaded because it’s so easy.”

Of course, all of that is wrong, and it leads to:

That’s 9 million reads on an object that only has 140K pages.

So what went wrong? Good luck finding out if you don’t know about the secret 10% estimate. There’s no yellow bang anywhere on this plan, but worse, there’s no indication that the whole plan is based on that wild guess. On a plan like this, most folks start their performance tuning journey by examining the Reputation index trying to figure out why the stats on there are wrong – when in reality, the issue’s something completely different.

And in complex real-world queries with multiple joins? Forget it. This kind of thing is completely stealthy in execution plans.

I wish SQL Server would warn folks about that.

But I know wishes don’t get us anywhere, so I’ll be a good blogger and I’ll finish this blog post with a call to action to upvote a feature request at feedback.azure.com. I’ll just go log in and…

Oh God, my mortal enemy. Let’s try one, and:

Uh, yes, I know it’s taken – it’s taken by me, and I’m trying to log in, and I have the same problem with the other one, too.

It’s almost as if Microsoft doesn’t want to hear feedback from their customers.

[Video] Office Hours: Ask Me Anything About SQL Server

Last Updated June 21, 2022

Brent Ozar

Videos

You posted questions at https://pollgab.com/room/brento, upvoted the ones you’d like to see me cover, and I did the needful:

Here’s what we covered:

00:00 Introductions
01:19 chandwich: Hi Brent. I’ve been unsuccessful in my attempt to convince my employer I need formal SQL training. I help manage hundreds of customers’ SQL Servers and Applications, but they refuse to pay for my training. Should I leave? Should I just pay for it myself? Any advice?
03:45 Mehdi: Hi Brent! Do you recommend using parallel hint?
04:16 DBAInHiding: You’ve mentioned often that there’s usually only one person in the DBA role for a given company (larger corporations may have teams instead). If you’re a solo DBA, how can you take a restful vacation or any time off when you need it most?
06:07: DBA_Willing_to_learn: Hi Brent, Hope you are doing well. I would be interested to know about one feature in SQL server 2022 which would interest you the most and why ?. Thank you in advance for answering my question.
07:16 Neil: Hi Brent. We’re a growing SASS provider. We have no dedicated DBA resource, but recognize this is something that is required. Any recommendations about what kind of things we should be looking for?
08:25 Sean C: Do you have any “petty” or “ridiculous” hills that you will die on regarding anything in SQL or SQLServer?
11:54 Mark E: When you were a DBA, what were your top distractors and time wasters?
12:42 Jim: We have a stored procedure running on SqlAzure that brings back a list of hotels for a given city. The first time it executes against a particular city it is very slow 14-20 seconds. Subsequent runs querying the same city are fast 2-5 seconds. Having trouble figuring out why.
14:08 Carrie Jeffries: Discontinued database engine functionality in SQL Server Standard going from 2014 – 2019?
15:21 Juan: Do you have any common gotcha’s to avoid when adding a computed column to a table?
16:46 Yitzhak: Do you have a suggested way to see how much space a long running active transaction is currently contributing to the transaction log? Worried about maxing out transaction log.
20:44 seems_like_avi: What are the downsides of using TVFs from a performance perspective? It seems like they can reduce code duplication and make a code base much easier to maintain and expand. When would you recommend using TVFs vs not using them?
21:54 Lenny: How do you decide whether to start with query tuning or index tuning on SQL Server with occasional performance issues?
23:23 MikeNM: Regarding your post about SCOM the other day. Can you recommend any decent guides to setting up SCOM for SQL? I have been both SCOM admin and now a DBA and trying to convince my peers that moving away from SQL e-mail alerts isn’t more trouble than it’s worth.
24:34 Maksim: What are the pros / cons of doing encryption using SQL TDE vs using SAN level encryption?
26:30 SQLPadawan: Hi Brent, I just read your “Data Model for Gender” post…(Sorry if this is the wrong place to bring this up) You still have a follower around here. I just will continue to read your blog and go thru your training. You’re the best SQL Server teacher and career model 4 all of us.
27:05 RoJo: Want to upgrade from sql2016 with Sync-AG to latest version. What is best method to upgrade, for least downtime and stability (and AG issues) ? Any gotchas?
28:15 IWantToBeYouWhenIGrowUp: I’m interested in switching to a MacBookPro as my primary development/work laptop but have some reservations with a major platform shift. How is the Mac laptop performing? Are you using the M1 version or Intel? What VM manager/container are you using to host locally?
30:05 Lenny: Who is the Brent Ozar for all things related to SQL Server CI/CD? 30:036 Anatoli: What are the tell tale signs that your business has outgrown log shipping? Is AG the next step up for SQL HADR?
31:39 Zeratul: Is it therefore wise to boost CTFP from the default value of 5 to something like 50?
32:47 missing the updates thoughts: hi brent! my friend was wondering why you don’t post blog-style posts on sqlserverupdates.com anymore with every new CU like you used to. miss reading your thoughts every CU!
33:55 Jimin: Do you see any new SQL 2022 box checking features that Microsoft included for competition reasons?
35:38 Gomer: Would DBA Brent ever leave a higher paying job for a lower paying job?

Pour One Out for Distributed Replay. It’s Deprecated in SQL Server 2022.

Last Updated October 19, 2022

Brent Ozar

SQL Server 2022

I love how Microsoft treats deprecated and discontinued features in SQL Server.

No, seriously. I give Microsoft a lot of sarcasm and lip around here, but Microsoft takes serious care to make it easy to upgrade versions without worrying about your application breaking. If you disagree, hear me out for a minute.

“Discontinued” means it’s dead and removed.

The Books Online page for discontinued database engine functionality is pretty doggone short. There have only been a few things deprecated in the last several years:

Big Data Clusters
PolyBase scale-out groups
A few database-scoped configuration options
And a few others you’re not gonna miss, like the 32-bit version of SQL Server

This is fantastic. If you’ve written a new app in the last 10 years and it used SQL Server as a back end, odds are every query you wrote is still going to compile and execute today. Contrast that with the nightmare that developers have to deal with around .NET and .NET Core, and the difference is night and day. SQL Server queries just work, and they’ve worked the same for decades, with almost all additive changes only.

Deprecated means alive,
but it’s walking dead.

The Books Online page for SQL Server 2022’s deprecated features is tomorrow’s obituaries. These are features Microsoft has publicly walked away from, and while the features are still in the product, they’re not getting any love, and they might be removed at any time.

For SQL Server 2022, Microsoft deprecated Distributed Replay.

The idea behind the feature was that you’d capture a trace against your production environment, set up another environment for load testing or QA testing, and then replay that exact same workload against it. You’d be able to measure which queries got better or worse, and how.

The reality was a complete mess. It was a giant pain in the rear to set up and use, to the point where I got frustrated with it within a few hours and asked my peers about their experiences with it. I got back a string of four-letter words – everybody really struggled to get it across the finish line. Over subsequent versions, Microsoft made token efforts to improve it, but never really gave it the love it required.

The writing was on the wall when Distributed Replay didn’t support collecting a trace from SQL Server 2019.

Now, the writing’s getting carved into stone. My guess is that even in SQL Server 2022, Microsoft still won’t support gathering data from SQL Server 2019, which means this thing’s already dead. Sure, you can technically use it with SQL Server 2017, but…

So what do you use instead?

Me personally, I don’t believe that capturing a production workload trace has ever been a good long-term idea. I’ve written about the problems with database load testing before, and my opinion still stands: it’s a really, really hard problem, much harder than it looks at first glance.

If you do wanna try it, check out Gianluca Sartori’s open source WorkloadTools.

Otherwise, you need application-level load testing instead. Something needs to call the app’s APIs, generate the relevant workload, and get the app to send in the appropriate queries. Plus, then you’re testing the entire stack – not just the database server.

New Buying Option: Lifetime Access to Class Recordings

Last Updated June 21, 2022

Brent Ozar

Company News

Wanna buy my Recorded Class Season Pass: Fundamentals or Mastering, but you don’t want an annual subscription?

Now you can buy either of those bundles, pay one price, and get permanent access to the classes without worrying about recurring fees.

The price is the same as if you’d have had the subscription for 2 years.

So you can kinda think of it as a gamble: if you can finish the classes in 1 year, you’re best off buying a subscription, but then canceling before the year is over.

Or, if you’re busy, or if you think you want to revisit the content over and over, you can buy the Lifetime Access version without worrying about when your content will expire.

I will say this: every month, I get emails from students who beg for just one more month of access because they postponed all year long, and then realized their subscription was almost up. I know how it goes: work is a zoo!

Update: I’ve gotten a lot of questions about buying the Level 2 Bundle for a lifetime. That isn’t an option at this time – that bundle includes software like SQL ConstantCare and the Consultant Toolkit. That software & online service takes an ongoing investment to maintain, so we don’t do lifetime pricing on that – just the class recordings.

[Video] Office Hours: Palm Springs, California Edition

Last Updated June 1, 2022

Brent Ozar

Videos

On a road trip from San Diego to Vegas, I stopped in Palm Springs and answered your highly-upvoted questions from https://pollgab.com/room/brento.

Here’s what we covered:

00:00 Introductions
00:45 i_use_lowercase_for_select: Hi Brent, just wanted to give you my support on your May 24th blog.
01:19 SQL Crooner: Who would win a SQL trivia contest between Brent Ozar, Pinal Dave, Paul Randal, and Erik Darling?
03:13 Brandon F: Does SQL 2022 have any “killer” performance features that make it a must have upgrade from SQL 2019?
05:40 YouGottaDoWhatYouGottaDo: Hi Brent, do you know (or do you have any idea) if MS use telemetry to do statistics on what SQL Server features are the most/less used and appreciated by users, in order to model or deprecate features in new versions?
06:49 Too Much Spare Time: How much time if any did you/do you spend running sp_helptext on system procs/views/functions, etc (or is it just me that likes to nose about in these…?) and wondering about the souls who wrote it all? Most interesting thing you’ve seen in a system object?
08:11 Jacob H: Hi Brent, you mentioned previously that you recommend turning off adaptive memory grants and scalar function in-lining in SQL Server 2019. Can you elaborate more on the problems you have found with those features?
09:29 Have you tried turning it off and on: We want to implement RCSI on a (10k batches/sec / 4.5TB / 7K session) server to help reduce blocking. I understand the additional tempdb requirements, however what QA do you suggest to make sure the output to queries wont be affected? Any books/videos to consult?
11:34 Keller: I have a query with inner join and 2 more columns in the WHERE clause.Join columns are not primary key.What is the best way to create index on both tables.
12:53 Edward Anil Joseph: The auto-numbering for identity column got inserted late. The next number was inserted before the actual number should be inserted. Example: 1 was inserted at 2022-05-31 00:00:01.009 and 2 was inserted at 2022-05-31 00:00:01.001 How is this possible? Something to do with disk?
15:30 Preben: In your training classes do you touch the subject of using apply instead of joining to force a specific query plan? If yes, what training.
18:53 Wrap-up and discussing my Porsche 944 Turbo

Office Hours Speed Round, Text Edition

Last Updated June 12, 2022

Brent Ozar

Production Database Administration

Not all of the questions y’all post at https://pollgab.com/room/brento are hard – some of ’em can be answered in just a line or two:

Q: Hany: Hello Brent, Who’s “Brent Ozar” in the Azure world?

Microsoft, and I wrote why here.

Q: Neutron Jack: Do you see any concerning TSQL deprecations in SQL 2022?

No, neither deprecated nor discontinued.

Q: MuleDonkey43: Have scalar functions improved much in SQL 2022?

No.

Q: Looking For An Anti-Histagram: Our databases are many years old and over time have a lot of auto generated stats on the tables. I’m sure many are not used, is there an easy way to tell?

Not accurately, no.

Q: Yakira: Is there good way to identify the top 10 SQL queries using the most worker threads?

What’s the problem you’re trying to solve? If you’re working on THREADPOOL, I have a training class on that.

Q: Youssef: Do you have any recommended books for learning the XML behind a SQL query plan?

The XML, no.

Q: Ronaldo: Since SQL monitoring is broken in latest SQL 2022 CTP, do you foresee Microsoft kicking current monitoring vendors to the curb and creating their own commercial monitoring software?

You mean Microsoft System Center?

Q: Juan: Do you have any tips / precautions when using a table as a queue with RCSI?

Not RCSI specifically, but read this.

Q: Mashrur: Hello Brent, I completed your column store index training recently and it blew my mind. On that training you mentioned you only scratch the surface of table partitioning. You please kind enough to provide some advance reference (blog, YouTube, Paid) regarding table partitioning.

Thanks, glad you liked it! Sure, here you go.

Q: Kagamine Len: What are your thoughts about requiring a formal retention policy for all new tables created on production? Must have, nice to have, optional?

That’s determined by your company’s compliance department.

Q: Enrique: What are your thoughts on the new Json enhancements for SQL 2022?

I think it’s dumb to spend $7,000 USD per CPU core to use SQL Server as a file server. Microsoft thinks it’s a great idea. Go figure.

Q: Kol Dar: Is unbalanced parallelism a mainstream problem that SQL DBA’s should be regularly on the lookout for?

No. Use the methodology I teach you in the first module of Mastering Server Tuning.

Q: Joe: Good day to you Brent! The hardware provider allocated and presented 196 GB to our SQL 2016 Standard VM. After reading your BOU Weekly Links, June 6th Edition email it got me thinking. Would there be any benefit to setting the Max Memory above the 128 GB limit?

I don’t have experience with that. If you regularly need over 128GB RAM, you probably need Enterprise.

Q: Shlumiel: Are include fields part of your 5 x 5 suggestion for NC indexes?

Yes.

Q: Dominique B: Hi Brent, wanted to start by “I’m a big fan of your work” but that may sound a bit cheesy 😉 I’m reading about the memory optimize tempdb and want to validate my understanding… This feature will make the old “rule” for the number of tempdb data file an old story isn’t it ?

No, and I explain why in this module of my Fundamentals of TempDB class.

Q: Eric Swiggum: There was a recent infrastructure outage, once resolved a SQL transactional replication was not applying changes, however we were not alerted. Our Undistributed Commands alert failed us too. Should I monitor “Not Running” statuses for these publications too?

Every now and then, on live webcasts, a question makes me lose my mind. I rant and rave and throw things. And I realize, as I write this, that I’ve never actually done that in text format.

So here it comes, Eric.

(Deep breath)

Should you monitor something that caused an outage?

Is that what you’re asking me?

Nah.

Why bother? Screw it. Who cares? It’s not like outages matter, or that it’s your job. Forget monitoring. Just let the outages keep happening, and let your phone keep ringing. Life is meaningless. Death comes for everyone. Eat Arby’s.

…

And that concludes another episode of Office Hours.

[Video] Office Hours: Vail Fireside Chat

Last Updated June 9, 2022

Brent Ozar

Videos

You posted and upvoted questions at https://pollgab.com/room/brento, and I sat down by the fire on a chilly evening in Vail, Colorado to answer ’em. I was driving cross-country, heading back from a road trip to see my mom.

Here’s what we covered:

00:00 Introductions
00:46 Ms. Cosmos: What are the top traits for the perfect DBA?
02:09 Boris Karloff: What is your top SQL audit horror story?
05:31 Midwest DBA: My friend is assisting with designing a new Data Warehouse solution for the BI team at his company. They really want HA and DR for their DW. I’ve never managed a DW that had HA/DR. What issues could occur if with using FCI for HA and Log shipping for DR, or AG for HA?
07:58 i_use_lowercase_for_select: Hi Brent, I just watched Bob Ward’s pitch of SQL2022. Does compat. level 160 really solve the parameter sniffing problem as suggested by Microsoft?
09:59 Jonas: Did your parents instill you with the desire to teach? Were they teachers?
11:22 StatisticsRules: Hi Brent! What are you most excited about in the coming release of SQL Server 2022? Thanks for your epic contributions to the community!
13:05 Jim: What is your opinion of DOP feedback in SQL 2022? Will this get shops over the finish line?
15:15 VegasDBA: How are you enjoying living in Las Vegas? Any off the beaten path places you would recommend? I’ve been here for 20+ years and love it.
16:17 Don’t Bother Andy: Is CU16 ready for prime time?
18:00 MergeItLikeItsHot: Hi Brent, we have some very sensitive tables that we work on and before we update any value on that table we backup the entire table for fast recovery, do you recommend using temporal tables instead?
19:26 SQLForTheWin: What is the best way to manage index deployments across multiple RDS instances? We have an application deployed in three regions and we need to deploy the same indexes across the instances.
20:30 TurnerBurn: I have a table that contains no primary key but has a non-unique clustered index containing 3 columns and, of course, the 4-byte clustering key SQL tacks on for uniqueness. Is there an advantage with this design or would an id column PK and non-clustered indexes be better?
21:39 Jason Burton: How do you stay current with such a variety of skillsets?
25:16 Beaker: Are DBA’s typically disliked by their co-workers? Should we care?

[Video] Office Hours in Moab, Utah

Last Updated May 25, 2022

Brent Ozar

Videos

On a cross-country trip, I stopped in Moab, Utah and answered your top-voted questions from https://pollgab.com/room/brento.

Here’s what we covered:

00:00 Introductions
01:20 YourbiggestFan: Do you have any plans to write an Azure first version of the first responder kit (sp_blitzfirst and sp_bitzcache)which can get the metrics from query store and thus overcome the issue of dmv stats resets on Azure platform? On PAAS platforms need a reliable method to analyse perf.
03:06 T.C.: What are the most under utilized features for SQL monitoring software?
04:24 Drew Furgiuele: What’s the most common “complaint” you hear from DBAs these days? Is it a lack of funding for good hardware/higher cloud performance tiers? Training budgets? Developers?
07:04 Zee: Hi Brent, How can a ORM developer (hibernate, spring) get better at sending queries to databases. And how a DBA can convince ORM developers that a 15k characters query with 29k output every 30k times is not a very good thing and can be done better in a stored procedure.
10:01 Yousef: What are the top unforgiveable DBA mistakes?
11:25 Ryan: Where do you see the DBA profession going in the next 5-10-15 years?
15:35 Hany Helmy: Hello Brent, my friend is searching for your article on “Your first day as a DBA in a new company”, couldn’t find it can you help him?
16:05 Dave Dustin: Of all the database project CI/CD pipelines you’ve seen in your career, do you have any recommendations or tips?
17:16 Does Basically Anything: I know your standard recommendation for storing BlitzFirst outputs are every 15 minutes, retaining for 7 days. What would be the maximum you’d recommend if a friend were interested in retaining more, or collecting all the data at more frequent intervals? (Using ALL output tables)
17:52 Harika: Do you have have an opinion for when the SQL Log backup job should be temporarily stopped on a busy SQL2019 OLTP server (i.e. index maintenance, full backups, diff backups, etc)?
18:52 Gülnaz: What post update sanity checks do you like to perform after applying a SQL cumulative update?
19:58 Midwest DBA: Have you ever worked with a Very large company? Target, Walmart, GMC, etc. Are their DBA team’s full of highly skilled DBAs, or are they much like everywhere else ,i.e. one expert and few regular folks?
22:11 Pessel: Who is the community’s “Brent Ozar” for all things SQL Security related?
22:51 NullPointer: My friend wants to know the best way to get some type of high availability (willing to have 30m-1hr of downtime). They have 1 database per client (sitting at 50 dbs but will grow) and use std edition with no dedicated DBAs (just devs). Suggestions (always on, log shipping, etc.)?
23:20 Alec Roques: Upon checking a table with sp_BlitzIndex, do you ever drop indexes with a low amount of reads to a high amount of writes (for example, Reads: 80 (56 seek 24 scan) Writes: 670,011)? How do you make that determination? Or will you wait until you see locking and blocking problems?
24:05 Preben: Are you open to visit belgium?
25:06 Latka: How do you determine the optimal time interval for backing up the transaction log, (30, 15, 10, 5 minutes, etc) if transaction log size / growth is primary concern ?
26:55 MancDBA: Non-SQL question – You have talked previously about some of your awesome cars (Helmut, Ferrari etc). What do you use as a daily driver? Cheers!

SQL Server 2022 Tells You Why A Query Can’t Go Parallel.

Last Updated June 7, 2022

Brent Ozar

Execution Plans, SQL Server 2022

Until 2022, when a query couldn’t go parallel, all we got was a really cryptic note in the execution plan properties saying NonParallelPlanReason = CouldNotGenerateValidParallelPlan.

But starting with SQL Server 2022, even when I’m running under older compatibility levels:

ALTER DATABASE CURRENT SET COMPATIBILITY_LEVEL = 140; /* 2017 */
GO
DECLARE @TableVariable TABLE(Total BIGINT);
INSERT INTO @TableVariable (Total)
	SELECT COUNT(*) FROM dbo.Votes;

ALTER DATABASE CURRENT SET COMPATIBILITY_LEVEL = 140; /* 2017 */

DECLARE @TableVariable TABLE(Total BIGINT);

INSERT INTO @TableVariable (Total)

SELECT COUNT(*) FROM dbo.Votes;

The execution plan gives me way more details:

Awww yeah! Here’s another example using a scalar user-defined function:

CREATE OR ALTER FUNCTION dbo.RightHereRightNow ( @Meaningless INT )
RETURNS DATETIME2
    WITH RETURNS NULL ON NULL INPUT,
         SCHEMABINDING
AS
    BEGIN
        DECLARE @Current DATETIME2;
        SELECT @Current = GETDATE();
        RETURN @Current;
    END;
GO
SELECT COUNT(*) FROM dbo.Votes
	WHERE dbo.RightHereRightNow(0) > '1900-01-01';

CREATE OR ALTER FUNCTION dbo.RightHereRightNow ( @Meaningless INT )

RETURNS DATETIME2

WITH RETURNS NULL ON NULL INPUT,

SCHEMABINDING

BEGIN

DECLARE @Current DATETIME2;

SELECT @Current = GETDATE();

RETURN @Current;

END;

SELECT COUNT(*) FROM dbo.Votes

WHERE dbo.RightHereRightNow(0) > '1900-01-01';

The actual plan now clearly explains that our query can’t go parallel because of the scalar function:

If our query uses two parallelism blockers at once:

DECLARE @TableVariable TABLE(Total BIGINT);
INSERT INTO @TableVariable (Total)
	SELECT COUNT(*) FROM dbo.Votes
	WHERE dbo.RightHereRightNow(0) > '1900-01-01';

DECLARE @TableVariable TABLE(Total BIGINT);

INSERT INTO @TableVariable (Total)

SELECT COUNT(*) FROM dbo.Votes

WHERE dbo.RightHereRightNow(0) > '1900-01-01';

The actual plan only shows one of the two reasons:

And the XML doesn’t show both reasons, either.

Still, that’s a really small complaint – at least SQL Server 2022 shows ANY of the reasons, which starts you down the road of performance tuning this query. I’ll take it!

SQL Server 2022 doesn’t remove those parallelism blockers, though.

If I change the compatibility level to 160 (2022), the table variable still goes single-threaded, as does the scalar function:

ALTER DATABASE CURRENT SET COMPATIBILITY_LEVEL = 160; /* 2022 */
GO
SELECT COUNT(*) FROM dbo.Votes
	WHERE dbo.RightHereRightNow(0) > '1900-01-01';

ALTER DATABASE CURRENT SET COMPATIBILITY_LEVEL = 160; /* 2022 */

SELECT COUNT(*) FROM dbo.Votes

WHERE dbo.RightHereRightNow(0) > '1900-01-01';

The execution plan still plays the sad trombone:

Because it doesn’t look like SQL Server 2022 is fixing the rampant issues with scalar function inlining.

[Video] Office Hours: Professional Development Edition

Last Updated May 31, 2022

Brent Ozar

Videos

I spent an entire video just talking about 3 questions y’all posted to https://pollgab.com/room/brento because these were pretty big-picture and important.

I know it sounds goofy and sentimental, but I genuinely care about y’all’s careers, and I want you to work as little as practical in order to spend as much time with your loved ones as possible. So here we go:

What we covered:

Kinneret: When are work communications appropriate for a chat channel such as Slack? When should those communications take place via direct audible communications (phone, webex, etc)?
WhatsUpDocs: Hi Brent, have you ever needed to look at business documentation (check business rules/logic) when consulting or as an employee, but it was severely lacking? Recently joined a different team in work and trying to find simple answers to questions is an uphill struggle…
Roy: Hi Brent, How did you manged to upskill yourself early in your career with a busy full time job? What recommendations will you give to somebody early in his DBA career?