Blog

Thoughts About Stack Overflow’s Annual Developer Survey

Last Updated July 24, 2024

Every year, Stack Overflow runs a developer survey about technology, work, community, and more. This year’s results include 65,437 responses from developers around the world.

The results are biased towards the kinds of developers who use Stack Overflow – 76% of the respondents reported that they have a Stack Overflow account. I would guess that it’s nowhere near a perfect picture of all developers worldwide, but let’s just focus on the fact that it does represent how over 65,000 developers feel – and that alone is useful enough to give you a picture of what’s happening in at least a lot of shops worldwide.

The most-used databases were PostgreSQL, MySQL, SQLite, and Microsoft SQL Server, in that order:

When Jeff Atwood and Joel Spolsky first started Stack Overflow, Jeff did a lot of evangelization work for it, and Jeff’s audience was heavily biased towards .NET development. I would imagine that’s part of why SQL Server is by far the highest paid database in the list (as opposed to open source.)

I have a total blind spot around SQLite, but often on my TikTok videos when the discussion of licensing costs comes up, some commenters will ask why everyone in the world doesn’t use SQLite. I’m sure there are a lot of apps worldwide that simply don’t need a database server, only a small local relational storage, and I can understand why those developers would have a similar blind spot about what it’s like to handle concurrent load in an enterprise-wide ERP app or e-commerce store.

The next chart, admired-vs-desired, doesn’t make sense to me and I don’t trust the numbers:

From what I can tell, the blue scores indicate how much the respondents have worked with the database in the last year, and red scores indicate how much they want to work with it next year? I’m pretty confused by this one. Are the red scores exclusive to only the people who actually worked with the technology this year – meaning, out of the 15.4% of the audience that worked with SQL Server in the past year, 54.5% of them want to work with it next year?

And why don’t these numbers come anywhere near agreeing with the prior question? The prior question says 25.3% of the audience used it last year, but the admired-vs-desired question says only 15.4% did? I’m so lost. At first I thought this question is taking about “extensive” development work, whereas the first one might just be ANY database work – but the numbers don’t make sense there either, because Supabase scored 4% on the first question (any work), but 5.9% on this question (extensive work.) Both questions use the term “extensive.” I’m lost.

So yeah, I just discarded that question and didn’t bother to think about the results. It doesn’t make sense, so I don’t trust it.

There was also a question about which cloud provider folks used, and AWS dominated the market:

That’s been my experience too – the vast, vast majority of my cloud clients are on AWS – but I’m mentioning it here because I know that Azure users really seem to believe Azure’s the only game in town. When I talk to Microsoft MVPs, they seem dumbfounded that companies are actually using AWS extensively, and they also seem surprised that Microsoft is competing with Google for second place. (Google’s been throwing a lot of discounted/free compute power at prospective clients to win them over.)

There’s a lot more stuff in the overall results, especially the workplace trends section that talks about employment status, hybrid/remote/in-office, and salary. When you’re looking at each workplace graph, make sure to click on the geographic filter at the top of that graph so the numbers will be more meaningful to you, based on where you’re located.

[Video] Office Hours in a Mexican Hot Tub

Last Updated July 30, 2024

Brent Ozar

Videos

While taking a dip in Cabo, I went through your top-voted questions from https://pollgab.com/room/brento.

Here’s what we covered:

00:00 Start
01:00 OpportunityKnocking: What are your thoughts on when to implement a NoSQL strategy over traditional RDBMS for large enterprise-wide database platform solutions? I see scalability advantages in using NoSQL but do you see this becoming more of a preferred go-to solution for organizations over time?
01:57 SwissDBA: In a table with 8M rows, 2 GB data + 4GB indexes, Clustered ix on a GUID column. INSERTs are slow bc SQL has to squeeze them in between existing rows. I would switch the clustered ix to a new IDENTITY column, so new records can be added at the end of the tbl. How would you do it?
03:01 MyTeaGotCold: You recently said that your Hekaton clients are trying to get off. The faults of Hekaton are well-known, so what changed between when they started using it and now?
03:54 Tim Rogers: 90 TB database in AWS, all data active. Tables partitioned across 100 filegroups (4 files each) for ease of index maintenance. Would like to consolidate to a single “data” filegroup with ~100 files, because we don’t do anything that benefits from multiple filegroups. Thoughts?
06:30 Kevin M: Would like to learn Amazon Aurora despite current company not using Amazon stack. What’s the best / cheapest way to learn Amazon Aurora?
07:47 SadButTrue: Hey Brent! Who’s the “Brento” in Postgres community?
08:26 RadekG: When was the last time that you faced a query so poorly written that you had to have a drink before fixing it? What was so bad about it?
11:19 Jökull: What are the top issues you see when storing / searching XML/JSON in SQL Server?

Query Exercise Answer: Fixing a Slow Computed Column

Last Updated July 18, 2024

Brent Ozar

Query Exercise Answers

In last week’s Query Exercise, we added a user-defined function to the Users table to check whether their WebsiteUrl was valid or not. I noted that even with an index on Reputation, SQL Server 2022 simply ignored the index, did a table scan, and spent 2 minutes of time calling the user-defined function on a row-by-row basis.

First off, a disclaimer: I didn’t write that exercise with a goal of showing the difference between old & new SQL Server versions, between old & new compatibility levels. However, several folks in the comments said, “Hey, I’m on an older version, on an older compatibility level, and the query is blazing fast already. What do I need to fix?” That’s a good reminder of what work you need to do before you go live on SQL Server 2022 (or any new version.) Moving on.

One Fix: Persisting the Computed Column

One way to fix it is to drop the computed column, then create it again with the PERSISTED keyword:

ALTER TABLE dbo.Users DROP COLUMN IsValidUrl;
GO
ALTER TABLE dbo.Users ADD IsValidUrl
	AS dbo.IsValidUrl(WebsiteUrl) PERSISTED;
GO

ALTER TABLE dbo.Users DROP COLUMN IsValidUrl;

ALTER TABLE dbo.Users ADD IsValidUrl

AS dbo.IsValidUrl(WebsiteUrl) PERSISTED;

That way, SQL Server computes each row’s contents just once, and then persists it in the table itself, instead of running the function every time we query it. By executing the above code, we get:

Wait, what? As a reminder, here’s the contents of our function:

CREATE OR ALTER FUNCTION dbo.IsValidUrl (@Url NVARCHAR(MAX))
RETURNS BIT
AS
BEGIN
    DECLARE @Result BIT = 0

    -- Regex pattern for a valid URL
    -- This pattern covers:
    -- - Scheme (http, https, ftp)
    -- - Optional username:password
    -- - Domain name or IP address
    -- - Optional port
    -- - Optional path
    -- - Optional query string
    -- - Optional fragment

    IF @Url LIKE 'http://%' OR @Url LIKE 'https://%' OR @Url LIKE 'ftp://%'
    BEGIN
        IF @Url LIKE '%://[A-Za-z0-9.-]%.%'  -- Check for domain/IP after scheme
           AND @Url NOT LIKE '% %'            -- No spaces allowed in URL
           AND @Url LIKE '%.[A-Za-z]%'        -- Ensure there's a period in domain/IP part
           AND @Url LIKE '%/%'                -- Ensure there's at least one slash after the domain
           AND @Url LIKE '%[A-Za-z0-9/_-]%'   -- Ensure there's at least one valid character in the path
        BEGIN
            SET @Result = 1
        END
    END

    RETURN @Result
END
GO

CREATE OR ALTER FUNCTION dbo.IsValidUrl (@Url NVARCHAR(MAX))

RETURNS BIT

BEGIN

DECLARE @Result BIT = 0

-- Regex pattern for a valid URL

-- This pattern covers:

-- - Scheme (http, https, ftp)

-- - Optional username:password

-- - Domain name or IP address

-- - Optional port

-- - Optional path

-- - Optional query string

-- - Optional fragment

IF @Url LIKE 'http://%' OR @Url LIKE 'https://%' OR @Url LIKE 'ftp://%'

BEGIN

IF @Url LIKE '%://[A-Za-z0-9.-]%.%' -- Check for domain/IP after scheme

AND @Url NOT LIKE '% %' -- No spaces allowed in URL

AND @Url LIKE '%.[A-Za-z]%' -- Ensure there's a period in domain/IP part

AND @Url LIKE '%/%' -- Ensure there's at least one slash after the domain

AND @Url LIKE '%[A-Za-z0-9/_-]%' -- Ensure there's at least one valid character in the path

BEGIN

SET @Result = 1

END

RETURN @Result

END

HOW THE CELKO IS THAT NOT DETERMINISTIC? What, does string comparison change from time to time? Books Online’s article on determinism says “All of the string built-in functions are deterministic,” but I’m guessing this has something to do with a possibly changing collation, and I couldn’t care less about digging further. That’s left as an exercise for the reader. Moving on.

An Actual Quick Fix: WITH SCHEMABINDING

I used to hear people say, “You should put WITH SCHEMABINDING on your scalar functions and they’ll go faster.” Every single time I tried it, it made no difference whatsoever.

Until…last week! In the comments, Ag suggested to add that hint to the function, and it ran rapidly! We’ll remove the existing column, tweak the function – the only change is WITH SCHEMABINDING – and then add it back in:

ALTER TABLE dbo.Users DROP COLUMN IsValidUrl;
GO

CREATE OR ALTER FUNCTION dbo.IsValidUrl (@Url NVARCHAR(MAX))
RETURNS BIT WITH SCHEMABINDING
AS
BEGIN
    DECLARE @Result BIT = 0

    -- Regex pattern for a valid URL
    -- This pattern covers:
    -- - Scheme (http, https, ftp)
    -- - Optional username:password
    -- - Domain name or IP address
    -- - Optional port
    -- - Optional path
    -- - Optional query string
    -- - Optional fragment

    IF @Url LIKE 'http://%' OR @Url LIKE 'https://%' OR @Url LIKE 'ftp://%'
    BEGIN
        IF @Url LIKE '%://[A-Za-z0-9.-]%.%'  -- Check for domain/IP after scheme
           AND @Url NOT LIKE '% %'            -- No spaces allowed in URL
           AND @Url LIKE '%.[A-Za-z]%'        -- Ensure there's a period in domain/IP part
           AND @Url LIKE '%/%'                -- Ensure there's at least one slash after the domain
           AND @Url LIKE '%[A-Za-z0-9/_-]%'   -- Ensure there's at least one valid character in the path
        BEGIN
            SET @Result = 1
        END
    END

    RETURN @Result
END
GO
ALTER TABLE dbo.Users ADD IsValidUrl
	AS dbo.IsValidUrl(WebsiteUrl);

ALTER TABLE dbo.Users DROP COLUMN IsValidUrl;

CREATE OR ALTER FUNCTION dbo.IsValidUrl (@Url NVARCHAR(MAX))

RETURNS BIT WITH SCHEMABINDING

BEGIN

DECLARE @Result BIT = 0

-- Regex pattern for a valid URL

-- This pattern covers:

-- - Scheme (http, https, ftp)

-- - Optional username:password

-- - Domain name or IP address

-- - Optional port

-- - Optional path

-- - Optional query string

-- - Optional fragment

IF @Url LIKE 'http://%' OR @Url LIKE 'https://%' OR @Url LIKE 'ftp://%'

BEGIN

IF @Url LIKE '%://[A-Za-z0-9.-]%.%' -- Check for domain/IP after scheme

AND @Url NOT LIKE '% %' -- No spaces allowed in URL

AND @Url LIKE '%.[A-Za-z]%' -- Ensure there's a period in domain/IP part

AND @Url LIKE '%/%' -- Ensure there's at least one slash after the domain

AND @Url LIKE '%[A-Za-z0-9/_-]%' -- Ensure there's at least one valid character in the path

BEGIN

SET @Result = 1

END

RETURN @Result

END

ALTER TABLE dbo.Users ADD IsValidUrl

AS dbo.IsValidUrl(WebsiteUrl);

Then rerun our query without changes:

SELECT TOP 200 *
FROM dbo.Users
WHERE IsValidUrl = 1
ORDER BY Reputation DESC;

SELECT TOP 200 *

FROM dbo.Users

WHERE IsValidUrl = 1

ORDER BY Reputation DESC;

Even on a large Stack Overflow database, the query runs in milliseconds and produces a great execution plan:

SQL Server uses the index only only reads a few hundred rows before the work is done. Sure, there’s no parallelism in the query, and the plan properties report that “TSQLUserDefinedFunctionsNotParallelizable”, but who cares? The query’s so fast, even with millions of candidate rows, that it doesn’t matter. Good enough, ship it.

My Old Faithful Fix: A Trigger

I know. I know, the word makes you break out in itchy hives, but triggers are a great way to quickly replace computed columns.

Let’s drop the existing computed column, add a new “real” column, configure the trigger to populate it going forward, and then backfill it to set up the existing data:

ALTER TABLE dbo.Users DROP COLUMN IsValidUrl;
GO
ALTER TABLE dbo.Users ADD IsValidUrl BIT;
GO
CREATE OR ALTER TRIGGER Users_IsValidUrl ON dbo.Users
AFTER INSERT, UPDATE
AS  
BEGIN
UPDATE u
	SET IsValidUrl = dbo.IsValidUrl(u.WebsiteUrl)
	FROM dbo.Users u
	INNER JOIN inserted i ON u.Id = i.Id;
END;  
GO  
UPDATE dbo.Users
	SET WebsiteUrl = WebsiteUrl;
GO

ALTER TABLE dbo.Users DROP COLUMN IsValidUrl;

ALTER TABLE dbo.Users ADD IsValidUrl BIT;

CREATE OR ALTER TRIGGER Users_IsValidUrl ON dbo.Users

AFTER INSERT, UPDATE

BEGIN

UPDATE u

SET IsValidUrl = dbo.IsValidUrl(u.WebsiteUrl)

FROM dbo.Users u

INNER JOIN inserted i ON u.Id = i.Id;

END;

UPDATE dbo.Users

SET WebsiteUrl = WebsiteUrl;

The first few statements will run instantly (assuming nobody else is locking the Users table), but the last statement will take a minute since it’s actually adding a new value to every row in the Users table, AND it’s calling a scalar function on every row. You could make that faster with set-based operations, but more on that in a minute.

After that, run the query we’ve been trying to tune:

SELECT TOP 200 *
FROM dbo.Users
WHERE IsValidUrl = 1
ORDER BY Reputation DESC;

SELECT TOP 200 *

FROM dbo.Users

WHERE IsValidUrl = 1

ORDER BY Reputation DESC;

It runs in a couple milliseconds (down from a minute) and uses the Reputation index, as the execution plan shows:

I like the trigger solution for a few reasons:

It keeps the function’s logic as-is, which can be helpful in situations where the logic is really complex, time-tested, and labor-intensive to rewrite.
It can be implemented really quickly.
Queries against the table can now go parallel if they need to.

But there’s a drawback: the trigger’s still getting called on every insert and update. If the workload involves batch jobs to load or modify the table, that’s going to be problematic because they’re going to slow down. Those modifications didn’t call the computed column’s function before since they most likely weren’t reading that column. If our workload includes batch jobs, we’re gonna need a faster solution. We’ll clean up after ourselves before trying the next one:

DROP TRIGGER [dbo].[Users_IsValidUrl]
GO
ALTER TABLE dbo.Users DROP COLUMN IsValidUrl;
GO

DROP TRIGGER [dbo].[Users_IsValidUrl]

ALTER TABLE dbo.Users DROP COLUMN IsValidUrl;

Rewriting the Function

Most of the time when you hear database people complaining about functions, they’re complaining about user-defined functions. For example, in this case, we can get a dramatic performance improvement if we forklift the business logic out of the user-defined function and inline it directly into the table’s definition itself. Here’s the end result:

ALTER TABLE dbo.Users ADD IsValidUrl
AS CASE WHEN (WebsiteUrl LIKE 'http://%' OR WebsiteUrl LIKE 'https://%' OR WebsiteUrl LIKE 'ftp://%')
	AND (WebsiteUrl LIKE '%://[A-Za-z0-9.-]%.%'  -- Check for domain/IP after scheme
           AND WebsiteUrl NOT LIKE '% %'            -- No spaces allowed in URL
           AND WebsiteUrl LIKE '%.[A-Za-z]%'        -- Ensure there's a period in domain/IP part
           AND WebsiteUrl LIKE '%/%'                -- Ensure there's at least one slash after the domain
           AND WebsiteUrl LIKE '%[A-Za-z0-9/_-]%'   -- Ensure there's at least one valid character in the path
     ) THEN 1 ELSE 0 END;
GO

ALTER TABLE dbo.Users ADD IsValidUrl

AS CASE WHEN (WebsiteUrl LIKE 'http://%' OR WebsiteUrl LIKE 'https://%' OR WebsiteUrl LIKE 'ftp://%')

AND (WebsiteUrl LIKE '%://[A-Za-z0-9.-]%.%' -- Check for domain/IP after scheme

AND WebsiteUrl NOT LIKE '% %' -- No spaces allowed in URL

AND WebsiteUrl LIKE '%.[A-Za-z]%' -- Ensure there's a period in domain/IP part

AND WebsiteUrl LIKE '%/%' -- Ensure there's at least one slash after the domain

AND WebsiteUrl LIKE '%[A-Za-z0-9/_-]%' -- Ensure there's at least one valid character in the path

) THEN 1 ELSE 0 END;

Note that the logic isn’t identical because I had to tweak it: instead of using IF, I used CASE statements, and there’s no @Url parameter anymore because we’re referencing the WebsiteUrl column directly. The more complex your user-defined function is, the harder work this is going to be. To learn how to do these kinds of rewrites, check out Microsoft’s PDF whitepaper on Froid, the inline function technology they added in SQL Server 2019. That paper gives several specific examples of language to use when rewriting functions to go inline, which is quite nice of them.

Now, when the SELECT runs, the plan looks big:

And while the estimates aren’t good, the plan shape is fine, it uses the index, and it runs in milliseconds.

Eagle-eyed readers will note that I did not use the PERSISTED keyword in the above example. If you add that, the resulting plan shape looks even better because the function doesn’t have to run at read time:

And the query finishes in even fewer milliseconds. However, executing the ALTER TABLE with the PERSISTED keyword will lock & rewrite the table, so it’s a question of whether your workload can tolerate that outage. You could always start with the non-persisted version to quickly make the pain go away, and then later if you still need more performance, take an outage to drop the non-persisted computed column (which will be instantaneous) and then add the persisted one (which will be the opposite of instantaneous.)

Hope you enjoyed the challenge! For more exercises, check out the prior Query Exercises posts and catch up on the ones you missed so far.

Announcing New Membership Tiers: Free & Mentoring

Last Updated July 23, 2024

Brent Ozar

Company News

I give away a lot of stuff in various places on the internet: videos, scripts, the new free tier of SQL ConstantCare®, and I’ve got more tricks coming soon.

But until now, it’s been kind of a hassle because they’ve been scattered all over the place, not in a sensible order to help you onboard gradually.

So now, my shop page has a simplified set of bundles, starting with an absolutely free membership:

Free

$0per year

SQL ConstantCare® with Health Monitoring
Introductory Classes:
How to Think Like the Engine
How I Use the First Responder Kit

Fundamentals

$995per year

SQL ConstantCare® with Health & Performance Monitoring
Fundamentals Classes
Purchased Individually: $1,542

Mastering

$1,995per year

Everything in
Fundamentals, plus:
Mastering Classes
The Consultant Toolkit
Purchased Individually: $3,617

This new Free Stuff bundle will make it easier for me to put my free resources all in one place, in a more logical order. Folks can track their progress through the resources, checking chapters off as they work through ’em.

I updated my free How to Think Like the Engine class this week with a new recording using the 50GB Stack Overflow 2013 database. This way, people can get started on the exact same database that I use throughout the Fundamentals classes, making it a little easier to get onboard.

I’ve also made my How I Use the First Responder Kit class completely free. I’ve always believed that if I can solve your problems for free, then I wanna be the person who does that, so I might as well put you in the best position possible to use the FRK scripts.

For the pros, I’ve also added a new Mentoring tier.

The Mentoring tier includes everything in Mastering, plus:

Quarterly 30-minute Zoom calls with just you & me, talking about whatever database pains you’re up against
Monthly server review emails from me – I go through your SQL ConstantCare® data, look at the server having the most performance issues (or the one you request), and give you my thoughts about what steps to take to improve performance

That tier is $3,495/year for 1 person. It’s designed for folks who don’t quite need my full 2-day hands-on SQL Critical Care® engagement, and just want periodic feedback from time to time.

Frequently Asked Questions

“If I have an existing subscription, does this change anything?” Nope! Not at all. You can continue with your current one, or if you’d like to change something, you can cancel your existing subscription and pick up one of these instead. (We don’t have the ability to upgrade in place.)

“Can I still buy classes individually?” Yep! Just keep scrolling down on the shop page. The vast, vast majority of my buyers pick up the bundles so I keep those at the top.

“Can I buy the mentoring without training?” No, because often my advice includes a link to a specific training module to explain the issue you’re facing.

“Can I get group discounts?” Yes, if you’re enrolling 5 or more people in the same membership tier, email us at Help@BrentOzar.com with the tier you want, and the list of email addresses to register, and we’ll get you a quote.

For other questions, feel free to drop ’em in the comments or contact me.

Today’s a Good Day to Talk to Your Manager About Disaster Recovery.

Last Updated July 19, 2024

Brent Ozar

Backup and Recovery, Cloud Computing

Last night, two major IT disasters struck:

Microsoft Azure’s Central region went down for about 4 hours. The official post-mortem isn’t out yet, but rumor has it that while decommissioning legacy storage services, the product group deleted the wrong thing.
Crowdstrike pushed a bad update, leading to blue screens of death on Windows systems worldwide, affecting banking, healthcare, airlines, and more.

If you were affected by one of those outages, you have my warmest virtual hug. At times like this, the stress level can be really tough, and I hope you can take care of yourself. Remember that your own self-worth is not determined by the IT solutions you work on.

If you weren’t affected by one of those outages, it’s a good time to spend an hour writing up a few things:

Which of our production services are hosted entirely in a single region, availability zone, or data center?
How are we monitoring the status of that single point of failure? If there’s a widespread outage like that, how much time are we going to waste troubleshooting our own services when there’s a bigger problem?
When our single-region or single-AZ production services go down, what users/customers would be affected?
How will we communicate the outage to those affected users? Can we write that notification ahead of time so that it’s ready to go quickly in the event of the next disaster like this?
How much would it cost us (monthly or annually) to add in a second region or availability zone for protection from these kinds of incidents?

Summarize that, pass it up to your manager in writing, and it’ll help them have discussions this morning with their managers and executives. Today, a lot of business folks are going to be asking questions, and having these answers will help get you the resources you want.

(Or, it’ll help you feel more comfortable that the business understands the risks of putting all their eggs in a single basket, and that when that basket breaks, it’s not your fault. You warned ’em, and they chose not to spend the money to double-up on baskets.)

[Video] Office Hours in My Backyard

Last Updated June 30, 2024

Brent Ozar

Videos

On a pleasantly mild morning, I sat down on the patio and took your top-voted questions from https://pollgab.com/room/brento.

Here’s what we covered:

00:00 Start
00:56 MyTeaGotCold: When going in to a database blind, do you worry at all about its compatibility level? I see a lot of unloved databases that are still on 2008’s level, but the tiny potential for breaking changes makes me scared to touch it.
02:01 Vishnu: For boxed SQL, Is it ok for users to create SQL agent jobs that run periodic business logic and/or email end users?
02:59 Karthik: What’s your opinion of live query plan viewing in SQL Sentry Plan Explorer?
03:26 Jessica : I am DBA who previously managed dozens of Azure VMs running SQL Server AGs. After a layoff and new job I’m now managing 1 prod Azure SQL db. Any tips for someone making a transition like this? Every day I’m finding features that I no longer have access to and minimal monitoring.
04:29 Hong Kong Phoey: Currently, when we see a non-clustered index on a given multi-tenant SQL table, we may or may not know why it’s there, who created it, when it was created or which app needs it. What’s the recommended change control process for answering these questions?
05:35 Kevin M: What’s the best place to go for commercial PostgreSQL training?
06:02 NotCloseEnoughToRetirementToStopLearning : Hi Brent Inheriting a VLDB (50Tb) any recommended articles or trainings for working with something this size?
07:05 Mike: Hi Brent! When migrating from SQL Server 2017 to SQL Managed Instance, on MI databases after restore become FORCE_LAST_GOOD_PLAN = ON (On 2017, it was OFF). Do you recommend leaving it ON, or should we turn it OFF on initial stages ?
08:27 Venkat: What’s the most common detrimental complacency you see with SQL dbas?
09:13 Mattia: Can a big analytical query that does many logical reads (compared to the SQL Server RAM) give SQL Server Plan cache amnesia? Or is the Buffer Pool completely separated from the Plan Cache?
09:48 Nickelton: Is it possible to forward select queries to AG secondary replica without changing application side? (connection string etc.) For example running code from SSMS or for legacy applications.
11:22 WB_DBA: My friend suggests creating all indexes on a test database since it mirrors the production database. Is this a good approach?

Query Exercise: Fix This Computed Column.

Last Updated July 23, 2024

Brent Ozar

Query Exercises

Take any size of the Stack Overflow database and check out the WebsiteUrl column of the Users table:

Sometimes it’s null, sometimes it’s an empty string, sometimes it’s populated but the URL isn’t valid.

Let’s say that along the way, someone decided to ask ChatGPT to build a function to check for valid website URLs, and then used that code to add a new IsValidUrl column to the Users table (and yes, this is inspired by a real-life client example, hahaha):

CREATE OR ALTER FUNCTION dbo.IsValidUrl (@Url NVARCHAR(MAX))
RETURNS BIT
AS
BEGIN
    DECLARE @Result BIT = 0

    -- Regex pattern for a valid URL
    -- This pattern covers:
    -- - Scheme (http, https, ftp)
    -- - Optional username:password
    -- - Domain name or IP address
    -- - Optional port
    -- - Optional path
    -- - Optional query string
    -- - Optional fragment

    IF @Url LIKE 'http://%' OR @Url LIKE 'https://%' OR @Url LIKE 'ftp://%'
    BEGIN
        IF @Url LIKE '%://[A-Za-z0-9.-]%.%'  -- Check for domain/IP after scheme
           AND @Url NOT LIKE '% %'            -- No spaces allowed in URL
           AND @Url LIKE '%.[A-Za-z]%'        -- Ensure there's a period in domain/IP part
           AND @Url LIKE '%/%'                -- Ensure there's at least one slash after the domain
           AND @Url LIKE '%[A-Za-z0-9/_-]%'   -- Ensure there's at least one valid character in the path
        BEGIN
            SET @Result = 1
        END
    END

    RETURN @Result
END
GO
ALTER TABLE dbo.Users ADD IsValidUrl
	AS dbo.IsValidUrl(WebsiteUrl);

CREATE OR ALTER FUNCTION dbo.IsValidUrl (@Url NVARCHAR(MAX))

RETURNS BIT

BEGIN

DECLARE @Result BIT = 0

-- Regex pattern for a valid URL

-- This pattern covers:

-- - Scheme (http, https, ftp)

-- - Optional username:password

-- - Domain name or IP address

-- - Optional port

-- - Optional path

-- - Optional query string

-- - Optional fragment

IF @Url LIKE 'http://%' OR @Url LIKE 'https://%' OR @Url LIKE 'ftp://%'

BEGIN

IF @Url LIKE '%://[A-Za-z0-9.-]%.%' -- Check for domain/IP after scheme

AND @Url NOT LIKE '% %' -- No spaces allowed in URL

AND @Url LIKE '%.[A-Za-z]%' -- Ensure there's a period in domain/IP part

AND @Url LIKE '%/%' -- Ensure there's at least one slash after the domain

AND @Url LIKE '%[A-Za-z0-9/_-]%' -- Ensure there's at least one valid character in the path

BEGIN

SET @Result = 1

END

RETURN @Result

END

ALTER TABLE dbo.Users ADD IsValidUrl

AS dbo.IsValidUrl(WebsiteUrl);

The user-defined function isn’t accurate, for starters – it’s letting things through that aren’t valid URLs, and stopping things that are actually valid – but let’s set that aside for a second.

What happens when we try to get the top users by reputation? To give SQL Server the best shot, I’m using SQL Server 2022, with the database in 2022 compatibility level, with an index on Reputation:

CREATE INDEX Reputation ON dbo.Users(Reputation);

SET STATISTICS TIME, IO ON;
GO
SELECT TOP 200 *
FROM dbo.Users
WHERE IsValidUrl = 1
ORDER BY Reputation DESC;

CREATE INDEX Reputation ON dbo.Users(Reputation);

SET STATISTICS TIME, IO ON;

SELECT TOP 200 *

FROM dbo.Users

WHERE IsValidUrl = 1

ORDER BY Reputation DESC;

The actual query plan is deceivingly simple, despite its terrible performance that takes about a minute to run:

WHERE IS YOUR FUNCTION INLINING GOD NOW? I could make movie jokes about this all day. Anyhoo, the plan ignored the Reputation index, did a 2-second table scan, and spent nearly a minute doing the scalar function and the filtering.

To add insult to injury, if you’re going to do 1 minute of CPU work, it sure would help to parallelize that query across multiple cores – but that query can’t get parallelism, as explained in the plan properties:

Your Query Exercise this week isn’t to fix the accuracy of the function – you can leave it as inaccurate if you like. Your challenge is to have the exact same query run in less than a second. Our goal is to avoid changing application code, and to get a very fast fix in place without blaming the developers. You’re the data professional: be professional.

Put your queries in a Github Gist, and include that link in your comments. Check out the solutions from other folks, and compare and contrast your work. After you’ve given it a try, read my thoughts in the following post. Have fun!

Update: please read the post in its entirety, and follow the instructions. Please don’t just throw ideas in there or half-formed T-SQL. For someone to test your work, they need to see your exact work. C’mon, folks – this isn’t a major project, just a single function. Be fair to people on the other side of the screen.

Updated, Larger Stack Overflow Demo Database

Last Updated July 16, 2024

Brent Ozar

Stack Overflow

Stack Overflow publishes a data dump with all user-contributed content, and it’s a fun set of data to use for demos. I took the 2024-April data dump, and imported it into a Microsoft SQL Server database.

It’s an 31GB torrent (magnet) that expands to a ~202GB database. I used Microsoft SQL Server 2016, so you can attach this to anything 2016 or newer. If that’s too big, no worries – for smaller versions and past versions, check out my How to Download the Stack Overflow Database page.

Some quick facts about this latest version:

Badges: 51,289,973 rows; 4.7GB
Comments: 90,380,323 rows; 26.1GB
Posts: 59,819,048 rows; 162.8GB; 32.7GB LOB – this is where you’ll find questions & answers
Users: 22,484,235 rows; 2.6GB; 12.5MB LOB
Votes: 238,984,011 rows; 5.9GB – a fun candidate for columnstore demos

As with the source data, this database is licensed under cc-by-sa-4.0: https://creativecommons.org/licenses/by-sa/4.0/ And to be very clear, this is not my data. The data and the below licensing explanation comes from the Stack Overflow Data Dump’s page:

But our cc-by-sa 4.0 licensing, while intentionally permissive, does require attribution:

Attribution — You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work). Specifically the attribution requirements are as follows:

Visually display or otherwise indicate the source of the content as coming from the Stack Exchange Network. This requirement is satisfied with a discreet text blurb, or some other unobtrusive but clear visual indication.
Ensure that any Internet use of the content includes a hyperlink directly to the original question on the source site on the Network (e.g., http://stackoverflow.com/questions/12345)
Visually display or otherwise clearly indicate the author names for every question and answer used
Ensure that any Internet use of the content includes a hyperlink for each author name directly back to his or her user profile page on the source site on the Network (e.g., http://stackoverflow.com/users/12345/username), directly to the Stack Exchange domain, in standard HTML (i.e. not through a Tinyurl or other such indirect hyperlink, form of obfuscation or redirection), without any “nofollow” command or any other such means of avoiding detection by search engines, and visible even with JavaScript disabled.

This will probably be the last database update.

Prosus (a tech investment company) acquired Stack Overflow a few years ago for $1.8 billion. When a company’s founders sell their baby for money:

The new owners usually want to make a profit on their large investment, and
The new owners rarely share the same goals as the original founders, and
Sometimes the new owners spent way, way too much (hi, Elon) and are forced to make tough decisions to make their debt payments and keep the company afloat

So now Prosus wants to earn their $1,800,000,000 back, and they’re looking at the actual product they bought. StackOverflow.com has 3 components[1]:

An online app that gives you good-enough answers, quickly
The existing past answers already contributed by the community
The potential of future answers continuing to go into the platform

Can Prosus compete on #1? No. Just no. Companies like OpenAI (ChatGPT), Google (Gemini), and Anthropic (Claude) simply have a better solution for #1, full stop, end of story. A web site – even a free one – can’t beat ChatGPT’s ability to integrate directly with your development environment, review your code & database, and recommend specific answers for the problem you’re facing. Game over.

Can Prosus compete on #2? No. The existing answers (as of April 2, 2024) are available for free with nearly no restrictions. The horse is already out of the barn. Moving on.

Can Prosus compete on #3? If ChatGPT and their friends win on #1 and #2, then the default place for developers to find answers is no longer the web browser. (It’s ChatGPT or Copilot or whatever). Whatever happens next is going to be intriguing. Today, you and I are conditioned to think, “I’ll post that question on Stack or a forum.” Tomorrow’s developers will not have that same bias:

Maybe the dev will prompt ChatGPT, “Can you find me answers online for this?” In that case, the LLM will search the web and summarize – and Prosus won’t stand a chance of convincing the user to post the question at StackOverflow.com.
Maybe the dev will open their web browser and ask the question. In that case, the search engine company will try to summarize answers too. These days, both Google and Bing try to avoid landing you on actual web sites, and try to give you the answers on their own pages instead, whether it’s AI-summarized answers or hallucinations or web page summaries next to each site.
Maybe the dev will go to the Github repo for the related project, and post a question there.

I don’t see an easy way for Stack Overflow to inject themselves into that workflow in the year 2030. I’m sad about that because I have a long personal history with Stack Overflow. At the same time, I’m also kinda glad that the original founders, employees, and advisors (me included) were able to cash out thanks to Prosus’s $1.8B overspending just before the generative AI boom hit.

Prosus needs solutions fast: Stack is now losing $150,000 per day. Prosus’s 2024 annual reports noted that Stack Overflow had $98M in incoming revenue – but lost $57M. I can understand why managers might flail at a company’s switches and dials trying to find a way to stop the financial bleeding.

One of the dials they’ve been flailing at is turning down community access to the past answer data, aka business part #2. In their minds, they’re trying to stop OpenAI/Google/Anthropic from making so much money on the back of Stack’s answers. Earlier this year, Prosus tried to pump the brakes on providing the data dumps in XML format on a regular basis, and there was some community outrage, so they relented. However, they’re back: last week, Prosus announced they’re limiting access again.

Based on what Prosus is saying in that post, going forward, I don’t think Prosus will approve of me redistributing new data dumps in a database format. I’m not going to waste time or energy fighting that battle – I’d rather they spent their own energy trying to figure out a way to keep StackOverflow.com a viable business concern going forward. Hopefully they find fun, productive ways to do that, ways that bring the community together onto Prosus’s side rather than turning consumers against Prosus.

However, if Prosus management is willing to limit the data dump, then I have a bad feeling that more barriers are coming over the years. Next, they’ll make answers harder to access for people who have an ad blocker, or who aren’t signed in, or who haven’t paid for a “premium” Stack membership. I’m not mad at them about this, because I don’t have any answers to turn the business around either, and I haven’t heard from anybody who does.

You either die a hero or live long enough to see yourself become the arch-enemy.

[1] Technically the company Stack Overflow has a couple other parts: advertising and Stack Overflow for Teams. Both of those business models are at risk due to AI as well. Their other attempts at diversification, like Articles and Jobs and Developer Story, never caught on.

[Video] Office Hours in Bulgaria

Last Updated June 30, 2024

Brent Ozar

Videos

I went through your top-voted questions from https://pollgab.com/room/brento while in front of the National Gallery for Foreign Art in Sofia.

Here’s what we covered:

00:00 Start
03:12 MyTeaGotCold: If all of my columns are nvarchar, is there a performance benefit to always wrapping strings in N”? My tests have been inconclusive.
03:50 SadButTrue: Hey Brent, most of our Azure SQL DBs have top wait stats related to parallelism (CX***) and performance is not great. As we cannot modify the cost threshold for parallelism in Azure SQL, what other techniques can we use to reduce the waits associated with parallelism?
05:00 Dom: Hi Brent, I noticed something strange on a SQL Express 2012. I’m looking at the “Visible online” CPU and was expecting to see 4 (as Express limits to 4 cpu or 1 socket) but it shows 8 visible online cpu… Am I missing something or is my SQL Express really using 8 CPU ? Thanks !
05:36 dba jr: hi Brent, in my company users can design any query they want. for example they can choose multi column for order by or in where clause . i mean queries in APP are not fix. but they tell me queries are slow. tables have million rows. how can I handle this.
07:10 DBA in VA: SQL 2019: What causes a query not to use the execution plan I’ve forced in the query store? Isn’t that the whole idea of the forced plan??
08:00 Ruby Sunday: Is creating a NC index that mirrors the clustered index to avoid blocking considered a bad practice?
09:30 TheyBlameMe: Hi Brent. What’s you opinion of this MS recommendation to “prevent lock escalation” for long running batch operations? BEGIN TRAN; SELECT * FROM mytable (UPDLOCK, HOLDLOCK) WHERE 1 = 0; WAITFOR DELAY ‘1:00:00’; COMMIT TRAN;
10:40 DanishDBA: Hi Brent, my friend needs to create an index on a highly used table on a SQL Server 2019 SE (FCI). On a copy of the db on the same server he knows that it can take up to 4 minutes. The goal is to minimize the impact, as a service window is not allowed. What is the best approach?
11:48 JustWondering: Been suspicious that something “outside” of SS is the problem but don’t know how to prove it. Changed from using JTDS JDBC driver to MS JDBC driver and saw 40+% improvement in runtimes. How could I find what the MS driver is doing different vs JTDS? QryPlans seem the same. Thx.

Announcing Free SQL Server Monitoring.

Last Updated July 17, 2024

Brent Ozar

SQL ConstantCare

You already use sp_Blitz and the rest of the free, open source First Responder Kit to give your SQL Servers a health check now and then.

But let’s be honest: you don’t do it often enough.

You wish you had an easier way to know when your backups stop working, when corruption strikes, when a poison wait is detected, or when a new SQL Server patch comes out for one of your servers.

Good news! We’ve now got a free version of SQL ConstantCare®!

SQL ConstantCare® is our simple monitoring product that you install on a jump box or VM. It connects to your SQL Servers, Azure SQL DB, Amazon RDS, etc. just once per day, gathers diagnostic data, and sends it to our processing servers in AWS. We generate a single daily email per server telling you specific, actionable tasks that will make your databases safer.

The full-blown $695/year version of SQL ConstantCare® gives you performance advice on stuff like query plans, indexes, memory settings, and wait stats, but there are a lot of folks out there who just don’t need that. They can’t fix the databases, queries, or indexes, and they just wanna know that the server is healthy.

That’s where our free health-only monitoring comes in. Sign up here for free, then download the installer, and follow the installation instructions. In a matter of minutes, you’ll get an email, and then again once a day – but only when we’ve got stuff you actually need to do on the server. Otherwise, we leave you alone to do the other important stuff you gotta do every day.

Why are we doing this? What’s the catch?

It’s easy for us to support because we’re not doing anything complex like putting agents on SQL Servers, or running 24/7 monitoring. The app just runs on your jump box as a scheduled task, once per day, and that’s it.

It’s cheap for us to provide because the health-only processing costs us less than $1 per monitored SQL Server per month. We designed SQL ConstantCare® to be serverless and cloud-native right from the start, and Richie’s done a lot of work keeping the code and database lean and mean, so scaling has been pretty easy.

Our back end is fully cloud-hosted, which means your diagnostic data goes up to our services in AWS. If you’ve got questions about how that works, check out how we collect, transmit, and store your data and the frequently asked questions. To help us with GDPR compliance, we automatically delete all data older than 30 days. We won’t ever offer a version of SQL ConstantCare® that you can host yourself – we rely on too many AWS services, and helping you set all that up and troubleshoot it would make the price tag crazy high. If you want your own private monitoring, you’re better off buying a conventional 24/7 monitoring app. Of course, those are more expensive, and priced per monitored server – but if you want that level of control, that’s the price you pay. (Literally.) We’re trying to help as many people as we can here, as inexpensively as possible.

The signup process is a little wonky, not as smooth as I’d like. When you click the signup link, you’ll be asked to sign up with an email or Google. If you’ve bought training classes or software from me in the last few years from training.brentozar.com, you can use the same login. If you hit any roadblocks signing up and you can’t figure out how to get past ’em, email Help@BrentOzar.com and include a screenshot of the full browser, including the URL you’re on.

I hope this helps make your job easier. Like the First Responder Kit, this here blog, and the tons of videos we put out across my YouTube channel, TikTok, LinkedIn, etc, I really want to help do as much as I can, for free. That way, when you need training or consulting, you’ll remember who loves ya, baby. Enjoy!

Use “We” Not “You”. #tsql2sday

Last Updated July 2, 2024

Brent Ozar

Consulting Lines

For T-SQL Tuesday this month, Louis Davidson suggested we give our past self some advice.

I’d tell myself, “Use ‘we’, not ‘you’.”

For years, when I gave advice, I’d say things like:

“You’re doing A, when you should really be doing B instead.”
“Your code has a problem right here.”
“Your network settings are wrong, and you should change them to this instead.”

The very word ‘you’ sets up a confrontational tone that puts the recipient on the defensive. They can’t help but react by taking things personally. We’re just humans, meatbags of emotion.

Instead, use words like ‘we’ and ‘our’ that group us together. We’re on the same team, and our common enemy is technology. Dagnabbit, technology sucks hard. It’s always out to get us, to make our lives miserable, to refuse to work the way it says it’ll work in the manual.

Once we (see what I did there) get started using the term ‘you’ early on in our careers, it’s a really hard habit to break. I know that, because I’ve been trying to break it for years.

I do use the term ‘you’ a lot in blog posts and videos that are purposely designed to be confrontational and drive engagement. That’s on purpose. However, when I wanna give advice to someone on my own team, I try to remember that we are indeed on the same team, and I need to communicate that by using the word ‘we.’

[Video] Office Hours in Sofia, Bulgaria

Last Updated June 30, 2024

Brent Ozar

Videos

I was honored to speak at the Present to Succeed conference in Sofia, Bulgaria, run by a former SQL Server MCM. Sofia is a beautiful city, and the gorgeous Patriarchal Cathedral of St. Alexander Nevsky (Wikipedia) was close to my hotel, so I dragged my tripod over there for an Office Hours session.

Here’s what we covered:

00:00 Start
01:19 Live is Life: After a migration to a new db, the old 3TB db is now read only. My friend is enabling row compression on the biggest tables/indexes and already got down to 1,5TB. The goal is to get faster reads and a smaller db. What are your thoughts about this? Is there a better way?
02:40 MyTeaGotCold: Is Enterprise Edition generally seen as the norm? Blogs and particularly the official docs rarely point out that something isn’t available in Standard, but I’ve gone my entire career without seeing it.
03:20 EthicalDBA: Hey Brent, have you ever faced any moral/ethical issues in your DBA career that caused you to really question the task you were working on?
04:35 With NeinLock: Greetings! As someone who is a household name when it comes to the SQL Server community, have you noticed members of the younger generation joining the community to give back and share knowledge? If yes, are there any you recommend following?
08:22 Roger ap Gwilliam: What’s the scariest RDBMS you have worked with?
08:58 DT_DBA: Do you think using the FORCESEEK hint “everywhere” is okay? (same as how some people use NOLOCK). I have a client that has started doing this. They aren’t specifying which index to use, just doing things like “select * from table WITH (FORCESEEK) join view WITH (FORCESEEK) …”
09:40 handysql: Helmet on the shelf. Is that for show or do you have a harley/rocket in the garage?
11:35 Mike: Is there a good criteria that can be used to tell if a query is OLTP or OLAP query ? Is it number of seconds (duration, or cpu time), or number of logical reads, or something else ?
12:48 VegasDBA: Hi Brent! Ever do any big physical to virtual conversions? I’ve been tasked with a very aggressive timeline to convert several physical SQL AGs to VMs. I was considering using AGs or distributed AGs to fail them over. Was curious of your thoughts and opinions.
13:46 Mike: Is there a good up-to-date article that describes differences between Junior, Middle and Senior DBA in SQL Server (and maybe Azure SQL ?). And where can you develop next after becoming “Senior” ?
15:53 Davros: When a traditional clustered index table starts to having too many indexes is this an indication that the table should be using column store?

Save $350 on the PASS Summit if You Register Now.

Last Updated June 30, 2024

Brent Ozar

#SQLPass

Next Tuesday, pricing on the 3-day tickets for the PASS Data Community Summit goes up to $2,095.

But if you register right now, plus use coupon code BRENTO24, it’s just $1,745.

It’s one in-person conference that covers Microsoft SQL Server, Azure, PostgreSQL, Snowflake, Oracle, and more. So many of us (me included!) are working with multiple databases these days, and it’s hard to find a single event with this kind of coverage.

I’m also teaching a 1-day pre-conference workshop on Monday, Tuning T-SQL for SQL Server 2019 and 2022. That’s an additional $595, and you can also sign up for another pre-conference workshop on Tuesday. There are great options in there including Query Quest with Erik & Kendra, Microsoft Fabric in a Day, PostgreSQL Fundamentals, Power BI Architecture, and a SQL AI Workshop run by Microsoft folks.

Go register now, and I’ll see you in Seattle!

Does Bit Column Order Matter in Tables?

Last Updated June 30, 2024

Brent Ozar

Development

At the PGConf.dev, where Postgres developers get together and strategize the work they wanna do for the next version, I attended a session where Matthias van de Meent talked about changing the way Postgres stores columns. As of right now (Postgres 17), columns are aligned in 8-bit intervals, so if you create a table with alternating columns:

MyBitColumn1 – 1 bit used
(7 bits wasted for alignment to get to the next byte)
SomeOtherColumn – any other datatype, but not a bit
MyBitColumn2 – 1 bit used
(another 7 bits wasted for alignment)

Matthias pointed out that was inefficient, and that Postgres should separate physical column order from logical column order. Under the hood, it should just store:

MyBitColumn1 – 1 bit used
MyBitColumn2 – 1 bit used
(6 bits wasted for alignment)
SomeOtherColumn – any other datatype

Microsoft SQL Server already does this with bits, as the documentation explains:

The SQL Server Database Engine optimizes storage of bit columns. If there are 8 or fewer bit columns in a table, the columns are stored as 1 byte. If there are from 9 up to 16 bit columns, the columns are stored as 2 bytes, and so on.

To demonstrate that, I whipped up a demo script showing two table creations: one with the bit columns scattered around through the table, and one where they’re all grouped together:

CREATE DATABASE TestColumnAlignment;
GO
USE TestColumnAlignment;
GO

CREATE TABLE dbo.Disorganized
(Id INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
TinyInt1 TINYINT,
Bit1 BIT,

TinyInt2 TINYINT,
Bit2 BIT,

TinyInt3 TINYINT,
Bit3 BIT,

TinyInt4 TINYINT,
Bit4 BIT,

TinyInt5 TINYINT,
Bit5 BIT
);




CREATE TABLE dbo.Organized
(Id INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,

Bit1 BIT,
Bit2 BIT,
Bit3 BIT,
Bit4 BIT,
Bit5 BIT,

TinyInt1 TINYINT,
TinyInt2 TINYINT,
TinyInt3 TINYINT,
TinyInt4 TINYINT,
TinyInt5 TINYINT
);


INSERT INTO dbo.Disorganized
	(Bit1, Bit2, Bit3, Bit4, Bit5, TinyInt1, TinyInt2, TinyInt3, TinyInt4, TinyInt5)
	SELECT 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
	FROM GENERATE_SERIES(1, 10000000);


INSERT INTO dbo.Organized
	(Bit1, Bit2, Bit3, Bit4, Bit5, TinyInt1, TinyInt2, TinyInt3, TinyInt4, TinyInt5)
	SELECT 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
	FROM GENERATE_SERIES(1, 10000000);
GO
	
EXEC sp_BlitzIndex @Mode = 2;

CREATE DATABASE TestColumnAlignment;

USE TestColumnAlignment;

CREATE TABLE dbo.Disorganized

(Id INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,

TinyInt1 TINYINT,

Bit1 BIT,

TinyInt2 TINYINT,

Bit2 BIT,

TinyInt3 TINYINT,

Bit3 BIT,

TinyInt4 TINYINT,

Bit4 BIT,

TinyInt5 TINYINT,

Bit5 BIT

);

CREATE TABLE dbo.Organized

(Id INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,

Bit1 BIT,

Bit2 BIT,

Bit3 BIT,

Bit4 BIT,

Bit5 BIT,

TinyInt1 TINYINT,

TinyInt2 TINYINT,

TinyInt3 TINYINT,

TinyInt4 TINYINT,

TinyInt5 TINYINT

);

INSERT INTO dbo.Disorganized

(Bit1, Bit2, Bit3, Bit4, Bit5, TinyInt1, TinyInt2, TinyInt3, TinyInt4, TinyInt5)

SELECT 1, 1, 1, 1, 1, 1, 1, 1, 1, 1

FROM GENERATE_SERIES(1, 10000000);

INSERT INTO dbo.Organized

(Bit1, Bit2, Bit3, Bit4, Bit5, TinyInt1, TinyInt2, TinyInt3, TinyInt4, TinyInt5)

SELECT 1, 1, 1, 1, 1, 1, 1, 1, 1, 1

FROM GENERATE_SERIES(1, 10000000);

EXEC sp_BlitzIndex @Mode = 2;

sp_BlitzIndex shows that both tables have the same size:

If we add another bit column to both tables:

ALTER TABLE dbo.Disorganized
	ADD Bit6 BIT;

ALTER TABLE dbo.Organized
	ADD Bit6 BIT;
GO
EXEC sp_BlitzIndex @Mode = 2;
GO

ALTER TABLE dbo.Disorganized

ADD Bit6 BIT;

ALTER TABLE dbo.Organized

ADD Bit6 BIT;

EXEC sp_BlitzIndex @Mode = 2;

Size still remains the same:

Because SQL Server’s just making a metadata-only change, noting that 1 of the 8 bits in the bit-designated space is now available for use by the new Bit6 column. To really drive that point home, let’s go back and update the new Bit6 column to be 1:

UPDATE dbo.Disorganized SET Bit6 = 1;
UPDATE dbo.Organized SET Bit6 = 1;
GO
EXEC sp_BlitzIndex @Mode = 2;
GO

UPDATE dbo.Disorganized SET Bit6 = 1;

UPDATE dbo.Organized SET Bit6 = 1;

EXEC sp_BlitzIndex @Mode = 2;

And then check the space used again:

Yep, still 194MB. Good work, Microsoft.

The more I learn about Postgres, the more I appreciate so many little things that Microsoft has done over the years for performance & space optimization. The one that’ll really surprise you is that Postgres still doesn’t have table or index compression yet, although it does offer value-level compression.

Updated First Responder Kit and Consultant Toolkit for July 2024

Last Updated July 1, 2024

Brent Ozar

First Responder Kit Updates

The next release would normally be August, but I’m doing a release a little early in order to fix 2 issues with the Consultant Toolkit. The May release broke the ability to automatically upload results to Amazon S3, and didn’t include all of sp_BlitzLock’s results. Those are now fixed, so I wanted to get another release out quickly so consultants don’t have to keep using the old version.

If you’ve hard-coded First Responder Kit installer file names, there was a big recent change. There are now just 2 installer scripts: Install-All-Scripts.sql, and a new Install-Azure.sql, which only installs the scripts that are compatible with Azure SQL DB. The old Install-Core scripts are gone because we’ve deprecated sp_AllNightLog, sp_BlitzInMemoryOLTP, and sp_BlitzQueryStore. Read on for why.

How I Use the First Responder Kit — Wanna watch me use it? Take the class.

To get the new version:

Download the updated FirstResponderKit.zip
Azure Data Studio users with the First Responder Kit extension:
ctrl/command+shift+p, First Responder Kit: Import.
PowerShell users: run Install-DbaFirstResponderKit from dbatools
Get The Consultant Toolkit to quickly export the First Responder Kit results into an easy-to-share spreadsheet

Consultant Toolkit Changes

We fixed the problems with AWS S3 uploads and sp_BlitzLock results that were introduced by May’s big changes. As a reminder, here’s what we did in May because the changes were pretty big:

Supports Microsoft Entra multi-factor authentication (MFA)
Automatically retries failed connections, allowing you to more easily gather data from Azure SQL DB Serverless databases that auto-paused
No longer requires create-table permissions, so it works better in environments where you can pull diagnostic data but not see database contents
Requires .NET Desktop Runtime 7 or higher on the machine where you run the Consultant Toolkit, typically your jump box or laptop (not the SQL Server itself)

sp_Blitz Changes

Enhancement: several checks for Query Store settings & trace flags. (#3531, #3535, #3536, thanks Reece Gooding)

sp_BlitzLock Changes

Enhancement: add @DeadlockType parameter to filter for just regular or parallelism deadlocks. (#3525, thanks Erik Darling)
Enhancement: add max CPU and elapsed times. (#3545, thanks Erik Darling)

sp_DatabaseRestore Changes

Enhancement: add @EnableBroker parameter to turn on Service Broker after a restore. (#3532, thanks John McCall)

Deprecating sp_BlitzInMemoryOLTP, sp_BlitzQueryStore, and sp_AllNightLog

sp_BlitzQueryStore was originally written by Erik Darling when he worked here. He’s moved on to start his own excellent company, plus his own sp_QuickieStore. You should be using that instead.

sp_BlitzInMemoryOLTP was always kinda distributed as a courtesy – the real home for it is in KTaranov’s Github repository, and you can still find it there. It hasn’t been updated in over 6 years, and I’ve never seen anyone using it, so I’m removing it to streamline support issues.

sp_AllNightLog was a ton of fun when we built it several years ago, but it’s consistently had a problem. Companies start using it, then decide they want to build something even more ambitious, typically a C# service with robust error handling and scheduling. sp_AllNightLog isn’t the kind of thing I want to encourage beginners to use – it’s complex.

So when Reece Goding started working on cleaning up sp_BlitzQueryStore’s code, I decided that now was the time to deprecate stuff we no longer use or recommend. You’re definitely welcome to continue to use ’em if you get value out of ’em! I’ve going to move these procs into the Deprecated folder, plus simplify the installation scripts. For the rest of 2024, the only installer script will be Install-All-Scripts.sql.

For Support

When you have questions about how the tools work, talk with the community in the #FirstResponderKit Slack channel. Be patient: it’s staffed by volunteers with day jobs. If it’s your first time in the community Slack, get started here.

When you find a bug or want something changed, read the contributing.md file.

When you have a question about what the scripts found, first make sure you read the “More Details” URL for any warning you find. We put a lot of work into documentation, and we wouldn’t want someone to yell at you to go read the fine manual. After that, when you’ve still got questions about how something works in SQL Server, post a question at DBA.StackExchange.com and the community (that includes me!) will help. Include exact errors and any applicable screenshots, your SQL Server version number (including the build #), and the version of the tool you’re working with.

[Video] Office Hours: I Don’t Have All the Answers

Last Updated June 5, 2024

Brent Ozar

Videos

When I went through the top-voted questions at https://pollgab.com/room/brento, I didn’t have all the answers today:

Here’s what we covered:

00:00 Start
00:48 MyTeaGotCold: I increased my RAM from 16 GB to 32 GB and my buffer pool has grown, but my PAGEIOLATCH_SH waits went up by two percentage points. What might that indicate?
01:46 DavDBA: Hey Brent, Sp_Blitz is recommending that I change the target recovery interval from 0 to 60. I want to implement this, but I read in the Microsoft documentation that it can cause extra I/O activity on certain systems. Have you encountered such issues?
02:32 John: A friend of mine has several hundred SQL AG’s used for DR, not HA. These are all manual failover, async commit. They want to remove WFSC, implementing Read-Scale AGs instead. Is this risky on such a large scale? Are there serious gotchas to be concerned about?
03:30 WindowsMillenniumWasAnInsideJob: Conspiracy theory: Microsoft won’t take performance issues in SQL Server seriously until Azure SQL DB won’t have majority of the market share (vs boxed editions). Only then Microsoft will fix them because it will be in their interests to do so. What do you think about it?
04:55 SQL_TheOcean: Is there a way to copy the Managed instance databases across a different region in Azure.
05:32 Eduardo: What is your opinion of Natural language to SQL query in Azure SQL DB? Is this a game changer?
06:45 Steve E: Hi Brent, When query tuning, I often find myself trying to fix a bad estimate and wondering how SQL Server made an estimate. Is there a trace flag or tool that shows how SQL server came up with estimates on plan operators – what stats it used, what formulas it applied etc. Thanks
07:32 ConfusedDBA: My friend is using spatial index on a table with less than 10k rows. He does not have a problem with this table but often times he hears that the usage of spatial index/data type will cause issues. Are those remarks valid? If yes, what kind of problem he can expect in the future?
08:23 SQLBaller: Have you seen it where when running a stored procedure it will complete in two seconds? However, if you run that same stored procedure with include actual execution plan it takes ten minutes to run.
09:36 Need4Speed: Is there a way to optimize a table strictly used for inserts of archival records? The table currently has 62 million records and only has a primary key. I have noticed that my process for inserts is running a little slow and was wondering if there was anything I could check.
10:21 ChompingBits: My friend has a huge vendor owned application. The vendor’s configuration makes use of Resource Governor which frequently kills their OLA DBCC runs, and doing so in a way that creates a false positive for disk errors. Is there a way to verify RG killed a process?
10:57 i_use_uppercase_for_SELECT: Do you have any recommendations or best practices for query store when restoring to lower environments with a different database name? The plans seem to include the database name and fail forcing plans.
11:50 themoderndba: I had you mention in a prior podcast a Microsoft guide for faster data loads to a database. What is the exact name of the guide? I am interest in this for a current ETL task.

Join Me in Seattle for Watch Brent Tune a Query in SQL Server 2022

Last Updated June 19, 2024

Brent Ozar

#SQLPass

The PASS Data Community Summit session lineup for Nov 4-8 in Seattle was just announced, and I’ve been selected for a general session!

Watch Brent Tune a Query in SQL Server 2022: Ever wonder how somebody else does it? In this all-demo session, watch over Brent Ozar’s shoulder while he takes a slow stored procedure in the Stack Overflow database, analyzes it, and iterates over several improvements trying to make it go faster. He’ll explain his thought process as he goes, and get feedback from the audience on what they’d try as well.

Goals:

Learn to use sp_BlitzCache to identify which queries in a proc need tuning first
Understand the use of query hints to test possible database setting changes
See how to measure the overhead of query hints like recompiles

Prerequisites: You should be comfortable writing T-SQL, reading execution plans, and using STATISTICS IO to measure logical reads.

Register now because discounted early bird registration prices are available til July 9th:

3-day conference ticket $1,895: includes the Weds/Thurs/Fri general sessions like mine above
Use referral code BRENTO24 for another $150 off that 3-day price
Pre-conference workshops: $595 each, like my Tuning T-SQL for SQL Server 2019 and 2022 on Monday

See you in Seattle!

The SQL Language is Just a Hurdle You Gotta Overcome.

Last Updated June 25, 2024

Brent Ozar

T-SQL

I’m not talking just about Microsoft SQL Server specifically here, nor T-SQL. Let’s zoom out a little and think bigger picture for a second: is the SQL language itself a problem?

Sometimes when I talk to client developers, they gripe about the antiquated language.

The order of a SELECT statement doesn’t make any sense. You shouldn’t state what you’re looking for, before you even say where you wanna get the data from. The FROM should really go first so that query-completion tools like IntelliSense have a fighting chance to help you write the SELECT part. If we started writing our queries like this:

FROM dbo.Users u
INNER JOIN dbo.Posts p ON u.Id = p.OwnerUserId
SELECT ...

FROM dbo.Users u

INNER JOIN dbo.Posts p ON u.Id = p.OwnerUserId

SELECT ...

Then as you started typing stuff in the SELECT, you could actually get useful stuff out of IntelliSense. How many times have you started typing a query, and query completion tools start throwing all kinds of system functions at you? Idiotic.

Exception handling is a painful mess. Let’s be honest here: the majority of stored procedures and functions out there don’t have error handling. They YOLO their way through the data, hoping and praying that things are as we expect, we have the right permissions, structures haven’t changed, and the data is in a useful state. Everybody looks the other way and mumbles, “We’ll handle errors on the application side,” when in reality those errors are either thrown directly at the innocent user, or simply suppressed and not logged anywhere.

It’s not really a standard. Oh sure, SELECT/FROM/WHERE/ORDER BY works in most databases, but even trivially simple applications break if you try to port them from one database management system to another. Your skills transfer in a similar way: even if you’re great at T-SQL exception handling, you’re still gonna have to tweak the way you do it in Postgres. The concepts are standard, but the specifics are different.

Unit testing is a pipe dream. App code developers know if their code changes will break something. Database developers just punt their stuff into development, run the query a few times, nod because no errors get thrown, and then toss it into production. When code breaks weeks or months later, all we hear is, “Nothing’s been changed.”

So why haven’t we moved on past SQL?

In some ways, we have, with object-relational mapping (ORM) tools like Entity Framework, Hibernate, and Django. The database administrator readers here in the audience usually cringe when they hear those words, but the reality is that developers leverage those tools heavily to build new applications. I don’t blame them. I would too, for all the reasons I talked about above.

What those tools do is translate your desires into SQL, though, which brings us right back where we started. Often, the SQL they generate sucks for performance, thus the typical DBA’s feelings about ORMs. So why haven’t we got a new standard way for applications to talk directly to databases, in a secure, performant, and easy-to-write way?

It’s not for lack of trying: at least once every 6 months, I see a post on HackerNews about a better replacement for SQL. Someone puts a lot of thought into the problems, puts a lot of work into a replacement, and then proudly announces it.

And nobody uses it.

Because SQL is the lowest common denominator that works damn near everywhere, for values of “works.”

It works on the back end. Remember when NoSQL came out, and everybody was all “databases r doomd”? And remember what business users said when they wanted to run their reports? NoSQL persistence layers pretty quickly changed their tune, saying, “Oh, well, uh, we meant Not Only SQL, that’s what we meant,” as they struggled to quickly slap in SQL compatibility. Even MongoDB, king of NoSQL, implemented SQL support.

It works on the front end, especially the reporting front end, which is what managers care about. The people who sign the checks wanna see their data in Power BI and Excel. Every new reporting tool that comes out, in order to check boxes and say they’re compatible with every database, implements SQL support. Oh sure, these tools write horrific queries, but they check the box to say they can get data out of all your different persistence layers, and they do it with SQL first because it’s cheap and easy to support lots of databases that way.

I’ll leave you with an amusing quote from Bjarne Stroustrup:

There are only two kinds of languages: the ones people complain about and the ones nobody uses.

[Video] Office Hours: Database Answers from a Hotel Room

Last Updated June 5, 2024

Brent Ozar

Videos

I went through your top-voted questions from https://pollgab.com/room/brento before heading out to PGConf.dev in Vancouver.

Here’s what we covered:

00:00 Start
01:43 Poul J: Hi Brent. Can you give some examples of how a CHECK() constraint is used by the optimizer. Is it similar to a filtered index… Or is there more to it?
03:54 RadekG: Hi Brent, Could you explain when high wait statistics of parallelism type indicate a problem? I wander what is even a point in monitoring them… (but I am almost sure that I am missing something obvious here)
04:51 Ricardo: My secret-sauce as a DBA was getting sophisticated work done through the GUI (EG: Always On). Now with Azure Portal, Databricks, etc, the GUI seems to change daily. Do you think the days of the GUI are numbered? (the days of knowing a GUI intimately, like an engine bay).
05:45 tibbler: Hi, imagin you have complex structured data in a database. You’d like to archive them for long term, the structure should be preserved too. Would you recommand a database for this purpose?
06:57 Ricardo: When performance tuning what ball-park figures do you use relating [time] to [rows returned]. EG: What is a reasonable amount of time to return 200,000 rows in a report?
08:48 Ozan: Hi Brent, when using SQLQueryStress with enough number of threads and iterations to get THREADPOOL waits, i can still see enough available worker threads after summing up the active_workers_count field in dm_os_schedulers. How is that possible? Thanks
09:50 Andrew: How transferrable is Oracle DBA experience to the Microsoft stack? I’m reviewing job applications for someone on my team, we use the Microsoft stack, but have a few applicants with years of Oracle experience – do you see the concepts as equivalent or transferrable?

Query Exercise Answer: Beating ChatGPT at Finding Good Question Times

Last Updated June 15, 2024

Brent Ozar

Query Exercise Answers

For this week’s Query Exercise, I asked you to write a better query than ChatGPT wrote. Your goal was to find the best days and times to post questions on Stack Overflow.

I found it interesting that a lot of the initial answers focused on the times when there were the most questions, or which questions were the most highly upvoted. For me, the best time to post a question is when you have the highest likelihood of getting the right answer, quickly.

When someone posts a question, they can accept an answer as the right one. You can see it by looking for checkmarks next to an answer. The checkmark indicates that the answer was accepted by the original question-asker.

The accepted answer may not be the best one overall, especially as additional answers come in later over time. However, the accepted answer was good enough for the person who asked the question – and when I’m asking a question, that’s what my goal is, to get an answer that’s good enough to solve my problem, and move on.

In the Posts table where questions & answers are stored, there’s an AcceptedAnswerId column. If a question has an Id in the AcceptedAnswerId, then that’s the Posts.Id for the answer row.

Let’s try this:

SELECT TOP 100 DATENAME(weekday, pQ.CreationDate) AS DayOfWeek,
	DATEPART(hh, pQ.CreationDate) AS HourOfDay,
	AVG(DATEDIFF(mi, pQ.CreationDate, pA.CreationDate)) AS AvgMinutesToAnswer,
	SUM(1) AS QuestionsAnswered
FROM dbo.Posts pQ
INNER JOIN dbo.Posts pA ON pq.AcceptedAnswerId = pA.Id
WHERE pQ.PostTypeId = 1 /* Question, but not really needed */
GROUP BY DATENAME(weekday, pQ.CreationDate),
	DATEPART(hh,pQ.CreationDate)
ORDER BY AVG(DATEDIFF(mi, pQ.CreationDate, pA.CreationDate)),
	SUM(1) DESC;

SELECT TOP 100 DATENAME(weekday, pQ.CreationDate) AS DayOfWeek,

DATEPART(hh, pQ.CreationDate) AS HourOfDay,

AVG(DATEDIFF(mi, pQ.CreationDate, pA.CreationDate)) AS AvgMinutesToAnswer,

SUM(1) AS QuestionsAnswered

FROM dbo.Posts pQ

INNER JOIN dbo.Posts pA ON pq.AcceptedAnswerId = pA.Id

WHERE pQ.PostTypeId = 1 /* Question, but not really needed */

GROUP BY DATENAME(weekday, pQ.CreationDate),

DATEPART(hh,pQ.CreationDate)

ORDER BY AVG(DATEDIFF(mi, pQ.CreationDate, pA.CreationDate)),

SUM(1) DESC;

The results in the 2018-06 version of the Stack Overflow database:

It’s looking like weekend mornings are the best times to post questions – but even then, it takes a week to get to a good answer! You might think (or at least I did), “Well, if it takes a whole week, does it even make a difference when I post the question?” Let’s flip the sort order and look for the worst times:

That’s kinda wild – it’s weekday afternoons! (We’ll set time zones aside for this, but that’s a whole ‘nother exercise.) That made me wonder: if we only group the data by day of week, what does it look like?

SELECT DATENAME(weekday, pQ.CreationDate) AS DayOfWeek,
	AVG(DATEDIFF(hour, pQ.CreationDate, pA.CreationDate)) AS AvgHoursToAnswer,
	SUM(1) AS QuestionsAsked,
	COUNT(DISTINCT pA.Id) AS QuestionsAnswered
FROM dbo.Posts pQ
LEFT OUTER JOIN dbo.Posts pA ON pq.AcceptedAnswerId = pA.Id
WHERE pQ.PostTypeId = 1 /* Question, but not really needed */
GROUP BY DATENAME(weekday, pQ.CreationDate)
ORDER BY AVG(DATEDIFF(hour, pQ.CreationDate, pA.CreationDate))

SELECT DATENAME(weekday, pQ.CreationDate) AS DayOfWeek,

AVG(DATEDIFF(hour, pQ.CreationDate, pA.CreationDate)) AS AvgHoursToAnswer,

SUM(1) AS QuestionsAsked,

COUNT(DISTINCT pA.Id) AS QuestionsAnswered

FROM dbo.Posts pQ

LEFT OUTER JOIN dbo.Posts pA ON pq.AcceptedAnswerId = pA.Id

WHERE pQ.PostTypeId = 1 /* Question, but not really needed */

GROUP BY DATENAME(weekday, pQ.CreationDate)

ORDER BY AVG(DATEDIFF(hour, pQ.CreationDate, pA.CreationDate))

I added a couple more columns because the results are pretty conclusive:

Post your questions on the weekends. Sure, there are way less questions coming in at that time – but that’s also when you get more eyeballs on your questions because you have less competition. You’re more likely to get a good answer, faster, when you’re not competing with other questions.

Is the moral of the story that ChatGPT’s answer was bad? No, or at least, no worse than some of the answers us meatbags came up with initially. I think the key to asking a good data question is to keep following up with more questions. What do the query results show? Where do we think the loopholes are? What’s the real business objective that we’re trying to achieve? How do we gauge the accuracy of an answer?