Blog

Sometimes SQL is the presentation layer

And when it is, you end up doing a lot of concatenating. This isn’t about performance, or trying to talk you out of SQL as the presentation layer; it’s just something you should keep in mind. SQL is a confusing language when you’re just starting out. Heck, sometimes it’s even confusing when you’ve been doing it for a long time.

Let’s say you have a website that stores files, and when a user logs in you use a temp table to track session actions as a sort of audit trail, which you dump out into a larger table when they log out. Your audit only cares about folders they have files stored in, not empty ones.

Here’s a couple tables to get us going.

-- 100 fake files, one per day going back from yesterday
IF OBJECT_ID('tempdb..#aggy') IS NOT NULL
    DROP TABLE #aggy;

WITH x1 AS (
    SELECT TOP (100)
        ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS ID
    FROM sys.[messages] AS [m], sys.[messages] AS [m2])
SELECT ID,
    DATEADD(DAY, [x1].[ID] * -1, CAST(GETDATE() AS DATE)) [CreateDate],
    'C:\temp\' + CAST(HASHBYTES('MD5', NCHAR([x1].[ID])) AS VARCHAR(32)) + '.gif' [Path]
INTO #aggy
FROM [x1];

-- The per-session audit trail
IF OBJECT_ID('tempdb..#usersessioninfo') IS NOT NULL
    DROP TABLE #usersessioninfo;

CREATE TABLE #usersessioninfo
(LastActionID INT IDENTITY(1,1), UserID INT, UserMessage VARCHAR(100), MessageDetails VARCHAR(100));

And then we’ll stick some data into our session table like this.

INSERT [#usersessioninfo]
    ( [UserID] , [UserMessage] , [MessageDetails] )
SELECT
    @@SPID AS [UserID],
    'Welcome to your folder!' AS [UserMessage],
    'You have stored #' +
    CAST(COUNT(*) AS VARCHAR(100)) +
    ' files in the last 30 days, starting on ' +
    CAST(MIN([a].[CreateDate]) AS VARCHAR(20)) +
    ' ending on ' +
    CAST(MAX([a].[CreateDate]) AS VARCHAR(20)) +
    '.' AS [MessageDetails]
FROM [#aggy] AS [a]
WHERE [a].[CreateDate] >= GETDATE() - 30;

Everything looks great!

Select max blah blah blah

But if your table is empty…

You may find yourself with a bunch of junk you don’t care about! Empty folders. Contrived examples. Logic problems. Stay in school.

TRUNCATE TABLE [#aggy];

INSERT [#usersessioninfo]
    ( [UserID] , [UserMessage] , [MessageDetails] )
SELECT
    @@SPID AS [UserID],
    'Welcome to your folder!' AS [UserMessage],
    'You have stored #' +
    CAST(COUNT(*) AS VARCHAR(100)) +
    ' files in the last 30 days, starting on ' +
    CAST(MIN([a].[CreateDate]) AS VARCHAR(20)) +
    ' ending on ' +
    CAST(MAX([a].[CreateDate]) AS VARCHAR(20)) +
    '.' AS [MessageDetails]
FROM [#aggy] AS [a]
WHERE [a].[CreateDate] >= GETDATE() - 30;

What do you think is going to happen? We truncated the table, so there’s nothing in there. Our WHERE clause should just skip everything because there are no dates to qualify.

NULLs be here!

Darn. Dang. Gosh be hecked. These are words I really say when writing SQL.

That obviously didn’t work! You’re gonna need to do something a little different.
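Why did a row show up at all? Here’s a minimal sketch using a hypothetical empty temp table, #nothing, which isn’t part of the demo above. An aggregate query with no GROUP BY hands back exactly one row even when no rows qualify: COUNT(*) comes back as 0, while MIN and MAX come back NULL. With the default CONCAT_NULL_YIELDS_NULL setting, gluing a string to a NULL with + gives you NULL, so the whole message turns into NULL.

```sql
-- Hypothetical table for illustration only; not part of the original demo.
IF OBJECT_ID('tempdb..#nothing') IS NOT NULL
    DROP TABLE #nothing;

CREATE TABLE #nothing (CreateDate DATE);

-- No rows qualify, but a scalar aggregate still returns one row.
SELECT COUNT(*)        AS [FileCount],      -- 0
       MIN(CreateDate) AS [FirstDate],      -- NULL
       MAX(CreateDate) AS [LastDate],       -- NULL
       'You have stored #' + CAST(COUNT(*) AS VARCHAR(100))
           + ' files, starting on '
           + CAST(MIN(CreateDate) AS VARCHAR(20)) AS [MessageDetails]  -- NULL
FROM #nothing
WHERE CreateDate >= GETDATE() - 30;
```

That single row of NULLs is exactly what the INSERT picks up and writes to the session table.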

Having having bo baving banana fana fo faving

One of the first things I was ever really proud of was using the HAVING clause to show my boss duplicate records. This was quickly diminished by him asking me to then remove duplicates based on complicated logic.

HAVING is also pretty cool, because it’s processed after the WHERE clause and after the aggregates are calculated, so any rows that make it past WHERE can still be filtered out later on down the line. For our purposes, it will keep anything from being inserted, because our COUNT is a big fat 0. Zero. Zer-roh.

INSERT [#usersessioninfo]
    ( [UserID] , [UserMessage] , [MessageDetails] )
SELECT
    @@SPID AS [UserID],
    'Welcome to your folder!',
    'You have # ' +
    CAST(COUNT(*) AS VARCHAR(100)) +
    ' files, starting on ' +
    CAST(MIN([a].[CreateDate]) AS VARCHAR(20)) +
    ' ending on ' +
    CAST(MAX([a].[CreateDate]) AS VARCHAR(20)) +
    ' in the last 30 days.'
FROM [#aggy] AS [a]
WHERE [a].[CreateDate] >= GETDATE() - 30
HAVING COUNT(*) > 0;

This inserts 0 rows, which is what we wanted. No longer auditing empty folders! Hooray! Everybody ~~dance~~ drink now!

Mom will be so proud

Not only did you stay out of jail, but you wrote some SQL that worked correctly.

Thanks for reading!

SQL Interview Question: “Talk me through this query.”

Last Updated April 24, 2016

Interviewing, SQL Server

Last month’s post “For Technical Interviews, Don’t Ask Questions, Show Screenshots” was a surprise hit, and lots of folks asked for more details about the types of screenshots I’d show. Over the next few weeks, I’ll share a few more.

Normally I’d show this query as a screenshot, but for easier copy/pasting into comments, I’m showing it as code here.

CREATE PROC dbo.usp_ByCategory @Category NVARCHAR(20) AS
IF @Category = NULL SET @Category = 'Default'
SELECT i.itemID, i.itemName,
   COALESCE(po.Price, i.Price, 0) AS Price
FROM Items I
   LEFT OUTER JOIN PriceOverrides po
   ON i.itemID = po.itemID
   AND po.SaleStartDate >= GETDATE()
   AND po.SaleEndDate <= GETDATE()
WHERE i.Category = @Category

I’d say to the job candidate, “You’ve been asked to take a quick look at this code as part of a deployment. Explain what the business purpose of the code is, and tell me if there’s anything that concerns you.”

After a few days, I’ll follow up with my own thoughts in the comments.

SQL Server 2016 Release Date: June 1, 2016

Last Updated May 2, 2016

Microsoft just announced the SQL Server 2016 Release Date: June 1, 2016.

It’s the news we’ve all been waiting for!

This PDF lays out the differences between editions, and here’s a few points that stand out:

Standard Edition now goes up to 24 cores, and still just 128GB max memory
~~Query Store is Enterprise Edition only~~ (see update below)
Always Encrypted is Enterprise only, thereby killing its adoption rate among ISVs
In-memory analytics, R integration are Enterprise only
Business Intelligence Edition is gone with the wind
According to the newly released TPC-H benchmark Executive Summary, Enterprise Edition still costs around $7k USD per core

Great news! Let me know what you think in the comments.

UPDATE 6:30PM – Microsoft unveiled a more detailed feature comparison by edition, and this one says Query Store will be available in all editions (including Express!)

[Video] Office Hours 2016/04/27

Last Updated June 7, 2020

Brent Ozar Unlimited Team

SQL Server, Videos

This week, Brent, Erik, Jessica, Angie, and Tara discuss SQL service packs, partial restores, breaking replication, backups, as well as their favorite TV shows.

You can register to attend next week’s Office Hours, or subscribe to our podcast to listen on the go.

Transcript:

Jessica Connors: Let’s start with the service pack question from Jay H. He asks, “When applying a SQL service pack to the passive node in a cluster, should one put it into pause mode, or is it okay to simply run the service pack?”

Tara Kizer: I’ve never put it into pause mode. I’ve patched hundreds, maybe thousands of servers.

Brent Ozar: Well but do you patch the passive? I mean you patch the passive and there’s no services running on it so you really don’t have to worry about it.

Tara Kizer: Yeah, nope.

Brent Ozar: Yeah, am I going to have to worry about something failing over to it in the middle of the service pack? I’ve never done it either. Talk about a first-world problem there. So I’ve never paused it. I don’t know, if I was probably going to get ambitious, like PowerShell script the whole thing out, I would probably put in some kind of pause there but I’m just not that guy.

Jessica Connors: What’s this about Terabyteasaurus Rex?

Brent Ozar: That is a presentation that I do about very large databases. There are things that you want to do first to avoid very large databases. Then once you have one, there’s how you cope with the aftermath. It’s like having your own little pet dinosaur. You have to plan for things ahead of time and don’t bite off more than you can chew.

Angie Walker: Like a Tamagotchi.

Brent Ozar: Like a Tamagotchi, yes. Only you can’t carry it around in your pocket.

Jessica Connors: Is that a … presentation you’re doing coming up?

Brent Ozar: That one’s actually from our regular training classes. I’m doing it for IDERA. IDERA pays us to periodically do free presentations for the community. We give them like our menu of private training courses and just go, “Here, which one do you want to buy for the community?” They go buy one and we give it out for free to everybody and off we go. So it’s really awesome how we work that with vendors.

Jessica Connors: Nice.

Brent Ozar: Everybody wins.

Jessica Connors: All right. This question is pretty vague from Abdullah. He says, “Hello. We have just received new hardware with 124 cores and 1 terabyte of memory to host many of our SQL instances.”

Tara Kizer: Dang.

Brent Ozar: Big spender.

Erik Darling: Bad news: the only licensed …edition.

[Laughter]

Brent Ozar: Wow. One word, virtualization. Virtualization. Normally, I’d never buy a host that big but I just wouldn’t do instance stacking. I wouldn’t run one Windows OS and then a lot of instance stacking. Have you guys ever run multiple instances on a server and what problems have you seen from it?

Erik Darling: Not willingly. I’ve inherited them and it’s always been like, okay, we’re going to really stifle this one which is less important because this other one is just choking it out anyway.

Tara Kizer: I used to have a four node cluster with eleven SQL instances on it.

Brent Ozar: Oh.

Tara Kizer: Yeah, the biggest challenge was installing patches. There was always some node that suddenly had to be rebooted. This was back before 2008. It was horrible. Even after you rebooted the server it would say, “Oh, that server needs a reboot.” It would take an act, a miracle, for all four nodes to agree it’s now time to patch. I would reboot like 20 times before it would say it. It was horrible.

Jessica Connors: Oh boy. Did you end up running them all? Did you end up consolidating?

Tara Kizer: We ended up upgrading to newer versions where you could install it passively. Back then the SQL instance had to be in the right state on all four instances, or not the SQL instance, but you had to patch all four instances at the same time. So all four had to agree that it was ready to be patched.

Brent Ozar: Miserable.

Tara Kizer: It was horrible.

Brent Ozar: Still even today Windows patching is a giant pain in the rear. Do you want to take down all of the instances at once just in order to patch Windows? Windows patches come out kind of fast and furious. So virtualization adds that little bit layer of lower performance, and I’m not even going to go down the is virtualization slower or not. But worst case scenario, it’s a little slower and you deal with that. But then you get so much better management. Holy smokes. It’s way easier.

Jessica Connors: All right. Let’s talk about partial restores. “Can you recommend a good article on doing partial restores? I want to move my historical data to a separate NDF file so I can restore just the active portion right away so the users can use the database then restore the historical part.”

Tara Kizer: Do we have an article? I mean I know someone who does a session.

Brent Ozar: Who?

Tara Kizer: I’ve seen it twice.

Brent Ozar: Is it available publicly? Is there somewhere, a blog we can link to?

Tara Kizer: I don’t know. I’m sure she has a blog. Kimberly Tripp has a whole session on it. At PASS she’ll bring a USB, all these USBs, and she’ll unplug them to simulate losing a drive and the database stays online because it’s all in memory. It’s really cool but it has to do with the partial restores as well.

Brent Ozar: I bet if you go to hit Google and you do like “partial restore site:SQLSkills.com” I bet she’s got blog posts on it too because it’s one of her more famous demos. We talk about it in our training classes but I was just sitting there thinking, I don’t think we have a single public post on it.

Erik Darling: I was working on something similar when I was messing around with some foreign key stuff but SQL outsmarted me so I didn’t end up writing the post.

Tara Kizer: Erik will take care of that in the next 15 minutes though.

[Laughter]

Brent Ozar: Erik “the blogger” Darling.

Erik Darling: Oh, get out of here.

Brent Ozar: There’s Bob Pusateri, @SQLBob on Twitter. Bob Pusateri has an article too on his site to help move your stuff into a historical file group. He has really nice scripts that help you rebuild objects onto other file groups which is way trickier than it looks like. There’s things like off-row data that don’t move by default.

Erik Darling: If you’re on a version prior to 2012, a lot of that stuff is still offline, especially for large varchar and nvarchar types.

Jessica Connors: All right. A question from John. He says, “When looking at statistics properties, it gives the date and time when statistics were last updated. How can I tell if that statistic’s object was last updated using a full scan or a sample scan and what that sample scan value was?”

Erik Darling: I know all of this.

Jessica Connors: Do you?

Erik Darling: Off the top of my head. So if you run DBCC SHOW_STATISTICS on the table and index that you’re interested in, you can either run the whole command or you can run it WITH STAT_HEADER, which will just give you the top row of output. There will be two columns in the stat header part. One will be rows and one will be rows sampled. If rows equals rows sampled, then you did a full scan. If rows sampled is less than rows, then it used a percentage. You can also hit a function, sys.dm_db_stats_properties. If you cross apply that with sys.stats, you can pass an object ID and the stats ID and that will also tell you rows sampled versus rows in the table. So you can figure out all that stuff there. If you want to calculate a percentage, just put a calculation in your queries. That’s the way to tell.
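What Erik describes can be sketched roughly like this. The table and index names (dbo.YourTable, IX_YourIndex) are placeholders, and sys.dm_db_stats_properties only exists on newer builds (as far as we know it arrived in service packs for 2008 R2 and 2012), so treat this as a starting point rather than a finished script.

```sql
-- Placeholder names: swap in your own table and statistic/index.
-- Compare the Rows and Rows Sampled columns in the single header row.
DBCC SHOW_STATISTICS ('dbo.YourTable', 'IX_YourIndex') WITH STAT_HEADER;

-- Same information for every statistic on the table, plus a sample percentage.
SELECT  s.name          AS stats_name,
        sp.last_updated,
        sp.rows,
        sp.rows_sampled,
        CAST(100.0 * sp.rows_sampled / NULLIF(sp.rows, 0) AS DECIMAL(5, 2)) AS sample_pct
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID(N'dbo.YourTable');
```

If rows and rows_sampled match, the last update was a full scan; anything lower was a sampled update.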

Jessica Connors: Thanks, Brent Ozar.

[Laughter]

Why are you still Brent Ozar?

Brent Ozar: My alter ego. He’s logging in first to start the webcast because I’m shlumpy, lazy, and I don’t show up on time. So god bless him.

Erik Darling: Brent is taking cold drugs and hanging around.

Brent Ozar: Yes.

Jessica Connors: Got ya. Okay, question from Cameron. He says, “If you want to purposely break replication…”

Brent Ozar: What?

Jessica Connors: Why would you purposely break replication?

Brent Ozar: Is he trying to drop the microphone on his way out the door?

Angie Walker: Should we continue answering this question?

[Laughter]

Jessica Connors: “Is it better to unsubscribe from the () database or should you delete the publication from the master?”

Brent Ozar: Oh, he wants to do a restore.

Tara Kizer: I haven’t used that term () database, I assume he’s referring to the subscriber for a restore.

Brent Ozar: Subscriber.

Tara Kizer: I just right-click on the publication and say drop it and it takes care of everything.

Brent Ozar: Like it’s hot.

Tara Kizer: But no, you don’t delete the publication from the… I’m confused by the terms that he’s using.

Brent Ozar: I bet, so one thing I would say is I would do Tara’s approach rather than trying to remove any subscriber’s setups because what happens if you leave the publisher’s setup and somebody doesn’t do the restore? Like I’m assuming you’re trying to restore the publisher, not the subscriber. You can leave a hosed up replication setup behind. So as long as there’s only one subscriber, just delete it at the publisher.

Tara Kizer: If you end up with the hosed situation, you can run sp_removedbpublisher, something like that. It’s remove something. Remove db something. That will just clean up anything that was left behind.

Brent Ozar: That’s how you know somebody has worked with replication before.

Tara Kizer: Yes.

Brent Ozar: She’s like looking up the rest.

Tara Kizer: Failed over to the DR site and forgot to drop replication beforehand and it like orphaned at that point. It’s like, what is that command?
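The command Tara is reaching for is most likely sp_removedbreplication, which removes replication objects left behind in a database. A minimal sketch, assuming a placeholder database name:

```sql
-- Run on the server hosting the orphaned database.
-- N'YourPublishedDb' is a placeholder name.
EXEC sp_removedbreplication
     @dbname = N'YourPublishedDb',
     @type   = N'both';   -- 'tran', 'merge', or 'both'
```

It only cleans up the database it is pointed at, so anything still defined on a live publisher or distributor would need to be dropped there separately.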

Brent Ozar: Oh god.

Jessica Connors: Tara has the best stories.

Tara Kizer: Lost lots of sleep.

Brent Ozar: God bless.

Jessica Connors: Yeah, you could have your own podcast. Yeah, that’s what he says, he says it’s just to do a restore. He’s not in there just for the sake of breaking stuff now that we know his first and last name.

Tara Kizer: I know how to break replication too. He meant drop it.

Jessica Connors: All right. Let’s talk about stack dumps. Chris has a question. He says, “My SQL error logs show a stack dump. Total server memory is 16GB, max memory is set up just above 13GB, LPIM is enabled. Available memory went to zero just before the crash according to my monitoring software. I’m thinking I should lower the max memory or disable the LPIM. What do you think?”

Tara Kizer: You need to figure out what is using the memory but I don’t think your 2.5 is enough for the OS. The 2.5 I mean.

Brent Ozar: Yeah, when Tara says find out what’s using the memory, it’s not SQL Server, or at least not the engine. It could be other things in SQL Server like Integration Services or Analysis Services or whatever, but you set max memory, and you have lock pages in memory turned on. So that’s cool. That amount is locked, but now, and people will often say, “I should turn on lock pages in memory. That way I don’t have to worry if something else needs RAM.” Hell yeah you do. You just did. You just suffered a crash because of it. SQL Server couldn’t back down on the amount of memory it needed. So now your fun journey begins to go troubleshoot what is using that extra memory. What would you guys use in order to find out what tools or what apps are using the extra memory?

Tara Kizer: Well I had low memory on my laptop on Monday during a client session. So after I was finally able to investigate it after the call, in either the app log or the system log, it told me the top three processes that were using the memory. It was two SQL Server instances and WebEx; those were the top three. I’m not too sure if that would be seen if your actual server crashed though, but maybe. There might be low memory alerts in there leading up to the crash.

Brent Ozar: You’re right on the thing in saying lock pages in memory, should I maybe turn it off. I would while I’m doing the investigating. Just leave it off just to prevent it, because this other app is probably going to fire up again before you have the chance to fix it.
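For reference, checking what the engine is holding and lowering max memory looks roughly like this. The 12288 MB figure is purely illustrative, not a recommendation for Chris’s server. Lock pages in memory itself isn’t a T-SQL setting: it’s a Windows user right granted to the SQL Server service account, so turning it off means removing that right and restarting the service.

```sql
-- How much memory the engine currently holds, including locked pages.
SELECT physical_memory_in_use_kb,
       locked_page_allocations_kb,
       process_physical_memory_low
FROM sys.dm_os_process_memory;

-- Lower max server memory. 12288 MB is an illustrative value only.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 12288;
RECONFIGURE;
```

A non-zero locked_page_allocations_kb is a quick way to confirm that locked pages are actually in use.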

Jessica Connors: All right. So let’s talk about what to do when your backups fail. Fred has a question. He says, “CHECKDB is successful and our backups always complete successfully but trying to restore the backup gives an error that the backup failed, the data is invalid. Any thoughts on where to look?” He’s running SQL 2008 R2 Enterprise. Not using any Enterprise-only features. Four backups later we usually get a good backup that we can restore from.

Erik Darling: My first question is are you running backups with checksum enabled and do you have page verification turned on for that database? Because you could at least narrow down where the issue is happening. So if you have page verification turned on, SQL will start writing checksums to your pages to make sure that nothing wonky happens to them on disk. Then if you run backups with checksums, SQL will check those checksums as it does the full backup. So you at least have something verifying there that it’s not SQL and that it’s something with your disk media or like when you’re transferring files over that’s happening. The only time I’ve ever seen that happen was when I was restoring a backup on a dev server and it turned out that one of the disks on the dev server was actually suffering from some malady, some lurgy. So that was the issue on that. So I would absolutely verify that it’s not something happening on the primary sequences and then my investigation would be on whatever hardware I have on the instance I’m trying to restore it to.

Brent Ozar: I’ve seen it when you write to crappy storage for your backups, like the backup reports that it wrote successfully but then the data is trash when you go to read it. But I would like to say, just like the great singer Meat Loaf, two out of three ain’t bad. Four out of five successful backups, that’s not such a bad number. 80 percent, that’s a passing score. You probably didn’t need one out of five. It’s probably not that big of a deal. But yeah, I would just try immediately after the backup finishes, try doing a restore with verify only. Either from the same SQL Server or from another SQL Server and that will at least tell you if the backup file is hosed.
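Erik’s and Brent’s suggestions combine into something like the following sketch. The database name and backup path are placeholders.

```sql
-- Placeholder database name and backup path.
-- Have SQL Server write a checksum on each page as it goes to disk.
ALTER DATABASE [YourDb] SET PAGE_VERIFY CHECKSUM;

-- Validate page checksums while backing up, and checksum the backup itself.
BACKUP DATABASE [YourDb]
TO DISK = N'X:\Backups\YourDb.bak'
WITH CHECKSUM;

-- Read the backup back and re-check the checksums without restoring anything.
RESTORE VERIFYONLY
FROM DISK = N'X:\Backups\YourDb.bak'
WITH CHECKSUM;
```

Pages only pick up a checksum the next time they are written, so a database that just had PAGE_VERIFY switched on won’t be fully covered right away. And VERIFYONLY is a sanity check, not a substitute for actually restoring the backup somewhere.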

Jessica Connors: All right. Back to replication.

Brent Ozar: How come nobody ever says, “What do you guys think about flowers? What’s your favorite kind of chocolate?”

Erik Darling: How much do you like [inaudible: Casa Playa]?

Brent Ozar: It’s dreamy.

Jessica Connors: Let’s see. Question from John. He says, “Is it possible to mirror via transactional replication a SQL 2008 R2 database to a SQL 2016 database?”

Brent Ozar: Wow. I bet it would be.

Tara Kizer: I think so since 2016 goes all the way down to 2005 compatibility level.

Brent Ozar: Yeah, I bet you could.

Erik Darling: The only thing I’ve ever seen stand in the way is backwards stuff.

Tara Kizer: Mirroring and transactional replication, replication doesn’t really care about versions. Mirroring does.

Brent Ozar: Yeah, he should be fine.

Tara Kizer: Either way, it should be fine.

Jessica Connors: Can you just upgrade both to 2016?

Brent Ozar: I bet he’s so tired of running with his 2008 R2 box and he’s like just trying to give his users something that has nice, new functionality on 2016. That’s probably what it is. He’s like, “Here, go query over here. It’s really nice and fun.” Maybe he’s got nice nonclustered column store indexes on the table over there, make his queries fast. Maybe that’s what it is.

Jessica Connors: Kanye or Wanye West.

Brent Ozar: Wanye. [Laughter] Oh, Wayne you are never going to live that down, Wanye.

Jessica Connors: I think that’s a good question. Where is Richie?

Brent Ozar: Oh, he’s in Disneyworld.

Jessica Connors: Of course he is. He’s always getting lost at Disneyworld.

Tara Kizer: Driving home.

Angie Walker: Yeah, he’s on the road.

Jessica Connors: Let’s see here. Question from Jay H. He says, “Last year after applying a couple of particular Windows KB updates, issues arose with JDBC TLS connections and the updates had to be removed. Has Microsoft fixed this and can updates now be applied?”

Brent Ozar: I remember seeing Aaron Bertrand blog about this. This is one of those SSL and the connection string I’ve never paid too much attention to but I think Aaron Bertrand blogged about this. Other people are nodding like we vaguely have seen something along these lines.

Erik Darling: Yeah. I’ve just seen some stuff sitting around 2016 with TLS 1.1 and 1.2 having some weirdness bug things.

Brent Ozar: Yeah, we don’t touch it with a ten-foot pole. If you search for SQL Server TLS Aaron Bertrand and Aaron is A-A-R-O-N Bertrand, I bet you you’re going to find a blog post in there where he went into detail on that. Because like anything else with SQL Server updates Aaron Bertrand looks over those with a magnifying glass and a fine-tooth comb.

Erik Darling: In a kilt.

Brent Ozar: In a kilt. So Rusty Householder already replied with the answer, blogs.SQLSentry.com TLS support. I’m going to put that in the resources for everybody there. But yeah, it is a thing that Aaron blogged about.

Jessica Connors: Let’s see. This is the last comment/question. People have been pretty quiet today. From Chris Wood, he says, “Thanks for the help on the blocking.” You helped Chris with blocking?

Brent Ozar: I believe we did last week I think. I think we did.

Jessica Connors: Via Critical Care?

Brent Ozar: Oh no, it was a question about... I remember this. It was a database restore involving Relativity. He was doing a database restore on Relativity and I think we posted the question on Stack Exchange as well. sp_WhoIsActive showed blocking and we couldn’t figure out which query it was that was doing the blocking. Turned out it was a system SPID that was doing a full text crawl. So when you finished doing the restore, it did a full text crawl and it locked some of the tables in the database. People weren’t allowed to access them. Awesome. Got to love that. People are like wow…

Erik Darling: I could have answered that one.

Brent Ozar: Oh could you, have you had that same problem?

Erik Darling: Yeah, embarrassingly.

Brent Ozar: Unbelievable.

Jessica Connors: Angie is getting a shout out. Were you posting on #SQLhelp?

Angie Walker: Nope. [Laughter] Apparently I have an impersonator. I didn’t think there were any other Angies out there. Oh, Tara, ah. It’s pretty hard to tell us apart, I know.

Brent Ozar: Just one of us cartoons, all our cartoons look alike.

Erik Darling: I get mistaken for Jessica all the time.

[Laughter]

Erik Darling: They’re like, “Hey we need to buy some stuff.” I’m like…

Brent Ozar: Unsubscribe.

Jessica Connors: Yeah. Let’s see, Brent. Brent’s an awesome name. He says, “Any experience with Experian Correct Address which has to be installed on the database server for SQL CLR. Do you have any experience there?”

Brent Ozar: Oh, I’m vaguely remembering that this calls a web service. That it goes and validates people’s addresses. The way that it does it, whenever you want to like validate someone’s address you call an extended stored procedure, a CLR stored proc and it goes off and it calls a web service. So it’s possible that this is working on one node and not another because of firewall rules or network permissions. Windows Firewall, UAC, I mean, it could be almost anything that involves accessing the interwebs.

Erik Darling: I hate to say it but this is actually something that Master Data Services is good at.

Brent Ozar: Really?

Erik Darling: Yeah. You can do like address lookups and have like post office integration where you can get whatever like you know a post office valid address for a thing is, it can validate that. I don’t know a ton about it because I’ve only ever seen it in action a couple times but that’s actually something that Master Data Services does well which I feel filthy saying.

Brent Ozar: That’s a couple times more than me. I’m guessing it didn’t require CLR then, it was probably just stuff built into the database server?

Erik Darling: Yeah, but I don’t know how it was called so it still might have been CLR but it was integrated with Master Data Services. So it was like SQL friendly. It didn’t need to call out to anything else. It was already like built in somewhere.

Brent Ozar: Doing its thing.

Erik Darling: Yeah.

Jessica Connors: Let’s see. I think we talked about this last week. Nick Johnson, he says, “I found a couple articles that talk about how compatibility level 90 (2005) does not work in SQL 2014. Do you guys have any confirmation on that? Even though 2014 in fact shows it, it doesn’t work.”

Erik Darling: You can’t upgrade directly, isn’t that it? Or is that from 2000?

Brent Ozar: I can’t remember either.

Angie Walker: I think in general, you can’t do more than two versions, right? You couldn’t go through 2005 to 2012 or straight to 2014. You’d have to make the hop in between.

Tara Kizer: You can’t restore but the compatibility level is there. So on 2014 you can go all the way down to 2005 compatibility level. Like he’s saying, the option is there but apparently some articles are saying it doesn’t work. I don’t know. I see it.

Brent Ozar: Yeah, I vaguely remember that during the release process, like it worked in SSMS but the ALTER didn’t work, like the ALTER DATABASE didn’t work. I think people thought, “It’s not going to work when it finally releases,” like that they’re going to yank 2005 compat. But I’m pretty sure it still does work because I distribute Stack Overflow in that format too and Doug was asking questions about that this week. It should work fine. I don’t know if it runs in 2016, if 2005 compat mode runs in 2016. No clue. And who the hell would use that? Why would I swear in the middle of a podcast? Who knows. Don’t do that. Don’t do that. By that, I mean 2005 compat mode, not swearing. You should totally swear. You’re a grown person.

[Laughter]

Jessica Connors: All right, Abdullah asks or states he’s in progress building a DR site on a multisite topology. Any recommendation for SAN replication?

Erik Darling: The fast one.

Tara Kizer: It’s going to be expensive.

Brent Ozar: And the cheap one.

[Laughter]

Brent Ozar: Yeah, you don’t get much of a choice. It’s whatever brand your SAN supports. I think EMC makes stuff as well, like appliances that will do it between SANs. But cha-ching.

Erik Darling: Also make sure that you are deleting old snapshots and copies because you can find yourself—depending on like how big these copies are, deleting them could take a very long time so you want to make sure that you have enough room to both copy and have other stuff being deleted off. Because deleting 20 terabytes of SAN replication snapshots is time consuming no matter what.

Jessica Connors: All right, well I guess while we’re on the topic of that, do we like async DB mirroring as a DR strategy?

Tara Kizer: Yes.

Erik Darling: Yes.

Tara Kizer: I do. I used it for years. So when I joined a company three years ago, we were on SQL 2000 using log shipping and then upgraded to 2005. We were relieved to get rid of log shipping just due to the amount of work it took to fail over to the DR site for a lot of servers. It was a lot of work. We were real excited about mirroring. We used asynchronous mirroring between our two sites. They were about 300 miles apart. We could not have done synchronous database mirroring because of the drastic performance degradation that you would have. But async mirroring worked great. We failed over to the DR site regularly, two, three times a year. Ran production out of it for a few weeks then failed back with it. It works great. All you have to do is when you want to fail over, set it to synchronous mode, let it catch up, then do your failover and set it back to async.
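The failover steps Tara lists map onto T-SQL roughly as follows. This is a sketch with a placeholder database name, not the exact script her team ran.

```sql
USE master;

-- 1. On the principal: switch to synchronous (high-safety) mode.
ALTER DATABASE [YourDb] SET PARTNER SAFETY FULL;

-- 2. Let it catch up: wait until this reports SYNCHRONIZED.
SELECT DB_NAME(database_id) AS database_name,
       mirroring_state_desc,
       mirroring_safety_level_desc
FROM sys.database_mirroring
WHERE database_id = DB_ID(N'YourDb');

-- 3. Still on the principal: swap roles with the mirror.
ALTER DATABASE [YourDb] SET PARTNER FAILOVER;

-- 4. On the new principal (the DR server): go back to asynchronous.
ALTER DATABASE [YourDb] SET PARTNER SAFETY OFF;
```

Waiting for SYNCHRONIZED before step 3 is what makes the manual failover possible, and it keeps the role swap from losing data.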

Jessica Connors: There’s a Trump DBA handle on Twitter. Are you guys familiar with this?

Brent Ozar: Yes. I encourage humor in the SQL Server community. It’s not me doing it. That’s somebody else doing it. But I’m always like, if people want to have fun, that’s kind of cool.

Jessica Connors: What about DBA Reactions? Are you still involved with that?

Brent Ozar: I am. I took a brief hiatus from it while other people kept submitting stuff. Other people were submitting so many things. So I just went in and approved them all the time. Then of course I fell back in love with GIFs. I’m like, “Oh, let me go in and look and see because it’s been a while.” Oh my god, there’s so many good GIFs these days. So I started queueing them up again. I think I’ve already gotten them written through most of next week.

Jessica Connors: Oh, so you’re still doing that.

Brent Ozar: Yeah, I love it. I have a disturbing amount of fun with it.

Jessica Connors: They used to get a newsletter, the DBA Reactions.

Brent Ozar: I cut it back to like only once a week. It was going Tuesdays and Thursdays. Now it’s down to either just Tuesday or Thursday, I forget which one because I didn’t want to overwhelm people’s email boxes. There are like 5,000 people signed up to get this in their email box every week.

Jessica Connors: Yeah. I remember the time that somebody called for SQL Critical Care and they’re like, “I heard about you guys from DBA Reactions. I love …” I’m like, “We made a sale.”

[Laughter]

Brent Ozar: Well and at that point they’re going to call us because anybody who’s crazy enough to go, “I like DBA Reactions, they’re my people,” they already know exactly what we’re like. They know exactly how we work.

Jessica Connors: That’s fair. Cool, well you guys are being fairly quiet today so I think we’ll end there.

Brent Ozar: Thanks everybody for hanging out with us. We will see you guys next week.

Updated the StackOverflow SQL Server Database Torrent to 2016-03

Last Updated April 9, 2017

The StackOverflow XML Data Dump was recently updated with 2016-03 data, so I’ve updated our torrent of the SQL Server database version of the Stack Overflow data dump.

Fun facts about the database and its real-world-ness:

95GB in size
29,499,660 posts spanning 2008-07-31 to 2016-03-06
5,277,831 users spanning ages from -972 to 96 (just like real world data, you can’t trust it)
46,306,538 comments (227 of which have the F-bomb)
Every table has a clustered key on an Id identity field, and has relationships to other tables’ Ids (again, much more real-world-ish)
Lots of lumpy data distribution and sizes, making it fun for parameter sniffing demos
Case-sensitive collation (because if you’re going to share scripts online, you want to get used to testing them on case sensitive servers – this stuff exists out in the real world)
1,305% cooler than AdventureWorks

Here’s how I built the torrent:

128GB USB3 flash drives with the StackOverflow database that we use in our training classes

In our AWS lab, we have an m4.large (2 cores, 8GB RAM) VM with SQL Server 2005. We use that for testing behaviors – even though 2005 isn’t supported anymore, sometimes it’s helpful to hop in and see how things used to work.

I still use 2005 to create the dump because I want the widest possible number of folks to be able to use it. (This is the same reason I don’t make the database smaller with table compression – that’s an Enterprise Edition feature, and not everybody can use that.) You can attach this database to a SQL 2005, 2008, 2008R2, 2012, or 2014 instance and it’s immediately usable. Keep in mind, though, that it attaches at a 2005 or similar compatibility level. If you want 2014’s new cardinality estimator, you’ll need to set your compat level to 2014 after you attach the database.
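
If you do want the new cardinality estimator after attaching, something like the following should do it. This is a minimal sketch that assumes you attached the database under the name StackOverflow on a SQL Server 2014 instance, where compatibility level 120 is the 2014 level.

```sql
-- Check what compatibility level the database attached at.
SELECT name, compatibility_level
FROM sys.databases
WHERE name = N'StackOverflow';

-- Move it up to the SQL Server 2014 level (120) to get the new cardinality estimator.
ALTER DATABASE [StackOverflow] SET COMPATIBILITY_LEVEL = 120;
```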

I downloaded the Stack Exchange data dump on that 2005 VM. It’s a little confusing because the Archive.org page says it was uploaded on 1/21/2014, but that’s just the first date the file was published. The top update date of March 1, 2016 is the current version you’ll get if you use the download links at the top right of the page.

To make the import run faster, I shut the VM down, then changed its instance type to the largest supported m4 – an M4 Deca Extra Large with 40 cores and 160GB RAM for $4.91/hour – and booted it back up. (Don’t forget to revisit your SQL Server’s max memory, MAXDOP, and TempDB settings when you make changes like this.)

I created an empty StackOverflow database, then fired up the Stack Overflow Data Dump Importer (SODDI), an open source tool that reads the XML data dump files and does batch inserts into a SQL Server database. I pasted in a connection string pointing to my SQL Server – ConnectionStrings.com makes this easy – and off it went:

SODDI importing the StackOverflow XML dump

The import finished in about 25 minutes, although it turns out the extra cores didn’t really help here – SODDI is single-threaded per import file:

Using a few threads while we import a few files

After SODDI finished, I stopped the SQL Server service so I could access the ~95GB data and log files directly, then compressed them with 7-Zip set to ultra compression and 32 cores. This time the CPU usage told a different story:

Whammo, lots of active cores and 16+GB memory used

After creating the 7z file, I shut down the EC2 VM and adjusted it back down to m4.large. I created a torrent with uTorrent, then hopped over to my Whatbox. Whatbox sells seedboxes – virtual machines that stay online and seed your torrent for you. They’re relatively inexpensive – around $10-$30/mo depending on the plan, and I just go for unlimited traffic to make sure the database is always available.

To double-check my work, I fired up my home BitTorrent client, downloaded the torrent, extracted it, and attached the database in my home lab. Presto, working 95GB StackOverflow database.

Now, you can go grab our torrent of the SQL Server database version of the Stack Overflow data dump. Enjoy!

My Favorite Database Disaster Stories

Last Updated February 11, 2017

The statute of limitations has passed, so this week on SQL Server Radio, I get together with Guy Glantser and Matan Yungman to talk about our favorite oops moments.

I talked about my very first database disaster ever – done back when I was in my 20s and working for a photo studio, using Xenix, long before I ever thought I wanted to be a database administrator. (Yes, kids, Microsoft had their own Unix thirty years ago, and suddenly I feel really, really old. No, I wasn’t using it thirty years ago.)

This episode was so much fun because we recorded it in-person, together, gathered around a table in Tel Aviv when I was there for SQLSaturday Israel 2016. I really love talking to these guys, and I think you can hear how fun the chemistry is on the podcast.

Head on over and listen to our disaster stories, and when you’re done, check out my classic post 9 Ways to Lose Your Data.

One weird trick for managing a bunch of servers

Last Updated April 28, 2016

Let’s face it, most people don’t have just one SQL Server

How many they tell Microsoft they have is another matter, but let the record show that I don’t condone licensing dishonesty. But going one step further, most places… Well, they’re ‘lucky’ if they have one DBA, never mind a team.

Everyone else: Give me your network people, your sysadmin, your huddled SAN group yearning to breathe free, the wretched refuse of your teeming developers.

Doing things on one server is aggravating enough. Doing things on a bunch of servers is even worse. Given some of today’s HA/DR features (I’m looking at you, Availability Groups, with your lack of a mechanism to sync anything outside of user databases. Rude.) people are more and more likely to have lots of SQL Servers that they need to tend to.

Sometimes just keeping track of them is impossible. If you’re one guy with 20 servers, have fun scrolling through the connection list in SSMS trying to remember which one is which. Because people name things well, right? Here’s SQLVM27\Instance1, SQLVM27\Instance2, SQLVM27\Instance3, and that old legacy accounting database is around here somewhere.

Register it and forget it

But don’t actually forget it. If you forget it and it goes offline, people will look at you funny. Turns out people don’t like offline servers much.

So what’s someone to do with all these servers? Register them! Hidden deep in the View menu of SSMS is the Registered Servers window.

It will look pretty barren at first, just an empty folder. But you’ll fill it up quick, I’m sure. Can never have enough servers around, you know.

It’s pretty easy to populate: you can right-click on the Local Server Group folder, or on servers you’re connected to in Object Explorer.

Either way, you get the same dialog box to add a server in. You can give it a friendly name if you want! Maybe WIN03-SQL05\Misc doesn’t tell a good story.

And if you hip and hop over to the Connection Properties tab, you can set all sorts of nifty stuff up. The biggest one for me was giving different types of servers different colors for the bar at the bottom of SSMS. It’s the one you’re probably looking at now that’s a putrid yellow-ish color and tells you you’re connected and that your query has been executing for three hours. Reassuring. Anyway, I’d use this to differentiate dev from prod servers. Just make sure to choose light colors, because the black text doesn’t show up on dark colors too well.

Another piece of advice here is not to mix servers on different major (and sometimes minor) versions. The reason is that this feature gives you the ability to query multiple servers at once. If you’re looking at DMVs, they can have different columns in them, and you’ll just get an error. Even a simple query to sys.databases will throw you a bonk between 2012 and 2014.

I changed my mind. I hate planets.

Even if you’re running 2008R2, there are some pretty big differences in DMVs between SP1 and SP3. Microsoft has been known to change stuff in CUs (I’m looking at you, Extended Events).

On the plus side, you can use your multi-server connection to SELECT @@VERSION to help you decide how you should group them. If they have something better in common, like participating in Log Shipping, Mirroring, an AG, etc., all the better.
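
A slightly more structured take on SELECT @@VERSION, as a sketch: run it from a multi-server query window and sort the results to see which servers belong together. As far as I know, SSMS adds a server name column to multi-server results by default, so the query doesn’t ask for one.

```sql
-- Run against a Registered Servers group to compare versions side by side.
SELECT SERVERPROPERTY('ProductVersion') AS ProductVersion, -- e.g. 12.0.x for 2014
       SERVERPROPERTY('ProductLevel')   AS ProductLevel,   -- RTM, SP1, SP2...
       SERVERPROPERTY('Edition')        AS Edition,
       @@VERSION                        AS FullVersionString;
```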

Insight

But my favorite thing, because I was a devotee to the Blitz line of stored procedures even before I got paid to like them, was that I could install them on ALL OF MY SERVERS AT ONCE! This was especially useful when updates came out. You know what it’s like to put a stored proc on 20 servers one at a time? Geeeeeet outta here!

Check that out. It’s on both of my servers. At once. That means simultaneously, FYI. If you have a DBA or Admin database that you keep on all your servers to hold your fancy pants scripts and tools, this is an awesome way to make sure they all have the latest and greatest.

You’re already better at your job

Even though this feature came out in 2008, I hardly see anyone using it. I found it really helpful for comparing indexes and query plans across app servers that held different client data. It also exposes far less than Linked Servers; you need to worry less about access and level of privilege.

Just don’t forget to export your list if you change laptops!

Thanks for reading!

When Should You Hire a Consultant for Amazon RDS?

Last Updated February 9, 2017

You’re hosting your SQL Server databases in Amazon RDS, and performance has been getting slower over time. You’re not sure if it’s storage IOPs, instance size, SQL Server configuration, queries, or indexes. What’s the easiest way to find out?

Ask a few questions:

Are you using SQL Server Enterprise Edition? The smallest EE server, a db.r3.2xlarge, costs about $4,241 per month on demand (which isn’t the cheapest way to buy, of course). The next smallest doubles in cost, which means if the performance tuning efforts could drop you by just one single instance size, a consulting engagement would pay for itself well within two months.

Are you using mirroring for multi-AZ protection? If so, the delays required for sequential writes between availability zones may be your biggest bottleneck for inserts, updates, and deletes. Check your wait types with sp_BlitzFirst, and if the top ones are database mirroring, then the data changes aren’t likely to get faster with hardware tuning. Increased IOPs might help – but it takes deeper digging to get to that conclusion. It’s time to look at reducing your change rate in the database. If, on the other hand, your biggest bottleneck consists of select queries, consulting can help.

Are you locked into a long-reserved instance? You can sell reserved EC2 instances on the secondary market, but you can’t sell RDS instances as of this writing. If you’re having performance problems on it, this is definitely a time to call for consulting help fast. You want to avoid dumping the smaller instance and jumping into another commitment if a growing customer base or slowing code base could mean yet another instance type change.

Or are you running a single Standard Edition instance in just one AZ? Try standing up another RDS instance – but this time with the largest Standard Edition instance type you can get, around $12/hour as of this writing. Run the same types of queries against it, and within a couple hundred bucks of experimentation, you can get an idea of whether or not hardware will be a cheap enough solution. Granted – your time isn’t free – but it’s cheaper than a consulting engagement.

These questions help you figure out when it’s just cheaper to throw more virtual hardware at it.
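
For the mirroring question above, here’s a rough sketch of checking whether database mirroring waits dominate. The first statement assumes you have sp_BlitzFirst installed; the second is a plain DMV query that needs VIEW SERVER STATE permission. Both report waits accumulated since the instance last started, so a recent restart will skew them.

```sql
-- With the First Responder Kit installed: waits since startup.
EXEC sp_BlitzFirst @SinceStartup = 1;

-- Without it: the top waits straight from the DMV.
-- This includes idle background waits, so read the list with some skepticism.
-- Mirroring-related wait types start with DBMIRROR.
SELECT TOP (10)
       wait_type,
       wait_time_ms,
       waiting_tasks_count
FROM sys.dm_os_wait_stats
WHERE wait_time_ms > 0
ORDER BY wait_time_ms DESC;
```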

A quick tip for working with large scripts in SSMS

Last Updated April 21, 2016

Sometimes great things stare you in the face for years

Sometimes they stare at you for so long that you stop noticing them. This isn’t a romdramadey tag line. There are just so many buttons to push. Sometimes pressing them is a horrible idea and it breaks everything. I lose my mind every time I go to move a tab and I end up undocking a window. This won’t save you from that, but it will at least save your mouse scroll wheel.

This has helped me out a whole bunch of times, and especially, recently, when contributing code to our Blitz* line of stored procedures. Navigating all around large scripts to change variables or whatever is a horrible nuisance. Or it can be. When the stored procedure is like 3000 lines and there’s a bunch of dynamic SQL and… yeah. Anyway. Buttons!

Personality Crisis

There’s a little button in the top right corner of SSMS. Rather unassuming. Blends right in. What is it? Fast scroll? Some kind of ruin all your settings and crash your computer button? Delete all the scripts you’ve been working on for the past 6 months?

No! It’s a splitter, screen splitter! Guaranteed to blow your mind! Anytime!

If you drag it up and down, you can alter the visible portion of the screen, and scroll and zoom in each pane independently.

There you have it

Next time you need to work with a huge script and find yourself scrolling around like a lunatic, remember this post!

Thanks for reading!

[Video] Office Hours 2016/04/20 – Now With Transcriptions

Last Updated April 9, 2017

Here’s the video on YouTube:

This week, Brent, Erik, Jessica, Richie, and Tara discuss database modeling tools, how to learn about database corruption, the new cardinality estimator, and the only question that will be on our certification exams.

You can register to attend next week’s Office Hours, or subscribe to our podcast to listen on the go.

What ERD Tool Should I Use?

Jessica Connors: All right, question from Lee. Let’s jump in there. He says, “We have a system where the vendor went belly up. Now I am tasked with getting an ERD for the database so we can move it to a new system. What tools, free if possible, would you suggest?”

Richie Rump: Oh, uh-

Tara Kizer: I mean, Management Studio has it built in, the ERD, but I’ve always, anytime I’ve used the ERDs, I’ve always used Erwin or whatever.

Richie Rump: Technically, that’s not an ERD. That’s just a diagram, right?

Tara Kizer: Yeah.

Richie Rump: An ERD is an all-encompassing tool that will have, it’s essentially like a CASE tool. Did I just go CASE? Yeah. You can generate databases from it. You can reverse engineer it. I think Visio still could do it for free in a pinch, which I’ve done, but my favorites are Embarcadero ER Studio and Erwin, still. They both have their pluses and minuses. Both of them have the same minus, which is a very large price tag. I have actually purchased my own copy of ER Studio because I do data design a lot, and I’m kind of crazy that way.

Brent Ozar: I want to throw something else weird out there. If the vendor went belly up, the diagramming or ERD tool is going to be the least of your worries. Holy cow, buckle up. You’re going to be developing the bejeezus out of that database. If you’re taking it over from here on out, go buy the tool because this is your new career. This is what you’re going to be working on.

Tara Kizer: Hopefully, they have the source code [inaudible 00:03:29]

Richie Rump: Wow.

Brent Ozar: Oh, that would suck.

Jessica Connors: Yeah, what happens when vendors just die like that? It’s just like, oh, sorry customer. You can have the product, but we’re not iterating on it. We’re done.

Brent Ozar: That was my life in 2000 to 2001. We had a help desk product where the vendor was like, yeah, no, we’re not doing this anymore. I had to build a new web front end to their existing database and gradually like change things over. That was two years of my life that I would really like back. The hilarious part is that the company that I worked for then still uses that crappy web help desk that I built because they lost the source code after I left.

How Should I Troubleshoot a Big, Slow Stored Procedure?

Jessica Connors: Oh boy. All right, James says, “I have an SP that has 53 queries in it. What is the best way to approach troubleshooting this slow query?”

Erik Darling: After the execution plan-

Tara Kizer: Execution plan, SET STATISTICS IO.

Erik Darling: Then, use it in SQL Sentry Plan Explorer. It makes it really easy to see which statement in the stored procedure takes up the highest percentage of the work.

Brent Ozar: The other thing you can do if you want to play with it is take it over to development, like restore a copy of the production database. Blow your plan cache, run DBCC FREEPROCCACHE. I emphasize, you’re going to do this in development. You’re not going to do it in production. Then, immediately after you free the plan cache, run the query and use sp_BlitzCache. sp_BlitzCache will show you which lines in the stored procedure do the most reads, CPU, run time, whatever.
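
The steps Brent walks through might look something like this. It’s a sketch for a development server only; dbo.YourSlowProcedure is a made-up name standing in for the 53-query procedure, and it assumes sp_BlitzCache is installed.

```sql
-- DEVELOPMENT ONLY: this throws away every cached plan on the instance.
DBCC FREEPROCCACHE;

-- Run the procedure once so its statements are the only thing in the plan cache.
-- dbo.YourSlowProcedure is a hypothetical name; add whatever parameters yours needs.
EXEC dbo.YourSlowProcedure;

-- Look at the cache from a few angles: reads, CPU, and duration.
EXEC sp_BlitzCache @SortOrder = 'reads';
EXEC sp_BlitzCache @SortOrder = 'cpu';
EXEC sp_BlitzCache @SortOrder = 'duration';
```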

Should I Update Stats When I Change Compatibility Levels?

Jessica Connors: Question from Tom Towns, I haven’t gotten this one. It says, “Is it necessary or advisable to rebuild indexes/stats when downgrading compatibility level on a database but staying on the same version of SQL Server?” He’s running SQL Server 2014.

Erik Darling: No, not necessarily, but you should be doing regular maintenance on your databases anyway.

Brent Ozar: That’s actually a good question because it comes from the history of when you went up with SQL Server, there was a lot of advice around. You should update your stats, and I don’t think, for a while there, we were trying to find the root of that, like where that advice came from.

Erik Darling: My best guess on that is that there are slight tweaks and twinges to the cardinality estimator, and when you update versions, that doesn’t necessarily kick in just by restoring the database. Updating the stats just kind of helps push things to the whatever, new ideas that cardinality estimator has for your data.

Will Brent Ozar Unlimited Have a Certification Program?

Jessica Connors: Let’s see here, you guys are quiet today. In terms of certifications, would we ever think about giving our own Brent Ozar stamp of approval? I know we give out those certificates of completion with the funny little cartoons on there, but Brent, we never actually thought about giving our own certification course, classes, all of those things.

Brent Ozar: It would really just consist of, can you write us a check for 200 dollars, yes? You are a member of a very prestigious club. I bet if we all got together and focused, we could probably put together a test that would do it. The problem is, it’s really expensive. It takes a lot of time to do it because you’ve got to do it in a way, I was part of a team that helped build the Microsoft Certified Master, the next version that never ended up going public. I learned so much from Microsoft around that process where the questions have to be legally defensible. They can’t be biased towards English speaking people. They can’t have cultural issues, like you can’t take in certain business knowledge or assume certain cultural knowledge. It needs to be a fair playing field for anybody who touches SQL Server. That’s really hard. I have a lot of respect for people who write certification exams, but I agree with you. The Microsoft one sucked pretty bad.

Richie and I talked about that in the Away from the Keyboard podcast.

Richie Rump: Yeah, on Away from the Keyboard. I think the name of the episode was Brent Ozar Loses His MVP.

Brent Ozar: Yeah. I trash talked the bejeezus out of that, but yeah.

Richie Rump: And yet, we got renewed. He’s trying, people. I don’t understand.

Brent Ozar: We don’t even have tests in our classes. What we end up doing is we have group discussions. I’ll give you a homework assignment, and everybody works together at the same time, like some of the assignments you do by yourself. Some of them you do in groups. Even that, just the discussions of people saying afterwards, “I think this answer is right,” “I think this answer is right,” can take hours, so it’s pretty tricky.

Should I Install Multiple Instances On One Server?

Jessica Connors: Let’s see here, Steve. Move on to Steve. He says, “We are being given a virtual server to install SQL 2014. Which would be better, install one instance of SQL with all of our databases or several instances of SQL server with fewer databases on each instance?”

Brent Ozar: We’re all trying to be polite, I think.

Tara Kizer: Is this all for one box? If it’s for one box, we don’t recommend stacking instances. One instance per box.

Brent Ozar: Why don’t we recommend stacking instances?

Tara Kizer: I mean, you have to determine the max memory setting for them, and you might configure it not optimally for one instance. Another instance might need more memory. You might be playing that game and just fighting for resources. How many databases are we talking about? Are we talking about just 40 or 50? Are they all critical? Are they all non-critical? Can you have another virtual server where you can put some databases on that one and other databases on the other?

Erik Darling: A lot of what I would use to sort of dictate where I’m going to put things is RPO and RTO. If you have a group of databases that all have 24/7 uptime, higher level requirements and say like bug tracking or help desk or something along those lines, something you can put off, stuff that needs high availability, stuff that needs better back ups, things like that, things that need more horsepower behind them, I would put those on a server or servers where I’m going to be able to pay special attention to them. I would put some of the less needy stuff on other servers that I can sort of stick in a corner and forget about and not worry about performance and not worry about any performance in one stepping on the toes of another.

How Do I Get Rid of Key Lookups?

Jessica Connors: All right, question from [inaudible 00:10:51]. She reads this, “My key lookup is still showing after the column was added in a covered index. Anything I could do to avoid the key look up?”

Brent Ozar: We’ve seen this with a few. You want to look a little deeper with the execution plan. Sometimes there is something called a residual predicate, where there’s other fields that are being looked up, not just the key. When you hover your mouse over the key lookup, look for two things, the predicate and the output. Maybe SQL Server is seeking for some other kinds of fields or it’s outputting other kinds of fields. If there is no predicate and no output, use SQL Sentry Plan Explorer and then anonymize that plan and post it online. There are known issues, I’ve seen them where I can’t get rid of the lookup, and sometimes there’s gurus on there like Paul White who can take a look at the plan and go, oh, of course, it’s the bug right here.

Erik Darling: That happened to me sort of recently. I was helping a client get some computed columns together for a table, and for some reason, we added the computed columns. We started referencing the computed columns, but in the execution plan, there was still a key lookup for all of the columns that made the computed column work, right? We had five or six columns that added up, made a computed column. We added the computed column to the included columns of the index, but it still wanted all of the columns that made up the computed column for some reason. That was a very weird day.

Tara Kizer: What did you do to fix it?

Erik Darling: Added the columns it was asking for in the output columns and the key lookup went away. It was very weird though. It was something that I tried to reproduce on 2012 with 2014, and I couldn’t get it to happen.
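
To make Erik’s workaround concrete, here’s a small sketch. The table, columns, and index name are all made up for illustration, and there’s no guarantee this reproduces the odd behavior he saw; it only shows the shape of the fix, which is including the computed column’s underlying columns alongside the computed column itself.

```sql
-- Hypothetical table with a computed column built from two base columns.
CREATE TABLE dbo.OrderLines
(
    OrderLineId INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
    OrderId     INT   NOT NULL,
    Quantity    INT   NOT NULL,
    UnitPrice   MONEY NOT NULL,
    LineTotal   AS (Quantity * UnitPrice)
);

-- Including only LineTotal should cover the query below. If the plan still
-- shows a key lookup whose output list names Quantity and UnitPrice,
-- including those base columns too is the fix described above.
CREATE INDEX IX_OrderLines_OrderId
    ON dbo.OrderLines (OrderId)
    INCLUDE (LineTotal, Quantity, UnitPrice);

-- The kind of query the index is meant to cover.
SELECT OrderId, LineTotal
FROM dbo.OrderLines
WHERE OrderId = 42;
```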

Richie Rump: He stared at it, and-

Erik Darling: Intimidated it.

Richie Rump: Changed.

Brent Ozar: Flexed.

How Do I Learn About Database Corruption?

Jessica Connors: Database corruption, we haven’t talked about that.

Richie Rump: Ew.

Jessica Connors: That sounds fun.

Brent Ozar: It is fun. It is actually fun.

Jessica Connors: Is it? It reminds me- [crosstalk 00:13:05] Now I’m thinking about that song by the Beastie Boys.

Brent Ozar: Sabotage?

Jessica Connors: Yeah, that’s the one. Let’s see, it’s from Garrett. He says, “What’s the best way to learn about how to resolve database corruption with zero data loss?”

Brent Ozar: Steve Stedman’s database corruption challenge. If you search for Steve Stedman, it’s just S-T-E-D-M-A-N, database corruption challenge, he had like 10 weeks of challenges with different sample databases that were corrupt. You would go download them, and you would figure out how to fix them. The quizzes are kind of two part. You download the sample database, and then you go try to figure out how to recover as much data as you can, with zero data loss. Then, you go read how other people did it. They share their answers and show how fast they were able to get the data back.

Erik Darling: Get real comfortable with some weird DBCC commands.

Brent Ozar: Forcing query hints to get data from some indexes and not others, oh it is awesome. Even if you don’t do it, just reading how hard it is is awesome.

Erik Darling: You know what always gets me is when you have to find binary, but it’s byte reversed, so there’s like stuff with DBCC WRITEPAGE, and oh forget it.

Brent Ozar: Screw that. No way.

Erik Darling: Call someone else.

Brent Ozar: It’s really-

Jessica Connors: All of you, you’ve all dealt with this before? Database-

Brent Ozar: I’ve never dealt with it in real life. I don’t like-

Tara Kizer: I’ve done it two or three times. Do I like it? No. It was probably about five years ago. I wasn’t the on-call DBA. Two DBAs got woken up, and this was on a weekend. By seven in the morning, they hadn’t slept yet. They couldn’t even think clearly at this point, so that’s when they called me to take over for the next several hours. We ended up having to restore the database, and we had data loss, and it was all due to the SAN. We had some SAN hardware issues. It was bad. It was really bad.

Brent Ozar: It’s one of those where, like you can go your whole career and never see it, but if you have storage that isn’t bulletproof reliable, or if you have a team that like screws around and cuts corners, then you spend years of your life dealing with that.

Should I Feel Bad for Not Using the Current Compat Mode?

Jessica Connors: Question from Mandy. She says, “Hi. We upgraded to 2014 Standard Edition on new hardware a couple of months ago, but left our databases in 2012 compatibility mode. A few weeks ago, I upped the compatibility mode to 2014 and afterwards had such major contention and blocking problems, we had to change it back to 2012 after about 30 minutes. Is this common? What can we look for in our database to resolve this?”

Brent Ozar: It would be more common if more people were upgrading.

Erik Darling: Everyone’s still on 2008.

Tara Kizer: Is that the cardinality estimator, probably doing that.

Erik Darling: Yeah, it sounds like that kicking in.

Tara Kizer: Is there a different way to turn it off and you’ll still have your compatibility level be 2014?

Brent Ozar: This is tricky. There are trace flags, and you can turn it off at the query level, but Mandy, you were bang on. You’re perfect to wait a couple of weeks. That’s exactly what you want to do. Then, you want to flip it on a weekend, like on a Saturday morning. Let it go for just 10 minutes or 15 minutes, but as soon as you can, run sp_BlitzCache and gather the most CPU-intensive query plans, most read-intensive query plans, and the longest running query plans with sp_BlitzCache. You’ve got to save those plans out to files, and then switch back to 2012 compatibility mode so that you’re back on the old CE. Then, you take the next week or two troubleshooting these query plans to figure out, what is it about these queries that suck. Is it the new cardinality estimator? So often, people just test their worst queries with the new 2014 CE, not understanding you need to test all of the queries, because the ones that used to be awesome can suddenly start sucking with the new CE.

Erik Darling: Or, in a couple of months, just upgrade to the new 2016 and use the query data store and then just bang that right off. Done.

Brent Ozar: What’s the query data store?

Erik Darling: It’s this neat thing. It’s like a black box for SQL Server. What it does, is it basically logs all of your queries. It’s a good way for cases like you where you had good query plans that were working, and then all of a sudden, something changed and you had bad query plans. It gives you a way to look at and find regressions in query plans and then force the query plan that was working before so that it uses that one rather than trying to mess around and just use the new one. Otherwise, you’re kind of stuck, like you are on 2014, where you could run, switch to the new cardinality estimator, but then you would have to use some trace flags on the specific queries to force the old cardinality estimator, which is not fun. I don’t recall those trace flags off the top of my head. I never do.
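
For reference, the query-level trace flags being discussed are 9481 (use the legacy cardinality estimator) and 2312 (use the new one). A minimal sketch follows, with a made-up table and filter; note that QUERYTRACEON normally needs sysadmin permissions unless it’s applied through a plan guide.

```sql
-- Database is at compatibility level 120 (new CE), but this one query
-- is pushed back to the legacy estimator.
-- dbo.Orders and its columns are hypothetical.
SELECT o.OrderId, o.OrderDate
FROM dbo.Orders AS o
WHERE o.CustomerId = 12345
OPTION (QUERYTRACEON 9481);
```

Going the other way, a database left at the 2012 compatibility level can try the new estimator on a single query with OPTION (QUERYTRACEON 2312).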

Richie Rump: So it’s kind of a flight recorder for queries? Is that about right?

Brent Ozar: Listening in on the cockpit catches people doing things they shouldn’t be doing.

Erik Darling: I was having a drinking contest.

Does the New Cardinality Estimator Get Used Automatically in 2014?

Jessica Connors: Let’s see, question while we’re on the topic of the cardinality estimator. Question from Nate, we may have answered this. “Speaking of the new CE for 2014, does it automatically get used for everything or only for the DBs in the 2014 compatibility level, or is it off by default, and it’s up to the DBA to turn it on at their discretion?”

Erik Darling: Or indiscretion.

Brent Ozar: Yeah, if you’re upgrading a SQL Server, or if you’re attaching databases to a 2014 SQL Server, the new cardinality estimator doesn’t get used by default. It does get used if you create new databases, and they’re in the 2014 compat mode. Yeah, it’s totally okay to leave your databases in the old compat mode. You can leave them there as long as you want. There’s no gotchas with that. Totally okay, totally supported.

Jessica Connors: What if it’s in a 2005 compatibility mode?

Brent Ozar: What’s the oldest one that’s supported? I think it’s 2008.

Tara Kizer: I just looked yesterday, because a client had a question, and 2014 does have the 2005 compatibility level in the list. I was surprised. I was very surprised.

Brent Ozar: So generous.
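
If you’re not sure which databases are sitting at which level, a quick check like this one will tell you. It only reads sys.databases, so it’s safe to run anywhere.

```sql
-- Compatibility levels by database.
-- 90 = SQL Server 2005, 100 = 2008 / 2008 R2, 110 = 2012, 120 = 2014
SELECT  [name],
        [compatibility_level]
FROM    sys.databases
ORDER BY [compatibility_level], [name];
```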

Should I Use “Optimize for Ad Hoc Workloads”?

Jessica Connors: Question from Robert, he says, “Optimize for ad hoc workloads instance setting, good to turn on by default or only when many ad hoc queries are frequent?”

Brent Ozar: This is tricky. You will read a lot of advice out there like you should turn it on by default, because it doesn’t have a drawback. I’m kind of like, if I see that, it doesn’t bother me. It just doesn’t usually help people unless they truly have a BI environment where every query comes out as a unique delicate snow flower. Have any of you guys ever seen problems with optimized for ad hoc?

Tara Kizer: Problems? No.

Richie Rump: No.

Erik Darling: I’ve never seen problems. You know, every once in a while, you’ll see someone with like 80 or 90% of their plan cache being single-use queries. At that point, that’s not really efficient use of the memory or what you’re storing in the plan cache.
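
To see whether you’re in that 80 or 90% club, something along these lines should give a rough picture of how much of the plan cache is single-use. It’s a sketch, not a health check: the numbers reset when the cache is cleared or the instance restarts, so look at them after the server has been up for a while.

```sql
-- How much of the plan cache is made up of plans that were used exactly once?
SELECT  [objtype],
        COUNT_BIG(*) AS [total_plans],
        SUM(CASE WHEN [usecounts] = 1 THEN 1 ELSE 0 END) AS [single_use_plans],
        SUM(CAST([size_in_bytes] AS BIGINT)) / 1024 / 1024 AS [total_mb],
        SUM(CASE WHEN [usecounts] = 1
                 THEN CAST([size_in_bytes] AS BIGINT)
                 ELSE 0 END) / 1024 / 1024 AS [single_use_mb]
FROM    sys.dm_exec_cached_plans
GROUP BY [objtype]
ORDER BY [single_use_mb] DESC;

-- If you decide to turn the setting on, it's an advanced instance-level option.
EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure N'optimize for ad hoc workloads', 1;
RECONFIGURE;
```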

Jessica Connors: Let’s see, question from Nate. We are still on compatibility mode questions. Last one, “Is there anything wrong with leaving all DBs in lower compat levels when you do a SQL Server migration to SQL Server 2014? Like, leave all the DBs at 2005 or 2008 R2 compatibility levels and then start upping them later when we have breathing room?”

Erik Darling: No.

Tara Kizer: You won’t get the new features.

Brent Ozar: Totally okay. Totally okay.

Tara Kizer: Eventually, you’re going to have to up them, because you’re going to be upgrading and you can’t go down to that level. I imagine that 2016 doesn’t go down to 2005.

Erik Darling: I mean, before you up the compatibility level, just make sure that you check Microsoft’s broken features and deprecated stuff and new reserved keywords. If your old databases are using keywords that have now become reserved, stuff could break pretty easily.
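
One way to get a head start on the deprecated stuff is to ask the instance which deprecated features it has actually seen in use. This is a partial check under a couple of assumptions: the counters are cumulative since the last restart, and they won’t tell you anything about newly reserved keywords, so you still need to read the upgrade documentation.

```sql
-- Deprecated features that have been used since the instance last started.
SELECT  RTRIM([instance_name]) AS [deprecated_feature],
        [cntr_value] AS [times_used]
FROM    sys.dm_os_performance_counters
WHERE   [object_name] LIKE N'%Deprecated Features%'
        AND [cntr_value] > 0
ORDER BY [cntr_value] DESC;
```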

When Will We Add More Training Videos?

Jessica Connors: James is getting sick of our same old training videos on our website. He is wondering when we will be adding more videos to the training library. If so, what? When? How?

Erik Darling: As soon as you pay us to make them.

Brent Ozar: Doug is doing a new one on advanced querying and indexing. It has a game show theme, back from a ’70s game show. We are aiming to have that one online in June. To give you a rough idea, it takes about three months worth of work to do a six hour class to the level of Doug’s videos. If you’ve watched T-SQL Level Up, it’s phenomenal production value. It takes three months worth of work to do a six hour video.

Brent Ozar: For my level of production values, where it’s a guy talking in front of a green screen, it’s usually about a month. I’m working on performance tuning when you can’t fix the queries. That one will be out probably in June or July, somewhere in there. The trick with that is, the Everything Bundle price will go up at that time. If you’re interested in getting an Everything Bundle, I would do that this month. It will include any videos that we add during the course of your 18-month ownership. This month, the Everything Bundle is on sale, even further than half off. Now, it’s just $449 for 18 months of access to all of our videos. That is a one-month-only sale to celebrate our 5th anniversary. After that, you are right back up to 899.

Jessica Connors: Then, what’s it going to go up to?

Brent Ozar: It depends. I think we have two more 299 videos. I wouldn’t be surprised if we pushed it to 999.

Jessica Connors: Ah.

Richie Rump: 999? That’s still a deal.

Brent Ozar: Richie says in his sales person voice.

Why Does sp_BlitzIndex v3 Have Less Output?

Jessica Connors: All right, question from Justin. Hello, Justin. He says, “sp_BlitzIndex isn’t recognizing new indexes on a table where I deleted all of the indexes and ran replay trace against, but SSMS is recommending some. So is a third party software. Any idea what would cause this?”

Brent Ozar: I bet you’re running a brand new version of sp_BlitzIndex, version three, that just came out where we ignore crappy little tables that don’t really have that much of a difference in terms of performance. If you want the crappy little tables version, you can run the older version of sp_BlitzIndex, which is included in the download pack, or use it with @Mode = 4, which does the fine grained analysis that we used to do. Just know that you’re probably not going to have that much of a performance improvement, not if they’re tiny little tables, tiny little indexes.
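
For reference, the fine-grained run Brent mentions could be called something like this. [YourDatabase] is a placeholder, and this assumes sp_BlitzIndex version 3 or later is installed where you can call it.

```sql
-- Mode 4 brings back the detailed diagnosis, including the small tables
-- that the default output skips.
EXEC dbo.sp_BlitzIndex
    @DatabaseName = N'YourDatabase',
    @Mode = 4;
```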

Should I Use Somebody’s HA/DR Software on My SQL Server?

Jessica Connors: Yeah, he said that he was using the new one. Brent, or anyone, has anyone here heard of DH2I enterprise contain HA solution, and what is your opinion? Have you heard of that? It’s from-

Brent Ozar: I have heard of it. I don’t know anybody using it. I don’t know if you guys have either. I have this whole thing where if I’m going to install software that’s supposed to make my stuff more highly available, my whole team better know how to use it, and it better be like really well tested. I need to be able to talk to other people and get training on how it works. I would just talk to other people who use it and ask for hey, can you give me a rundown of what issues you’ve had over the last 12 months. Make sure they’re a similarly sized customer to you, like if you’ve got 100 servers, the person you talk to should have 100 servers.

Erik Darling: What I find helpful is that a lot of vendors like this will have help forums. What I like to do is just read through the help forums and see what kind of problems people are facing, kind of common questions and see if there’s any answers there that kind of help me figure out if this is right for me.

Jessica Connors: At any of the help forums, have any of you guys seen people giving feedback like, do not buy this product. This is terrible.

Erik Darling: You know, you see some questions that are like X product stopped working and this broke and this stopped working and had major outage. What do I do?

Brent Ozar: And you look at the time lag between answers, like how long it takes to get it fixed.

Erik Darling: And then like, no actual company support people are chiming in. It’s all other users that had the same problem.

Tara Kizer: I just don’t know if I would want to use a product that isn’t widely used in industry. Do you want to be the first customer using this product or the first five? I want to use the products everyone else is using.

Jessica Connors: We are doing some upgrades to our CRM right now, and there’s little things I want to change to make it work, and the engineer is sending me, basically these forums of people like, I really want this feature. It’s never going to be turned on. This piece doesn’t work, and then like it’s from five years ago, four years ago.

Brent Ozar: Utter silence. There’s also a follow up question from someone that says, oh, so and so did an online session on that. Just know that often, consultants who are paid by vendors, will do sessions going, hey, this tool is amazing. I would be glad to help you install it. They are selling the software and they are selling their consulting. Look at it just the same as you would an infomercial. I would ask to talk to the customers using it, not to the people you are going to pay money to.

Jessica Connors: Have we done that? Has anyone ever come up to us with another product like, hey, can you do a webcast for their product?

Brent Ozar: Oh, all the time. I’ll be like, because we do webcasts too for like Dell and Idera and all this. I’m going to talk about the native way of doing it, like the pain points of how you do it without software. If you want to talk about how your software does it, that’s totally cool. I can’t talk about your software, because I don’t use it. I just have to jump from one client to another. Every now and then, people will say, “Here’s a bucket of money. How much of this bucket would you like in order to say our product is awesome?” I’m like no. I only have one reputation. What happens if the product sucks butt wind? I will review privately if you want to spend three or four days hammering on your software and see how it works. We would be glad to do it and then give you a private review. I’m not going to go public and say that it’s awesome when it smells like…

Richie Rump: Erik and I will do it. That’s not a problem.

Brent Ozar: Yes.

Erik Darling: My reputation stinks anyway.

What’s the Ideal HA/DR Solution for 5-10 SQL Servers?

Jessica Connors: Let’s see, just for fun. Question if you run out of things to answer. This is really open-ended. What is your ideal HA and DR solution for a SQL Server environment with five to ten instances. That depends.

Brent Ozar: No, we should each answer that. Tara, what’s your ideal for five to ten instances? Say you’ve got one DBA.

Tara Kizer: I don’t know how to answer that. I’ve never worked in an environment where I was the only DBA. I have always worked in an environment where there was probably three to eight DBAs. Availability groups is my answer for almost everything. You say HA, I say availability groups. I know you guys don’t like that, but that’s what we implemented. We had large DBA teams. We had larger server teams that understood Windows, clustering, all that stuff. It works well if you have the people that know the clustering, you know, all the features.

Brent Ozar: Tara doesn’t get out of bed for less than 10,000 dollars. Erik, how about you?

Erik Darling: For me, if you just have one to two DBAs, but you may have some other support staff, I would say failover clustering and then something like either async mirroring or log shipping. It’s usually pretty decent for those teams. It’s pretty manageable for most people who don’t have the DBAs or the football jerseys on and the schedules and the tackle cards on almost everything.

Brent Ozar: Yeah, how about you, Richard?

Richie Rump: From a developer’s perspective, because I joined this team, and all of a sudden, I have a developer’s perspective. I love that. It’s Azure SQL Database, right?

Brent Ozar: Oh, you’re cheating.

Erik Darling: Wow.

Richie Rump: It’s all kind of baked in there, and I don’t have to think about it. A lot of it’s done for me. As a developer, I’m going to do the laziest, simplest way out. That would be it.

Brent Ozar: Man, you win points on that. That’s genius. That is pretty freaking smart. I would probably assume that they’re not mission critical if there is stuff that I could stand some down time on. I actually would probably go with just plain old VMware. I would just make them as single instances in virtualization. Then, that way do something like log shipping for disaster recovery, or VMware replication. Now, this is not high availability. It’s just availability. It’s just the A. If you cut me down to just like five SQL Server instances and no DBA or like one DBA who’s kind of winging it and spending most of his time on a free webcast, then I kind of like that. If not, I’m still kind of a fan of async mirroring too. Async mirroring is not bad.

Erik Darling: Much less of an albatross than its sync brother.

Brent Ozar: Yeah, that thing blows.

Jessica Connors: Cool.

Erik Darling: I’ll say one thing though, not replication.

Tara Kizer: Yeah.

Jessica Connors: So many people using replication out there.

Tara Kizer: Replication is mostly okay, but not as an HA or DR feature.

Jessica Connors: Ah, they’re using it for the wrong thing.

Tara Kizer: It’s really a reporting feature.

Brent Ozar: It’s funny to see, too, like all of us end up having these different answers, and this ends up being what our chat room is like. In the company chat room, like we all have access to everybody else’s files. We know what everybody is working on. Somebody can be like, Hey, I’m working with this client. Here is what they look like. Here’s what their strengths and challenges are. It’s really fun to bounce ideas off people and see what everybody comes up with.

Jessica Connors: Mm-hmm – well, all right, guys. It’s that time.

Erik Darling: Uh-oh.

Brent Ozar: Calling it an episode. Thanks everybody for hanging out with us, and we will see you next week.

Stats Week: Only Updating Statistics With Ola Hallengren’s Scripts

Last Updated June 17, 2017

Index Maintenance, Ola Hallengren's Database Maintenance Scripts

I hate rebuilding indexes

There. I said it. It’s not fun. I don’t care all that much for reorgs, either. They’re less intrusive, but man, that LOB compaction stuff can really be time consuming. What I do like is updating statistics. Doing that can be the kick in the bad plan pants that you need to get things running smoothly again.

I also really like Ola Hallengren’s free scripts for all your DBA broom and dustpan needs. Backups, DBCC CHECKDB, and Index and Statistics maintenance. Recently I was trying to only update statistics, and I found it a little trickier than I first imagined. So tricky, in fact, that I emailed Ola, and got a response that I printed and framed. Yes, the frame is made out of hearts. So what?

What was tricky about it?

Well, the IndexOptimize stored procedure has default values built in for index maintenance. This isn’t a bad thing, and I could have altered the stored procedure, but that would be mean. I set about trying to figure out how to get it to work on my own.

First, I tried only passing in statistics parameters.

EXEC [master].[dbo].[IndexOptimize]
    @Databases = N'USER_DATABASES' ,
    @UpdateStatistics = N'ALL' ,
    @OnlyModifiedStatistics = N'Y' ,
    @LogToTable = N'Y';

But because of the default values, it would also perform index maintenance. Sad face. So I tried being clever. Being clever gets you nowhere. What are the odds any index would be 100% fragmented? I mean, not even GUIDs… Okay, maybe GUIDs.

EXEC [master].[dbo].[IndexOptimize]
    @Databases = N'USER_DATABASES' ,
    @FragmentationMedium = N'INDEX_REORGANIZE' ,
    @FragmentationHigh = N'INDEX_REORGANIZE' ,
    @FragmentationLevel1 = 100 ,
    @FragmentationLevel2 = 100 ,
    @UpdateStatistics = N'ALL' ,
    @OnlyModifiedStatistics = N'Y' ,
    @LogToTable = N'Y';

But this throws an error. Why? Well, two reasons. First, 100 isn’t valid here, and second, you can’t have the same fragmentation level twice. It would screw up how commands get processed, and the routine wouldn’t know whether to use @FragmentationMedium or @FragmentationHigh. This makes sense.

Okay, so I can’t use 100, and I can’t set them both to 99. What to do? Let’s bring another parameter in: @PageCountLevel.

EXEC [master].[dbo].[IndexOptimize]
    @Databases = N'USER_DATABASES' ,
    @FragmentationMedium = N'INDEX_REORGANIZE' ,
    @FragmentationHigh = N'INDEX_REORGANIZE' ,
    @FragmentationLevel1 = 98 ,
    @FragmentationLevel2 = 99 ,
    @UpdateStatistics = N'ALL' ,
    @OnlyModifiedStatistics = N'Y' ,
    @PageCountLevel = 2147483647,
    @LogToTable = N'Y';

This seems safe, but it’s still not 100%. Even with the integer maximum passed in for the page count, it still felt hacky. Hackish. Higgity hack. The other part of the equation is that I don’t even want this thing THINKING about indexes. It will still look for indexes that meet these requirements. If your tables are big, you know, sys.dm_db_index_physical_stats can take foreeeeeeeeeeeeeeeeeeeeeeeeeeeever to run. That seems wasteful, if I’m not going to actually do anything with the information.

Hola, Ola

This is where I emailed Ola for advice. He responded pretty quickly, and here’s how you run stats only updates.

EXECUTE [dbo].[IndexOptimize]
    @Databases = 'USER_DATABASES' ,
    @FragmentationLow = NULL ,
    @FragmentationMedium = NULL ,
    @FragmentationHigh = NULL ,
    @UpdateStatistics = 'ALL' ,
    @OnlyModifiedStatistics = N'Y' ,
    @LogToTable = N'Y';
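
If you want to sanity check what that run will do (and what it did), here’s a rough pair of queries. The first looks for statistics with modifications since their last update, which is my understanding of what @OnlyModifiedStatistics keys off of. The second reads the log table that @LogToTable = 'Y' writes to. Assumptions: the first query runs in the user database you care about, and CommandLog lives in master next to IndexOptimize. Adjust if you installed the scripts somewhere else.

```sql
-- Statistics on user tables that have been modified since they were last updated.
SELECT  OBJECT_SCHEMA_NAME([s].[object_id]) AS [schema_name],
        OBJECT_NAME([s].[object_id]) AS [table_name],
        [s].[name] AS [stats_name],
        [sp].[last_updated],
        [sp].[rows],
        [sp].[modification_counter]
FROM    sys.stats AS [s]
CROSS APPLY sys.dm_db_stats_properties([s].[object_id], [s].[stats_id]) AS [sp]
WHERE   OBJECTPROPERTY([s].[object_id], 'IsUserTable') = 1
        AND [sp].[modification_counter] > 0
ORDER BY [sp].[modification_counter] DESC;

-- What IndexOptimize actually ran. With the NULL fragmentation settings,
-- you'd expect to see statistics updates here and no ALTER INDEX commands.
SELECT TOP (50)
        [DatabaseName],
        [ObjectName],
        [StatisticsName],
        [CommandType],
        [StartTime],
        [EndTime],
        [ErrorNumber]
FROM    [master].[dbo].[CommandLog]
ORDER BY [StartTime] DESC;
```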

Moral of the story

NULLs aren’t all bad! Sometimes they can be helpful. Other times, developers.

Thanks for reading!

Brent says: Subtitle: How many DBAs does it take to think of NULL as a usable option? Seriously, we all banged our heads against this one in the company chat room.

Breaking News: Query Store in All Editions of SQL Server 2016

Last Updated February 13, 2017

Execution Plans, Query Store (QDS), SQL Server

Bob Ward talking Query Store at SQL Intersection

Onstage at SQL Intersection in Orlando this morning, Bob Ward announced that Query Store will be available in all editions of SQL Server 2016.

This is awesome, because Query Store is a fantastic flight data recorder for your query execution plans. It’ll help you troubleshoot parameter sniffing issues, connection settings issues, plan regressions, bad stats, and much more.
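
Here’s a minimal sketch of what using it looks like on SQL Server 2016. [YourDatabase] is a placeholder, and the query_id and plan_id values in the last statement are made up: you’d plug in the ones you find in the second query.

```sql
-- Turn Query Store on for one database and make sure it's collecting.
ALTER DATABASE [YourDatabase] SET QUERY_STORE = ON;
ALTER DATABASE [YourDatabase] SET QUERY_STORE (OPERATION_MODE = READ_WRITE);

-- Run in that database: slowest plans by average duration (microseconds),
-- one row per plan per collection interval.
SELECT TOP (10)
        [q].[query_id],
        [p].[plan_id],
        [qt].[query_sql_text],
        [rs].[count_executions],
        [rs].[avg_duration]
FROM    sys.query_store_query AS [q]
JOIN    sys.query_store_query_text AS [qt]
        ON [qt].[query_text_id] = [q].[query_text_id]
JOIN    sys.query_store_plan AS [p]
        ON [p].[query_id] = [q].[query_id]
JOIN    sys.query_store_runtime_stats AS [rs]
        ON [rs].[plan_id] = [p].[plan_id]
ORDER BY [rs].[avg_duration] DESC;

-- Pin the plan that used to work. These IDs are hypothetical.
EXEC sys.sp_query_store_force_plan @query_id = 1, @plan_id = 1;
```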

I’m such a believer in Query Store that sp_Blitz® even warns you if Query Store is available, but isn’t turned on.

Wanna learn what it is and how to use it? Books Online’s section on Query Store is a good place to start learning, and check out Bob’s slide deck and resource scripts.

And oh yeah – Argenis Fernandez and I had a little bet. He bet that Query Store would be fully functional in Standard Edition, and I bet that it wouldn’t. I’ve never been happier to lose a bet, and I made a $500 donation to Doctors Without Borders this morning. Woohoo!

Update 4/21 – note the comment below from Bob Ward, who clarifies that this wasn’t quite ready for release yet, and feature decisions may not have been made yet.

Stats Week: Messin’ With Statistics

Last Updated March 23, 2017

SQL Server, Statistics

If there’s one thing living in Texas has taught me

It’s that people are very paranoid that you may Mess With it. Even in Austin, where the citizenry demand weirdness, they are vehemently opposed to any form of Messing, unless it results in mayonnaise-based dipping sauce.

Me? I like Messing With stuff. Today we’re going to look at one way you can make SQL think your tables are much bigger than they actually are, without wasting a bunch of disk space that has nearly the same price as wall to wall carpeting.

To do this, we’re going to venture deep into the Undocumented Command Zone. It’s like the Twilight Zone, except if you go there on your production server, you’ll probably end up getting fired. So, dev servers only here, people.

Creatine

Let’s make a table, stuff a little bit of data in it, and make some indexes.

IF OBJECT_ID('dbo.Stats_Test') IS NOT NULL
DROP TABLE [dbo].[Stats_Test];

;WITH E1(N) AS (
    SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL 
    SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL 
    SELECT NULL UNION ALL SELECT NULL UNION ALL SELECT NULL UNION ALL 
    SELECT NULL  ),                          
E2(N) AS (SELECT NULL FROM E1 a, E1 b, E1 c, E1 d, E1 e, E1 f, E1 g, E1 h, E1 i, E1 j),
Numbers AS (SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM E2)
SELECT  ISNULL([N].[N], 0) AS [ID] ,
    ISNULL(CONVERT(DATE, DATEADD(HOUR, -[N].[N], GETDATE())),     '1900-01-01') AS [OrderDate] ,
        ABS(CONVERT(NUMERIC(18,2), (CHECKSUM(NEWID()) % 10000.00))) AS [Amt1]
INTO [Stats_Test]
FROM    [Numbers] [N]
ORDER BY [N];

CREATE UNIQUE CLUSTERED INDEX [cx_id] ON [dbo].[Stats_Test] ([ID])
CREATE UNIQUE NONCLUSTERED INDEX [ix_test1] ON [dbo].[Stats_Test] ([OrderDate], [ID])
CREATE UNIQUE NONCLUSTERED INDEX [ix_test2] ON [dbo].[Stats_Test] ([Amt1], [ID])

There’s our thousand rows. If you’re dev testing against 1000 rows, your production data better only have 1001 rows in it, or you’re really gonna be screwed when your code hits real data. How do we cheat and make our data bigger without sacrificing disk space?

Eat Clen, Tren Hard, Anavar give up

You can update all statistics on the table at once, or target specific indexes with the following commands.

UPDATE STATISTICS [dbo].[Stats_Test] WITH ROWCOUNT = 10000000000, PAGECOUNT = 1000000000

UPDATE STATISTICS [dbo].[Stats_Test] ([cx_id]) WITH ROWCOUNT = 10000000000, PAGECOUNT = 1000000000
UPDATE STATISTICS [dbo].[Stats_Test] ([ix_test1]) WITH ROWCOUNT = 10000000000, PAGECOUNT = 1000000000
UPDATE STATISTICS [dbo].[Stats_Test] ([ix_test2]) WITH ROWCOUNT = 10000000000, PAGECOUNT = 1000000000

This will set your table row count to uh… 10 billion, and your page count to 1 billion. This makes sense, since usually a bunch of rows fit on a page. You can be more scientific about it than I was, this is just to give you an idea.

So let’s check in on our statistics! Sup with those?

DBCC SHOW_STATISTICS('dbo.Stats_Test', cx_id)

DBCC SHOW_STATISTICS('dbo.Stats_Test', ix_test1)

DBCC SHOW_STATISTICS('dbo.Stats_Test', ix_test2)

Hint: these commands will not show inflated page or row counts in them. They actually won’t show page counts at all. Hah. That’s kinda silly, though. Hm.
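
So where do the fake numbers go? As far as I can tell, they land in the partition metadata rather than in the statistics objects themselves. A query like this one is a reasonable place to look, with the hedge that this is undocumented territory and I’m assuming the counts surface here on your build the way they do on mine.

```sql
-- Row and page counts per index, as the metadata currently reports them.
SELECT  [i].[name] AS [index_name],
        [ps].[row_count],
        [ps].[in_row_data_page_count]
FROM    sys.dm_db_partition_stats AS [ps]
JOIN    sys.indexes AS [i]
        ON  [i].[object_id] = [ps].[object_id]
        AND [i].[index_id] = [ps].[index_id]
WHERE   [ps].[object_id] = OBJECT_ID(N'dbo.Stats_Test');
```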

Anyway, what we should grab from the statistics histograms are some middling values we can play with. For me, that’s an ID of 500, a date of 2016-03-18, and an amount of 4733.00.

One thing I’ve found is that the inflated counts don’t seem to change anything for Identities, or Primary Keys. You’ll always get very reasonable plans and estimates regardless of how high you set row and page counts for those. Regular old clustered indexes are fair game.

Some really interesting things can start to happen to execution plans when SQL thinks there’s this many rows in a table. The first is that SQL will use a rare (in my experience) plan choice: Index Intersection. You can think of this like a Key Lookup but with two nonclustered indexes rather than from one nonclustered index to the clustered index.

SELECT *
FROM [dbo].[Stats_Test] AS [st]
WHERE [st].[ID] = 500

SELECT *
FROM [dbo].[Stats_Test] AS [st]
WHERE [st].[OrderDate] = '2016-03-18'

SELECT *
FROM [dbo].[Stats_Test] AS [st]
WHERE [st].[Amt1] = 4733.00

For these equality queries, we get the following plans:

SQL isn’t fooled with an equality on 500. We get a little plan. We’ll examine inequality plans in a moment. For now let’s look at the middle plan. That’s where the Index Intersection is occurring. The bottom plan has a regular Key Lookup.

The costs and estimates here are Banana Town crazy. And right there down the bottom, we can see SQL using the Clustered Index key to join our Nonclustered Indexes together. If you’ve been reading this blog regularly, you should know that Clustered Index key columns are carried over to all your Nonclustered Indexes.

If we switch to inequality queries, well…

SELECT *
FROM [dbo].[Stats_Test] AS [st]
WHERE [st].[ID] > 500

SELECT *
FROM [dbo].[Stats_Test] AS [st]
WHERE [st].[OrderDate] > '2016-03-18'

SELECT *
FROM [dbo].[Stats_Test] AS [st]
WHERE [st].[Amt1] > 4733.00

The top query that SQL wasn’t fooled by before now has the same insane estimates as the others. Our two bottom queries get missing index requests due to the amount of work the Index Intersection takes.

It’s happening because of the SELECT * query pattern. This will go away if we stick to only selecting columns that are in our Nonclustered Indexes. For example, SELECT ID will result in some pretty sane index seeks occurring. The estimated rows are still way up there.
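
For comparison, here are the same two inequality predicates with only ID in the select list. Both nonclustered indexes already carry ID, so each one covers its query on its own.

```sql
-- ix_test1 is on (OrderDate, ID), so it covers this query by itself.
SELECT [st].[ID]
FROM [dbo].[Stats_Test] AS [st]
WHERE [st].[OrderDate] > '2016-03-18';

-- ix_test2 is on (Amt1, ID), so it covers this one.
SELECT [st].[ID]
FROM [dbo].[Stats_Test] AS [st]
WHERE [st].[Amt1] > 4733.00;
```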

Unfortunately, STATISTICS TIME and IO are not fooled by our statistical tomfoolery.

They use reality-based measurements of our query activity. This trick is really only useful to see what happens to execution plans. But hey it’s a lot cheaper, easier, and faster than actually inserting 10 billion rows.

So what?

Like a grandma in a grocery store, SQL Server makes all its decisions based on cost. Whatever is cheapest is choice. If SQL Server were a person, it would probably wash and dry used aluminum foil, save old bread ties, and use clothes pins for the right thing.

I forget what I was going to say. Probably something smart about testing your queries against sets of data commensurate to what you have in production (or larger) so that you don’t get caught flatfooted by perf issues on code releases, or if your company finally starts getting customers. This is one technique to see how SQL will treat your code as you get more rows and pages involved.

Just don’t forget to set things back when you’re done. A regular stats update will take care of that.
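
If the counts don’t snap back for you, DBCC UPDATEUSAGE is the documented command for correcting row and page count inaccuracies, so it’s worth having in your back pocket. This is a fallback sketch rather than something the test above required, and the same dev-servers-only warning applies.

```sql
-- Recount rows and pages for the test table in the current database (0).
DBCC UPDATEUSAGE (0, 'dbo.Stats_Test') WITH COUNT_ROWS;

-- Compare reality with what the metadata says now.
SELECT COUNT_BIG(*) AS [actual_rows]
FROM [dbo].[Stats_Test];

SELECT [index_id], [rows]
FROM sys.partitions
WHERE [object_id] = OBJECT_ID(N'dbo.Stats_Test');
```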

Thanks for reading!

Stats Week: Do Query Predicates Affect Histogram Step Creation?

Last Updated May 26, 2020

Execution Plans, Parameter Sniffing, SQL Server

Auto Create Statistics is your friend

It’s not perfect, but 99% of the time I’d rather have imperfect statistics than no statistics. This question struck me as interesting, because the optimizer will totally sniff parameters to compile an initial plan. If you don’t have index statistics, or system statistics already on a column in a WHERE clause, SQL is generally kind enough to create a statistics object for you when the query is compiled.

So I thought to myself: Would SQL create an initial histogram based on the compile-time parameter? It might be nice if it did, since it could potentially get the best possible information about predicate cardinality from a direct hit on a histogram step.

Here’s a quick test that shows, no, SQL doesn’t give half a care about that. It creates the same histogram no matter what. 1000 rows should do the trick. I’m making both columns NOT NULL here, because I want to make one my PK, and I want to make sure there’s no histogram step for NULL values in the other. I’m not going to index my date column here, I’m going to let SQL generate statistics automatically.

SELECT
    ISNULL([x].[ID], 0) AS [ID] ,
    ISNULL(CAST([x].[DateCol] AS DATE), '1900-01-01') AS [HireDate]
INTO
    [dbo].[AutoStatsTest]
FROM
    ( SELECT TOP 1000
        ROW_NUMBER() OVER ( ORDER BY ( SELECT NULL ) ) ,
        DATEADD(HOUR, [m].[message_id], GETDATE())
      FROM
        [sys].[messages] AS [m] ) [x] ( [ID], [DateCol] );

ALTER TABLE [dbo].[AutoStatsTest] ADD CONSTRAINT [pk_t1_id] PRIMARY KEY CLUSTERED ([ID]);

First, let’s check in on what values we have

I’m going to run one query that will generate a histogram, but it’s guaranteed to return all of the table data. I want to see what SQL comes up with for histogram hits and misses, here.

SELECT *
FROM [dbo].[AutoStatsTest] AS [ast]
WHERE [ast].[HireDate] >= '1900-01-01'

We have our histogram, and I’ll use a clunky DBCC command to show it to me. Below is a partial screen cap, up to a point of interest.

SQL created a histogram with direct hits on 04/30, and then 05/02. That means it doesn’t have a step for 05/01, but it does postulate that there are 22 rows with a date of 05/01 in the RANGE_ROWS column.
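
If you want to look at the same thing on your end, this is roughly the clunky command in question. Auto-created statistics get system-generated names, so the first query finds the one on HireDate. The second leans on DBCC SHOW_STATISTICS accepting a column name as its target when an auto-created statistic exists on that column. Your steps and dates will differ from mine, since the table is built off GETDATE().

```sql
-- Find the statistics objects on the table, and which column each one covers.
SELECT  [s].[name] AS [stats_name],
        [s].[auto_created],
        [c].[name] AS [column_name]
FROM    sys.stats AS [s]
JOIN    sys.stats_columns AS [sc]
        ON  [sc].[object_id] = [s].[object_id]
        AND [sc].[stats_id] = [s].[stats_id]
JOIN    sys.columns AS [c]
        ON  [c].[object_id] = [sc].[object_id]
        AND [c].[column_id] = [sc].[column_id]
WHERE   [s].[object_id] = OBJECT_ID(N'dbo.AutoStatsTest');

-- Show just the histogram for the auto-created statistic on HireDate.
DBCC SHOW_STATISTICS('dbo.AutoStatsTest', HireDate) WITH HISTOGRAM;
```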

I went ahead and dropped that table and re-created it. Next we’ll run the same query, but we’ll pass in 05/01 as an equality value.

SELECT *
FROM [dbo].[AutoStatsTest] AS [ast]
WHERE [ast].[HireDate] = '2016-05-01'

And, long story short, it creates the exact same histogram as before.

Is this good? Is this bad?

Well, at least it’s reliable. I’m not sure how I feel about it otherwise.

You can try creating filtered indexes or statistics on really important segments of data, if you really need SQL to have granular information about it. Otherwise, you’ll have to trust in the secret, and sometimes not so secret sauce, behind the cardinality estimator.
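
As a sketch of the filtered statistics idea, here’s one aimed at a single month of the test table. The statistic name and the date range are made up for illustration, so pick a slice that actually exists in your data.

```sql
-- A filtered statistic gets its own histogram (up to 200 steps)
-- for just the rows inside the filter.
CREATE STATISTICS [s_AutoStatsTest_HireDate_May2016]
ON [dbo].[AutoStatsTest] ([HireDate])
WHERE [HireDate] >= '2016-05-01' AND [HireDate] < '2016-06-01'
WITH FULLSCAN;
```

Keep in mind that the optimizer generally needs to see a literal predicate that falls inside the filter before it will consider a filtered statistic, so this tends to help ad hoc queries more than parameterized ones.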

Thanks for reading!

Brent says: the more I work with SQL Server, the more I’m filled with optimism about the oddest things. When I read Erik’s idea about the exact histogram step, though, I thought, “Nopetopus.”

Stats Week: Statistics Terminology Cheatsheet

Last Updated April 9, 2017

These things used to confuse me so much

Despite having worked at a Market Research company for a while, I know nothing about statistics, other than that project managers have all sorts of disagreeably subjective phrases for describing them. Vast majority, convincing plurality, dwindling minority, et al. Less talky, more picture.

When I started getting into SQL Server, and learning about statistics, I heard the same phrases over and over again, but wasn’t exactly sure what they meant.

Here are a few of them:

Selectivity

This tells you how special your snowflakes are. When a column is called “highly selective” that usually means values aren’t repeating all that often, if at all. Think about order numbers, identity or sequence values, GUIDs, etc.

Density

This is sort of the anti-matter to selectivity. Highly dense columns aren’t very unique. They’ll return a lot of rows for a given value. Think about Zip Codes, Gender, Marital Status, etc. If you were to select all the people in 10002, a densely (there’s that word again) populated zip code in Chinatown, you’d probably wait a while, kill the query, and add another filter.

Cardinality

If you mash selectivity and density together, you end up with cardinality. This is the number of rows that satisfy a given predicate. This is very important, because poor cardinality estimation can arise from a number of places, and every time it can really ruin query performance.

Here’s a quick example of each for a 10,000 row table with three columns.

USE [tempdb];

WITH x AS (
SELECT TOP 10000
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS [rn]
FROM sys.[messages] AS [m]
)
SELECT
[x].[rn],
CASE WHEN [x].[rn] % 2 = 0 THEN 'M' ELSE 'F' END AS [Gender],
CASE WHEN [x].[rn] % 2 = 0 THEN 'Married' WHEN [x].[rn] % 3 = 0 THEN 'Divorced' WHEN [x].[rn] % 5 = 0 THEN 'Single' ELSE 'Dead' END AS [MaritalStatus]
INTO #xgen
FROM [x]

/*Selectivity*/
SELECT COUNT_BIG(DISTINCT [x].[rn])
FROM [#xgen] AS [x]

SELECT COUNT_BIG(DISTINCT [x].[Gender])
FROM [#xgen] AS [x]

SELECT COUNT_BIG(DISTINCT [x].[MaritalStatus])
FROM [#xgen] AS [x]

/*Density*/
SELECT (1. / COUNT_BIG(DISTINCT [x].[rn]))
FROM [#xgen] AS [x]

SELECT (1. / COUNT_BIG(DISTINCT [x].[Gender]))
FROM [#xgen] AS [x]

SELECT (1. / COUNT_BIG(DISTINCT [x].[MaritalStatus]))
FROM [#xgen] AS [x]

/*Reverse engineering Density*/
SELECT 1.0 / 0.00010000000000000000

SELECT 1.0 / 0.50000000000000000000

SELECT 1.0 / 0.25000000000000000000

/*Cardinality*/
SELECT COUNT_BIG(*) / COUNT_BIG(DISTINCT [x].[rn])
FROM [#xgen] AS [x]

SELECT COUNT_BIG(*) / COUNT_BIG(DISTINCT [x].[Gender])
FROM [#xgen] AS [x]

SELECT COUNT_BIG(*) / COUNT_BIG(DISTINCT [x].[MaritalStatus])
FROM [#xgen] AS [x]

DROP TABLE [#xgen]

Bigger by the day

A lot has been written about cardinality estimation. SQL Server 2014 saw a total re-write of the cardinality estimation guts that had been around since SQL Server 7.0, build-to-build tinkering notwithstanding.

In my examples, it’s all pretty cut and dried. If you’re looking at a normal sales database that follows the 80/20 rule, where 80 percent of your business comes from 20 percent of your clients, the customer ID columns may be highly skewed towards a small group of clients. It’s good for SQL to know this stuff so it can come up with good execution plans for you. It’s good for you to understand how parameter sniffing works so you understand why that execution plan was good for a small client, but not good for any big clients.
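To make the skew point concrete, here is a small sketch with made-up data: 10,000 hypothetical orders spread over 10 customers, where customers 1 and 2 own 80 percent of the rows. The table and column names are my own invention for the demo.

```sql
-- Build 10,000 fake orders. Customers 1 and 2 get 80% of them,
-- customers 3 through 10 share the remaining 20%.
WITH [n] AS (
    SELECT TOP (10000)
        ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS [rn]
    FROM sys.[messages] AS [m]
)
SELECT
    [n].[rn] AS [OrderID],
    CASE WHEN [n].[rn] % 10 < 8 THEN ([n].[rn] % 2) + 1  -- big clients: 1 or 2
         ELSE ([n].[rn] % 8) + 3                           -- small clients: 3 to 10
    END AS [CustomerID]
INTO [#orders]
FROM [n];

-- The density-style answer: total rows divided by distinct values.
SELECT COUNT_BIG(*) / COUNT_BIG(DISTINCT [o].[CustomerID]) AS [AvgRowsPerCustomer]
FROM [#orders] AS [o];

-- The real answer, customer by customer.
SELECT [o].[CustomerID], COUNT_BIG(*) AS [ActualRows]
FROM [#orders] AS [o]
GROUP BY [o].[CustomerID]
ORDER BY [ActualRows] DESC;

DROP TABLE [#orders];
```

The average works out to 1,000 rows per customer, but the two big customers each have 4,000 and the small ones only a few hundred. A plan built for one group can be a poor fit for the other.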

That’s why you should go see Brent in person. He’ll tell you all this stuff, feed you, give you prizes, and then you go home and get a raise because you can fix problems. Everyone wins!

Thanks for reading!

Brent says: wanna learn more about statistics? Check out Dave Ballantyne’s past SQLbits videos, including the one about the new 2014 CE.

Looking for a New Challenge? kCura is Hiring a DBA.

Last Updated April 14, 2016

Interested? Get on over there and apply.

I’m a big kCura Relativity fan – it’s an application that really pushes SQL Server hard, written by people who are a ton of fun to work with.

If you’re looking for a challenge in a really cool environment, check out what they’re looking for:

A Production Database Administrator with a deep demonstrated knowledge of MS SQL administrative tasks and the ability to consult on design, development, and automation improvements. Having a passion for maintaining MS SQL databases that meet or exceed internal and client contracted production SLAs for availability and performance. The right candidate will ideally have past hands-on experience administering MS SQL databases running in a public cloud environment such as AWS or MS Azure.

Responsibilities

Install and configure SQL Server 2012 and higher versions. Configurations built and supported should include scenarios that leverage SQL Always On, Windows failover clustering, and transactional replication.
Document complex installation, configuration, and optimization procedures so they can be automated.
Provide support 24/7/365 for any troubleshooting or corrective actions related to incidents impacting application availability within the production environments.
Take proactive measures to monitor, trend, and tune SQL databases, such as running maintenance jobs (backups, DBCCs, apply indexes/re-indexing, etc.), to meet or exceed baseline stability and performance SLAs on large databases (1 TB+) and large volumes of databases (100+).
Create, implement, and maintain SQL DB Health Checks, and have a demonstrated ability to automate SQL health reporting/event notification, and corrective actions.
Configure SQL Server monitoring utilities to minimize false alarms, and have a demonstrated ability to monitor/trend SQL environments to determine and implement enhanced monitoring thresholds to prevent incidents and reduce mean time to recovery (MTTR).
When performance issues arise, determine the most effective way to increase performance including scaling up or out, server configuration changes, index/query changes, etc.
Identify code defects and enhancements and develop a detailed root cause analysis that can be leveraged by the product management and development teams to improve application availability and decrease the total cost of ownership.
Ensure databases are being backed up and can be recovered in a manner that meets all BCDR objectives for RPO and RTO.
Perform all database management responsibilities in Microsoft Azure for production and non-production workloads.

Qualifications:

At least 4 years’ experience working as a Microsoft SQL DBA leveraging versions 2008 R2 or later.
Experience working in a 24/7/365 operation.
Bachelor’s degree in computer science or information systems.
Familiar with basic Azure IaaS capabilities, and some experience designing and building MS SQL databases within Azure or AWS.
Microsoft certifications such as MCSE, MCSD, etc.
Experience operating in an ISO certified and/or highly regulated (SSAE, PCI, HIPAA, etc.) hosting operation.
Familiar with DevOps concepts, and ideally experience working with a DevOps team focused on implementing and enhancing continuous delivery capabilities.
Experience automating SQL Server deployment and configuration through PowerShell, Chef, Puppet, etc.
Background designing, building, and managing a search and indexing solution such as Elasticsearch, Apache Solr, etc.
Previous Relativity system administration experience.

(You should read that “qualifications” list as a perfect candidate, and don’t be dissuaded from applying if you’re not the perfect candidate.)

Old and Busted: DBCC Commands in 2016

Last Updated April 6, 2016

I hate DBCC Commands

Not what they do, just that the syntax isn’t consistent (do I need quotes around this string or not?), the results are a pain to get into a usable table, and you need to write absurd loops to perform object-at-a-time data gathering. I’m not talking about running DBCC CHECKDB (necessarily), or turning on Trace Flags, or any cache-clearing commands (you know, things that perform actions). I mean things that spit tabular results at you.
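For a taste of the hoops, here is a rough sketch of getting one histogram into a table you can actually query. The table and statistics names are placeholders, and the column data types are my own choice for the temp table, not something the command dictates.

```sql
-- A temp table shaped like the HISTOGRAM result set.
CREATE TABLE [#histogram]
(
    [RANGE_HI_KEY] SQL_VARIANT,
    [RANGE_ROWS] FLOAT,
    [EQ_ROWS] FLOAT,
    [DISTINCT_RANGE_ROWS] FLOAT,
    [AVG_RANGE_ROWS] FLOAT
);

-- DBCC output can only be captured through INSERT ... EXEC.
-- dbo.YourTable and YourStatsName are placeholders.
INSERT [#histogram]
EXEC (N'DBCC SHOW_STATISTICS (N''dbo.YourTable'', N''YourStatsName'') WITH HISTOGRAM;');

SELECT *
FROM [#histogram] AS [h]
ORDER BY [h].[RANGE_HI_KEY];

DROP TABLE [#histogram];
```

That covers a single statistics object. Doing it for every statistic in a database means wrapping this in a cursor or loop and building the DBCC string dynamically each time, which is the absurd part.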

Stuff like this puts the BC in DBCC. It’s a dinosaur.

In SQL Server 2016

DBCC INPUTBUFFER got its own DMV after, like, a million decades. Commands like DBCC SQLPERF, DBCC DBINFO, and DBCC LOGINFO should probably get their own. Pinal Dave has a whole list of DBCC Commands that you can break your server with here.
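The replacement for DBCC INPUTBUFFER is the sys.dm_exec_input_buffer function. A quick sketch of why that's nicer: you can CROSS APPLY it across every user session in one query instead of running the DBCC command one session at a time.

```sql
-- Last statement submitted by each user session.
-- Passing NULL for the request_id argument covers the whole session.
SELECT [s].[session_id],
       [s].[login_name],
       [ib].[event_type],
       [ib].[event_info]
FROM sys.[dm_exec_sessions] AS [s]
CROSS APPLY sys.dm_exec_input_buffer([s].[session_id], NULL) AS [ib]
WHERE [s].[is_user_process] = 1;
```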

But truly, the most annoying one to me is DBCC SHOW_STATISTICS. It’s insane that there’s no DMV or function to expose histogram information. ~~That’s why I filed this Connect item.~~

UPDATE: It looks like Greg Low beat me to it by about 6 years. Too bad searching Connect items is so horrible. I urge you to vote for Greg’s item, instead.

Statistics are the intel SQL Server uses to make query plan choices.

Making this information easier to retrieve, aggregate, join to other information, and analyze would put a powerful performance tuning tool into the hands of SQL Server users, and it would help take some of the mystery surrounding statistics away.

Please consider voting for Greg’s Connect item.

Thanks for reading!

How to Run DBCC CHECKDB for In-Memory OLTP (Hekaton) Tables

Last Updated February 15, 2019

tl;dr – run a copy-only full backup of the Hekaton filegroup to nul. If the backup fails, you have corruption, and you need to immediately make plans to either export all your data, or do a restore from your last good full backup, plus all your transaction log backups since.

Yeah, that one’s gonna need a little more explanation, I can tell. Here’s the background.

DBCC CHECKDB skips In-Memory OLTP tables.

Books Online explains that even in SQL Server 2016, DBCC CHECKDB simply skips Hekaton tables outright, and you can’t force it to look at them:

Hey, that’s the same way I handle merge replication – I just look away.

However, that’s not to say these tables can’t get corrupted – they have checksums, and SQL Server checks those whenever the pages are read off disk. That happens in two scenarios:

Scenario 1: when SQL Server is started up, it reads the pages from disk into memory. If it finds corruption at this point, the entire database won’t start up. Even your corruption-free, non-Hekaton tables will just not be available. Your options at this point are to restore the database, or to fail over to a different server, or start altering the database to remove Hekaton. Your application is down.

Scenario 2: when we run a full (not log) backup, SQL Server reads Hekaton’s data from disk and writes to the backup file. If corruption is found, the backup fails. Period. You can still run log backups, but not full backups. When your full backup fails due to corrupt in-memory OLTP pages, that’s your sign to build a Plan B server or database immediately.

Here’s the details from Books Online:

The easy fix: run full native backups every day, and freak out when they fail.

Backup failures aren’t normally a big deal, but if you use in-memory OLTP on a standalone server or a failover clustered instance, backup failures are all-out emergencies. You need to immediately find out if the backup just ran out of drive space or lost its network connection, or if you have game-over Hekaton corruption.

Note that you can’t use SAN snapshot backups here. SQL Server won’t read the In-Memory OLTP pages during a snapshot backup, which means they can still be totally corrupt.

This works fine for shops with relatively small databases, say under 500GB.

The harder fix: back up just the In-Memory OLTP data daily.

With SQL Server 2016, the Hekaton limits have been raised to 2TB – and you don’t really want to be backing up a 2TB database the old-school way, every day. You could also have a scenario where a >1TB database has a relatively small amount of Hekaton data – you want to use SAN snapshot backups, but you still have to do conventional backups for the Hekaton data in order to get corruption checks.

Thankfully, Hekaton objects are confined to their own filegroup. Microsoft PM Jos de Bruijn pointed out to me that we can back up just that one filegroup, and send the backup to NUL: so no data gets written to disk.

Oops, did I say we could just back up that filegroup? Not exactly – you also have to back up the primary filegroup at the same time.

If you’re doing great (not just good) database design for very large databases, you’ve:

Created a separate filegroup for your tables
Set it as the default
Moved all the clustered & nonclustered indexes over to it
Kept the primary filegroup empty so you can do piecemeal restores

If not, hey, you’re about to. An empty primary filegroup will then let you do this faster:

Checking for corruption by backing up to NUL:

Tah-dah! Now we know we don’t have corruption.

This comes in handy if you’ve got a large database and you’re only doing weekly (or heaven forbid, monthly) full backups, and doing differential and log backups the rest of the time. Now you can back up just your in-memory OLTP objects for corruption.

Note that in these examples, I’m doing a copy_only backup – this lets me continue to do differential backups if that sort of thing is your bag.
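Putting those pieces together, the corruption check could look something like this sketch. The database and filegroup names are placeholders. The first query is just a way to find the memory-optimized filegroup's real name in your database.

```sql
-- Run in the database you want to check.
-- Memory-optimized data filegroups show up with type 'FX'.
SELECT [fg].[name], [fg].[type_desc]
FROM sys.[filegroups] AS [fg]
WHERE [fg].[type] = 'FX';

-- Placeholder names: [YourDatabase] and YourHekatonFG.
-- PRIMARY has to come along for the ride. NUL throws the backup away,
-- and COPY_ONLY leaves the differential base alone.
BACKUP DATABASE [YourDatabase]
    FILEGROUP = N'PRIMARY',
    FILEGROUP = N'YourHekatonFG'
TO DISK = N'NUL'
WITH COPY_ONLY;
```

If this completes, the in-memory OLTP data was read from disk without a checksum failure. If it errors out, check for the boring causes first, then start on Plan B.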

For bonus points, if your Hekaton data is copied to other servers using Always On Availability Groups, you’ll want to do this trick on every replica that you might fail over to or run full backups on. (Automatic page repair doesn’t appear to be available for In-Memory OLTP objects.)

If you’d like CHECKDB to actually, uh, CHECK the DB, give the request an upvote here.

Breaking News, Literally: SQL CLR Support Removed from Azure SQL DB

Last Updated September 21, 2017