Blog

Memory Grants Added to sp_BlitzCache and sp_BlitzFirst

Exciting New Doodads

When SP3 for 2012 dropped, we were all super excited by the new query tuning-centric features that were at our disposal. Now all we had to do was get people to install SP3! Great features like this make patching an easier sell. Now with SP2 for 2014 out, a lot of those great features finally made it to 2014. Let’s talk about what they mean for you, and what they mean for us.

For you, dear reader

If you’re rocking the second-newest version of SQL Server, you can get all sorts of cool new insights into what your queries are doing when you’re not watching, as well as when you are. There are some great additions to execution plans baked in here, too. Some, like the TempDB spills, are in the actual execution plan only, but don’t let that get you down. This stuff was all included in 2016, so don’t feel like you’re going to miss out if you set your upgrade sights on the brand new. Note to self: someday this blog post is going to sound hopelessly outdated.

Seriously, if you build applications that hit SQL Server, get them tested on these newer versions. I’m not just trying to line Microsoft’s licensing pockets here, I’m trying to help you out. Many of these things are awesome for tuning performance problems. They will make your life easier. You’ll look like a hero when you can quickly diagnose why something is slow on a client’s kajillion dollar server that they don’t have any monitoring for and their DBA is a SAN admin who restarts SQL every morning because it’s using too much memory.

For us

As consultants who build tools to make your SQL Server life easier, they’re great opportunities for us to not only see how these new features work, but also to figure out ways to let you know how to read, interpret, and troubleshoot problems with new information. We’ve been hard at work updating sp_BlitzCache and sp_BlitzFirst to keep you a sane and functioning SQL user.

Having this information standard across the three most recent versions makes presenting and analyzing it much easier. So what did we do?

sp_BlitzFirst

If you run this with @ExpertMode = 1, you get a snapshot of what’s currently running at the beginning and end of the window you’re measuring. So, if you use @ExpertMode = 1, @Seconds = 30, you’ll see what was running on both sides of that 30 second window.
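Spelled out as a full call, that looks like this:

EXEC dbo.sp_BlitzFirst @ExpertMode = 1, @Seconds = 30;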

Excuse the hugeness.

Pretty sweet. If you’re on a version that supports it, you’ll also get these columns in your results.

They all love your miniature ways.

If your query isn’t running for the first time, or coming back from a recompile, these columns can have some useful information in them about all the many resources they’re using. You may be able to spot variations in here that can explain why things are slow ‘sometimes’. You may also be able to spot large memory grants on rather small queries, and/or rather large queries that for some reason aren’t going parallel, etc. Good stuff! Thanks, Microsoft.

sp_BlitzCache

Everyone’s favorite tool (poll in company chat) for spelunking your plan crevasse got some new columns and some new sort modes specifically for memory grants. Again, only if you’re on a supported version. You’ll see columns for Min and Max Memory Grants, Min and Max Used, and Average Memory Grant. There’s a new warning for when queries use <= 10% of their memory grant, which is configurable.

EXEC dbo.sp_BlitzCache @SortOrder = 'memory grant'

EXEC dbo.sp_BlitzCache @SortOrder = 'average memory grant'

Sorting by memory grant will sort by the maximum grant requested column, and average memory grant will sort by the largest average, determined by max grant / executions.

New columns:

Percent Used isn't displayed

Quantify!

New warning:

The Unused

If you’re living in the present

Give these new versions a shot, and if you’re giddy about GitHub, drop us a line with any issues, or code contributions.

Thanks for reading!

Why Not Just Create Statistics?

Here at Brent Ozar Unlimited

We have a proud tradition of not blaming index fragmentation for everything. There are points at which you should deal with it, but they're probably not 5% and 30% and 1000 pages. But that's not what this blog post is about. I'm hoping to clarify why we're more interested in up-to-date statistics, and also why statistics outside of indexes aren't really the most helpful thing. If they were, we'd all just create statistics and every query would magically blaze from select to offset without a care in the world.

Statistics: It’s What’s Inside That Counts

Statistics are what SQL Server uses to figure out how many rows to expect from accessing a data structure. You can do some things that fundamentally break this, like using functions in joins or where clauses, using local variables or optimize for unknown, using table variables without recompile hints, and a sniffed parameter can just allow SQL to run a query without guessing at all. It already guessed once. No tag backs. As you may imagine, this is important information for SQL to have for running your queries optimally.

Indexes: Pride and Vanity

Indexes of the nonclustered variety contain subsets of your table or view’s data. Clustered ones are your table or view data ordered by the clustering key(s). Focusing on the nonclustered variety, they’re the “and the [band name]” to your clustered index’s “[lead singer name]”, and they’re great for providing SQL with a more appropriate data structure for your query.

If you don’t need to select all the columns, and you do need to filter, join, order by, or aggregate a column not in the key of the clustered index, nonclustered indexes get a solo after the chorus. Give the drummer some.

Nonclustered indexes will, under normal circumstances, get a statistics object created with rolling density information for the key columns going left to right, and a histogram on only the leftmost column.
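If you want to see that for yourself, DBCC SHOW_STATISTICS spits out both pieces. The table and index names here are made up, so point it at one of your own:

/*Density information for the key columns, left to right*/
DBCC SHOW_STATISTICS ('dbo.Users', 'IX_Users_DisplayName') WITH DENSITY_VECTOR;

/*Histogram, which only covers the leftmost key column*/
DBCC SHOW_STATISTICS ('dbo.Users', 'IX_Users_DisplayName') WITH HISTOGRAM;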

I AM I SAID

With that out of the way

Why do we care more about statistics being updated than indexes being fragmented? Largely, because reading pages with some empty space from a fragmented index is oh-so-very-rarely the root cause of a performance issue. Especially if those pages are already in memory. Out of date statistics can allow SQL to continue to make some really bad guesses, and keep giving you a lousy execution plan no matter which way you tune your query.

The bigger your table gets, the worse the problem gets. Prior to 2016, if you don’t turn on Trace Flag 2371, about 20% of your table’s rows need to change before an automatic statistics update kicks in. For a 100 million row table, this can be a long ways off. Poor cardinality estimates here can really sink you. Rebuilding indexes for a 100 million row table is a B-U-M-M-E-R.
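If you want to check where you stand, here's a rough sketch of both sides of that: the trace flag is server-wide (a startup parameter is the usual way to make it stick past a restart), and on recent builds sys.dm_db_stats_properties shows how many modifications each statistics object has racked up since its last update. The table name is a placeholder.

/*Turn on the smarter threshold behavior, instance-wide (pre-2016).
  DBCC TRACEON only lasts until the next restart.*/
DBCC TRACEON (2371, -1);

/*How stale are my stats? modification_counter resets when stats update.*/
SELECT OBJECT_NAME(s.object_id) AS table_name,
       s.name AS stats_name,
       sp.last_updated,
       sp.rows,
       sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID('dbo.Documents'); --placeholder table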

Log Shipping? Mirroring? Availability Group? Good luck with that.

Crappy server? Low memory? Slow disk? Dead man walking.

You may ultimately spend more time and expend more server resources defragmenting indexes than your queries will spend reading extra pages from fragmented indexes. Rebuilding or reorganizing large indexes can be a special kind of brutal.

Consider the process

Read a bunch of index pages with sys.dm_db_index_physical_stats to figure out if there's fragmentation, then reorganize or rebuild based on what comes back (a rough sketch of those commands follows the list below).

  • Reorganize is online but single threaded and can take FOREVER on big tables, especially if you’re compacting LOBs.
  • Rebuild is offline and single threaded in Standard, online and potentially parallel in Enterprise, but you better hope you’re patched up so you don’t corrupt anything.
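For reference, the commands themselves look something like this. Table and index names are stand-ins, and ONLINE = ON is an Enterprise-only toy:

/*The expensive read part: check fragmentation*/
SELECT OBJECT_NAME(ips.object_id) AS table_name,
       i.name AS index_name,
       ips.avg_fragmentation_in_percent,
       ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
    ON  i.object_id = ips.object_id
    AND i.index_id = ips.index_id;

/*Reorganize: online, single threaded, slow on big tables*/
ALTER INDEX IX_Documents_Stuff ON dbo.Documents REORGANIZE;

/*Rebuild: offline in Standard; online and potentially parallel in Enterprise*/
ALTER INDEX IX_Documents_Stuff ON dbo.Documents REBUILD WITH (ONLINE = ON);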

Is that worth it? Every night? For every index on every table in every user database? Only if you can prove it.

The one upside to Rebuilding is that it also updates statistics with a full scan. Think about this the next time you say something like “rebuilding the index fixed the problem”, you may have an epiphany on the way.

Wait for it.

Wait for it.

Wait for it.
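If the epiphany is that the statistics update was doing the heavy lifting, you can get the same thing without shuffling every page in the index. A minimal sketch, with a made-up table name:

/*Full-scan statistics update on every stats object on the table,
  no index rebuild required*/
UPDATE STATISTICS dbo.Documents WITH FULLSCAN;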

Statistics with no indexes

SQL Server doesn’t make easy work of getting information about your Statistics, or finding out which statistics get used. Even at the query level, you have to use a spate of Trace Flags to find out what gets loaded and looked at. Cached plans don’t fare much better.

No wonder everyone cares about indexes and their fragmentation. Microsoft has made information about them easy and abundant, while Statistics are kept hidden in the basement next to piles of soft bones and a bowl of hot blood.

Head rush moment: SQL may use information from histograms outside of the index it chooses for cardinality estimation.

Back to earth: If you just create a bunch of statistics instead of indexes, you’re (at best) using your Clustered Index for everything (which is still bad), or you’re using a HEAP for everything (which is usually worse). You’re still generally better off creating good indexes for your workload. They’ll get statistics objects created and associated with them, and if SQL thinks another column is interesting, it will create a single column statistics object for it, as long as you haven’t turned off auto create stats.
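Those auto-created column statistics are easy to spot, because they get system-generated names starting with _WA_Sys_. A quick look, run in the database you care about:

/*Single-column statistics SQL created on its own*/
SELECT OBJECT_NAME(s.object_id) AS table_name,
       s.name AS stats_name,
       s.auto_created,
       s.user_created
FROM sys.stats AS s
WHERE s.auto_created = 1
ORDER BY table_name;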

Sure, you can put on your black cloak and goat mask and create some multi-column or filtered statistics, but in the words of a wise man (Doug), you end up with more stats to maintain and understanding query behavior gets more difficult.

Filtered statistics suffer from a problem where they don’t automatically update based on the filtered row count, but rather the table row count. Imagine you have a 100 million row table, and your filtered index is on 1 million rows. All million of those rows might change, but the statistics on that index won’t. 1 million is not 20% of 100 million. You’ll have to update the statistics manually, or rebuild the filtered index.
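If you go down that road anyway, the manual update is on you. A sketch with placeholder table and column names:

/*Filtered statistics only cover part of the table...*/
CREATE STATISTICS st_Documents_Reviewed
ON dbo.Documents (ReviewedDate)
WHERE IsReviewed = 1
WITH FULLSCAN;

/*...but the auto-update threshold watches the whole table's modifications,
  so plan on refreshing them yourself*/
UPDATE STATISTICS dbo.Documents st_Documents_Reviewed WITH FULLSCAN;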

Multi-column statistics are hardly a replacement for a multi-column index, and it’s not like you get an enhanced histogram that includes the second column. It’s just like a normal histogram. All you get is the density information for the columns you throw in there. Boo hiss.

Moral of the story (B-B-B-B-B-B-BENNY WITHOUT THE JETS)

Indexes are a really important factor for performance, but index fragmentation very rarely is. Statistics are super helpful when they’re not out of date, and getting them up to date is much easier on your server’s nerves. Though not perfect, I’d rather take my chances here. Updating statistics can also cause a bad execution plan to get flushed out of the cache. On their own they can sometimes help queries, but you should only end up here after you’ve really tuned your indexes.

Unless you can establish a metric that makes nightly index defragmentation worthwhile, don’t jump to it as the default. Try just updating statistics. You may find that nothing at all changes, and you now have many extra hours a night to do maintenance. Like run DBCC CHECKDB. If you think index fragmentation is a performance problem, try corruption sometime. That’s not what Brent meant when he said “the fastest query is one you never make.”

Thanks for reading!

TDE and Backup Compression: Together At Last

TDE is one of those things!

You either need it, and quickly learn how many things it plays the devil with, or you don’t need it, and there but for the grace of God go you. Off you go, with your compressed backups, your instant file initialization, your simple restore processes. Sod off, junior.

But Microsoft maybe listened or figured out something by accident. I don’t know which one yet, but they seem excited about it! And I am too! If you read this blog post that’s probably also being monitored closely by Chris Hansen, you’ll see why.

Backup compression now works with TDE

Cool! Great! Everyone encrypt your data and compress your backups. It’s fun. I promise.

Not satisfied with a few meek and meager data points, I set out to see if increasing Max Transfer Size also increased the degree of compression. Why? This paragraph.

It is important to know that while backing up a TDE-enable database, the compression will kick in ONLY if MAXTRANSFERSIZE is specified in the BACKUP command. Moreover, the value of MAXTRANSFERSIZE must be greater than 65536 (64 KB). The minimum value of the MAXTRANSFERSIZE parameter is 65536, and if you specify MAXTRANSFERSIZE = 65536 in the BACKUP command, then compression will not kick in. It must be “greater than” 65536. In fact, 65537 will do just good. It is recommended that you determine your optimum MAXTRANSFERSIZE through testing, based on your workload and storage subsystem. The default value of MAXTRANSFERSIZE for most devices is 1 MB, however, if you rely on the default, and skip specifying MAXTRANSFERSIZE explicitly in your BACKUP command, compression will be skipped.

It left things open-ended for me. Unfortunately for me, it doesn't help. Fortunately for you, you don't have to wonder about it.
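For reference, a backup command that satisfies those requirements looks something along these lines; the database name is made up, and the important part is that MAXTRANSFERSIZE is anything greater than 65536:

/*TDE database + compression: only kicks in when MAXTRANSFERSIZE > 65536*/
BACKUP DATABASE [YourTDEDatabase]
TO DISK = 'NUL'
WITH COMPRESSION,
MAXTRANSFERSIZE = 1048576, --1 MB
STATS = 10;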

Check out the exxxtra large screen cap below, and we’ll talk about a few points.

I AM GIGANTIC!

First, the backup without a Max Transfer Size at the bottom was a full, compressed backup I took before applying TDE. It took a little longer because I actually backed it up to disk. All of the looped backups I took after TDE was enabled, and Max Transfer Size was set, were backed up to NUL. This was going to take long enough to process without backing up to Hyper-V VM disks and blah blah blah.

The second backup: just like the blog man said, no compression happens when you specify 65536 as the Max Transfer Size.

You can see pretty well that the difference between compressed backup sizes with and without TDE is negligible.

The most interesting part to me was the plateau in how long each backup took after a certain Max Transfer Size. Right around the 1MB mark, it hits the low 380s, and never strays very far from there afterwards. I could have tried other stuff to make this go faster, but my main interest was testing compression levels.

There you have it

Max Transfer Size doesn’t impact compression levels, but it can help duration. If you want to keep playing with switches, you can throw in Buffer Count, and try striping backups across multiple files to ‘parallelize’ output.
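If you do want to keep playing, a striped, tuned-up version might look something like this; the paths and values are just starting points, not recommendations:

/*Stripe across files, bump the transfer size and buffer count*/
BACKUP DATABASE [YourTDEDatabase]
TO DISK = 'G:\Backups\YourTDEDatabase_1.bak',
   DISK = 'H:\Backups\YourTDEDatabase_2.bak'
WITH COMPRESSION,
MAXTRANSFERSIZE = 2097152, --2 MB
BUFFERCOUNT = 50;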

Thanks for reading!

Availability Group Direct Seeding: TDE’s Frenemy

From the Mailbag

In another post I did on Direct Seeding, reader Bryan Aubuchon asked if it plays nicely with TDE. I’ll be honest with you, TDE is one of the last things I test interoperability with. It’s annoying that it breaks Instant File Initialization, and mucks up backup compression. But I totally get the need for it, so I do eventually get to it.

The TL;DR here

Is that if you encrypt a database that’s already taking part in a Direct Seeding relationship, everything is fine. If you already have an encrypted database that you want to add to your Availability Group, Direct Seeding has a tough time with it.

I don’t think this is an outright attempt to push people to AlwaysEncrypted, because it has a lot of limitations.

Let’s walk through this

Because I love reader sanity checks, here we go. Microsoft tells you how to add a database encrypted with TDE to an existing Availability Group here.

wordswordswordsblahblahblah

That all sounds good! So let’s follow directions. We need a database! We also need a password, and a certificate. Alright, we can do this. We’re competent adults.

/*Create database on acceptable path to all Replicas*/
CREATE DATABASE EncryptedCrap
 ON PRIMARY 
( NAME = 'EncryptedCrap', FILENAME = 'E:\Crap\EncryptedCrap.mdf')
 LOG ON 
( NAME = 'EncryptedCrap_log', FILENAME = 'E:\Crap\EncryptedCrap_log.ldf');
 
 /*Create key*/
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'GreatP0stBrent!' 
GO 

/*Create cert*/
CREATE CERTIFICATE EncryptedCrapCert 
WITH SUBJECT = 'If you can read this I probably got fired.'

Alright, cool. We did that. Now we have to get all up in our database and scramble its bits.

/*Get into database*/
USE EncryptedCrap  
GO  

/*Create database encryption key*/
CREATE DATABASE ENCRYPTION KEY  
WITH ALGORITHM = AES_128  
ENCRYPTION BY SERVER CERTIFICATE EncryptedCrapCert 
GO 

/*Turn encryption on*/
ALTER DATABASE EncryptedCrap SET ENCRYPTION ON

SQLCMD Appreciation Header

Few things in life will make you appreciate SQLCMD mode like working with Availability Groups. You can keep your PowerShell. $.hove-it; I’m with SQLCMD.

Stick with me through the next part. You may have to do this someday.

/*Back into master*/
USE master  
GO 

/*Backup cert to fileshare*/ 
BACKUP CERTIFICATE EncryptedCrapCert   
TO FILE = '\\Sqldc01\sqlcl1-fsw\NothingImportant\EncryptedCrap.cer'  
WITH PRIVATE KEY (FILE = '\\Sqldc01\sqlcl1-fsw\NothingImportant\EncryptedCrap.pvk' ,  
ENCRYPTION BY PASSWORD = 'GreatP0stBrent!' )  
GO

:CONNECT SQLVM02\AGNODE2

USE master  
GO  

/*Set up password*/
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'GreatP0stBrent!' 
GO 

/*Restore cert from share*/  
CREATE CERTIFICATE EncryptedCrapCert  
FROM FILE = '\\Sqldc01\sqlcl1-fsw\NothingImportant\EncryptedCrap.cer'   
WITH PRIVATE KEY (FILE = '\\Sqldc01\sqlcl1-fsw\NothingImportant\EncryptedCrap.pvk',   
DECRYPTION BY PASSWORD =  'GreatP0stBrent!');
GO

:CONNECT SQLVM03\AGNODE3

USE master  
GO  

/*Set up password*/
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'GreatP0stBrent!' 
GO 

/*Restore cert from share*/  
CREATE CERTIFICATE EncryptedCrapCert  
FROM FILE = '\\Sqldc01\sqlcl1-fsw\NothingImportant\EncryptedCrap.cer'   
WITH PRIVATE KEY (FILE = '\\Sqldc01\sqlcl1-fsw\NothingImportant\EncryptedCrap.pvk',   
DECRYPTION BY PASSWORD =  'GreatP0stBrent!');
GO

:CONNECT SQLVM01\AGNODE1

USE master
GO 

ALTER AVAILABILITY GROUP SQLAG01 ADD DATABASE EncryptedCrap
GO

What did we do?

Exactly what we did. We backed up our certificate and its private key to a network share, and then on two replicas we created database master keys and re-created the certificate from the primary's backup. We did this in one SSMS window. Magical. Then we added our encrypted database to the Availability Group.

If this database weren’t encrypted, everything would probably go just fine. I say probably because, you know, computers are just the worst.

But because it is encrypted, we get some errors. On our Primary Replica, we get normal startup messages, and then messages about things failing with a transient error. Not sure what a transient error is. It forgot to tie its shoelaces before running to jump on that freight car.

Log du jour

On our Replicas, we get a different set of messages. Backup failures. Database doesn’t exist. More transient errors. This time you left an open can of pork beans by the barrel fire.

I failed college algebra, again.

Over in our Extended Events session that tracks automatic seeding, we get an error code! Searching for it doesn't really turn up much. New features. Good luck with them.

Ungoogleable errors.

One bright, shiny star of error message-y goodness shows up in our Physical Seeding Extended Event session. Look at all those potentially helpful failure codes! An individual could get a lot of useful information from those.

Attempting Helpful.

If only you weren’t being laughed at by the Gods of HA/DR. Some of the physical_seeding Extended Events have values here, but none of the automatic seeding ones do.

Feature Complete.

As of now

I don’t have a work around for this. The alternatives are to decrypt, and then re-encrypt your database after you add it, or add it the old fashioned way. Maybe something will change in the future, but as of now, these don’t appear to be compatible.

I’ve opened a Connect Item about this. I’d appreciate votes of the upward variety, if you feel so inclined.

Thanks for reading!

One weird trick for managing a bunch of servers

Let’s face it, most people don’t have just one SQL Server

How many they tell Microsoft they have is another matter, but let the record show that I don’t condone licensing dishonesty. But going one step further, most places… Well, they’re ‘lucky’ if they have one DBA, never mind a team.

Everyone else: Give me your network people, your sysadmin, your huddled SAN group yearning to breathe free, the wretched refuse of your teeming developers.

Doing things on one server is aggravating enough. Doing things on a bunch of servers is even worse. Given some of today’s HA/DR features (I’m looking at you, Availability Groups, with your lack of a mechanism to sync anything outside of user databases. Rude.) people are more and more likely to have lots of SQL Servers that they need to tend to.

Sometimes just keeping track of them is impossible. If you’re one guy with 20 servers, have fun scrolling through the connection list in SSMS trying to remember which one is which. Because people name things well, right? Here’s SQLVM27\Instance1, SQLVM27\Instance2, SQLVM27\Instance3, and that old legacy accounting database is around here somewhere.

Register it and forget it

But don’t actually forget it. If you forget it and it goes offline, people will look at you funny. Turns out people don’t like offline servers much.

So what’s someone to do with all these servers? Register them! Hidden deep in the View menu of SSMS is the Registered Servers window

Hi there, handsome.

It will look pretty barren at first, just an empty folder. But you’ll fill it up quick, I’m sure. Can never have enough servers around, you know.

It’s pretty easy to populate, you can right click on the Local Server Group folder, or on servers you’re connected to in Object Explorer.

This

That

Either way, you get the same dialog box to add a server in. You can give it a friendly name if you want! Maybe WIN03-SQL05\Misc doesn’t tell a good story.

Joy of joys

And if you hip and hop over to the Connection Properties tab, you can set all sorts of nifty stuff up. The biggest one for me was to give different types of servers different colored tabs that the bottom of SSMS is highlighted with. It’s the one you’re probably looking at now that’s a putrid yellow-ish color and tells you you’re connected and that your query has been executing for three hours. Reassuring. Anyway, I’d use this to differentiate dev from prod servers. Just make sure to choose light colors, because the black text doesn’t show up on dark colors too well.

Wonder of wonders

Another piece of advice here is not to mix servers on different major (and sometimes minor) versions. The reason is that this feature gives you the ability to query multiple servers at once. If you’re looking at DMVs, they can have different columns in them, and you’ll just get an error. Even a simple query to sys.databases will throw you a bonk between 2012 and 2014.

By the planets!

I changed my mind. I hate planets.

Even if you’re running 2008R2, there are some pretty big differences in DMVs between SP1 and SP3. Microsoft has been known to change stuff in CUs (I’m looking at you, Extended Events).

On the plus side, you can use your multi-server connection to SELECT @@VERSION to help you decide how you should group them. If they have something better in common, like participating in Log Shipping, Mirroring, an AG, etc., all the better.
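Something as simple as this, run against a server group, comes back with one row per server (SSMS tacks a Server Name column onto multi-server results for you):

/*Quick version survey across a registered server group*/
SELECT SERVERPROPERTY('ProductVersion') AS product_version,
       SERVERPROPERTY('ProductLevel') AS service_pack,
       SERVERPROPERTY('Edition') AS edition;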

Insight

But my favorite thing, because I was a devotee to the Blitz line of stored procedures even before I got paid to like them, was that I could install them on ALL OF MY SERVERS AT ONCE! This was especially useful when updates came out. You know what it’s like to put a stored proc on 20 servers one at a time? Geeeeeet outta here!

Peanuts.

Check that out. It’s on both of my servers. At once. That means simultaneously, FYI. If you have a DBA or Admin database that you keep on all your servers to hold your fancy pants scripts and tools, this is an awesome way to make sure they all have the latest and greatest.

You’re already better at your job

Even though this feature came out in 2008, I hardly see anyone using it. I found it really helpful comparing indexes and query plans across app servers that held different client data across them. It also exposes far less than Linked Servers; you need to worry less about access and level of privilege.

Just don’t forget to export your list if you change laptops!

Thanks for reading!

Psst… Hey buddy, you want some E-Discovery?

One year ago today!

Well, sort of one year ago. Who knows when this thing will get published? Only God and Brent. And part of the calendar year fell during a leap year, which was just plain cruel, like a two part Buffy episode where the second part is the first episode of the next season DAMN YOU JOSS WHEDON!

Anyway, I started working here, annoying you guys with blog posts, giving Doug rub-downs in between video takes, and walking Ernie when the Ozar family was too full of illegal caviar and albino truffles to move. I also started running down the clock on being able to work with my favorite piece of software again. You can probably guess.

Relativity!

Seriously, I love this thing. Not just because many of the databases I worked with under the software were hundreds of gigs, on up to 9 terabytes, but because the people behind the software really do care about the product. The customer support is aces (Hello, Pod One), and the developers are super helpful and responsive.

Plus, it’s just plain interesting. You give lawyers this crazy interface that lets them build just about any search query they can dream of, including some really, really bad ones, and see how SQL Server reacts.

If you’re a DBA who has read the execution plan of a Relativity query where saved searches reference saved searches that reference saved searches that… you get the point! I feel your pain.

It’s not just the hardware

You can’t fix every saved search and workflow immediately, which makes right-sizing hardware super important, but that’s not the only thing. Every case is different, and they often need custom indexes.

If you’re a DBA who has watched performance tank because some new search suddenly started scanning the clustered index of your 50 million row, one terabyte Documents table with a wildcard search on Email Body and Email Subject and Email Sender and Email Recipients and Email Metadata for ‘insert super common word here’, I feel your pain.

Best in Service

My other favorite part about Relativity is that they have standards. Not last call for alcohol standards, either. They keep you, as a DBA, honest. No backups? Ding. No DBCC CHECKDB? Ding. Perf in the tank? Ding.

The challenges that you’re presented with at scale are immense. You have 100 terabytes of data and you need to check it for corruption weekly. How’s that gonna work?

Index and statistics maintenance can be super important, too. Fragmentation may not matter if you’re reading 1500 pages instead of 1000 pages, but it can sure as heck matter when you’re reading 1,500,000 pages rather than 1,000,000 pages. And all those ascending keys? SQL Server is gonna make some real bad judgement calls on those, especially prior to 2014.

It’s a wild, wild life

You have users bulk loading thousands to millions of documents, users updating records during review, and then searches running on top of all that.

I am thrilled to be able to work with my favorite product again. If you’re experiencing Relativity pains, drop us a line.

Thanks for reading!

Brent says: For a refresher on this app and how we work with it, check out our past posts on The SQL Server Components of kCura Relativity, Performance Tuning kCura Relativity, Using Partitioning to Make kCura Relativity Faster, and Tiering Relativity Databases.

Changes to auto update stats thresholds in SQL Server 2016

TL;DR

As of CTP 3.3, it’s the same behavior as Trace Flag 2371, in 2008 R2 SP1 and onward. That basically means that the bigger your table is, the fewer rows need to be modified before an automatic statistics update occurs.

Slightly longer…

The change was announced over here. At first I thought, woah, cool, they thought about this and made big changes. But no, much like Trace Flags 1117 and 1118 being enabled for tempdb, it’s just…

Remember in Mortal Kombat, when getting to fight Reptile made you cool? Then Mortal Kombat 2 came out, and he was a playable character, and everyone would call your wins cheap if you picked him? That’s sort of what this reminds me of. If you’re new to SQL, you probably won’t appreciate the differences these Trace Flags make. If you’ve been using it for a while, you’ll probably start sentences with “back in my day, we had to add startup parameters…” and chuff off to write miserably long blog posts about unique indexes.

As of 02/23/2016, it sounds like Trace Flag 8048 is also enabled by default in 2016. See quote about soft NUMA at the link.

Moderate testing

I ran tests on some fairly large tables. I tried to run them on tables from 100 million to 1 billion rows, but I blew out the data drive of our AWS instance. So, uh, if you have a bigger server to test stuff out on, be my guest.

The basic concept was:

  1. Load a bunch of data into a table
  2. Update it 1000 rows at a time (I know, I know, but updating fewer rows than that took FOREVER)
  3. Run a query against it to invalidate stats
  4. If they reset, add a bunch more data and start over (the reset check is sketched just after this list)
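The reset check can be as simple as watching sys.dm_db_stats_properties between update batches. A sketch, with a placeholder table name:

/*modification_counter climbs with each batch of updates,
  then drops back to 0 when an automatic stats update fires*/
SELECT s.name AS stats_name,
       sp.rows,
       sp.modification_counter,
       sp.last_updated
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID('dbo.StatsTest'); --placeholder table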

What I ended up with was, well…

Eighth place.

Here’s an abridged version of 10-20 million and 30-40 million rows, and how many modifications they took before a stats update occurred. If you follow the PercentMod column down, the returns diminish a bit the higher up you get. I’m not saying that I’d prefer to wait for 20% + 500 rows to modify, by any stretch. My only point here is that there’s not a set percentage to point to.

And, because you’re probably wondering, turning on Trace Flag 2371 in 2016 doesn’t make any difference. Here’s what 10-100 million look like, in 10 million row chunks.

And then SQL crashed.

If you can guess which side TF 2371 was on for, I’ll give you one merlin dollhairs.

Great expectations

This improvement is certainly welcome as a default, though it’s not all that ‘new’. My 2014 instance comes up with the same thresholds with 2371 enabled. Unless you’re working with pretty big tables, or used to managing statistics updates on your own, you likely won’t even notice the change.

Thanks for reading!

Brent says: It’s kinda like Microsoft is treating trace flags as alpha/beta tests for new features now. That’s right in line with how .com startups use feature flags.

Moving Databases with ALTER DATABASE

True story

A long time ago, I had to actually do stuff to databases. One thing I had to do was move data files around. Maybe some knucklehead had put system databases on the C: drive, or a LUN was filling up, or we got a new LUN. You know, whatever. Natural curiosity oft leads one to the internet. If one does not succumb to food and cats, one may find useful information. Or anonymous message boards. Sort of a toss up. What I found was this article. Weird, right? 2009. Brent said to use ALTER DATABASE. It’s new and pretty and smart people do it. What Brent didn’t do was explain how it’s done. Or link to how it’s done. I felt cold and alone. Abandoned. Afraid. “Great post, Brent”, I said sarcastically, and set out to figure out how to work this magic on my own.

I turned to BOL, the destination of all self-loathing people. If you scroll down to the bottom, way down at the bottom, the syntax is there. Of course, moving system databases is a horse of a different color. But hopefully you don't need that one. For user databases, it's rather simpler:

  1. Alter the file metadata to the new path
  2. Set the database offline
  3. Physically move the file
  4. Set the database back online

Easy enough!

Run ALTER DATABASE with the new location. We’re moving the data file. If we were moving the log file, it would probably end in “_log” or something. You can find all this information in sys.master_files, except where you’re moving the file to. Just don’t actually move it to C:\Whatever. You may run into problems later. Also, you need the filename. If you don’t include it, SQL won’t complain until you try to set the database back online. Yay!
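A quick way to grab the logical names and current paths before you start, assuming the same Sample database used below:

/*Logical file names and current physical paths*/
SELECT name, type_desc, physical_name, state_desc
FROM sys.master_files
WHERE database_id = DB_ID('Sample');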

ALTER DATABASE [Sample] MODIFY FILE ( NAME = Sample, FILENAME = 'C:\Whatever\Sample.mdf' );

This is the part that you need to think through. People have to be cool with the database being offline while you move the physical file. This is not a seamless transition. If you’re moving large enough databases, you may want to consider an alternate method, like Mirroring or Log Shipping. They take more work, but you get the whole near-zero-downtime thing out of it. You may want to stage a mock file move to test LUN to LUN copy speeds. See how many GB you can move per minute. That way you’ll at least be able to estimate how long the outage will last. Assuming all that is cool, go ahead and take the database offline.

ALTER DATABASE [Sample] SET OFFLINE;

Now you gotta hurry up and get that file moved. How you do that is up to you. You may prefer to just use Windows Explorer, since it has a status bar, and tells you copy speeds. Good stuff to know if people ask for updates, right? Just to fill space, here’s a PowerShell command. I still hate PowerShell.

Move-Item -Path "D:\Data\Sample.mdf" -Destination "C:\Whatever" -Force

Once that finishes, put your database back online.

ALTER DATABASE [Sample] SET ONLINE;

If you find yourself having to do this often, or if you have to migrate a group of databases, it’s probably worth scripting out.
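A starting point for that, if you like having SQL write your SQL. The new folder is obviously a placeholder, and this only builds the MODIFY FILE statements; the offline, copy, and online steps are still on you:

/*Generate MODIFY FILE commands pointing at a new folder,
  keeping the original file names*/
SELECT 'ALTER DATABASE ' + QUOTENAME(DB_NAME(mf.database_id))
     + ' MODIFY FILE (NAME = ' + QUOTENAME(mf.name)
     + ', FILENAME = ''C:\Whatever\'
     + REVERSE(LEFT(REVERSE(mf.physical_name), CHARINDEX('\', REVERSE(mf.physical_name)) - 1))
     + ''');'
FROM sys.master_files AS mf
WHERE mf.database_id = DB_ID('Sample');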

There you have it

It’s that easy to do. Just make sure you have adequate backups, in case something goes wrong. I take no responsibility for what happens to your data files when they copy across your SAN, or anywhere else.

Thanks for reading!

How often should I run DBCC CHECKDB?

There’s an old DBA saying…

May you already have a backup restored
A half hour before your boss knows there’s corruption

What? There’s no such thing as old DBA sayings? Well, maybe if you all said something other than “no” once in a while, you’d be more quotable. Hmpf.

Anyway, this is a serious question! And there are a lot of things to consider

  • Do I have a different RTO for corruption?
  • What’s my backup retention policy?
  • How much data do I have?
  • How long are my maintenance windows?
  • Do I have a server I can offload checks to?

Recovery Time Objectification

When you’re setting these numbers with management, you need to make them aware that certain forms of corruption are more serious than others, and may take longer to recover from. If system tables or clustered indexes become corrupt, you’re potentially looking at a much more invasive procedure than if a nonclustered index gets a little wonky — something you can disable and rebuild pretty easily.

Either way, you’re looking at an RTO of at least how long it takes you to restore your largest database, assuming the corruption isn’t present in your most recent full backup. That’s why backup checksums are important. They’re not a replacement for regular consistency checks by any means, but they can provide an early warning for some types of page corruption, if you have page verification turned on, and your page is assigned a checksum.

If you use a 3rd party backup tool that doesn’t allow you to use the backup checksum option, stop using it. Seriously, that’s garbage. And turn on Trace Flag 3023 until you find a replacement that does.
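The native versions of both of those, for reference; the database name and path are stand-ins:

/*Make CHECKSUM the default for backups that don't ask for it*/
DBCC TRACEON (3023, -1);

/*Or just ask for it*/
BACKUP DATABASE [YourDatabase]
TO DISK = 'G:\Backups\YourDatabase.bak'
WITH CHECKSUM, COMPRESSION;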

Notice I’m not talking about RPO here. But there’s a simple equation you can do: the shorter your RTO for corruption, the longer your RPO. It’s real easy to run repair with allow data loss immediately. The amount of data you lose in doing so is ¯\_(?)_/¯

Which is why you need to carefully consider…

Backup retention

The shorter the period of time you keep backups, the more often you need to run DBCC CHECKDB. If you keep data for two weeks, weekly is a good starting point. If you take weekly fulls, you should consider running your DBCC checks before those happen. A corrupt backup doesn’t help you worth a lick. Garbage backup, garbage restore. If your data only goes back two weeks, and your corruption goes back a month, best of luck with your job search.

Of course, keeping backups around for a long time is physically impossible depending on…

How much data YOU have

The more you have, the harder it is to check it all. It’s not like these checks are a lightweight process. They chew up CPU, memory, disk I/O, and tempdb. They don’t cause blocking, the way a lot of people think they do, because they take the equivalent of a database snapshot to perform the checks on. It’s transactionally consistent, meaning the check is as good as your database was when the check started.

You can make things a little easier by running with the PHYSICAL_ONLY option, but you lose out on some of the logical checks. The more complicated process is to break DBCC checks into pieces and run them a little every night. This is harder, but you stand a better chance of getting everything checked.
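For reference, here's the lighter-weight option and the pieces you'd rotate through if you go that route; database and table names are placeholders:

/*Lighter weight: page and allocation checks, skips most logical checks*/
DBCC CHECKDB (N'YourDatabase') WITH NO_INFOMSGS, PHYSICAL_ONLY;

/*Piecemeal: spread these across nights*/
DBCC CHECKALLOC (N'YourDatabase') WITH NO_INFOMSGS;
DBCC CHECKCATALOG (N'YourDatabase');
DBCC CHECKTABLE (N'dbo.Documents') WITH NO_INFOMSGS; --repeat per table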

Especially if you have terabytes and terabytes of data, and really a short…

Maintenance window

Are you 24×7? Do you have nights or weekends to do this stuff? Are you juggling maintenance items alongside data loads, reports, or other internal tasks? Your server may have a different database for different customer locations, which means you have a revolving maintenance window for each zone (think North America, Europe, APAC, etc.), so at best you’re just spreading the pain around.

Or you could start…

Offloading checks

This is my absolute favorite. Sure, it can be a bear to script out yourself. Automating rotating backups and restores can be a nightmare; so many different servers with different drive letters.

Dell LiteSpeed has been automating this process since at least version 7.4, and it’s not like it costs a lot. For sure, it doesn’t cost more than you losing a bunch of data to corruption. If you’re the kind of shop that has trouble with in-place DBCC checks, it’s totally worth the price of admission.

But what about you?

Tell me how you tackle DBCC checks in the comments. You can answer the questions at the beginning of the post, or ask your own questions. Part of my job is to help you keep your job.

Thanks for reading!

Brent says: if you’re using NetApp SAN snapshots, they’ve also got great tooling to offload corruption checks to your DR site. Licensing gotchas may apply – for both SQL Server and NetApp writeable snaps.

No but really, how big should my log file be?

Most of you are going to hate this

And TL;DR, there’s a script at the end of the post. But like The Monster At The End Of This Book, it’s worth it not to skip the middle.

There are about a billion but-what-ifs that could come into play. I can’t possibly answer all of those for you. But that’s not the point of this post, anyway! If you’re in a special circumstance, using some fancy features, or doing something utterly deranged to your database, this isn’t the post, or script, for you.

I mean really, unless the size of your log file is causing you some dramatic pain, leave it alone. You should probably go invent cold fusion if log file size is the worst issue in your database. Congratulations.

This is also a lousy place to ask me if you can shrink your log file. I have no idea how or why it got that size. There’s free space now because you’re using FULL recovery model and you took a log backup, or you’re in SIMPLE and your database hit a CHECKPOINT. No magic there. It may very well grow to that size again, so shrinking it could be a really dumb idea.

So what’s the point? Lots of people ask me this question: clients, Office Hours attendees, random passerby on the street who recognize me (even without my Robot). I usually give them the same answer and explanation, unless I have ample evidence that their fancy and/or deranged ways require a different estimate.

From the ivory tower

A good STARTING POINT for your log file is twice the size of the largest index in your database, or 25% of the database size. Whichever is larger.

Why?

If the largest object in your database is larger than 25% of your database, you are likely running some type of maintenance. Index rebuilds require the size of the object being rebuilt in log space. I usually rule of thumb twice that space, in case you’re doing anything else while you’re doing that maintenance, like ETL, reports, dragging data to and fro, purging data, whatever. If you’re only ever reorganizing the largest object, you may not need all that space. Are you sure you’re ONLY ever reorganizing that? I’ll wait.

But 25% seems so random!

Well, kinda. But you’re here for a starting point. If you’re not Super DBA, taking baselines and trending your database file sizes over time, random is better than nothing. It buys you some leeway, too.

  • If you miss a log backup (maintenance plans got you down?)
  • If you’re not taking frequent enough log backups (can I interest you in RPO/RTO insurance?)
  • If you run other long/large transactions (SSIS won’t save you)

You’ll have a fair amount of room to do your dirty work. Most sane and rational people consider this to be a positive thing.

But what if my log file still grows?

Well, then you found out you need a bigger log file. Or you need to take log backups more frequently. Perhaps those hourly log backups aren’t working out as you planned, hm?

And if your log file never grows, you’ll look really smart. And you’ll never have to wait for your log file to expand. They don’t benefit from Instant File Initialization the way data files do.

Show me the script already

It’s all right under here. Don’t forget to change the USE statement. All sizes are in GB. If your database is smaller than 1GB, you’re one of those lucky DBAs who can take vacations and stuff. Go do that. Life is short.

If your database is under 1GB, and your log file is over 1GB, start taking log backups. I’m pretty sure you’re not.

USE [StackOverflow] 
--You'll probably want to use your own database here
--Unless you work at Stack Overflow
--No, I'm not writing this to loop through all of your databases

;
WITH    [log_size]
          AS ( SELECT TOP 1
                        SCHEMA_NAME([t].[schema_id]) AS [schema_name] ,
                        [t].[name] AS [table_name] ,
                        [i].[name] ,
                        [p].[rows] AS [row_count] ,
                        CAST(( SUM([a].[total_pages]) * 8. ) / 1024. / 1024. AS DECIMAL(18,
                                                              2)) AS [index_total_space_gb] ,
                        ( SUM([a].[total_pages]) * 8 ) / 1024 / 1024 * 2 AS [largest_index_times_two_(gb)] ,
                        ( SELECT    ( SUM([mf].[size]) * 8 ) / 1024 / 1024
                          FROM      [sys].[master_files] AS [mf]
                          WHERE     [mf].[database_id] = DB_ID() ) AS [database_size_(gb)] ,
                        ( SELECT    CAST(( SUM([mf].[size]) * 8 ) / 1024
                                    / 1024 AS INT)
                          FROM      [sys].[master_files] AS [mf]
                          WHERE     [mf].[database_id] = DB_ID()
                                    AND [mf].[type_desc] = 'LOG' ) AS [current_log_size_(gb)] ,
                        ( SELECT    CAST(( SUM([mf].[size]) * 8 ) / 1024
                                    / 1024 * .25 AS INT)
                          FROM      [sys].[master_files] AS [mf]
                          WHERE     [mf].[database_id] = DB_ID()
                                    AND [mf].[type_desc] = 'ROWS' ) AS [25%_of_database_(gb)]
               FROM     [sys].[tables] [t]
               INNER JOIN [sys].[indexes] [i]
               ON       [t].[object_id] = [i].[object_id]
               INNER JOIN [sys].[partitions] [p]
               ON       [i].[object_id] = [p].[object_id]
                        AND [i].[index_id] = [p].[index_id]
               INNER JOIN [sys].[allocation_units] [a]
               ON       [p].[partition_id] = [a].[container_id]
               WHERE    [t].[is_ms_shipped] = 0
               GROUP BY SCHEMA_NAME([t].[schema_id]) ,
                        [t].[name] ,
                        [i].[name] ,
                        [p].[rows]
               ORDER BY [index_total_space_gb] DESC)
     SELECT * ,
            CASE WHEN [ls].[largest_index_times_two_(gb)] > [ls].[25%_of_database_(gb)]
                 THEN [ls].[largest_index_times_two_(gb)]
                 ELSE [ls].[25%_of_database_(gb)]
            END AS [maybe_this_is_a_good_log_size(gb)]
     FROM   [log_size] AS [ls]
OPTION  ( RECOMPILE );