
One weird trick for managing a bunch of servers

Let’s face it, most people don’t have just one SQL Server

How many they tell Microsoft they have is another matter, but let the record show that I don’t condone licensing dishonesty. But going one step further, most places… Well, they’re ‘lucky’ if they have one DBA, never mind a team.

Everyone else: Give me your network people, your sysadmins, your huddled SAN group yearning to breathe free, the wretched refuse of your teeming developers.

Doing things on one server is aggravating enough. Doing things on a bunch of servers is even worse. Given some of today’s HA/DR features (I’m looking at you, Availability Groups, with your lack of a mechanism to sync anything outside of user databases. Rude.) people are more and more likely to have lots of SQL Servers that they need to tend to.

Sometimes just keeping track of them is impossible. If you’re one guy with 20 servers, have fun scrolling through the connection list in SSMS trying to remember which one is which. Because people name things well, right? Here’s SQLVM27\Instance1, SQLVM27\Instance2, SQLVM27\Instance3, and that old legacy accounting database is around here somewhere.

Register it and forget it

But don’t actually forget it. If you forget it and it goes offline, people will look at you funny. Turns out people don’t like offline servers much.

So what’s someone to do with all these servers? Register them! Hidden deep in the View menu of SSMS is the Registered Servers window

Hi there, handsome.

It will look pretty barren at first, just an empty folder. But you’ll fill it up quick, I’m sure. Can never have enough servers around, you know.

It’s pretty easy to populate, you can right click on the Local Server Group folder, or on servers you’re connected to in Object Explorer.

This

That

Either way, you get the same dialog box to add a server in. You can give it a friendly name if you want! Maybe WIN03-SQL05\Misc doesn’t tell a good story.

Joy of joys

And if you hip and hop over to the Connection Properties tab, you can set all sorts of nifty stuff up. The biggest one for me was to give different types of servers different colored tabs that the bottom of SSMS is highlighted with. It’s the one you’re probably looking at now that’s a putrid yellow-ish color and tells you you’re connected and that your query has been executing for three hours. Reassuring. Anyway, I’d use this to differentiate dev from prod servers. Just make sure to choose light colors, because the black text doesn’t show up on dark colors too well.

Wonder of wonders

Another piece of advice here is not to mix servers on different major (and sometimes minor) versions. The reason is that this feature gives you the ability to query multiple servers at once. If you’re looking at DMVs, they can have different columns in them, and you’ll just get an error. Even a simple query to sys.databases will throw you a bonk between 2012 and 2014.

By the planets!

I changed my mind. I hate planets.

Even if you’re running 2008R2, there are some pretty big differences in DMVs between SP1 and SP3. Microsoft has been known to change stuff in CUs (I’m looking at you, Extended Events).

On the plus side, you can use your multi-server connection to SELECT @@VERSION to help you decide how you should group them. If they have something better in common, like participating in Log Shipping, Mirroring, an AG, etc., all the better.
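If you want a little more detail than @@VERSION spits out, here’s the sort of thing I’d run from a multi-server query window (right-click the server group, New Query). This is just a sketch; SSMS adds a Server Name column to the results for you, so grouping by version and edition gets pretty easy.

SELECT  SERVERPROPERTY('ProductVersion') AS [product_version] ,
        SERVERPROPERTY('ProductLevel') AS [service_pack] ,
        SERVERPROPERTY('Edition') AS [edition];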

Insight

But my favorite thing, because I was a devotee to the Blitz line of stored procedures even before I got paid to like them, was that I could install them on ALL OF MY SERVERS AT ONCE! This was especially useful when updates came out. You know what it’s like to put a stored proc on 20 servers one at a time? Geeeeeet outta here!

Peanuts.

Check that out. It’s on both of my servers. At once. That means simultaneously, FYI. If you have a DBA or Admin database that you keep on all your servers to hold your fancy pants scripts and tools, this is an awesome way to make sure they all have the latest and greatest.

You’re already better at your job

Even though this feature came out in 2008, I hardly see anyone using it. I found it really helpful for comparing indexes and query plans across app servers that each held different client data. It also exposes far less than Linked Servers; you need to worry less about access and level of privilege.

Just don’t forget to export your list if you change laptops!

Thanks for reading!

Psst… Hey buddy, you want some E-Discovery?

One year ago today!

Well, sort of one year ago. Who knows when this thing will get published? Only God and Brent. And part of the calendar year fell during a leap year, which was just plain cruel, like a two part Buffy episode where the second part is the first episode of the next season DAMN YOU JOSS WHEDON!

Anyway, I started working here, annoying you guys with blog posts, giving Doug rub-downs in between video takes, and walking Ernie when the Ozar family was too full of illegal caviar and albino truffles to move. I also started running down the clock on being able to work with my favorite piece of software again. You can probably guess.

Relativity!

Seriously, I love this thing. Not just because many of the databases I worked with under the software were hundreds of gigs, on up to 9 terabytes, but because the people behind the software really do care about the product. The customer support is aces (Hello, Pod One), and the developers are super helpful and responsive.

Plus, it’s just plain interesting. You give lawyers this crazy interface that lets them build just about any search query they can dream of, including some really, really bad ones, and see how SQL Server reacts.

If you’re a DBA who has read the execution plan of a Relativity query where saved searches reference saved searches that reference saved searches that… you get the point! I feel your pain.

It’s not just the hardware

You can’t fix every saved search and workflow immediately, which makes right-sizing hardware super important, but that’s not the only thing. Every case is different, and they often need custom indexes.

If you’re a DBA who has watched performance tank because some new search suddenly started scanning the clustered index of your 50 million row, one terabyte Documents table with a wildcard search on Email Body and Email Subject and Email Sender and Email Recipients and Email Metadata for ‘insert super common word here’, I feel your pain.

Best in Service

My other favorite part about Relativity is that they have standards. Not last call for alcohol standards, either. They keep you, as a DBA, honest. No backups? Ding. No DBCC CHECKDB? Ding. Perf in the tank? Ding.

The challenges that you’re presented with at scale are immense. You have 100 terabytes of data and you need to check it for corruption weekly. How’s that gonna work?

Index and statistics maintenance can be super important, too. Fragmentation may not matter if you’re reading 1500 pages instead of 1000 pages, but it can sure as heck matter when you’re reading 1,500,000 pages rather than 1,000,000 pages. And all those ascending keys? SQL Server is gonna make some real bad judgement calls on those, especially prior to 2014.

It’s a wild, wild life

You have users bulk loading thousands to millions of documents, users updating records during review, and then searches running on top of all that.

I am thrilled to be able to work with my favorite product again. If you’re experiencing Relativity pains, drop us a line.

Thanks for reading!

Brent says: For a refresher on this app and how we work with it, check out our past posts on The SQL Server Components of kCura Relativity, Performance Tuning kCura Relativity, Using Partitioning to Make kCura Relativity Faster, and Tiering Relativity Databases.

Changes to auto update stats thresholds in SQL Server 2016

TL;DR

As of CTP 3.3, it’s the same behavior as Trace Flag 2371, in 2008 R2 SP1 and onward. That basically means that the bigger your table is, the fewer rows need to be modified before an automatic statistics update occurs.
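To put rough numbers on it: for big tables, that dynamic threshold works out to about the square root of 1,000 times the table’s row count, instead of 20% plus 500 rows. Quick back-of-the-napkin math for a 100 million row table (about 316,228 modifications versus about 20,000,500):

SELECT  SQRT(1000. * 100000000) AS [dynamic_threshold_approx] ,
        ( 100000000 * 0.20 ) + 500 AS [old_20_percent_threshold];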

Slightly longer…

The change was announced over here. At first I thought, woah, cool, they thought about this and made big changes. But no, much like Trace Flags 1117 and 1118 being enabled for tempdb, it’s just…

Remember in Mortal Kombat, when getting to fight Reptile made you cool? Then Mortal Kombat 2 came out, and he was a playable character, and everyone would call your wins cheap if you picked him? That’s sort of what this reminds me of. If you’re new to SQL, you probably won’t appreciate the differences these Trace Flags make. If you’ve been using it for a while, you’ll probably start sentences with “back in my day, we had to add startup parameters…” and chuff off to write miserably long blog posts about unique indexes.

As of 02/23/2016, it sounds like Trace Flag 8048 is also enabled by default in 2016. See quote about soft NUMA at the link.

Moderate testing

I ran tests on some fairly large tables. I tried to run them on tables from 100 million to 1 billion rows, but I blew out the data drive of our AWS instance. So, uh, if you have a bigger server to test stuff out on, be my guest.

The basic concept was:

  1. Load a bunch of data into a table
  2. Update it 1000 rows at a time (I know, I know, but updating less than that took FOREVER)
  3. Run a query against it to invalidate stats
  4. If they reset, add a bunch more data and start over
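If you want to watch the same thing on a smaller scale, sys.dm_db_stats_properties shows the modification counter ticking up between stats updates. A rough sketch, with a placeholder table name:

SELECT  [s].[name] AS [stats_name] ,
        [sp].[last_updated] ,
        [sp].[rows] ,
        [sp].[modification_counter]
FROM    [sys].[stats] AS [s]
CROSS APPLY [sys].[dm_db_stats_properties]([s].[object_id], [s].[stats_id]) AS [sp]
WHERE   [s].[object_id] = OBJECT_ID('dbo.YourBigTable');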

What I ended up with was, well…

Eighth place.

Here’s an abridged version of 10-20 million and 30-40 million rows, and how many modifications they took before a stats update occurred. If you follow the PercentMod column down, the returns diminish a bit the higher up you get. I’m not saying that I’d prefer to wait for 20% + 500 rows to modify, by any stretch. My only point here is that there’s not a set percentage to point to.

And, because you’re probably wondering, turning on Trace Flag 2371 in 2016 doesn’t make any difference. Here’s what 10-100 million look like, in 10 million row chunks.

And then SQL crashed.

If you can guess which side TF 2371 was on for, I’ll give you one merlin dollhairs.

Great expectations

This improvement is certainly welcome as a default, though it’s not all that ‘new’. My 2014 instance comes up with the same thresholds with 2371 enabled. Unless you’re working with pretty big tables, or used to managing statistics updates on your own, you likely won’t even notice the change.

Thanks for reading!

Brent says: It’s kinda like Microsoft is treating trace flags as alpha/beta tests for new features now. That’s right in line with how .com startups use feature flags.

Moving Databases with ALTER DATABASE

True story

A long time ago, I had to actually do stuff to databases. One thing I had to do was move data files around. Maybe some knucklehead had put system databases on the C: drive, or a LUN was filling up, or we got a new LUN. You know, whatever. Natural curiosity oft leads one to the internet. If one does not succumb to food and cats, one may find useful information. Or anonymous message boards. Sort of a toss up. What I found was this article. Weird, right? 2009. Brent said to use ALTER DATABASE. It’s new and pretty and smart people do it. What Brent didn’t do was explain how it’s done. Or link to how it’s done. I felt cold and alone. Abandoned. Afraid. “Great post, Brent”, I said sarcastically, and set out to figure out how to work this magic on my own.

I turned to BOL, the destination of all self-loathing people. If you scroll down to the bottom, way down at the bottom, the syntax is there. Of course, moving system databases is a horse of a different color. But hopefully you don’t need that one. For user databases, it’s rather more simple:

  1. Alter the file metadata to the new path
  2. Set the database offline
  3. Physically move the file
  4. Set the database back online

Easy enough!

Run ALTER DATABASE with the new location. We’re moving the data file. If we were moving the log file, it would probably end in “_log” or something. You can find all this information in sys.master_files, except where you’re moving the file to. Just don’t actually move it to C:\Whatever. You may run into problems later. Also, you need the filename. If you don’t include it, SQL won’t complain until you try to set the database back online. Yay!

ALTER DATABASE [Sample] MODIFY FILE ( NAME = Sample, FILENAME = 'C:\Whatever\Sample.mdf' );

This is the part that you need to think through. People have to be cool with the database being offline while you move the physical file. This is not a seamless transition. If you’re moving large enough databases, you may want to consider an alternate method, like Mirroring or Log Shipping. They take more work, but you get the whole near-zero-downtime thing out of it. You may want to stage a mock file move to test LUN to LUN copy speeds. See how many GB you can move per minute. That way you’ll at least be able to estimate how long the outage will last. Assuming all that is cool, go ahead and take the database offline.

ALTER DATABASE [Sample] SET OFFLINE;

Now you gotta hurry up and get that file moved. How you do that is up to you. You may prefer to just use Windows Explorer, since it has a status bar, and tells you copy speeds. Good stuff to know if people ask for updates, right? Just to fill space, here’s a PowerShell command. I still hate PowerShell.

Move-Item -Path "D:\Data\Sample.mdf" -Destination "C:\Whatever" -Force

Once that finishes, put your database back online.

ALTER DATABASE [Sample] SET ONLINE;

If you find yourself having to do this often, or if you have to migrate a group of databases, it’s probably worth scripting out.
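If it’s more than a handful, something like this will generate the MODIFY FILE statements for you to eyeball before running them. The D:\NewData path is a placeholder, so point it wherever the files are actually landing:

SELECT  'ALTER DATABASE ' + QUOTENAME(DB_NAME([mf].[database_id]))
        + ' MODIFY FILE ( NAME = ' + QUOTENAME([mf].[name])
        + ', FILENAME = ''D:\NewData\'
        + REVERSE(LEFT(REVERSE([mf].[physical_name]), CHARINDEX('\', REVERSE([mf].[physical_name])) - 1))
        + ''' );'
FROM    [sys].[master_files] AS [mf]
WHERE   [mf].[database_id] = DB_ID('Sample')
        AND [mf].[type_desc] = 'ROWS';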

There you have it

It’s that easy to do. Just make sure you have adequate backups, in case something goes wrong. I take no responsibility for what happens to your data files when they copy across your SAN, or anywhere else.

Thanks for reading!

How often should I run DBCC CHECKDB?

There’s an old DBA saying…

May you already have a backup restored
A half hour before your boss knows there’s corruption

What? There’s no such thing as old DBA sayings? Well, maybe if you all said something other than “no” once in a while, you’d be more quotable. Hmpf.

Anyway, this is a serious question! And there are a lot of things to consider:

  • Do I have a different RTO for corruption?
  • What’s my backup retention policy?
  • How much data do I have?
  • How long are my maintenance windows?
  • Do I have a server I can offload checks to?

Recovery Time Objectification

When you’re setting these numbers with management, you need to make them aware that certain forms of corruption are more serious than others, and may take longer to recover from. If system tables or clustered indexes become corrupt, you’re potentially looking at a much more invasive procedure than if a nonclustered index gets a little wonky — something you can disable and rebuild pretty easily.

Either way, you’re looking at an RTO of at least how long it takes you to restore your largest database, assuming the corruption isn’t present in your most recent full backup. That’s why backup checksums are important. They’re not a replacement for regular consistency checks by any means, but they can provide an early warning for some types of page corruption, if you have page verification turned on, and your page is assigned a checksum.

If you use a 3rd party backup tool that doesn’t allow you to use the backup checksum option, stop using it. Seriously, that’s garbage. And turn on Trace Flag 3023 until you find a replacement that does.
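If you’re not sure whether your backups are getting checksums today, it’s a one-word change to the backup command, and RESTORE VERIFYONLY will re-check them. Database name and path here are placeholders:

BACKUP DATABASE [Sample]
TO DISK = 'D:\Backups\Sample.bak'
WITH CHECKSUM, INIT;

RESTORE VERIFYONLY
FROM DISK = 'D:\Backups\Sample.bak'
WITH CHECKSUM;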

Notice I’m not talking about RPO here. But there’s a simple equation you can do: the shorter your RTO for corruption, the longer your RPO. It’s real easy to run repair with allow data loss immediately. The amount of data you lose in doing so is ¯\_(ツ)_/¯

Which is why you need to carefully consider…

Backup retention

The shorter the period of time you keep backups, the more often you need to run DBCC CHECKDB. If you keep data for two weeks, weekly is a good starting point. If you take weekly fulls, you should consider running your DBCC checks before those happen. A corrupt backup doesn’t help you worth a lick. Garbage backup, garbage restore. If your data only goes back two weeks, and your corruption goes back a month, best of luck with your job search.

Of course, keeping backups around for a long time is physically impossible depending on…

How much data YOU have

The more you have, the harder it is to check it all. It’s not like these checks are a lightweight process. They chew up CPU, memory, disk I/O, and tempdb. They don’t cause blocking, the way a lot of people think they do, because they take the equivalent of a database snapshot to perform the checks on. It’s transactionally consistent, meaning the check is as good as your database was when the check started.

You can make things a little easier by running with the PHYSICAL_ONLY option, but you lose out on some of the logical checks. The more complicated process is to break DBCC checks into pieces and run them a little every night. This is harder, but you stand a better chance of getting everything checked.
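For reference, the lighter-weight option looks like this. It checks page-level consistency but skips most of the logical checks, so treat it as a supplement to full checks, not a replacement:

DBCC CHECKDB ('Sample') WITH PHYSICAL_ONLY, NO_INFOMSGS, ALL_ERRORMSGS;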

Especially if you have terabytes and terabytes of data, and really a short…

Maintenance window

Are you 24×7? Do you have nights or weekends to do this stuff? Are you juggling maintenance items alongside data loads, reports, or other internal tasks? Your server may have a different database for different customer locations, which means you have a revolving maintenance window for each zone (think North America, Europe, APAC, etc.), so at best you’re just spreading the pain around.

Or you could start…

Offloading checks

This is my absolute favorite. Sure, it can be a bear to script out yourself. Automating rotating backups and restores can be a nightmare; so many different servers with different drive letters.

Dell LiteSpeed has been automating this process since at least version 7.4, and it’s not like it costs a lot. For sure, it doesn’t cost more than you losing a bunch of data to corruption. If you’re the kind of shop that has trouble with in-place DBCC checks, it’s totally worth the price of admission.

But what about you?

Tell me how you tackle DBCC checks in the comments. You can answer the questions at the beginning of the post, or ask your own questions. Part of my job is to help you keep your job.

Thanks for reading!

Brent says: if you’re using NetApp SAN snapshots, they’ve also got great tooling to offload corruption checks to your DR site. Licensing gotchas may apply – for both SQL Server and NetApp writeable snaps.

No but really, how big should my log file be?

Most of you are going to hate this

And TL;DR, there’s a script at the end of the post. But like The Monster At The End Of This Book, it’s worth it not to skip the middle.

There are about a billion but-what-ifs that could come into play. I can’t possibly answer all of those for you. But that’s not the point of this post, anyway! If you’re in a special circumstance, using some fancy features, or doing something utterly deranged to your database, this isn’t the post, or script, for you.

I mean really, unless the size of your log file is causing you some dramatic pain, leave it alone. You should probably go invent cold fusion if log file size is the worst issue in your database. Congratulations.

This is also a lousy place to ask me if you can shrink your log file. I have no idea how or why it got that size. There’s free space now because you’re using FULL recovery model and you took a log backup, or you’re in SIMPLE and your database hit a CHECKPOINT. No magic there. It may very well grow to that size again, so shrinking it could be a really dumb idea.
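Before you make any decisions, it doesn’t hurt to see how full the log actually is right now. This old standby shows log size and percent used for every database on the instance:

DBCC SQLPERF (LOGSPACE);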

So what’s the point? Lots of people ask me this question: clients, Office Hours attendees, random passerby on the street who recognize me (even without my Robot). I usually give them the same answer and explanation, unless I have ample evidence that their fancy and/or deranged ways require a different estimate.

From the ivory tower

A good STARTING POINT for your log file is twice the size of the largest index in your database, or 25% of the database size. Whichever is larger.

Why?

If the largest object in your database is larger than 25% of your database, you are likely running some type of maintenance. Index rebuilds require the size of the object being rebuilt in log space. I usually rule of thumb twice that space, in case you’re doing anything else while you’re doing that maintenance, like ETL, reports, dragging data to and fro, purging data, whatever. If you’re only ever reorganizing the largest object, you may not need all that space. Are you sure you’re ONLY ever reorganizing that? I’ll wait.

But 25% seems so random!

Well, kinda. But you’re here for a starting point. If you’re not Super DBA and taking baselines and trending your database file sizes over time, random is better than nothing. It buys you some leeway, too.

  • If you miss a log backup (maintenance plans got you down?)
  • If you’re not taking frequent enough log backups (can I interest you in RPO/RTO insurance?)
  • If you run other long/large transactions (SSIS won’t save you)

You’ll have a fair amount of room to do your dirty work. Most sane and rational people consider this to be a positive thing.

But what if my log file still grows?

Well, then you found out you need a bigger log file. Or you need to take log backups more frequently. Perhaps those hourly log backups aren’t working out as you planned, hm?

And if your log file never grows, you’ll look really smart. And you’ll never have to wait for your log file to expand. They don’t benefit from Instant File Initialization the way data files do.
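If you do settle on a number, grow the log to it in one planned operation instead of letting auto-growth nickel-and-dime you there. The names and sizes below are placeholders:

ALTER DATABASE [Sample]
MODIFY FILE ( NAME = Sample_log, SIZE = 25600MB, FILEGROWTH = 8192MB );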

Show me the script already

It’s all right under here. Don’t forget to change the USE statement. All sizes are in GB. If your database is smaller than 1GB, you’re one of those lucky DBAs who can take vacations and stuff. Go do that. Life is short.

If your database is under 1GB, and your log file is over 1GB, start taking log backups. I’m pretty sure you’re not.

USE [StackOverflow] 
--You'll probably want to use your own database here
--Unless you work at Stack Overflow
--No, I'm not writing this to loop through all of your databases

;
WITH    [log_size]
          AS ( SELECT TOP 1
                        SCHEMA_NAME([t].[schema_id]) AS [schema_name] ,
                        [t].[name] AS [table_name] ,
                        [i].[name] ,
                        [p].[rows] AS [row_count] ,
                        CAST(( SUM([a].[total_pages]) * 8. ) / 1024. / 1024. AS DECIMAL(18,
                                                              2)) AS [index_total_space_gb] ,
                        ( SUM([a].[total_pages]) * 8 ) / 1024 / 1024 * 2 AS [largest_index_times_two_(gb)] ,
                        ( SELECT    ( SUM([mf].[size]) * 8 ) / 1024 / 1024
                          FROM      [sys].[master_files] AS [mf]
                          WHERE     [mf].[database_id] = DB_ID() ) AS [database_size_(gb)] ,
                        ( SELECT    CAST(( SUM([mf].[size]) * 8 ) / 1024
                                    / 1024 AS INT)
                          FROM      [sys].[master_files] AS [mf]
                          WHERE     [mf].[database_id] = DB_ID()
                                    AND [mf].[type_desc] = 'LOG' ) AS [current_log_size_(gb)] ,
                        ( SELECT    CAST(( SUM([mf].[size]) * 8 ) / 1024
                                    / 1024 * .25 AS INT)
                          FROM      [sys].[master_files] AS [mf]
                          WHERE     [mf].[database_id] = DB_ID()
                                    AND [mf].[type_desc] = 'ROWS' ) AS [25%_of_database_(gb)]
               FROM     [sys].[tables] [t]
               INNER JOIN [sys].[indexes] [i]
               ON       [t].[object_id] = [i].[object_id]
               INNER JOIN [sys].[partitions] [p]
               ON       [i].[object_id] = [p].[object_id]
                        AND [i].[index_id] = [p].[index_id]
               INNER JOIN [sys].[allocation_units] [a]
               ON       [p].[partition_id] = [a].[container_id]
               WHERE    [t].[is_ms_shipped] = 0
               GROUP BY SCHEMA_NAME([t].[schema_id]) ,
                        [t].[name] ,
                        [i].[name] ,
                        [p].[rows]
               ORDER BY [index_total_space_gb] DESC)
     SELECT * ,
            CASE WHEN [ls].[largest_index_times_two_(gb)] > [ls].[25%_of_database_(gb)]
                 THEN [ls].[largest_index_times_two_(gb)]
                 ELSE [ls].[25%_of_database_(gb)]
            END AS [maybe_this_is_a_good_log_size(gb)]
     FROM   [log_size] AS [ls]
OPTION  ( RECOMPILE );

A funny thing happened on my way to set up Mirroring…

I’ve set up Mirroring about a billion times

I’m not bragging about that. I’d rather say that I set up a billion AGs, and not one of them ever failed. But then I’d be lying to you; those things fail like government programs. One thing I’d never done, though, is set up Mirroring with a Witness. I never wanted automatic failover, because it’s only one database at a time. If for some reason one database out of all that I had mirrored ever turned Ramblin’ Man and failed over to another server, there would understandably be some application consternation. Not to mention any maintenance and internal operations. They don’t react well to sudden database unavailability.

Of course, doing anything for the first time is horrible. Just ask my second wife.

Here’s where things got awkward

I have my databases! This is my top secret development environment. Stack Overflow is in an AG, and I had set up two other Mirrors: one synch and one asynch. I wanted to have a variety of setups to test some scripts against.

Everything looks good!

Alright, let’s set up Mirroring…

Configuring stuff is cool, right?

Yeah yeah next next next

Service accounts whatever BORING

GREEN LIGHT GO!

This is so easy. Seriously. Why doesn’t everyone do this? Why do you complicate your short, short lives with Availability Groups? Are they AlwaysOn? Are they Always On? WHO KNOWS? Not even Microsoft.

I’m hitting this button and jumping into a mile of Laphroaig.

HIGH FIVES ALL ARO-

Y THO?

This is the error text:

The ALTER DATABASE command could not be sent to the remote server instance 'TCP://ORACLEDB.darling.com:5022'. The database mirroring configuration was not changed. Verify that the server is connected, and try again.

Super Sleuth

Alright, that’s silly. I used the GUI. Instead of going to bed I’ll spend some time checking all my VM network settings. BRB.

I’m back. They were all correct. I could ping and telnet and set up linked servers and RDP. What in the name of Shub-Niggurath is going on with this thing?

I can even see the Endpoint! So close, and yet so far~~

The Endpoint is showing up on the Witness and what is this?

Where are we now?

This is a good time for a quick recap

  1. Mirroring is up and running synchronously
  2. The endpoint is configured on the witness
  3. We get an error when we try to connect the witness

TO THE ERROR LOG!

I should have done this hours ago.

Well whaddya know? That’s a really good clue. Encryption and stuff. There’s no compatible algorithm. Ain’t that somethin’? You’d think that Microsoft would be cool about setting up the same kind of encryption across all the different Endpoints, if using different encryption would cause the setup to fail. Right guys? Heh. Right? Hey, hello?

NOPE.

Alright, let’s see what I need to be a matchmaker.

The Cure – Primary

Oh. AES. Okay. Cool. Thanks.

RC4 WHY WOULD YOU DO THAT?

Since we have them both scripted out already, let’s just drop and re-create the Witness Endpoint with the right encryption algorithm.
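If you’d rather skip the clicking, the check and the fix look something like this. The endpoint name and port are the defaults the GUI picks, so adjust them to match what’s actually on your witness:

--On each instance, check the algorithm the mirroring endpoint is using
SELECT  [name] ,
        [role_desc] ,
        [encryption_algorithm_desc]
FROM    [sys].[database_mirroring_endpoints];

--On the witness: drop the RC4 endpoint and recreate it with AES
DROP ENDPOINT [Mirroring];

CREATE ENDPOINT [Mirroring]
    STATE = STARTED
    AS TCP ( LISTENER_PORT = 5022, LISTENER_IP = ALL )
    FOR DATABASE_MIRRORING ( ROLE = WITNESS, AUTHENTICATION = WINDOWS NEGOTIATE,
                             ENCRYPTION = REQUIRED ALGORITHM AES );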

INTO THE TREES

That did not result in a forest fire. I’m hopeful. Sort of. It’s been a long night and I think I can see tomorrow from here.

Scriptin’

Meanwhile, back on the Primary…

I AM SO SMRT

It worked! Now I have a Witness, and I can shut all my VMs down. That was so much fun.

What did we learn?

Microsoft hates you and doesn’t want you to sleep. Just kidding. Mostly. But seriously, why would they do that?

It mostly goes to show that it’s always a smart idea to use that little script button at the top of (most) GUIs in SSMS. Who knows what kind of foolishness you’ll find? A little reading can save you a lot of time troubleshooting errors that make you feel insane.

Thanks for reading!

Another reason why scalar functions in computed columns is a bad idea

As if there weren’t enough reasons

In my last blog post I talked about different things that cause plans, or zones of plans, to execute serially. One of the items I covered was computed columns that reference scalar functions. We know that they’ll keep queries from going parallel, but what about other SQL stuff?

Oh no my index is fragmented

If you’re running Expensive Edition, index rebuilds can be both online and parallel. That’s pretty cool, because it keeps all your gadgets and gizmos mostly available during the whole operation, and the parallel bit usually makes things faster.

That is, unless you have a computed column in there that references a scalar function. I decided to write my test function to not perform any data access so it could be persisted. It’s dead simple, and I’m tacking it on to a column in the PostLinks table of the Stack Overflow database.

CREATE FUNCTION dbo.PIDMultiplier (@pid int)
RETURNS INT
    WITH RETURNS NULL ON NULL INPUT,
         SCHEMABINDING
AS
    BEGIN
        DECLARE @Out BIGINT;
        SELECT  @Out = @pid * 2
        RETURN @Out;
    END;
GO

ALTER TABLE [dbo].[PostLinks]
ADD [Multiplied] AS dbo.[PIDMultiplier]([PostId]) PERSISTED

For this one, all we have to do is turn on actual execution plans and rebuild the index, then drop the column and rebuild again.

ALTER TABLE [dbo].[PostLinks] REBUILD WITH (ONLINE = ON)

ALTER TABLE [dbo].[PostLinks] DROP COLUMN [Multiplied]

Here are my execution plans. The rebuild I ran when the table had my computed column in it stayed serial.

Hi, I’m garbage.

Parallel, sans computed column:

Dude, you’re getting a parallel.

But there’s a bigger fish in the pond

Probably the most important maintenance item you should be doing, aside from backups, is running DBCC CHECKDB. Seriously, if you’re not doing them both, start today. Ola Hallengren has basically done all the work for you. Back when I had a real job, I used his scripts everywhere.

Before we were so rudely interrupted by a soap box, we were talking about parallelism. This part was a little bit more complicated, but don’t worry, you don’t have to follow along. Just look at the pretty pictures. Sleep now. Yes. Sleep.

The first couple times I tried, the DBCC check never went parallel. Since I’m on my laptop, and not a production server, I can set Cost Threshold for Parallelism to 0. You read that right, ZE-RO! Hold onto your drool dish.
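For the folks following along at home, that’s just sp_configure (again: laptop, not production):

EXEC [sys].[sp_configure] 'show advanced options', 1;
RECONFIGURE;
EXEC [sys].[sp_configure] 'cost threshold for parallelism', 0;
RECONFIGURE;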

With that set, I fire up Ye Olde Oaken sp_BlitzTrace so I can capture everything with Extended Events. You’ll need all three commands, but you’ll probably have to change @SessionId, and you may have to change @TargetPath. Run the first command to start your session up.

EXEC [dbo].[sp_BlitzTrace] @SessionId = 61 , @Action = 'start' , @TargetPath = 'c:\temp\' , @TraceParallelism = 1 , @TraceExecutionPlansAndKillMyPerformance = 1 

EXEC [dbo].[sp_BlitzTrace] @Action = 'stop' 

EXEC [dbo].[sp_BlitzTrace] @Action = 'read'

With that running, toss in your DBCC command. I’m only using DBCC CHECKTABLE here to simplify. Rest assured, if you run DBCC CHECKDB, the CHECKTABLE part is included. The only checks that DBCC CHECKDB doesn’t run are CHECKIDENT and CHECKCONSTRAINTS. Everything else is included.

DBCC CHECKTABLE('dbo.PostLinks') WITH NO_INFOMSGS, ALL_ERRORMSGS

ALTER TABLE [dbo].[PostLinks]
ADD [Multiplied] AS dbo.[PIDMultiplier]([PostId]) PERSISTED

Run DBCC CHECKTABLE, add the computed column back, and then run it again. When those finish, run the sp_BlitzTrace commands to stop and read session data. You should see execution plans for each run, and they should be way different.

Hell Yeah.

Hell No.

So even DBCC checks are serialized. Crazy, right? I’d been hearing about performance hits to varying degrees when running DBCC checks against tables with computed columns for a while, but never knew why. There may be a separate reason for regular computed columns vs. ones that reference scalar functions. When I took the equivalent SQL out of a function, the DBCC check ran parallel.

ALTER TABLE [dbo].[PostLinks]
ADD [Multiplied] AS [PostId] * 2 PERSISTED

Of course, those online index rebuilds running single threaded might be a blessing in disguise, if you haven’t patched SQL recently.

I don’t have much of a grand closing paragraph here. These things can seriously mess you up for a lot of reasons. If you’re a vendor, please get away from using scalar functions, and please please don’t use them in computed columns.

Thanks for reading!

Still Serial After All These Years

With each new version of SQL comes a slew of new stuff

Some changes are cosmetic, others are bewildering, and the rest fall somewhere between “who cares about JSON?” and “OH MY GOD TAKE MY MONEY!” Not really my money, though, because I only buy Developer Edition. Aaron Bertrand has done a better job finding, and teaching you how to find, new features than I could. Head over to his blog if you want to dive in.

What I wanted to look at was something much more near and dear to my heart: Parallelism.

(It’s so special I capitalized it.)

In the past, there were a number of things that caused entire plans, or sections of plans, to be serial. Scalar UDFs are probably the first one everyone thinks of. They’re bad. Really bad. They’re so bad that if you define a computed column with a scalar UDF, every query that hits the table will run serially even if you don’t select that column. So, like, don’t do that.

What else causes perfectly parallel plan performance plotzing?

Total:

  • UDFs (Scalar, MSTVF)
  • Modifying table variables

Zone:

  • Backwards scan
  • Recursive CTE
  • TOP
  • Aggregate
  • Sequence
  • System tables

Other:

  • CLR functions (that perform data access)
  • Dynamic cursors
  • System functions

We’ll just look at the items from the first two lists. The stuff in Other is there because I couldn’t write a CLR function if I had Adam Machanic telling me what to type, cursors don’t need any worse of a reputation, and it would take me a year of Sundays to list every internal function and test it.

I’m going to depart from my normal format a bit, and put all the code at the end. It’s really just a mess of boring SELECT statements. The only thing I should say up front is that I’m leaning heavily on the use of an undocumented Trace Flag: 8649. I use it because it ‘forces’ parallelism by wiping out the cost penalty the optimizer normally tacks onto parallel plans for each query. So if a parallel plan is possible, we’ll get one. Or part of one. You get the idea.

Just, you know, don’t use it in production unless you really know what you’re doing. It’s pretty helpful to use as a developer, on a development server, to figure out why queries aren’t going parallel. Or why they’re partially parallel.

All of this was run on 2016 CTP 3.1, so if RTM comes along, and something here is different, that’s why. Of course, backwards scans are probably close to 15 years old, so don’t sit on your thumbs waiting for them to get parallel support.

Backwards scan!

This is what happens when your ORDER BY is the opposite of your index.

Scalar!

Not only do they run serial, but they run once for every row returned. Have a nice day!

Table with computed column

Hint: the Id column isn’t the computed one.

MSTVF

Multi-statement Table Valued Garbage

Table Variable Insert

You probably can’t name one good reason to use a table variable.

Top

Top o’ the nothin’!

Aggregating

You’re making me wanna use Excel, here.

Row Number (or any windowing/ranking function)

You’re gonna get all those row numbers one by one.

Accessing System Tables

Twisted SYS-ter

The recursive part of a recursive CTE

This part was fun. I liked this part.

Picture time is over

Now you get to listen to me prattle on and on about how much money you’re wasting on licensing by having all your queries run serially. Unless you have a SharePoint server; then you have… many other problems. If I had to pick a top three list that I see people falling victim to regularly, it would be:

  1. Scalar functions
  2. Table variables
  3. Unsupported ORDER BYs

They’re all relatively easy items to fix, and by the looks of it, we’ll be fixing them on SQL Server 2016 as well. Maybe Query Store will make that easier.

Thanks for reading!

Das Code

USE [StackOverflow];
SET NOCOUNT ON;
GO

/*Scalar*/
CREATE FUNCTION [dbo].[ScalarValueReturner] ( @id INT )
RETURNS INT
       WITH RETURNS NULL ON NULL INPUT,
            SCHEMABINDING
AS
    BEGIN
        RETURN @id * 1;
    END;
GO

/*MSTVF*/
CREATE FUNCTION [dbo].[MSTVFValueReturner] ( @id INT )
RETURNS @Out TABLE ( [Returner] INT )
       WITH SCHEMABINDING
AS
    BEGIN
        INSERT  INTO @Out
                ( [Returner] )
        SELECT  @id;
        RETURN;
    END;
GO

SELECT  [Id]
FROM    [dbo].[Users]
WHERE   [Id] BETWEEN 1 AND 10000
ORDER BY [Id] DESC
OPTION  ( QUERYTRACEON 8649, RECOMPILE );


SELECT  [Id] ,
        [dbo].[ScalarValueReturner]([Id])
FROM    [dbo].[Users]
WHERE   [Id] BETWEEN 1 AND 10000
OPTION  ( QUERYTRACEON 8649, RECOMPILE );

SELECT  [Id] ,
        [ca].[Returner]
FROM    [dbo].[Users]
CROSS APPLY [dbo].[MSTVFValueReturner]([Id]) AS [ca]
WHERE   [Id] BETWEEN 1 AND 10000
OPTION  ( QUERYTRACEON 8649, RECOMPILE );

DECLARE @t TABLE ( [Id] INT );
INSERT  @t
SELECT  [Id]
FROM    [dbo].[Users]
WHERE   [Id] BETWEEN 1 AND 10000
OPTION  ( QUERYTRACEON 8649, RECOMPILE );

SELECT TOP ( 10000 )
        [u].[Id] ,
        [ca].[Id] AS [Returner]
FROM    [dbo].[Users] AS [u]
CROSS APPLY ( SELECT    [u2].[Id]
              FROM      [dbo].[Users] AS [u2]
              WHERE     [u2].[Id] = [u].[Id] ) AS [ca]
WHERE   [u].[Id] BETWEEN 1 AND 10000
OPTION  ( QUERYTRACEON 8649, RECOMPILE );

SELECT  AVG(CAST([dt].[Reputation] AS BIGINT)) AS [WhateverMan]
FROM    ( SELECT    [u].[Id] ,
                    [u].[Reputation]
          FROM      [dbo].[Users] [u]
          JOIN      [dbo].[Posts] [p]
          ON        [u].[Id] = [p].[OwnerUserId]
          WHERE     [p].[OwnerUserId] > 0
                    AND [u].[Reputation] > 0 ) AS [dt]
WHERE   [dt].[Id] BETWEEN 1 AND 10000
OPTION  ( QUERYTRACEON 8649, RECOMPILE );

SELECT  [Id] ,
        ROW_NUMBER() OVER ( PARTITION BY [Id] ORDER BY [Id] ) AS [NoParallel]
FROM    [dbo].[Users]
WHERE   [Id] BETWEEN 1 AND 10000
OPTION  ( QUERYTRACEON 8649, RECOMPILE );


SELECT TOP ( 10000 )
        [sc1].[column_id]
FROM    [master].[sys].[columns] AS [sc1]
CROSS JOIN [master].[sys].[columns] AS [sc2]
CROSS JOIN [master].[sys].[columns] AS [sc3]
CROSS JOIN [master].[sys].[columns] AS [sc4]
CROSS JOIN [master].[sys].[columns] AS [sc5]
OPTION  ( QUERYTRACEON 8649, RECOMPILE );

CREATE TABLE [dbo].[NorseGods]
    (
      [GodID] INT NOT NULL ,
      [GodName] NVARCHAR(30) NOT NULL ,
      [Title] NVARCHAR(100) NOT NULL ,
      [ManagerID] INT NULL ,
      CONSTRAINT [PK_GodID] PRIMARY KEY CLUSTERED ( [GodID] )
    );

INSERT  INTO [dbo].[NorseGods]
        ( [GodID], [GodName], [Title], [ManagerID] )
VALUES  ( 1,    N'Odin',  N'War and stuff', NULL )
,       ( 2,  N'Thor',    N'Thunder, etc.', 1 )
,       ( 3,  N'Hel',     N'Underworld!',   2 )
,       ( 4,  N'Loki',    N'Tricksy',       3 )
,       ( 5,  N'Vali',    N'Payback',       3 )
,       ( 6,  N'Freyja',  N'Making babies', 2 )
,       ( 7,  N'Hoenir',  N'Quiet time',    6 )
,       ( 8,   N'Eir',    N'Feeling good',  2 )
,       ( 9,   N'Magni',  N'Weightlifting', 8 );

WITH    [Valhalla]
          AS ( SELECT   [ManagerID] ,
                        [GodID] ,
            [GodName],
                        [Title] ,
                        0 AS [MidgardLevel]
               FROM     [dbo].[NorseGods]
               WHERE    [ManagerID] IS NULL
               UNION ALL
               SELECT   [ng].[ManagerID] ,
                        [ng].[GodID] ,
            [ng].[GodName],
                        [ng].[Title] ,
                        [v].[MidgardLevel] + 1
               FROM     [dbo].[NorseGods] AS [ng]
               INNER JOIN [Valhalla] AS [v]
               ON       [ng].[ManagerID] = [v].[GodID]
             )
    SELECT  [Valhalla].[ManagerID] ,
            [Valhalla].[GodID] ,
      [Valhalla].[GodName] ,
            [Valhalla].[Title] ,
            [Valhalla].[MidgardLevel]
    FROM    [Valhalla]
    ORDER BY [Valhalla].[ManagerID]
  OPTION (QUERYTRACEON 8649, RECOMPILE)

Filtered Indexes: Just Add Includes

I found a quirky thing recently

While playing with filtered indexes, I noticed something odd. By ‘playing with’ I mean ‘calling them horrible names’ and ‘admiring the way other platforms implemented them’.

I sort of wrote about a similar topic in discussing indexing for windowing functions. It turns out that a recent annoyance could also be solved by putting the column my filter expression is predicated on in the included columns definition. That’s the fanciest sentence I’ve ever written, BTW. If you want more, don’t get your hopes up.

Ready for Horrible

Let’s create our initial index. As usual, we’re using the Stack Overflow database. We’ll look at a small group of users who have a Reputation over 400k. I dunno, it’s a nice number. There are like 8 of them.

CREATE UNIQUE NONCLUSTERED INDEX [Users_400k_Club] 
ON dbo.[Users] ([DisplayName], [Id]) 
WHERE [Reputation] > 400000;

With that in place, we’ll run some queries that should make excellent use of our thoughtful and considerate index.

--Will I Nill I?
SELECT [u].[Id], [u].[DisplayName]
FROM [dbo].[Users] AS [u]
WHERE [u].[Reputation] > 400000

SELECT [u].[Id], [u].[DisplayName]
FROM [dbo].[Users] AS [u]
WHERE [u].[Reputation] > 400000 AND [u].[Reputation] < 450000

SELECT [u].[Id], [u].[DisplayName]
FROM [dbo].[Users] AS [u]
WHERE [u].[Reputation] > 400001

SELECT [u].[Id], [u].[DisplayName]
FROM [dbo].[Users] AS [u]
WHERE [u].[Reputation] > 500000

If you were a betting organism, which ones would you say use our index? Money on the table, folks! Step right up!

Yes, Sorta, No, No.

That didn’t go well at all. Only the first query really used it. The second query needed a key lookup to figure out the less than filter, and the last two not only ignored it, but told me I need to create an index. The nerve!

Send me your money

Let’s make our index better:

CREATE UNIQUE NONCLUSTERED INDEX [Users_400k_Club] 
ON dbo.[Users] ([DisplayName], [Id])
INCLUDE([Reputation]) 
WHERE [Reputation] > 400000
WITH (DROP_EXISTING = ON)
;

Run those queries again. You don’t even have to recompile them.

Can’t you tell by the way I run every time you make eyes at me?

They all magically found a way to use our New and Improved index.

What was the point?

When I first started caring about indexes, and filtering them, I would get so mad when these precious little Bloody Mary recipes didn’t get used.

I followed all the rules!
There were no errors!

But why oh why didn’t SQL use my filtered indexes for even smaller subsets of the filter condition? It seemed insane to me that SQL would know the filter for the index is (x > y), but wouldn’t use it for a query asking for (x > z), even when z is greater than y.

The solution was to put the filtered column in the include list. This lets SQL generate statistics on the column, and much like getting rid of the predicate key lookup, allows you to search within the filtered index subset for even more specific information.
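If you want to take inventory of the filtered indexes you already have, and see whether the filtered column made it into the definition, this is a quick place to start:

SELECT  OBJECT_NAME([i].[object_id]) AS [table_name] ,
        [i].[name] AS [index_name] ,
        [i].[filter_definition] ,
        COL_NAME([ic].[object_id], [ic].[column_id]) AS [column_name] ,
        [ic].[is_included_column]
FROM    [sys].[indexes] AS [i]
JOIN    [sys].[index_columns] AS [ic]
ON      [i].[object_id] = [ic].[object_id]
        AND [i].[index_id] = [ic].[index_id]
WHERE   [i].[has_filter] = 1;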
